mirror of https://github.com/open-mmlab/mmpose
To use ViTPose, you need MMPreTrain installed. Install the required version with:

```shell
mim install 'mmpretrain>=1.0.0'
```
ViTPose (NeurIPS'2022)
```bibtex
@inproceedings{xu2022vitpose,
  title={Vi{TP}ose: Simple Vision Transformer Baselines for Human Pose Estimation},
  author={Yufei Xu and Jing Zhang and Qiming Zhang and Dacheng Tao},
  booktitle={Advances in Neural Information Processing Systems},
  year={2022},
}
```
COCO-WholeBody (ECCV'2020)
```bibtex
@inproceedings{jin2020whole,
  title={Whole-Body Human Pose Estimation in the Wild},
  author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  year={2020}
}
```
Results on COCO val2017, using a detector with a human AP of 56.4 on the COCO val2017 dataset.
With classic decoder
| Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ViTPose-S | 256x192 | 0.739 | 0.903 | 0.816 | 0.792 | 0.942 | ckpt | log |
| ViTPose-B | 256x192 | 0.757 | 0.905 | 0.829 | 0.810 | 0.946 | ckpt | log |
| ViTPose-L | 256x192 | 0.782 | 0.914 | 0.850 | 0.834 | 0.952 | ckpt | log |
| ViTPose-H | 256x192 | 0.788 | 0.917 | 0.855 | 0.839 | 0.954 | ckpt | log |
| ViTPose-H\* | 256x192 | 0.790 | 0.916 | 0.857 | 0.840 | 0.953 | ckpt | - |
Models marked with \* are converted from the official repo; their config files are intended for validation only.
With simple decoder
| Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ViTPose-S | 256x192 | 0.736 | 0.900 | 0.811 | 0.790 | 0.940 | ckpt | log |
| ViTPose-B | 256x192 | 0.756 | 0.906 | 0.826 | 0.809 | 0.946 | ckpt | log |
| ViTPose-L | 256x192 | 0.780 | 0.914 | 0.851 | 0.833 | 0.952 | ckpt | log |
| ViTPose-H | 256x192 | 0.789 | 0.916 | 0.856 | 0.839 | 0.953 | ckpt | log |
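As the two tables show, the choice of decoder barely affects accuracy. A quick sketch (values copied from the AP columns above) computing the per-architecture gap between the classic and simple decoders:

```python
# AP on COCO val2017, taken from the two tables above.
classic = {"ViTPose-S": 0.739, "ViTPose-B": 0.757, "ViTPose-L": 0.782, "ViTPose-H": 0.788}
simple  = {"ViTPose-S": 0.736, "ViTPose-B": 0.756, "ViTPose-L": 0.780, "ViTPose-H": 0.789}

# Positive delta means the classic decoder is ahead for that backbone.
for arch in classic:
    delta = classic[arch] - simple[arch]
    print(f"{arch}: classic - simple = {delta:+.3f}")
```

Across all four backbones the difference stays within 0.003 AP, so the simple decoder is a reasonable default when a lighter head is preferred.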