mmpose/configs/body_2d_keypoint/topdown_heatmap/coco/vitpose_coco.md

ViTPose requires MMPreTrain. Install a compatible version with:

```shell
mim install 'mmpretrain>=1.0.0'
```
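
To confirm that the installed package meets the `mmpretrain>=1.0.0` requirement before running a ViTPose config, a check along the following lines may help. This is a minimal sketch, not part of MMPose: the helper names (`parse_release`, `satisfies_minimum`, `installed_mmpretrain_ok`) are hypothetical, the distribution name is assumed to be `mmpretrain`, and the comparison is a simplified approximation of PEP 440 ordering that treats pre-releases such as `1.0.0rc8` as older than `1.0.0`.

```python
import re
from importlib import metadata

# Minimum release taken from the requirement 'mmpretrain>=1.0.0'.
MIN_VERSION = (1, 0, 0)


def parse_release(version: str):
    """Split a version string into (release tuple, is_prerelease).

    Simplified parsing (an assumption of this sketch), not full PEP 440.
    """
    match = re.match(r"(\d+(?:\.\d+)*)(.*)", version.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    release = tuple(int(part) for part in match.group(1).split("."))
    # Pad short versions so that '1.0' compares equal to '1.0.0'.
    release += (0,) * (len(MIN_VERSION) - len(release))
    suffix = match.group(2).lower()
    # Suffixes such as 'rc8', 'a1', 'b2' or '.dev0' mark a pre-release.
    is_prerelease = bool(re.match(r"[.\-_]?(a|b|rc|alpha|beta|dev|pre)", suffix))
    return release, is_prerelease


def satisfies_minimum(version: str, minimum=MIN_VERSION) -> bool:
    """Return True if `version` is at least `minimum`."""
    release, is_prerelease = parse_release(version)
    if release > minimum:
        return True
    # A pre-release of the minimum itself sorts before the final release.
    return release == minimum and not is_prerelease


def installed_mmpretrain_ok() -> bool:
    """Check the installed distribution; False if it is missing or too old."""
    try:
        return satisfies_minimum(metadata.version("mmpretrain"))
    except metadata.PackageNotFoundError:
        return False


if __name__ == "__main__":
    if installed_mmpretrain_ok():
        print("mmpretrain satisfies >=1.0.0")
    else:
        print("mmpretrain is missing or older than 1.0.0")
```

Only the standard library is used, so the check can run before any OpenMMLab package is imported. Where the `packaging` library is available, its `Version` class gives an exact PEP 440 comparison and would be the more robust choice.
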
ViTPose (NeurIPS'2022)
@inproceedings{
  xu2022vitpose,
  title={Vi{TP}ose: Simple Vision Transformer Baselines for Human Pose Estimation},
  author={Yufei Xu and Jing Zhang and Qiming Zhang and Dacheng Tao},
  booktitle={Advances in Neural Information Processing Systems},
  year={2022},
}
COCO-WholeBody (ECCV'2020)
@inproceedings{jin2020whole,
  title={Whole-Body Human Pose Estimation in the Wild},
  author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  year={2020}
}

Results on COCO val2017, using a person detector with a human AP of 56.4 on COCO val2017

With classic decoder

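| Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ViTPose-S | 256x192 | 0.739 | 0.903 | 0.816 | 0.792 | 0.942 | ckpt | log |
| ViTPose-B | 256x192 | 0.757 | 0.905 | 0.829 | 0.810 | 0.946 | ckpt | log |
| ViTPose-L | 256x192 | 0.782 | 0.914 | 0.850 | 0.834 | 0.952 | ckpt | log |
| ViTPose-H | 256x192 | 0.788 | 0.917 | 0.855 | 0.839 | 0.954 | ckpt | log |
| ViTPose-H* | 256x192 | 0.790 | 0.916 | 0.857 | 0.840 | 0.953 | ckpt | - |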

Models with * are converted from the official repo. The config files of these models are only for validation.

With simple decoder

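| Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ViTPose-S | 256x192 | 0.736 | 0.900 | 0.811 | 0.790 | 0.940 | ckpt | log |
| ViTPose-B | 256x192 | 0.756 | 0.906 | 0.826 | 0.809 | 0.946 | ckpt | log |
| ViTPose-L | 256x192 | 0.780 | 0.914 | 0.851 | 0.833 | 0.952 | ckpt | log |
| ViTPose-H | 256x192 | 0.789 | 0.916 | 0.856 | 0.839 | 0.953 | ckpt | log |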