ViTPose requires MMPreTrain. To install the required version, run the following command:

```shell
mim install 'mmpretrain>=1.0.0'
```
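If you are unsure whether an existing environment already satisfies the `mmpretrain>=1.0.0` requirement, a quick check can be scripted. The sketch below is an illustration, not part of MMPose or MIM: it reads the installed version with the standard library's `importlib.metadata`, and its helper names (`parse_version`, `meets_minimum`, `check_mmpretrain`) are hypothetical. The simplified parser compares only the leading numeric components, so pre-release tags such as `rc` are ignored; `packaging.version` gives exact PEP 440 ordering if you need it.

```python
from importlib import metadata
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '1.2.0' or '1.0.0rc7' into a tuple of leading integers, e.g. (1, 2, 0)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):  # stop at a suffix such as 'rc7'
            break
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "1.0.0") -> bool:
    """Compare two versions after padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_mmpretrain(minimum: str = "1.0.0") -> bool:
    """Report whether the installed mmpretrain meets the minimum version."""
    try:
        installed = metadata.version("mmpretrain")
    except metadata.PackageNotFoundError:
        print(f"mmpretrain is not installed; run: mim install 'mmpretrain>={minimum}'")
        return False
    ok = meets_minimum(installed, minimum)
    print(f"mmpretrain {installed}: {'OK' if ok else 'too old'}")
    return ok


if __name__ == "__main__":
    check_mmpretrain()
```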

ViTPose (NeurIPS'2022)

```bibtex
@inproceedings{
  xu2022vitpose,
  title={Vi{TP}ose: Simple Vision Transformer Baselines for Human Pose Estimation},
  author={Yufei Xu and Jing Zhang and Qiming Zhang and Dacheng Tao},
  booktitle={Advances in Neural Information Processing Systems},
  year={2022},
}
```

COCO-WholeBody (ECCV'2020)

```bibtex
@inproceedings{jin2020whole,
  title={Whole-Body Human Pose Estimation in the Wild},
  author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  year={2020}
}
```

Results on COCO val2017, using person boxes from a detector with a human AP of 56.4 on COCO val2017
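In this top-down setting, each person box from the detector is cropped and resized to the 256x192 (height x width) model input before pose estimation. The sketch below shows one common way to turn an `(x1, y1, x2, y2)` box into a crop center and size with a 192/256 = 0.75 width-to-height ratio. The helper name `bbox_to_crop` and the 1.25 padding factor are assumptions for illustration, modeled on the usual top-down convention rather than copied from MMPose's code.

```python
from typing import Tuple


def bbox_to_crop(
    bbox_xyxy: Tuple[float, float, float, float],
    input_hw: Tuple[int, int] = (256, 192),
    padding: float = 1.25,  # assumed enlargement factor around the detected box
) -> Tuple[Tuple[float, float], Tuple[float, float]]:
    """Return the crop center (cx, cy) and size (w, h) matching the model's aspect ratio."""
    x1, y1, x2, y2 = bbox_xyxy
    center = ((x1 + x2) / 2.0, (y1 + y2) / 2.0)
    w, h = (x2 - x1) * padding, (y2 - y1) * padding
    aspect = input_hw[1] / input_hw[0]  # width / height = 192 / 256 = 0.75
    # Grow whichever side is too short for the target ratio, so no part of the box is cut off.
    if w > h * aspect:
        h = w / aspect
    else:
        w = h * aspect
    return center, (w, h)


center, size = bbox_to_crop((100.0, 50.0, 200.0, 250.0))
print(center, size)  # (150.0, 150.0) (187.5, 250.0)
```

The resulting region is then warped to 192 x 256 pixels (width x height), so enlarging one side instead of stretching keeps the person's proportions intact.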

With classic decoder

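| Arch | Input Size | AP | AP⁵⁰ | AP⁷⁵ | AR | AR⁵⁰ | ckpt | log |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ViTPose-S | 256x192 | 0.739 | 0.903 | 0.816 | 0.792 | 0.942 | ckpt | log |
| ViTPose-B | 256x192 | 0.757 | 0.905 | 0.829 | 0.810 | 0.946 | ckpt | log |
| ViTPose-L | 256x192 | 0.782 | 0.914 | 0.850 | 0.834 | 0.952 | ckpt | log |
| ViTPose-H | 256x192 | 0.788 | 0.917 | 0.855 | 0.839 | 0.954 | ckpt | log |
| ViTPose-H* | 256x192 | 0.790 | 0.916 | 0.857 | 0.840 | 0.953 | ckpt | - |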

Models marked with * are converted from the official ViTPose repository. Their config files are provided for validation only.
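The AP and AR columns are COCO keypoint metrics built on object keypoint similarity (OKS): AP⁵⁰ and AP⁷⁵ are the average precision at OKS thresholds of 0.50 and 0.75, while AP averages over thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below computes OKS for a single prediction/ground-truth pair using the COCO per-keypoint sigmas. It is a simplified illustration (the `oks` and `thresholds_passed` names are hypothetical) and omits details that the official `pycocotools` evaluation handles, such as crowd regions, ground truths without labeled keypoints, and precision/recall accumulation over the whole dataset.

```python
import math
from typing import List, Sequence, Tuple

# COCO per-keypoint sigmas, ordered: nose, left/right eye, left/right ear,
# left/right shoulder, left/right elbow, left/right wrist, left/right hip,
# left/right knee, left/right ankle.
COCO_SIGMAS = [
    0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072,
    0.062, 0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089,
]

# AP averages over these thresholds; AP50 and AP75 use 0.50 and 0.75 alone.
OKS_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def oks(
    pred: Sequence[Tuple[float, float]],
    gt: Sequence[Tuple[float, float]],
    visible: Sequence[bool],
    area: float,
    sigmas: Sequence[float] = COCO_SIGMAS,
) -> float:
    """Simplified object keypoint similarity between one prediction and one ground truth."""
    total, count = 0.0, 0
    for (px, py), (gx, gy), vis, sigma in zip(pred, gt, visible, sigmas):
        if not vis:  # unlabeled ground-truth keypoints do not contribute
            continue
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        k2 = (2 * sigma) ** 2
        total += math.exp(-d2 / (2 * area * k2))
        count += 1
    return total / count if count else 0.0


def thresholds_passed(score: float) -> List[float]:
    """OKS thresholds at which this match would count as a true positive."""
    return [t for t in OKS_THRESHOLDS if score >= t]


# Toy example: every predicted keypoint is off by (2, -1) pixels.
gt = [(100.0 + 5 * i, 200.0 + 3 * i) for i in range(17)]
pred = [(x + 2.0, y - 1.0) for x, y in gt]
score = oks(pred, gt, [True] * 17, area=5000.0)
print(round(score, 3), thresholds_passed(score))
```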

With simple decoder

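| Arch | Input Size | AP | AP⁵⁰ | AP⁷⁵ | AR | AR⁵⁰ | ckpt | log |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| ViTPose-S | 256x192 | 0.736 | 0.900 | 0.811 | 0.790 | 0.940 | ckpt | log |
| ViTPose-B | 256x192 | 0.756 | 0.906 | 0.826 | 0.809 | 0.946 | ckpt | log |
| ViTPose-L | 256x192 | 0.780 | 0.914 | 0.851 | 0.833 | 0.952 | ckpt | log |
| ViTPose-H | 256x192 | 0.789 | 0.916 | 0.856 | 0.839 | 0.953 | ckpt | log |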
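The classic-decoder and simple-decoder results differ only in the decoder head. As the ViTPose paper describes them, the classic decoder uses two deconvolution blocks that each upsample by 2x (followed by a 1x1 convolution), while the simple decoder upsamples once by 4x with bilinear interpolation (followed by ReLU and a 3x3 convolution). The sketch below checks that both paths reach the same heatmap resolution, assuming a backbone patch size of 16, and computes simple-minus-classic differences from the AP and AR values reported for ViTPose-S/B/L/H. The function names `heatmap_hw` and `decoder_deltas` are hypothetical.

```python
from typing import Dict, Sequence, Tuple

# (AP, AR) on COCO val2017 at 256x192, copied from the two result tables.
CLASSIC: Dict[str, Tuple[float, float]] = {
    "ViTPose-S": (0.739, 0.792),
    "ViTPose-B": (0.757, 0.810),
    "ViTPose-L": (0.782, 0.834),
    "ViTPose-H": (0.788, 0.839),
}
SIMPLE: Dict[str, Tuple[float, float]] = {
    "ViTPose-S": (0.736, 0.790),
    "ViTPose-B": (0.756, 0.809),
    "ViTPose-L": (0.780, 0.833),
    "ViTPose-H": (0.789, 0.839),
}


def heatmap_hw(
    input_hw: Tuple[int, int] = (256, 192),
    patch_size: int = 16,  # assumed ViT patch size, i.e. backbone stride
    upsample_steps: Sequence[int] = (2, 2),
) -> Tuple[int, int]:
    """Heatmap height and width after patch embedding and the decoder's upsampling."""
    h, w = input_hw[0] // patch_size, input_hw[1] // patch_size  # 16 x 12 token grid
    for factor in upsample_steps:
        h, w = h * factor, w * factor
    return h, w


def decoder_deltas() -> Dict[str, Tuple[float, float]]:
    """Simple-decoder minus classic-decoder (AP, AR) for each backbone size."""
    return {
        arch: (
            round(SIMPLE[arch][0] - CLASSIC[arch][0], 3),
            round(SIMPLE[arch][1] - CLASSIC[arch][1], 3),
        )
        for arch in CLASSIC
    }


classic_hw = heatmap_hw(upsample_steps=(2, 2))  # two 2x deconvolution blocks
simple_hw = heatmap_hw(upsample_steps=(4,))     # one 4x bilinear upsample
print("heatmap size (h, w):", classic_hw, simple_hw)
for arch, (d_ap, d_ar) in decoder_deltas().items():
    print(f"{arch}: dAP={d_ap:+.3f}, dAR={d_ar:+.3f}")
```

For every backbone size in these results, the two decoders are within 0.003 AP and 0.002 AR of each other.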
