mirror of https://github.com/open-mmlab/mmpose
To use ViTPose, you need MMPreTrain. Install the required version with:

```shell
mim install 'mmpretrain>=1.0.0'
```
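To confirm the dependency is satisfied in the active environment, a quick standard-library check can be used. This is a sketch: `meets_requirement` is a hypothetical helper doing a simplified numeric comparison, not a full PEP 440 version parser.

```python
from importlib.metadata import version, PackageNotFoundError

def meets_requirement(ver: str, minimum: str = "1.0.0") -> bool:
    """Compare dotted version strings numerically.
    Simplified sketch: ignores pre-release/dev suffixes."""
    key = lambda v: tuple(int(p) for p in v.split(".")[:3])
    return key(ver) >= key(minimum)

try:
    installed = version("mmpretrain")
    status = "OK" if meets_requirement(installed) else "too old"
    print(f"mmpretrain {installed}: {status}")
except PackageNotFoundError:
    print("mmpretrain is not installed; run: mim install 'mmpretrain>=1.0.0'")
```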
ViTPose (NeurIPS'2022)

```bibtex
@inproceedings{xu2022vitpose,
  title={Vi{TP}ose: Simple Vision Transformer Baselines for Human Pose Estimation},
  author={Yufei Xu and Jing Zhang and Qiming Zhang and Dacheng Tao},
  booktitle={Advances in Neural Information Processing Systems},
  year={2022},
}
```
COCO-WholeBody (ECCV'2020)

```bibtex
@inproceedings{jin2020whole,
  title={Whole-Body Human Pose Estimation in the Wild},
  author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  year={2020}
}
```
Human-Art (CVPR'2023)

```bibtex
@inproceedings{ju2023humanart,
  title={Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes},
  author={Ju, Xuan and Zeng, Ailing and Wang, Jianan and Xu, Qiang and Zhang, Lei},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2023}
}
```
Results on the Human-Art validation set, using a detector with human AP of 56.2 on that set
With classic decoder
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
ViTPose-S-coco | 256x192 | 0.228 | 0.371 | 0.229 | 0.298 | 0.467 | ckpt | log |
ViTPose-S-humanart-coco | 256x192 | 0.381 | 0.532 | 0.405 | 0.448 | 0.602 | ckpt | log |
ViTPose-B-coco | 256x192 | 0.270 | 0.423 | 0.272 | 0.340 | 0.510 | ckpt | log |
ViTPose-B-humanart-coco | 256x192 | 0.410 | 0.549 | 0.434 | 0.475 | 0.615 | ckpt | log |
ViTPose-L-coco | 256x192 | 0.342 | 0.498 | 0.357 | 0.413 | 0.577 | ckpt | log |
ViTPose-L-humanart-coco | 256x192 | 0.459 | 0.592 | 0.487 | 0.525 | 0.656 | ckpt | log |
ViTPose-H-coco | 256x192 | 0.377 | 0.541 | 0.391 | 0.447 | 0.615 | ckpt | log |
ViTPose-H-humanart-coco | 256x192 | 0.468 | 0.594 | 0.498 | 0.534 | 0.655 | ckpt | log |
Results on the Human-Art validation set with ground-truth bounding boxes
With classic decoder
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
ViTPose-S-coco | 256x192 | 0.507 | 0.758 | 0.531 | 0.551 | 0.780 | ckpt | log |
ViTPose-S-humanart-coco | 256x192 | 0.738 | 0.905 | 0.802 | 0.768 | 0.911 | ckpt | log |
ViTPose-B-coco | 256x192 | 0.555 | 0.782 | 0.590 | 0.599 | 0.809 | ckpt | log |
ViTPose-B-humanart-coco | 256x192 | 0.759 | 0.905 | 0.823 | 0.790 | 0.917 | ckpt | log |
ViTPose-L-coco | 256x192 | 0.637 | 0.838 | 0.689 | 0.677 | 0.859 | ckpt | log |
ViTPose-L-humanart-coco | 256x192 | 0.789 | 0.916 | 0.845 | 0.819 | 0.929 | ckpt | log |
ViTPose-H-coco | 256x192 | 0.665 | 0.860 | 0.715 | 0.701 | 0.871 | ckpt | log |
ViTPose-H-humanart-coco | 256x192 | 0.800 | 0.926 | 0.855 | 0.828 | 0.933 | ckpt | log |
Results on COCO val2017, using a detector with human AP of 56.4 on COCO val2017
With classic decoder
Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
---|---|---|---|---|---|---|---|---|
ViTPose-S-coco | 256x192 | 0.739 | 0.903 | 0.816 | 0.792 | 0.942 | ckpt | log |
ViTPose-S-humanart-coco | 256x192 | 0.737 | 0.902 | 0.811 | 0.792 | 0.942 | ckpt | log |
ViTPose-B-coco | 256x192 | 0.757 | 0.905 | 0.829 | 0.810 | 0.946 | ckpt | log |
ViTPose-B-humanart-coco | 256x192 | 0.758 | 0.906 | 0.829 | 0.812 | 0.946 | ckpt | log |
ViTPose-L-coco | 256x192 | 0.782 | 0.914 | 0.850 | 0.834 | 0.952 | ckpt | log |
ViTPose-L-humanart-coco | 256x192 | 0.782 | 0.914 | 0.849 | 0.835 | 0.953 | ckpt | log |
ViTPose-H-coco | 256x192 | 0.788 | 0.917 | 0.855 | 0.839 | 0.954 | ckpt | log |
ViTPose-H-humanart-coco | 256x192 | 0.788 | 0.914 | 0.853 | 0.841 | 0.956 | ckpt | log |
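The AP/AR numbers in the tables above follow COCO-style keypoint evaluation, which scores each predicted pose against the ground truth via Object Keypoint Similarity (OKS). A minimal sketch of the OKS formula is below; this is a simplified illustration, not mmpose's actual evaluator, and the toy keypoints, area, and sigma values are made up for the example.

```python
import numpy as np

def oks(pred, gt, visible, area, sigmas):
    """Object Keypoint Similarity between one predicted and one GT pose.

    pred, gt: (K, 2) arrays of keypoint (x, y) coordinates
    visible:  (K,) boolean mask of labeled GT keypoints
    area:     GT object area (the scale term in COCO evaluation)
    sigmas:   (K,) per-keypoint falloff constants
    """
    d2 = np.sum((pred - gt) ** 2, axis=-1)            # squared distances
    var = (2 * sigmas) ** 2                           # per-keypoint variance
    e = d2 / (2 * (area + np.spacing(1)) * var)       # normalized error
    return float(np.mean(np.exp(-e[visible])))        # average over visible kpts

# Toy example: 3 keypoints, a perfect prediction scores OKS = 1.0
gt = np.array([[10.0, 10.0], [20.0, 15.0], [30.0, 40.0]])
sigmas = np.array([0.025, 0.025, 0.035])
vis = np.array([True, True, True])
print(oks(gt.copy(), gt, vis, area=400.0, sigmas=sigmas))  # prints 1.0
```

AP is then the precision averaged over OKS thresholds from 0.50 to 0.95 (AP50 and AP75 fix the threshold at 0.50 and 0.75, respectively), which is why the stricter AP75 column is consistently lower than AP50 in the tables.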