RETRACTED PAPER: Enhancing 3D human pose estimation through multi-feature fusion

Xianlei GE

gex@students.national-u.edu.ph
National University, College of Computing and Information Technologies, Huainan Normal University, School of Electronic Engineering (Philippines)
https://orcid.org/0000-0002-9353-5199

Vladimir MARIANO


National University, College of Computing and Information Technologies (Philippines)
https://orcid.org/0009-0002-3444-3195

Abstract

3D human pose estimation (3D-HPE) is a prominent research area with diverse applications. This work aims to improve the accuracy of 3D-HPE with a two-stage model based on multi-feature fusion. The model uses convolutional kernels of different sizes to extract feature maps with differing resolutions and dimensions. In the first stage, these feature maps are fused with the 2D coordinates of the key joints in the input frame. In the second stage, the fused feature map is combined with the feature points of the 2D key joints to jointly predict the key joints in 3D space. In experimental evaluations, the model improves on representative methods by 9.47% in average mean per-joint position error (MPJPE) and by 8.55% in average Procrustes-aligned MPJPE (P-MPJPE), two standard metrics of pose estimation accuracy. These results indicate that the two-stage model with multi-feature fusion captures the details of human poses more accurately than the methods it is compared against.
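The abstract reports its results as MPJPE and P-MPJPE without stating how they are computed. The following is a minimal NumPy sketch of the commonly used definitions, assuming each pose is an array of shape (J, 3) with one row per joint. The function names are illustrative and do not come from the paper, and the paper's own evaluation code may differ in details such as root-joint centering or units.

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean per-joint position error: mean Euclidean distance over joints."""
    return float(np.mean(np.linalg.norm(pred - gt, axis=-1)))


def procrustes_align(pred, gt):
    """Align pred (J, 3) to gt (J, 3) with the best scale, rotation, translation."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    # The optimal rotation comes from the SVD of the cross-covariance matrix.
    u, s, vt = np.linalg.svd(p.T @ g)
    # Flip the last singular direction when needed so the result is a
    # proper rotation (determinant +1) rather than a reflection.
    d = np.sign(np.linalg.det(u @ vt))
    flip = np.array([1.0, 1.0, d])
    rot = (u * flip) @ vt
    scale = (s * flip).sum() / (p ** 2).sum()
    return scale * p @ rot + mu_g


def p_mpjpe(pred, gt):
    """MPJPE measured after Procrustes alignment of pred to gt."""
    return mpjpe(procrustes_align(pred, gt), gt)
```

Under these definitions, a prediction that differs from the ground truth only by a global rotation, scale, and translation has a nonzero MPJPE but a P-MPJPE of approximately zero, which is why the two metrics are usually reported together.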

Supporting Agencies

This research was funded by the University Natural Science Foundation of Anhui Province (Grant no. 2022AH051578), the Key Science Research Foundation of Huainan Normal University (Grant no. 2022XJZD019), and the Guiding Science and Technology Foundation of Huainan (Grant no. 2020050).



Published
2023-09-30

How to cite

GE, X., & MARIANO, V. (2023). RETRACTED PAPER: Enhancing 3D human pose estimation through multi-feature fusion. Applied Computer Science, 19(3), 116–132. https://doi.org/10.35784/acs-2023-28


License

This work is licensed under a Creative Commons Attribution 4.0 International License.

All articles published in Applied Computer Science are open-access and distributed under the terms of the Creative Commons Attribution 4.0 International License.