RETRACTED PAPER: Enhancing 3D human pose estimation through multi-feature fusion

Xianlei GE; Vladimir MARIANO

doi:10.35784/acs-2023-28


Published: Sep 30, 2023


Authors

Xianlei GE

gex@students.national-u.edu.ph

National University, College of Computing and Information Technologies; Huainan Normal University, School of Electronic Engineering

https://orcid.org/0000-0002-9353-5199

Vladimir MARIANO

vymariano@national-u.edu.ph

National University, College of Computing and Information Technologies

https://orcid.org/0009-0002-3444-3195

Abstract

3D human pose estimation (3D-HPE) has become a prominent research area with diverse applications. This work aims to improve the accuracy of 3D-HPE with a two-stage model based on multi-feature fusion. The model uses convolutional kernels of different sizes to extract feature maps at different resolutions and dimensions. In the first stage, these feature maps are fused with the 2D coordinates of the key joints in the input frame. In the second stage, the fused feature map is combined with the 2D key-joint feature points to jointly predict the positions of the key joints in 3D space. Experimental evaluations show that the proposed model outperforms the representative methods it is compared against, improving average MPJPE (mean per-joint position error) by 9.47% and average P-MPJPE (MPJPE after Procrustes alignment) by 8.55%; these are the standard metrics for evaluating pose estimation accuracy. The results indicate that the two-stage multi-feature fusion model captures fine details of human poses and offers an accurate approach to 3D-HPE.
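Both evaluation metrics named above have compact definitions. The sketch below is not taken from the paper; it shows one common way to compute them with NumPy. MPJPE is the mean Euclidean distance between predicted and ground-truth joints, and P-MPJPE is the same error after the prediction has been aligned to the ground truth with the similarity transform (scale, rotation, translation) found by Procrustes analysis. It assumes each pose is a `(J, 3)` array that is already root-relative if the evaluation protocol requires it; the function names `mpjpe`, `procrustes_align` and `p_mpjpe` are illustrative.

```python
import numpy as np


def mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean per-joint position error for poses of shape (J, 3).
    Assumes both poses are already in the same (e.g. root-relative) frame."""
    return float(np.mean(np.linalg.norm(pred - gt, axis=-1)))


def procrustes_align(pred: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """Return pred mapped onto gt by the scale, rotation and translation
    that minimise the summed squared joint error (orthogonal Procrustes)."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    # SVD of the 3x3 cross-covariance gives the optimal rotation.
    u, s, vt = np.linalg.svd(p.T @ g)
    # Flip the last axis if needed so the result is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(u @ vt))
    fix = np.diag([1.0, 1.0, d])
    rot = u @ fix @ vt  # applied to row vectors as p @ rot
    scale = np.trace(np.diag(s) @ fix) / np.sum(p ** 2)
    return scale * (p @ rot) + mu_g


def p_mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """MPJPE after Procrustes alignment of the prediction to the ground truth."""
    return mpjpe(procrustes_align(pred, gt), gt)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = rng.normal(size=(17, 3))  # 17 joints, random toy pose
    theta = np.pi / 6
    rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                      [np.sin(theta), np.cos(theta), 0.0],
                      [0.0, 0.0, 1.0]])
    pred = 1.5 * gt @ rot_z + np.array([0.1, -0.2, 0.3])
    print(mpjpe(pred, gt))    # well above 0: pose is rotated, scaled and shifted
    print(p_mpjpe(pred, gt))  # close to 0: alignment removes the similarity transform
```

The toy example illustrates why the two numbers differ: P-MPJPE ignores global scale, rotation and translation errors, so it measures only how well the predicted pose shape matches the ground truth.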


GE, X., & MARIANO, V. (2023). RETRACTED PAPER: Enhancing 3D human pose estimation through multi-feature fusion. Applied Computer Science, 19(3), 116–132. https://doi.org/10.35784/acs-2023-28
