RETRACTED PAPER: Enhancing 3D human pose estimation through multi-feature fusion
Xianlei GE
gex@students.national-u.edu.ph
National University, College of Computing and Information Technologies; Huainan Normal University, School of Electronic Engineering (Philippines)
https://orcid.org/0000-0002-9353-5199
Vladimir MARIANO
National University, College of Computing and Information Technologies (Philippines)
https://orcid.org/0009-0002-3444-3195
Abstract
3D human pose estimation (3D-HPE) is an active research area with diverse applications. This work aims to improve the accuracy of 3D-HPE with a two-stage model built on multi-feature fusion. The model uses convolutional kernels of different sizes to extract feature maps with different resolutions and dimensions. In the first stage, these feature maps are fused with the 2D coordinates of the key joints in the input frame. In the second stage, the fused feature map is combined with the feature points of the 2D key joints to jointly predict the key joints in 3D space. In the experimental evaluation, the proposed model outperforms representative methods, improving average MPJPE (mean per-joint position error) and average P-MPJPE (Procrustes-aligned MPJPE) by 9.47% and 8.55%, respectively; both are standard metrics of pose estimation accuracy. These results indicate that the two-stage model with multi-feature fusion is an accurate approach to 3D-HPE that captures fine details of human poses.
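The abstract reports MPJPE and P-MPJPE without giving formulas. The following is a minimal sketch of the two metrics under their commonly used definitions: MPJPE as the mean Euclidean distance between predicted and ground-truth joints, and P-MPJPE as the same error after a similarity (Procrustes) alignment of the prediction to the ground truth. The function names and the `(J, 3)` array layout are assumptions for illustration, and the paper's exact evaluation protocol may differ.

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean per-joint position error for one pose.

    pred, gt: arrays of shape (J, 3) in the same units.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    # Euclidean distance per joint, then averaged over joints.
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def procrustes_align(pred, gt):
    """Align pred to gt with the best scale, rotation and translation."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p0, g0 = pred - mu_p, gt - mu_g

    # Orthogonal Procrustes: SVD of the 3x3 cross-covariance matrix.
    u, s, vt = np.linalg.svd(p0.T @ g0)
    # Flip the last axis if needed so the result is a proper rotation
    # (determinant +1) rather than a reflection.
    d = 1.0 if np.linalg.det(u @ vt) >= 0 else -1.0
    corr = np.array([1.0, 1.0, d])
    rot = (u * corr) @ vt
    # Least-squares optimal isotropic scale for that rotation.
    scale = (s * corr).sum() / (p0 ** 2).sum()
    return scale * p0 @ rot + mu_g


def p_mpjpe(pred, gt):
    """MPJPE after Procrustes alignment of pred to gt."""
    return mpjpe(procrustes_align(pred, gt), gt)
```

Because the alignment removes global scale, rotation and translation, P-MPJPE is never larger than MPJPE for the same pair of poses; it isolates errors in the articulated pose itself.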
License
This work is licensed under a Creative Commons Attribution 4.0 International License.
All articles published in Applied Computer Science are open-access and distributed under the terms of the Creative Commons Attribution 4.0 International License.