FEW-SHOT LEARNING WITH PRE-TRAINED LAYERS INTEGRATION APPLIED TO HAND GESTURE RECOGNITION FOR DISABLED PEOPLE

Mohamed ELBAHRI

elbahri82_m@yahoo.fr
Djillali Liabes University (Algeria)
https://orcid.org/0000-0001-5361-1567

Nasreddine TALEB
(Algeria)

Sid Ahmed El Mehdi ARDJOUN
(Algeria)

Chakib Mustapha Anouar ZOUAOUI
(Algeria)

Abstract

Vision-based hand gesture recognition is highly beneficial for the interaction and communication of disabled individuals. However, the hands and gestures of these users are often distinctive, so a deep learning vision-based system must be adapted to each individual with a dedicated dataset. To this end, the paper presents a novel approach for training gesture classifiers from few-shot samples. More specifically, the gesture classifiers are fine-tuned segments of a pre-trained deep network. The overall framework consists of two modules. The first is a base feature learner and hand detector trained on images of non-disabled people's hands; it yields an ad hoc hand detection model. The second is a learner sub-classifier that leverages the convolutional layers of the hand detector's feature extractor to build a shallow CNN, trained with few-shot samples for gesture classification. The proposed approach thus enables segments of a pre-trained feature extractor to be reused in a new sub-classification model. Results obtained by varying the size of the training dataset demonstrate the efficiency of our method compared with approaches from the literature.
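The mechanism the abstract describes, reusing early convolutional layers of a pre-trained detector as a frozen feature extractor and training only a shallow classifier head on the few-shot samples, can be illustrated with the following minimal PyTorch sketch. This is not the authors' implementation: the torchvision MobileNetV2 backbone (standing in for the hand-detector feature extractor), the cut point features[:7], the head architecture, and num_gestures are all illustrative assumptions.

# Hypothetical sketch (not the authors' code): reuse early conv layers of a
# pre-trained backbone as a frozen segment, then train a shallow head on
# few-shot gesture samples.
import torch
import torch.nn as nn
from torchvision import models

# Stand-in for the paper's hand-detector feature extractor.
backbone = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.DEFAULT)
reused_segment = nn.Sequential(*backbone.features[:7])  # early blocks, 32 output channels

# Freeze the reused segment; eval() also fixes its BatchNorm statistics.
for p in reused_segment.parameters():
    p.requires_grad = False
reused_segment.eval()

num_gestures = 10  # assumption: gesture classes per user

# Shallow CNN sub-classifier trained on the few-shot samples.
head = nn.Sequential(
    nn.Conv2d(32, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(64, num_gestures),
)

model = nn.Sequential(reused_segment, head)
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)  # head only
criterion = nn.CrossEntropyLoss()

# One toy training step on a few-shot support batch (random placeholders).
x = torch.randn(8, 3, 224, 224)
y = torch.randint(0, num_gestures, (8,))
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()

Because only the head's parameters are passed to the optimizer, the reused segment stays fixed during few-shot adaptation, which is what makes training feasible with so few samples per gesture.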


Keywords:

CNN segmentation, few-shot learning, hand gesture, disabled people




Published
2024-06-30

How to Cite

ELBAHRI, M., TALEB, N., ARDJOUN, S. A. E. M., & ZOUAOUI, C. M. A. (2024). FEW-SHOT LEARNING WITH PRE-TRAINED LAYERS INTEGRATION APPLIED TO HAND GESTURE RECOGNITION FOR DISABLED PEOPLE. Applied Computer Science, 20(2), 1–23. https://doi.org/10.35784/acs-2024-13




License

This work is licensed under a Creative Commons Attribution 4.0 International License. All articles published in Applied Computer Science are open-access and distributed under the terms of this license.

