FEW-SHOT LEARNING WITH PRE-TRAINED LAYERS INTEGRATION APPLIED TO HAND GESTURE RECOGNITION FOR DISABLED PEOPLE
Mohamed ELBAHRI
elbahri82_m@yahoo.fr
Djillali Liabes University (Algeria)
https://orcid.org/0000-0001-5361-1567
Nasreddine TALEB
(Algeria)
Sid Ahmed El Mehdi ARDJOUN
(Algeria)
Chakib Mustapha Anouar ZOUAOUI
(Algeria)
Abstract
Vision-based hand gesture recognition can greatly help disabled individuals interact and communicate. The hands and gestures of these users often have distinctive characteristics, so a deep learning vision-based system must be adapted to each individual with a dedicated dataset. To this end, the paper presents a novel approach for training gesture classifiers from few-shot samples. More specifically, the gesture classifiers are fine-tuned segments of a pre-trained deep network. The overall framework consists of two modules. The first is a base feature learner and hand detector trained on images of the hands of non-disabled people; it yields an ad hoc hand detection model. The second is a sub-classifier learner that leverages the convolutional layers of the hand detector's feature extractor to build a shallow CNN trained with few-shot samples for gesture classification. The proposed approach thus enables segments of a pre-trained feature extractor to be reused to build a new sub-classification model. Results obtained by varying the size of the training dataset demonstrate the efficiency of the method compared with those in the literature.
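The idea of reusing a segment of a pre-trained feature extractor and training only a small classifier on a handful of samples can be illustrated with a toy sketch. This is not the authors' implementation: the layer kernels, the names `PRETRAINED_LAYERS`, `take_segment`, `extract`, `train_head` and `predict`, and the one-dimensional "gesture" signals are all hypothetical stand-ins chosen so the example runs with the Python standard library alone. For simplicity the reused layers are kept fixed here, whereas the abstract describes fine-tuning them.

```python
import math

def conv1d_relu(kernel):
    """Build one fixed layer: valid 1D convolution followed by ReLU."""
    k = len(kernel)
    def layer(x):
        return [max(0.0, sum(kernel[j] * x[i + j] for j in range(k)))
                for i in range(len(x) - k + 1)]
    return layer

# Hypothetical stand-in for the convolutional layers of a pre-trained
# hand detector's feature extractor (assumption, not from the paper).
PRETRAINED_LAYERS = [
    conv1d_relu([1.0, -1.0]),       # responds to falling edges
    conv1d_relu([0.5, 0.5]),        # smoothing
    conv1d_relu([1.0, 0.0, -1.0]),  # deeper layer, not reused below
]

def take_segment(layers, n_layers):
    """Reuse only the first n_layers of the pre-trained extractor."""
    return layers[:n_layers]

def extract(segment, x):
    """Run the reused layers, then global max pooling to one feature."""
    for layer in segment:
        x = layer(x)
    return [max(x)]

def scores(head, f):
    """Linear class scores for a feature vector f."""
    w, b = head
    return [sum(wi * fi for wi, fi in zip(w[c], f)) + b[c]
            for c in range(len(b))]

def train_head(features, labels, n_classes, epochs=200, lr=0.5):
    """Train a softmax classifier by SGD; only the head is updated."""
    dim = len(features[0])
    w = [[0.0] * dim for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        for f, y in zip(features, labels):
            z = scores((w, b), f)
            m = max(z)
            e = [math.exp(v - m) for v in z]
            total = sum(e)
            for c in range(n_classes):
                grad = e[c] / total - (1.0 if c == y else 0.0)
                for i in range(dim):
                    w[c][i] -= lr * grad * f[i]
                b[c] -= lr * grad
    return w, b

def predict(segment, head, x):
    """Return the index of the highest-scoring class for input x."""
    z = scores(head, extract(segment, x))
    return max(range(len(z)), key=z.__getitem__)

# Few-shot training set: three samples per class (toy data).
# Class 0 is a falling step, class 1 a rising step.
FEW_SHOT = [
    ([1, 1, 1, 0, 0, 0, 0, 0], 0),
    ([0, 0, 0, 1, 1, 1, 1, 1], 1),
    ([1, 1, 1, 1, 0, 0, 0, 0], 0),
    ([0, 0, 0, 0, 1, 1, 1, 1], 1),
    ([1, 1, 1, 1, 1, 0, 0, 0], 0),
    ([0, 0, 0, 0, 0, 1, 1, 1], 1),
]

segment = take_segment(PRETRAINED_LAYERS, 2)
features = [extract(segment, x) for x, _ in FEW_SHOT]
labels = [y for _, y in FEW_SHOT]
head = train_head(features, labels, n_classes=2)

# Classify two signals that were not in the training set.
print(predict(segment, head, [1, 1, 0, 0, 0, 0, 0, 0]))
print(predict(segment, head, [0, 0, 0, 0, 0, 0, 1, 1]))
```

Because the reused layers already separate the two toy classes, six labelled samples are enough to fit the small classifier. In a real system the segment would come from a trained detector backbone and the inputs would be hand image crops.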
Keywords:
CNN segmentation, few-shot learning, hand gesture, disabled people
License
This work is licensed under a Creative Commons Attribution 4.0 International License.
All articles published in Applied Computer Science are open-access and distributed under the terms of the Creative Commons Attribution 4.0 International License.