Comparison of an effectiveness of artificial neural networks for various activation functions

Daniel Florek

daniel.florek@pollub.edu.pl
Lublin University of Technology (Poland)

Marek Andrzej Miłosz


Lublin University of Technology (Poland)
https://orcid.org/0000-0002-5898-815X

Abstract

Activation functions play an important role in artificial neural networks (ANNs) because they introduce non-linearity into the data transformations performed by models. With the recent surge of interest in ANNs, new improvements to activation functions keep emerging. The paper presents the results of research on the effectiveness of ANNs using the ReLU, Leaky ReLU, ELU, and Swish activation functions. Four different data sets and three different network architectures were used. The results show that the Leaky ReLU, ELU, and Swish functions, which are designed to alleviate the vanishing gradient and dead neuron problems, work better in deeper and more complex architectures, but at the cost of prediction speed. The Swish function seems to speed up the training process considerably, but none of the three aforementioned functions comes out ahead in accuracy on all of the data sets used.
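For orientation, the four activation functions compared in the paper can be sketched in a few lines of NumPy. This is a minimal illustration of their standard textbook definitions, not the authors' implementation. The parameter values (`alpha=0.01` for Leaky ReLU, `alpha=1.0` for ELU, `beta=1.0` for Swish) are assumptions chosen for the example and are not taken from the abstract.

```python
import numpy as np


def relu(x):
    # ReLU: passes positive inputs through, outputs 0 for negative inputs.
    return np.maximum(0.0, x)


def leaky_relu(x, alpha=0.01):
    # Leaky ReLU: a small slope (alpha, assumed here) for negative inputs
    # keeps the gradient non-zero, which is meant to avoid "dead" neurons.
    return np.where(x > 0, x, alpha * x)


def elu(x, alpha=1.0):
    # ELU: smooth exponential curve for negative inputs, saturating at -alpha.
    # expm1 is applied only to the non-positive part to avoid overflow.
    return np.where(x > 0, x, alpha * np.expm1(np.minimum(x, 0.0)))


def swish(x, beta=1.0):
    # Swish: x * sigmoid(beta * x); smooth and non-monotonic near zero.
    return x / (1.0 + np.exp(-beta * x))


if __name__ == "__main__":
    x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
    for fn in (relu, leaky_relu, elu, swish):
        print(f"{fn.__name__:>10}: {np.round(fn(x), 4)}")
```

For negative inputs ReLU returns exactly zero, while the other three functions return small non-zero values. Evaluating the exponential in ELU and Swish is also more expensive than the simple comparison in ReLU, which is one plausible source of the prediction-speed cost mentioned in the abstract.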


Keywords:

activation functions; artificial neural networks; artificial intelligence


Published
2023-03-30

Florek, D., & Miłosz, M. (2023). Comparison of an effectiveness of artificial neural networks for various activation functions. Journal of Computer Sciences Institute, 26, 7–12. https://doi.org/10.35784/jcsi.3069

