A STEP TOWARDS THE MAJORITY-BASED CLUSTERING VALIDATION DECISION FUSION METHOD

Taras Panskyi; Volodymyr Mosorov

doi:10.35784/iapgos.2596

Taras Panskyi

tpanski@kis.p.lodz.pl
Lodz University of Technology, Institute of Applied Computer Science, Lodz, Poland
http://orcid.org/0000-0002-0416-8711

Volodymyr Mosorov

Lodz University of Technology, Lodz, Poland
http://orcid.org/0000-0001-6016-8671

Abstract

A variety of clustering validation indices (CVIs) are aimed at validating the results of clustering analysis and determining which clustering algorithm performs best. Different validation indices may be appropriate for different clustering algorithms or partition dissimilarity measures; however, the most suitable index to use in practice remains unknown. A single CVI is generally unable to handle the wide variability and scalability of the data or to cope successfully with all contexts. Therefore, one popular approach is to combine multiple CVIs and fuse their votes into a final decision. The aim of this work is to analyze the majority-based decision fusion method. The experimental work consisted of designing and implementing the NbClust majority-based decision fusion method and then evaluating CVI performance with different clustering algorithms and dissimilarity measures in order to discover the best validation configuration. Moreover, the authors propose enhancing the standard majority-based decision fusion method with straightforward rules to maximize the efficiency of the validation procedure. The results showed that the designed enhanced method with an invasive validation configuration could cope with almost all data sets (99%) across different experimental factors (density, dimensionality, number of clusters, etc.).
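
As a rough illustration of the general idea of majority-based decision fusion, the sketch below lets each CVI cast one vote for a number of clusters and takes the most frequent proposal as the fused decision. It is a minimal sketch, not the authors' implementation and not the NbClust API: the function name `majority_vote`, the index names, and the vote values are all hypothetical, and the enhancement rules mentioned in the abstract are not reproduced because they are not specified here. Ties are returned as-is rather than resolved.

```python
from collections import Counter


def majority_vote(votes):
    """Fuse per-index votes into a decision by simple majority.

    `votes` maps an index name to the number of clusters that index
    proposes. Returns (winners, support): the sorted list of cluster
    counts that received the most votes, and how many votes they got.
    More than one winner means the vote is tied.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    counts = Counter(votes.values())
    support = max(counts.values())
    winners = sorted(k for k, c in counts.items() if c == support)
    return winners, support


# Hypothetical votes, invented for illustration only.
example_votes = {
    "silhouette": 3,
    "dunn": 3,
    "calinski_harabasz": 4,
    "davies_bouldin": 3,
    "krzanowski_lai": 2,
}

winners, support = majority_vote(example_votes)
print(f"fused decision: {winners} with {support} of {len(example_votes)} votes")
```

Returning every tied candidate, instead of picking one silently, keeps the sketch neutral about how a tie should be broken; any tie-breaking rule would be an additional assumption.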

Keywords:

clustering, clustering validation index, decision fusion method

Published

2021-06-30

Panskyi, T., & Mosorov, V. (2021). A STEP TOWARDS THE MAJORITY-BASED CLUSTERING VALIDATION DECISION FUSION METHOD. Informatyka, Automatyka, Pomiary W Gospodarce I Ochronie Środowiska, 11(2), 4–13. https://doi.org/10.35784/iapgos.2596

License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
