A STEP TOWARDS A MAJORITY-BASED DECISION FUSION METHOD FOR VALIDATING CLUSTERING RESULTS

Taras Panskyi

tpanski@kis.p.lodz.pl
Lodz University of Technology, Institute of Applied Computer Science, Lodz, Poland
http://orcid.org/0000-0002-0416-8711

Volodymyr Mosorov


Lodz University of Technology, Lodz, Poland
http://orcid.org/0000-0001-6016-8671

Abstract

A wide range of clustering validation indices (CVIs) aim to validate the results of cluster analysis and to determine which clustering algorithm performs best. Different validation indices may suit different clustering algorithms or partition dissimilarity measures; however, the best index to use in practice remains unknown. A single CVI generally cannot handle the high variability and scalability of data, nor perform well in every context. One popular approach is therefore to combine multiple CVIs and fuse their votes into a final decision. The aim of this work is to analyze the majority-based decision fusion method. Accordingly, the experimental work consisted of designing and implementing the NbClust majority-based decision fusion method and then evaluating the performance of the CVIs with different clustering algorithms and dissimilarity measures in order to find the best validation configuration. In addition, the author proposed extending the standard majority-based decision fusion method with simple rules to maximize the efficiency of the validation procedure. The results showed that the resulting enhanced method, with an invasive validation configuration, can cope with almost all data sets (99%) across different experimental parameters (density, dimensionality, number of clusters, etc.).
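The idea of fusing the votes of several CVIs into one decision can be illustrated with a minimal Python sketch. It is not the NbClust implementation or the enhanced method described in the paper: the function name `majority_fusion`, the example votes, and the tie-breaking rule (smallest number of clusters wins) are all assumptions made for illustration only.

```python
from collections import Counter


def majority_fusion(votes):
    """Fuse per-index votes on the number of clusters by simple majority.

    votes: mapping from a CVI name to the number of clusters it recommends.
    Returns (winning_k, vote_count). Ties go to the smallest k, which is an
    illustrative assumption and not a rule taken from the paper.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    counts = Counter(votes.values())
    top = max(counts.values())
    winner = min(k for k, c in counts.items() if c == top)
    return winner, top


# Hypothetical recommendations from five indices for one data set.
example_votes = {
    "silhouette": 3,
    "calinski_harabasz": 3,
    "davies_bouldin": 4,
    "dunn": 3,
    "gap": 2,
}
print(majority_fusion(example_votes))  # (3, 3)
```

In this sketch every index has one equal vote; a weighted scheme or additional acceptance rules would need a different fusion function.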


Keywords:

clustering, clustering validation index, decision fusion method



Published
2021-06-30

Panskyi, T., & Mosorov, V. (2021). A STEP TOWARDS A MAJORITY-BASED DECISION FUSION METHOD FOR VALIDATING CLUSTERING RESULTS. Informatyka, Automatyka, Pomiary W Gospodarce I Ochronie Środowiska, 11(2), 4–13. https://doi.org/10.35784/iapgos.2596


