Feature Selection using Genetic Algorithm for Clustering high Dimensional Data

Authors

  • Kahkashan Kouser

  • Amrita Priyam

How to Cite

Kouser, K., & Priyam, A. (2018). Feature Selection using Genetic Algorithm for Clustering high Dimensional Data. International Journal of Engineering and Technology, 7(2.11), 27-30. https://doi.org/10.14419/ijet.v7i2.11.11001

Received date: April 3, 2018

Accepted date: April 3, 2018

Published date: April 3, 2018

DOI:

https://doi.org/10.14419/ijet.v7i2.11.11001

Keywords:

feature selection, clustering, high-dimensional data, genetic algorithm.

Abstract

Clustering high-dimensional data is one of the open problems of modern data mining. This paper proposes a new technique, GA-HDClustering, which works in two steps. First, a genetic algorithm (GA) based feature selection algorithm determines the optimal feature subset, that is, the subset consisting of the important features of the entire data set. Next, the K-means algorithm is applied to the data restricted to this optimal feature subset to find the clusters. For comparison, the traditional K-means algorithm is applied to the full-dimensional feature space. Finally, the results of GA-HDClustering are compared with those of the traditional clustering algorithm using several validity metrics: sum of squared error (SSE), within-group average distance (WGAD), between-group distance (BGD), and the Davies-Bouldin index (DBI). GA-HDClustering uses a genetic algorithm to search for an effective feature subspace within the large feature space formed by all dimensions of the data set. Experiments on a standard data set show that GA-HDClustering is superior to the traditional clustering algorithm.
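
The two-step procedure described in the abstract can be sketched as a wrapper search: each GA chromosome is a bit mask over the features, and a mask is scored by running K-means on the selected columns only. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not specify the fitness function, encoding, or GA operators, so the fitness used here (mean distance between centroids divided by mean point-to-centroid distance), the deterministic farthest-point initialization, binary tournament selection, one-point crossover, bit-flip mutation, all parameter values, and the names `kmeans`, `fitness`, and `ga_select` are hypothetical choices.

```python
import random
from math import dist, inf


def kmeans(points, k, iters=50):
    """Lloyd's K-means with a deterministic farthest-point initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid.
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members (keep it if the cluster empties).
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new.append(tuple(sum(col) / len(members) for col in zip(*members)) if members else centroids[j])
        if new == centroids:
            break
        centroids = new
    return labels, centroids


def fitness(data, mask, k):
    """Hypothetical fitness: mean centroid separation / mean point-to-centroid distance."""
    cols = [i for i, bit in enumerate(mask) if bit]
    if not cols:
        return -inf  # an empty feature subset cannot be clustered
    points = [tuple(row[i] for i in cols) for row in data]
    labels, cents = kmeans(points, k)
    within = sum(dist(p, cents[lab]) for p, lab in zip(points, labels)) / len(points)
    pairs = [(a, b) for a in range(k) for b in range(a + 1, k)]
    between = sum(dist(cents[a], cents[b]) for a, b in pairs) / len(pairs)
    return between / (within + 1e-12)  # small constant avoids division by zero


def ga_select(data, k, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Search bit masks over the features for the subset with the highest fitness."""
    rng = random.Random(seed)
    d = len(data[0])
    cache = {}

    def fit(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = fitness(data, mask, k)
        return cache[key]

    # Include the full feature set (the traditional K-means baseline) in the first population.
    pop = [[1] * d] + [[rng.randint(0, 1) for _ in range(d)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        nxt = sorted(pop, key=fit, reverse=True)[:2]  # elitism: keep the two best masks
        while len(nxt) < pop_size:
            # Binary tournament selection for each parent.
            p1 = max(rng.sample(pop, 2), key=fit)
            p2 = max(rng.sample(pop, 2), key=fit)
            cut = rng.randint(1, d - 1)  # one-point crossover
            child = [1 - b if rng.random() < p_mut else b for b in p1[:cut] + p2[cut:]]  # bit-flip mutation
            nxt.append(child)
        pop = nxt
    best = max(pop, key=fit)
    return best, fit(best)


rng = random.Random(1)
# Synthetic example: features 0-1 separate two groups; features 2-5 are uniform noise.
data = [[c + rng.gauss(0, 0.3), c + rng.gauss(0, 0.3)] + [rng.uniform(-5, 5) for _ in range(4)]
        for c in (0.0, 10.0) for _ in range(15)]
mask, score = ga_select(data, k=2)
print("selected features:", [i for i, bit in enumerate(mask) if bit], "fitness:", round(score, 2))
```

Seeding the first population with the all-ones mask mirrors the baseline of running K-means on the full feature space: because the two best masks survive every generation, the returned subset never scores below that baseline under the chosen fitness. Reproducing the paper's results would require replacing the hypothetical `fitness` function with the authors' own objective.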

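The validity metrics named in the abstract can be computed from a clustering's labels as in the sketch below. SSE and the Davies-Bouldin index follow their standard definitions; the abstract does not define WGAD and BGD, so this sketch assumes WGAD is the mean over clusters of the average member-to-centroid distance and BGD is the mean Euclidean distance between pairs of centroids. The function name `validity` and the requirement that every cluster be non-empty are likewise illustrative assumptions.

```python
from math import dist


def validity(points, labels, k):
    """Cluster validity metrics for a labelled clustering (assumes k >= 2, no empty clusters)."""
    groups = [[p for p, lab in zip(points, labels) if lab == j] for j in range(k)]
    # Centroid of each cluster = coordinate-wise mean of its members.
    cents = [tuple(sum(col) / len(g) for col in zip(*g)) for g in groups]
    # s_i: average distance from the members of cluster i to its centroid.
    scatter = [sum(dist(p, cents[j]) for p in g) / len(g) for j, g in enumerate(groups)]

    sse = sum(dist(p, cents[lab]) ** 2 for p, lab in zip(points, labels))
    wgad = sum(scatter) / k  # assumed definition: mean of the per-cluster averages
    pairs = [(a, b) for a in range(k) for b in range(a + 1, k)]
    bgd = sum(dist(cents[a], cents[b]) for a, b in pairs) / len(pairs)  # assumed definition
    # Davies-Bouldin: average over clusters of the worst (s_i + s_j) / d(c_i, c_j) ratio.
    dbi = sum(
        max((scatter[i] + scatter[j]) / dist(cents[i], cents[j]) for j in range(k) if j != i)
        for i in range(k)
    ) / k
    return {"SSE": sse, "WGAD": wgad, "BGD": bgd, "DBI": dbi}


# Two tight clusters, ten units apart.
print(validity([(0, 0), (0, 2), (10, 0), (10, 2)], [0, 0, 1, 1], 2))
# {'SSE': 4.0, 'WGAD': 1.0, 'BGD': 10.0, 'DBI': 0.2}
```

Lower SSE, WGAD, and DBI and higher BGD indicate more compact, better separated clusters, which is the direction in which a GA-selected subspace and the full feature space would be compared.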
