K-Nearest Neighbor Regression for Predicting Song Popularity Using Gower Distance

Authors

  • Hazmira Yozza, Department of Mathematics, Faculty of Science and Mathematics, Universiti Pendidikan Sultan Idris, 35900 Tanjong Malim, Perak, Malaysia
  • Riswan Efendi, Department of Mathematics, Universiti Pendidikan Sultan Idris, Malaysia
  • Nor Azah Samot @ Samat, Department of Mathematics, Faculty of Science and Mathematics, Universiti Pendidikan Sultan Idris, 35900 Tanjong Malim, Perak, Malaysia
  • Izzati Rahmi, Department of Mathematics, Faculty of Science and Mathematics, Universiti Pendidikan Sultan Idris, 35900 Tanjong Malim, Perak, Malaysia
  • Aqil Burney S.M., Department of Computer Science, University of Karachi, Pakistan

DOI:

https://doi.org/10.37134/ejsmt.vol12.sp.3.2025

Keywords:

song popularity, k-nearest neighbor regression, audio feature, Gower distance, weighting method

Abstract

Machine learning is widely used to study human activities, including those in the arts. In the music industry, it is valuable to predict a song's popularity before the song is released. In this paper, we predict song popularity using k-nearest neighbor regression. Audio features of each song were gathered from the Spotify app: song duration, instrumentalness, loudness, acousticness, danceability, energy, liveness, speechiness, audio valence, key, audio mode, tempo, and time signature. Because these variables are of mixed type, dissimilarity between songs is measured using the Gower distance. Two weighting methods were compared: equal weights and weights inversely proportional to distance. Using 10-fold cross-validation, we found that inverse-distance weighting gave better prediction performance than equal weighting. The best performance was obtained with k = 5 nearest neighbors, with a mean squared error (MSE) of 636.75 and a mean absolute percentage error (MAPE) of 41.58%, which implies a reasonable prediction result.
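
The approach summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the common form of the Gower distance (range-scaled absolute difference for numeric features, simple mismatch for categorical ones, averaged with equal feature weights), and it assumes that key and audio mode are treated as categorical. The function names, the `eps` safeguard, and the toy records are hypothetical and are not taken from the paper or its dataset.

```python
def gower_distance(a, b, is_numeric, ranges):
    """Gower distance between two mixed-type records (assumed common form)."""
    total = 0.0
    for x, y, numeric, r in zip(a, b, is_numeric, ranges):
        if numeric:
            # Numeric feature: absolute difference scaled by the feature range.
            total += abs(x - y) / r if r > 0 else 0.0
        else:
            # Categorical feature: 0 on a match, 1 on a mismatch.
            total += 0.0 if x == y else 1.0
    return total / len(a)


def knn_predict(query, X, y, is_numeric, ranges, k=5, weighting="inverse", eps=1e-9):
    """Predict a target as the (weighted) mean over the k nearest neighbors."""
    pairs = [(gower_distance(query, row, is_numeric, ranges), t) for row, t in zip(X, y)]
    nearest = sorted(pairs, key=lambda p: p[0])[:k]
    if weighting == "equal":
        return sum(t for _, t in nearest) / len(nearest)
    # Weights inversely proportional to distance; eps (an assumption) avoids
    # division by zero when a neighbor coincides with the query.
    weights = [1.0 / (d + eps) for d, _ in nearest]
    return sum(w * t for w, (_, t) in zip(weights, nearest)) / sum(weights)


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent; requires nonzero actual values."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Toy records (made up): danceability, tempo, key, audio mode.
X = [
    (0.50, 100.0, 0, 1),
    (0.60, 120.0, 0, 1),
    (0.90, 140.0, 5, 0),
    (0.20, 80.0, 7, 0),
]
y = [40.0, 60.0, 80.0, 20.0]  # made-up popularity scores
is_numeric = (True, True, False, False)

# Range of each numeric feature over the training records (0.0 for categorical).
ranges = [
    max(row[j] for row in X) - min(row[j] for row in X) if is_numeric[j] else 0.0
    for j in range(len(is_numeric))
]

query = (0.52, 105.0, 0, 1)
pred_equal = knn_predict(query, X, y, is_numeric, ranges, k=2, weighting="equal")
pred_inverse = knn_predict(query, X, y, is_numeric, ranges, k=2, weighting="inverse")
print(pred_equal, pred_inverse)
```

With equal weights the prediction is the plain average of the neighbors' popularity scores, whereas inverse-distance weights pull the prediction toward the closest neighbor. In a 10-fold cross-validation, `mse` and `mape` would be computed on each held-out fold and then averaged.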

Author Biographies

  • Hazmira Yozza, Department of Mathematics, Faculty of Science and Mathematics, Universiti Pendidikan Sultan Idris, 35900 Tanjong Malim, Perak, Malaysia

    Department of Mathematics and Data Science, Universitas Andalas, Indonesia

  • Izzati Rahmi, Department of Mathematics, Faculty of Science and Mathematics, Universiti Pendidikan Sultan Idris, 35900 Tanjong Malim, Perak, Malaysia

    Department of Mathematics and Data Science, Universitas Andalas, Indonesia

Published

2025-04-28

How to Cite

Yozza, H., Efendi, R., Samot @ Samat, N. A., Rahmi, I., & S.M., A. B. (2025). K-Nearest Neighbor Regression for Predicting Song Popularity Using Gower Distance. EDUCATUM Journal of Science, Mathematics and Technology, 12, 17-32. https://doi.org/10.37134/ejsmt.vol12.sp.3.2025