PERFORMANCE ANALYSIS OF SUPERVISED LEARNING ALGORITHMS ON DESCRIPTIVE DATA WITH REPRESENTATION AND PARAMETER TUNING

Authors

  • Rafael Austin, Universitas Esa Unggul
  • Alfi Syahrian, Universitas Esa Unggul

DOI:

https://doi.org/10.36595/jire.v8i2.1717

Keywords:

Classification Algorithms, Medical Text Classification, Text Representation, TF-IDF, Word2Vec

Abstract

Medical text captured as narrative is often unstructured, so classifying it well requires a suitable solution. This problem motivates the present study, which evaluates the performance of several classification algorithms on patient complaint narratives under a number of text representation approaches. The dataset consists of medical descriptions with balanced labels; the text was preprocessed to clean and standardize it before being fed to the machine learning models. Four text representation methods (Bag of Words, TF-IDF, Word2Vec, and Hybrid) were used to convert the text into numerical features. Five classification algorithms were tested and compared on accuracy, precision, recall, and F1-score. The results show that frequency-based approaches such as Bag of Words and TF-IDF, when paired with linear algorithms, gave the best performance. Parameter tuning also proved important for improving classification results. The study confirms that choosing the right combination of feature representation and algorithm strongly affects the success of narrative-based medical text classification.
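As an illustration only, the following minimal Python sketch shows how two of the frequency-based representations named in the abstract (Bag of Words and TF-IDF) and the four reported metrics can be computed. It is not the authors' implementation: the toy complaint texts, the labels, the function names, the unsmoothed weighting idf = log(N / df), and the macro averaging of per-class scores are all assumptions made for the example, since the abstract does not specify them.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical preprocessing: lowercase and split on whitespace.
    return text.lower().split()


def bag_of_words(docs):
    """Return (vocabulary, count vectors), one vector per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(count_vectors):
    """Weight raw counts by idf = log(N / df) (assumed, unsmoothed variant)."""
    n_docs = len(count_vectors)
    n_terms = len(count_vectors[0])
    # Document frequency: number of documents containing each term.
    df = [sum(1 for v in count_vectors if v[j] > 0) for j in range(n_terms)]
    idf = [math.log(n_docs / df[j]) for j in range(n_terms)]
    return [[v[j] * idf[j] for j in range(n_terms)] for v in count_vectors]


def evaluate(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1-score."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "precision": sum(precisions) / len(labels),
        "recall": sum(recalls) / len(labels),
        "f1": sum(f1s) / len(labels),
    }


# Toy data, invented for this example (not from the study's dataset).
docs = [
    "chest pain and shortness of breath",
    "skin rash and itching",
    "chest pain after exercise",
]
vocab, bow = bag_of_words(docs)
tfidf = tf_idf(bow)

# Hypothetical true and predicted labels for four narratives.
y_true = ["cardio", "derm", "cardio", "derm"]
y_pred = ["cardio", "derm", "derm", "derm"]
scores = evaluate(y_true, y_pred)
```

In this sketch a term that occurs in only one document (such as "rash") receives a higher TF-IDF weight than one shared by two documents (such as "chest"), which is the property that distinguishes TF-IDF from plain Bag of Words counts. The resulting vectors would then be passed to a classifier, and the tuning step mentioned in the abstract would repeat the evaluation over a grid of classifier parameters.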

Published

2025-11-03

How to Cite

Rafael Austin, Alfi Syahrian. ANALISIS PERFORMA ALGORITMA SUPERVISED LEARNING TERHADAP DATA DESKRIPSI DENGAN REPRESENTASI DAN PARAMETER TUNING. JIRE [Internet]. 2025 Nov. 3 [cited 2025 Nov. 6];8(2):262-73. Available from: http://e-journal.stmiklombok.ac.id/index.php/jire/article/view/1717