Identifikasi Konten Kasar Pada Tweet Bahasa Indonesia (Identification of Abusive Content in Indonesian-Language Tweets)

  • Ahmad Fathan Hidayatullah Universitas Islam Indonesia
  • Aufa Aulia Fadila
  • Kiki Purnama Juwairi
  • Royan Abida Nayoan Program Studi Sarjana Teknik Informatika, Universitas Islam Indonesia

Abstract

This study aims to identify tweets containing abusive or offensive content. The work proceeds in five steps: data collection, preprocessing, feature extraction, classification, and evaluation. We employed Multinomial Naïve Bayes and a Support Vector Machine (SVM) with a linear kernel as our classification algorithms. In our experiments, the linear-kernel SVM outperformed Multinomial Naïve Bayes on every metric. The SVM achieved accuracy, precision, recall, and F1-score of 0.9928, 0.9914, 0.9946, and 0.9930, respectively, whereas Multinomial Naïve Bayes achieved 0.9834, 0.9912, 0.9762, and 0.9836. The gap is nevertheless small, less than 0.02 on every metric, so the two algorithms perform almost equally well on this task.
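
The classification and evaluation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenizer, the add-one smoothing, the class and function names, and the four English training sentences are all assumptions invented for this illustration, and only the Multinomial Naïve Bayes side is shown (the SVM is omitted). The `evaluate` function computes the four metrics reported above from true and predicted labels.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real preprocessing)."""
    return text.lower().split()


class MultinomialNB:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def evaluate(y_true, y_pred, positive):
    """Accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Invented toy data, for illustration only; not the dataset used in the study.
train_texts = [
    "you are a stupid idiot",
    "shut up you idiot",
    "have a nice day",
    "thank you for the help",
]
train_labels = ["abusive", "abusive", "normal", "normal"]

model = MultinomialNB().fit(train_texts, train_labels)
test_texts = ["you idiot", "nice day"]
test_labels = ["abusive", "normal"]
predictions = [model.predict(t) for t in test_texts]
print(evaluate(test_labels, predictions, positive="abusive"))
```

On a real corpus the same `evaluate` function would be applied to held-out tweets, which is how figures such as precision and recall can differ between two classifiers even when their accuracy is close.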

Published
2019-03-25
How to Cite
HIDAYATULLAH, Ahmad Fathan et al. Identifikasi Konten Kasar Pada Tweet Bahasa Indonesia. Jurnal Linguistik Komputasional, [S.l.], v. 2, n. 1, p. 1 - 5, mar. 2019. ISSN 2621-9336. Available at: <http://inacl.id/journal/index.php/jlk/article/view/15>. Date accessed: 21 oct. 2019. doi: https://doi.org/10.26418/jlk.v2i1.15.