The Development of Part-of-Speech Taggers for Indonesian (Perkembangan Part-of-Speech Tagger Bahasa Indonesia)

  • Mia Kamayani, Universitas Muhammadiyah Prof. Dr. HAMKA

Abstract

This article reviews the literature on part-of-speech (POS) tagging methods for Indonesian developed over the past 11 years (since 2008). It can serve as a roadmap for Indonesian POS tagging and as a basis for future work to adopt a standard dataset and tagset for benchmarking methods. Fifteen publications are discussed, covering the datasets, tagsets, and methods used for Indonesian POS tagging. The most widely used dataset, and the most likely candidate for a standard corpus, is the IDN Tagged Corpus, which contains more than 250,000 tokens. The Indonesian tagset has not yet been standardized: the number of labels ranges from 16 to 37 tags. The most actively developed methods, and those with the potential to become the state of the art, are neural networks, in particular the biLSTM and CRF variants, which so far give the highest F1 scores and accuracy (>96%).
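The abstract compares taggers by accuracy and F1 score. As a minimal sketch of how such scores can be computed from gold and predicted tag sequences, the Python below uses only the standard library. The function names, the tags `NN`, `VB`, `JJ`, and the choice of macro-averaged F1 are illustrative assumptions; the surveyed publications may average F1 differently.

```python
from collections import Counter


def accuracy(gold, pred):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equally long")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-tag F1 (assumed averaging; not from the source)."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equally long")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1      # correct prediction for tag g
        else:
            fp[p] += 1      # p was predicted but is wrong
            fn[g] += 1      # g was the gold tag but was missed
    scores = []
    for tag in set(gold) | set(pred):
        denom = 2 * tp[tag] + fp[tag] + fn[tag]
        scores.append(2 * tp[tag] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical four-token example with one tagging error.
gold_tags = ["NN", "VB", "NN", "JJ"]
pred_tags = ["NN", "VB", "JJ", "JJ"]
print(accuracy(gold_tags, pred_tags))   # 0.75
print(macro_f1(gold_tags, pred_tags))
```

Because tagsets in the surveyed work range from 16 to 37 tags, scores computed this way are only comparable between taggers evaluated on the same dataset and tagset, which is the motivation for a standard benchmark.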

Published
2019-09-24
How to Cite
KAMAYANI, Mia. Perkembangan Part-of-Speech Tagger Bahasa Indonesia. Jurnal Linguistik Komputasional, [S.l.], v. 2, n. 2, p. 34 - 38, sep. 2019. ISSN 2621-9336. Available at: <http://inacl.id/journal/index.php/jlk/article/view/20>. Date accessed: 22 nov. 2019. doi: https://doi.org/10.26418/jlk.v2i2.20.