Perbaikan Kualitas Korpus Untuk Meningkatkan Kualitas Mesin Penerjemah Statistik (Studi Kasus : Bahasa Indonesia – Jawa Krama)

  • Muhammad Gerdy Asparilla Universitas Tanjungpura
  • Herry Sujaini Universitas Tanjungpura
  • Rudy Dwi Nyoto Universitas Tanjungpura

Abstract

Language is a communication tool used to interact with the surrounding community. Mastering several languages makes it easier to interact with people from different regions, so translators are needed to help people understand various languages. Statistical Machine Translation (SMT) is a machine translation approach in which translations are produced by statistical models whose parameters are derived from the analysis of a parallel corpus. A parallel corpus is a pair of corpora containing sentences in one language and their translations in another. One way to improve translation quality is corpus optimization. This study examines the influence of corpus quality by filtering the corpus so that only sentence pairs with good-quality translations are kept. The filter is a minimum score for each sentence, measured with the Bilingual Evaluation Understudy (BLEU) method. Testing compares translation accuracy before and after corpus optimization. The results show that corpus optimization improves the translation quality of the Indonesian to Javanese Krama machine translation system. With 15 test sentences from outside the corpus, the BLEU score increased by 10.53% on average; with 100 test sentences derived from the optimized corpus, the average increase was 11.63% in automatic testing and 0.03% in testing by linguists. On this basis, an Indonesian to Javanese Krama statistical machine translation system that uses corpus optimization can produce more accurate translations.
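The filtering step described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how each sentence's BLEU score is obtained, so the sketch assumes that every source sentence is translated by a baseline system and scored against its paired target sentence. The names `sentence_bleu`, `filter_corpus`, `translate`, and `min_bleu` are hypothetical, the BLEU variant is a simplified smoothed one, and the corpus is toy placeholder data.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=4):
    """Simplified sentence-level BLEU in [0, 1] (add-one smoothing for n > 1)."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp or not ref:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        # Clipped matches: an n-gram counts at most as often as it occurs in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if n == 1:
            if matches == 0:
                return 0.0  # no shared words at all
            precision = matches / total
        else:
            precision = (matches + 1) / (total + 1)
        log_precision += math.log(precision) / max_n
    # Brevity penalty for hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(log_precision)


def filter_corpus(pairs, translate, min_bleu):
    """Keep (source, target) pairs whose translation scores at least min_bleu."""
    return [(src, tgt) for src, tgt in pairs
            if sentence_bleu(translate(src), tgt) >= min_bleu]


# Toy data: a word-for-word lexicon stands in for the baseline translator (assumption).
lexicon = {"s1": "t1", "s2": "t2", "s3": "t3", "s4": "t4", "s5": "t5"}
translate = lambda s: " ".join(lexicon.get(w, w) for w in s.split())
pairs = [("s1 s2 s3 s4", "t1 t2 t3 t4"), ("s1 s5", "t9 t8 t7")]
kept = filter_corpus(pairs, translate, min_bleu=0.5)
print(kept)
```

In this toy run the second pair is dropped because its translation shares no words with the paired target. The threshold of 0.5 is arbitrary; the abstract does not report the minimum value used in the study.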

Published
2018-09-28
How to Cite
ASPARILLA, Muhammad Gerdy; SUJAINI, Herry; NYOTO, Rudy Dwi. Perbaikan Kualitas Korpus Untuk Meningkatkan Kualitas Mesin Penerjemah Statistik (Studi Kasus : Bahasa Indonesia – Jawa Krama). Jurnal Linguistik Komputasional, [S.l.], v. 1, n. 2, p. 66-74, sep. 2018. ISSN 2621-9336. Available at: <http://inacl.id/journal/index.php/jlk/article/view/12>. Date accessed: 31 mar. 2020. doi: https://doi.org/10.26418/jlk.v1i2.12.