Dense Word Representation Utilization in Indonesian Dependency Parsing

  • Arief Rahman
  • Ayu Purwarianti, Institut Teknologi Bandung

Abstract

Available Indonesian dependency parsers can be considered to perform worse than parsers for languages that have been researched more thoroughly. In particular, current Indonesian dependency parsers cannot reliably parse sentences containing gerunds or ellipsis. This weakness is attributed to their sparse feature representation, which makes these types of sentences difficult to parse. In this research, a dense word representation is proposed for Indonesian dependency parsing. A dense word representation may allow better generalization and provide more information about the words being parsed, enabling more accurate parsing. The scope of this research is limited to well-formed Indonesian sentences parsed with local transition-based parsing. Our experiments show that using word embeddings instead of a sparse word representation increases parsing accuracy significantly.
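The generalization argument can be illustrated with a minimal sketch. It assumes a hypothetical five-word vocabulary and hand-picked 3-dimensional vectors (`VOCAB`, `EMBEDDINGS`, and the helper names are illustrative, not from the paper). Under a sparse one-hot representation, any two distinct words are orthogonal, so nothing a classifier learns about one word carries over to another. Dense vectors can place related words close together. The `config_features` helper assumes a deliberately small feature set (top of the stack and front of the buffer) for a local transition classifier; the feature set actually used in this research is not described here.

```python
import math
from typing import Callable, Dict, List

# Hypothetical toy vocabulary (illustrative only, not from the paper).
VOCAB = ["membaca", "menulis", "buku", "surat", "<unk>"]

# Hypothetical hand-picked embeddings: the two verbs lie close together,
# as do the two nouns. Real embeddings are learned from a corpus.
EMBEDDINGS: Dict[str, List[float]] = {
    "membaca": [0.9, 0.1, 0.0],  # "to read"
    "menulis": [0.8, 0.2, 0.1],  # "to write"
    "buku":    [0.1, 0.9, 0.2],  # "book"
    "surat":   [0.2, 0.8, 0.3],  # "letter"
    "<unk>":   [0.0, 0.0, 0.0],
}


def one_hot(word: str) -> List[float]:
    """Sparse representation: a single 1.0 at the word's vocabulary index."""
    idx = VOCAB.index(word) if word in VOCAB else VOCAB.index("<unk>")
    return [1.0 if i == idx else 0.0 for i in range(len(VOCAB))]


def dense(word: str) -> List[float]:
    """Dense representation: look up the word's embedding (unknown -> <unk>)."""
    return EMBEDDINGS.get(word, EMBEDDINGS["<unk>"])


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def config_features(stack: List[str], buffer: List[str],
                    embed: Callable[[str], List[float]]) -> List[float]:
    """Concatenate the representations of the stack top and the buffer front,
    an assumed minimal input for a local transition classifier."""
    s0 = stack[-1] if stack else "<unk>"
    b0 = buffer[0] if buffer else "<unk>"
    return embed(s0) + embed(b0)


if __name__ == "__main__":
    # One-hot vectors of distinct words are orthogonal: similarity is 0.0.
    print(cosine(one_hot("membaca"), one_hot("menulis")))
    # Dense vectors of the two verbs are much closer than verb vs. noun.
    print(cosine(dense("membaca"), dense("menulis")))
    print(cosine(dense("membaca"), dense("buku")))
```

With one-hot features, a transition decision learned for a configuration containing "membaca" gives no signal for one containing "menulis". With dense features the two inputs are nearly parallel, so the classifier's decision tends to transfer, which is the kind of generalization the abstract appeals to for sparse-data cases such as gerunds and ellipsis.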

Published
2020-03-31
How to Cite
RAHMAN, Arief; PURWARIANTI, Ayu. Dense Word Representation Utilization in Indonesian Dependency Parsing. Jurnal Linguistik Komputasional, [S.l.], v. 3, n. 1, p. 12 - 19, mar. 2020. ISSN 2621-9336. Available at: <http://inacl.id/journal/index.php/jlk/article/view/33>. Date accessed: 10 aug. 2020. doi: https://doi.org/10.26418/jlk.v3i1.33.