Short Answer Grading Using Contextual Word Embedding and Linear Regression

  • Muh Habibi Haidir Institut Teknologi Bandung
  • Ayu Purwarianti Institut Teknologi Bandung


Abstract: One obstacle to running a MOOC efficiently is the evaluation of student answers, including short answer grading, which demands substantial effort from instructors when done manually. NLP research on short answer grading has therefore sought to automate the task, using techniques such as rule-based and machine-learning-based methods. Here, we conducted experiments on deep-learning-based short answer grading to compare answer representations and the answer assessment method. For answer representation, we compared word embedding and sentence embedding models such as BERT and its modifications. For answer assessment, we used linear regression. We used two datasets: an available English short answer grading dataset with 80 questions and 2442 answers, used to find the best model configuration, and an Indonesian short answer grading dataset with 36 questions and 9165 short answers, used as testing data. We collected the Indonesian short answers for Biology and Geography subjects from 534 respondents, and the answers were graded by 7 experts. The best root mean squared error on both datasets was achieved using pretrained BERT: 0.880 for the English dataset and 1.893 for the Indonesian dataset.
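The pipeline described in the abstract (embed the answers, fit a linear regression, report root mean squared error) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration, not the authors' configuration: the three-dimensional toy vectors stand in for BERT embeddings, the single feature is the cosine similarity between a hypothetical reference answer and each student answer, and the grades are invented. The error is computed on the same toy data used for fitting, whereas the paper reports results on held-out data.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(predicted, gold):
    """Root mean squared error between predicted and gold grades."""
    return math.sqrt(
        sum((p - g) ** 2 for p, g in zip(predicted, gold)) / len(gold)
    )


# Toy stand-ins for sentence embeddings (assumption: a real system would
# obtain these from a pretrained model such as BERT).
reference = [1.0, 0.0, 1.0]
answers = [[1.0, 0.0, 1.0], [1.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
grades = [5.0, 3.0, 0.0]  # invented expert grades

# One feature per answer: similarity to the reference answer.
features = [cosine(reference, a) for a in answers]
slope, intercept = fit_line(features, grades)
predicted = [slope * x + intercept for x in features]
error = rmse(predicted, grades)
print(f"slope={slope:.3f} intercept={intercept:.3f} rmse={error:.3f}")
```

A single similarity feature keeps the regression in closed form; feeding the full embedding vector to a multivariate regressor would be another reasonable reading of the abstract.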


How to Cite
HAIDIR, Muh Habibi; PURWARIANTI, Ayu. Short Answer Grading Using Contextual Word Embedding and Linear Regression. Jurnal Linguistik Komputasional, [S.l.], v. 3, n. 2, p. 54-61, Sep. 2020. ISSN 2621-9336.