Likelihood Ratio Calibration in Automatic Speaker Recognition Systems for Forensic Applications in Indonesia

  • Miranti Indar Mandasari Institut Teknologi Bandung
  • Angga Dwi Firmanto Program Studi Teknik Fisika, Fakultas Teknologi Industri, Institut Teknologi Bandung
  • Fadjar Fathurrahman Program Studi Teknik Fisika, Fakultas Teknologi Industri, Institut Teknologi Bandung

Abstract

Calibrating the likelihood ratio (LR) is an essential step before an automatic speaker recognition system can be applied in forensics. This article describes the development stages and evaluation of a speaker recognition system built on an Indonesian-language speech database. The system uses MFCC features, GMM-UBM modeling, and Z-normalization. Its performance was evaluated separately for male and female speakers and under two scenarios: natural conversation and interview. The evaluation used performance indicators covering both the discrimination and the calibration of the system. Across these indicators the system performs very well, with a best EER of 4.66% and a Cmc of 0.04. On this basis, the developed system is ready to be used as an automatic speaker recognition analysis tool for forensic applications in Indonesia.
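
As an illustration of two quantities named in the abstract, the following minimal Python sketch shows Z-normalization of a raw score and the log-likelihood-ratio cost Cllr. It is not the authors' implementation: the function names `z_norm` and `cllr`, the use of NumPy, and the assumption that scores are natural-log likelihood ratios are all choices made here for illustration. Cmc is commonly defined as the difference between Cllr and its minimum after optimal calibration; that minimum requires an additional calibration step that is not shown.

```python
import numpy as np

def z_norm(raw_score, impostor_scores):
    """Z-normalize one raw score (hypothetical helper).

    impostor_scores: scores of the same speaker model against a cohort
    of impostor utterances, used to estimate mean and standard deviation.
    """
    impostor_scores = np.asarray(impostor_scores, dtype=float)
    return (raw_score - impostor_scores.mean()) / impostor_scores.std()

def cllr(target_llrs, nontarget_llrs):
    """Log-likelihood-ratio cost in bits (hypothetical helper).

    Inputs are assumed to be natural-log likelihood ratios for
    same-speaker (target) and different-speaker (non-target) trials.
    """
    tar = np.asarray(target_llrs, dtype=float)
    non = np.asarray(nontarget_llrs, dtype=float)
    # Targets are penalized by log2(1 + 1/LR), non-targets by log2(1 + LR).
    cost_tar = np.mean(np.logaddexp(0.0, -tar)) / np.log(2)
    cost_non = np.mean(np.logaddexp(0.0, non)) / np.log(2)
    return 0.5 * (cost_tar + cost_non)
```

A system that always outputs a log LR of 0 carries no information and yields a Cllr of 1 bit; well-calibrated, discriminative systems score closer to 0.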

Published
2019-09-24
How to Cite
MANDASARI, Miranti Indar; FIRMANTO, Angga Dwi; FATHURRAHMAN, Fadjar. Kalibrasi Rasio kemungkinan pada Sistem Rekognisi Pengucap Otomatis untuk Aplikasi Forensik di Indonesia. Jurnal Linguistik Komputasional, [S.l.], v. 2, n. 2, p. 39 - 46, sep. 2019. ISSN 2621-9336. Available at: <http://inacl.id/journal/index.php/jlk/article/view/24>. Date accessed: 22 nov. 2019. doi: https://doi.org/10.26418/jlk.v2i2.24.
Section
Articles