Evaluating the BPPT Speech Data Corpus as Training Data for an Indonesian Speech Recognition System
Abstract
We present the results of speech recognition experiments on the BPPT Speech Data Corpus (Korpus Data Wicara BPPT), which was developed in 2013 (KDW-BPPT-2013) with funding from the 2013 DIPA budget. The corpus serves as both the training data and the test data. It contains utterances from 200 speakers: 50 adult males, 50 teenage males, 50 adult females, and 50 teenage females, each of whom recorded 250 sentences. The total duration of the recorded speech is about 92 hours. The experiments were run with Kaldi and yielded a Word Error Rate (WER) of 2.52% for the GMM model and 1.64% for the DNN model.
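For readers unfamiliar with the metric, WER is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of words in the reference. The sketch below illustrates that standard definition with a word-level edit distance. It is not the scoring code used in the experiments (Kaldi ships its own scoring tools), and the function name `word_error_rate` and the sample sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Illustrative sketch only: words are split on whitespace, with no text
    normalization (casing, punctuation) applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic Levenshtein dynamic programming).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missing)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical example: one of four reference words is dropped.
    wer = word_error_rate("saya pergi ke pasar", "saya pergi pasar")
    print(f"WER = {wer:.2%}")
```

Under this definition, a WER of 1.64% corresponds to roughly 16 word errors per 1,000 reference words.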