Denoising Single-Channel Speech with Unsupervised Non-negative Matrix Factorization Learning (Penjernihan Derau pada Suara Kanal Tunggal dengan Pembelajaran Faktorisasi Matriks Non-negatif tanpa Pengawasan)

  • Tirtadwipa Manunggal PT. Bahasa Kinerja Utama
  • Oskar Riandi PT. Bahasa Kinerja Utama
  • Ardhi Ma’arik PT. Bahasa Kinerja Utama
  • Lalan Suryantoro PT. Bahasa Kinerja Utama
  • Achmad Satria Putera PT. Bahasa Kinerja Utama
  • Izzul Al-Hakam PT. Bahasa Kinerja Utama

Abstract

This article examines a single-channel denoising method based on Non-negative Matrix Factorization (NMF) in an unsupervised learning scheme. The technique exploits the ability of NMF to decompose the spectrogram matrices of noise-corrupted speech, and of the noise itself, into their building-block vectors. As an extension to NMF, a Wiener filter is applied as the final step. The method is designed to run in low-latency systems, so preparing a noise model for each particular condition beforehand is impractical. Instead, the noise model is taken automatically from the unvoiced parts of the noise-corrupted speech. The contribution of this research is an NMF learning scheme with linear and non-linear constraints that does not require noise models to be provided explicitly. The denoising process can therefore be applied flexibly under any noise condition.
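The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes plain multiplicative-update NMF with a Frobenius cost in place of the linearly and non-linearly constrained learning, and it uses a low-energy frame quantile as a hypothetical stand-in for detecting the unvoiced parts. The function names (`nmf`, `denoise`), the ranks, the quantile, and the synthetic spectrogram are all illustrative assumptions.

```python
import numpy as np

EPS = 1e-10  # guards against division by zero


def nmf(V, rank, n_iter=200, W_fixed=None, seed=0):
    """Factor V ~= W @ H with multiplicative updates (Frobenius cost).

    `rank` free basis columns are learned; columns of W_fixed, if given,
    are appended to the basis and kept constant.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    n_fixed = 0 if W_fixed is None else W_fixed.shape[1]
    W_free = rng.random((n_freq, rank)) + EPS
    H = rng.random((rank + n_fixed, n_frames)) + EPS
    for _ in range(n_iter):
        W = W_free if W_fixed is None else np.hstack([W_free, W_fixed])
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
        H_free = H[:rank]
        # Only the free columns of the basis are updated.
        W_free *= (V @ H_free.T) / ((W @ H) @ H_free.T + EPS)
    W = W_free if W_fixed is None else np.hstack([W_free, W_fixed])
    return W, H


def denoise(V, speech_rank=8, noise_rank=4, noise_quantile=0.2):
    """Denoise a magnitude spectrogram V (frequency bins x frames)."""
    # Assumption: the lowest-energy frames contain noise only.
    energy = V.sum(axis=0)
    noise_frames = energy <= np.quantile(energy, noise_quantile)
    # Step 1: learn a noise basis from the noise-only frames.
    W_noise, _ = nmf(V[:, noise_frames], noise_rank)
    # Step 2: factor the full spectrogram with the noise basis held fixed.
    W, H = nmf(V, speech_rank, W_fixed=W_noise)
    S = W[:, :speech_rank] @ H[:speech_rank]  # speech estimate
    N = W[:, speech_rank:] @ H[speech_rank:]  # noise estimate
    # Step 3: Wiener-style gain built from the two estimates.
    gain = S**2 / (S**2 + N**2 + EPS)
    return gain * V, gain, noise_frames


# Synthetic example: a rank-1 noise floor plus a block of "speech" energy.
rng = np.random.default_rng(1)
n_freq, n_frames = 64, 100
noise = 0.5 * rng.random((n_freq, 1)) @ (0.5 + rng.random((1, n_frames)))
speech = np.zeros((n_freq, n_frames))
speech[10:20, 30:70] = 5.0
V = noise + speech

V_hat, gain, noise_frames = denoise(V)
print(V_hat.shape, int(noise_frames.sum()))
```

In a real system `V` would be the magnitude of a short-time Fourier transform, and the enhanced magnitude would be recombined with the noisy phase before the inverse transform. Holding the noise basis fixed in the second factorization is one common way to keep the speech and noise components separable; the constraints used in the article may differ.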

Published
2018-03-05
How to Cite
MANUNGGAL, Tirtadwipa et al. Penjernihan Derau pada Suara Kanal Tunggal dengan Pembelajaran Faktorisasi Matriks Non-negatif tanpa Pengawasan. Jurnal Linguistik Komputasional, [S.l.], v. 1, n. 1, p. 1 - 10, mar. 2018. ISSN 2621-9336. Available at: <http://inacl.id/journal/index.php/jlk/article/view/2>. Date accessed: 20 jan. 2020. doi: https://doi.org/10.26418/jlk.v1i1.2.
Section
Articles