Topic- and Class-Based Weighting Method for Indonesian-Language Online News (Metode Pembobotan Berbasis Topik dan Kelas untuk Berita Online Berbahasa Indonesia)

  • Maryamah Maryamah, Institut Teknologi Sepuluh Nopember
  • Made Agus Putra Subali, Department of Informatics, Institut Teknologi Sepuluh Nopember
  • Lailly Qolby, Department of Informatics, Institut Teknologi Sepuluh Nopember
  • Agus Zainal Arifin, Department of Informatics, Institut Teknologi Sepuluh Nopember
  • Ali Fauzi, Department of Informatics, Institut Teknologi Sepuluh Nopember

Abstract

Manual clustering of news documents depends on human skill and accuracy, so it can introduce errors into the grouping process. News documents therefore need to be grouped automatically. Such clustering requires a term-weighting method, one example of which is TF.IDF.ICF. In this paper we propose a new weighting scheme, TF.IDF.ICF.ITF, that clusters documents automatically from statistical patterns in the data, so that the errors of manual grouping are reduced and the process becomes more efficient. K-Means++ is a clustering algorithm that extends K-Means at the initial cluster-seeding stage; it is easy to implement and gives more stable results. K-Means++ groups the documents at the Inverse Class Frequency (ICF) weighting stage. ICF is developed from class-based weighting of the terms in a document. A term that appears in many classes receives a small weight, although that weight is still informative. The proposed weighting is then calculated. Testing was carried out with a certain query on several numbers of best features; the TF.IDF.ICF.ITF method gave less than optimal results.
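
The class-based weighting described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the smoothed forms `1 + log(N / df)` and `1 + log(C / cf)` are assumptions, the function name `tf_idf_icf` and the toy documents are hypothetical, and the ITF factor is left out because the abstract does not define it. Class labels are passed in directly here; in the paper they come from the K-Means++ grouping.

```python
import math
from collections import Counter


def tf_idf_icf(docs, labels):
    """Return one {term: weight} dict per document.

    docs   : list of token lists
    labels : class label of each document (assumed already known here)
    """
    n_docs = len(docs)
    n_classes = len(set(labels))

    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))

    # Class frequency: set of classes in which each term occurs.
    classes_with_term = {}
    for doc, label in zip(docs, labels):
        for term in set(doc):
            classes_with_term.setdefault(term, set()).add(label)

    weights = []
    for doc in docs:
        tf = Counter(doc)
        w = {}
        for term, count in tf.items():
            # Assumed smoothing: the "1 +" keeps weights positive even for
            # terms that occur in every document or in every class.
            idf = 1.0 + math.log(n_docs / df[term])
            icf = 1.0 + math.log(n_classes / len(classes_with_term[term]))
            w[term] = count * idf * icf
        weights.append(w)
    return weights


# Hypothetical toy corpus: two "bencana" (disaster) and two "olahraga" (sport) items.
docs = [
    ["banjir", "jakarta", "hujan"],
    ["banjir", "surabaya"],
    ["gol", "liga", "jakarta"],
    ["gol", "final"],
]
labels = ["bencana", "bencana", "olahraga", "olahraga"]
weights = tf_idf_icf(docs, labels)
```

In this toy corpus "banjir" and "jakarta" each occur in two documents, but "jakarta" occurs in both classes while "banjir" occurs in only one, so the ICF factor gives "jakarta" the smaller weight, which matches the behaviour the abstract describes for terms spread across many classes.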

Published
2018-03-05
How to Cite
MARYAMAH, Maryamah et al. Metode Pembobotan Berbasis Topik dan Kelas untuk Berita Online Berbahasa Indonesia. Jurnal Linguistik Komputasional, [S.l.], v. 1, n. 1, p. 11 - 16, mar. 2018. ISSN 2621-9336. Available at: <http://inacl.id/journal/index.php/jlk/article/view/4>. Date accessed: 20 jan. 2020. doi: https://doi.org/10.26418/jlk.v1i1.4.
Section
Articles