Pembobotan Kata berdasarkan Kluster untuk Peringkasan Otomatis Multi Dokumen (Cluster-Based Term Weighting for Automatic Multi-Document Summarization)

  • Fatra Nonggala Putra, Institut Teknologi Sepuluh Nopember
  • Ari Effendi, Jurusan Teknik Informatika, Institut Teknologi Sepuluh Nopember
  • Agus Zainal Arifin, Jurusan Teknik Informatika, Institut Teknologi Sepuluh Nopember

Abstract

Multi-document summarization is a technique for obtaining information. This information consists of several sentences that aim to describe the contents of the entire document set relevantly. Several algorithms with various criteria have been proposed. In general, these methods comprise preprocessing, clustering, and representative sentence selection stages to produce summaries with high relevance. In some conditions, the clustering stage is one of the most important stages in producing a summary. Existing research cannot determine the number of clusters to be formed. Therefore, we propose a clustering technique based on a cluster hierarchy. This technique measures the similarity between sentences using cosine similarity, and the sentences are clustered based on their similarity values. Clusters that have the highest similarity to other clusters are merged into one cluster, and this merging process continues until one cluster remains. Experiments on the 2004 Document Understanding Conference (DUC) dataset, using two scenarios with 132, 135, 137, and 140 clusters, produced fluctuating values. A smaller number of clusters does not guarantee an increase in the ROUGE-1 value. With the same number of clusters, the proposed method has a lower ROUGE-1 value than the previous method. This is because, at 140 clusters, the similarity values within each cluster decreased.
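
The merging procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the whitespace tokenizer, the raw term-frequency vectors, the choice to compare clusters by their summed term vectors, and the names `cosine_similarity`, `cluster_sentences`, and `target_clusters` are all assumptions made for illustration, since the abstract does not specify these details.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_sentences(sentences, target_clusters=1):
    """Repeatedly merge the two most similar clusters.

    Assumption: a cluster is represented by the sum of its sentences'
    term-frequency vectors; the paper may use a different linkage.
    """
    # Start with one cluster per sentence: (member indices, term vector).
    clusters = [([i], Counter(s.lower().split())) for i, s in enumerate(sentences)]
    while len(clusters) > max(target_clusters, 1):
        best = None  # (similarity, index_i, index_j) of the closest pair
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = cosine_similarity(clusters[i][1], clusters[j][1])
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        _, i, j = best
        merged = (clusters[i][0] + clusters[j][0], clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [sorted(members) for members, _ in clusters]


sentences = [
    "the cat sat on the mat",
    "the cat lay on the mat",
    "stock prices fell sharply today",
    "stock prices rose sharply today",
]
print(cluster_sentences(sentences, target_clusters=2))  # [[0, 1], [2, 3]]
```

Stopping at `target_clusters` instead of merging down to a single cluster corresponds to cutting the hierarchy at a chosen level, which is how a fixed cluster count such as 132 or 140 could be obtained from the same merge sequence.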

Keywords: cluster, cosine similarity, multi-document, summarization

Published
2018-03-12
How to Cite
PUTRA, Fatra Nonggala; EFFENDI, Ari; ARIFIN, Agus Zainal. Pembobotan Kata berdasarkan Kluster untuk Peringkasan Otomatis Multi Dokumen. Jurnal Linguistik Komputasional, [S.l.], v. 1, n. 1, p. 17-22, mar. 2018. ISSN 2621-9336. Available at: <http://inacl.id/journal/index.php/jlk/article/view/5>. Date accessed: 20 apr. 2024. doi: https://doi.org/10.26418/jlk.v1i1.5.