Rahmawati, Yunianita and Gunawan, Gunawan (2021) SEGMENTASI DOKUMEN BAHASA INDONESIA MENGGUNAKAN TEXT TILING [Segmentation of Indonesian-Language Documents Using Text Tiling]. JIKA (Jurnal Informatika) Universitas Muhammadiyah Tangerang, 5 (3). pp. 368-378. ISSN e-issn : 2722-2713 | p-issn : 2549-0710
Abstract
Text tiling aims to split a long document into multi-paragraph segments of related content. In this study, documents are used as input to the segmentation after their reading format has been removed. The text tiling method has three stages: tokenisation, similarity determination, and boundary identification. The segmentation produced by the text tiling algorithm in this study has not yet reached its objective. This is because document segmentation is strongly influenced by the common-word file, the number of tokens in a token-sequence, and the number of token-sequences within a block. Both the writing of words and the text tiling algorithm are very sensitive to the reading format, such as titles and subtitles, so the reading format must be removed to leave only the body of the text. Segmentation results improved over the course of the trials. In the experiment on 15 readings, the segmentation reached a precision of 59.3% and a recall of 80%. These trials used 4140 common words, a total similarity coefficient score of 5, 20 tokens per token-sequence, and 3 token-sequences per block.
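The three stages named in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the common-word file acts as a stop-word list, that similarity between adjacent blocks is measured with cosine similarity over term counts, and that boundaries are chosen where a depth score exceeds the mean minus half a standard deviation (a threshold often used with text tiling). All function names are hypothetical; the defaults `w=20` and `k=3` mirror the token-sequence and block sizes reported in the abstract.

```python
import math
import re
from collections import Counter


def tokenize(text, stopwords):
    """Stage 1: lowercase, keep alphabetic tokens, drop common words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in stopwords]


def token_sequences(tokens, w=20):
    """Group tokens into fixed-size token-sequences of w tokens."""
    return [tokens[i:i + w] for i in range(0, len(tokens), w)]


def cosine(a, b):
    """Cosine similarity between two term-count vectors (assumed measure)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def gap_scores(seqs, k=3):
    """Stage 2: compare the block of k sequences before and after each gap."""
    scores = []
    for gap in range(1, len(seqs)):
        left = Counter(t for s in seqs[max(0, gap - k):gap] for t in s)
        right = Counter(t for s in seqs[gap:gap + k] for t in s)
        scores.append(cosine(left, right))
    return scores


def depth_scores(scores):
    """Depth of each similarity valley relative to the peaks on both sides."""
    depths = []
    for i, s in enumerate(scores):
        left = right = s
        for j in range(i - 1, -1, -1):      # climb to the left peak
            if scores[j] < left:
                break
            left = scores[j]
        for j in range(i + 1, len(scores)):  # climb to the right peak
            if scores[j] < right:
                break
            right = scores[j]
        depths.append((left - s) + (right - s))
    return depths


def pick_boundaries(depths):
    """Stage 3: keep gaps whose depth exceeds mean - std/2 (assumed cutoff)."""
    if not depths:
        return []
    mean = sum(depths) / len(depths)
    std = math.sqrt(sum((d - mean) ** 2 for d in depths) / len(depths))
    cutoff = mean - std / 2
    return [i + 1 for i, d in enumerate(depths) if d > 0 and d > cutoff]


def text_tiling(text, stopwords, w=20, k=3):
    """Return boundaries as the number of token-sequences preceding each one."""
    seqs = token_sequences(tokenize(text, stopwords), w)
    return pick_boundaries(depth_scores(gap_scores(seqs, k)))
```

Calling `text_tiling(body_text, common_words)` on the body of a document, with titles and subtitles already stripped, returns a list of boundary positions counted in token-sequences. Mapping those positions back to paragraph breaks is left out of this sketch.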
Item Type: | Article |
---|---|
Uncontrolled Keywords: | text tiling, segmentation, multiparagraph segmentation |
Subjects: | A General Works > AI Indexes (General); Q Science > Q Science (General) |
Divisions: | Faculty of Engineering > School of Computer Engineering |
Depositing User: | yuni anita |
Date Deposited: | 03 Dec 2021 08:14 |
Last Modified: | 03 Dec 2021 08:14 |
URI: | http://eprints.umsida.ac.id/id/eprint/9017 |