INDONESIAN DOCUMENT SEGMENTATION USING TEXT TILING

Rahmawati, Yunianita and Gunawan, Gunawan (2021) SEGMENTASI DOKUMEN BAHASA INDONESIA MENGGUNAKAN TEXT TILING. JIKA (Jurnal Informatika) Universitas Muhammadiyah Tangerang, 5 (3). pp. 368-378. e-ISSN: 2722-2713 | p-ISSN: 2549-0710


Abstract

Text tiling aims to split a long document into multi-paragraph segments made up of related paragraphs. In this study, the reading format was removed from the documents used as segmentation input. The text tiling method has three stages: tokenization, similarity determination, and boundary identification. The segmentation results obtained with the text tiling algorithm have not yet reached the objective, because segmentation is strongly influenced by the common-word list, the number of tokens per token-sequence, and the number of token-sequences per block. The text tiling algorithm is also very sensitive to how words are written and to the reading format, such as titles and subtitles, so the reading format must be removed to leave only the body text. Segmentation results improved after the trials. In experiments on 15 readings, segmentation achieved a precision of 59.3% and a recall of 80%. These trials used a list of 4,140 common words, a total similarity coefficient score of 5, 20 tokens per token-sequence, and 3 token-sequences per block.
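The abstract names three stages (tokenization, similarity determination, boundary identification) and two parameters (20 tokens per token-sequence, 3 token-sequences per block). The Python sketch below shows one way these stages can fit together using the commonly described TextTiling procedure: block comparison with cosine similarity, depth scores at similarity valleys, and a cutoff of mean minus half the standard deviation. It is not the authors' implementation. Assumptions: a regex tokenizer, a caller-supplied stopword set standing in for the 4,140-word common-word list, boundaries reported as token-sequence indices rather than snapped to paragraph breaks, and exact-match boundary precision/recall. The "similarity coefficient score of 5" is not modelled because the abstract does not define it. All function names are hypothetical.

```python
import math
import re
from collections import Counter
from statistics import mean, pstdev


def tokenize(text, stopwords, w=20):
    """Stage 1: lowercase, extract word tokens, drop common words (assumed
    stopword set), and group the rest into token-sequences of w tokens."""
    tokens = [t for t in re.findall(r"\w+", text.lower()) if t not in stopwords]
    return [tokens[i:i + w] for i in range(0, len(tokens), w)]


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def gap_scores(seqs, k=3):
    """Stage 2: similarity at each gap between consecutive token-sequences,
    comparing blocks of up to k sequences on each side (fewer at the edges)."""
    scores = []
    for gap in range(1, len(seqs)):
        left = Counter(t for s in seqs[max(0, gap - k):gap] for t in s)
        right = Counter(t for s in seqs[gap:gap + k] for t in s)
        scores.append(cosine(left, right))
    return scores


def depth_scores(scores):
    """Stage 3a: depth = (left peak - score) + (right peak - score), where each
    peak is found by climbing while the neighbouring scores do not decrease."""
    depths = []
    for i, s in enumerate(scores):
        j, lpeak = i, s
        while j > 0 and scores[j - 1] >= lpeak:
            j -= 1
            lpeak = scores[j]
        j, rpeak = i, s
        while j < len(scores) - 1 and scores[j + 1] >= rpeak:
            j += 1
            rpeak = scores[j]
        depths.append((lpeak - s) + (rpeak - s))
    return depths


def boundaries(scores, depths):
    """Stage 3b: a valley whose depth exceeds mean - std/2 is a boundary.
    Returns the index of the first token-sequence of each new segment."""
    if not depths:
        return []
    cutoff = mean(depths) - pstdev(depths) / 2
    result = []
    for i, d in enumerate(depths):
        is_valley = (i == 0 or scores[i] <= scores[i - 1]) and (
            i == len(scores) - 1 or scores[i] <= scores[i + 1])
        if is_valley and d > 0 and d > cutoff:
            result.append(i + 1)
    return result


def text_tiling(text, stopwords, w=20, k=3):
    """Run all three stages with the token-sequence and block sizes from the abstract."""
    scores = gap_scores(tokenize(text, stopwords, w), k)
    return boundaries(scores, depth_scores(scores))


def precision_recall(predicted, reference):
    """Exact-match boundary precision and recall (assumed evaluation rule)."""
    hits = len(set(predicted) & set(reference))
    p = hits / len(predicted) if predicted else 0.0
    r = hits / len(reference) if reference else 0.0
    return p, r


if __name__ == "__main__":
    stop = {"dan", "yang", "di"}  # toy stand-in for the common-word list
    part_a = " ".join(["kucing dan anjing yang hewan peliharaan"] * 15)
    part_b = " ".join(["mobil dan motor di jalan raya"] * 15)
    print(text_tiling(part_a + "\n\n" + part_b, stop))  # → [3]
```

In this toy input each topic fills three token-sequences of 20 tokens after the stopwords are removed, so the deepest similarity valley falls at the gap before token-sequence 3. Returning token-sequence indices keeps the sketch short; a fuller version would move each boundary to the nearest paragraph break before comparing it against reference segments.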

Item Type: Article
Uncontrolled Keywords: text tiling, segmentation, multiparagraph segmentation
Subjects: A General Works > AI Indexes (General)
Q Science > Q Science (General)
Divisions: Faculty of Engineering > School of Computer Engineering
Depositing User: yuni anita
Date Deposited: 03 Dec 2021 08:14
Last Modified: 03 Dec 2021 08:14
URI: http://eprints.umsida.ac.id/id/eprint/9017
