Abdul, Fadlil and Sunardi, Sunardi and Rezki, Ramdhani (2022) Similarity identification based on word trigrams using exact string matching algorithms. Intensif : Jurnal Ilmiah Penelitian Teknologi dan Penerapan Sistem Informasi, 6 (2): 8. pp. 253-270. ISSN 2549-6824 (Online) 2580-409X (Print)
Abstract
Several strong exact string matching algorithms can be used to identify similarity, including the Rabin-Karp, Winnowing, and Horspool Boyer-Moore algorithms. To determine similarity, the Rabin-Karp and Winnowing algorithms use fingerprints, while the Horspool Boyer-Moore algorithm uses a bad-character table. However, previous research applied these algorithms to character n-grams. Identification based on word n-grams, which captures similarity at the level of linguistic meaning, especially for longer strings, had not yet been covered. This study therefore proposes word-level trigrams as the basis for similarity identification with the three algorithms and compares their performance. In terms of precision, recall, and running time, the Rabin-Karp algorithm achieved 100%, 100%, and 0.19 ms, respectively; the Winnowing algorithm with the smallest window achieved 100%, 56%, and 0.18 ms; and the Horspool algorithm achieved 100%, 100%, and 0.06 ms. From these results, it can be concluded that the Horspool Boyer-Moore algorithm performs best overall: it matches Rabin-Karp on precision and recall and has the shortest running time.
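To make the idea of word-trigram matching with a bad-character table concrete, the following is a minimal Python sketch. It is not the authors' implementation: the function names (`word_trigrams`, `horspool_find`, `trigram_similarity`), the lowercase and whitespace normalization, and the similarity score (the fraction of one text's word trigrams found verbatim in the other) are all assumptions made for illustration.

```python
def word_trigrams(text):
    """Return the consecutive three-word sequences in text (hypothetical helper)."""
    words = text.lower().split()
    return [" ".join(words[i:i + 3]) for i in range(len(words) - 2)]


def horspool_find(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    # Bad-character table: for every character except the last one in the
    # pattern, the distance from its rightmost occurrence to the pattern end.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            return pos
        # Shift by the table entry for the text character aligned with the
        # last pattern position; unseen characters allow a full-length shift.
        pos += shift.get(text[pos + m - 1], m)
    return -1


def trigram_similarity(source, suspect):
    """Fraction of the suspect's word trigrams that occur verbatim in source.

    This score is an assumed, illustrative measure, not the paper's metric.
    """
    grams = word_trigrams(suspect)
    if not grams:
        return 0.0
    # Pad with spaces so a trigram can only match on whole-word boundaries.
    padded = " " + " ".join(source.lower().split()) + " "
    hits = sum(1 for g in grams if horspool_find(padded, " " + g + " ") != -1)
    return hits / len(grams)
```

For example, `trigram_similarity("the quick brown fox jumps over the lazy dog", "a quick brown fox jumps high")` finds two of the four suspect trigrams ("quick brown fox" and "brown fox jumps") in the source. A fingerprint-based variant in the style of Rabin-Karp or Winnowing would hash each trigram instead of scanning for it.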
| Item Type: | Article |
|---|---|
| Uncontrolled Keywords: | String-matching, Algorithm, Performance, N-Gram, Similarity |
| Subjects: | Computers, Control & Information Theory > Applications Software |
| Depositing User: | - Dina - |
| Date Deposited: | 11 Jul 2023 07:19 |
| Last Modified: | 11 Jul 2023 07:19 |
| URI: | https://karya.brin.go.id/id/eprint/19147 |