Twitter dataset on public sentiments towards biodiversity policy in Indonesia

Uliniansyah, Mohammad Teduh and Budi, Indra and Nurfadhilah, Elvira and Afra, Dian Isnaeni Nurul and Santosa, Agung and Latief, Andi Djalal and Jarin, Asril and Gunarso, Gunarso and Jiwanggi, Meganingrum Arista and Hidayati, Nuraisa Novia and Fajri, Radhiyatul and Suryono, Ryan Randy and Pebiana, Siska and Shaleha, Siti and Ramdhani, Tosan Wiar and Sampurno, Tri (2024) Twitter dataset on public sentiments towards biodiversity policy in Indonesia. Data in Brief, 52. p. 109890. ISSN 23523409

Full text not available from this repository.

Abstract

In recent years, biodiversity has emerged as a prominent and pressing topic due to the urgent need to address biodiversity loss and the recognition of its connections to climate change and sustainable development. Increased public awareness and the consideration of economic factors have further underscored the significance of biodiversity conservation. To investigate the sentiment of the Indonesian people towards biodiversity, we collected data from Twitter using a predefined set of keywords. We amassed a dataset of 500,000 Indonesian tweets posted from January 2020 to March 2023. These tweets covered a wide range of discussions on biodiversity, including subdomains such as food security, health, and environmental management. Before labeling, a team of 18 experts jointly developed a labeling guide, which served as the reference for annotation. Three annotators labeled each tweet with a sentiment class (positive, negative, or neutral), or with the label "none" for unrelated tweets. The final label was determined by majority voting. Tweets whose final label was "none" and tweets with an undecided sentiment class were considered invalid and excluded from subsequent processing. After a series of steps, including cleaning (removing duplicates, irrelevant tweets, and tweets written in languages other than Indonesian) and preprocessing, we prepared a dataset containing 13,435 tweets. We measured the inter-annotator agreement level, built several models using different algorithms with K-Fold cross-validation, and evaluated the models. The inter-annotator agreement, measured by Fleiss' Kappa, was 0.62187, and the best model, based on the pre-trained IndoBERT model, achieved an F1-score of 0.7959. The Fleiss' Kappa value suggests that the annotators shared a substantial understanding of, and agreement on, how to label a tweet, which supports the consistency and reliability of the dataset. The F1-score suggests that the dataset is suitable for reuse in further research on sentiment analysis of biodiversity. The dataset can support various lines of research, including topic modeling, sentiment analysis, and public opinion analysis on Twitter, especially in relation to biodiversity-related policies.
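
The labeling procedure described in the abstract (three annotators, majority voting, exclusion of "none" and undecided tweets, and Fleiss' Kappa as the agreement measure) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy annotations, and the choice to treat "none" as a fourth category when computing the Kappa value are assumptions made for illustration only.

```python
from collections import Counter

# Label set taken from the abstract; treating "none" as a category
# in the agreement computation is an assumption of this sketch.
SENTIMENTS = ("positive", "negative", "neutral")
CATEGORIES = SENTIMENTS + ("none",)


def majority_label(votes):
    """Return the label chosen by more than half of the annotators, else None."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None


def is_valid(votes):
    """Keep a tweet only if its majority label is a sentiment class."""
    return majority_label(votes) in SENTIMENTS


def fleiss_kappa(rows, categories=CATEGORIES):
    """Fleiss' kappa for rows of labels (one row per tweet, equal rater count)."""
    n_items = len(rows)
    n_raters = len(rows[0])
    # counts[i][j] = number of raters who gave tweet i the category j
    counts = [[row.count(c) for c in categories] for row in rows]
    # Observed agreement per tweet, then averaged over all tweets
    p_items = [
        (sum(c * c for c in item) - n_raters) / (n_raters * (n_raters - 1))
        for item in counts
    ]
    p_bar = sum(p_items) / n_items
    # Agreement expected by chance, from the overall category proportions
    p_cats = [
        sum(item[j] for item in counts) / (n_items * n_raters)
        for j in range(len(categories))
    ]
    p_exp = sum(p * p for p in p_cats)
    # Undefined (division by zero) if every rating falls in a single category
    return (p_bar - p_exp) / (1 - p_exp)


# Hypothetical toy annotations, not taken from the dataset
annotations = [
    ["positive", "positive", "positive"],  # unanimous
    ["negative", "negative", "neutral"],   # majority: negative
    ["positive", "negative", "neutral"],   # undecided, excluded
    ["none", "none", "positive"],          # majority: none, excluded
]

final_labels = [majority_label(v) for v in annotations]
kept = [v for v in annotations if is_valid(v)]
kappa = fleiss_kappa(annotations)
```

With three annotators, a tweet is undecided only when all three labels differ, so the strict-majority test above coincides with a two-out-of-three rule. Whether the published Kappa value was computed before or after the invalid tweets were removed is not stated in the abstract; the sketch computes it over all annotated rows.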

Item Type: Article
Uncontrolled Keywords: Sentiment analysis, Natural language processing, Indonesian, Health, Environmental management, Food security
Subjects: Environmental Pollution & Control; Social and Political Sciences
Depositing User: Rizzal Rosiyan
Date Deposited: 10 Dec 2025 22:19
Last Modified: 10 Dec 2025 22:19
URI: https://karya.brin.go.id/id/eprint/55946
