TweetSentBR is a manually annotated dataset for sentiment analysis in Brazilian Portuguese.
Description:
- The dataset consists of 15,000 manually annotated sentences extracted from tweets in Brazilian Portuguese.
- These sentences are specifically related to the TV show domain.
- Each sentence has been labeled into one of three classes: positive, neutral, or negative sentiment.
- The annotation process followed guidelines from the literature to ensure the reliability of the labels.
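As a rough illustration of the three-class labeling scheme described above, the following minimal sketch holds annotated sentences in memory and counts the class distribution. The `LabeledTweet` type, the label strings, and the sample sentences are hypothetical; the file format in which the dataset is distributed is not specified here.

```python
from collections import Counter
from dataclasses import dataclass

# The three sentiment classes named in the description.
# The exact label strings used in the released files are an assumption.
LABELS = ("positive", "neutral", "negative")


@dataclass(frozen=True)
class LabeledTweet:
    """Hypothetical record: one annotated sentence and its sentiment label."""
    text: str
    label: str

    def __post_init__(self):
        # Reject anything outside the three-class scheme.
        if self.label not in LABELS:
            raise ValueError(f"unknown label: {self.label!r}")


def class_distribution(tweets):
    """Count how many sentences fall into each of the three classes."""
    counts = Counter(t.label for t in tweets)
    # Report every class, including those with zero examples.
    return {label: counts.get(label, 0) for label in LABELS}


# Made-up examples for illustration only; these are not taken from the dataset.
sample = [
    LabeledTweet("Adorei o programa de hoje!", "positive"),
    LabeledTweet("O programa começa às 20h.", "neutral"),
    LabeledTweet("Que episódio chato.", "negative"),
    LabeledTweet("Melhor temporada até agora.", "positive"),
]

print(class_distribution(sample))
```

Checking the class distribution first is useful because an imbalanced label set changes how classifier scores should be read.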
Purpose:
- Researchers and practitioners in the field of Natural Language Processing (NLP) use this dataset for sentiment analysis tasks.
- It serves as a benchmark for developing and evaluating novel methods and approaches for sentiment classification.
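When the dataset is used as a benchmark, a classifier's predictions are compared against the manual labels. The sketch below computes accuracy and macro-averaged F1 for three-class predictions in plain Python. The choice of these two metrics, the function names, and the toy label lists are assumptions made for illustration, not a description of any particular published evaluation protocol.

```python
# Label strings are assumed; adapt them to the released files.
LABELS = ("positive", "neutral", "negative")


def accuracy(gold, predicted):
    """Fraction of sentences whose predicted label matches the gold label."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equal length")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def macro_f1(gold, predicted, labels=LABELS):
    """Unweighted mean of the per-class F1 scores."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equal length")
    scores = []
    for label in labels:
        pairs = list(zip(gold, predicted))
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        # A class with no true positives contributes an F1 of 0.
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels for illustration only; not real system output.
gold = ["positive", "neutral", "negative", "positive"]
pred = ["positive", "neutral", "positive", "positive"]

print(f"accuracy = {accuracy(gold, pred):.2f}")
print(f"macro F1 = {macro_f1(gold, pred):.2f}")
```

Macro averaging weights the three classes equally, so a system that never predicts one class is penalized even when its overall accuracy looks acceptable.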
Performance:
- Baseline experiments on polarity classification were run using three machine learning methods; the results are reported in the source paper (1).
Source: Conversation with Bing, 3/16/2024
(1) Building a Sentiment Corpus of Tweets in Brazilian Portuguese. https://arxiv.org/abs/1712.08917.
(2) 7 Best Portuguese Language Speech Datasets of 2022 | Twine. https://www.twine.net/blog/portuguese-language-speech-datasets/.
(3) A survey and study impact of tweet sentiment analysis via ... - Springer. https://link.springer.com/article/10.1007/s10579-023-09687-8.
(4) Top 25 Twitter Datasets for NLP and Machine Learning | iMerit. https://imerit.net/blog/top-25-twitter-datasets-for-natural-language-processing-and-machine-learning-all-pbm/.
(5) Building a Sentiment Corpus of Tweets in Brazilian Portuguese - arXiv.org. https://arxiv.org/pdf/1712.08917v1.pdf.
(6) Building a Sentiment Corpus of Tweets in Brazilian Portuguese (DOI). https://doi.org/10.48550/arXiv.1712.08917.
Variants: tweetSentBR