tweetSentBR

Dataset Information
Introduced: 2017
License: Unknown

Overview

TweetSentBR is a manually annotated corpus for sentiment analysis in Brazilian Portuguese.

  1. Description:
    - The dataset consists of 15,000 manually annotated sentences extracted from tweets in Brazilian Portuguese.
    - These sentences are specifically related to the TV show domain.
    - Each sentence is labeled with one of three classes: positive, neutral, or negative.
    - The annotation process followed guidelines from the literature to ensure reliability.

  2. Purpose:
    - Researchers and practitioners in the field of Natural Language Processing (NLP) use this dataset for sentiment analysis tasks.
    - It serves as a benchmark for developing and evaluating novel methods and approaches for sentiment classification.

  3. Performance:
    - Baseline experiments on polarity classification, using three machine learning methods, achieved the following results:
      • Binary classification (positive vs. negative): 80.99% F-Measure and 82.06% accuracy.
      • Three-point classification (positive, neutral, negative): 59.85% F-Measure and 64.62% accuracy.
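
The two metrics reported above can be computed as in the following minimal sketch. It is illustrative only: the toy labels are made up, the function names are hypothetical, and macro averaging of the per-class F-Measure is an assumption, since this page does not state which averaging the baselines used.

```python
from collections import Counter

# The three sentiment classes used by the dataset.
LABELS = ("positive", "neutral", "negative")


def accuracy(gold, pred):
    """Fraction of sentences whose predicted label equals the gold label."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f_measure(gold, pred, labels=LABELS):
    """Unweighted mean of per-class F1 (macro averaging is an assumption)."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # p was predicted but is wrong
            fn[g] += 1  # g was the true label but was missed
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data, not taken from the corpus.
gold = ["positive", "negative", "neutral", "positive", "negative", "neutral"]
pred = ["positive", "negative", "positive", "positive", "neutral", "neutral"]

print(f"accuracy:  {accuracy(gold, pred):.2%}")         # 66.67%
print(f"F-Measure: {macro_f_measure(gold, pred):.2%}")  # 65.56%
```

For the binary setting, the same functions apply to data restricted to positive and negative sentences, passing `labels=("positive", "negative")`.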

Source: Building a Sentiment Corpus of Tweets in Brazilian Portuguese. https://arxiv.org/abs/1712.08917

Variants: tweetSentBR

Associated Benchmarks

This dataset is used in 1 benchmark.
