FinSen

Dataset Information
License
MIT

Overview

Enhancing Financial Market Predictions: Causality-Driven Feature Selection

This paper introduces the FinSen dataset, which integrates economic and financial news articles from 197 countries with stock market data. The dataset spans 15 years, from 2007 to 2023, with temporal information, offering a rich, global perspective through 160,000 records of financial market news. Our study leverages causally validated sentiment scores and LSTM models to enhance market forecast accuracy and reliability.

Our FinSen Dataset

Pytorch 1.5

This repository contains the dataset for "Enhancing Financial Market Predictions: Causality-Driven Feature Selection", which has been accepted at ADMA 2024.

If the dataset or the paper has been useful in your research, please add a citation to our work:

@article{liang2024enhancing,
  title={Enhancing Financial Market Predictions: Causality-Driven Feature Selection},
  author={Liang, Wenhao and Li, Zhengyang and Chen, Weitong},
  journal={arXiv e-prints},
  pages={arXiv--2408},
  year={2024}
}

Datasets

FinSen can be downloaded manually from the repository as a CSV file. The sentiment labels and their scores are generated by the FinBERT model from the Hugging Face Transformers library under the identifier "ProsusAI/finbert" (Araci, Dogu. "Finbert: Financial sentiment analysis with pre-trained language models." arXiv preprint arXiv:1908.10063 (2019)).
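As a minimal sketch of how comparable labels and scores could be produced, the snippet below runs the "ProsusAI/finbert" checkpoint through the Transformers `pipeline` API. The helper names `score_headlines` and `signed_score`, and the signed-score convention, are illustrative assumptions and not necessarily what was used to build FinSen.

```python
from typing import List, Tuple


def signed_score(label: str, score: float) -> float:
    """Fold a FinBERT (label, confidence) pair into one number in [-1, 1].

    Assumed convention (not taken from the FinSen authors):
    positive -> +score, negative -> -score, neutral -> 0.0.
    """
    sign = {"positive": 1.0, "negative": -1.0, "neutral": 0.0}
    return sign[label.lower()] * score


def score_headlines(headlines: List[str]) -> List[Tuple[str, float]]:
    """Return (label, confidence) for each headline using ProsusAI/finbert."""
    # Imported lazily so signed_score works without transformers installed.
    from transformers import pipeline

    classifier = pipeline("text-classification", model="ProsusAI/finbert")
    return [(r["label"], r["score"]) for r in classifier(headlines)]
```

Calling `score_headlines(["Stocks rally as inflation cools"])` downloads the model on first use and returns one label/confidence pair per headline; `signed_score` can then collapse each pair into a single value if a numeric feature is wanted.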

We provide only the US data for research use; please contact [email protected] for other countries (197 in total) if needed.

We also provide other NLP datasets for text classification tasks here; please cite them accordingly if you use them in your research.

  1. 20Newsgroups. Joachims, T., et al.: A probabilistic analysis of the rocchio algorithm with tfidf for
    text categorization. In: ICML. vol. 97, pp. 143–151. Citeseer (1997)
  2. AG News. Zhang, X., Zhao, J., LeCun, Y.: Character-level convolutional networks for text
    classification. Advances in neural information processing systems 28 (2015)
  3. Financial PhraseBank. Malo, P., Sinha, A., Korhonen, P., Wallenius, J., Takala, P.: Good debt or bad debt:
    Detecting semantic orientations in economic texts. Journal of the Association for
    Information Science and Technology 65(4), 782–796 (2014)

Dataloader for FinSen

For convenience, we provide the preprocessing file finsen.py for our FinSen dataset under the dataloaders directory.
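The provided finsen.py remains the reference loader. For readers who only want to inspect the CSV, a minimal standard-library sketch follows. The function name `read_text_label_pairs` and the column names `text` and `label` are hypothetical placeholders; check the header row of the actual file and pass the real names.

```python
import csv
import io
from typing import Iterable, List, Tuple


def read_text_label_pairs(
    lines: Iterable[str],
    text_col: str = "text",    # hypothetical column name
    label_col: str = "label",  # hypothetical column name
) -> List[Tuple[str, str]]:
    """Read (text, label) pairs from CSV lines, skipping rows with empty text."""
    reader = csv.DictReader(lines)
    pairs = []
    for row in reader:
        text = (row.get(text_col) or "").strip()
        if text:
            pairs.append((text, (row.get(label_col) or "").strip()))
    return pairs


# Tiny in-memory stand-in for the real file (made-up rows, not FinSen data).
sample = io.StringIO(
    "text,label\n"
    "Central bank holds rates,neutral\n"
    ",positive\n"
    "Exports fall sharply,negative\n"
)
pairs = read_text_label_pairs(sample)
```

For a file on disk, pass a handle opened with `open(path, newline="", encoding="utf-8")` in place of the `io.StringIO` object.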

Models - Text Classification

  1. DAN-3.

  2. Global Pooling CNN.

Models - Regression Prediction

  1. LSTM

Prediction Results on the S&P 500 Using Sentiment Scores from FinSen
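One common way to feed a daily sentiment series to an LSTM regressor is to cut it into fixed-length windows, each paired with the index value that follows the window. The sketch below shows only that data-preparation step. The function name `make_windows`, the window length, and the toy numbers are assumptions for illustration, not the configuration used in the paper.

```python
from typing import List, Sequence, Tuple


def make_windows(
    features: Sequence[float],
    targets: Sequence[float],
    window: int,
) -> List[Tuple[List[float], float]]:
    """Pair each run of `window` consecutive features with the next target."""
    if len(features) != len(targets):
        raise ValueError("features and targets must have the same length")
    if window < 1:
        raise ValueError("window must be at least 1")
    samples = []
    for end in range(window, len(features)):
        # Inputs cover days [end - window, end); the label is day `end`.
        samples.append((list(features[end - window:end]), targets[end]))
    return samples


# Made-up values, only to show the shapes involved.
sentiment = [0.1, -0.2, 0.4, 0.0, 0.3]
index_close = [100.0, 101.0, 99.5, 102.0, 103.0]
samples = make_windows(sentiment, index_close, window=3)
```

Each element of `samples` is one training example: a list of `window` past sentiment scores and the closing value to predict. Keeping the label strictly after the window avoids leaking the target day's information into the inputs.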

Dependencies

The code is based on PyTorch and builds on the code framework of https://github.com/torrvision/focal_calibration; please cite their work if you find it useful.

☺ Happy Research!

Associated Benchmarks

This dataset is used in 1 benchmark:

Recent Benchmark Submissions

Task | Model | Paper | Date
Time Series Regression | LSTM | Enhancing Financial Market Predictions: Causality-Driven … | 2024-08-02
