FinSen

Name: FinSen
License: MIT

Dataset Information

Contents

Overview
Associated Benchmarks
Recent Benchmark Submissions
Research Papers

Overview

Enhancing Financial Market Predictions: Causality-Driven Feature Selection

This paper introduces the FinSen dataset, which integrates economic and financial news articles from 197 countries with stock market data. The dataset's coverage spans 15 years from 2007 to 2023 and includes temporal information, offering a global perspective on financial market news through 160,000 records. Our study uses causally validated sentiment scores and LSTM models to improve the accuracy and reliability of market forecasts.

Our FinSen Dataset

This repository contains the dataset for "Enhancing Financial Market Predictions: Causality-Driven Feature Selection", which has been accepted at ADMA 2024.

If the dataset or the paper has been useful in your research, please add a citation to our work:

@article{liang2024enhancing,
  title={Enhancing Financial Market Predictions: Causality-Driven Feature Selection},
  author={Liang, Wenhao and Li, Zhengyang and Chen, Weitong},
  journal={arXiv e-prints},
  pages={arXiv--2408},
  year={2024}
}

Datasets

FinSen can be downloaded manually from the repository as a CSV file. The sentiment labels and their scores were generated with the FinBERT model, loaded from the Hugging Face Transformers library under the identifier "ProsusAI/finbert" (Araci, Dogu. "FinBERT: Financial sentiment analysis with pre-trained language models." arXiv preprint arXiv:1908.10063 (2019)).
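As an illustration of how labels and scores of this kind can be produced, the following is a minimal sketch and not the authors' pipeline. It assumes the `transformers` package is installed and that the model can be downloaded; the function names `load_finbert` and `score_headlines` and the output keys are hypothetical choices made for this example.

```python
from typing import Callable, Dict, List, Optional, Sequence

MODEL_ID = "ProsusAI/finbert"  # identifier quoted in the dataset description


def load_finbert() -> Callable:
    """Build a text-classification pipeline (downloads the model on first use)."""
    # Imported lazily so the rest of this module works without transformers.
    from transformers import pipeline

    return pipeline("text-classification", model=MODEL_ID)


def score_headlines(
    headlines: Sequence[str],
    classifier: Optional[Callable] = None,
) -> List[Dict[str, object]]:
    """Attach a sentiment label and confidence score to each headline.

    `classifier` is any callable that maps a list of strings to a list of
    {"label": ..., "score": ...} dicts, which is the shape returned by a
    Hugging Face text-classification pipeline.
    """
    if classifier is None:
        classifier = load_finbert()
    results = classifier(list(headlines))
    return [
        {"text": text, "sentiment": res["label"], "score": float(res["score"])}
        for text, res in zip(headlines, results)
    ]
```

Calling `score_headlines(["Stocks rally after rate cut"])` loads the model and returns one dict per headline. Passing a custom `classifier` makes it possible to swap in a different model or a stub for testing.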

We provide only the US data for research purposes; please contact [email protected] if you need the other countries (197 in total).

We also provide other NLP datasets for text classification tasks here; if you use any of them in your research, please cite them accordingly.

20Newsgroups. Joachims, T., et al.: A probabilistic analysis of the rocchio algorithm with tfidf for text categorization. In: ICML. vol. 97, pp. 143–151. Citeseer (1997)
AG News. Zhang, X., Zhao, J., LeCun, Y.: Character-level convolutional networks for text classification. Advances in neural information processing systems 28 (2015)
Financial PhraseBank. Malo, P., Sinha, A., Korhonen, P., Wallenius, J., Takala, P.: Good debt or bad debt: Detecting semantic orientations in economic texts. Journal of the Association for Information Science and Technology 65(4), 782–796 (2014)

Dataloader for FinSen

For more convenient use, we provide the preprocessing file finsen.py for our FinSen dataset under the dataloaders directory.
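For readers who want to inspect the CSV without the repository code, a minimal loading sketch follows. It does not reproduce finsen.py. The column names (`date`, `title`, `sentiment`, `score`), the date format, and the `NewsRecord` type are all assumptions made for illustration and should be adjusted to the actual file header.

```python
import csv
from dataclasses import dataclass
from datetime import date, datetime
from typing import List, TextIO


@dataclass
class NewsRecord:
    published: date
    title: str
    sentiment: str  # e.g. "positive", "negative", "neutral"
    score: float    # confidence score attached to the sentiment label


def read_records(handle: TextIO, date_format: str = "%Y-%m-%d") -> List[NewsRecord]:
    """Parse news rows from an open CSV file.

    The column names used here are hypothetical; check them against the
    header of the downloaded file.
    """
    records = []
    for row in csv.DictReader(handle):
        records.append(
            NewsRecord(
                published=datetime.strptime(row["date"], date_format).date(),
                title=row["title"].strip(),
                sentiment=row["sentiment"].strip().lower(),
                score=float(row["score"]),
            )
        )
    return records
```

Typical usage would be `with open(path, newline="", encoding="utf-8") as f: records = read_records(f)`, where `path` points at the downloaded CSV file.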

Models - Text Classification

DAN-3.
Global Pooling CNN.

Models - Regression Prediction

LSTM

Prediction Results on the S&P 500 Using Sentiment Scores from FinSen
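A sequence model such as an LSTM is usually trained on fixed-length windows of past observations. The sketch below shows one common way to turn a daily series of feature vectors (for example a sentiment score and a closing price) into such windows, each paired with the next day's target. The function name `make_windows`, the window length, and the choice of features are assumptions made for illustration, not the setup used in the paper.

```python
from typing import List, Sequence, Tuple


def make_windows(
    features: Sequence[Sequence[float]],
    targets: Sequence[float],
    window: int,
) -> Tuple[List[List[List[float]]], List[float]]:
    """Build (past window, next target) pairs from aligned daily series.

    features[t] is the feature vector for day t and targets[t] is the value
    to predict for day t. Each sample uses days [t - window, t) as input
    and targets[t] as the label, so no future information leaks into the input.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if len(features) != len(targets):
        raise ValueError("features and targets must have the same length")

    inputs, labels = [], []
    for end in range(window, len(features)):
        inputs.append([list(step) for step in features[end - window:end]])
        labels.append(targets[end])
    return inputs, labels
```

The resulting nested lists have shape (samples, window, features), which is the layout a batch-first recurrent layer expects once converted to a tensor.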

Dependencies

The code is based on PyTorch and follows the code framework of https://github.com/torrvision/focal_calibration; please cite their work if you find it useful.

Happy Research!

Associated Benchmarks

This dataset is used in 1 benchmark:

Time Series Regression - Metrics: Mean MSE
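The listing does not define "Mean MSE". One plausible reading, assumed here, is the mean squared error averaged over several runs or evaluation splits. The helper names below are hypothetical.

```python
from typing import Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between two equally long sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must not be empty")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_mse(y_true: Sequence[float], runs: Sequence[Sequence[float]]) -> float:
    """Average the MSE over several prediction runs against the same targets.

    This interpretation of "Mean MSE" is an assumption.
    """
    if not runs:
        raise ValueError("at least one run is required")
    return sum(mse(y_true, run) for run in runs) / len(runs)
```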

Recent Benchmark Submissions

Task	Model	Paper	Date
Time Series Regression	LSTM	Enhancing Financial Market Predictions: Causality-Driven …	2024-08-02

Research Papers

Recent papers with results on this dataset:

Enhancing Financial Market Predictions: Causality-Driven Feature Selection (2024)

External Links:

Papers with Code Entry