This paper introduces the FinSen dataset, which integrates economic and financial news articles from 197 countries with stock market data. The dataset covers 15 years of news from 2007 to 2023, includes temporal information, and offers a global perspective on financial market news with 160,000 records. Our study leverages causally validated sentiment scores and LSTM models to enhance market forecast accuracy and reliability.
This repository contains the dataset for the paper "Enhancing Financial Market Predictions: Causality-Driven Feature Selection", which has been accepted at ADMA 2024.
If the dataset or the paper has been useful in your research, please add a citation to our work:
@article{liang2024enhancing,
title={Enhancing Financial Market Predictions: Causality-Driven Feature Selection},
author={Liang, Wenhao and Li, Zhengyang and Chen, Weitong},
journal={arXiv e-prints},
pages={arXiv--2408},
year={2024}
}
[FinSen] can be downloaded manually from the repository as a CSV file. The sentiment labels and their scores are generated by the FinBERT model from the Hugging Face Transformers library, under the identifier "ProsusAI/finbert" (Araci, Dogu. "FinBERT: Financial sentiment analysis with pre-trained language models." arXiv preprint arXiv:1908.10063 (2019)).
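For readers who want to produce comparable labels on their own text, the following is a minimal sketch of scoring sentences with the "ProsusAI/finbert" checkpoint through the Transformers `pipeline` API. The helper names `load_finbert` and `score_texts` are hypothetical, and the exact settings used to build FinSen (which text field was scored, truncation, batching) are not stated here, so treat this as an illustration rather than the authors' procedure.

```python
from typing import Callable, Iterable, List, Tuple

MODEL_ID = "ProsusAI/finbert"  # identifier given in this README


def load_finbert():
    """Build a Hugging Face text-classification pipeline for FinBERT.

    Requires the `transformers` package; the model weights are downloaded
    on first use.
    """
    # Imported lazily so that score_texts() has no hard dependency on transformers.
    from transformers import pipeline

    return pipeline("text-classification", model=MODEL_ID)


def score_texts(texts: Iterable[str], classifier: Callable) -> List[Tuple[str, str, float]]:
    """Return (text, label, score) for each input text.

    `classifier` is any callable mapping a list of strings to a list of
    {"label": ..., "score": ...} dicts, which is the output shape of a
    text-classification pipeline.
    """
    texts = list(texts)
    if not texts:
        return []
    predictions = classifier(texts)
    return [(t, p["label"], float(p["score"])) for t, p in zip(texts, predictions)]


# Usage (needs network access for the first model download):
# classifier = load_finbert()
# for text, label, score in score_texts(["Stocks rallied after the rate cut."], classifier):
#     print(label, round(score, 3), text)
```

Passing the classifier in as an argument keeps the scoring helper independent of the model, so it can be exercised with a stub before downloading any weights.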
We only provide the US data for research purposes; please contact [email protected] for other countries (197 in total) if necessary.
We also provide other NLP datasets for text classification tasks here; if you use any of them in your research, please cite them accordingly.
For more convenient usage, we provide the preprocessing file finsen.py for our FinSen dataset under the dataloaders directory.
DAN-3.
Global Pooling CNN.
The code is based on PyTorch and built on the code framework of https://github.com/torrvision/focal_calibration; please cite their work if you find it useful.
:smiley: Happy Research!
This dataset is used in 1 benchmark:
Task | Model | Paper | Date |
---|---|---|---|
Time Series Regression | LSTM | Enhancing Financial Market Predictions: Causality-Driven … | 2024-08-02 |