Problem and opportunity
Environmental, social and governance (ESG) data is crucial to financial markets as investors take a more sustainable approach to achieving their investment goals.
Investors can review companies’ ESG disclosures, but they also typically want access to information that is not reported and that may indicate ESG controversies.
These controversies can include toxic waste spills (environmental), human rights violations (social), or corrupt CEOs (governance).
Unlike traditional financial information, ESG data is often unstructured and sourced from companies’ self-reported data and news articles. Right now, analyzing this data is a challenging and manual task, even though ESG controversies could have a significantly negative impact on investment performance.
Tim Nugent, Senior Research Scientist at Refinitiv Labs, tells us how his team is using state-of-the-art NLP to automate the hunt for ESG controversy data and deliver investment-ready ESG data for our customers.
Detecting ESG controversies in unstructured data
Refinitiv Labs’ ESG Controversy Prediction uses a combination of supervised machine learning and natural language processing (NLP) to train an algorithm.
The algorithm automatically classifies whether articles contain references to 20 ESG controversy topics defined in-house and, where they do, provides a probability score for each topic.
Where the probability sits above a confidence threshold, the prediction proceeds directly through the ESG pipeline, while low-confidence predictions are sent to human analysts for further review.
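The scoring and routing described above can be sketched in a few lines of Python. This is a minimal illustration, not Refinitiv Labs' implementation: the three topic names, the 0.9 threshold, and the use of an independent sigmoid score per topic are all assumptions made for the example, since the article does not state them.

```python
import math

# Hypothetical values: the article names neither the 20 topics nor the threshold.
TOPICS = ["toxic_waste_spill", "human_rights_violation", "executive_corruption"]
CONFIDENCE_THRESHOLD = 0.9


def topic_probabilities(logits):
    """Map raw per-topic model outputs to independent scores in (0, 1).

    Assumes one sigmoid per topic, so an article can match several topics.
    """
    return {topic: 1.0 / (1.0 + math.exp(-z)) for topic, z in zip(TOPICS, logits)}


def route(probabilities, threshold=CONFIDENCE_THRESHOLD):
    """Split predictions into those that go straight to the pipeline and
    those that are queued for a human analyst."""
    pipeline, analyst_review = {}, {}
    for topic, p in probabilities.items():
        target = pipeline if p > threshold else analyst_review
        target[topic] = p
    return pipeline, analyst_review


# Made-up raw scores for a single article, one per topic.
probs = topic_probabilities([3.2, 0.4, -0.1])
pipeline, analyst_review = route(probs)
print("pipeline:", sorted(pipeline))
print("analyst review:", sorted(analyst_review))
```

With these made-up scores, only the first topic clears the threshold; the other two would be queued for analyst review. In practice the threshold would be tuned against analyst capacity and the cost of a missed controversy.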
With a clear view of controversy predictions, investors can make informed, sustainable and proactive investment decisions quickly and efficiently.
ESG Controversy Prediction in action
The supervised machine learning element of the research requires high-quality data.
This prototype includes:
- 31,600 news articles already annotated with the 20 ESG controversy topics by Refinitiv analysts
- The NLP element is Google’s open-source NLP model, BERT (Bidirectional Encoder Representations from Transformers), a neural language model that generates language representations
- BERT is pretrained on 3.3 billion words of general-domain text from English Wikipedia and the open BookCorpus dataset
- Refinitiv Labs further trained the model to make it domain-specific to finance and business, using content from the Reuters News Archive, which adds 715 million words from about two million articles covering business and financial news
- The team then fine-tuned the model with the annotated news articles to classify ESG controversy topics
These steps improve the performance of the prototype beyond that of the basic BERT model.
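To put the corpus sizes above in proportion, the three training stages can be written down as plain data and compared. This is only a back-of-the-envelope sketch: the `Stage` class and its field names are hypothetical, and the figures are the ones quoted in this article.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Stage:
    name: str
    corpus: str
    words: Optional[int]  # None where the article gives no word count


STAGES = [
    Stage("general pretraining", "Wikipedia and BookCorpus", 3_300_000_000),
    Stage("domain pretraining", "Reuters News Archive", 715_000_000),
    Stage("fine-tuning", "31,600 annotated news articles", None),
]

# Total unlabeled text across the two pretraining stages.
unlabeled_words = sum(s.words for s in STAGES if s.words is not None)
# Fraction of that text that comes from the finance and business corpus.
domain_share = STAGES[1].words / unlabeled_words
# "About two million articles" is taken as exactly 2,000,000 here.
words_per_article = STAGES[1].words / 2_000_000

print(f"unlabeled words across pretraining stages: {unlabeled_words:,}")
print(f"share from Reuters News Archive: {domain_share:.1%}")
print(f"average words per Reuters article: {words_per_article:.0f}")
```

On these figures, the Reuters content makes up roughly 18 percent of the unlabeled text, at an average of about 360 words per article. The labeled set used for fine-tuning is far smaller, which is the usual motivation for pretraining before fine-tuning.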
ESG Controversy Prediction is a practical solution that Refinitiv Labs is developing iteratively to solve the real problem of gaining insight into ESG controversy data. By automating the detection and classification of the data, we can give our customers a real advantage when making investment decisions.
A collaborative approach
Refinitiv Labs takes a collaborative, customer-focused approach to building solutions to real problems in financial markets by combining customer feedback, extensive data capability and exceptional partner technologies.
Collaborating with our customers:
- Working towards a shared goal of accessible ESG controversy predictions
- Incorporating customer feedback into every stage of the development process
- Joining forces to review and direct development sprints and releases
We work alongside:
- Google’s open-source NLP model BERT
What Refinitiv Labs is thinking next...
Refinitiv Labs plans to extend its use of machine learning and NLP by:
- Adding an API to give customers access to the domain-specific BERT model trained with Reuters News Archive data
- Fine-tuning the domain-specific model to tackle different tasks such as predicting financial sentiment
- Training the BERT model to transform earnings call transcripts into text