The phenomenon of ‘citizen coders’ has been growing steadily: many major banks are teaching their investment bankers and traders to code in the popular R and Python programming languages in order to serve their clients better, driving costs down and efficiency up.
“Coding is no longer just for tech people, it’s for anyone who wants to run a competitive company in the 21st century.” – Mary Callahan Erdoes, Head of JPMorgan Asset Management.
Refinitiv is at the forefront of this tech evolution with our open platform capabilities that encourage financial professionals to build their own apps or leverage our suite of Eikon Data APIs for data integration and interoperability of apps across the individual user workflow on their desktop.
Refinitiv Developer Community Portal is your ultimate toolkit.
In our Developer Community Portal, you will find the tools, documentation, sample code, learning materials and community Q&A forums to help you work effectively and get the results you need from our APIs, SDKs, tools, data and capabilities.
It is designed to help you get the most out of our huge range of financial market data, gathered from a rich network of data provider partners, delivered through our feeds and connected to your workflow.
Below, we will demonstrate how you can conduct a simple sentiment analysis of news delivered via our Eikon Data API. To do this really well is a non-trivial task, and most universities and financial companies will have departments and teams looking at this. We ourselves provide machine readable news products with News Analytics (such as sentiment) over our Elektron platform in real time at very low latency. Don’t forget to check out these services for industrial strength news sentiment analysis, especially for enterprise-wide use cases.
For this article, we will try to do a similar thing as simply as possible to illustrate the key elements – our task is significantly eased by not having to work in a low-latency environment. We will delegate most of the complexity of actually analysing the text to off-the-shelf packages. You can then easily swap out individual modules, such as the sentiment engine, to improve your results as your understanding grows.
So let’s get started.
Natural Language Processing (NLP) is a field which enables computers to understand human language (voice or text) and is a big area of interest for those looking to gain insight and new sources of value from the vast quantities of unstructured data available.
Here, we will be focussing on one application of NLP called Sentiment Analysis. In our case, we will be taking news articles (unstructured text) for a particular company, IBM, and we will attempt to grade this news to see how positive, negative or neutral it is. We will then try to see if this news has had an impact on the share price of IBM.
First, let’s load the packages that we will need to use and set our App Key.
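As a minimal sketch, the setup might look like the following. The App Key value is a placeholder, and the package list assumes the eikon and textblob packages have been installed (for example via pip):

```python
import eikon as ek             # Refinitiv Eikon Data API
import pandas as pd            # dataframes for storing and shaping our data
from textblob import TextBlob  # the simple sentiment engine we will use later

# Authenticate against the running Eikon desktop session. Generate your own
# key with the App Key Generator app inside Eikon; 'YOUR_APP_KEY' is a placeholder.
ek.set_app_key('YOUR_APP_KEY')
```

Note that the Eikon Data API talks to your locally running Eikon or Refinitiv Workspace instance, so the desktop application needs to be open for any of the calls below to succeed.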
There are two Eikon API calls for news:
get_news_headlines : returns a list of news headlines satisfying a query
get_news_story : returns the full news article
We will use the get_news_headlines API call to request a list of headlines. You can see here I have entered IBM as the company we are interested in.
The code below gets us 100 news headlines for IBM prior to 4th Dec 2017 and stores them in a dataframe, df. Dataframes are like mini spreadsheet tables in which we can store data to process during our analysis.
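A sketch of that request, assuming the App Key has already been set. The exact query syntax ('R:IBM.N' to restrict the search to the IBM RIC, 'Language:LEN' for English) and the timestamp format are assumptions based on the description above:

```python
import eikon as ek  # assumes ek.set_app_key(...) has already been called

# Up to 100 English-language IBM headlines dated before 4 Dec 2017.
df = ek.get_news_headlines('R:IBM.N AND Language:LEN',
                           date_to='2017-12-04T00:00:00',
                           count=100)
print(df.head())
```

This call requires a live Eikon session, so treat it as illustrative rather than something you can run standalone.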
So we have our dataframe with the most recent 100 news headline items. The story ID, which we will now use to pull down the actual articles themselves, is stored in the storyId column. We will iterate through the headline dataframe and pull down the news articles using the second of our news API calls, get_news_story. We simply pass the storyId to this API call.
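Continuing with the df of headlines from the previous step, the download loop might look like the sketch below; the column name storyId and the error handling are assumptions:

```python
import eikon as ek  # assumes ek.set_app_key(...) has already been called

# Pull the full text of each article; requests that fail are stored as
# empty strings so one bad story does not abort the whole loop.
stories = []
for story_id in df['storyId']:
    try:
        stories.append(ek.get_news_story(story_id))
    except Exception:
        stories.append('')
df['story'] = stories
```

As with the headline request, this needs a live Eikon session to actually run.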
Once we have the text of these articles, we can pass them to our sentiment engine, which will give us a sentiment score for each article. The sentiment engine we will be using is the simple TextBlob package. TextBlob is a higher-level abstraction package that sits on top of NLTK (the Natural Language Toolkit), a widely used package for this type of task.
NLTK is quite a complex package which gives you a lot of control over the whole analytical process. TextBlob shields us from this complexity, but we should at some stage understand what is going on under the hood. Thankfully there is plenty of information to guide us in this.
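As a sketch of the scoring step: the score_bucket helper below implements the Positive/Negative/Neutral bucketing, using the ±0.05 neutrality threshold referred to later in this article. TextBlob is imported lazily inside sentiment_scores so the bucketing logic stands on its own; how the helpers are applied to the dataframe is shown in the trailing comments:

```python
def sentiment_scores(text):
    """Return (polarity, subjectivity) for a piece of text via TextBlob."""
    from textblob import TextBlob  # lazy import: only needed when scoring
    blob = TextBlob(text)
    return blob.sentiment.polarity, blob.sentiment.subjectivity

def score_bucket(polarity, threshold=0.05):
    """Bucket a polarity value into 'positive', 'negative' or 'neutral'."""
    if polarity >= threshold:
        return 'positive'
    if polarity <= -threshold:
        return 'negative'
    return 'neutral'

# Applied to the dataframe from the previous steps, this might look like:
# df['Polarity'], df['Subjectivity'] = zip(*df['story'].map(sentiment_scores))
# df['Score'] = df['Polarity'].map(score_bucket)
```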
Looking at our dataframe, we can now see three new columns on the right: Polarity, Subjectivity and Score. Polarity is the sentiment polarity returned by TextBlob, ranging from -1 (most negative) to +1 (most positive). Subjectivity is a measure ranging from 0 (very objective) to 1 (very subjective). Score is simply a Positive, Negative or Neutral rating based on the strength of the polarity.
We would now like to see what, if any, impact this news has had on the share price of IBM. There are many ways of doing this – but to make things simple, I would like to see what the average return is at various points in time AFTER the news has broken. I want to check if there are aggregate differences in the average returns from the Positive, Neutral and Negative buckets we created earlier.
To get the average returns, we need share prices of IBM at various times after the news headlines are released, and for that, we use another Eikon Data API.
get_timeseries : returns historical pricing data at various intervals
We are interested in share prices at 2, 5, 10, and 30 minutes after each news headline, so we will retrieve the historical pricing data at one-minute intervals, storing it in our MinuteSeries dataframe.
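A sketch of that request; the field choice and the exact date range covering the week of our headlines are assumptions based on the example above:

```python
import eikon as ek  # assumes ek.set_app_key(...) has already been called

# One-minute close prices for IBM spanning the window our headlines cover.
MinuteSeries = ek.get_timeseries('IBM.N',
                                 fields='CLOSE',
                                 start_date='2017-11-27T00:00:00',
                                 end_date='2017-12-04T00:00:00',
                                 interval='minute')
```

Like the news calls, this requires a live Eikon session to execute.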
We will need to create some new columns for the next part of this analysis. Each of these columns will hold the share prices at 2, 5, 10, and 30 minute marks after the news headline was released.
We now just need the timestamp of each news headline, the base share price of IBM at that time, and the price at several intervals after that time – in our case t+2, t+5, t+10 and t+30 minutes – so we can calculate the % change for each interval.
We will loop through each news headline in the dataframe, calculate and store the derived performance numbers in the columns we created earlier: twoM…thirtyM.
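The calculation itself reduces to a simple percentage change. Below is a sketch: pct_return is the only real arithmetic, while nearest_close and the commented loop (including the column names) are hypothetical helpers, not the article's actual code, showing how the pieces fit together:

```python
def pct_return(base_price, later_price):
    """Percentage change from base_price to later_price."""
    return (later_price - base_price) / base_price * 100.0

def nearest_close(minute_series, when):
    """Close price at the first minute bar at or after `when` (hypothetical helper)."""
    return minute_series.loc[minute_series.index >= when, 'CLOSE'].iloc[0]

# Sketch of the loop over the headline dataframe:
# for ts in df.index:  # headline timestamps
#     base = nearest_close(MinuteSeries, ts)
#     for minutes, col in [(2, 'twoM'), (5, 'fiveM'), (10, 'tenM'), (30, 'thirtyM')]:
#         later = nearest_close(MinuteSeries, ts + pd.Timedelta(minutes=minutes))
#         df.at[ts, col] = pct_return(base, later)
```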
Fantastic. We have now completed the analytical part of our study. Finally, we just need to aggregate our results by Score buckets in order to draw some conclusions.
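As a sketch of the aggregation step, here is the groupby shape applied to a toy results frame; the numbers are made up purely for illustration:

```python
import pandas as pd

# Toy stand-in for the real results frame (values are illustrative only).
results = pd.DataFrame({
    'Score': ['positive', 'positive', 'neutral', 'negative'],
    'twoM':  [0.10, 0.30, 0.00, -0.20],
    'fiveM': [0.20, 0.40, 0.05, -0.10],
})

# Mean return per sentiment bucket at each horizon -- the grouping we use
# to compare the Positive, Neutral and Negative buckets.
summary = results.groupby('Score')[['twoM', 'fiveM']].mean()
print(summary)
```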
From our initial results, there might be some small directional differences in returns between the positive and neutral groups over shorter time frames (twoM and fiveM) after news broke. This is a pretty good basis for further investigation. So where could we go from here?
We have a relatively small number of headlines, so we might want to increase the size of the study.
We might also want to try to separate out more strongly positive or negative news – i.e. change the threshold from ±0.05 – to try to identify articles with more pronounced sentiment, which might have more of an impact on performance.
Alternatively, we could apply the same process to all news for an index future – say the S&P 500 E-mini – as it trades round the clock on Globex. However, could a single news article influence a whole index? Are index futures more sensitive to some types of articles than others? Is there a temporal element to this? What about cryptocurrencies, which trade 24/7? These are all excellent questions to explore.
We could also investigate what is going on with our sentiment engine. We might be able to generate more meaningful results by tinkering with the underlying processes and parameters, and using a different, more domain-specific engine might help us generate more relevant scores.
You will see there is plenty of scope to get much more involved here.
This article was intended as an introduction to this most interesting of areas. I hope to have demystified it somewhat, and shown how you can get started with this type of complex analysis using only a few lines of code, a simple yet powerful API, and some fantastic open-source packages, and still generate meaningful results for your individual use case.
To discover more about our market-tested, industrial-strength textual sentiment products which deliver low latency sentiment analytics on the cloud, and across your entire enterprise, check out Refinitiv Machine Readable News.
For more learning resources and information on how to get started with our Eikon Data API, register for a free account on the Refinitiv Developer Community. You can also find the actual Python notebook with this analysis here.