Building a Structured Financial Newsfeed using Python, SpaCy and Streamlit

Getting started with NLP by building a Named Entity Recognition (NER) application

Harshit Tyagi


One of the most interesting and widely used applications of NLP is Named Entity Recognition (NER).

Getting insights from raw, unstructured data is of vital importance. Taking a document and pulling the important pieces of information out of it is known as information extraction.

Information extraction has long been a major challenge in NLP. NER, usually paired with Named Entity Linking (NEL), is used for this purpose in several domains (finance, pharmaceuticals, e-commerce, etc.).
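To make NER concrete, here is a minimal sketch in spaCy. It assumes spaCy v3 is installed and, to stay self-contained without downloading a trained model, it uses a blank English pipeline with a rule-based `entity_ruler`. The company patterns and the sample headline are made up for illustration; a trained pipeline such as `en_core_web_sm` would instead predict `ORG` entities statistically.

```python
from __future__ import annotations

import spacy

# Blank English pipeline: tokenizer only, no trained components or model download.
nlp = spacy.blank("en")

# Rule-based entity recognizer. These patterns are illustrative assumptions;
# a trained pipeline (e.g. spacy.load("en_core_web_sm")) would predict ORG
# entities without hand-written patterns.
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "ORG", "pattern": "Infosys"},
    {"label": "ORG", "pattern": "Reliance Industries"},
])


def extract_entities(text: str) -> list[tuple[str, str]]:
    """Return (entity text, entity label) pairs found in `text`."""
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]


if __name__ == "__main__":
    headline = "Infosys shares rose while Reliance Industries fell."
    for text, label in extract_entities(headline):
        print(text, label)
```

Swapping the blank pipeline for a trained one changes how entities are found, but the `doc.ents` loop that reads them stays the same.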

In this tutorial, I’ll show you how to leverage NER and NEL to build a custom stock market news feed that lists the stocks generating buzz on the internet.
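The article doesn’t define "buzzing" at this point; as one possible interpretation, the sketch below ranks companies by the number of articles that mention them. It assumes entity extraction has already produced a list of company names per article. The helper `rank_buzzing` and the sample data are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def rank_buzzing(entities_per_article: list[list[str]], top_n: int = 5) -> list[tuple[str, int]]:
    """Rank companies by how many articles mention them (hypothetical helper).

    Each company counts at most once per article, so one article repeating a
    name does not inflate its score.
    """
    counts = Counter()
    for entities in entities_per_article:
        # dict.fromkeys de-duplicates while keeping first-seen order, which keeps
        # tie-breaking in most_common() deterministic.
        counts.update(list(dict.fromkeys(entities)))
    return counts.most_common(top_n)


if __name__ == "__main__":
    articles = [
        ["Infosys", "TCS"],
        ["Infosys"],
        ["Infosys", "Infosys", "Wipro"],
    ]
    print(rank_buzzing(articles, top_n=3))
    # → [('Infosys', 3), ('TCS', 1), ('Wipro', 1)]
```

Counting articles rather than raw mentions is a design choice: it rewards breadth of coverage over a single article that repeats a name many times.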

Prerequisites

There are no strict prerequisites. Some familiarity with Python and with basic NLP tasks such as tokenization, POS tagging, and dependency parsing will help.

I’ll cover the important bits in more detail, so even if you’re a complete beginner, you’ll be able to wrap your head around what’s going on.

So let’s get on with it. Follow along, and by the end you’ll have a minimal stock news feed that you can use as a starting point for your own research.

Tools/setup you’ll need:

  1. Google Colab for initial testing and for exploring the data and the SpaCy library.
  2. VS Code (or any editor) to code the Streamlit application.
  3. A source of stock market information (news) on which we’ll perform NER and, later, NEL.
  4. A Python virtual environment (I am using conda) with libraries such as Pandas, SpaCy, Streamlit, and spacy-streamlit (if you want to show some SpaCy renders).
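Before writing the app, it can help to confirm that these libraries are importable from the active environment. This small sketch uses only the standard library. The module list is an assumption based on the libraries named above; note that import names can differ from PyPI names, and the SpaCy renders helper is assumed to be the `spacy-streamlit` package, which imports as `spacy_streamlit`.

```python
from __future__ import annotations

import importlib.util
from collections.abc import Iterable

# Import names (not PyPI names) of the libraries listed above. "spacy_streamlit"
# assumes the renders come from the spacy-streamlit package.
REQUIRED_MODULES = ("pandas", "spacy", "streamlit", "spacy_streamlit")


def check_environment(modules: Iterable[str] = REQUIRED_MODULES) -> dict[str, bool]:
    """Map each module name to whether it can be found, without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, available in check_environment().items():
        print(f"{name:<16} {'OK' if available else 'missing'}")
```

Run it from inside the conda environment you plan to use for Streamlit, since each environment has its own set of installed packages.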

Objective

The goal of this project is to learn and apply Named Entity Recognition to extract important entities (publicly traded companies, in our example) and then link each entity to some information using a knowledge base (the Nifty500 companies list).
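The linking step can be sketched without any NLP library: normalize each extracted company name and look it up in a table built from the knowledge base. The CSV text below is a hypothetical stand-in for the Nifty500 companies list; its column names (`Company Name`, `Symbol`) and its three rows are assumptions for illustration, and exact matching after normalization is just one simple linking strategy.

```python
from __future__ import annotations

import csv
import io
from typing import Optional

# Hypothetical stand-in for the Nifty500 companies list; the real file's
# columns and contents may differ.
SAMPLE_KB_CSV = """Company Name,Symbol
Infosys Ltd.,INFY
Reliance Industries Ltd.,RELIANCE
Tata Consultancy Services Ltd.,TCS
"""

# Trailing corporate suffixes ignored when comparing names (an assumption).
SUFFIXES = ("ltd", "limited")


def normalize(name: str) -> str:
    """Lowercase, turn dots/commas into spaces, and drop trailing suffixes."""
    tokens = name.lower().replace(".", " ").replace(",", " ").split()
    while tokens and tokens[-1] in SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def load_kb(csv_text: str) -> dict[str, str]:
    """Build a normalized-name -> ticker symbol lookup table."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {normalize(row["Company Name"]): row["Symbol"] for row in reader}


def link_entities(entities: list[str], kb: dict[str, str]) -> dict[str, Optional[str]]:
    """Link each extracted entity to a symbol, or None if it is not in the KB."""
    return {ent: kb.get(normalize(ent)) for ent in entities}


if __name__ == "__main__":
    kb = load_kb(SAMPLE_KB_CSV)
    print(link_entities(["Infosys", "Reliance Industries Limited", "Acme Corp"], kb))
```

Exact matching misses abbreviations such as "TCS" for Tata Consultancy Services; an alias table or fuzzy matching would be needed to catch those.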
