
Text analysis for business purposes

Contract Analytics

Client

Qatar Financial Centre Regulatory Authority (QFCRA)

Industry

Financial supervision

About the Company

The QFC Regulatory Authority is the independent regulator of the Qatar Financial Centre (QFC). Its role is to authorize and regulate firms and individuals conducting financial services in or from the QFC. The Regulatory Authority works closely with a number of public entities and professional organizations in joint efforts to strengthen Qatar’s regulatory framework and build a legacy of regulation for the State.

Challenge

The client aimed to build a personalized news feed for their customers that would meet specific requirements: people should get only relevant, real-time information from trusted news sources, so that they can draw well-informed conclusions and invest money wisely with minimal risk.

Before starting to work with us on a custom solution, the client also considered the following options:

  • Modeling the news feed in the traditional way, i.e. giving customers data from trusted sources such as the Financial Times, newsletters, morning news, etc.
    Problem:
    – With this method, people are overwhelmed by irrelevant news and miss information they should have seen.
  • Subscribing to a media monitoring service. Given a list of keywords and topics of interest, the service provides a custom news feed.
    Problems:
    – High cost of the service
    – No control over the algorithm
    – The system is static and becomes less relevant over time
    – Disclosure of sensitive information to a third-party company

Opportunity

Financial supervisors already used KNIME to crawl and extract data from a number of reports. The initial idea was to run two- or three-word phrases against the API to get precise results. However, about 50% of the returned articles were irrelevant because the context around the key phrase differed, while the main goal was to provide data that was as accurate as possible.

Financial supervision is not about day trading, so articles about stock prices are too short-lived to be relevant for supervisors and must be removed from the feed.

With KNIME, financial supervisors can use crowdsourcing. For example, if all supervisors collectively label enough news articles, the result is a good corporate view of what financial supervisors care about. And once enough labels have been collected, financial supervisors can start tailoring the service to different teams or individuals depending on their specific interests.

What the QFCRA in-house team did:

  • An app built and running on the KNIME server was pre-populated with 50,000 news articles using 800 phrases. Supervisors can use it to search by keyword, sentence, or paragraph to find the news they are interested in.
  • Another option is the labeling view in KNIME: supervisors go through a batch of 60 articles and mark whether each one is something they would like to see in their daily headlines. Articles are served in units of 60, a number considered small enough not to be a burden while still giving enough articles to review in every round.
  • Gamification stimulated the process of labeling the news: supervisors receive an email with the information on where they stand in the labeling effort for the entire organization. For example, the report says the supervisor is currently the third labeler, and there are 200 articles to label to reach the next person.
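The standing reported in the gamification email can be illustrated with a few lines of Python. This is only a sketch under stated assumptions: the function name `labeling_standing`, the shape of the label log (one supervisor name per labeled article), and the sample names and counts are all hypothetical and not taken from the client's KNIME workflow.

```python
from collections import Counter

def labeling_standing(label_log, supervisor):
    """Return (rank, labels_needed_to_reach_next) for one supervisor.

    label_log: iterable of supervisor names, one entry per labeled article
    (a hypothetical log format chosen for this sketch).
    """
    counts = Counter(label_log)
    own = counts.get(supervisor, 0)
    # Label counts strictly above this supervisor's, i.e. the people still ahead.
    ahead = sorted(c for c in counts.values() if c > own)
    rank = len(ahead) + 1
    # Gap to the closest person ahead; None when already in first place.
    gap = ahead[0] - own if ahead else None
    return rank, gap

# Made-up log: one name per labeled article.
log = (["supervisor_a"] * 900 + ["supervisor_b"] * 700
       + ["supervisor_c"] * 500 + ["supervisor_d"] * 120)
rank, gap = labeling_standing(log, "supervisor_c")
print(f"Rank {rank}, {gap} articles to reach the next person")
```

With these made-up counts, supervisor_c is ranked third and needs 200 more labels to draw level with the next person, which mirrors the example in the email.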

How Did Our Solution Work

Step 1

Use classic NLP methods

The input data is a collection of texts with labels, provided by the customer, that indicate how relevant each article is in the customer's opinion. To process these texts we used the spaCy extension for KNIME. spaCy is a well-known NLP framework for Python that can now be used in KNIME with no code at all.

spaCy includes language models for multiple languages (23) and standard utilities such as tokenization, lemmatization, morphological analysis, stop word filtering, NER, POS tagging, and text vectorization. Another benefit is that the spaCy extension for KNIME is fully compatible with the KNIME Text Processing nodes, which were also used in the project.
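To make two of these cleaning steps concrete, here is a minimal standard-library Python sketch of tokenization and stop word filtering. It is a simplified stand-in, not the spaCy or KNIME implementation used in the project: the `preprocess` function and the tiny `STOP_WORDS` set are hypothetical, and lemmatization, NER, and POS tagging are deliberately left out because they require a trained language model.

```python
import re

# Tiny illustrative stop-word list; real language models ship far larger ones.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "on", "the", "to"}

def preprocess(text):
    """Lowercase the text, split it into runs of letters, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

tokens = preprocess("The regulator is reviewing the capital rules of banks.")
print(tokens)
```

The resulting token lists are the kind of cleaned input that the later analysis steps operate on.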

Step 2

Text Analysis

Once the texts are cleaned, they are ready to be investigated with algorithms such as topic modeling, term co-occurrence, TF-IDF analysis, and text classification. The first three algorithms provide descriptive information about the collection of texts. They are useful for seeing the frequency and importance of terms, building a simple graph based on term co-occurrence, and building a tag cloud.
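As an illustration of the descriptive analyses, the following Python sketch computes TF-IDF weights and document-level term co-occurrence counts for already tokenized texts. It assumes one common TF-IDF variant (term frequency times log(N / document frequency)); the project itself used KNIME nodes, whose exact weighting may differ, and the function names and sample documents are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized, non-empty document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights

def cooccurrence(docs):
    """Count how often two distinct terms appear in the same document."""
    pairs = Counter()
    for doc in docs:
        # Sorting makes each pair order-independent, e.g. ("bank", "fine").
        pairs.update(combinations(sorted(set(doc)), 2))
    return pairs

# Made-up, already cleaned documents.
docs = [
    ["bank", "capital", "rules"],
    ["bank", "fine", "regulator"],
    ["regulator", "capital", "rules"],
]
weights = tf_idf(docs)
pairs = cooccurrence(docs)
print(max(weights[1], key=weights[1].get))  # most distinctive term of the second document
print(pairs.most_common(1))                 # most frequent term pair
```

The pair counts can serve as edge weights of a co-occurrence graph, and the TF-IDF weights as term sizes in a tag cloud.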

Step 3

Text classification

Text classification addresses the problem of determining the relevance of the texts. For this purpose two solutions were developed: one based on spaCy and one based on BERT. Since spaCy models can also vectorize texts, this feature was used to convert the texts to vectors, which were then used as input to an XGBoost tree algorithm. In that case training took only a minute and the accuracy was 75% (with F1 scores of 78% and 71% for the two classes).

The same task was then solved with BERT; training took about 30 minutes and required a GPU. As expected, this approach showed better classification performance, with 80% accuracy (F1 scores of 80% and 77%).

This way the customer can choose between the two approaches, weighing the advantages and drawbacks of each.
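For readers comparing the two models, the reported figures (accuracy plus one F1 score per class) can be computed as in the sketch below. It is a plain-Python illustration of the standard definitions, not the evaluation code used in the project; the class names "relevant" and "irrelevant" and the sample predictions are assumptions made for the example.

```python
def accuracy_and_f1(y_true, y_pred):
    """Return overall accuracy and a {class: F1} dict."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    f1 = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == cls and p == cls for t, p in pairs)  # true positives
        fp = sum(t != cls and p == cls for t, p in pairs)  # false positives
        fn = sum(t == cls and p != cls for t, p in pairs)  # false negatives
        # F1 = 2*TP / (2*TP + FP + FN); set to 0 when the denominator is 0.
        denom = 2 * tp + fp + fn
        f1[cls] = 2 * tp / denom if denom else 0.0
    return accuracy, f1

# Made-up labels for eight articles.
y_true = ["relevant", "relevant", "relevant", "irrelevant",
          "irrelevant", "relevant", "irrelevant", "irrelevant"]
y_pred = ["relevant", "relevant", "irrelevant", "irrelevant",
          "relevant", "relevant", "irrelevant", "irrelevant"]
accuracy, f1 = accuracy_and_f1(y_true, y_pred)
print(accuracy, f1)
```

Reporting F1 per class alongside accuracy shows whether a model favors one class, which matters when relevant and irrelevant articles are not equally frequent.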

Workflow

Recommender systems for financial services
Why Choose Us

Result

The Redfield team performed relevance estimation for incoming news and provided text analysis and dashboard visualization.
This allowed financial supervisors to generate a relevant news feed for the users.

Using different NLP approaches, we created a meaningful business dashboard that presents the main insights from a large collection of texts. We also delivered multiple solutions for classifying the relevance of texts for the users. These solutions are flexible in terms of inference and deployment.

Tools

Technologies Used

KNIME Analytics Platform
KNIME Server

 

