Meet FILBERT. Google’s BERT trained on Finance, Insurance & Legal


Shoppers may not be panic buying frozen food, canned soup and peanut butter like they did at the height of the COVID-19 pandemic back in March, but keeping cupboards consistently full as global coronavirus fears fluctuate continues to power sales of some of the biggest packaged food makers.


The statement above carries a strong, implied positive message about some food makers, juxtaposed with negative ones about global fears. Extracting this positive message is not an easy task for computers, because it requires understanding what is positive and what is negative in the context and language of financial behavior, without being confused by the rest of the sentence.
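To make the difficulty concrete, here is a minimal sketch of a naive word-counting approach applied to the statement above. The tiny lexicon and the `lexicon_score` helper are hypothetical and invented for illustration only; they are not part of FILBERT or any DeepSee product.

```python
import re

# Hypothetical, deliberately tiny general-purpose sentiment lexicon (an assumption).
NEGATIVE = {"panic", "pandemic", "fears", "loss", "decline"}
POSITIVE = {"gain", "growth", "profit", "strong"}

def lexicon_score(text):
    """Return (#positive words - #negative words); the sign is the predicted sentiment."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

statement = (
    "Shoppers may not be panic buying frozen food, canned soup and peanut butter "
    "like they did at the height of the COVID-19 pandemic back in March, but keeping "
    "cupboards consistently full as global coronavirus fears fluctuate continues to "
    "power sales of some of the biggest packaged food makers."
)

score = lexicon_score(statement)
print(score)  # negative: "panic", "pandemic" and "fears" outweigh everything else
```

The counter sees only the alarming words and scores the statement as negative, even though its message about packaged food makers is positive.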

Solving context-specific sentiment classification, along with a litany of additional NLP use cases, is the purpose of the Knowledge Process Automation (KPA) Platform we produce at DeepSee. This paper describes a recent innovation from DeepSee we call FILBERT: Financial, Insurance and Legal Bidirectional Encoder Representations from Transformers. FILBERT is based on Google’s BERT (Bidirectional Encoder Representations from Transformers) and leverages two different versions of a related research effort called FinBERT (BERT trained on a financial corpus), which improved the then state-of-the-art performance by 15 percentage points. With FILBERT our aim is to demonstrate significant improvement over all other approaches when processing the complex financial contracts and artifacts typically employed in banking, insurance, and risk-hedging operations. As a key component in the automation of complex knowledge-based workflows in large organizations, FILBERT will allow for faster implementations of Knowledge Process Automations without arduous data preparation or sacrifice of precision. We intend to share our production settings, our code, and some of the data we used to train FILBERT, along with guidelines for training and using your own financial sentiment classifier on your own financial documents.

Why we are doing this

DeepSee.AI is committed to KPA.

On a daily basis, we sift through an overwhelming amount of information, largely in the form of unstructured text data, with documents including the contracts and artifacts of keen importance to our clients. Financial sentiment analysis is but one of the essential components demanding the attention of our clients’ analysts over continuous flows of data.

We quickly noticed that traditional approaches such as bag-of-words and named-entity recognition simply would not cut it, as those kinds of solutions often disregard essential context information in the text. Our tasks require more advanced techniques to crack the specific intersection of financial, insurance, and legal jargon. Understanding the context, and what is positive and what is negative from that specific point of view, is but one of many use cases for the FILBERT family of models.
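A small sketch of why bag-of-words loses context: the representation keeps word counts but discards word order, so two sentences with opposite financial meaning can become indistinguishable. The `bag_of_words` helper and the example sentences are illustrative assumptions.

```python
from collections import Counter

def bag_of_words(text):
    """Represent a sentence only by how often each word occurs (order is discarded)."""
    return Counter(text.lower().split())

good_news = "profits rose while costs fell"
bad_news = "costs rose while profits fell"

# Both sentences contain exactly the same words, so their bags are identical.
same_representation = bag_of_words(good_news) == bag_of_words(bad_news)
print(same_representation)  # True
```

Any classifier built on top of this representation must give both sentences the same label, which is the kind of context loss that motivates a Transformer-based model.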

The current state-of-the-art approach to natural language understanding is to take pre-trained language models and fine-tune them for specific (downstream) tasks such as question answering or sentiment analysis. We followed that recipe and are developing FILBERT, leveraging both versions of FinBERT, as a BERT-based language model with a deeper understanding of financial, insurance, and legal language, fine-tuned for a litany of use cases including flavors of sentiment classification.

NLP 2.0

If we were trying to tackle this problem in 2017, we would have built a custom model and trained it from scratch. We would have needed very large amounts of labeled data for a relatively meager but acceptable performance. Good-quality labeled data is difficult to acquire, especially in a niche domain like complex financial instruments, where in-depth expertise and critical human judgement are required.

But in 2018 the NLP world began ascending a rapid arc of innovation. Starting with ULMFiT, researchers discovered how to perform transfer learning efficiently for NLP problems.

Start by procuring textual data that is available in abundance, like Wikipedia. Then, train a language model with that data to simply predict the next word in a sentence. Finally, fine-tune the language model for your task with one or several task-specific layers. The advantage of this approach is that you don’t need a huge dataset for fine-tuning, because the model learns the basics of the language itself during the initial language model training. The most difficult part of the process is creating the base language model.
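The reason abundant unlabeled text is enough for the second step is that next-word prediction creates its own labels: every word in the corpus is the target for the word before it. The sketch below illustrates only that objective, using a toy bigram counter as a stand-in for a neural language model; the corpus and function names are hypothetical, and the fine-tuning step is not shown.

```python
from collections import Counter, defaultdict

def next_word_pairs(text):
    """Turn raw text into (previous word, next word) training pairs; no manual labels needed."""
    words = text.lower().split()
    return list(zip(words[:-1], words[1:]))

def train_bigram_model(corpus):
    """Count, for every word, which words follow it in the corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for prev, nxt in next_word_pairs(sentence):
            counts[prev][nxt] += 1
    return counts

def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if the word was never seen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical miniature "corpus"; a real one would be something the size of Wikipedia.
corpus = [
    "the bank raised interest rates",
    "the bank raised capital",
    "the bank raised interest rates again",
]
model = train_bigram_model(corpus)
print(predict_next(model, "raised"))  # "interest" follows "raised" twice, "capital" once
```

A real pre-trained model replaces the counting table with a deep network, but the training signal comes from the text in the same way.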

Many other language models followed the ULMFiT approach but used different training schemes, the most significant being BERT. BERT was the language model that made the whole concept of pre-training and fine-tuning very popular. BERT brought two core innovations to language modelling:

  1. It borrowed the transformer (the T of BERT) architecture from machine translation, which does a better job of modelling long-term dependencies than RNN-based approaches.
  2. It introduced the Masked Language Modelling (MLM) task, where a random 15% of all tokens are masked and the model predicts them, enabling true bi-directionality (the B of BERT).
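The masking step of the MLM task can be sketched in a few lines. This is a simplified illustration, not BERT's actual preprocessing code: it always substitutes a `[MASK]` placeholder, whereas the original BERT recipe sometimes substitutes a random token or leaves the selected token unchanged. The `mask_tokens` function and the whitespace tokenization are assumptions made for the example.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Hide roughly `mask_prob` of the tokens; return the masked sequence and the hidden targets."""
    rng = random.Random(seed)
    # Mask at least one token, but never more than the sequence contains.
    n_to_mask = min(len(tokens), max(1, round(len(tokens) * mask_prob)))
    positions = rng.sample(range(len(tokens)), n_to_mask)
    masked = list(tokens)
    labels = {}
    for pos in positions:
        labels[pos] = tokens[pos]  # what the model must predict at this position
        masked[pos] = MASK
    return masked, labels

tokens = (
    "sales of packaged food makers rose as shoppers kept cupboards full "
    "while global fears continued to fluctuate this year"
).split()

masked, labels = mask_tokens(tokens, seed=0)
print(" ".join(masked))
print(labels)
```

Because the model sees the words on both sides of each masked position, it can use left and right context at once when predicting the hidden token.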

BERT achieved state-of-the-art performance in almost all of the downstream tasks it was applied to, such as text classification and question answering. Now that the compute-heavy language model training is already done (Google used 16 TPUs for 4 days to pre-train BERT), anyone with decent computing power can train a very accurate NLP model for their niche task based on a pre-trained language model.

First came the FinBERTs

BERT was the perfect launch point for dozens of language variants. Some of the earlier innovations included language models to assist in classifying biology research work (BioBERT), medical papers (ClinicalBERT), scientific studies (SciBERT), and a litany of others, including two different versions of a BERT fine-tuned for financial work (FinBERT).

The two FinBERTs were similar in objective and relatively close in terms of accuracy but fine-tuned using two very different data sets.

Two different efforts called FinBERT:

  1. FinBERT: Financial Sentiment Analysis with Pre-trained Language Models
    Dogu Tan Araci (Netherlands, 2019)
  2. FinBERT: A Pretrained Language Model for Financial Communications
    Yi Yang et al. (Hong Kong, 2020)

FinBERT-2019 used a subset of the Reuters TRC2 corpus and the Financial PhraseBank for fine-tuning. FinBERT-2020 used a litany of 10-K and 10-Q filings, along with earnings call transcripts and analyst reports.

Both approaches demonstrated significant performance improvements when compared to basic BERT results.


Conceptually, FILBERT is a cousin of the FinBERT efforts, but trained with an understanding that the requirements of KPA go well beyond a general finance language model. The legal and risk aspects of many complex financial instruments require trained-analyst levels of understanding, which are daunting for human organizations to achieve, much less automated computer systems.

When conceiving FILBERT, we theorized that fine-tuning extant Transformer models to meet the needs of KPA workflow use cases, such as those our customers were facing, could be an important component beyond any we would otherwise host on the DeepSee Platform.

Our suppositions regarding the superiority of a FILBERT approach, when applied to the use cases we were seeing, are proving to be correct, although we have more work to do.

As with other Transformer models, FILBERT is an umbrella concept containing a bevy of models based on specific assemblies of corpora. Descriptions of the training sets we used and the results we achieved will accompany the first public release of FILBERT.


Rapid innovation in NLP has given rise to a new breed of KPA tools unlike any seen before. It is now reasonable to discuss automating aspects of knowledge-based workflows, something that just a few years ago was beyond the reach of most organizations.

The DeepSee Platform aims to be the weapon of choice in the development of KPA solutions for the enterprise. FILBERT is an innovation that will help data scientists in our clients’ firms become much more productive in imagining, creating, and rolling out KPA solutions.