The Importance of Language in Text Analytics

"What's another word for (enter word here), mom?"

"Grab the thesaurus and look it up" is the response I would hear from my mom if I asked that question as a young child. Fortunately, we have devices that can rattle off a definition or synonym from a voice command. Simple enough for us to understand the meaning and use the word correctly in most cases, but what about for a machine?

In recent years, machines have become much better at reading and comprehending text, which helps us pull information from unstructured content for analysis. As I thought about what goes into teaching machines to read, I started thinking about how much stock we put into linguistics, and about how important it is that we align these capabilities with business requirements when designing data extraction and analysis software.

As Kingland founder David Kingland once said, "You have to teach machines the nuances of the financial services vocabulary. For example, we may know what a party may be in a given regulatory filing, but an AI engine may think it's something you do to celebrate someone’s birthday, if it’s otherwise not taught any differently."

We can brag about extraction accuracy and speed, but let's take a closer look at how machines comprehend the differences among words. Machines need someone to teach them how words mesh together in a sentence before they can understand its true meaning. Machine learning and deep learning have extended what machines can comprehend, but the tricky part is deciphering the different parts of a sentence. Is this a verb or a subject? Does that date correlate with this event? One part of the English language that can help with comprehension, and that I’d like to delve into briefly, is the lexeme (and no, a lexeme isn’t a creature from a Dr. Seuss book).

According to ThoughtCo, a lexeme is the fundamental unit of the lexicon of a language. Usually, a lexeme is an individual word. For example, buy, buys, bought and buying are forms of the same lexeme. Try asking a machine to understand when it's relevant to extract the lexeme of buy for further analysis of extracted data while monitoring important customers, counterparties, or general entities. It's a difficult task and takes a human touch to pull it off.
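
To make the idea concrete, here is a minimal Python sketch of grouping word forms under one lexeme. The lookup table and the function name are made up for illustration; a production system would more likely rely on a trained lemmatizer than on a hand-written map.

```python
import re

# Hypothetical lookup table: inflected form -> lexeme.
# Hand-written for illustration only.
LEXEME_OF = {
    "buy": "buy", "buys": "buy", "bought": "buy", "buying": "buy",
    "sell": "sell", "sells": "sell", "sold": "sell", "selling": "sell",
}

def find_lexeme_mentions(text, lexeme):
    """Return every word in `text` that is a form of `lexeme`, in order."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [tok for tok in tokens if LEXEME_OF.get(tok) == lexeme]

sentence = "Acme bought the fund in 2019 and is buying two more."
print(find_lexeme_mentions(sentence, "buy"))  # ['bought', 'buying']
```

The lookup is the easy part. Deciding when a mention of buy actually matters for the entity being monitored is where the human touch comes in.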

After the information is extracted, the machine can go through and reference attributes that it expected to find in the document. But what about lexemes?

At Kingland, we've employed more than 2,000 students from Iowa State University who, through the years, have highlighted sections of documents and tagged parts of sentences to say, for example, that this is a fund manager, that this monetary amount is related to the penalty or fine, or that this reference to the word party is connected to an important entity. Natural Language Processing models are being trained to be smarter and to understand the content in context. And this is where we start to realize the possibilities of humans and machines working together.

Through this training, the machines are able to discern whether the word party is a reference to an entity. This is one of the things that keeps us up at night: connecting data solutions like this with client requirements, and doing so with a high degree of accuracy.
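
As a rough illustration of what "context" means here, the Python sketch below labels a sentence's use of party by counting nearby cue words. This is a toy stand-in, not how a trained model works: the cue lists and the function name are assumptions invented for this example, whereas a real model would learn such signals from the human-tagged documents described above.

```python
import re

# Hypothetical cue words, chosen by hand for illustration.
ENTITY_CUES = {"agreement", "counterparty", "filing", "contract",
               "liable", "obligations"}
EVENT_CUES = {"birthday", "celebrate", "cake", "invited", "balloons"}

def classify_party(sentence):
    """Label the use of 'party' as 'entity', 'event', or 'unknown'."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    if "party" not in words and "parties" not in words:
        return "unknown"
    entity_score = len(words & ENTITY_CUES)
    event_score = len(words & EVENT_CUES)
    if entity_score > event_score:
        return "entity"
    if event_score > entity_score:
        return "event"
    return "unknown"  # no cues, or a tie: leave it for a human to tag

print(classify_party("Each party to the agreement is liable for its obligations."))
print(classify_party("We threw a birthday party with cake."))
```

Sentences with no cues or with a tie come back as unknown, which is the kind of case a human tagger would resolve.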

For you, the use of machine learning technology in this case means you're able to quickly identify entities and people throughout most sources and language patterns. You get context, not just more data, when the machine accurately identifies pronouns, family names, entity names and more. This brings about the potential to accurately identify entities and people from regulatory filings or news articles, matching information to the correct Robin, Robyn or Robun, or connecting subsidiaries of companies even if they don't use the parent name.
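
The Robin, Robyn or Robun problem can be sketched with Python's standard-library `difflib`. The list of known names, the function name, and the 0.75 cutoff are assumptions picked for this example. Fuzzy string matching of this kind only proposes candidates; choosing the correct person still takes the surrounding context.

```python
from difflib import get_close_matches

# Hypothetical list of names already on file.
KNOWN_NAMES = ["Robin", "Robyn", "Robert"]

def candidate_matches(name, known=KNOWN_NAMES, cutoff=0.75):
    """Return known names whose spelling is close to `name`, sorted A-Z."""
    by_lower = {k.lower(): k for k in known}
    hits = get_close_matches(name.lower(), list(by_lower),
                             n=len(by_lower), cutoff=cutoff)
    return sorted(by_lower[h] for h in hits)

print(candidate_matches("Robun"))  # ['Robin', 'Robyn']
```

Both Robin and Robyn come back as equally plausible, which is exactly why spelling alone is not enough to make the match.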

By asking the right questions and teaching machines detailed information about our language and the specific definitions of words we use in our industries, we can create more powerful AI technology. Let's just hope the next time we ask Alexa to suggest a definition or synonym for us, she doesn't say, "grab a thesaurus."