Limitations of Lexical Search and Why We Need Semantic Search

In one of our previous blog posts we elaborated on the difference between lexical search and semantic search and outlined some of the features of lexical search. In this article we take a closer look at lexical search: how it can be enriched when operating on structured data, and where its limitations lie. At the end of the article we will see some of the power of semantic search, which we will cover in more detail in a future blog post.

In general, when you run a lexical search with one of the standard search engines, what happens behind the scenes is some form of text matching. Today’s search engines include many features that improve the quality of the results, such as stemming, synonyms and fuzzy search, but in the end it still boils down to text matching and match-rating strategies.
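To make "text matching" a bit more tangible, here is a minimal Python sketch of what such a matcher might do. It is only an illustration under our own assumptions: the synonym table, the crude suffix-stripping "stemmer" and the function names are hypothetical, and real search engines use far more sophisticated analyzers and scoring.

```python
import difflib

# Hypothetical synonym table; a real engine would load this from configuration.
SYNONYMS = {"automobile": "car"}

def normalize(token):
    """Lowercase, strip punctuation, map synonyms, and strip a plural 's'."""
    token = token.lower().strip(".,;:!?\"'()")
    token = SYNONYMS.get(token, token)
    # Deliberately naive "stemming": drop a trailing 's' from longer words.
    if token.endswith("s") and len(token) > 3:
        token = token[:-1]
    return token

def tokenize(text):
    """Split on whitespace and normalize every token, dropping empty ones."""
    return [t for t in (normalize(raw) for raw in text.split()) if t]

def matches(query, text, cutoff=0.8):
    """True if every query token has an exact or fuzzy match in the text."""
    doc_tokens = tokenize(text)
    return all(
        difflib.get_close_matches(q, doc_tokens, n=1, cutoff=cutoff)
        for q in tokenize(query)
    )
```

With this sketch, a query for "jaguars" or the misspelled "jagaur" would both match a text mentioning "jaguar". Note that the matcher still has no idea whether that text is about an animal or a car.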

But what if you want to search a knowledge base that contains structured content or, even better, knowledge? Can that structure be leveraged to improve the quality of the search results?

The answer to that question is “Yes”. Search engines such as Solr or Elasticsearch support custom index fields with facets built on top of them. This allows indexed documents to be annotated with additional pieces of information and gives the user the option to apply additional filters using those facets.

Let’s use a multi-domain knowledge base as an example to demonstrate the value of custom index fields and facets.

A user wants to learn more about the animal “Jaguar” and runs a search for that term.

The system will return matches related to the animal “Jaguar” as well as to the car manufacturer “Jaguar”. You can imagine how hard it gets to sort out the “unwanted” results when thousands of results are returned.

With a custom index field storing the asset type (e.g. “animal” vs. “car manufacturer”), the user can further filter the search results by asset type and thereby improve the quality of the results.
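The idea can be sketched in a few lines of Python. This is not how Solr or Elasticsearch implement facets; it is a simplified in-memory illustration, and the documents, the field name asset_type and the helper functions are all made up for this example.

```python
from collections import Counter

# Hypothetical indexed documents; "asset_type" plays the role of the
# custom index field described above.
DOCUMENTS = [
    {"id": 1, "text": "Jaguar: habitat and diet", "asset_type": "animal"},
    {"id": 2, "text": "Jaguar: company profile", "asset_type": "car manufacturer"},
    {"id": 3, "text": "How to spot a jaguar", "asset_type": "animal"},
    {"id": 4, "text": "Leopard: habitat and diet", "asset_type": "animal"},
]

def lexical_search(term, documents):
    """Return the documents whose text contains the term (case-insensitive)."""
    needle = term.lower()
    return [doc for doc in documents if needle in doc["text"].lower()]

def facet_counts(results, field):
    """Count how many results fall into each value of the given field."""
    return Counter(doc[field] for doc in results)

def filter_by_facet(results, field, value):
    """Keep only the results whose field has the selected value."""
    return [doc for doc in results if doc[field] == value]

hits = lexical_search("Jaguar", DOCUMENTS)
facets = facet_counts(hits, "asset_type")
animal_hits = filter_by_facet(hits, "asset_type", "animal")
```

The plain search returns documents 1, 2 and 3. The facet counts tell the user that two of them are animals and one is a car manufacturer, and selecting the "animal" facet narrows the list to documents 1 and 3.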

A downside of custom index fields is that they need to be defined and filled with information, so additional application logic is required to populate them. In the case of a knowledge base with a static scope of asset types this may not be an issue, as it can be handled as part of the implementation and configuration. However, in knowledge bases where the scope of assets continuously changes over time, this approach will not scale.

Semantic search remedies those limitations. It respects the underlying structures in the knowledge base and retrieves information from it using logical expressions, e.g. by using the SPARQL query language to query a triple store.

In the previous example, the user can get the desired results by formulating a query like this (in pseudo-syntax):

?x has-type Animal AND
?x has-label Jaguar
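To illustrate how such a query might be evaluated, here is a small Python sketch that matches triple patterns against an in-memory list of triples. It is a toy model rather than a real triple store or SPARQL engine; the identifiers such as asset:1 and the functions match_pattern and query are assumptions made for this example.

```python
# Hypothetical knowledge base as (subject, predicate, object) triples.
TRIPLES = [
    ("asset:1", "has-type", "Animal"),
    ("asset:1", "has-label", "Jaguar"),
    ("asset:2", "has-type", "Car-Manufacturer"),
    ("asset:2", "has-label", "Jaguar"),
    ("asset:3", "has-type", "Animal"),
    ("asset:3", "has-label", "Leopard"),
]

def is_variable(term):
    """Variables are written with a leading '?', as in the pseudo-syntax."""
    return term.startswith("?")

def match_pattern(pattern, triple, bindings):
    """Extend the bindings so that pattern equals triple, or return None."""
    new = dict(bindings)
    for p, t in zip(pattern, triple):
        if is_variable(p):
            # Bind the variable, or reject if it is already bound differently.
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new

def query(patterns, triples):
    """Return all variable bindings that satisfy every pattern (logical AND)."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for bindings in solutions
            for triple in triples
            if (extended := match_pattern(pattern, triple, bindings)) is not None
        ]
    return solutions

animals_named_jaguar = query(
    [("?x", "has-type", "Animal"), ("?x", "has-label", "Jaguar")],
    TRIPLES,
)
```

In this toy data set only asset:1 is both an animal and labelled "Jaguar", so the car manufacturer with the same label never shows up in the result.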

This is just a very simple example of a semantic search. In particular, semantic searches allow traversal of the graph of interconnected knowledge assets. This makes it very easy to formulate a query that, for example, finds all car models whose manufacturer is headquartered in Germany:

?x has-type Car-Model AND
?x has-manufacturer ?m AND
?m has-headquarter ?h AND
?h located-in Germany
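The traversal behind this query can be spelled out hop by hop. The following Python sketch follows the edges explicitly, from car model to manufacturer to headquarter to country. The triples, the identifiers such as model:A and the helper names are invented for illustration; a real triple store would plan and execute such joins itself.

```python
from collections import defaultdict

# Hypothetical knowledge graph as (subject, predicate, object) triples.
TRIPLES = [
    ("model:A", "has-type", "Car-Model"),
    ("model:A", "has-manufacturer", "maker:1"),
    ("model:B", "has-type", "Car-Model"),
    ("model:B", "has-manufacturer", "maker:2"),
    ("maker:1", "has-headquarter", "city:1"),
    ("maker:2", "has-headquarter", "city:2"),
    ("city:1", "located-in", "Germany"),
    ("city:2", "located-in", "Japan"),
]

# Index the graph: (subject, predicate) -> set of objects.
edges = defaultdict(set)
for s, p, o in TRIPLES:
    edges[(s, p)].add(o)

def objects(subject, predicate):
    """Follow one edge type from a subject; empty set if there is none."""
    return edges.get((subject, predicate), set())

def car_models_with_hq_in(country):
    """Find car models whose manufacturer's headquarter lies in the country."""
    result = []
    for x in sorted({s for s, _, _ in TRIPLES}):
        if "Car-Model" not in objects(x, "has-type"):
            continue
        # Traverse ?x -> ?m -> ?h and check where ?h is located.
        if any(
            country in objects(h, "located-in")
            for m in objects(x, "has-manufacturer")
            for h in objects(m, "has-headquarter")
        ):
            result.append(x)
    return result

german_models = car_models_with_hq_in("Germany")
```

Here only model:A qualifies, because its manufacturer's headquarter is the only one located in Germany. Expressing the same question with lexical search and facets alone would require flattening this chain into index fields ahead of time.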

This demonstrates that, when operating on a structured knowledge base, semantic searches allow for much more powerful queries than a classical lexical search can support. But this power also comes with a “cost”: formulating semantic searches requires knowledge of the query syntax as well as of the available types of knowledge assets and how they are interconnected. UI tools can lower that cost by giving users some guidance on how to formulate their searches (e.g. by using content assist based on lexical searches), but novice users will likely not be able to formulate a more complex query without assistance.

In this article we outlined the limitations of lexical search and demonstrated some of the power of semantic search. In one of our future articles we will go into more detail about semantic search.

We love to hear from you! For any questions about knowledge management, please reach out to