Traditionally, search engines use lexical search, in which literal matches of words, phrases, or their variants are used to find results. Lexical search therefore gives you easy-to-understand control over your query and the matches you can expect.

The drawback of this approach is that the meaning behind the query can be lost, because only the characters of the query text are matched. For example, you might miss specific synonyms or subtypes of the query term that you did not specify explicitly. A further disadvantage is that lexical search cannot resolve ambiguous terms.

Advanced lexical search can return documents in which a term is mentioned in one of two ways: as the exact text string, or as the text string plus its variants. Variants are typically found by stemming. For example, “polymers” is stemmed to “polym”, and its variant “polymer” is also stemmed to “polym”, so a query for one also matches the other.
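To make the difference between exact matching and matching “with variants” concrete, here is a minimal sketch. It assumes a tiny hypothetical suffix list (`SUFFIXES`) and a toy stemmer (`toy_stem`) as a stand-in for a real stemming algorithm such as Porter’s; the function names and sample documents are illustrative only and do not reflect how Dimensions implements lexical search.

```python
import re

# Hypothetical, deliberately tiny suffix list; a stand-in for a real stemmer.
SUFFIXES = ("ers", "er", "s")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def tokenize(text: str) -> list[str]:
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def lexical_search(query: str, docs: list[str], variants: bool = True) -> list[int]:
    """Return indices of documents whose tokens match the query.

    variants=False: exact (case-insensitive) token match.
    variants=True:  compare stems, so 'polymers' also matches 'polymer'.
    """
    normalize = toy_stem if variants else str.lower
    q = normalize(query)
    return [i for i, doc in enumerate(docs) if q in {normalize(t) for t in tokenize(doc)}]


docs = ["A new polymer blend", "Polymers for coatings", "Nylon fibres", "Polymerization kinetics"]
print(toy_stem("polymers"), toy_stem("polymer"))         # polym polym
print(lexical_search("polymers", docs, variants=False))  # [1]
print(lexical_search("polymers", docs))                  # [0, 1]
```

Note that the document about nylon fibres is never found. The text contains no string related to “polymer”, even though nylon is a polymer. This is exactly the gap that semantic search aims to close.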

As an example, if you were to query “polymers” in IFI patents as a lexical search with variants, you would receive 12,407,911 documents. How do you then begin to understand what is really relevant? And if your search covered all parts of a document, including non-relevant sections such as the reference section of scientific articles, the number of potentially non-relevant hits would increase further.

This is where Semantic Search comes into its own.

What is semantic search?

Semantic search tries to understand the meaning of the query words or phrases, which makes search results more accurate and relevant.

In Dimensions Life Sciences and Chemistry (L&C), we use OntoChem’s ontologies and NLP rules, stored together in dictionary cartridges, to enable semantic search. Together they provide the domain knowledge and contextual rules needed to supply the semantic background and ensure accurate annotation.

Semantic search is more powerful than classical lexical search. Because of its extended domain knowledge, it usually returns more results, and those results are also more relevant.

Two particular advantages of semantic search are that it resolves ambiguous terminology and that it finds all specific subtypes (“children”) of a technical term without the need to mention them explicitly in the query.

In Dimensions L&C, you can search ‘Ontologically with synonyms’ to find the search term, all synonyms of this concept, and all synonyms of its ontological subclasses. Alternatively, you can search ‘Concept only, with synonyms’ to find the search term and all synonyms of this concept, but no ontological subclasses.
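The difference between the two modes can be sketched as a walk over an ontology tree. The code below is a minimal illustration, assuming a hypothetical `Concept` node type, an `expand` function, and a toy ontology fragment built from examples in this article (polymer, poly(methyl methacrylate), nylon). It is not the OntoChem data model or the Dimensions query engine. The recursion in `expand` is what brings in children of children, which is also the source of the recall gains described later.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One node of a toy ontology (hypothetical structure, for illustration)."""
    name: str
    synonyms: set[str] = field(default_factory=set)
    children: list["Concept"] = field(default_factory=list)


def expand(concept: Concept, ontological: bool) -> set[str]:
    """Collect the lowercase terms a query for `concept` should match.

    ontological=True mimics 'Ontologically with synonyms' (the concept, its
    synonyms, and every descendant with its synonyms); ontological=False
    mimics 'Concept only, with synonyms'.
    """
    terms = {concept.name.lower()} | {s.lower() for s in concept.synonyms}
    if ontological:
        # Recurse so that children of children are included too.
        for child in concept.children:
            terms |= expand(child, ontological=True)
    return terms


def search(docs: list[str], terms: set[str]) -> list[int]:
    """Return indices of documents that mention any of the terms."""
    return [i for i, doc in enumerate(docs) if any(t in doc.lower() for t in terms)]


# Toy ontology fragment (hypothetical): polymer > PMMA, polymer > polyamide > nylon.
nylon = Concept("nylon")
polyamide = Concept("polyamide", children=[nylon])
pmma = Concept("poly(methyl methacrylate)", {"PMMA", "acrylic glass"})
polymer = Concept("polymer", {"polymers"}, [pmma, polyamide])

docs = [
    "A new polymer blend for packaging",
    "Optical properties of PMMA films",
    "Nylon fibres in textiles",
    "Steel corrosion in seawater",
]
print(search(docs, expand(polymer, ontological=False)))  # [0]
print(search(docs, expand(polymer, ontological=True)))   # [0, 1, 2]
```

The concept-only mode finds only the document that names “polymer” itself. The ontological mode also finds the PMMA and nylon documents, even though neither mentions the word “polymer”.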

For greater relevance, the semantic search is executed only on relevant document parts, e.g. the reference section of scientific articles is left out. This way, the portion of highly relevant hit documents is increased, and the number of less relevant hits is reduced.
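Restricting the search to relevant document parts amounts to dropping certain sections before matching. The following is a minimal sketch, assuming documents are represented as hypothetical section-name-to-text dictionaries and that a hypothetical `EXCLUDED_SECTIONS` set lists the parts to skip. Which sections Dimensions actually excludes beyond references is not specified here, and the sample documents are toy data.

```python
# Hypothetical set of section names treated as non-relevant.
EXCLUDED_SECTIONS = {"references", "acknowledgements"}


def searchable_text(doc: dict[str, str], restrict: bool) -> str:
    """Join the document's sections, optionally skipping excluded ones."""
    return " ".join(
        text for name, text in doc.items()
        if not (restrict and name.lower() in EXCLUDED_SECTIONS)
    )


def search(docs: list[dict[str, str]], term: str, restrict: bool = True) -> list[int]:
    """Return indices of documents whose (optionally restricted) text contains term."""
    term = term.lower()
    return [i for i, doc in enumerate(docs) if term in searchable_text(doc, restrict).lower()]


# Toy documents: the second one mentions polymers only in its reference list.
docs = [
    {"title": "Polymer coatings for steel", "abstract": "We test three coatings."},
    {"title": "Silicon battery anodes", "abstract": "Anode cycling data.",
     "references": "A review of polymer binders."},
]
print(search(docs, "polymer", restrict=False))  # [0, 1]
print(search(docs, "polymer"))                  # [0]
```

The battery paper is a hit only when its reference list is searched, which is the kind of less relevant hit that section filtering removes.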

Examples in action: How semantic search can improve accuracy

Problem: Ambiguous terminology

Running the query: “cancer”

A lexical search will return all documents in which “cancer” refers to the disease, but also all documents in which it refers to the crab genus Cancer, as in the species “Cancer borealis” or “Cancer irroratus”.

In a semantic search, the user chooses the search space, i.e. whether “cancer” is to be searched as a disease or as a species. Domain-specific context rules determine whether the disease or the species “cancer” is annotated in the text, so the documents returned as hits are usually much more relevant.
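The idea of context rules deciding which meaning is annotated can be illustrated with a toy rule set. This sketch assumes two hypothetical rules: a following species epithet (a binomial name such as “Cancer borealis”) or a crab-related cue word marks the crab genus, and otherwise the disease is assumed. Real annotation rules are far richer. The names `annotate_cancer` and `semantic_search` are illustrative and are not part of any Dimensions API.

```python
import re

# Hypothetical context rules for illustration only.
SPECIES_EPITHETS = {"borealis", "irroratus", "pagurus"}
SPECIES_CUES = {"crab", "crabs", "decapod"}


def annotate_cancer(text: str) -> set[str]:
    """Return the domains ('disease' and/or 'species') in which 'cancer' occurs."""
    tokens = [t.lower() for t in re.findall(r"[A-Za-z]+", text)]
    has_cue = bool(SPECIES_CUES & set(tokens))
    domains = set()
    for i, tok in enumerate(tokens):
        if tok != "cancer":
            continue
        following = tokens[i + 1] if i + 1 < len(tokens) else ""
        # A binomial name or a crab-related cue word marks the crab genus;
        # otherwise the disease reading is assumed.
        if following in SPECIES_EPITHETS or has_cue:
            domains.add("species")
        else:
            domains.add("disease")
    return domains


def lexical_search(docs: list[str], term: str) -> list[int]:
    """Plain case-insensitive string match, blind to meaning."""
    return [i for i, d in enumerate(docs) if term.lower() in d.lower()]


def semantic_search(docs: list[str], domain: str) -> list[int]:
    """Return documents where 'cancer' is annotated in the chosen domain."""
    return [i for i, d in enumerate(docs) if domain in annotate_cancer(d)]


docs = [
    "Cancer borealis is a crab of the North Atlantic.",
    "Immunotherapy has changed lung cancer treatment.",
    "Neurons of Cancer irroratus were recorded.",
]
print(lexical_search(docs, "cancer"))    # [0, 1, 2]
print(semantic_search(docs, "disease"))  # [1]
print(semantic_search(docs, "species"))  # [0, 2]
```

The lexical search returns all three documents. Choosing the “disease” or “species” search space splits them by meaning.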

Figure 4

Running the query: “sting”

A lexical search will return all documents in which “sting” refers to the injury (disease), but also all documents in which it refers to the protein STING, the musician Sting, or the verb “to sting”.

In a semantic search, the user chooses the search space, i.e. whether “sting” is to be searched as a disease or as a gene. Domain-specific context rules determine whether the disease or the protein STING is annotated in the text, so only documents matching the chosen meaning are returned as hits. Using the Domain Explorer, you can select the search space by choosing the domain of interest (Figure 2).

Problem: Abbreviations and acronyms

Running the query: “pmma”

A lexical search will return all documents in which “pmma” refers to the polymer poly(methyl methacrylate), but also all documents in which it refers to the gene “pmma”.

In a semantic search, the user chooses the search space, i.e. whether “pmma” is to be searched as a polymer or as a gene. Domain-specific context rules determine whether the polymer or the gene “pmma” is annotated in the text, so only documents matching the chosen meaning are returned as hits. Using the Domain Explorer, you can select the search space by choosing the domain of interest (Figure 5).
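Abbreviations add a twist: a document often spells out the long form of an acronym, which is a strong clue to its meaning. The sketch below assumes two hypothetical rules. If the long form “poly(methyl methacrylate)” appears, “PMMA” is read as the polymer. Otherwise, gene-related cue words (a hypothetical `GENE_CUES` set) suggest the gene. The function `resolve_pmma` and the sample sentences are illustrative only and do not describe Dimensions’ actual rules.

```python
import re
from typing import Optional

LONG_FORM = "poly(methyl methacrylate)"
# Hypothetical cue words suggesting the gene reading.
GENE_CUES = {"gene", "expression", "mutant", "knockout"}


def resolve_pmma(doc: str) -> Optional[str]:
    """Guess which meaning of 'PMMA' a document uses; None if it is absent."""
    text = doc.lower()
    words = set(re.findall(r"[a-z0-9]+", text))
    if "pmma" not in words:
        return None
    if LONG_FORM in text:
        return "polymer"  # the long form is spelled out in the document
    if words & GENE_CUES:
        return "gene"
    return "unresolved"   # no rule fired; more context would be needed


def semantic_search(docs: list[str], domain: str) -> list[int]:
    """Return documents in which 'PMMA' resolves to the chosen domain."""
    return [i for i, d in enumerate(docs) if resolve_pmma(d) == domain]


docs = [
    "Films of poly(methyl methacrylate) (PMMA) were cast from solution.",
    "Expression of the pmma gene increased under stress.",
    "PMMA results are summarised below.",
]
print(semantic_search(docs, "polymer"))  # [0]
print(semantic_search(docs, "gene"))     # [1]
```

The third sentence gives no usable context, so the sketch leaves it unresolved rather than guessing.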

Figure 5

Examples in action: How semantic search can improve recall

Semantic search increases the number of relevant hits because it is performed with an ontological concept that contains all synonyms as well as all ontological descendant concepts (child nodes, their child nodes, and so on).

For example, running the query: “polymer”

A lexical search will return all documents that contain the text string “polymer” or its variants.

A semantic search will, in addition to all documents containing the text string “polymer”, also return documents that contain specific polymers like poly(methyl methacrylate), perloid, or nylon.

For example, running the query: “pesticides”

A lexical search will return all documents that contain the text string “pesticides” or its variants.

A semantic search will, in addition to all documents containing the text string “pesticides”, also return documents that contain specific pesticides like bixafen, boscalid, or imazamox.

Dimensions Life Sciences and Chemistry includes both lexical and semantic search. The semantic search is programmed to interpret over 22 million concepts and over 55 million synonyms for more accurate results.

Get in touch if you’d like a demo to see how it works.