Apache Solr – The Inverted Index

 

             Smart Techie

Apache Solr uses Lucene’s inverted index. Most search engines use an inverted index data structure to achieve better search performance. In an inverted index, each term is associated with the ids of the documents that contain it. When the user issues a query, the engine looks up the query terms and retrieves the associated documents. This is an efficient way to get fast results from the search engine.

With a forward index, where each document is associated with the terms it contains, finding the documents for a query requires iterating over every document, which leads to poor search performance. Now let us see an example of what Solr’s inverted index looks like. Let us define some documents with the following content.

 

Document Id    Content to index
Doc1           dc mens shoes
Doc2           clarks shoes mens boat shoes
Doc3           basketball mens shoes
Doc4           mens watches
Doc5           jordan shoes
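
For contrast, here is a minimal sketch of a lookup against a forward index built from the documents above. The whitespace tokenization and the function name scan_forward_index are assumptions made for illustration and are not part of Solr or Lucene. The point is only that every document has to be visited to answer a single-term query.

```python
# Forward index: document id -> list of terms.
# Splitting on whitespace is an assumption made for this illustration.
forward_index = {
    "Doc1": "dc mens shoes".split(),
    "Doc2": "clarks shoes mens boat shoes".split(),
    "Doc3": "basketball mens shoes".split(),
    "Doc4": "mens watches".split(),
    "Doc5": "jordan shoes".split(),
}


def scan_forward_index(index, term):
    """Return the ids of documents containing `term` by checking every document."""
    matches = []
    for doc_id, terms in index.items():  # visits all documents, matching or not
        if term in terms:
            matches.append(doc_id)
    return matches


shoes_docs = scan_forward_index(forward_index, "shoes")
print(shoes_docs)
```

The work grows with the number of documents in the collection, even when only a few of them match.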

 

Let us construct the “Inverted Index”. To construct it, we first split the content of every document into terms, then sort the distinct terms in ascending lexicographical order and record, for each term, the ids of the documents that contain it.

 

Terms          Document Ids
basketball     Doc3
boat           Doc2
clarks         Doc2
dc             Doc1
jordan         Doc5
mens           Doc1, Doc2, Doc3, Doc4
shoes          Doc1, Doc2, Doc3, Doc5
watches        Doc4
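
The construction above can be sketched in a few lines of Python. This is a simplified illustration, not how Lucene builds its index internally: it assumes whitespace tokenization with no analysis (no stemming, lowercasing, or stop words), and build_inverted_index is a hypothetical helper name.

```python
from collections import defaultdict

documents = {
    "Doc1": "dc mens shoes",
    "Doc2": "clarks shoes mens boat shoes",
    "Doc3": "basketball mens shoes",
    "Doc4": "mens watches",
    "Doc5": "jordan shoes",
}


def build_inverted_index(docs):
    """Map each term to the sorted list of ids of documents that contain it."""
    postings = defaultdict(set)
    for doc_id, content in docs.items():
        for term in content.split():  # assumption: split on whitespace only
            postings[term].add(doc_id)  # a set ignores repeated terms in one document
    # Sort the terms lexicographically, and the document ids within each entry.
    return {term: sorted(postings[term]) for term in sorted(postings)}


inverted_index = build_inverted_index(documents)
for term, doc_ids in inverted_index.items():
    print(term, ", ".join(doc_ids))
```

Note that "shoes" appears twice in Doc2 but Doc2 is listed only once for that term.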

 

Let us perform some queries.

  • If a user searches for mens AND shoes, Solr returns the intersection of the two document sets. mens matches Doc1, Doc2, Doc3, and Doc4, while shoes matches Doc1, Doc2, Doc3, and Doc5, so the intersection is Doc1, Doc2, and Doc3. The diagram below depicts this.

Figure: Solr AND operation

 

  • If a user searches for mens OR shoes, Solr returns the union of the two document sets. Combining both sets gives Doc1, Doc2, Doc3, Doc4, and Doc5. The diagram below depicts this.

Figure: Solr OR query
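
The two queries above map directly onto set intersection and set union. The following sketch assumes the index entries for mens and shoes from the table earlier in this post. The helper names search_and and search_or are hypothetical, and Lucene's real query execution is considerably more elaborate than plain set operations.

```python
# Entries for the two query terms, taken from the inverted index above.
inverted_index = {
    "mens": {"Doc1", "Doc2", "Doc3", "Doc4"},
    "shoes": {"Doc1", "Doc2", "Doc3", "Doc5"},
}


def search_and(index, *terms):
    """Documents that contain every term (set intersection)."""
    postings = [index.get(term, set()) for term in terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


def search_or(index, *terms):
    """Documents that contain at least one term (set union)."""
    postings = [index.get(term, set()) for term in terms]
    return sorted(set().union(*postings))


and_result = search_and(inverted_index, "mens", "shoes")
or_result = search_or(inverted_index, "mens", "shoes")
print("mens AND shoes:", and_result)
print("mens OR shoes:", or_result)
```

A term that is missing from the index contributes an empty set, so an AND query containing it matches nothing, while an OR query simply ignores it.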

 

In the coming articles, we will see a couple more Solr features. Happy Learning!

Siva Janapati is an Architect with experience in building Cloud Native Microservices architectures, Reactive Systems, Large scale distributed systems, and Serverless Systems. Siva has hands-on experience in the architecture, design, and implementation of scalable systems using Cloud, Java, Go lang, Apache Kafka, Apache Solr, Spring, Spring Boot, Lightbend reactive tech stack, APIGEE edge & on-premise, and other open-source and proprietary technologies. He has expertise in working with and building RESTful and GraphQL APIs. He has successfully delivered multiple applications in the retail, telco, and financial services domains. He maintains a GitHub account (https://github.com/2013techsmarts) where he publishes the source code related to his blog posts.
