Apache Solr – The Inverted Index

 

             Smart Techie

Apache Solr uses Lucene’s inverted index. Most search engines use an inverted index data structure to achieve better search performance. In an inverted index, each search term is associated with the ids of the documents that contain it. When the user issues a query, the engine looks up the query terms and retrieves the associated documents. Because the lookup starts from the terms rather than from the documents, it is an efficient way to return search results quickly.

With a forward index, where each document is associated with the terms it contains, the engine has to iterate over the documents to find the ones that match the query. This leads to poor search performance. Now we will see an example of what Solr’s inverted index looks like. Let us define some documents with the following content.

 

Document Id    Content to index
Doc1           dc mens shoes
Doc2           clarks shoes mens boat shoes
Doc3           basketball mens shoes
Doc4           mens watches
Doc5           jordan shoes
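
For contrast, here is a minimal sketch of how a lookup would work if these documents were kept in a forward index. It is only an illustration, not how Solr stores data; the names `forward_index` and `scan_forward_index` are made up for this example.

```python
# Forward index: each document id maps to the terms it contains.
# (Illustrative structure only; names are hypothetical.)
forward_index = {
    "Doc1": ["dc", "mens", "shoes"],
    "Doc2": ["clarks", "shoes", "mens", "boat", "shoes"],
    "Doc3": ["basketball", "mens", "shoes"],
    "Doc4": ["mens", "watches"],
    "Doc5": ["jordan", "shoes"],
}


def scan_forward_index(index, term):
    """Return the ids of documents containing `term` by visiting every document."""
    matches = []
    for doc_id, terms in index.items():  # one iteration per document
        if term in terms:                # plus a scan of that document's terms
            matches.append(doc_id)
    return matches


print(scan_forward_index(forward_index, "watches"))  # ['Doc4']
```

Even to find the single document containing watches, the loop has to touch all five documents. That cost grows with the size of the collection.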

 

Let us construct the inverted index. To do so, we first split the content of every document into terms, then sort the terms in ascending lexicographical order and record, for each term, the ids of the documents that contain it.

 

Terms          Document Ids
basketball     Doc3
boat           Doc2
clarks         Doc2
dc             Doc1
jordan         Doc5
mens           Doc1, Doc2, Doc3, Doc4
shoes          Doc1, Doc2, Doc3, Doc5
watches        Doc4
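
The construction above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: terms are split on whitespace only, whereas Lucene runs the text through configurable analyzers and stores more than document ids. The names `documents` and `build_inverted_index` are hypothetical.

```python
from collections import defaultdict

# The sample documents from the table above.
documents = {
    "Doc1": "dc mens shoes",
    "Doc2": "clarks shoes mens boat shoes",
    "Doc3": "basketball mens shoes",
    "Doc4": "mens watches",
    "Doc5": "jordan shoes",
}


def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    postings = defaultdict(set)
    for doc_id, content in docs.items():
        # Assumption: naive whitespace tokenization, no stemming or lowercasing.
        for term in content.split():
            postings[term].add(doc_id)  # a set ignores repeated terms in one doc
    # Sort the terms lexicographically, and the ids within each term.
    return {term: sorted(postings[term]) for term in sorted(postings)}


inverted_index = build_inverted_index(documents)
for term, doc_ids in inverted_index.items():
    print(term, ", ".join(doc_ids))
```

Running it prints the same eight terms and document ids as the table. Note that shoes appears twice in Doc2 but Doc2 is listed only once for that term.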

 

Let us perform some queries.

  • If the user searches for mens AND shoes, Solr returns the intersection of the document sets of the two terms. Here the intersection is Doc1, Doc2, and Doc3. The diagram below depicts this.

Figure: Solr AND operation

 

  • If the user searches for mens OR shoes, Solr returns the union of the document sets of the two terms. Here the union is Doc1, Doc2, Doc3, Doc4, and Doc5. The diagram below depicts this.

Figure: Solr OR query
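
Both queries can be reproduced with plain set operations over the inverted index. The sketch below is only meant to show the idea; it uses Python sets, which is an assumption of this example and not how Lucene evaluates boolean queries internally. The helper names `search_and` and `search_or` are hypothetical.

```python
# The inverted index from the example, with document ids stored as sets.
inverted_index = {
    "basketball": {"Doc3"},
    "boat": {"Doc2"},
    "clarks": {"Doc2"},
    "dc": {"Doc1"},
    "jordan": {"Doc5"},
    "mens": {"Doc1", "Doc2", "Doc3", "Doc4"},
    "shoes": {"Doc1", "Doc2", "Doc3", "Doc5"},
    "watches": {"Doc4"},
}


def search_and(index, *terms):
    """Documents that contain every term: intersection of the document sets."""
    postings = [index.get(term, set()) for term in terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


def search_or(index, *terms):
    """Documents that contain at least one term: union of the document sets."""
    postings = [index.get(term, set()) for term in terms]
    return sorted(set().union(*postings))


print(search_and(inverted_index, "mens", "shoes"))  # ['Doc1', 'Doc2', 'Doc3']
print(search_or(inverted_index, "mens", "shoes"))   # ['Doc1', 'Doc2', 'Doc3', 'Doc4', 'Doc5']
```

A term that is missing from the index contributes an empty set, so an AND query containing it returns no documents while an OR query simply ignores it.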

 

In the coming articles, we will see a couple more Solr features. Happy learning!

I am Siva Prasad Rao Janapati, working as a Technical Architect. I have hands-on experience with ATG Commerce (DAS/DPS/DCS), Mozu commerce, Broadleaf Commerce, Java, JEE, Spring, Play, JPA, Hibernate, Velocity, JMS, Jboss, Weblogic, Tomcat, Jetty, Apache, Apache Solr, Spring Batch, JQuery, NodeJS, SOAP, REST, MySQL, Oracle, Mongo DB, Memcached, HazelCast, Git, SVN, CVS, Ant, Maven, Gradle, Amazon Web services, Rackspace, Quartz, JMeter, Junit, Open NLP, Facebook Graph, Twitter4J, YouTube Gdata, Bazzarvoice, Yotpo, 4-Tell, Alatest, Shopzilla, and Linkshare, covering both open source and commercial technologies.

