Apache Solr – The Inverted Index

 

             Smart Techie

Apache Solr uses Lucene’s inverted index. Most search engines use an inverted index data structure to achieve better search performance. In an inverted index, every term is associated with the ids of the documents that contain it. When the user issues a query, the engine looks up the query terms and retrieves the associated documents directly. This is an efficient way to get fast results from a search engine.

With a forward index, where each document is associated with the terms it contains, answering a query requires iterating over the documents to find the ones that match. This leads to poor search performance. Now we will see an example of what Solr’s inverted index looks like. Let us define some documents with the following content.

 

Document Id    Content to index

Doc1           dc mens shoes
Doc2           clarks shoes mens boat shoes
Doc3           basketball mens shoes
Doc4           mens watches
Doc5           jordan shoes
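
To see why a forward index is slow to query, here is a minimal Python sketch over the five documents above. The names forward_index and scan_forward_index are made up for illustration and are not part of Solr or Lucene. It only shows that every lookup has to visit every document.

```python
# Forward index: each document id maps to the terms it contains.
forward_index = {
    "Doc1": ["dc", "mens", "shoes"],
    "Doc2": ["clarks", "shoes", "mens", "boat", "shoes"],
    "Doc3": ["basketball", "mens", "shoes"],
    "Doc4": ["mens", "watches"],
    "Doc5": ["jordan", "shoes"],
}


def scan_forward_index(index, term):
    """Return the ids of documents containing `term` by checking every document."""
    matches = []
    for doc_id, terms in index.items():  # one pass over ALL documents per query
        if term in terms:
            matches.append(doc_id)
    return matches


print(scan_forward_index(forward_index, "watches"))  # ['Doc4']
```

Even though only one document contains "watches", all five documents are inspected. The cost grows with the number of documents, which is the problem the inverted index avoids.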

 

Let us construct the “inverted index”. To construct it, we first split the content of every document into terms, then sort the distinct terms in ascending lexicographical order and record, for each term, the documents it appears in.

 

Terms          Document Ids

basketball     Doc3
boat           Doc2
clarks         Doc2
dc             Doc1
jordan         Doc5
mens           Doc1, Doc2, Doc3, Doc4
shoes          Doc1, Doc2, Doc3, Doc5
watches        Doc4
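
The construction described above can be sketched in a few lines of Python. This is only an illustration: build_inverted_index is a hypothetical helper, and plain whitespace splitting stands in for the analysis (tokenizing and filtering) that Solr applies to a field according to its schema configuration.

```python
from collections import defaultdict

# The five example documents: document id -> content to index.
documents = {
    "Doc1": "dc mens shoes",
    "Doc2": "clarks shoes mens boat shoes",
    "Doc3": "basketball mens shoes",
    "Doc4": "mens watches",
    "Doc5": "jordan shoes",
}


def build_inverted_index(docs):
    """Map each term to the sorted list of ids of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, content in docs.items():
        # Assumption: whitespace splitting is enough for this toy data.
        for term in content.lower().split():
            postings[term].add(doc_id)  # a set ignores repeated terms in one doc
    # Sort the terms lexicographically, and the document ids within each term.
    return {term: sorted(postings[term]) for term in sorted(postings)}


inverted_index = build_inverted_index(documents)
for term, doc_ids in inverted_index.items():
    print(term, ", ".join(doc_ids))
```

Note that "shoes" occurs twice in Doc2 but Doc2 is listed only once for that term. Looking up a term is now a single dictionary access instead of a scan over all documents.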

 

Let us perform some queries.

  • If the user searches for mens AND shoes, Solr returns the intersection of the two document sets. Doc1, Doc2 and Doc3 appear in both sets, so they are the result. The diagram below depicts this.

[Figure: Solr AND operation]

 

  • If the user searches for mens OR shoes, Solr returns the union of the two document sets. The union is Doc1, Doc2, Doc3, Doc4 and Doc5. The diagram below depicts this.

[Figure: Solr OR query]
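
Both queries can be reproduced with ordinary set operations. The sketch below is a simplified model, not Lucene's implementation: the function names search_and and search_or are hypothetical, and Python sets are used in place of Lucene's postings lists. Only the two terms used in the queries are included.

```python
# Postings for the two query terms, taken from the inverted index above.
inverted_index = {
    "mens": {"Doc1", "Doc2", "Doc3", "Doc4"},
    "shoes": {"Doc1", "Doc2", "Doc3", "Doc5"},
}


def search_and(index, *terms):
    """Documents that contain ALL of the given terms (set intersection)."""
    postings = [index.get(term, set()) for term in terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


def search_or(index, *terms):
    """Documents that contain AT LEAST ONE of the given terms (set union)."""
    postings = [index.get(term, set()) for term in terms]
    return sorted(set().union(*postings))


print(search_and(inverted_index, "mens", "shoes"))
# ['Doc1', 'Doc2', 'Doc3']
print(search_or(inverted_index, "mens", "shoes"))
# ['Doc1', 'Doc2', 'Doc3', 'Doc4', 'Doc5']
```

A term that is missing from the index contributes an empty set, so an AND query containing it returns nothing, while an OR query simply ignores it.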

 

In the coming articles we will look at a couple more Solr features. Happy learning!


I am Siva Prasad Rao Janapati, working as a software developer. I have hands-on experience with ATG Commerce (DAS/DPS/DCS), Mozu commerce, Broadleaf Commerce, Java, JEE, Spring, Play, JPA, Hibernate, Velocity, JMS, JBoss, WebLogic, Tomcat, Jetty, Apache, Apache Solr, Spring Batch, jQuery, NodeJS, SOAP, REST, MySQL, Oracle, MongoDB, Memcached, Hazelcast, Git, SVN, CVS, Ant, Maven, Gradle, Amazon Web Services, Rackspace, Quartz, JMeter, JUnit, OpenNLP, Facebook Graph, Twitter4J, YouTube GData, Bazaarvoice, Yotpo, 4-Tell, Alatest, Shopzilla and Linkshare, covering both open source and commercial technologies.
