Apache Solr uses Lucene’s inverted index. Most search engines use an inverted index data structure to achieve good search performance. In an inverted index, each term is mapped to the IDs of the documents that contain it. When a user issues a query, the engine looks up the query terms and reads their associated document IDs directly instead of scanning every document, which is what makes inverted-index search fast.
With a forward index, by contrast, each document maps to the terms it contains, so answering a query means iterating over every document to check whether it matches. This becomes slow as the collection grows. Now let us look at an example of what Solr’s inverted index looks like. First, let us define some documents and their content.
Content to index:

| Document | Content |
|---|---|
| Doc1 | dc mens shoes |
| Doc2 | clarks shoes mens boat shoes |
| Doc3 | basketball mens shoes |
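To make the forward-index approach concrete, here is a minimal sketch using the three sample documents. It assumes plain whitespace tokenization in place of Solr's analyzer chain, and `build_forward_index` and `forward_search_and` are hypothetical helper names. Each document maps to its set of terms, so an AND query has to visit every document.

```python
# Sample documents from the table above (doc id -> content).
DOCS = {
    "Doc1": "dc mens shoes",
    "Doc2": "clarks shoes mens boat shoes",
    "Doc3": "basketball mens shoes",
}


def build_forward_index(docs):
    """Forward index: each document maps to the set of terms it contains.

    Tokenization is plain whitespace splitting, an assumption standing in
    for Solr's analyzer chain.
    """
    return {doc_id: set(text.split()) for doc_id, text in docs.items()}


def forward_search_and(forward_index, terms):
    """Answer an AND query by checking every document, one by one."""
    return [
        doc_id
        for doc_id, doc_terms in forward_index.items()
        if all(term in doc_terms for term in terms)
    ]


forward = build_forward_index(DOCS)
print(forward_search_and(forward, ["mens", "shoes"]))  # ['Doc1', 'Doc2', 'Doc3']
print(forward_search_and(forward, ["boat"]))           # ['Doc2']
```

The cost of each query grows with the number of documents, which is the performance problem the inverted index avoids.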
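Let us construct the “Inverted Index”. First we split each document into terms, then sort the unique terms in ascending lexicographical order and record, for each term, the documents that contain it. The excerpt below shows the entries for the terms mens and shoes; it also lists Doc4 and Doc5, which are not among the three sample documents above.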
| Term | Documents |
|---|---|
| mens | Doc1, Doc2, Doc3, Doc4 |
| shoes | Doc1, Doc2, Doc3, Doc5 |
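The construction steps can be sketched in a few lines of Python. This is a simplified illustration, not Lucene's on-disk format: it assumes whitespace tokenization, and `build_inverted_index` is a hypothetical name. Because it only indexes the three sample documents, the postings for mens and shoes stop at Doc3.

```python
from collections import defaultdict

# Sample documents from the table above (doc id -> content).
DOCS = {
    "Doc1": "dc mens shoes",
    "Doc2": "clarks shoes mens boat shoes",
    "Doc3": "basketball mens shoes",
}


def build_inverted_index(docs):
    """Map each term to a sorted list of the document ids containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():  # whitespace tokenization (assumption)
            postings[term].add(doc_id)  # a set stores each doc only once
    # Sort the term dictionary lexicographically, and each postings list too.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


index = build_inverted_index(DOCS)
for term, ids in index.items():
    print(term, ids)
# basketball ['Doc3']
# boat ['Doc2']
# clarks ['Doc2']
# dc ['Doc1']
# mens ['Doc1', 'Doc2', 'Doc3']
# shoes ['Doc1', 'Doc2', 'Doc3']
```

Once the index exists, finding the documents for a term is a single lookup such as `index["mens"]`, with no scan over the documents.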
Let us perform some queries.
- If a user searches for mens AND shoes, Solr returns the intersection of the two document sets as the results: Doc1, Doc2 and Doc3 appear in both. The diagram below depicts this.
- If a user searches for mens OR shoes, Solr returns the union of the two document sets as the results: Doc1, Doc2, Doc3, Doc4 and Doc5. The diagram below depicts this.
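Because each postings list is kept sorted, both operations can be done in one linear pass with two pointers. The sketch below uses the postings from the table above, with document IDs written as integers (Doc1 = 1, and so on). It is a simplified textbook merge under that assumption; Lucene's real query execution is considerably more elaborate.

```python
def intersect(a, b):
    """AND: walk two sorted postings lists in step, keeping common ids."""
    i = j = 0
    result = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return result


def union(a, b):
    """OR: merge two sorted postings lists, emitting each id once."""
    i = j = 0
    result = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            result.append(a[i])
            i += 1
        else:
            result.append(b[j])
            j += 1
    # One list is exhausted; append whatever remains of the other.
    result.extend(a[i:])
    result.extend(b[j:])
    return result


MENS = [1, 2, 3, 4]   # Doc1, Doc2, Doc3, Doc4
SHOES = [1, 2, 3, 5]  # Doc1, Doc2, Doc3, Doc5
print(intersect(MENS, SHOES))  # [1, 2, 3]
print(union(MENS, SHOES))      # [1, 2, 3, 4, 5]
```

Keeping postings sorted is what makes this merge possible without building hash sets at query time.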
In the coming articles we will look at a few more Solr features. Happy Learning!