This concept page explains how Solr's faceting capabilities could be added to Magnolia.
Enhancing the existing search with Solr's faceting capabilities
With faceting enabled, the following use cases could be addressed.
- Easy configuration: adding facets is easy because Solr fields are directly accessible and it is possible to facet on everything that is indexed.
- Facet/categorize on all fields submitted to the index and for all content inside the index (DMS/DATA, WEBSITE, third party).
- Do keyword-based searches in faceted content and get the current facets for a specific keyword search.
- The above keyword search returns the categories that are available for that specific search. Refining is possible by clicking again on one of the items; for instance, clicking on IT Systems gives us the only result matching both IT-Systems and "Magnolia Presentation".
- Do range faceting (prices, dates) to propose a general search interface for product/e-commerce sites.
- Be able to provide user-context content paths; this could perhaps be another concept page on its own.
- First, all content is categorized: each content item must have at least one user profile categorization (developer, marketing, buyer, ...).
- Then, navigation is done through Solr's faceted search. Based on a few initial choices, different layouts can be proposed after each refinement.
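To make the keyword, refinement and range use cases above more concrete, here is a minimal sketch of how the corresponding Solr request parameters could be assembled. The class name `FacetQueryBuilder` and the field names used in `main` (`category_resources_role`, `price`) are assumptions made for illustration; only the Solr parameters themselves (`facet`, `facet.field`, `facet.mincount`, `fq`, `facet.range` and its per-field `start`/`end`/`gap` settings) are standard.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch: builds the query string of a faceted Solr search. */
class FacetQueryBuilder {
    private final List<String> params = new ArrayList<>();

    FacetQueryBuilder(String keyword) {
        add("q", keyword);
        add("facet", "true");
        // Hide constraints that have no hits for the current keyword.
        add("facet.mincount", "1");
    }

    /** Ask Solr to return hit counts for every value of the given field. */
    FacetQueryBuilder facetOn(String field) {
        add("facet.field", field);
        return this;
    }

    /** Narrow the result set after the user clicks on a constraint. */
    FacetQueryBuilder refine(String field, String value) {
        String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"");
        add("fq", field + ":\"" + escaped + "\"");
        return this;
    }

    /** Range faceting, for example price buckets of a fixed width. */
    FacetQueryBuilder rangeOn(String field, int start, int end, int gap) {
        add("facet.range", field);
        add("f." + field + ".facet.range.start", String.valueOf(start));
        add("f." + field + ".facet.range.end", String.valueOf(end));
        add("f." + field + ".facet.range.gap", String.valueOf(gap));
        return this;
    }

    String build() {
        return String.join("&", params);
    }

    private void add(String name, String value) {
        // Parameter names are plain ASCII here; only the values are encoded.
        params.add(name + "=" + URLEncoder.encode(value, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Hypothetical field names, chosen only for this example.
        System.out.println(new FacetQueryBuilder("magnolia")
                .facetOn("category_resources_role")
                .refine("category_resources_role", "IT-Systems")
                .rangeOn("price", 0, 1000, 100)
                .build());
    }
}
```

Each click on a constraint would simply add another `refine(...)` call, so the same builder covers both the initial keyword search and every later refinement.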
We will try to keep things as generic as possible so that other search providers can be used as well. To that end, we extend the ExtSearchResultModel class with a FacetedSearchResultModel class that contains only the getters and setters specific to faceting.
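As a rough illustration of that split, the sketch below keeps all facet-related state in the subclass. The empty `ExtSearchResultModel` is only a stand-in so that the example compiles on its own; the real class and its API are not shown on this page. The property names (`facets`, `selectedConstraints`) are likewise assumptions.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Stand-in for the real Extended Search model, whose API is not shown here. */
class ExtSearchResultModel {
}

/** Hypothetical sketch: adds only facet-specific state to the generic model. */
class FacetedSearchResultModel extends ExtSearchResultModel {
    // facet name -> (constraint -> hit count), in the order given by the provider
    private Map<String, Map<String, Long>> facets = new LinkedHashMap<>();
    // facet name -> constraint currently selected by the user
    private Map<String, String> selectedConstraints = new LinkedHashMap<>();

    public Map<String, Map<String, Long>> getFacets() {
        return Collections.unmodifiableMap(facets);
    }

    public void setFacets(Map<String, Map<String, Long>> facets) {
        this.facets = new LinkedHashMap<>(facets);
    }

    public Map<String, String> getSelectedConstraints() {
        return Collections.unmodifiableMap(selectedConstraints);
    }

    public void setSelectedConstraints(Map<String, String> selectedConstraints) {
        this.selectedConstraints = new LinkedHashMap<>(selectedConstraints);
    }

    public static void main(String[] args) {
        FacetedSearchResultModel model = new FacetedSearchResultModel();
        model.setFacets(Map.of("category_resources_role", Map.of("IT-Systems", 1L)));
        System.out.println(model.getFacets());
    }
}
```

Because nothing in the subclass refers to Solr, another search provider that supports faceting could populate the same model.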
How do we push the categories to the index?
Two kinds of content have to be distinguished here: content from the website, and assets such as documents, movies and other resources.
Content from the website is already picked up by the Heritrix crawler, which calls the provider instance through the Extended Search configuration and pushes URLs to the Solr server; Solr then extracts the content and indexes it.
To add categorization tags, we can write the categories into a meta field of the page by adding a script to the HtmlHeader template.
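The template script itself is not reproduced on this page. As a hedged sketch of what it needs to produce, the helper below renders a meta element from a list of categories. The class name `CategoryMetaTag`, the comma separator and the exact markup are assumptions; how a multi-valued string is split at index time depends on the field type configured in Solr.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: renders the meta element that carries a page's categories. */
class CategoryMetaTag {

    /** Joins the non-empty categories with commas (assumed separator). */
    static String render(String fieldName, List<String> categories) {
        String content = categories.stream()
                .map(String::trim)
                .filter(c -> !c.isEmpty())
                .collect(Collectors.joining(","));
        return "<meta name=\"" + escape(fieldName) + "\" content=\"" + escape(content) + "\" />";
    }

    /** Minimal escaping so that values cannot break out of the attribute. */
    private static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("\"", "&quot;")
                .replace("<", "&lt;")
                .replace(">", "&gt;");
    }

    public static void main(String[] args) {
        System.out.println(render("categories", List.of("IT-Systems", "Marketing")));
    }
}
```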
With this in place, Solr's Tika parser picks up the content of the categories meta field and indexes it, provided the Solr schema has a corresponding categories field. This is nice, but what if we want to create other facets, as is done in the resources module? There we have "root" categories that we can call facets, such as resources_role and resources_subject. You would not want to modify your schema each time you add another facet to Magnolia, would you?
This is where the power of Solr comes into play: in Solr you can declare dynamic fields, which are created if they do not yet exist in the index. To do so, we added a dynamic field to Solr's schema.
This tells Solr to automatically create a category field each time a field whose name starts with category_ is added to the index. For example, if a meta field named category_resources_role with the values constraint1 and constraint2 is sent to the index, the field category_resources_role is created in the schema and constraint1 and constraint2 are indexed under this field, or facet.
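The matching rule can be sketched in a few lines. Solr dynamic field names are glob-like patterns with a single `*` at the start or at the end; the class below mimics that rule for illustration and is not Solr's own implementation. The schema entry quoted in the comment is an assumed example, since the attributes of the field that was actually added are not shown here.

```java
/** Illustrative sketch of how a dynamic field pattern selects field names. */
class DynamicFieldPattern {
    // Assumed schema entry, for illustration only:
    // <dynamicField name="category_*" type="string" indexed="true"
    //               stored="true" multiValued="true"/>
    private final String pattern;

    DynamicFieldPattern(String pattern) {
        boolean singleStar = pattern.indexOf('*') == pattern.lastIndexOf('*');
        boolean atEdge = pattern.startsWith("*") || pattern.endsWith("*");
        if (!singleStar || !atEdge) {
            throw new IllegalArgumentException(
                    "Expected exactly one '*' at the start or end: " + pattern);
        }
        this.pattern = pattern;
    }

    /** Returns true if a field with this name would be handled by the pattern. */
    boolean matches(String fieldName) {
        if (pattern.endsWith("*")) {
            String prefix = pattern.substring(0, pattern.length() - 1);
            return fieldName.startsWith(prefix);
        }
        String suffix = pattern.substring(1);
        return fieldName.endsWith(suffix);
    }

    public static void main(String[] args) {
        DynamicFieldPattern categories = new DynamicFieldPattern("category_*");
        System.out.println(categories.matches("category_resources_role"));
        System.out.println(categories.matches("title"));
    }
}
```

With such a pattern in the schema, adding a new "root" category in Magnolia requires no schema change as long as its field name carries the prefix.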
Whether the "category_" prefix is prepended to the "root" category name in Magnolia or only when the content is submitted (especially when submitting resources from the JCR data repository) is an implementation decision.
Now, what about JCR content that is not accessible to the crawler?
This type of content can be sent in one of two ways: either by extracting the data, converting it to the correct Solr syntax and submitting it to the index in bulk or batch mode, or through a JCREventListener that submits the content once it is available for publishing.
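For the bulk or batch variant, the conversion step could look roughly like the sketch below, which builds a document in Solr's XML update format (`<add><doc><field name="...">`) and prepends the "category_" prefix at submission time. The class name `SolrUpdateXml`, the use of `id` as the unique key and the sample path in `main` are assumptions; reading the data from JCR and posting the XML to Solr are left out.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: converts extracted categories into a Solr XML update message. */
class SolrUpdateXml {
    static final String PREFIX = "category_";

    /** Prepends the prefix unless the root category already carries it. */
    static String toFacetField(String rootCategory) {
        return rootCategory.startsWith(PREFIX) ? rootCategory : PREFIX + rootCategory;
    }

    /** Builds one add message; each constraint becomes a value of its facet field. */
    static String addDocument(String id, Map<String, List<String>> categories) {
        StringBuilder xml = new StringBuilder("<add><doc>");
        // Assumes the schema's unique key field is called "id".
        field(xml, "id", id);
        categories.forEach((root, constraints) ->
                constraints.forEach(c -> field(xml, toFacetField(root), c)));
        return xml.append("</doc></add>").toString();
    }

    private static void field(StringBuilder xml, String name, String value) {
        xml.append("<field name=\"").append(escape(name)).append("\">")
           .append(escape(value)).append("</field>");
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        // Sample path and categories are made up for this example.
        System.out.println(addDocument("/videos/intro",
                Map.of("resources_role", List.of("constraint1", "constraint2"))));
    }
}
```

A JCREventListener-based variant could reuse the same conversion and submit a single document whenever a node becomes available for publishing.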
I wrote a command that indexes video_resources and slideshow_resources into the Solr index. It still has to be enhanced, for instance by finding a way to decide through workflow whether the specified content should be indexed.