The configuration parameter
indexingConfiguration is not set by default. This means all properties of a node are indexed.
If you wish to configure the indexing behaviour you need to add a parameter to the
SearchIndex element of either your repository configuration file or your workspace configuration file.
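As a minimal sketch, the parameter could be added to the SearchIndex element as shown here. The value `/com/example/indexing_configuration.xml` is a hypothetical placeholder, not a path taken from this documentation; the other attributes and parameters of your existing SearchIndex element stay as they are.

```xml
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <!-- Existing parameters of your SearchIndex element remain unchanged. -->
  <param name="path" value="${wsp.home}/index"/>
  <!-- Hypothetical location; replace with the location of your own file. -->
  <param name="indexingConfiguration" value="/com/example/indexing_configuration.xml"/>
</SearchIndex>
```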
Any time you change the indexing configuration, remember to recreate the index from scratch.
Indexing configuration file should be located in the package
To optimize the index size, you can index only certain properties of a node type. Index rules are processed top down: the first matching rule is applied and all remaining rules are ignored.
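A minimal sketch of a rule that indexes only selected properties might look like the following. The property name `abstract` is a hypothetical example and is not taken from this documentation.

```xml
<!-- For nodes of type mgnl:page, index only the listed properties. -->
<index-rule nodeType="mgnl:page">
  <property>title</property>
  <!-- Hypothetical property name, used for illustration only. -->
  <property>abstract</property>
</index-rule>
```

Because the first matching rule wins, a more specific rule for the same node type would need to appear above a more general one.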
As of Jackrabbit 2.0 you can also use the match-all regex for the namespace prefix part of a property name. However, that is currently the only supported regular expression. Note that every namespace prefix you use throughout the XML file has to be declared in the
configuration element.
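For illustration, the namespace declarations could be sketched as follows. The `nt` and `jcr` URIs are the standard JCR namespaces. The `mgnl` URI is an assumption and should be verified against the namespace registry of your repository. The property name `text` is hypothetical.

```xml
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0"
               xmlns:jcr="http://www.jcp.org/jcr/1.0"
               xmlns:mgnl="http://www.magnolia.info/jcr/mgnl">
  <index-rule nodeType="nt:base">
    <!-- Match-all regex on the prefix part: "text" in any namespace. -->
    <property isRegexp="true">.*:text</property>
  </index-rule>
</configuration>
```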
When a property has its nodeScopeIndex attribute set to
false, the property is not included in the full-text index. It remains available for all searches except for those using
Here we are applying an index rule against nodes of type
nt:base. This also applies to nodes with a type that extends from nt:base. Since
nt:base is the base node type of all primary node types, this rule applies everywhere.
<index-rule nodeType="nt:base">
  <property isRegexp="true" nodeScopeIndex="false">mgnl:.*</property> <!-- Exclude Magnolia metadata from the full-text index. -->
  <property isRegexp="true" nodeScopeIndex="false">jcr:.*</property> <!-- Exclude JCR metadata from the full-text index. -->
  <property isRegexp="true">.*:.*</property> <!-- Include all properties from any namespace, even the empty namespace. -->
</index-rule>
You may also add a condition to the index rule and have multiple rules with the same node type.
For example, let's say that we only want to boost page titles when the page has been marked with a
priority property. Furthermore, let's assume we also have a requirement to provide three priority levels: low, medium, and high.
<!-- Since the default boost is 1.0, we don't need to specify it. Anything not medium or high will be considered low. -->
<index-rule nodeType="mgnl:page" condition="@priority = 'medium'">
  <property boost="3.0">title</property>
</index-rule>
<index-rule nodeType="mgnl:page" condition="@priority = 'high'">
  <property boost="5.0">title</property>
</index-rule>
Finally, add a radio button to your page dialog for controlling page priority levels.
You may also reference properties in the condition that are not on the current node and/or specify the type of a node in the condition.
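The following sketch shows what such conditions might look like, based on the condition syntax of the Jackrabbit indexing configuration. The property name `text` is a hypothetical example, and the node types are chosen for illustration only. Note that in Jackrabbit the type match in `element(*, ...)` is exact and does not include subtypes.

```xml
<!-- Condition on a property of any ancestor node. -->
<index-rule nodeType="mgnl:component" condition="ancestor::*/@priority = 'high'">
  <property boost="5.0">text</property>
</index-rule>

<!-- Condition that also names the exact type of the node carrying the property. -->
<index-rule nodeType="mgnl:component" condition="element(*, mgnl:page)/@priority = 'high'">
  <property boost="5.0">text</property>
</index-rule>
```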
It is possible to configure a
boost value on nodes and/or properties that match an index rule. The default
boost value is 1.0. Higher
boost values (a reasonable range is
1.0 - 5.0) yield a higher score, so matches appear as more relevant.
Here we are applying a
boost value of
3.0 to the
title property on nodes of type mgnl:page.
<index-rule nodeType="mgnl:page">
  <property boost="3.0">title</property>
</index-rule>
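A boost can also be set on the node itself rather than on a single property. The sketch below assumes the `boost` attribute on the index-rule element as described in the Jackrabbit indexing configuration. The condition and the value `2.0` are illustrative choices.

```xml
<!-- Boost the whole node (not just one property) when the page is marked high priority. -->
<index-rule nodeType="mgnl:page" boost="2.0" condition="@priority = 'high'">
  <property>title</property>
</index-rule>
```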
Sometimes it is useful to include the contents of descendant nodes in a single node, which makes it easier to search content that is scattered across multiple nodes.
Here we create an index aggregate on
mgnl:page that includes the content of mgnl:area and
mgnl:component. This makes it easier to search content on a page that is located in one of its area or component subnodes.
<aggregate primaryType="mgnl:page">
  <include primaryType="mgnl:area">*</include>
  <include primaryType="mgnl:component">*</include>
</aggregate>
This part of the configuration defines how a property is analyzed.
For example, let's say I wanted to target properties which I know store German language content with a German language analyzer.
<analyzer class="org.apache.lucene.analysis.de.GermanAnalyzer">
  <property>text_de</property>
</analyzer>
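In a complete Jackrabbit indexing configuration file, analyzer elements are typically grouped inside an `analyzers` wrapper element. The sketch below assumes that structure. The second analyzer and the property name `text_fr` are hypothetical additions for illustration.

```xml
<analyzers>
  <analyzer class="org.apache.lucene.analysis.de.GermanAnalyzer">
    <property>text_de</property>
  </analyzer>
  <!-- Hypothetical second language, shown only to illustrate multiple analyzers. -->
  <analyzer class="org.apache.lucene.analysis.fr.FrenchAnalyzer">
    <property>text_fr</property>
  </analyzer>
</analyzers>
```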
Custom configuration file
You can create a custom indexing configuration for any workspace. Once created, the file is referenced in the workspace.xml file of the workspace you wish to target. Changes to this configuration require reindexing the workspace.
An example of this would be the website-specific example shown above or the dam-specific configuration here:
This shows an example of node data aggregation. Since the Magnolia metadata is stored on the mgnl:asset node and the image metadata/data is stored on a mgnl:resource subnode, we can aggregate this into one Lucene document.
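The aggregation described here could be sketched as follows. This is an illustration that follows the same pattern as the mgnl:page aggregate and is not necessarily the configuration shipped with the dam workspace.

```xml
<!-- Sketch only: fold the mgnl:resource subnode into the Lucene document of its mgnl:asset parent. -->
<aggregate primaryType="mgnl:asset">
  <include primaryType="mgnl:resource">*</include>
</aggregate>
```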