Summary

Magnolia uses the Apache Jackrabbit content repository. You are free to use other content repository implementations as long as they conform to JSR 283. This page attempts to demystify the repository configuration file Magnolia uses to configure Jackrabbit.

Magnolia 5.4.x - Jackrabbit 2.8

Magnolia 5.5.x - Jackrabbit 2.12

Feel free to ask questions at the bottom of the page for further clarification.

Out of the box

Magnolia ships with a few examples of popular repository configurations that customers might use as-is or as the basis for a customized configuration. The Magnolia 5 bundles provide the same set of example files in both the author and public webapps.

The repo config folders are located here:

  • /magnolia-5.x.x/apache-tomcat-x.x.x/webapps/magnoliaAuthor/WEB-INF/config/repo-conf
  • /magnolia-5.x.x/apache-tomcat-x.x.x/webapps/magnoliaPublic/WEB-INF/config/repo-conf

The examples provided are:

  • jackrabbit-bundle-derby-search.xml
  • jackrabbit-bundle-h2-search.xml (Magnolia 5.5)
  • jackrabbit-bundle-ingres-search.xml
  • jackrabbit-bundle-mysql-search.xml
  • jackrabbit-bundle-postgres-search.xml
  • jackrabbit-memory-search.xml

Here are some other useful examples:

These are some of the more popular configurations we see used in practice. Each file is named for the persistence manager configuration it contains, and each also contains the search index configuration. These files essentially act as a blueprint for how each workspace is set up on installation. After a workspace has been created from the blueprint, you can further adjust its configuration in its workspace.xml file.

Jackrabbit Repository

Jackrabbit requires two properties to set up its content repository:

  • Repository home directory: specified in the magnolia.properties file as magnolia.repositories.home
  • Repository configuration file: specified in the magnolia.properties file as magnolia.repositories.jackrabbit.config

Configuration File

The repository configuration file specifies global options like security, data sources, and versioning. It also includes a default workspace configuration template. When a workspace is created, Jackrabbit generates a workspace.xml file from that template inside the workspace home directory, and from then on that file holds the configuration for the workspace.

At a high level these are the elements that make up the repository configuration file.

<!DOCTYPE Repository PUBLIC "-//The Apache Software Foundation//DTD Jackrabbit 2.0//EN" "http://jackrabbit.apache.org/dtd/repository-2.0.dtd">
<Repository>
  <DataSources .../> 
  <FileSystem .../>
  <Security .../>
  <DataStore .../>
  <Workspaces .../>
  <Workspace> <!-- The settings here will be used as the blueprint for all new workspaces -->
    <FileSystem .../>
    <PersistenceManager .../>
    <SearchIndex .../>
    <WorkspaceSecurity .../>
  </Workspace>
  <Versioning .../>
</Repository>

The available configuration options are outlined in the API documentation for RepositoryConfig.

See: http://jackrabbit.apache.org/jcr/jackrabbit-configuration.html

Data Sources

DataSource Configuration

The data sources used by Jackrabbit are configured in the DataSources element, which is a child of the Repository element.

Typically you configure one DataSource per repository and each workspace will use the exact same data source connection.

from jackrabbit-bundle-mysql-search.xml
<DataSources>
  <DataSource name="magnolia">
    <param name="driver" value="com.mysql.jdbc.Driver" />
    <param name="url" value="jdbc:mysql://localhost:3306/magnolia" />
    <param name="user" value="root" />
    <param name="password" value="password" />
    <param name="databaseType" value="mysql"/>
    <param name="validationQuery" value="select 1"/>
  </DataSource>
</DataSources>

The available configuration options are outlined in the API documentation for DataSourceConfig.

  • driver: Depending which database you choose to work with make sure to include the jar with the appropriate driver in the classpath.
  • url: The connection url for the database.
  • user: The username for making the connection.
  • password: The password associated with the username.
  • databaseType: Options are postgresql, mysql, mssql, and oracle. Otherwise not required.
  • validationQuery: The SQL query used to validate connections from the pool before they are returned to the caller. The query depends on the database type; otherwise not required.
    • mysql: select 1
    • mssql: select 1
    • oracle: select 1 from dual
  • maxPoolSize: Restricts the number of connections in the pool to a maximum value. Not required.
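
As an illustration of the optional parameters listed above, a PostgreSQL data source with a capped connection pool might look like the sketch below. The host, port, database name, credentials, and pool size are placeholder assumptions, not values shipped with Magnolia, and the PostgreSQL driver jar must be on the classpath.

sketch: PostgreSQL DataSource with a pool limit (placeholder values)
<DataSources>
  <DataSource name="magnolia">
    <param name="driver" value="org.postgresql.Driver" />
    <!-- assumed host, port, and database name -->
    <param name="url" value="jdbc:postgresql://localhost:5432/magnolia" />
    <!-- assumed credentials -->
    <param name="user" value="magnolia" />
    <param name="password" value="password" />
    <param name="databaseType" value="postgresql"/>
    <param name="validationQuery" value="select 1"/>
    <!-- assumed limit; size it to what your database server allows -->
    <param name="maxPoolSize" value="20"/>
  </DataSource>
</DataSources>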

JNDI DataSource Configuration

Jackrabbit supports JNDI data sources. The container you use determines how you set up your JNDI data source.

See: https://wiki.apache.org/jackrabbit/UsingJNDIDataSource
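
On the Jackrabbit side, a JNDI lookup is typically expressed by giving a JNDI context class as the driver and the JNDI name as the url. The following is a minimal sketch under that assumption; the JNDI name java:comp/env/jdbc/magnolia is hypothetical and must match whatever resource your container actually defines.

sketch: DataSource resolved through JNDI (hypothetical JNDI name)
<DataSources>
  <DataSource name="magnolia">
    <!-- a JNDI context class in place of a JDBC driver class -->
    <param name="driver" value="javax.naming.InitialContext" />
    <!-- hypothetical JNDI name; defined in the container, not in this file -->
    <param name="url" value="java:comp/env/jdbc/magnolia" />
    <param name="databaseType" value="mysql"/>
  </DataSource>
</DataSources>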

Embedded Datasource

Jackrabbit provides persistence manager implementations for both the H2 and Derby databases. Using these databases does not require a DataSource configuration: you simply provide the connection URL in the persistence manager configuration. You do need to make sure that the database's jar file is on your classpath.
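
As a rough sketch of what this looks like for Derby at the workspace level, the persistence manager carries the JDBC URL itself and no DataSources element is involved. The database location under ${wsp.home} and the table prefix are assumptions for illustration; compare them with the Derby example file shipped in your bundle.

sketch: embedded Derby persistence manager (assumed path and prefix)
<PersistenceManager class="org.apache.jackrabbit.core.persistence.pool.DerbyPersistenceManager">
  <!-- the embedded database is created inside the workspace directory on first start -->
  <param name="url" value="jdbc:derby:${wsp.home}/db;create=true" />
  <param name="schemaObjectPrefix" value="${wsp.name}_" />
</PersistenceManager>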

FileSystem

This is the virtual file system used by the repository to store things like registered namespaces and node types.

from jackrabbit-bundle-mysql-search.xml
<FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
  <param name="path" value="${rep.home}/repository" />
</FileSystem>

Jackrabbit provides many FileSystem implementations. Choose the class that best fits your use case and follow the link below for its configuration options.

See: http://jackrabbit.apache.org/jcr/jackrabbit-configuration.html#file-system-configuration

Security

The security configuration element is used to specify authentication and authorization settings for the repository.

from jackrabbit-bundle-mysql-search.xml
<Security appName="magnolia">
  <SecurityManager class="org.apache.jackrabbit.core.DefaultSecurityManager"/>
  <AccessManager class="org.apache.jackrabbit.core.security.DefaultAccessManager">
  </AccessManager>
  <!-- The login module defined here is used by the repository to authenticate every request.
       It is not used by the webapp to authenticate users against the webapp context (that check has to pass before this module is invoked). -->
  <LoginModule class="info.magnolia.jaas.sp.jcr.JackrabbitAuthenticationModule">
  </LoginModule>
</Security>

Jackrabbit uses the Java Authentication and Authorization Service (JAAS) to authenticate users who try to access the repository. The appName parameter in the <Security/> element is used as the JAAS application name of the repository.

Once a user has been authenticated, Jackrabbit will use the configured AccessManager to control what parts of the repository content the user is allowed to access and modify.

Jackrabbit also ships alternative access managers. For example, the SimpleJBossAccessManager class is designed for use with the JBoss Application Server, where it maps JBoss roles to Jackrabbit permissions.

See: http://jackrabbit.apache.org/jcr/jackrabbit-configuration.html#security-configuration

DataStore

The data store is optionally used to store large binary values. Normally all node and property data is stored in a persistence manager, but for large binaries such as files special treatment can improve performance and reduce disk usage.   

The main features of the data store are:   

  • Space saving: only one copy per unique object is kept
  • Fast copy: only the identifier is copied  
  • Storing and reading does not block others  
  • Multiple repositories can use the same data store  
  • Objects in the data store are immutable  
  • Garbage collection is used to purge unused objects  
  • Hot backup is supported

File System DataStore

The file data store stores each binary in a file. The file name is the hash code of the content. When reading, the data is streamed directly from the file (no local or temporary copy of the file is created). The file data store does not use a local cache, which means content is read directly from the files as needed. New content is first stored in a temporary file and later renamed or moved to the right place.

from jackrabbit-bundle-mysql-search.xml
<DataStore class="org.apache.jackrabbit.core.data.FileDataStore">
  <param name="path" value="${rep.home}/repository/datastore"/>
  <param name="minRecordLength" value="1024"/> <!-- default is 100 bytes -->
</DataStore>

The available configuration options are outlined in the API documentation for FileDataStore.

See: https://wiki.apache.org/jackrabbit/DataStore#File_Data_Store

Database DataStore

The database data store stores data in a relational database. All content is stored in one table, and the unique key of the table is the hash code of the content. When reading, the data is either copied first to a temporary file on the server or streamed directly from the database, depending on the copyWhenReading setting. New content is first stored in the table under a unique temporary identifier, and later the key is updated to the hash of the content.

MySQL does not support sending very large binaries from the JDBC driver to the database. Therefore a database DataStore should be avoided when using MySQL.

<DataStore class="org.apache.jackrabbit.core.data.db.DbDataStore">
  <param name="url" value="java:jboss/datasources/jackrabbit"/> <!-- JNDI Datasource example -->
  <param name="driver" value="javax.naming.InitialContext"/>
  <param name="databaseType" value="oracle"/>
  <param name="schemaObjectPrefix" value="repo_ds_" />
</DataStore>

The available configuration options are outlined in the API documentation for DbDataStore.

See: https://wiki.apache.org/jackrabbit/DataStore#Database_Data_Store
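
For comparison with the JNDI example above, here is a sketch of a DbDataStore that connects over plain JDBC and sets copyWhenReading explicitly. The connection URL, credentials, and prefix are placeholder assumptions; the parameter names follow the Jackrabbit DataStore wiki page linked above.

sketch: DbDataStore over direct JDBC (placeholder connection values)
<DataStore class="org.apache.jackrabbit.core.data.db.DbDataStore">
  <param name="driver" value="org.postgresql.Driver"/>
  <!-- assumed host, port, database name, and credentials -->
  <param name="url" value="jdbc:postgresql://localhost:5432/magnolia"/>
  <param name="user" value="magnolia"/>
  <param name="password" value="password"/>
  <param name="databaseType" value="postgresql"/>
  <!-- true: copy each binary to a temporary file before reading; false: stream it from the database -->
  <param name="copyWhenReading" value="true"/>
  <param name="schemaObjectPrefix" value="repo_ds_"/>
</DataStore>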

Workspaces

The Workspaces element of the repository configuration specifies where and how the workspaces are managed. The configuration of this element gets stored in the class RepositoryConfig.

from jackrabbit-bundle-mysql-search.xml
<Workspaces rootPath="${rep.home}/workspaces" defaultWorkspace="default" />

The Workspaces element has the following configuration options, set as attributes of the element and not as param sub-elements.

  • rootPath: The native file system directory for workspaces. A subdirectory is automatically created for each workspace, and the path of that subdirectory can be used in the workspace configuration as the ${wsp.path} variable.
  • defaultWorkspace: Name of the default workspace. This workspace is automatically created when the repository is first started.
  • configRootPath: By default the configuration of each workspace is stored in a workspace.xml file within the workspace directory within the rootPath directory. If this option is specified, then the workspace configuration files are stored within the specified path in the virtual file system (see above) configured for the repository.
  • maxIdleTime: By default Jackrabbit only releases resources associated with an opened workspace when the entire repository is closed. This option, if specified, sets the maximum number of seconds that a workspace can remain unused before the workspace is automatically closed.
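
As a sketch of the optional attributes, the variant below closes workspaces that have been idle for an hour. The value 3600 is an arbitrary assumption chosen for illustration, not a Magnolia default; the other attributes are unchanged from the example above.

sketch: Workspaces element with an idle timeout (assumed value)
<!-- maxIdleTime is given in seconds -->
<Workspaces rootPath="${rep.home}/workspaces" defaultWorkspace="default" maxIdleTime="3600" />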

See: http://jackrabbit.apache.org/jcr/jackrabbit-configuration.html#workspace-configuration

Workspace

The configuration specified in the Workspace element becomes the template for all workspaces created by Jackrabbit. Each workspace will have its own workspace.xml file generated from this template. See Jackrabbit Workspace Configuration File for more information.

FileSystem

This is the workspace-level virtual file system, which is passed to the persistence manager and search index. The same configuration options are available here as described above for the repository-level virtual file system.

See: http://jackrabbit.apache.org/jcr/jackrabbit-configuration.html#file-system-configuration
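
A minimal sketch of a workspace-level FileSystem is shown below. It reuses the LocalFileSystem class from the repository-level example, but resolves its path against the workspace variable ${wsp.home} so that each workspace gets its own directory. The subdirectory name is an assumption; check the example file in your bundle for the value actually shipped.

sketch: workspace-level FileSystem (assumed subdirectory name)
<FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
  <!-- ${wsp.home} is expanded separately for every workspace created from the template -->
  <param name="path" value="${wsp.home}/default" />
</FileSystem>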

PersistenceManager

The persistence manager (PM) is an internal Jackrabbit component that handles the persistent storage of content nodes and properties. Property values are also stored in the persistence manager, with the exception of large binary values (those are usually kept in the DataStore). Each workspace of a Jackrabbit content repository uses a separate persistence manager to store the content in that workspace.

from jackrabbit-bundle-mysql-search.xml
<PersistenceManager class="org.apache.jackrabbit.core.persistence.pool.MySqlPersistenceManager">
  <param name="dataSourceName" value="magnolia"/>
  <param name="schemaObjectPrefix" value="pm_${wsp.name}_" />
</PersistenceManager>

Jackrabbit provides many PersistenceManager implementations. Choose the class that best fits your use case and follow the link below for its configuration options.

All BundlePersistenceManager implementations that do not use a pool of JDBC connections have been marked as deprecated. Replace them with the pooled version.

See: https://wiki.apache.org/jackrabbit/PersistenceManagerFAQ
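
To illustrate the pooled variants, here is a sketch of the PostgreSQL counterpart of the MySQL example above. It assumes a DataSource named magnolia is defined in the DataSources element with databaseType postgresql; the non-pooled classes live in the org.apache.jackrabbit.core.persistence.bundle package, while the pooled replacements live in org.apache.jackrabbit.core.persistence.pool.

sketch: pooled PostgreSQL persistence manager (assumes a DataSource named "magnolia")
<PersistenceManager class="org.apache.jackrabbit.core.persistence.pool.PostgreSQLPersistenceManager">
  <param name="dataSourceName" value="magnolia"/>
  <!-- the prefix keeps each workspace's tables apart within the shared database -->
  <param name="schemaObjectPrefix" value="pm_${wsp.name}_" />
</PersistenceManager>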

SearchIndex

The search index in Jackrabbit is pluggable and has a default implementation based on Apache Lucene. It is configured in the file workspace.xml once the workspace is created. For more detailed information on the settings here see Jackrabbit Workspace Configuration File and Search Index Configuration File.

from jackrabbit-bundle-mysql-search.xml
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <param name="path" value="${wsp.home}/index" />
  <!-- SearchIndex will get the indexing configuration from the classpath, if not found in the workspace home -->
  <param name="indexingConfiguration" value="/info/magnolia/jackrabbit/indexing_configuration.xml"/>
  <param name="useCompoundFile" value="true" />
  <param name="minMergeDocs" value="100" />
  <param name="volatileIdleTime" value="3" />
  <param name="maxMergeDocs" value="100000" />
  <param name="mergeFactor" value="10" />
  <param name="maxFieldLength" value="10000" />
  <param name="bufferSize" value="10" />
  <param name="cacheSize" value="1000" />
  <param name="forceConsistencyCheck" value="false" />
  <param name="autoRepair" value="true" />
  <param name="queryClass" value="org.apache.jackrabbit.core.query.QueryImpl" />
  <param name="respectDocumentOrder" value="true" />
  <param name="resultFetchSize" value="100" />
  <param name="extractorPoolSize" value="3" />
  <param name="extractorTimeout" value="100" />
  <param name="extractorBackLogSize" value="100" />
  <!-- needed to highlight the searched term -->
  <param name="supportHighlighting" value="true"/>
  <!-- custom provider for getting an HTML excerpt in a query result with rep:excerpt() -->
  <param name="excerptProviderClass" value="info.magnolia.jackrabbit.lucene.SearchHTMLExcerpt"/>
</SearchIndex>

See: http://wiki.apache.org/jackrabbit/Search

WorkspaceSecurity

Workspace security is handled by the class MagnoliaAccessProvider, a Magnolia-specific ACL provider. This class compiles the set of permissions a user has for a given workspace. It first checks whether the user is admin or superuser, in which case it returns an implementation of CompiledPermissions that grants everything. If the user does not have any permissions for the workspace, root-only access is returned.

The class MagnoliaAccessProvider has DEBUG output available that can be switched on using the Log Tools Module.

from jackrabbit-bundle-mysql-search.xml
<WorkspaceSecurity>
  <AccessControlProvider class="info.magnolia.cms.core.MagnoliaAccessProvider" />
</WorkspaceSecurity>

Versioning

FileSystem

This is the versioning-level virtual file system, which is passed to the persistence manager. The same configuration options are available here as described above for the repository-level virtual file system.

See: http://jackrabbit.apache.org/jcr/jackrabbit-configuration.html#file-system-configuration

PersistenceManager

Persistence configuration for the version store. The versioning configuration is much like workspace configuration as they are both used by Jackrabbit for storing content. The same configuration options are available here as described above for the workspace level persistence manager.

See: https://wiki.apache.org/jackrabbit/PersistenceManagerFAQ
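
Putting the two pieces together, a Versioning element for the MySQL setup used throughout this page might look like the sketch below. The paths and the version_ prefix are assumptions for illustration. Note that the version store is shared by the whole repository, so workspace variables such as ${wsp.name} are not meaningful here and a fixed prefix is used instead.

sketch: Versioning element (assumed paths and prefix)
<Versioning rootPath="${rep.home}/version">
  <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
    <param name="path" value="${rep.home}/workspaces/version" />
  </FileSystem>
  <PersistenceManager class="org.apache.jackrabbit.core.persistence.pool.MySqlPersistenceManager">
    <!-- reuses the DataSource named "magnolia" from the DataSources example -->
    <param name="dataSourceName" value="magnolia"/>
    <param name="schemaObjectPrefix" value="version_" />
  </PersistenceManager>
</Versioning>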

 

2 Comments

  1. An IMHO important setting is
    <param name="respectDocumentOrder" value="true"/>
    This will return nodes in the order as they are saved in the repository when queried without explicit ordering. In the H2 example file this parameter is not set

    1. Tom thank you for pointing that out. I'll fix it.