
The goal of this page is to provide information on how to switch the underlying database (Derby is used by default). The procedure differs between a freshly installed Magnolia CMS instance and an already started one.

A newer version of this page is available. See Jackrabbit Repository Configuration File, which is based on Jackrabbit 2.8.

 

Introduction

Apache Jackrabbit is a fully featured content repository that implements the entire JCR API (JCR stands for the Content Repository for Java Technology API, a standard interface for accessing content repositories). Jackrabbit uses one connection per workspace and keeps it open as long as the repository is open.

A persistence manager (PM) is an internal Jackrabbit component that handles the persistent storage of content nodes and properties. Each workspace of a Jackrabbit content repository uses a separate persistence manager to store the content of that workspace, and the Jackrabbit version handler uses yet another separate persistence manager. The persistence manager sits at the very bottom layer of the Jackrabbit system architecture. Reliability, integrity and performance of the PM are crucial to the overall stability and performance of the repository. If, for example, the data that a PM is based upon is allowed to change through external means, the integrity of the repository is at risk (think of referential integrity and node references).

In practice, a persistence manager is any Java class that implements the PersistenceManager interface and the associated behavioral contracts. Jackrabbit contains a set of built-in persistence manager classes that cover most of the deployment needs. There are also a few contributed persistence managers that give additional flexibility.
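
For orientation, the sketch below shows where a persistence manager is declared inside a Jackrabbit workspace definition. This is only an illustration: the class shown is the Derby bundle PM that Magnolia uses by default, the parameter values are placeholders, the exact package may differ between Jackrabbit versions, and the FileSystem and SearchIndex elements are omitted.

<Workspace name="${wsp.name}">
    <!-- FileSystem and SearchIndex elements omitted for brevity -->
    <PersistenceManager class="org.apache.jackrabbit.core.persistence.pool.DerbyPersistenceManager">
        <!-- illustrative values; the database-specific examples below show the full parameter set -->
        <param name="url" value="jdbc:derby:${wsp.home}/db;create=true"/>
        <param name="schemaObjectPrefix" value="${wsp.name}_"/>
    </PersistenceManager>
</Workspace>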

As of Jackrabbit 1.3.x, some changes to the persistence managers have been introduced.
Use of the org.apache.jackrabbit.core.state.db.*PersistenceManager classes has been deprecated and two new types of managers have been introduced:

  • Bundle PM (org.apache.jackrabbit.core.persistence.bundle.*), itself deprecated in later Jackrabbit versions.
  • Pooled PM (org.apache.jackrabbit.core.persistence.pool.*), available since Jackrabbit 2.0.

Magnolia ships with the DerbyPersistenceManager as the default. Since Magnolia 3.6 (which ships with Jackrabbit 1.4), the bundle version is used. If you would like to use another database as a backend, or simply a different PM, you need to follow the procedure below. It is the same for all provided PMs; you only have to specify the PM that covers your needs. The following guide describes only the author instance configuration; do the same for your public instance(s).

  • Install and configure your database on your server and create an empty database such as magnolia_author (or similar) on it.
    • Most relational databases can be used as a persistence mechanism today. Specifically, we have used MySQL, DB2, Berkeley DB, Derby, Oracle, MS SQL, PostgreSQL, Ingres DB and various other databases. If you use MySQL, select the InnoDB storage engine. The MyISAM engine is not supported by Magnolia since it is not transactional and thus produces unnecessary issues.
  • Take a fresh, not yet started instance, remove the current derby-x.jar and put your database driver JAR into the magnoliaAuthor/WEB-INF/lib folder (e.g. the MySQL Connector/J JAR for MySQL).
  • Go to magnoliaAuthor/WEB-INF/config/repo-conf and check whether there is an appropriate Jackrabbit DB persistence manager configuration file corresponding to your database (note that different Magnolia versions provide different sets of these files). For a bundle PM you need jackrabbit-bundle-your_db-search.xml, for a simple PM jackrabbit-your_db-search.xml (if the PM config file is not present, just copy one of the existing files and rename it according to your database). Then edit it as follows (examples for bundle PMs):
    • MySQL

      <PersistenceManager class="org.apache.jackrabbit.core.persistence.pool.MySqlPersistenceManager">
            <param name="driver" value="com.mysql.jdbc.Driver" />
            <param name="url" value="jdbc:mysql://<database-host>:3306/magnolia_author" />
            <param name="databaseType" value="mysql" />
            <param name="user" value="user" />
            <param name="password" value="pwd" />
            <param name="schemaObjectPrefix" value="${wsp.name}_" /><!-- read notes below to set this value properly -->
            <param name="externalBLOBs" value="false" />
      </PersistenceManager>
      
    • Oracle

      <PersistenceManager class="org.apache.jackrabbit.core.persistence.pool.OraclePersistenceManager">
            <param name="driver" value="oracle.jdbc.OracleDriver"/>
            <param name="url" value="jdbc:oracle:thin:@<database-host>:1521:magnolia_author"/>
            <param name="databaseType" value="oracle" />
            <param name="user" value="myuser"/>
            <param name="password" value="mypassword"/>
            <param name="schemaObjectPrefix" value="${wsp.name}_"/><!-- read notes below to set this value properly -->
            <param name="externalBLOBs" value="false"/>
      </PersistenceManager>
      
    • MS SQL

      <PersistenceManager class="org.apache.jackrabbit.core.persistence.pool.MSSqlPersistenceManager">
            <param name="driver" value="com.microsoft.sqlserver.jdbc.SQLServerDriver" />
            <param name="url" value="jdbc:sqlserver://<database-host>:1433;DatabaseName=magnolia_author" />
            <param name="databaseType" value="mssql" />
            <param name="user" value="user" />
            <param name="password" value="pwd" />
            <param name="schemaObjectPrefix" value="${wsp.name}_" /><!-- read notes below to set this value properly -->
            <param name="externalBLOBs" value="false" />
      </PersistenceManager>
      
    • Azure DB

      <PersistenceManager class="org.apache.jackrabbit.core.persistence.pool.MSSqlPersistenceManager">
            <param name="driver" value="com.microsoft.sqlserver.jdbc.SQLServerDriver" />
            <param name="url" value="jdbc:sqlserver://<database-host>:1433;DatabaseName=magnolia_author" />
            <param name="schema" value="azure" />
            <param name="user" value="user" />
            <param name="password" value="pwd" />
            <param name="schemaObjectPrefix" value="${wsp.name}_" /><!-- read notes below to set this value properly -->
            <param name="externalBLOBs" value="false" />
      </PersistenceManager>
    • PostgreSQL

      <PersistenceManager class="org.apache.jackrabbit.core.persistence.pool.PostgreSQLPersistenceManager">
            <param name="driver" value="org.postgresql.Driver"/>
            <param name="url" value="jdbc:postgresql://<database-host>:5432/magnolia_author"/>
            <param name="databaseType" value="postgresql"/>
            <param name="user" value="user"/>
            <param name="password" value="pwd"/>
            <param name="schemaObjectPrefix" value="${wsp.name}_"/><!-- read notes below to set this value properly -->
            <param name="externalBLOBs" value="false"/>
      </PersistenceManager>
      
  • Don't forget that you have to specify this PM config twice in your XML: once for the workspaces and once for versioning. The same connection is used for multiple workspaces and therefore requires the dynamic prefix ${wsp.name}_, while the versioning PM requires the static prefix version_. This is the only difference between the two definitions (see the sketch after this list):
<param name="schemaObjectPrefix" value="version_"/>
  • Don't forget to change the user and password parameter values to your actual database username and password.
  • If you would like to use a simple PM, just update the class attribute with the appropriate value.
  • If you need to run multiple instances of Magnolia against the same database schema, you can add an additional prefix to the schemaObjectPrefix values of both PM definitions (e.g. auth_${wsp.name}_ and auth_version_). Be aware of your database's table name length limit.
  • In-memory and file-system based PMs are also supported.
  • You can also store your BLOBs directly in the file system by setting externalBLOBs to true. The drawback is that if you want to back up your system, you have to back up both the database and the file system, and do so at the same time to keep them in sync (no missing files). On the other hand, in certain cases this speeds up work with BLOBs and brings other benefits.
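
For reference, a complete versioning PM definition might look like the sketch below. It assumes the MySQL bundle PM and the same connection parameters as the workspace example above; only the schemaObjectPrefix changes to the static version_ prefix.

<Versioning rootPath="${rep.home}/version">
    <!-- FileSystem and other elements omitted for brevity -->
    <PersistenceManager class="org.apache.jackrabbit.core.persistence.pool.MySqlPersistenceManager">
        <param name="driver" value="com.mysql.jdbc.Driver" />
        <param name="url" value="jdbc:mysql://<database-host>:3306/magnolia_author" />
        <param name="databaseType" value="mysql" />
        <param name="user" value="user" />
        <param name="password" value="pwd" />
        <!-- static prefix instead of the dynamic ${wsp.name}_ used for workspaces -->
        <param name="schemaObjectPrefix" value="version_" />
        <param name="externalBLOBs" value="false" />
    </PersistenceManager>
</Versioning>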

 

  • Go to magnoliaAuthor/WEB-INF/config/default/magnolia.properties and update the magnolia.repositories.jackrabbit.config property to point to the config file you configured in the previous step (e.g. jackrabbit-bundle-mysql-search.xml).
  • Start up Magnolia and check the logs.

Already-started Magnolia CMS

Switching between simple and bundle PMs causes errors because they use incompatible storage structures (tables and indexes). There are multiple ways to migrate your content:

  • Set up a new author instance using the appropriate database and activate all content to it, then set up a new public instance and activate the content from your new author instance to the new public instance.
  • Do the same as above, but instead of activating content, export the content from the old instances and import it into the new ones using JSP scripts.
  • Export/import your data from the instance with the simple PM to the instance with the bundle PM using the Tools: Export/Import features.
  • Use the backup module (available since Magnolia 3.6 EE) to create a backup of your current environment. After the backup is made (it creates a bunch of gz files with all your data), you have to unzip repoConf.gz, modify the workspace config files to match the workspace configuration of your new instance, and zip it again. After that you can use the script from the Magnolia tools to restore the backup over your new MySQL-backed instance. Be aware that running the backup will likely take a few hours if your repository is really big, as it has to collect all item states, references and the complete version history.
  • The packager allows you to collect whatever you need (content, config, scripts, repositories etc.) into one package which is then easy to transfer.
  • Migrate by Groovy script.

The main difference between the approaches above is that only the backup module preserves the version history. On the other hand, if the history is not critical, the other approaches allow you to clean up the instance, since you can control what will and will not be migrated to the new instance. Not copying the version history also means your new database will consume significantly less space than the old one.

Additional notes

  • More about Jackrabbit's PMs can be found, for example, here.
  • For MySQL you also need to increase the max_allowed_packet size: by default the MySQL server packet limit is set to 1MB, which is way too small for the Jackrabbit bundle persistence manager. See the appropriate MySQL reference.
    • You can set this property in the MySQL server configuration file my.cnf (Linux) or my.ini (Windows), e.g. max_allowed_packet=16M.

DataStore

Another concept related to data storage that Jackrabbit introduces is the so-called DataStore (available since Jackrabbit 1.4, which ships with Magnolia 3.6). It can be used to emulate a file system in databases that support such functionality (e.g. Oracle). If you want to take advantage of Jackrabbit's DataStore feature, you have to migrate your repository; modifying your existing repository configuration to add the DataStore feature without migrating will result in the loss of BLOBs. For more information please see the official DataStore documentation.
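
For illustration, a database-backed DataStore can be configured with Jackrabbit's DbDataStore. The snippet below is a minimal sketch assuming a MySQL connection; the driver, url, user and password values are placeholders and must match your setup.

<DataStore class="org.apache.jackrabbit.core.data.db.DbDataStore">
    <param name="driver" value="com.mysql.jdbc.Driver"/>
    <param name="url" value="jdbc:mysql://<database-host>:3306/magnolia_author"/>
    <param name="user" value="user"/>
    <param name="password" value="pwd"/>
    <param name="databaseType" value="mysql"/>
    <!-- binaries smaller than this (bytes) are stored inline with the node data instead of in the data store -->
    <param name="minRecordLength" value="1024"/>
    <param name="maxConnections" value="3"/>
    <param name="copyWhenReading" value="true"/>
</DataStore>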

Data storage and backup

By default, BLOBs are not stored in the database when they exceed a certain threshold defined in your Jackrabbit configuration; instead they are saved on the file system. The default threshold used by a Magnolia installation is 1024 bytes. All files above the defined threshold are put onto the file system and not into the database.

Example from a default Magnolia MySQL configuration using a FileDataStore:

<DataStore class="org.apache.jackrabbit.core.data.FileDataStore">
    <param name="path" value="${rep.home}/repository/datastore"/>
    <param name="minRecordLength" value="1024"/>
</DataStore>

You can find more information about this topic in the Jackrabbit documentation.

So unless you have changed the configuration to save everything to the database, you also have to back up the file system to be sure not to lose any data!

 

 

8 Comments

    • Using autoReconnect=true in connection URLs is discouraged by both the MySQL documentation and Jackrabbit developers; Jackrabbit bundle PMs have connection recovery mechanisms (since JR 1.4, but not before; see this issue).
    • externalBLOBs: setting it to true will store binary data on the file system instead of in the database.
    • Remember to change the database name (url) for your public instance(s), especially if they are on the same server.
  1. Atlassian also did some interesting research about the autoReconnect parameter of MySQL's driver, which I accidentally bumped into while setting up our Jira: http://jira.atlassian.com/browse/JRA-15731
    --> when using a DataSource, one needs to double-check which version of dbcp (the connection pool) is used! Some versions don't behave so well!

  2. Hey, how about updating this for JR2.2 with <DataSource> elements ?

  3. This page may be of interest in relation to Jackrabbit repository configuration: Database-Only Repositories with JNDI Datasources

  4. And this page may be of interest if you're having "too many open files" troubles with Derby: H2 database configuration

  5. Does anyone have experience with setting the validationQueryTimeout property and its effect on the responsiveness of Magnolia when many / all connections have timed out?

    We're running into issues with an Oracle JDBC setup, getting long waiting times for shutdown / login after a period of inactivity on a system.

    1. It's the SQL query that will be used to validate connections from the pool before returning them to the caller. The query depends on the database type. For Oracle it's select 1 from dual.

      As far as its effect on responsiveness goes, I'd have to say that depends on the environment. Running the query select 1 from dual should be instantaneous. If you are having problems with responsiveness, I'd investigate whether the network is causing a bottleneck.

      HTH