Huh, what's this about?
This page gives a quick example for how to set up magnolia with Jackrabbit JCR writing to Database only (as far as possible).
The default setup for magnolia uses the built-in DerbyDB, as well as the filesystem for storing the JCR Content. The sample configurations provided for mysql also only put part of the repository in the Database. The remaining parts land on the file system, by default within the webapp folder of magnolia.
There are a number of disadvantages to this "mixed" setup:
- Mixing file-system and DB provides for 2 points of failure.
- Mixing file-system and DB makes consistent backups more difficult. Basically, to guarantee a consistent backup, magnolia has to be shut down.
- Filesystem-backed storage means Repository is not clusterable in Jackrabbit.
Switching to a database-only setup gets rid of these disadvantages.
These instructions have been tested with Magnolia 4.3 and 4.4.
Structure of JCR Repositories
Basically, a JCR repository can have one or more workspaces. Each workspace is what is called a "repository" in magnolia: website, dms, data, imaging, config, etc.
A Jackrabbit repository requires a number of different "storage backends" in order to store all the different data elements. Each workspace needs:
- A "FileSystem" for content
- A "PersistenceManager" for content
Also required, once for the whole repository (all workspaces), are:
- A general "FileSystem" for the repository
- A "DataStore" for large content (blobs)
- A "FileSystem" for versions
- A "PersistenceManager" for versions
Confusingly, even though the element is called "FileSystem", both "FileSystem" and "PersistenceManager" can be configured to use a number of different backends, either file-system based, database based, or others.
Our objective is to configure everything to use database-backed storage.
So will everything be in the DB?
Unfortunately, the answer is "no". Even after configuring all storage to use database-backends, Jackrabbit will still write the following into the repositories folder:
- a config file per workspace. If this file is missing, the workspace will be reinitialized, so make sure you don't delete these!!
- the search-indexes per workspace. These can be deleted any time, and will be recreated as needed.
For more information, see the Jackrabbit Documentation and Wiki.
How to set it up?
Note: Your old repository will be gone, so make a backup of your content first!!
1. Create a database and appropriate users on the database server of your choice. You will need a separate database for each magnolia instance, e.g. one for author and one for public.
2. Install appropriate JDBC drivers for your database in either WEB-INF/lib or TOMCAT_HOME/lib.
3. Create JNDI Datasource definitions in the web.xml, context.xml or server.xml files; see JNDI Datasources in the Apache Tomcat documentation for more details.
4. Configure Jackrabbit as in the example file below. Your Jackrabbit config file goes in the folder TOMCAT_HOME/webapps/magnoliaAuthor/WEB-INF/config/repo-conf.
5. Configure magnolia to use the new jackrabbit configuration. Edit TOMCAT_HOME/webapps/magnoliaAuthor/WEB-INF/config/default/magnolia.properties and set the property "magnolia.repositories.jackrabbit.config",
   e.g.: magnolia.repositories.jackrabbit.config=WEB-INF/config/repo-conf/jackrabbit-mysql.xml
6. Configure the repository home dir (where Jackrabbit will still write a few config files and the indices) to lie outside Tomcat's webapps folder,
   e.g.: magnolia.repositories.home=c:/dev/magnolia/repo-author/repositories
7. Repeat steps 4-6 for the public instance.
See the following example JackRabbit config file:
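A minimal sketch of what such a file can look like for MySQL follows. It is not the exact file used for the tested setup. It assumes the datasource is bound as "jdbc/magnoliaAuthorDS" (the "jdbc/" prefix is an assumption), and the schemaObjectPrefix values and the minRecordLength threshold are illustrative choices. The DOCTYPE declaration and the Security element are left out and should be copied from the mysql sample configuration shipped with your magnolia version.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: DOCTYPE and the Security element are omitted here;
     copy both unchanged from the mysql sample shipped with magnolia. -->
<Repository>
  <!-- General FileSystem for the repository (all workspaces) -->
  <FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
    <!-- With InitialContext as the driver, the url is treated as a JNDI name -->
    <param name="driver" value="javax.naming.InitialContext"/>
    <param name="url" value="java:comp/env/jdbc/magnoliaAuthorDS"/>
    <param name="schema" value="mysql"/>
    <param name="schemaObjectPrefix" value="fs_repo_"/>
  </FileSystem>

  <!-- DataStore for large binaries; configured once, at repository level -->
  <DataStore class="org.apache.jackrabbit.core.data.db.DbDataStore">
    <param name="driver" value="javax.naming.InitialContext"/>
    <param name="url" value="java:comp/env/jdbc/magnoliaAuthorDS"/>
    <param name="databaseType" value="mysql"/>
    <!-- Assumed threshold in bytes: smaller binaries are not put in the DataStore -->
    <param name="minRecordLength" value="1024"/>
  </DataStore>

  <Workspaces rootPath="${rep.home}/workspaces" defaultWorkspace="default"/>

  <!-- Template applied to every workspace (website, dms, data, config, ...) -->
  <Workspace name="default">
    <!-- FileSystem for content -->
    <FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
      <param name="driver" value="javax.naming.InitialContext"/>
      <param name="url" value="java:comp/env/jdbc/magnoliaAuthorDS"/>
      <param name="schema" value="mysql"/>
      <param name="schemaObjectPrefix" value="fs_${wsp.name}_"/>
    </FileSystem>
    <!-- PersistenceManager for content -->
    <PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.MySqlPersistenceManager">
      <param name="driver" value="javax.naming.InitialContext"/>
      <param name="url" value="java:comp/env/jdbc/magnoliaAuthorDS"/>
      <param name="schema" value="mysql"/>
      <param name="schemaObjectPrefix" value="pm_${wsp.name}_"/>
    </PersistenceManager>
    <!-- The search index still lives on the file system, below the repository home -->
    <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
      <param name="path" value="${wsp.home}/index"/>
    </SearchIndex>
  </Workspace>

  <Versioning rootPath="${rep.home}/version">
    <!-- FileSystem for versions -->
    <FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
      <param name="driver" value="javax.naming.InitialContext"/>
      <param name="url" value="java:comp/env/jdbc/magnoliaAuthorDS"/>
      <param name="schema" value="mysql"/>
      <param name="schemaObjectPrefix" value="fs_version_"/>
    </FileSystem>
    <!-- PersistenceManager for versions -->
    <PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.MySqlPersistenceManager">
      <param name="driver" value="javax.naming.InitialContext"/>
      <param name="url" value="java:comp/env/jdbc/magnoliaAuthorDS"/>
      <param name="schema" value="mysql"/>
      <param name="schemaObjectPrefix" value="pm_version_"/>
    </PersistenceManager>
  </Versioning>
</Repository>
```

Every backend points at the same JNDI datasource, and the distinct schemaObjectPrefix values keep the tables of the different backends apart within one database.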
Things to note:
My JNDI datasource in this example is called "magnoliaAuthorDS". This is for the author instance. For the public instance, replace all occurrences of "magnoliaAuthorDS" with the JNDI name of your public instance datasource.
Notes on MySQL
When using mysql for JackRabbit, you will probably need to raise the number of connections allowed by the database server (the max_connections setting) to about 200. JackRabbit opens a lot of connections.
If you are using MySQL as the Database, and want to move the Datastore (Blob storage) into the DB as well (as in the setup above) then you will need to configure mysql to handle larger binary objects via JDBC. There are instructions for doing this on the Jackrabbit wiki as well as in the mysql documentation, but for my version of mysql (5.1) it was enough to set max_allowed_packet=32M in the [mysqld] section of my.cnf.
The limit specified here (in my example 32M for 32 megabytes) will be a hard limit on the maximum file size you will be able to upload to the repository.
MySQL datasource definition
You can define the datasource in your servlet container in the normal way. A standard javax.sql.DataSource definition, as described in the tomcat documentation, will work. However, JackRabbit does not need the connection pooling offered by the standard datasource, and there is no easy way to disable the connection pool.
Some configurations could even be harmful, as JackRabbit keeps connections open for a very long time without using them. If you have configured recovery of abandoned connections (removeAbandoned=true), the connection pool may "steal back" connections JackRabbit is still using, mistakenly believing them to be abandoned.
To avoid this kind of thing from the outset you might consider configuring an unpooled datasource, for example as follows:
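A sketch of such a definition for Tomcat's context.xml, using the plain (unpooled) datasource class that ships with MySQL Connector/J 5.1. The resource name, host, database name and credentials are placeholders, not values from the tested setup.

```xml
<!-- Unpooled MySQL datasource; goes inside the <Context> element -->
<Resource name="jdbc/magnoliaAuthorDS"
          auth="Container"
          type="com.mysql.jdbc.jdbc2.optional.MysqlDataSource"
          factory="com.mysql.jdbc.jdbc2.optional.MysqlDataSourceFactory"
          explicitUrl="true"
          url="jdbc:mysql://localhost:3306/magnolia_author"
          user="magnolia"
          password="changeme"/>
```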
You need to specify the type so that Tomcat does not try to instantiate its own pooled DS.
Two other important differences:
- the user property: in Tomcat's regular DS, this is called username.
- explicitUrl needs to be set to true unless you configure all parameters explicitly outside the url (including database name, which we don't do in this example).
Connection idle timeouts
In addition to the abandoned connection recovery at the tomcat end of things, there are also timeouts for idle connections configured at the mysql side.
This should not be a problem on a production server, where requests can be expected to come in at a somewhat constant rate. On development setups, where there may be no use of the system at all for a whole weekend, the connection-idle-timeout needs to be increased.
For mysql, raise wait_timeout (given in seconds; the default of 28800 corresponds to 8 hours) in the [mysqld] section of your my.ini, and restart the server.

13 Comments
Jul 14, 2011
Grégory Joseph
Richard, it would be useful if you could also share your DS configuration on the appserver side. I have one particular instance which somehow always seems to lock. Others (like documentation and forum) work perfectly fine with the same configuration, so this one's a bit bizarre, but I'd be curious to see others' configurations.
(in particular because DS tend to do connection pooling, which JackRabbit doesn't need - nor want, ideally)
Jul 28, 2011
Richard Unger
Hi Gregory!
I have included a sample datasource definition in the wiki-article above. Essentially, by using one of the "simple" datasource types (which are supplied with most JDBC driver packages) you can easily configure a datasource without pooling.
However, I don't think the pool is a problem in itself (it's just that JackRabbit does not use it, and never 'gives back' its connections, so the pool is a bit useless), unless you configure abandoned connection recovery. Since JackRabbit leaves its connection idle for what is sometimes a VERY long time, the pool would consider them abandoned and reclaim them, leading to problems. But with removeAbandoned=false (the default), there should be no problem with the pool.
However, we also see some "lock-ups" like you mention, but we ONLY ever see them when we shut down tomcat. In this case, the shutdown takes a very long time (>6 mins), and we see many error messages of the form:
2011-07-28 15:19:32,800 WARN rg.apache.jackrabbit.core.fs.db.DatabaseFileSystem: execute failed, about to reconnect...
2011-07-28 15:19:42,907 WARN rg.apache.jackrabbit.core.fs.db.DatabaseFileSystem: execute failed, about to reconnect...
2011-07-28 15:19:53,026 WARN rg.apache.jackrabbit.core.fs.db.DatabaseFileSystem: execute failed, about to reconnect...
2011-07-28 15:20:03,254 WARN rg.apache.jackrabbit.core.fs.db.DatabaseFileSystem: execute failed, about to reconnect...
Since we use this setup only in development, and the shutdown error messages do not seem to have any impact on the repositories, we have just been ignoring this.
It's just a hunch, but I think it has to do with the connections being idle for too long. By default, MySQL closes idle connections after 8 hours. I will try raising this value, and let you know what the effect is.
Jul 28, 2011
Grégory Joseph
Thanks !
Yeah, i've setup longer idle time in mysql - i think. I don't think I configured a DbFS (your warnings above), so I'm getting different issues. I'll investigate further, but it might indeed be related to the pool "abandoning" connection (although that should also "work", given JackRabbit's connection recovery "manager" ...)
Ha, interesting detail though, I don't specify the mysql-specific type (I have type="javax.sql.DataSource") nor a factory in my DSs!
Aug 04, 2011
Richard Unger
Hi Gregory!
That's the difference between pooled and unpooled. Without the factory and with type = javax.sql.DataSource (which is actually an interface) tomcat has "default" logic to use apache commons DBCP to create a pooled connection.
By specifying the factory and type explicitly I create a "simple" datasource for unpooled connections, using an implementation supplied with the JDBC driver. Most JDBC drivers have such "simple" datasource implementations, AFAIK.
Regards from Vienna,
Richard
Aug 04, 2011
Grégory Joseph
Brilliant, thanks !
Sep 02, 2011
Grégory Joseph
Wow, took me a while to figure out that the username property was in fact user for the Mysql datasource. Added a note about it.
Thanks again !
Aug 04, 2011
Richard Unger
Confirmed: the shutdown problems were due to connections that timed out at the mysql end. Increasing the connection-idle time in mysql got rid of these errors.
Sep 12, 2011
Grégory Joseph
Regarding timeout settings, Jackrabbit has a connection recovery mechanism that should circumvent timed out connections. I've had weird behavior in the past, in part due to using a pooled connection. It's always hard to say if it really works, since it logs warnings even when still attempting to recover the connection - do you have any insight ? If it really works, it means 1) we should not use ?autoreconnect at the driver level 2) we should not set wait_timeout at server level.
Feb 12, 2013
Edgar Vonk
Hi Richard (/Magnolia),
I only came across this very helpful article just now. I have some questions:
We currently use more or less default Magnolia MySQL setups but do store the filesystem storage (repositories) outside of the web app folder. Even for local development using the default Derby database. Otherwise the repositories folder would be deleted (and recreated) for every new deployment of our Magnolia installation which is not what we want (the deployable artifact being the complete WAR; which is the standard JEE deployment model). It seems Magnolia has a different deployment model in mind when they decided to place the filesystem storage by default inside the web app folder. They seem to suggest a deployment model where you replace / add / remove artifacts (like JARs) inside the web app folder. I never really understood this. This is asking for problems (artifacts out of sync for one thing) in my opinion.
cheers,
Edgar
Feb 13, 2013
Edgar Vonk
Ah sorry, I understand it better now. The Jackrabbit data store where by default all (>1000 bytes) JCR binaries are stored is not a type of cache: it is the only place where these binaries are stored. If you lose the data store, you lose the binaries.
So your suggestion to keep the data store in the database instead of in the filesystem makes a lot of sense. The answer to my question #2 (how do you migrate) is I think: first make a complete content export (e.g. using the Magnolia backup scripts), change the configuration, start with empty databases and perform an import?
cheers, Edgar
Feb 14, 2013
Jan Haderka
#2 yes, export, clean install, import is the way to go.
For the other question, as you found out, it's not really about Magnolia 4.4 vs. 4.5 but about whether or not you use the datastore, which is possible in both (and configured by default in 4.5).
Feb 14, 2013
Edgar Vonk
Thanks Jan. And I understood from other posts that the performance gain of using the datastore is very large (as opposed to not using it). I assume that this is true also when you store the datastore in a database and not on the filesystem?
Feb 14, 2013
Jan Haderka
In general yes, but it really depends on how well your DB handles binary data and how fast the network connection between app and DB server is (in case they are not on the same host).