Huh, what's this about?
This page gives a quick example of how to set up Magnolia with Jackrabbit JCR writing to the database only (as far as possible).
The default setup for Magnolia uses the built-in Derby DB, as well as the filesystem, for storing the JCR content. The sample configurations provided for MySQL also put only part of the repository in the database. The remaining parts land on the file system, by default within Magnolia's webapp folder.
There are a number of disadvantages to this "mixed" setup:
- Mixing filesystem and DB creates two points of failure.
- Mixing filesystem and DB makes consistent backups more difficult. Basically, to guarantee a consistent backup, Magnolia has to be shut down.
- Filesystem-backed storage means the repository is not clusterable in Jackrabbit.
Switching to a database-only setup gets rid of these disadvantages.
These instructions have been tested with Magnolia 4.3 and 4.4.
Structure of JCR Repositories
Basically, a JCR repository can have one or more workspaces. Each workspace is what Magnolia calls a "repository": website, dms, data, imaging, config, etc.
Each workspace requires a number of different "Storage Backends" in JCR in order to store all the different data elements. These are:
- A "FileSystem" for content
- A "PersistanceManager" for content
- A "Datastore" for large content (blobs)
- A "FileSystem" for versions
- A "PersistanceManager" for versions
Also required is:
- A general "FileSystem" for the repository (all workspaces)
Confusingly, even though the element is called "FileSystem", both "FileSystem" and "PersistenceManager" can be configured to use a number of different backends: file-system based, database based, or others.
Our objective is to configure everything to use database-backed storage.
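As an orientation, the elements listed above map onto the following repository.xml skeleton (classes and parameters elided; the full example is given later in this article):

```xml
<Repository>
  <!-- general FileSystem for the repository (all workspaces) -->
  <FileSystem class="..."/>
  <Security appName="Jackrabbit"/>
  <!-- DataStore for large content (blobs) -->
  <DataStore class="..."/>
  <Workspaces rootPath="${rep.home}/workspaces" defaultWorkspace="default"/>
  <Workspace name="default">
    <!-- FileSystem and PersistenceManager for content -->
    <FileSystem class="..."/>
    <PersistenceManager class="..."/>
    <SearchIndex class="..."/>
  </Workspace>
  <Versioning rootPath="${rep.home}/version">
    <!-- FileSystem and PersistenceManager for versions -->
    <FileSystem class="..."/>
    <PersistenceManager class="..."/>
  </Versioning>
</Repository>
```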
So will everything be in the DB?
Unfortunately, the answer is "no". Even after configuring all storage to use database backends, Jackrabbit will still write the following into the repositories folder:
- a config file per workspace. If this file is missing, the workspace will be reinitialized, so make sure you don't delete these!!
- the search-indexes per workspace. These can be deleted any time, and will be recreated as needed.
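Put together, the repositories folder then ends up looking roughly like this (workspace names are just examples):

```
repositories/
  magnolia/
    workspaces/
      website/
        workspace.xml   <- per-workspace config file, do NOT delete
        index/          <- search index, safe to delete, will be rebuilt
      config/
        workspace.xml
        index/
```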
For more information, see the Jackrabbit Documentation and Wiki.
How to set it up?
Note: Your old repository will be gone, so make a backup of your content first!!
- Create a database and appropriate users on the database server of your choice. You will need a separate database for each Magnolia instance, e.g. one for author and one for public.
- Install appropriate JDBC drivers for your database in either WEB-INF/lib or TOMCAT_HOME/lib
- Create JNDI Datasource definitions in the web.xml, context.xml or server.xml files, see JNDI Datasources in the Apache Tomcat documentation for more details.
- Configure Jackrabbit as in the example file below. Your Jackrabbit config file goes in the folder TOMCAT_HOME/webapps/magnoliaAuthor/WEB-INF/config/repo-conf
- Configure Magnolia to use the new Jackrabbit configuration. Edit TOMCAT_HOME/webapps/magnoliaAuthor/WEB-INF/config/default/magnolia.properties and set the property "magnolia.repositories.jackrabbit.config", e.g. magnolia.repositories.jackrabbit.config=WEB-INF/config/repo-conf/jackrabbit-mysql.xml
- Configure the repository home dir (where Jackrabbit will still write a few config files and the indices) to lie outside Tomcat's webapps folder, e.g. magnolia.repositories.home=c:/dev/magnolia/repo-author/repositories
- Repeat the previous three steps for the public instance.
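Taken together, the relevant magnolia.properties entries for the author instance might look like this (the file name and paths are the examples used in this article, not defaults):

```properties
# Jackrabbit repository config, relative to the webapp root
magnolia.repositories.jackrabbit.config=WEB-INF/config/repo-conf/jackrabbit-mysql.xml
# Repository home outside Tomcat's webapps folder
magnolia.repositories.home=c:/dev/magnolia/repo-author/repositories
```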
See the following example JackRabbit config file:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Repository PUBLIC "-//The Apache Software Foundation//DTD Jackrabbit 1.5//EN"
    "http://jackrabbit.apache.org/dtd/repository-1.5.dtd">
<Repository>
    <FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
        <param name="driver" value="javax.naming.InitialContext"/>
        <param name="url" value="java:comp/env/jdbc/magnoliaAuthorDS"/>
        <param name="schema" value="mysql"/>
        <param name="schemaObjectPrefix" value="fsrep_"/>
    </FileSystem>
    <Security appName="Jackrabbit">
        <AccessManager class="org.apache.jackrabbit.core.security.SimpleAccessManager"></AccessManager>
        <LoginModule class="org.apache.jackrabbit.core.security.SimpleLoginModule">
            <param name="anonymousId" value="anonymous"/>
        </LoginModule>
    </Security>
    <DataStore class="org.apache.jackrabbit.core.data.db.DbDataStore">
        <param name="driver" value="javax.naming.InitialContext"/>
        <param name="url" value="java:comp/env/jdbc/magnoliaAuthorDS"/>
        <param name="databaseType" value="mysql"/>
        <param name="schemaObjectPrefix" value="ds_"/>
    </DataStore>
    <Workspaces rootPath="${rep.home}/workspaces" defaultWorkspace="default"/>
    <Workspace name="default">
        <FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
            <param name="driver" value="javax.naming.InitialContext"/>
            <param name="url" value="java:comp/env/jdbc/magnoliaAuthorDS"/>
            <param name="schema" value="mysql"/>
            <param name="schemaObjectPrefix" value="fsws_${wsp.name}_"/>
        </FileSystem>
        <PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.MySqlPersistenceManager">
            <param name="driver" value="javax.naming.InitialContext"/>
            <param name="url" value="java:comp/env/jdbc/magnoliaAuthorDS"/>
            <!-- warning: "schema" here is the database type, not the schema name -->
            <param name="schema" value="mysql"/>
            <param name="schemaObjectPrefix" value="pm_${wsp.name}_"/>
            <param name="externalBLOBs" value="false"/>
        </PersistenceManager>
        <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
            <param name="path" value="${wsp.home}/index"/>
            <param name="useCompoundFile" value="true"/>
            <param name="minMergeDocs" value="100"/>
            <param name="volatileIdleTime" value="3"/>
            <param name="maxMergeDocs" value="100000"/>
            <param name="mergeFactor" value="10"/>
            <param name="maxFieldLength" value="10000"/>
            <param name="bufferSize" value="10"/>
            <param name="cacheSize" value="1000"/>
            <param name="forceConsistencyCheck" value="false"/>
            <param name="autoRepair" value="true"/>
            <param name="analyzer" value="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
            <param name="queryClass" value="org.apache.jackrabbit.core.query.QueryImpl"/>
            <param name="respectDocumentOrder" value="true"/>
            <param name="resultFetchSize" value="2147483647"/>
            <param name="extractorPoolSize" value="3"/>
            <param name="extractorTimeout" value="100"/>
            <param name="extractorBackLogSize" value="100"/>
            <param name="textFilterClasses" value="org.apache.jackrabbit.extractor.MsWordTextExtractor, org.apache.jackrabbit.extractor.MsExcelTextExtractor, org.apache.jackrabbit.extractor.MsPowerPointTextExtractor, org.apache.jackrabbit.extractor.PdfTextExtractor, org.apache.jackrabbit.extractor.OpenOfficeTextExtractor, org.apache.jackrabbit.extractor.RTFTextExtractor, org.apache.jackrabbit.extractor.HTMLTextExtractor, org.apache.jackrabbit.extractor.PlainTextExtractor, org.apache.jackrabbit.extractor.XMLTextExtractor"/>
        </SearchIndex>
    </Workspace>
    <Versioning rootPath="${rep.home}/version">
        <FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
            <param name="driver" value="javax.naming.InitialContext"/>
            <param name="url" value="java:comp/env/jdbc/magnoliaAuthorDS"/>
            <param name="schema" value="mysql"/>
            <param name="schemaObjectPrefix" value="fsver_"/>
        </FileSystem>
        <PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.MySqlPersistenceManager">
            <param name="driver" value="javax.naming.InitialContext"/>
            <param name="url" value="java:comp/env/jdbc/magnoliaAuthorDS"/>
            <!-- warning: "schema" here is the database type, not the schema name -->
            <param name="schema" value="mysql"/>
            <param name="schemaObjectPrefix" value="version_"/>
            <param name="externalBLOBs" value="false"/>
        </PersistenceManager>
    </Versioning>
</Repository>
Things to note:
My JNDI datasource in this example is called "magnoliaAuthorDS". This is for the author instance. For the public instance, replace all occurrences of "magnoliaAuthorDS" with the JNDI name of your public instance datasource.
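For example, assuming the public datasource is registered under the JNDI name jdbc/magnoliaPublicDS (the name is an assumption; use whatever you defined in your container), every url param in the public instance's Jackrabbit config would read:

```xml
<param name="url" value="java:comp/env/jdbc/magnoliaPublicDS"/>
```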
Notes on MySQL
When using MySQL for Jackrabbit, you will probably need to raise the number of connections allowed by the database server to around 200, because Jackrabbit opens a lot of connections.
If you are using MySQL as the database and want to move the DataStore (blob storage) into the DB as well (as in the setup above), then you will need to configure MySQL to handle larger binary objects via JDBC. There are instructions for doing this on the Jackrabbit wiki as well as in the MySQL documentation, but for my version of MySQL (5.1) it was enough to add the following to my.cnf:
max_connections = 200
max_allowed_packet = 32M
The limit specified here (in my example 32M for 32 megabytes) will be a hard limit on the maximum file size you will be able to upload to the repository.
MySQL datasource definition
You can define the datasource in your servlet container in the normal way. A standard javax.sql.DataSource definition, as described in the Tomcat documentation, will work. However, Jackrabbit does not need the connection pooling offered by the standard datasource, and there is no easy way to disable the connection pool.
Some configurations could even be harmful: Jackrabbit keeps connections open for a very long time without using them, so if you have configured recovery of abandoned connections (removeAbandoned=true), the connection pool may "steal back" connections Jackrabbit is still using, mistakenly believing them to be abandoned.
To avoid this kind of thing from the outset you might consider configuring an unpooled datasource, for example as follows:
<Resource name="jdbc/magnoliaAuthorDS" auth="Container" type="com.mysql.jdbc.jdbc2.optional.MysqlDataSource" factory="com.mysql.jdbc.jdbc2.optional.MysqlDataSourceFactory" user="***" password="***" driverClassName="com.mysql.jdbc.Driver" explicitUrl="true" url="jdbc:mysql://localhost:3306/magnolia_author" />
You need to specify the type so that Tomcat does not try to instantiate its own pooled DataSource. Two other important differences:
- The user property: in Tomcat's regular DS, this is called username.
- explicitUrl needs to be set to true unless you configure all parameters explicitly outside the url (including the database name, which we don't do in this example).
Connection idle timeouts
In addition to the abandoned connection recovery at the Tomcat end of things, there are also timeouts for idle connections configured on the MySQL side.
This should not be a problem on a production server, where requests can be expected to come in at a somewhat constant rate. On development setups, where there may be no use of the system at all for a whole weekend, the connection-idle-timeout needs to be increased.
For mysql, add something like the following to your my.ini, and restart the server:
wait_timeout = 302400
interactive_timeout = 302400
20 Comments
Magnolia International
Richard, it would be useful if you could also share your DS configuration on the appserver side. I have one particular instance which somehow always seems to lock. Others (like documentation and forum) work perfectly fine with the same configuration, so this one's a bit bizarre, but I'd be curious to see other's configuration.
(in particular because DSs tend to do connection pooling, which Jackrabbit doesn't need - nor want, ideally)
Richard Unger
Hi Gregory!
I have included a sample datasource definition in the wiki-article above. Essentially, by using one of the "simple" datasource types (which are supplied with most JDBC driver packages) you can easily configure a datasource without pooling.
However, I don't think the pool is a problem in itself (it's just that JackRabbit does not use it, and never 'gives back' its connections, so the pool is a bit useless), unless you configure abandoned connection recovery. Since JackRabbit leaves its connection idle for what is sometimes a VERY long time, the pool would consider them abandoned and reclaim them, leading to problems. But with removeAbandoned=false (the default), there should be no problem with the pool.
However, we also see some "lock-ups" like you mention, but we ONLY ever see them when we shut down tomcat. In this case, the shutdown takes a very long time (>6 mins), and we see many error messages of the form:
2011-07-28 15:19:32,800 WARN org.apache.jackrabbit.core.fs.db.DatabaseFileSystem: execute failed, about to reconnect...
2011-07-28 15:19:42,907 WARN org.apache.jackrabbit.core.fs.db.DatabaseFileSystem: execute failed, about to reconnect...
2011-07-28 15:19:53,026 WARN org.apache.jackrabbit.core.fs.db.DatabaseFileSystem: execute failed, about to reconnect...
2011-07-28 15:20:03,254 WARN org.apache.jackrabbit.core.fs.db.DatabaseFileSystem: execute failed, about to reconnect...
Since we use this setup only in development, and the shutdown error messages do not seem to have any impact on the repositories, we have just been ignoring this.
It's just a hunch, but I think it has to do with the connections being idle for too long. By default, MySQL closes idle connections after 8 hours. I will try raising this value, and let you know what the effect is.
Magnolia International
Thanks !
Yeah, I've set up a longer idle time in MySQL - I think. I don't think I configured a DbFileSystem (your warnings above), so I'm getting different issues. I'll investigate further, but it might indeed be related to the pool "abandoning" connections (although that should also "work", given Jackrabbit's connection recovery "manager"...)
Ha, interesting detail though: I don't specify the mysql-specific type (I have type="javax.sql.DataSource") nor a factory in my DSs!
Richard Unger
Hi Gregory!
That's the difference between pooled and unpooled. Without the factory, and with type = javax.sql.DataSource (which is actually an interface), Tomcat has "default" logic to use Apache Commons DBCP to create a pooled connection.
By specifying the factory and type explicitly, I create a "simple" datasource for unpooled connections, using an implementation supplied with the JDBC driver. Most JDBC drivers have such "simple" datasource implementations, AFAIK.
Regards from Vienna,
Richard
Magnolia International
Brilliant, thanks !
Magnolia International
Wow, took me a while to figure out that the username property was in fact user for the MySQL datasource. Added a note about it.
Thanks again!
Richard Unger
Confirmed: the shutdown problems were due to connections that timed out at the MySQL end. Increasing the connection idle time in MySQL got rid of these errors.
Magnolia International
Regarding timeout settings, Jackrabbit has a connection recovery mechanism that should circumvent timed-out connections. I've had weird behavior in the past, in part due to using a pooled connection. It's always hard to say if it really works, since it logs warnings even when still attempting to recover the connection - do you have any insight? If it really works, it means 1) we should not use ?autoReconnect at the driver level, and 2) we should not set wait_timeout at the server level.
Edgar Vonk
Hi Richard (/Magnolia),
I only came across this very helpful article just now. I have some questions:
We currently use more or less default Magnolia MySQL setups, but do store the filesystem storage (repositories) outside of the webapp folder, even for local development using the default Derby database. Otherwise the repositories folder would be deleted (and recreated) for every new deployment of our Magnolia installation, which is not what we want (the deployable artifact being the complete WAR, which is the standard JEE deployment model). It seems Magnolia had a different deployment model in mind when they decided to place the filesystem storage inside the webapp folder by default: a model where you replace, add, or remove artifacts (like JARs) inside the webapp folder. I never really understood this. It is asking for problems (artifacts out of sync, for one thing) in my opinion.
cheers,
Edgar
Edgar Vonk
Ah sorry, I understand it better now. The Jackrabbit data store, where by default all JCR binaries larger than 1000 bytes are stored, is not a type of cache: it is the only place where these binaries are stored. If you lose the data store, you lose the binaries.
So your suggestion to keep the data store in the database instead of in the filesystem makes a lot of sense. The answer to my question #2 (how do you migrate) is, I think: first make a complete content export (e.g. using the Magnolia backup scripts), change the configuration, start with empty databases, and perform an import?
cheers, Edgar
Jan Haderka
#2 yes, export, clean install, import is the way to go.
For the other question: as you found out, it's not really about Magnolia 4.4 vs. 4.5, but about whether or not you use the datastore, which is possible in both (and configured by default in 4.5).
Edgar Vonk
Thanks Jan. And I understood from other posts that the performance gain of using the datastore is very large (as opposed to not using it). I assume this is also true when you store the datastore in a database rather than on the filesystem?
Jan Haderka
In general yes, but it really depends on how well your DB handles binary data and how fast the network connection between the app and DB server is (in case they are not on the same host).
Richard Unger
Hi,
To be quite clear: I'm not at all recommending moving the DataStore to a database for production setups, at least not if blobs of serious size are being stored. The DB based Datastore deals with DB Blobs, which are always problematic. Also, the time required to read/write large blobs to/from the DB will block up connections for a long time, completely changing Jackrabbit's "connection behavior" and causing lock-ups and other problems under load. For a cluster, use a shared FS for the Datastore in production, locking doesn't matter as it is append-only.
And really really don't do this in production with mysql.
That warning given, if you aren't storing large blobs, or for development or testing it can still be a useful setup.
Richard Unger
Hi Edgar,
Sorry, I left this unanswered a long time :-/ That post happened while I was on parental leave...
To answer your questions:
1) Yes, the filesystem should be backed up in a consistent state with the database. In an ideal world you would stop the Jackrabbit instance and perform the backup. In practice that can be problematic. If you use DB persistence, then most of the repository filesystem contents can be "regenerated" as you describe, but that isn't what I would call a "production ready" procedure that I would recommend as part of a backup strategy. And in any case, a filesystem-based DataStore needs to be backed up.
If online backup is required, you can also back up the DB (perhaps using snapshots, in a transaction, using locks, or some other way that gets you consistency) and then back up the filesystem-based DataStore at any point after the DB. Since the DataStore is append-only in normal operation, that might get you a few "extra" blobs compared to the DB state, but they don't hurt.
But backup is not the only reason to want DB-only repositories. Two others come to mind. JCR clusters: they need transactional persistence managers and filesystems, and it can be convenient to have the datastore in the DB for shared access. The other reason might be development purposes: it's easier to "swap out" repositories if all you need to do is change the DB.
2) Import / Export, as Jan wrote. There's no other way, to my knowledge.
3) Not directly in Magnolia 4.5, but we'll be migrating our config for this to Magnolia 5 soon. This is all at the Jackrabbit level, though, and should be transparent to Magnolia, really.
Hope you're doing well! Regards from Vienna,
Richard
Viet Nguyen
→ OK, configure 'useSimpleFSDirectory=false' and 'directoryManagerClass=org.apache.jackrabbit.core.query.lucene.directory.RAMDirectoryManager' in WEB-INF/config/repo-conf/jackrabbit-memory-search.xml as below to store all your indexes in memory (RAM):
Hope this is useful.
Viet
Training Participants - FullStack Developer
hallo Magnolia
I am working on migrating my CMS to a new instance and I get this error:
Caused by: java.io.FileNotFoundException: /srv/tomcat/tomcat_magnolia-public/repositories/magnolia/repository/datastore/bd/0d/48/bd0d4851b0a5a4911393b3953b0db98382aff8ec
How can I fix this? I think I need to delete all indexes from the Magnolia repository, what do you think?
Regards
DL
Richard Unger
Hi!
I would assume that rebuilding the index will NOT help in this situation. What is happening here is that JCR is not finding a blob that it expects to find in its DataStore.
The likely explanation is that you in some way "copied" a repository but forgot about the filesystem datastore, or moved it to the wrong location, or didn't mount its filesystem, or something along those lines...
Regards from Vienna,
Richard
Training Participants - FullStack Developer
Hi Richard
Thanks for your quick reply. I didn't move the filesystem.
I only changed the instance and set a new path for the new repositories.
I assumed that deploying the CMS would create a new datastore and new index.
Maybe I am wrong.
Best Regards from Switzerland
DL
Patrick Robinson
I tried to follow the most basic version of this and adapted the postgres docker magnolia image (https://github.com/magnolia-sre/magnolia-docker) to work with mysql instead. The goal would be to have a starting point for a stateless docker image with all data in MySQL, but I don't even get it to start properly in docker compose, which is virtually identical to the original, but I used JNDI and only referenced it from the repo.xml (and of course for filesystem and datastore in addition to just PM).
What I get is a "This implementation of ModuleManagerUI is only meant to be used at startup.". Supposedly that happens because something in the repository already exists, but I definitely start with an empty database, and as the point of the whole enterprise was to remove any changes on the filesystem at runtime, I definitely have no filesystem content beyond the starting point within Docker.
It does create various tables in MySQL, though many, but not all, are empty.