JR BackupTool was developed 2 years ago during Google's Summer of Code. It is currently not supported, but is readily available in the Apache JR repository. Using this tool should minimize the effort necessary to develop and maintain backup functionality in Magnolia.
Review the implementation of JR BackupTool and customize it for use from inside Magnolia to perform full backup and restore.
- documented at http://wiki.apache.org/jackrabbit/BackupTool
- will investigate workarounds and effort in making this hot pluggable
- single threaded sequential backup (i.e. node by node, workspace by workspace) - can be quite time consuming to run on big sites
- no backup of lucene indexes for now
- only full backup/restore for now, no partial backup/restore functionality
+ backup/restore includes versions
- even though there was a list of plans and tasks to further develop the tool, it appears to have been dormant since Aug 2006, when it was developed
- the URL has changed compared to the location mentioned in the documentation; it is now http://svn.apache.org/repos/asf/jackrabbit/sandbox/backup/
- other options include:
- the jcr-imp/exp tool http://svn.apache.org/repos/asf/jackrabbit/sandbox/jackrabbit-jcr-import-export-tool/
- the standalone JCR backup tool JeCARS http://sourceforge.net/projects/jecars/
- initial speed test: exporting our default webapp with samples takes about 3 minutes and produces a file of about 3MB (after some changes to the code this time was reduced to 1.5 minutes).
- to use the backup tool on a running Magnolia, two solutions are possible:
- either modify it to obtain the repo reference via JNDI (or by other means)
- or incorporate the backup tool in Magnolia and pass it a reference to the repo from MgnlContext
- needs further development when used with JR 1.4 (configured to use
- it is possible to run a backup from inside Magnolia while still serving requests, even though this is not officially supported by the tool. However, any writes that happen while the backup is running are not reflected in the backup, which could lead to inconsistencies, for example in referenced links. To ensure consistency of the backup, no nodes may be written to the repo while the backup is running.
- Currently it is impossible to restore versions due to the way JR handles some internal properties. While versions are restored during import, they are restored to the transient store only and removed on the session.save() operation.
- There are a number of bugs open in JR related to this issue: the jcr:created property is not honoured on restore, nodes are saved one by one, increasing load on the DB and slowing down the whole operation, etc.
- using maven 1 (converted to m2)
- done against JR 1.1 (doesn't compile against JR 1.3/1.4 (exceptions changed, fixed it))
- poor javadoc
- no problems are reported at runtime (e.g. misconfiguration); the tool just finishes silently
- extra code is needed to register custom prefixes for workspaces (the tool handles only custom node types)
- restoring of versions is broken, needs deeper investigation
No writes can happen while the backup is running, to ensure backup consistency. To do this we need to prevent writes from the following scenarios/operations:
- direct editing via admininterface
- editing of pages (adding, editing paragraphs)
- workflows (approving, rejecting items, comments)
- writeback (forum, polls)
- timed tasks (custom tasks created by customer if any)
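One possible way to enforce the "no writes during backup" rule for the operations listed above is a shared gate that every write path has to pass through. The sketch below is only an illustration under stated assumptions: BackupWriteGate, tryWrite and runBackup are hypothetical names, not part of Magnolia or JR, and it assumes that all of the write paths above can be routed through a single object in the same JVM. It uses a plain java.util.concurrent read/write lock: content writers share the read lock, the backup takes the exclusive write lock.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Hypothetical gate (not a Magnolia or JR class). Content writes share the
 * read lock, so they do not block each other; the backup takes the write
 * lock, so it runs only when no content write is in flight.
 */
final class BackupWriteGate {

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    /**
     * Runs the given repository write unless a backup is in progress.
     * Returns false without running it if the backup currently holds the gate,
     * so the caller can reject or postpone the change.
     */
    boolean tryWrite(Runnable repositoryWrite) {
        if (!lock.readLock().tryLock()) {
            return false; // backup running in another thread
        }
        try {
            repositoryWrite.run();
            return true;
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Waits for in-flight writes to finish, then runs the backup exclusively. */
    void runBackup(Runnable backup) {
        lock.writeLock().lock();
        try {
            backup.run();
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** True while some thread is running a backup through this gate. */
    boolean isBackupRunning() {
        return lock.isWriteLocked();
    }
}
```

With such a gate, page editing, workflow actions, writeback and timed tasks would each wrap their save in tryWrite(...) and report "backup in progress" when it returns false, while the backup itself would be started through runBackup(...). Whether rejecting writes is acceptable, or whether they should be queued instead, is a separate decision this sketch does not cover.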