This article helps customers copy or synchronize workspace content and configuration between Magnolia instances by explaining the relevant background and the steps involved.

Background information

In Magnolia CMS, content is stored mostly in workspaces, and some (or all) of those workspaces make up a repository. The repository is stored under the folder configured as "magnolia.repositories.home" in the Magnolia configuration file "WEB-INF/config/default/magnolia.properties". The repository configuration file is located through the "magnolia.repositories.config" configuration point, which lets customers keep the configuration file outside the content storage and protect it from unauthorized access. The workspace configuration template is set through the "magnolia.repositories.jackrabbit.config" configuration point and is named "jackrabbit-bundle-xxxx-default.xml", where "xxxx" is the back-end database the customer wants to use. Templates are provided for H2, MySQL, Postgres and Derby.
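
To check which locations an instance actually uses, the three configuration points above can be read from the properties file. The following is a minimal Python sketch and not part of Magnolia: the helper names `read_properties` and `repository_settings` are hypothetical, and the parser only handles simple `key=value` lines (no line continuations or escape sequences).

```python
from pathlib import Path

# Configuration points named in this article.
REPOSITORY_KEYS = (
    "magnolia.repositories.home",
    "magnolia.repositories.config",
    "magnolia.repositories.jackrabbit.config",
)


def read_properties(text):
    """Parse simple key=value lines; skip blank lines and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def repository_settings(properties_file):
    """Return the repository-related settings found in a magnolia.properties file.

    Keys that are not set in the file map to None.
    """
    text = Path(properties_file).read_text(encoding="utf-8")
    props = read_properties(text)
    return {key: props.get(key) for key in REPOSITORY_KEYS}
```

A call such as `repository_settings("WEB-INF/config/default/magnolia.properties")` returns the raw values. If a value contains a placeholder, this sketch does not resolve it.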

The initial configuration files "repositories.xml" and "jackrabbit-bundle-xxxx-default.xml" are templates that are only involved in generating the repository and its workspaces. At runtime, "repositories.xml" is still read, but "jackrabbit-bundle-xxxx-default.xml" is no longer involved. It is read once as a template to generate the detailed workspace configuration file in each generated workspace folder, following the pattern "<repository_home>/magnolia/workspaces/<workspace_name>/workspace.xml".

Changing "jackrabbit-bundle-xxxx-default.xml" at runtime has no effect on the files already generated from it. Changing the generated "workspace.xml" files, however, does affect the instance from the next restart onward, because these files are loaded at system startup.
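
Because the generated "workspace.xml" files are the ones that take effect, it can help to compare them between the source and target instances before a restart. The sketch below is an illustration in Python, with hypothetical helper names, and assumes each file has a `<PersistenceManager class="...">` element directly under its root element. It reports the persistence manager class configured for every workspace folder.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def persistence_manager_class(root):
    """Return the class attribute of the <PersistenceManager> child, or None."""
    element = root.find("PersistenceManager")
    return element.get("class") if element is not None else None


def report_workspaces(workspaces_dir):
    """Map each workspace folder name to its configured persistence manager class.

    workspaces_dir is the folder that contains one sub-folder per workspace,
    for example "<repository_home>/magnolia/workspaces".
    """
    report = {}
    for config in sorted(Path(workspaces_dir).glob("*/workspace.xml")):
        root = ET.parse(str(config)).getroot()
        report[config.parent.name] = persistence_manager_class(root)
    return report
```

Running `report_workspaces(...)` against both instances and comparing the two dictionaries shows which workspaces exist on only one side or use a different class.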

In addition, the Magnolia Content Types module lets you create new node types, workspaces and namespaces on the fly (see the module documentation for more information). This feature changes the repository configuration files and workspaces at runtime. Synchronizing such workspaces therefore needs extra care: besides the "real" content stored in the database, the related workspace configuration files must also be synchronized to keep the environment consistent and working properly.

Step-by-step guide

When a customer creates a new content type with its own namespace, node type and workspace, the following files and content need to be synchronized:

  1. Namespace definitions: under the "repository/namespaces" folder, the custom namespace registry ("ns_reg.properties") and its index ("ns_idx.properties") are stored as text files. Copy your custom entries to the target environment to synchronize them.
  2. Node type definitions: similarly, custom node type definitions are stored in the "custom_nodetypes.xml" file under the "repository/nodetypes" folder. This file is not regenerated unless you start up a clean Magnolia instance, so you have to merge the existing file in the target environment with the definitions from the source. Otherwise, an invalid node type error may occur when you access content of that node type after the data synchronization.
  3. Workspace configuration: the detailed configuration of each workspace is stored in "workspaces/<your_workspace_name>/workspace.xml". If you want to change the PersistenceManager class, SearchIndex class, excerptProviderClass or AccessControlProvider class, you need to change it in the configuration file of each workspace. The changed files are used at the next system startup.
  4. Index and lock: you can remove all files and folders under the "index" folder. The system regenerates the index by reindexing the whole workspace at the next restart, which ensures repository consistency and cleans up any out-of-sync index data. When synchronizing content, do not copy this folder between instances; clean it up in the target instance instead.
  5. The real content: the content is usually stored in the database tables you configured, whose names start with the prefix defined by "schemaObjectPrefix". The exact set of objects depends on the configured persistence manager; typically:
    1. pm_${wsp.name}_NODE (table)
    2. pm_${wsp.name}_NODE_IDX (index)
    3. pm_${wsp.name}_PROP (table)
    4. pm_${wsp.name}_PROP_IDX (index)
    5. pm_${wsp.name}_REFS (table)
    6. pm_${wsp.name}_REFS_IDX (index)
    7. pm_${wsp.name}_BINVAL (table)
    8. pm_${wsp.name}_BINVAL_IDX (index)
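
For the database step, it can be handy to expand the names above for a concrete workspace so they can be compared with what exists in the schema. The Python sketch below is illustrative only: the function names are hypothetical, the default prefix "pm_${wsp.name}_" is taken from the list above, and the suffixes are the ones listed in this article, so verify them against your own database before copying anything.

```python
# Suffixes as listed in this article; your persistence manager may use others.
TABLE_SUFFIXES = ("NODE", "PROP", "REFS", "BINVAL")


def expand_prefix(schema_object_prefix, workspace_name):
    """Replace the ${wsp.name} variable with a concrete workspace name."""
    return schema_object_prefix.replace("${wsp.name}", workspace_name)


def expected_objects(workspace_name, schema_object_prefix="pm_${wsp.name}_"):
    """Return the table and index names expected for one workspace."""
    prefix = expand_prefix(schema_object_prefix, workspace_name)
    names = []
    for suffix in TABLE_SUFFIXES:
        names.append(prefix + suffix)           # table
        names.append(prefix + suffix + "_IDX")  # index on that table
    return names
```

For a workspace named "website", `expected_objects("website")` starts with "pm_website_NODE" and "pm_website_NODE_IDX".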

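For the node type step, a read-only comparison can show which definitions the target is missing before you merge "custom_nodetypes.xml" by hand. This Python sketch uses hypothetical helper names and assumes the file consists of `<nodeType name="...">` elements. It deliberately does not write the file, since rewriting it with a generic XML library may alter its namespace declarations.

```python
import xml.etree.ElementTree as ET


def nodetype_names(xml_text):
    """Return the set of name attributes found on <nodeType> elements."""
    root = ET.fromstring(xml_text)
    return {el.get("name") for el in root.iter("nodeType") if el.get("name")}


def missing_in_target(source_xml, target_xml):
    """List node type names defined in the source file but not in the target."""
    return sorted(nodetype_names(source_xml) - nodetype_names(target_xml))
```

Any name this returns is a definition that would have to be merged into the target file before content of that node type is accessed there.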

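For the index step, the cleanup in the target instance can be scripted. The following Python sketch is one possible approach rather than a Magnolia tool: the function name and the `dry_run` switch are hypothetical, and it is meant to be run only while the target instance is stopped.

```python
import shutil
from pathlib import Path


def clear_index_folder(workspace_dir, dry_run=True):
    """Remove everything inside <workspace_dir>/index and return the affected paths.

    With dry_run=True (the default) nothing is deleted; the paths are only listed.
    """
    index_dir = Path(workspace_dir) / "index"
    if not index_dir.is_dir():
        return []
    entries = sorted(index_dir.iterdir())
    if not dry_run:
        for entry in entries:
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)  # sub-folder of the index
            else:
                entry.unlink()        # plain file or symlink
    return entries
```

Calling it first with the default `dry_run=True` lists what would be removed. The "index" folder itself is kept, so the system can repopulate it at the next startup.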
Repository configuration documentation

Workspace configuration documentation

Jackrabbit configuration wiki