This is an implementation concept related to the UX design of digital asset management (DAM).
Terminology
This page uses the terms asset and variant interchangeably. The term variation is not used as a separate concept here but comes up frequently in discussions; it is synonymous with variant.
A rendition is the result of transforming an asset.
A rendition configuration is an instruction for how to transform an asset.
Requirements
The DAM is a crucial new part of 5.0. It replaces the DMS and adds a range of new features.
The DAM will have a concept of originals: the file you upload is kept as an original to which an asset can later be reverted. This is only relevant for assets that are big (binaries) and can be altered within Magnolia, i.e. images, audio files and eventually video files (provided we have editors for these). For binaries that are altered outside the DAM, such as Word documents, no AssetVariations exist. There may be versioning on assets though, which would allow reverting to a previous version of a binary if so configured. Variations and versions are two different concepts and might be applied simultaneously to an AssetOriginal.
AssetVariations are created especially for images, where an original is often big but the AssetVariations are smaller and use only part of the original. As a typical example, imagine a high-definition group shot of three persons, where an author decides he needs three AssetVariations, each containing one of these persons. These variations are created from the original but may be much smaller, as they are only needed at web resolution. That is also why the original must not be duplicated for each AssetVariation.
Conceptual Model
The conceptual model consists of four parts.
Folder
Folders form a hierarchical structure with assets contained in folders.
Asset
An asset holds metadata and, if it has been modified, also contains binary data.
Every asset links to an original. Many assets can use the same original.
An asset is NOT a transformation of the original. We don't track editing operations made in order to recreate the variant from the original.
Original
An original is always used by at least one asset. (BK: that's the wrong way around. It should say every original has a default asset which is the one that a user interacts with.) (TM: an asset can be duplicated, resulting in an original being linked to by multiple assets)
When the last asset using a particular original is deleted, the original is orphaned and is deleted as well.
An original is associated with a provider.
An original can hold binary data. This is up to the provider being used. Files uploaded to Magnolia will have binary data stored. If the provider pulls binary data from an external source such as Flickr, there will not be binary data in the original.
There is always a node representing the original, even when the binary data comes from an external source.
Provider
Providers are Java classes responsible for providing the binary data of originals. They don't take part in the storage layout other than being referenced by name/type on the original.
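A minimal sketch of how such a provider contract could look, assuming an in-memory stand-in for storage. None of the names below (ProviderSketch, Provider, InternalProvider, ProviderRegistry, the provider name "internal") come from this page or from Magnolia; they are hypothetical and only illustrate that the original carries a provider name while the provider supplies the bytes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Sketch only: all type and method names here are hypothetical, not Magnolia API. */
final class ProviderSketch {

    /** Supplies the binary data of an original; takes no part in the storage layout. */
    interface Provider {
        /** Returns the bytes for the given original id, or empty if the id is unknown. */
        Optional<byte[]> read(String originalId);
    }

    /** Stand-in for a built-in provider that keeps uploaded bytes itself. */
    static final class InternalProvider implements Provider {
        private final Map<String, byte[]> store = new HashMap<>();

        void upload(String originalId, byte[] data) {
            store.put(originalId, data.clone());
        }

        @Override
        public Optional<byte[]> read(String originalId) {
            return Optional.ofNullable(store.get(originalId)).map(bytes -> bytes.clone());
        }
    }

    /** The original only records the provider's name, never the provider instance. */
    static final class Original {
        final String id;
        final String providerName;

        Original(String id, String providerName) {
            this.id = id;
            this.providerName = providerName;
        }
    }

    /** Resolves provider names found on originals to provider instances. */
    static final class ProviderRegistry {
        private final Map<String, Provider> providers = new HashMap<>();

        void register(String name, Provider provider) {
            providers.put(name, provider);
        }

        Optional<byte[]> binaryOf(Original original) {
            Provider provider = providers.get(original.providerName);
            if (provider == null) {
                throw new IllegalStateException("No provider named " + original.providerName);
            }
            return provider.read(original.id);
        }
    }
}
```

A provider for an external source would implement the same interface and fetch the bytes remotely instead of keeping them in a map.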
Upload pool
The upload pool is a special folder where assets are stored when the user uploads a file and doesn't want to decide where in the tree it should be stored. The idea is that it can/should be moved later.
Media types
We've identified five different media types:
- Images
- Audio
- Video
- Documents
- Flash (anybody using Flash these days?)
- possibly many other types e.g. PDF, Slideshows (not PP but things like http://www.nytimes.com/slideshow/2013/01/24/travel/36-marin-slide-show.html?ref=travel ), video links (youtube…)
The feature set in DAM differs for different media types. Therefore it's of interest to highlight them in this concept.
By default, only documents are versioned. Versioning should be configurable per asset type. This could be postponed to a next version.
TM: I'm taking this from my meeting minutes. Is this final?
Renditions
Renditions are very similar to the image transformations done in imaging today. A rendition is the result of transforming an asset. A common use of renditions is to serve assets in a size and form suitable for a specific template and/or device.
Notable use cases
When a user uploads a file to the DAM, an original is created holding the binary data. The original is linked to the default built-in provider. An asset is created linking to the original. The asset does not have binary data. The user sees the asset while the original is hidden.
When a user duplicates an asset, a new asset is created. If the asset has been modified and therefore holds binary data, the new asset gets a copy of this binary data. The new asset links to the same original.
When a user deletes an asset, the asset node is removed from the JCR. If the original it is linked to has no other assets using it, the original is also removed.
When a user interacts with the DAM, he browses a tree of folders and assets. Originals are not shown.
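The upload, duplicate and delete cases above can be sketched with plain objects. This is an illustration under stated assumptions, not the planned implementation: the class DamUseCases and its members are hypothetical, and in-memory lists stand in for JCR nodes.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: an in-memory stand-in for the JCR workspace; all names are hypothetical. */
final class DamUseCases {

    static final class Original {
        final byte[] data;

        Original(byte[] data) {
            this.data = data;
        }
    }

    static final class Asset {
        final Original original;
        byte[] modifiedData; // stays null until the asset has been edited

        Asset(Original original) {
            this.original = original;
        }

        /** Bytes the user sees: the asset's own data if modified, else the original's. */
        byte[] effectiveData() {
            return modifiedData != null ? modifiedData : original.data;
        }
    }

    private final List<Original> originals = new ArrayList<>();
    private final List<Asset> assets = new ArrayList<>();

    /** Upload: create an original holding the bytes and an asset without binary data. */
    Asset upload(byte[] data) {
        Original original = new Original(data.clone());
        originals.add(original);
        Asset asset = new Asset(original);
        assets.add(asset);
        return asset;
    }

    /** Duplicate: new asset on the same original; modified bytes are copied if present. */
    Asset duplicate(Asset source) {
        Asset copy = new Asset(source.original);
        if (source.modifiedData != null) {
            copy.modifiedData = source.modifiedData.clone();
        }
        assets.add(copy);
        return copy;
    }

    /** Delete: remove the asset, then the original if no other asset still uses it. */
    void delete(Asset asset) {
        assets.remove(asset);
        boolean stillUsed = assets.stream().anyMatch(a -> a.original == asset.original);
        if (!stillUsed) {
            originals.remove(asset.original);
        }
    }

    int originalCount() {
        return originals.size();
    }

    int assetCount() {
        return assets.size();
    }
}
```

Note that duplicating never copies the original itself, which matches the requirement that originals are not duplicated per variant.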
Workspace layout
Alternative 1 - Separate workspaces
Assets in one workspace, originals in another.
The upload pool would be a special folder in the assets workspace.
Pros
Simplicity, originals never show up when navigating/enumerating the tree of assets
Cons
References can only be stored as UUID strings (not as JCR properties of reference type)
Streaming the original requires supporting an additional workspace
Alternative 2 - Separated by path
Assets under /assets and originals under /originals
The upload pool would be a special folder under /assets.
Pros
References can be JCR properties of reference type
Cons
Navigating/enumerating assets must always start at the /assets node and never go to the root
The hierarchy of assets starts at level 1, creating a special case for the topmost folder. I.e. an asset shown in the root of the DAM is not in the root of the workspace. This needs to be taken into account in code accessing the workspace.
Alternative 3 - Originals in special path
Similar to alternative 2, but assets are kept directly under the root and originals are in a special folder under the root named mgnl:originals.
The upload pool would be a special folder under the root.
Q: Would the folder also have a special type to make it easier to filter out? Possibly a mixin? We'll need something similar for the upload pool.
A:
Alternative 4 - Mixed in the same tree
Since an asset refers to exactly one original, we could have a tree of folders and originals with the assets as sub nodes of the original.
The upload pool would be a special folder under the root.
Pros
Finding the original given an asset is trivial
Detecting when the original is not used by any assets is trivial
Enumerating the assets using a particular original is trivial
Cons
Not ideal for the workbench component as it will need to hide the original but show the assets beneath it
Ordering assets will only be possible among the assets having the same original
Enumerating/navigating the assets will require entering each original in a folder and collecting the assets from within them
Uploading a new file into an asset will change its node path
Alternative 5 - Versions
One option would be to keep the original as a version of the asset node.
The upload pool would be a special folder under the root.
Pros
Saves a node
One hierarchy
Cons
It's not possible to activate a version; a version is created as part of activation after the node has been transferred
Versions are read-only, so it will be impossible to modify the original (should we want to)
Requires special version management for the DAM workspace
Exports can never include both the current state and the original (first version)
Some customers must be able to delete _all_ versions, for instance pharma companies if they have sensitive information that must be destroyed
For some customers it's a legal requirement to retain versions, for instance in banking
Additionally
Q: What is the id/name of an original? Is it important at all?
Q: How do we create a hierarchy of originals? We want to avoid a flat structure in JCR
Node Types
As nt:file is a subtype of nt:hierarchyNode, adding sub nodes to a file should be possible. E.g. rich text content.
From the Jackrabbit wiki:
The nt:file node type represents a file with some content. The jcr:created property inherited from nt:hierarchyNode contains the file creation time, while any other file metadata and the file contents can be found in the jcr:content child node.
The jcr:content child node can be of any node type to allow maximum flexibility, but the nt:resource node type is the common choice for jcr:content nodes.
Folder
nt:folder -> mgnl:folder
Asset
nt:hierarchyNode -> nt:file -> mgnl:asset
Has a property linking to the original.
Depending on the provider, has a content node containing the binary data.
Can have any property on it representing its metadata.
Original
nt:hierarchyNode -> nt:file -> mgnl:original
Identical to an asset, except that it has no link to an original and does not keep metadata.
Potentially has a content node containing the binary data, depending on the provider.
Resource
nt:resource -> mgnl:resource
Used for the content node under mgnl:asset and mgnl:original that holds the binary data.
Activation
Q: What steps will be necessary to support activation of these new node types? Simply configuration?
A: When activating an asset we will also activate the original if it hasn't already been activated or has been changed since then. We already have this functionality in the data module.
Activating only renditions (out of scope for 5.0)
We need to be aware that users will routinely upload large high-resolution originals and then work on these to create variants of lower resolution through both resizing and cropping. Consider the case of a 50 MB original. The user will upload it, possibly modify its asset, and a rendition of it is requested by the template, a rendition that might very well be a 120x300 pixel image. The only binary that is absolutely necessary on the public instance is the rendition.
This is different to how we do it today where imaging generates images just-in-time and caches them for reuse.
To make this work we'd need to:
- have permanent storage of renditions (not just cache)
- generate renditions when an asset is added or changed
- also activate all renditions when the asset node is activated, but not the binary data or the original
- regenerate renditions when the rendition configuration changes, since a change makes all stored renditions for that rendition configuration outdated
- remove all renditions for a rendition configuration when it is removed
We could lazily generate the renditions on the author instance when they're needed, i.e. for display or for activation. On the public instance they would need to exist pre-generated always (since the original is not there to be used as a source).
We could store renditions as child nodes of an asset node.
Suppose a user has 50 assets and a rendition configuration, all activated and present on the public instance, and then changes the rendition configuration and activates it. He probably expects that to have immediate effect and would not expect to also have to activate all assets. Were we to automate this for him, we would have to generate renditions for all assets and push these.
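One way to detect outdated renditions is to record, with each stored rendition, the revision of the rendition configuration that produced it. The sketch below assumes exactly that; the revision counter, the class RenditionStore and the string-based "generation" are illustrative assumptions, not part of this concept. It covers lazy generation, invalidation on configuration change, and removal with the configuration.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch only: hypothetical names; "generating" a rendition is faked by building a string. */
final class RenditionStore {

    /** A stored rendition remembers which configuration revision produced it. */
    static final class Rendition {
        final String content;
        final int configRevision;

        Rendition(String content, int configRevision) {
            this.content = content;
            this.configRevision = configRevision;
        }
    }

    // Assumption: asset ids and configuration names never contain the '|' character.
    private final Map<String, String> configs = new HashMap<>();       // config name -> instruction
    private final Map<String, Integer> revisions = new HashMap<>();    // config name -> revision
    private final Map<String, Rendition> renditions = new HashMap<>(); // "asset|config" -> rendition
    int generated = 0; // counts how often a rendition was actually generated

    /** Adding or changing a configuration bumps its revision, outdating stored renditions. */
    void putConfig(String name, String instruction) {
        configs.put(name, instruction);
        revisions.merge(name, 1, Integer::sum);
    }

    /** Removing a configuration also removes all renditions generated for it. */
    void removeConfig(String name) {
        configs.remove(name);
        revisions.remove(name);
        renditions.keySet().removeIf(key -> key.endsWith("|" + name));
    }

    /** Lazily returns an up-to-date rendition, regenerating only if missing or outdated. */
    Rendition get(String assetId, String configName) {
        String instruction = configs.get(configName);
        if (instruction == null) {
            throw new IllegalArgumentException("Unknown rendition configuration: " + configName);
        }
        int revision = revisions.get(configName);
        String key = assetId + "|" + configName;
        Rendition stored = renditions.get(key);
        if (stored == null || stored.configRevision != revision) {
            stored = new Rendition(assetId + " @ " + instruction, revision);
            renditions.put(key, stored);
            generated++;
        }
        return stored;
    }

    int storedCount() {
        return renditions.size();
    }
}
```

This lazy approach fits the author instance only; on the public instance the renditions would still have to be pushed, since the original is not available there as a source.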
Versioning
When we version an asset and the asset does not hold its own binary data, we must also version the original it links to. The versioned asset node must link to the versioned original node.
When we restore an asset, we must NOT also restore the original as it can be in use by other assets. Therefore restoring the asset means that it should get the binary data that the original had at the time the asset was versioned. Again, this is for an asset that does not hold its own binary data.
Q: We keep a limited number of versions. Therefore references between versioned nodes risk being broken as versions are dropped. How do we manage this?
A: Versioned assets will always refer to the current original node (not a version). This works since we never change an original.
Linking
The DAM will support streaming binary data from the DAM and generating links to both assets and originals.
When generating a link to an original, its provider will be responsible for producing it. This will allow providers that don't store the binary data in the workspace to output links pointing elsewhere, to Flickr for instance.
Links to assets, from website for instance, will be in the form <provider id>:<asset id>. The provider id is a string identifying the provider and the asset id is used internally by the provider to identify a specific asset. We don't have requirements on the format of the asset id. For internal assets the asset id will be the UUID in the dam workspace.
Links to folders use the same format, <provider id>:<asset id>.
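A small sketch of parsing and formatting links of this form. Only the <provider id>:<asset id> shape is taken from the text above; the class name DamLink is hypothetical, and the rule that a provider id is non-empty and contains no colon is an assumption made here so that the split is unambiguous.

```java
/** Sketch only: DamLink is a hypothetical name, not an existing Magnolia class. */
final class DamLink {
    final String providerId;
    final String assetId;

    DamLink(String providerId, String assetId) {
        // Assumption: provider ids are non-empty and never contain a colon.
        if (providerId.isEmpty() || providerId.contains(":")) {
            throw new IllegalArgumentException("Invalid provider id: " + providerId);
        }
        this.providerId = providerId;
        this.assetId = assetId;
    }

    /** Splits at the first colon only, since the asset id format is up to the provider. */
    static DamLink parse(String link) {
        int separator = link.indexOf(':');
        if (separator <= 0) {
            throw new IllegalArgumentException("Expected <provider id>:<asset id> but got: " + link);
        }
        return new DamLink(link.substring(0, separator), link.substring(separator + 1));
    }

    @Override
    public String toString() {
        return providerId + ":" + assetId;
    }
}
```

Splitting at the first colon keeps the asset id opaque, so a provider remains free to use colons or slashes inside its own ids.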
Comparison to DMS
We want to save the assets differently than in the current DMS implementation: When the original DMS module was written, we were not able to create custom node types.
We want to simplify the clumsy current structure.
For integrations we want a structure that conforms to best practice and utilizes current JCR possibilities. This will help when integrating using for instance WebDAV and CMIS.
Migration from DMS
On the topic of migrating existing nodes in the dms workspace:
Q: Should we rename the "document" node to "binary"?
Q: What should we do with "description_files" and its metadata node?
Discarded Proposals
Node Types
Using the usual Magnolia structure by extending mgnl:content
- nt:base
  - mgnl:content
    - mgnl:page
    - mgnl:area
    - mgnl:component
    - mgnl:asset
Using a flat mgnl:resource
- mgnl:resource
  - mgnl:asset
Advantages:
- flat
- indexing simpler, faster
Disadvantages:
- but only one binary
- different than other content
9 Comments
Jan Haderka
Versioning is one of the things that would need to be tested and probably modified as well.
And we need to consider the impact of such a structure on exposing content via WebDAV.
For multivariant assets (i18n-ed, multiple sizes etc.) it might be possible to have an extra structure:
mgnl:folder
    mgnl:multiasset
        mgnl:asset
        mgnl:asset
        mgnl:asset
this way we could potentially group assets that should be presented as one within Magnolia, but still allow each variant to keep own set of properties and be indexed separately.
Boris Kraft
I think "asset" as a name for the variants is misleading. The original is also an asset. There are also assets that have no variant. I suggest being more specific and talking of an AssetOriginal and an AssetVariation.
Christopher Zimmermann
But couldn't we rather forget about the idea of variants? I think it makes it simpler. If you think of the OriginalAsset as a special object - (which it is - as it's not shown to user and not used on website) then we're left with just assets. Some assets point to an OriginalAsset, several assets can point to the same OriginalAsset, some assets point to no OriginalAsset (document).
Well, we can't totally drop the word variant - it's still useful to discuss the "children" of one OriginalAsset.
Christopher Zimmermann
To overcome the duplication of media problem: What if an OriginalAsset is ONLY created if someone edits the Asset?
Jan Haderka
+1 to Christopher's suggestion for creating OriginalAsset (or AssetOriginal???) only on demand.
BTW before we can start serving such modified audio/video files, please keep in mind that Magnolia CMS is not a streaming solution and doesn't provide any support for multicast or other features necessary to stream data en masse. 200 ppl listening to a song would fully occupy one public instance and not allow anything else to be served by that instance, or even anything to be activated to it! IMHO you should from the very beginning consider storing such modified binaries in outside storage, keeping only metadata about them in Magnolia, and using the streaming capabilities of that outside storage.
Boris Kraft
Fine with me if original is only created on demand. The whole point is that we need to be able to revert to original, must not waste storage, and in the UI abstract away the idea that there are originals and variants. The user may safely assume that pressing duplicate in fact does duplicate the asset, but technically, we do not duplicate the original.
Tobias Mattsson
On an API level I'd want there to always be an original. The implementation though could make sure that the original and an unmodified variant use the same binary content. Imo the variant and the original should both be nodes in JCR but an unmodified variant has no binary data.
Christopher Zimmermann
Probably obvious - but just to point it out: If we have an Original node with no binary data - and we want to use nt:file for originals - then we'd still have to have a jcr:content child node (because nt:file enforces it) - I guess with an empty binary.
Andreas Weder
Tobias has asked me to review this document. I've had a look at the first couple of chapters and have skipped the more technical ones. I've also had a look at your comments. So here are mine.