This is an implementation concept related to the UX design of Digital asset management
This page uses the terms asset and variant interchangeably. The term variation does not appear on this page but comes up frequently in discussions; it is understood to be synonymous with variant.
A rendition is the result of transforming an asset.
A rendition configuration is an instruction describing how to transform an asset.
The DAM is a crucial new part of 5.0. It replaces the DMS and adds a range of new features.
The DAM will have a concept of originals: the file you upload is kept as an original that you can revert an asset back to later on. Note that this is only relevant for large (binary) assets that can be altered within Magnolia. We are talking images, audio files and eventually video files (provided we have editors for these). For binaries that are altered outside the DAM, like Word docs, no AssetVariations exist. There may still be versioning on assets, which would allow reverting to a previous version of a binary if so configured. Variations and versions are two different concepts and might be applied simultaneously to an AssetOriginal. AssetVariations are created especially for images, where an original is often big, but the AssetVariations are smaller and use only part of the original. For a typical example, imagine a high-definition group shot of three persons where an author decides he needs three AssetVariants, each containing one of these persons. These variations are created based on the original but may be much smaller, as they are only needed at web resolution. That is also why the original's binary need not be duplicated for the AssetVariations.
The conceptual model consists of four parts.
Folders form a hierarchical structure with assets contained in folders.
An asset holds meta data and, if it has been modified, also binary data.
Every asset links to an original. Many assets can use the same original.
An asset is NOT a transformation of the original. We don't track editing operations made in order to recreate the variant from the original.
An original is always used by at least one asset. (BK: that's the wrong way around. It should say every original has a default asset, which is the one that a user interacts with.) (TM: an asset can be duplicated, resulting in an original being linked to by multiple assets)
When the last asset using a particular original is deleted the original is orphaned and deleted.
An original is associated with a provider.
An original can hold binary data. This is up to the provider being used. Files uploaded to Magnolia will have binary data stored. If the provider pulls binary data from an external source such as flickr there will not be binary data in the original.
There is always a node representing the original. Even when the binary data comes from an external source.
Providers are java classes responsible for providing the binary data of originals. They don't take part in the storage layout other than being referenced by name/type on the original.
The upload pool is a special folder where assets are stored when the user uploads a file and doesn't want to decide where in the tree it should be stored. The idea is that it can/should be moved later.
We've identified five different media types:
- Flash (anybody using Flash these days? )
- possibly many other types e.g. PDF, Slideshows (not PP but things like http://www.nytimes.com/slideshow/2013/01/24/travel/36-marin-slide-show.html?ref=travel ), video links (youtube…)
The feature set in DAM differs for different media types. Therefore it's of interest to highlight them in this concept.
By default, only documents are versioned. Versioning should be configurable per asset type. This could be postponed to a next version.
TM: I'm taking this from my meeting minutes. Is this final?
Renditions are very similar to the image transformations done in imaging today. A rendition is the result of transforming an asset. A common use of renditions is to serve assets in a size and form suitable for a specific template and/or device.
Notable use cases
When a user uploads a file to the DAM an original is created holding the binary data. The original is linked to the default built in provider. An asset is created linking to the original. The asset does not have binary data. The user sees the asset while the original is hidden.
When a user duplicates an asset a new asset is created. If the asset had been modified thereby holding binary data the new asset gets a copy of this binary data. The new asset links to the same original.
When a user deletes an asset the asset node is removed from the JCR. If the original it is linked to has no other assets using it the original is also removed.
When a user interacts with the DAM he browses a tree of folders and assets. Originals are not shown.
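The upload, duplicate and delete use cases above amount to a simple reference-counting rule between assets and originals. A minimal sketch in plain Java (the class and method names are illustrative, not Magnolia API; the real DAM would operate on JCR nodes):

```java
import java.util.*;

// Illustrative sketch of the asset/original lifecycle rules above.
// All names here are hypothetical, for demonstration only.
public class AssetStore {
    // original id -> ids of the assets linking to it
    private final Map<String, Set<String>> assetsByOriginal = new HashMap<>();
    // asset id -> id of the original it links to
    private final Map<String, String> originalByAsset = new HashMap<>();

    // Upload: an original is created and an asset is created linking to it.
    public void upload(String originalId, String assetId) {
        assetsByOriginal.computeIfAbsent(originalId, k -> new HashSet<>()).add(assetId);
        originalByAsset.put(assetId, originalId);
    }

    // Duplicate: the new asset links to the same original.
    public void duplicate(String assetId, String copyId) {
        upload(originalByAsset.get(assetId), copyId);
    }

    // Delete: remove the asset; if no other asset uses its original,
    // the original is orphaned and removed too. Returns true if the
    // original was removed along with the asset.
    public boolean delete(String assetId) {
        String originalId = originalByAsset.remove(assetId);
        Set<String> users = assetsByOriginal.get(originalId);
        users.remove(assetId);
        if (users.isEmpty()) {
            assetsByOriginal.remove(originalId);
            return true;
        }
        return false;
    }

    public boolean hasOriginal(String originalId) {
        return assetsByOriginal.containsKey(originalId);
    }
}
```

Deleting one of two duplicates keeps the shared original alive; deleting the last asset removes it as well.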
Alternative 1 - Separate workspaces
Assets in one workspace, originals in another.
The upload pool would be a special folder in the assets workspace.
Simplicity, originals never show up when navigating/enumerating the tree of assets
References will need to be by uuid stored as string only
Streaming the original requires supporting an additional workspace
Alternative 2 - Separated by path
Assets under /assets and originals under /originals
The upload pool would be a special folder under /assets.
References can be JCR properties of reference type
Navigating/enumerating assets must always start at the /assets node and never go to the root
The hierarchy of assets starts at level 1, creating a special case for the topmost folder; i.e. an asset shown at the root of the DAM is not at the root of the workspace. This needs to be taken into account in code accessing the workspace.
Alternative 3 - Originals in special path
Similar to alternative 2 but assets are kept directly under the root and originals are in a special folder under the root named mgnl:originals
The upload pool would be a special folder under the root.
Q: Would the folder also have a special type to make it easier to filter out? Possibly a mixin? We'll need something similar for the upload pool.
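In Alternative 3, browsing code must skip the reserved originals folder when enumerating children of the root. A minimal filtering sketch in plain Java (only the folder name mgnl:originals comes from the text; the helper itself is illustrative):

```java
import java.util.*;

// Sketch of Alternative 3's enumeration rule: assets live directly under
// the root, originals under a reserved folder that browsing must hide.
public class TreeFilter {
    // Name taken from the concept above; a special node type or mixin
    // could serve the same purpose (see the open question).
    static final String ORIGINALS_FOLDER = "mgnl:originals";

    // Children of the root that should be shown when browsing assets.
    public static List<String> visibleChildren(List<String> childNames) {
        List<String> visible = new ArrayList<>();
        for (String name : childNames) {
            if (!ORIGINALS_FOLDER.equals(name)) {
                visible.add(name);
            }
        }
        return visible;
    }
}
```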
Alternative 4 - Mixed in the same tree
Since an asset refers to exactly one original we could have a tree of folders and originals with the assets as sub-nodes of the original.
The upload pool would be a special folder under the root.
Finding the original given an asset is trivial
Detecting when the original is not used by any assets is trivial
Enumerating the assets using a particular original is trivial
Not ideal for the workbench component as it will need to hide the original but show the assets beneath it
Ordering assets will only be possible among the assets having the same original
Enumerating/navigating the assets will require entering each original in a folder and collecting the assets from within them
Uploading a new file into an asset will change its node path
Alternative 5 - Versions
One option would be to keep the original as a version of the asset node.
The upload pool would be a special folder under the root.
Saves a node
It's not possible to activate a version; a version is created as part of activation, after the node has been transferred
Versions are read-only, so it will be impossible to modify the original (should we want to)
Requires special version management for DAM workspace
Exports can never include both the current state and the original (first version)
Some customers must be able to delete _all_ versions, for instance pharma companies if they have sensitive information that must be destroyed
For some customers it's a legal requirement to retain versions, for instance in banking
Q: What is the id/name of an original? Is it important at all?
Q: How do we create a hierarchy of originals? We want to avoid a flat structure in JCR
From the jackrabbit wiki
The nt:file node type represents a file with some content. The jcr:created property inherited from nt:hierarchyNode contains the file creation time, while any other file metadata and the file contents can be found in the jcr:content child node.
nt:folder -> mgnl:folder
nt:hierarchyNode -> nt:file -> mgnl:asset
Has a property linking to the original.
Depending on the provider has a content node containing the binary data.
Can have any property on it representing its meta data.
nt:hierarchyNode -> nt:file -> mgnl:original
Identical to asset except it has no link to an original and does not keep meta data.
Potentially has a content node containing the binary data. Depends on the provider.
nt:resource -> mgnl:resource
Used for the content node under mgnl:asset and mgnl:original that holds the binary data.
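The node-type mapping above could be written in CND (Compact Node Type Definition) notation. This is only a sketch assuming the supertypes listed above; the property and child-node details are illustrative and would need refinement:

```
// Hypothetical CND sketch of the node types described above.

[mgnl:folder] > nt:folder

[mgnl:asset] > nt:file
  - mgnl:original (reference)      // link to the original
  - * (undefined)                  // arbitrary meta data properties
  + jcr:content (mgnl:resource)    // binary data, depending on the provider

[mgnl:original] > nt:file
  + jcr:content (mgnl:resource)    // binary data, depending on the provider

[mgnl:resource] > nt:resource
```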
Q: What steps will be necessary to support activation of these new node types. Simply configuration?
A: When activating an asset we will also activate the original if it hasn't already been activated or has been changed since then. We have this functionality already in the data module.
Activating only renditions (out of scope for 5.0)
We need to be aware that users will routinely upload large high resolution originals and then work on these to create variants that are of lower resolution due to both resizing and cropping. Consider the case of a 50 MB original. The user will upload it, possibly modify its asset, and a rendition of it is requested by the template. A rendition that very well might be a 120x300 pixel image. The only binary that is absolutely necessary on the public instance is the rendition.
This is different from how we do it today, where imaging generates images just-in-time and caches them for reuse.
To make this work we'd need to:
- have permanent storage of renditions (not just cache)
- generate renditions when an asset is added or changed
- activate also all renditions when the asset node is activated but not the binary data or the original
- regenerate renditions when the rendition configuration changes
- this makes all stored renditions for this rendition configuration outdated
- remove all renditions for a rendition configuration when it is removed
We could lazily generate the renditions on the author instance when they're needed, i.e. for display or for activation. On the public instance they would need to exist pre-generated always (since the original is not there to be used as a source).
We could store renditions as child nodes of an asset node.
If a user has 50 assets and a rendition configuration, all activated and present on the public instance, and then changes the rendition configuration and activates it, he probably expects that to have immediate effect and would not expect to also have to activate all 50 assets. Were we to automate this for him, we would have to generate renditions for all assets and push these.
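The bookkeeping implied above, permanent storage keyed by asset and rendition configuration, invalidation when a configuration changes, removal when it is removed, and lazy regeneration on the author instance, can be sketched as follows. This is a plain-Java illustration, not Magnolia API; all names are hypothetical:

```java
import java.util.*;
import java.util.function.Function;

// Sketch of the rendition storage rules above: renditions are stored
// permanently per (configuration, asset), become outdated when their
// configuration changes, and are regenerated lazily on the author instance.
public class RenditionStore {
    // config name -> (asset id -> rendition payload)
    private final Map<String, Map<String, String>> renditions = new HashMap<>();
    // config name -> asset ids whose stored rendition is outdated
    private final Map<String, Set<String>> outdated = new HashMap<>();

    private Set<String> outdatedSet(String config) {
        return outdated.computeIfAbsent(config, k -> new HashSet<>());
    }

    public void store(String config, String asset, String data) {
        renditions.computeIfAbsent(config, k -> new HashMap<>()).put(asset, data);
        outdatedSet(config).remove(asset);
    }

    // A changed configuration makes all stored renditions for it outdated.
    public void configChanged(String config) {
        outdatedSet(config).addAll(renditions.getOrDefault(config, Map.of()).keySet());
    }

    // A removed configuration drops all its renditions.
    public void configRemoved(String config) {
        renditions.remove(config);
        outdated.remove(config);
    }

    public boolean isCurrent(String config, String asset) {
        return renditions.getOrDefault(config, Map.of()).containsKey(asset)
                && !outdatedSet(config).contains(asset);
    }

    // Lazy author-instance lookup: regenerate when missing or outdated.
    public String get(String config, String asset, Function<String, String> generator) {
        if (!isCurrent(config, asset)) {
            store(config, asset, generator.apply(asset));
        }
        return renditions.get(config).get(asset);
    }
}
```

On the public instance only `store` and `configRemoved` would apply, since renditions must arrive pre-generated via activation.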
When we version an asset, and the asset does not hold its own binary data we must also version the original it links to. The versioned asset node must link to the versioned original node.
When we restore an asset, we must NOT also restore the original as it can be in use by other assets. Therefore restoring the asset means that it should get the binary data that the original had at the time the asset was versioned. Again, this is for an asset that does not hold its own binary data.
Q: We keep a limited number of versions. Therefore references between versioned nodes risk being broken as versions are dropped. How do we manage this?
A: Versioned assets will always refer to the current original node (not a version). This works since we never change an original.
The DAM will support streaming binary data from the DAM and generating links to both assets and originals.
When generating a link to an original its provider will be responsible for producing it. This will allow providers that don't store the binary data in the workspace to output links pointing elsewhere. To flickr for instance.
Links to assets, from the website for instance, will be in the form <provider id>:<asset id>. The provider id is a string identifying the provider and the asset id is used internally by the provider to identify a specific asset. We don't have requirements on the format of the asset id. For internal assets the asset id will be the uuid in the dam workspace.
Links to folders use the same format, <provider id>:<asset id>.
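Since the asset id format is up to the provider and may itself contain colons, parsing should split on the first colon only. A small sketch of the link format described above (the class is illustrative; only the `<provider id>:<asset id>` format comes from this page):

```java
// Sketch of the asset link format "<provider id>:<asset id>".
public class AssetLink {
    public final String providerId;
    public final String assetId;

    public AssetLink(String providerId, String assetId) {
        this.providerId = providerId;
        this.assetId = assetId;
    }

    // Split on the first ':' only; the remainder is opaque to us and
    // interpreted by the provider.
    public static AssetLink parse(String link) {
        int i = link.indexOf(':');
        if (i < 0) {
            throw new IllegalArgumentException("Not an asset link: " + link);
        }
        return new AssetLink(link.substring(0, i), link.substring(i + 1));
    }

    public String format() {
        return providerId + ":" + assetId;
    }
}
```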
Comparison to DMS
We want to save the assets differently than in the current DMS implementation: When the original DMS module was written, we were not able to create custom node types.
We want to simplify the clumsy current structure.
For integrations we want a structure that conforms to best practice and utilizes current JCR possibilities. This will help when integrating using for instance WebDAV and CMIS.
Migration from DMS
On the topic of migrating existing nodes in dms workspace:
Q: Should we rename the "document" node to "binary"?
Q: What should we do with "description_files" and its metadata node?
Using the usual Magnolia structure by extending mgnl:content
Using a flat mgnl:resource
- indexing simpler, faster
- but only one binary
- different than other content