
This is an implementation concept related to the UX design of Digital Asset Management (DAM).

 

Terminology

This page uses the terms asset and variant interchangeably. The term variation is not used here, but it comes up frequently in discussions; it should be understood as a synonym for variant.

A rendition is the result of transforming an asset.

A rendition configuration is an instruction describing how to transform an asset.

Requirements

The DAM is a crucial new part of Magnolia 5.0. It replaces the DMS and adds a range of new features.

The DAM will have a concept of originals: the file you upload is kept as an original that you can later revert an asset back to. Note that this only applies to large binary assets that can be edited within Magnolia: images, audio files and eventually video files (provided we have editors for these). For binaries that are edited outside the DAM, such as Word documents, no AssetVariations exist. There may still be versioning on these assets, which would allow reverting to a previous version of a binary if so configured. Variations and versions are two different concepts and might be applied simultaneously to an AssetOriginal.

AssetVariations are created especially for images, where an original is often big, while the AssetVariations are smaller and use only part of the original. As a typical example, imagine a high-definition group shot of three people, where an author decides he needs three AssetVariants, each containing one of these people. These variations are created from the original but may be much smaller, as they are only needed at web resolution. This is also why the original must not be duplicated for each AssetVariation.

Conceptual Model

The conceptual model consists of four parts.

Folder

Folders form a hierarchical structure with assets contained in folders.

Asset

An asset holds metadata and, if it has been modified, also contains binary data.

Every asset links to an original. Many assets can use the same original.

An asset is NOT a transformation of the original. We don't record the editing operations that would be needed to recreate the variant from the original.
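A minimal sketch of the asset/original relationship described above, using plain Python dataclasses as a stand-in for JCR nodes. The names `Original`, `Asset` and `effective_binary` are hypothetical and not Magnolia API. It shows that an unmodified asset falls back to its original's binary data, while a modified asset carries its own.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Original:
    """Hidden node backing one or more assets."""
    id: str
    provider: str                   # provider referenced by name
    binary: Optional[bytes] = None  # None when an external provider serves the data


@dataclass
class Asset:
    """What the user sees, browses and edits."""
    id: str
    original_id: str                # every asset links to exactly one original
    metadata: Dict[str, str] = field(default_factory=dict)
    binary: Optional[bytes] = None  # set only once the asset has been modified


def effective_binary(asset: Asset, originals: Dict[str, Original]) -> Optional[bytes]:
    """Binary data to serve for an asset: its own if modified, else the original's.

    Returns None for an unmodified asset whose original comes from an external
    provider; the provider would then have to supply the data.
    """
    if asset.binary is not None:
        return asset.binary
    return originals[asset.original_id].binary
```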

Original

An original is always used by at least one asset. (BK: that's the wrong way around. It should say every original has a default asset, which is the one that a user interacts with.) (TM: an asset can be duplicated, resulting in an original being linked to by multiple assets.)

When the last asset using a particular original is deleted, the original becomes orphaned and is deleted.

An original is associated with a provider.

An original can hold binary data; this is up to the provider being used. Files uploaded to Magnolia will have their binary data stored. If the provider pulls binary data from an external source such as Flickr, there will be no binary data in the original.

There is always a node representing the original, even when the binary data comes from an external source.

Provider

Providers are Java classes responsible for providing the binary data of originals. They don't take part in the storage layout, other than being referenced by name/type on the original.
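To illustrate the provider contract, here is a hedged sketch of a provider interface with a built-in provider that reads binary data stored on the original and an external one that only knows a remote URL. The `Provider` class, the provider names `builtin` and `external`, the `/dam/<id>` link path and the dict-based original are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional

# An original is modelled as a plain dict here; in Magnolia it would be a JCR node.
OriginalNode = Dict[str, Any]


class Provider(ABC):
    """Hypothetical provider contract: supplies binary data and links for originals."""

    name = "abstract"

    @abstractmethod
    def get_binary(self, original: OriginalNode) -> Optional[bytes]:
        """Return the binary data, or None if it is not stored in the workspace."""

    @abstractmethod
    def link(self, original: OriginalNode) -> str:
        """Return a URL from which the original can be fetched."""


class StoredBinaryProvider(Provider):
    """Default provider for uploads: the binary is stored on the original node."""

    name = "builtin"

    def get_binary(self, original: OriginalNode) -> Optional[bytes]:
        return original["binary"]

    def link(self, original: OriginalNode) -> str:
        return f"/dam/{original['id']}"  # hypothetical streaming path


class ExternalUrlProvider(Provider):
    """External provider (Flickr-like): the original only records a remote URL."""

    name = "external"

    def get_binary(self, original: OriginalNode) -> Optional[bytes]:
        return None  # nothing is stored in the workspace

    def link(self, original: OriginalNode) -> str:
        return original["url"]


PROVIDERS: Dict[str, Provider] = {
    p.name: p for p in (StoredBinaryProvider(), ExternalUrlProvider())
}


def provider_for(original: OriginalNode) -> Provider:
    """Look up the provider referenced by name on the original."""
    return PROVIDERS[original["provider"]]
```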

Upload pool

The upload pool is a special folder where assets are stored when the user uploads a file and doesn't want to decide where in the tree it should go. The idea is that the asset can, and should, be moved later.

Media types

We've identified five different media types:

The DAM feature set differs between media types, which is why they are highlighted in this concept.

By default, only documents are versioned. Versioning should be configurable per asset type, although this could be postponed to a later version.

(warning) TM: I'm taking this from my meeting minutes. Is this final?
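Per-asset-type versioning could be as simple as a set of defaults plus overrides. This is only a sketch: `is_versioned` and the override mapping are hypothetical, and only the default for documents comes from this page; any other media type name used with it, such as `image`, is illustrative.

```python
from typing import Mapping, Optional

# Only the "document" default comes from this page; everything else is configurable.
DEFAULT_VERSIONED = frozenset({"document"})


def is_versioned(media_type: str, overrides: Optional[Mapping[str, bool]] = None) -> bool:
    """Return whether assets of media_type are versioned.

    An explicit per-type override wins; otherwise only documents are versioned.
    """
    if overrides is not None and media_type in overrides:
        return overrides[media_type]
    return media_type in DEFAULT_VERSIONED
```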

Renditions

Renditions are very similar to the image transformations done by the Imaging module today. A rendition is the result of transforming an asset. A common use of renditions is to serve assets in a size and form suitable for a specific template and/or device.
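As an illustration of a rendition configuration, the sketch below assumes a configuration that fits an image into a maximum width and height while keeping its aspect ratio. `RenditionConfig` and `rendition_size` are hypothetical names, and fit-inside scaling without upscaling is just one possible transformation; only the resulting dimensions are computed, not the pixels.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RenditionConfig:
    """Hypothetical rendition configuration: fit the image inside a box."""
    name: str
    max_width: int
    max_height: int


def rendition_size(width: int, height: int, cfg: RenditionConfig) -> Tuple[int, int]:
    """Scale (width, height) down to fit cfg's box, keeping the aspect ratio.

    Images already smaller than the box are left unchanged (no upscaling).
    """
    scale = min(cfg.max_width / width, cfg.max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))
```

With a 120x300 configuration, a 4000x3000 original yields a 120x90 rendition, while a 100x50 image is left at its original size.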

Notable use cases

When a user uploads a file to the DAM, an original is created holding the binary data. The original is linked to the default built-in provider. An asset is created that links to the original. The asset does not have binary data. The user sees the asset, while the original is hidden.

When a user duplicates an asset, a new asset is created. If the asset has been modified, and therefore holds binary data, the new asset gets a copy of this binary data. The new asset links to the same original.

When a user deletes an asset, the asset node is removed from the JCR. If no other asset uses the original it is linked to, the original is also removed.

When a user interacts with the DAM he browses a tree of folders and assets. Originals are not shown.
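The four use cases above can be modelled with a small in-memory store. This is a sketch under stated assumptions: `Dam`, its methods and the `builtin` provider name are hypothetical, dicts stand in for the JCR workspace, and a linear scan stands in for whatever query or reference lookup a real implementation would use to find other assets sharing an original.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Original:
    id: str
    provider: str
    binary: Optional[bytes]


@dataclass
class Asset:
    id: str
    name: str
    original_id: str
    binary: Optional[bytes] = None  # set only once the asset has been modified


@dataclass
class Dam:
    """In-memory stand-in for the DAM workspace."""
    assets: Dict[str, Asset] = field(default_factory=dict)
    originals: Dict[str, Original] = field(default_factory=dict)

    def upload(self, name: str, data: bytes) -> Asset:
        # The uploaded binary goes into a new original linked to the built-in
        # provider; the new asset links to it and holds no binary data itself.
        original = Original(str(uuid.uuid4()), "builtin", data)
        self.originals[original.id] = original
        asset = Asset(str(uuid.uuid4()), name, original.id)
        self.assets[asset.id] = asset
        return asset

    def duplicate(self, asset_id: str) -> Asset:
        # A new asset linking to the same original; a modified asset's binary
        # data goes along with the copy (bytes are immutable, so sharing is safe).
        src = self.assets[asset_id]
        dup = Asset(str(uuid.uuid4()), src.name, src.original_id, src.binary)
        self.assets[dup.id] = dup
        return dup

    def delete(self, asset_id: str) -> None:
        # Remove the asset, then the original if no remaining asset uses it.
        removed = self.assets.pop(asset_id)
        still_used = any(a.original_id == removed.original_id for a in self.assets.values())
        if not still_used:
            del self.originals[removed.original_id]

    def browse(self) -> List[str]:
        # Users only ever see assets; originals are never listed.
        return sorted(a.name for a in self.assets.values())
```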

 

Workspace layout

Alternative 1 - Separate workspaces

Assets in one workspace, originals in another.

The upload pool would be a special folder in the assets workspace.

Pros

(plus) Simplicity: originals never show up when navigating/enumerating the tree of assets

Cons

(minus) References will have to be UUIDs stored as strings, since JCR reference properties can only point to nodes in the same workspace

(minus) Streaming the original requires supporting an additional workspace

Alternative 2 - Separated by path

Assets under /assets and originals under /originals

The upload pool would be a special folder under /assets.

Pros

(plus) References can be JCR properties of reference type

Cons

(minus) Navigating/enumerating assets must always start at the /assets node and never go to the root

(minus) The hierarchy of assets starts at level 1, creating a special case for the topmost folder: an asset shown in the root of the DAM is not in the root of the workspace. This needs to be taken into account in code accessing the workspace.
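Alternative 2's special case amounts to a path translation layer between what the DAM shows and what is stored in the workspace. A minimal sketch, assuming the `/assets` root named above; `to_workspace_path` and `to_dam_path` are hypothetical helper names.

```python
ASSETS_ROOT = "/assets"  # Alternative 2: originals live under /originals instead


def to_workspace_path(dam_path: str) -> str:
    """Map a path as shown in the DAM (whose root is '/') to the workspace path."""
    if not dam_path.startswith("/"):
        raise ValueError(f"expected an absolute DAM path: {dam_path!r}")
    return ASSETS_ROOT + dam_path.rstrip("/")


def to_dam_path(workspace_path: str) -> str:
    """Inverse mapping; nodes outside /assets (e.g. originals) are rejected."""
    if workspace_path == ASSETS_ROOT:
        return "/"
    if not workspace_path.startswith(ASSETS_ROOT + "/"):
        raise ValueError(f"not an asset path: {workspace_path!r}")
    return workspace_path[len(ASSETS_ROOT):]
```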

Alternative 3 - Originals in special path

Similar to alternative 2, but assets are kept directly under the root and originals are in a special folder under the root named mgnl:originals.

The upload pool would be a special folder under the root.

Q: Would the folder also have a special type to make it easier to filter out? Possibly a mixin? We'll need something similar for the upload pool.

A:

Alternative 4 - Mixed in the same tree

Since an asset refers to exactly one original, we could have a tree of folders and originals, with the assets as subnodes of their original.

The upload pool would be a special folder under the root.

Pros

(plus) Finding the original given an asset is trivial

(plus) Detecting when the original is not used by any assets is trivial

(plus) Enumerating the assets using a particular original is trivial

Cons

(minus) Not ideal for the workbench component as it will need to hide the original but show the assets beneath it

(minus) Ordering assets will only be possible among the assets having the same original

(minus) Enumerating/navigating the assets will require entering each original in a folder and collecting the assets from within them

(minus) Uploading a new file into an asset will change its node path

Alternative 5 - Versions

One option would be to keep the original as a version of the asset node.

The upload pool would be a special folder under the root.

Pros

(plus) Saves a node

 (plus) One hierarchy

Cons

(minus) It's not possible to activate a version; a version is created as part of activation, after the node has been transferred

(minus) Versions are read-only, so it will be impossible to modify the original (should we want to)

(minus) Requires special version management for DAM workspace

(minus) Exports can never include both the current state and the original (the first version)

(minus) Some customers must be able to delete all versions, for instance pharmaceutical companies holding sensitive information that must be destroyed

(minus) For some customers it's a legal requirement to retain versions, for instance in banking

Additionally

Q: What is the id/name of an original? Is it important at all?

Q: How do we create a hierarchy of originals? We want to avoid a flat structure in JCR

 

Node Types

We want to keep the structure as close as possible to the standard JCR types. This will ease future integration, for example with WebDAV, CMIS and possibly also with customer-generated JCR data.

 
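Adding subnodes to a file, e.g. rich text content, should be possible. This does not follow from nt:file being a subtype of nt:hierarchyNode alone: the standard nt:file definition only allows a single jcr:content child, so our own subtype (mgnl:asset) has to declare any additional child node definitions.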


From the Jackrabbit wiki:

The nt:file node type represents a file with some content. The jcr:created property inherited from nt:hierarchyNode contains the file creation time, while any other file metadata and the file contents can be found in the jcr:content child node.  

The jcr:content child node can be of any node type to allow maximum flexibility, but the nt:resource node type is the common choice for jcr:content nodes.

 

http://wiki.apache.org/jackrabbit/nt%3Afile

http://wiki.apache.org/jackrabbit/nt%3Aresource

Folder

nt:folder -> mgnl:folder

Asset

nt:hierarchyNode -> nt:file -> mgnl:asset

Has a property linking to the original.

Depending on the provider has a content node containing the binary data.

Can have any property on it representing its metadata.

Original

nt:hierarchyNode -> nt:file -> mgnl:original

Identical to an asset, except that it has no link to an original and does not keep metadata.

Potentially has a content node containing the binary data. Depends on the provider.

Resource

nt:resource -> mgnl:resource

Used for the content node under mgnl:asset and mgnl:original that holds the binary data.
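The supertype chains above can be checked mechanically. The sketch below is a hypothetical model, not the JCR API: it walks primary supertypes only (mixin supertypes such as mix:created or mix:mimeType are left out) and answers the same question `javax.jcr.Node.isNodeType` answers for inherited types. `asset_node` shows the assumed shape of an asset with its binary in a `jcr:content` child of type mgnl:resource; the standard nt:file definition makes that child mandatory.

```python
from typing import Any, Dict, Optional

# Primary supertype of each node type listed above (nt:base is the root type).
# Mixin supertypes such as mix:created or mix:mimeType are left out.
SUPERTYPE: Dict[str, Optional[str]] = {
    "nt:base": None,
    "nt:hierarchyNode": "nt:base",
    "nt:folder": "nt:hierarchyNode",
    "nt:file": "nt:hierarchyNode",
    "nt:resource": "nt:base",
    "mgnl:folder": "nt:folder",
    "mgnl:asset": "nt:file",
    "mgnl:original": "nt:file",
    "mgnl:resource": "nt:resource",
}


def is_node_type(node_type: str, candidate: str) -> bool:
    """True if node_type is candidate or inherits from it."""
    current: Optional[str] = node_type
    while current is not None:
        if current == candidate:
            return True
        current = SUPERTYPE[current]
    return False


def asset_node(name: str, data: bytes, mime_type: str) -> Dict[str, Any]:
    """Nested-dict picture of an mgnl:asset whose binary sits in jcr:content."""
    return {
        "name": name,
        "jcr:primaryType": "mgnl:asset",
        "jcr:content": {  # the child nt:file requires, typed mgnl:resource
            "jcr:primaryType": "mgnl:resource",
            "jcr:mimeType": mime_type,
            "jcr:data": data,
        },
    }
```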

 

Activation

Q: What steps will be necessary to support activation of these new node types? Is configuration alone enough?

A: When activating an asset, we will also activate the original if it hasn't been activated yet or has changed since then. We already have this functionality in the data module.
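The rule in the answer above can be sketched as follows. `NodeState`, `nodes_to_activate` and the paths are hypothetical; a real check would rely on whatever activation metadata the data module already keeps, and listing the original before the asset is an assumption of this sketch.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class NodeState:
    """Hypothetical activation bookkeeping for one node."""
    path: str
    last_modified: datetime
    last_activated: Optional[datetime] = None

    def needs_activation(self) -> bool:
        # Never activated, or modified since the last activation.
        return self.last_activated is None or self.last_modified > self.last_activated


def nodes_to_activate(asset: NodeState, original: NodeState) -> List[str]:
    """Paths to push when the user activates an asset."""
    paths: List[str] = []
    if original.needs_activation():
        paths.append(original.path)  # original first: an assumption of this sketch
    paths.append(asset.path)
    return paths
```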

 

Activating only renditions (out of scope for 5.0)

We need to be aware that users will routinely upload large, high-resolution originals and then work on these to create variants of lower resolution, due to both resizing and cropping. Consider the case of a 50 MB original. The user uploads it, possibly modifies its asset, and the template then requests a rendition of it, which may well be a 120x300 pixel image. The only binary that is absolutely necessary on the public instance is the rendition.

This differs from how we do it today, where the Imaging module generates images just in time and caches them for reuse.

To make this work we'd need to:

  • have permanent storage of renditions (not just cache)
  • generate renditions when an asset is added or changed
  • also activate all renditions when the asset node is activated, but not the binary data or the original
  • regenerate renditions when the rendition configuration changes
    • this makes all stored renditions for this rendition configuration outdated
  • remove all renditions for a rendition configuration when it is removed

We could lazily generate the renditions on the author instance when they're needed, i.e. for display or for activation. On the public instance they would need to exist pre-generated always (since the original is not there to be used as a source).

We could store renditions as child nodes of an asset node.
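A sketch of the bookkeeping the list above implies, assuming each stored rendition records the version of the configuration it was generated from. `RenditionStore`, `StoredRendition` and the `render` callback are hypothetical; the dict keyed by (asset id, configuration name) stands in for rendition child nodes under the asset. Renditions are generated lazily, as suggested for the author instance; on the public instance they would have to be pushed pre-generated.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class StoredRendition:
    config_version: int  # version of the rendition configuration it was built from
    data: bytes


@dataclass
class RenditionStore:
    """Permanently stored renditions, one per (asset id, configuration name)."""
    render: Callable[[bytes, str], bytes]  # hypothetical transform(binary, config name)
    config_versions: Dict[str, int] = field(default_factory=dict)
    renditions: Dict[Tuple[str, str], StoredRendition] = field(default_factory=dict)

    def set_config(self, name: str) -> None:
        # Adding or changing a configuration bumps its version, which makes
        # every stored rendition built from an older version outdated.
        self.config_versions[name] = self.config_versions.get(name, 0) + 1

    def remove_config(self, name: str) -> None:
        # Removing a configuration removes all of its renditions.
        del self.config_versions[name]
        for key in [k for k in self.renditions if k[1] == name]:
            del self.renditions[key]

    def asset_changed(self, asset_id: str) -> None:
        # Drop the asset's renditions so they are regenerated on next use.
        for key in [k for k in self.renditions if k[0] == asset_id]:
            del self.renditions[key]

    def get(self, asset_id: str, binary: bytes, config: str) -> bytes:
        # Lazy generation: build on first use, or rebuild if the config changed.
        version = self.config_versions[config]
        stored = self.renditions.get((asset_id, config))
        if stored is None or stored.config_version != version:
            stored = StoredRendition(version, self.render(binary, config))
            self.renditions[(asset_id, config)] = stored
        return stored.data
```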

Suppose a user has 50 assets and a rendition configuration, all activated and present on the public instance, and then changes the rendition configuration and activates it. He probably expects the change to take effect immediately and would not expect to also have to activate all the assets. If we automated this for him, we would have to generate renditions for all assets and push them.

Versioning

When we version an asset that does not hold its own binary data, we must also version the original it links to. The versioned asset node must link to the versioned original node.

When we restore an asset, we must NOT also restore the original as it can be in use by other assets. Therefore restoring the asset means that it should get the binary data that the original had at the time the asset was versioned. Again, this is for an asset that does not hold its own binary data.

Q: We keep a limited number of versions. Therefore references between versioned nodes risk being broken as versions are dropped. How do we manage this?

A: Versioned assets will always refer to the current original node (not a version). This works since we never change an original.
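A sketch of versioning that follows the answer above: a version snapshot keeps the asset's metadata, its own binary data (if modified) and the id of the current original, and restoring touches only the asset. `AssetVersions`, `checkin`, `restore` and the default limit of five versions are hypothetical.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Asset:
    id: str
    original_id: str
    metadata: Dict[str, str] = field(default_factory=dict)
    binary: Optional[bytes] = None


@dataclass
class AssetVersions:
    """Version history for one asset, limited to max_versions (must be >= 1)."""
    max_versions: int = 5
    history: List[Asset] = field(default_factory=list)

    def checkin(self, asset: Asset) -> None:
        # The snapshot links to the *current* original rather than a versioned
        # copy of it; this is safe because originals are never changed.
        self.history.append(copy.deepcopy(asset))
        del self.history[:-self.max_versions]  # drop the oldest versions

    def restore(self, asset: Asset, index: int) -> None:
        # Only the asset is restored; the shared original is left untouched.
        snapshot = self.history[index]
        asset.metadata = dict(snapshot.metadata)
        asset.binary = snapshot.binary
        asset.original_id = snapshot.original_id
```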

Linking

The DAM will support streaming binary data from the DAM and generating links to both assets and originals.

When generating a link to an original, its provider will be responsible for producing it. This allows providers that don't store the binary data in the workspace to output links pointing elsewhere, to Flickr for instance.

Links to assets, from the website for instance, will be in the form <provider id>:<asset id>. The provider id is a string identifying the provider, and the asset id is used internally by the provider to identify a specific asset. We don't have requirements on the format of the asset id. For internal assets, the asset id will be the UUID in the dam workspace.

Links to folders use the same format, <provider id>:<asset id>.
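Parsing and formatting the <provider id>:<asset id> form could look like the sketch below. `DamLink` and `parse_link` are hypothetical names, and splitting on the first colon only assumes that provider ids never contain a colon, while asset ids, whose format is left to the provider, may.

```python
from typing import NamedTuple


class DamLink(NamedTuple):
    provider_id: str
    asset_id: str

    def __str__(self) -> str:
        return f"{self.provider_id}:{self.asset_id}"


def parse_link(link: str) -> DamLink:
    """Parse '<provider id>:<asset id>', splitting on the first ':' only.

    Assumption: provider ids never contain ':'; asset ids may.
    """
    provider_id, sep, asset_id = link.partition(":")
    if not sep or not provider_id or not asset_id:
        raise ValueError(f"not a DAM link: {link!r}")
    return DamLink(provider_id, asset_id)
```

For example, `parse_link("flickr:user:123")` yields the provider id `flickr` and the asset id `user:123`, and `str()` turns the result back into the original link.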

Comparison to DMS

We want to store the assets differently than in the current DMS implementation: when the original DMS module was written, we were not able to create custom node types.

We want to simplify the clumsy current structure.

For integrations we want a structure that conforms to best practice and utilizes current JCR possibilities. This will help when integrating using for instance WebDAV and CMIS.

Migration from DMS

On the topic of migrating existing nodes in dms workspace:

Q: Should we rename the "document" node to "binary"?

Q: What should we do with "description_files" and its metadata node?

Discarded Proposals

Node Types

Using the usual Magnolia structure by extending mgnl:content

  • nt:base
    • mgnl:content
      • mgnl:page
      • mgnl:area
      • mgnl:component
      • mgnl:asset

Using a flat mgnl:resource

  • mgnl:resource
    • mgnl:asset

Advantages:

  • flat
    • indexing simpler, faster

Disadvantages:

  • only one binary
  • different than other content

 

 

9 Comments

  1. Versioning is one of the things that would need to be tested and probably modified as well.

    We also need to consider the impact of such a structure on exposing content via WebDAV.

    For multivariant assets (i18n-ed, multiple sizes etc.) it might be possible to have an extra structure:

    • mgnl:folder
      • mgnl:multiasset
        • mgnl:asset
        • mgnl:asset
        • ...
      • mgnl:asset

    this way we could potentially group assets that should be presented as one within Magnolia, but still allow each variant to keep own set of properties and be indexed separately.

     

     

     

  2. I think "asset" as a name for the variants is misleading. The original is also an asset. There are also assets that have no variant. I suggest being more specific and talking of an AssetOriginal and AssetVariation.

    1. But couldn't we rather forget about the idea of variants? I think it makes it simpler. If you think of the OriginalAsset as a special object (which it is, as it's not shown to the user and not used on the website), then we're left with just assets. Some assets point to an OriginalAsset, several assets can point to the same OriginalAsset, some assets point to no OriginalAsset (document).

      Well, we can't totally drop the word variant; it's still useful for discussing the "children" of one OriginalAsset.

  3. To overcome the media duplication problem: what if an OriginalAsset is ONLY created when someone edits the Asset?

  4. +1 to Christophers suggestion for creating OriginalAsset (or AssetOriginal???) only on demand.

    BTW, before we start serving such modified audio/video files, please keep in mind that Magnolia CMS is not a streaming solution and doesn't provide any support for multicast or other features necessary to stream data en masse. 200 people listening to a song would fully occupy one public instance, leaving it unable to serve anything else or even to receive activations! IMHO you should consider from the very beginning storing such modified binaries in outside storage, keeping only metadata about them in Magnolia and using the streaming capabilities of that outside storage.

    1. Fine with me if original is only created on demand. The whole point is that we need to be able to revert to original, must not waste storage, and in the UI abstract away the idea that there are originals and variants. The user may safely assume that pressing duplicate in fact does duplicate the asset, but technically, we do not duplicate the original.

  5. On an API level I'd want there to always be an original. The implementation, though, could make sure that the original and an unmodified variant use the same binary content. IMO the variant and the original should both be nodes in JCR, but an unmodified variant has no binary data.

  6. Probably obvious, but just to point it out: if we have an Original node with no binary data, and we want to use nt:file for originals, then we'd still have to have a jcr:content child node (because nt:file enforces it), I guess with an empty binary.

  7. Tobias has asked me to review this document. I've had a look at the first couple of chapters and have skipped the more technical ones. I've also had a look at your comments. So here are mine.

    • we should use AssetVariant consistently: you currently use both AssetVariation and AssetVariant. Please stick with "variant", as it better matches our case and is shorter.
    • I suggest that we continue to use the term "variant" or "asset variant" internally, as it helps to understand the underlying concept and, I suspect, to better differentiate between different capabilities of assets in the code base as well.
      • I've certainly done so in the UX concept page on the DAM: http://wiki.magnolia-cms.com/display/UX/Digital+asset+management. Otherwise I always had to talk about "original assets", "assets with originals", "assets without originals", "assets that are being versioned". That doesn't make things simpler. I'm sure the documentation team would agree with this as well. Actually, in an earlier review, Antti thought that "variant" and "master/original" were terms that worked.
      • I realize that this is too complex in the basic use cases for users to understand, though, which is why we can simply talk about "assets" in all user-facing interfaces. I don't see a problem with that: we can talk about the essence of the concept using simple terms and use a more complex picture only when this is required, i.e. when you have to dig deeper or extend the code base.
    • conceptual model
      • asset: "we do not track editing operations": I presume this is confirmed with Philipp. I'm saying this because such tracking was discussed once in order to be able to re-apply changes to an asset variant once the original changes. UX-wise, such an automatism would certainly cause some issues, but it is at least thinkable.
      • original: "An original is always used by at least one asset." (and Boris' comment). I think the description is correct here. From the feature standpoint, it is reverse, true. But since this describes the technical concept, it is at least consistent with what is described elsewhere. As long as the technical concept implements the feature concept described on http://wiki.magnolia-cms.com/display/UX/Digital+asset+management (it does), I don't mind how it's implemented. As an outsider, I don't want to interfere with how things are actually implemented.
    • document and versioning vs. other media assets and variants
      • yes, currently and for 5.0, the idea is that documents have versions and no variants, while all other asset types have variants, but no versions.
      • we have already briefly discussed introducing versions for media assets like images as well. These could be created manually as simple "snapshots" of the current state of a variant and its current original, but I would actually like to confirm this with our user base, once our DAM is out.
      • similarly, I wouldn't restrict documents to not have any variants currently. The concept of variants for document assets could prove to be valuable when we deal with translation support for multi-language documents.
      • Given all of this, I wonder whether it wouldn't make sense to actually implement all asset types the same way.
    • notable use cases:
      • these look fine to me.