It has been noticed (DEV-506) that blocks are rendered noticeably slower than components. We have investigated the case and analysed it under several different circumstances.
TL;DR: we should optimise the way the block definition is resolved. The current approach has linear complexity (in the number of blocks registered in the system) and easily takes as much time as the rest of the rendering. A simple solution would be to treat the block type reference stored in the content as the block id and retrieve it via Registry#getProvider(name).
Ad-hoc rendering benchmarking
A sample page has been created in order to benchmark the performance of block rendering vs component rendering. The page merely renders an image component next to an image block. The rendering logic of the two is almost identical: the default rendition is resolved and then the image macro does the rest of the job. The page is queried 1000 times (which seems to be enough for the JIT to kick in) and the performance is observed in a profiler tuned to inspect the
info.magnolia.templating.elements.* package, which contains both the component and block rendering classes. The following methods are particularly interesting for the case:
BlockElement#end(out) - contains most of the block rendering logic, including the rendering engine call and the
BlockElement#resolveTemplateDefinition invocation, which pulls the correct block definition from the corresponding registry.
ComponentElement#begin() - contains most of the component rendering logic.
Note: in the profiler screenshots below there are three numeric columns (total time spent in a method, average time spent in a method, and invocation count).
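The methodology above (warm up so the JIT compiles the hot paths, then time repeated runs) can be sketched as a self-contained harness. The workload below is a stand-in, not the actual page rendering, and all names are hypothetical:

```java
// Sketch of the benchmarking approach: warm-up runs first so the JIT
// kicks in, then a timed batch. The workload is illustrative only.
public class BenchmarkSketch {

    // Stand-in for rendering the sample page once.
    static long renderOnce() {
        long acc = 0;
        for (int i = 0; i < 10_000; i++) acc += i;
        return acc;
    }

    // Total wall-clock time in milliseconds for the given number of runs.
    static double timeRuns(int runs) {
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) renderOnce();
        return (System.nanoTime() - start) / 1e6;
    }

    public static void main(String[] args) {
        timeRuns(1_000); // warm-up: let the JIT compile the hot path
        System.out.printf("avg %.3f ms per run%n", timeRuns(1_000) / 1_000);
    }
}
```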
30 block definitions in the registry with the block definition retrieval unchanged
Block definitions are currently resolved by iterating over all the block definitions and finding the first one whose type property value matches the type property of the block content (a stream chain with a filter).
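The linear resolution described above can be sketched as follows; the types and the registry contents here are simplified stand-ins for illustration, not the real Magnolia classes:

```java
import java.util.List;
import java.util.Optional;

// Simplified model of the current O(n) block definition resolution:
// every look-up walks all registered definitions.
public class LinearLookupSketch {

    record BlockDefinition(String name, String type) {}

    static final List<BlockDefinition> REGISTRY = List.of(
            new BlockDefinition("text", "text"),
            new BlockDefinition("image", "image"),
            new BlockDefinition("video", "video"));

    // Linear scan: cost grows with the number of registered blocks.
    static Optional<BlockDefinition> resolveByType(String type) {
        return REGISTRY.stream()
                .filter(def -> def.type().equals(type))
                .findFirst();
    }

    public static void main(String[] args) {
        System.out.println(resolveByType("image").orElseThrow().name());
    }
}
```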
From the profiler output below we can see that block rendering takes roughly twice as much time as component rendering (64% vs 28%), and half of the block rendering time is spent actually fetching all the block definitions (31%). This, however, does not seem so dramatic, since the average time spent in each of the methods is relatively small (~1.5ms vs ~3ms). On the other hand, under heavy load this could become a valid reason for concern.
30 block definitions in the registry with simplified block definition retrieval
We attempt to reduce the cost of the block definition resolution by utilising the following fact: the type property by which we locate the block definition matches the name property (the default implementation of BlockDefinition#getType() delegates to #getName()). Since the name property is in turn used as the id of the block in the block registry, we can fetch the corresponding block via
Registry#getProvider(name). This turns block definition fetching into a constant-time operation (a hash map look-up).
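A minimal sketch of the constant-time variant, mirroring the Registry#getProvider(name) idea with a plain hash map (again with hypothetical stand-in types rather than the actual Magnolia API):

```java
import java.util.Map;

// Simplified model of the optimised look-up: the type stored in the
// content doubles as the registry id, so resolution is an O(1)
// hash map access instead of a linear scan.
public class MapLookupSketch {

    record BlockDefinition(String name, String type) {}

    static final Map<String, BlockDefinition> BY_NAME = Map.of(
            "text", new BlockDefinition("text", "text"),
            "image", new BlockDefinition("image", "image"));

    // Constant-time fetch by id; cost is independent of registry size.
    static BlockDefinition resolveById(String id) {
        return BY_NAME.get(id);
    }

    public static void main(String[] args) {
        System.out.println(resolveById("image").type());
    }
}
```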
The screenshot below shows that with this augmentation applied, blocks and components are rendered in comparable time: ~400 microseconds are still spent on block provider resolution, which is already several times faster than the current solution and much more scalable (as will be shown in the following analysis). The template resolution logic takes roughly 10 times less time (total ~1500ms vs ~160ms).
150 and 500 block definitions in the registry with the block definition retrieval unchanged
The following two runs were done to illustrate how the block definition resolution cost depends on the number of blocks. We increased the number of blocks from ~30 to ~150 and then to ~500. This in turn increased the block definition resolution cost approximately linearly: the total time went from ~1.5s to ~5s to ~24s. In the 500 block definitions case, components take more than 20 times less time to render!
Although it is highly unrealistic that more than a couple of dozen block definitions would ever be registered in the system, these numbers sound alarming.
With ~150 block definitions
With ~500 block definitions
150 and 500 block definitions in the registry with the optimised block definition look-up
The following two screenshots demonstrate that fetching the block definition by id does not rise in cost as the number of block definitions grows. The average method invocation cost stays in the range of 150-200 microseconds.
With 150 block definitions
With 500 block definitions
The block definitions used in these test runs have been provided via YAML files, without any decorators applied whatsoever. Technically, decoration should not add much overhead on top (since decoration caches itself and does not kick in again unless the definition has changed). One interesting case is Blossom definitions - those are resolved from scratch every time (it is, however, not possible to define a block with Blossom at the moment).
Measures we can take
- Probably the best idea is to change the convention of how block definitions are referenced in block content: treat the stored type reference as the block id and fetch the definition directly from the registry.
- Registry implementation improvements (these wouldn't change the situation dramatically, though):
AbstractRegistry uses Guava functional APIs instead of streams, which could be easily refactored.
- We could add methods that return Stream instead of collections. This, though, would not help with searching the definitions by their properties: one would still need to go over the providers and fetch their definitions.
- Allow different registries to have custom metadata? E.g. blocks could expose the type property as an available metadata attribute. We could allow a metadata section to be specified in YAML or JCR, and then definition look-ups could be accomplished without resolving the actual definition objects.
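The last two ideas can be sketched together: a Stream-returning accessor plus provider-level metadata that makes property searches possible without resolving the definitions. All names below are hypothetical, not the actual Magnolia Registry API:

```java
import java.util.Map;
import java.util.Optional;
import java.util.stream.Stream;

// Sketch of a registry exposing a Stream of providers and per-provider
// metadata, so a property search need not resolve definition objects.
public class StreamRegistrySketch {

    record Provider(String id, Map<String, String> metadata) {}

    static final Map<String, Provider> PROVIDERS = Map.of(
            "image", new Provider("image", Map.of("type", "image")),
            "text", new Provider("text", Map.of("type", "text")));

    // Lazy Stream instead of a materialised collection...
    static Stream<Provider> providerStream() {
        return PROVIDERS.values().stream();
    }

    // ...though a property search still visits every provider; the gain
    // is that only the lightweight metadata is read, not the definition.
    static Optional<Provider> findByMetadata(String key, String value) {
        return providerStream()
                .filter(p -> value.equals(p.metadata().get(key)))
                .findFirst();
    }

    public static void main(String[] args) {
        System.out.println(findByMetadata("type", "image").isPresent());
    }
}
```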