Basics
Content Apps have different views: Tree, List, Thumbnail and Search. Sorting is only implemented for the List view. Search has a view of its own. (Andreas: please make sure you implement the revised concept, or a first version of it; see also my comment below. I'm ready to answer any UI questions you might have.)
Search
Updated concept
A new concept dealing with some issues present in 5.0 final has been created; see the page "Search revisited".
Search is a Content App view. In particular, it is a special case of a List view which displays only the subset of items returned by a given search.
Basic search
- It is triggered by clicking in the search box found in the toolbar as specified by the UX design http://wiki.magnolia-cms.com/display/UX/Basic+search+in+apps
- Performs a JCR full-text search in the App workspace (see the JCR 2.0 specification, section 6.7.19).
- Allows using all JCR wildcards and boolean operators for full-text search. Search terms separated by whitespace are ANDed by default.
- It is backed by the SearchJcrContainer, a subclass of FlatJcrContainer, which implements a generic JCR SQL2 full-text query by overriding AbstractJcrContainer.constructJCRQuery(boolean).
- Can be bookmarked and restored.
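
As an illustration of the points above, here is a minimal sketch of how a generic JCR SQL2 full-text statement could be assembled from the text typed into the search box. It is not the actual SearchJcrContainer code: the class and method names are hypothetical, `mgnl:contact` is only a placeholder node type, and the only escaping done is doubling single quotes so that the terms fit into an SQL2 string literal.

```java
import java.util.Objects;

/** Illustrative only; names are hypothetical and not taken from Magnolia. */
final class FullTextQuerySketch {

    private FullTextQuerySketch() {}

    /** Doubles single quotes so the terms can sit inside a JCR-SQL2 string literal. */
    static String escapeLiteral(String terms) {
        return terms.replace("'", "''");
    }

    /**
     * Builds a JCR-SQL2 statement that runs a full-text search over all
     * properties of nodes of the given type. Blank input yields a plain listing.
     */
    static String buildStatement(String nodeType, String searchTerms) {
        Objects.requireNonNull(nodeType, "nodeType");
        Objects.requireNonNull(searchTerms, "searchTerms");
        String select = "SELECT * FROM [" + nodeType + "] AS t";
        String trimmed = searchTerms.trim();
        if (trimmed.isEmpty()) {
            return select; // nothing to search for
        }
        // Whitespace-separated terms are left as they are: the full-text
        // expression ANDs them by default.
        return select + " WHERE CONTAINS(t.*, '" + escapeLiteral(trimmed) + "')";
    }

    public static void main(String[] args) {
        System.out.println(buildStatement("mgnl:contact", "john o'hara"));
    }
}
```

The resulting string would then be handed to the JCR `QueryManager` with the `JCR-SQL2` language; that part is left out here to keep the sketch free of dependencies.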
Advanced search
- To be implemented; see the page "Advanced search in apps", section "Advanced search using queries".
Indexing
The JR indexing configuration (see http://wiki.apache.org/jackrabbit/IndexingConfiguration) is used to aggregate content found in sub-nodes, i.e. content found in a mgnl:page under a mgnl:area/mgnl:component, even in nested areas. That content would not normally be available to search unless we resorted to JCR_SQL2 joins on descendant nodes. Unfortunately, such joins are quite slow and thus a show-stopper for us.
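
To make the trade-off concrete, the following sketch contrasts the two kinds of statement. It is illustrative only: the class and method names are hypothetical, and the statements are written from the JCR 2.0 SQL2 grammar rather than copied from Magnolia code. The first variant joins every page with its descendant components; the second relies on the index aggregate, assuming that aggregated component text is matched by a full-text search on the page itself.

```java
/** Illustrative only; names are hypothetical. */
final class DescendantSearchSketch {

    private DescendantSearchSketch() {}

    private static String literal(String terms) {
        return "'" + terms.replace("'", "''") + "'";
    }

    /** Without index aggregates: join each page with its descendant components. */
    static String joinStatement(String terms) {
        return "SELECT page.* FROM [mgnl:page] AS page"
             + " INNER JOIN [mgnl:component] AS comp ON ISDESCENDANTNODE(comp, page)"
             + " WHERE CONTAINS(comp.*, " + literal(terms) + ")";
    }

    /** With index aggregates: component text is indexed together with the page. */
    static String aggregateStatement(String terms) {
        return "SELECT * FROM [mgnl:page] AS page"
             + " WHERE CONTAINS(page.*, " + literal(terms) + ")";
    }

    public static void main(String[] args) {
        System.out.println(joinStatement("magnolia"));
        System.out.println(aggregateStatement("magnolia"));
    }
}
```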
Drawbacks
- Only one indexing_configuration.xml exists at the moment, and it is for the website workspace. It is not possible to have one generic index aggregate configuration dealing with mgnl:content and mgnl:contentNode, as this does not seem to work (at least not in the website workspace case).
- The process of copying the configuration file to the actual workspace folder is currently not automated.
- The index aggregate produces a wrong number of results in queries. The issue is analyzed in depth on a separate page.
To be considered
- Should we provide a way for custom Apps to plug in their own implementation of SearchJcrContainer, i.e. via WorkbenchConfiguration?
- Query injection. Is that a real issue?
- Query term escaping. So far only the escaping of quotes has been implemented (see http://wiki.apache.org/jackrabbit/EncodingAndEscaping). Maybe there is more to it?
- Should search also provide a Thumbnail view besides the List view?
Sort
Reasoning
- ListView uses AbstractJcrContainer, which generates the query based on the available columns
- default sorting should be configurable (on WorkbenchDefinition)
- clicks on sortable columns (defined on ColumnDefinition) re-trigger the query with new sorting
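
The click handling described above could look roughly like the sketch below. This is an assumption-laden illustration, not Magnolia code: `SortState`, `onHeaderClick` and `toOrderBy` are hypothetical names, the set of sortable properties stands in for what the ColumnDefinitions would declare, and the initial state stands in for the default sorting configured on the WorkbenchDefinition.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Illustrative only; names are hypothetical. */
final class SortSketch {

    private SortSketch() {}

    /** Immutable description of the current ordering of the list. */
    static final class SortState {
        final String property;
        final boolean ascending;
        private final Set<String> sortable;

        SortState(String property, boolean ascending, Set<String> sortable) {
            this.property = property;
            this.ascending = ascending;
            this.sortable = sortable;
        }

        /**
         * A click on the currently sorted column flips the direction, a click
         * on another sortable column sorts it ascending, and a click on a
         * non-sortable column changes nothing.
         */
        SortState onHeaderClick(String clicked) {
            if (!sortable.contains(clicked)) {
                return this;
            }
            if (clicked.equals(property)) {
                return new SortState(property, !ascending, sortable);
            }
            return new SortState(clicked, true, sortable);
        }

        /** ORDER BY fragment to append when the query is re-triggered. */
        String toOrderBy(String selector) {
            return " ORDER BY " + selector + ".[" + property + "] "
                 + (ascending ? "ASC" : "DESC");
        }
    }

    public static void main(String[] args) {
        Set<String> sortable = new HashSet<>(Arrays.asList("mgnl:lastmodified", "title"));
        SortState state = new SortState("mgnl:lastmodified", false, sortable);
        state = state.onHeaderClick("title");
        System.out.println("SELECT * FROM [mgnl:contact] AS t" + state.toOrderBy("t"));
    }
}
```

Keeping the state immutable means each click yields a new value that can be compared with the previous one to decide whether the query has to run again.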
Data retrieval for ListView
- JR 2.4.x supports various query languages
- as we generate the queries ourselves we could use any of these
Performance comparison
Tests were done with the Contact App (M5) running in Tomcat 7, with MySQL as the DB, all on a single MacBook Pro.
No. of items | Sort by | XPATH t[s] | JCR_JQOM t[s] | XPATH (MetaData as mixin) t[s] | JCR_JQOM (MetaData as mixin) t[s] |
---|---|---|---|---|---|
20'000 | name | not supported | 1.3 - 1.6 | not supported | 1.8 - 2.4 |
20'000 | | 0.005 - 0.08 | 2.8 - 4 | 0.001 - 0.02 | 1.2 - 2.4 |
20'000 | mgnl:lastmodified | 0.7 - 12 | 19 - 26 | 0.002 - 0.03 | 3 - 11 |
100'000 | name | not supported | 10 - 11 | not supported | 9 - 14 |
100'000 | | 0.01 - 0.06 | 6 - 11 | 0.06 - 0.06 | 6 - 12 |
100'000 | mgnl:lastmodified | 50 - 80 | 210 - 290 | 0.01 - 0.03 | 35 - 100 |
500'000 | name | not supported | 65 - 95 | not supported | 9 - 11 |
500'000 | | 0.05 - 1.5 | 40 - 150 | 0.005 - 0.01 | 6 - 10 |
500'000 | mgnl:lastmodified | 350 - 430 | 2800 - 3500 | 0.01 - 0.03 | 40 - 65 |

Interpretation:
- XPATH is far faster than JQOM (but does not scale well when sorting by a property on a subnode)
- MetaData as mixin is a huge performance gain
- JOINS in JQOM are performance killers
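
For orientation, the sketch below shows what the compared query shapes might look like. These are not the statements used in the measurements: the JQOM queries are shown in their equivalent JCR-SQL2 text form (both are syntaxes of the same JCR 2.0 query model), the class and method names are hypothetical, and the `MetaData` child node name and `mgnl:metaData` node type are assumptions about the subnode layout. The point is only that the subnode layout forces a join in SQL2/JQOM, while the mixin layout does not.

```java
/** Illustrative only; names and node types are assumptions. */
final class SortQueryVariants {

    private SortQueryVariants() {}

    /** XPath, MetaData as subnode: order by a property of the child node. */
    static String xpathSubnode(String nodeType) {
        return "//element(*, " + nodeType + ")"
             + " order by MetaData/@mgnl:lastmodified descending";
    }

    /** XPath, MetaData as mixin: the property sits on the node itself. */
    static String xpathMixin(String nodeType) {
        return "//element(*, " + nodeType + ")"
             + " order by @mgnl:lastmodified descending";
    }

    /** SQL2/JQOM, MetaData as subnode: the property is only reachable via a join. */
    static String sql2Subnode(String nodeType) {
        return "SELECT c.* FROM [" + nodeType + "] AS c"
             + " INNER JOIN [mgnl:metaData] AS m ON ISCHILDNODE(m, c)"
             + " ORDER BY m.[mgnl:lastmodified] DESC";
    }

    /** SQL2/JQOM, MetaData as mixin: no join needed. */
    static String sql2Mixin(String nodeType) {
        return "SELECT c.* FROM [" + nodeType + "] AS c"
             + " ORDER BY c.[mgnl:lastmodified] DESC";
    }

    /** SQL2/JQOM can order by node name, which the table lists as unsupported for XPATH. */
    static String sql2ByName(String nodeType) {
        return "SELECT c.* FROM [" + nodeType + "] AS c ORDER BY NAME(c) ASC";
    }

    public static void main(String[] args) {
        System.out.println(xpathSubnode("mgnl:contact"));
        System.out.println(sql2Subnode("mgnl:contact"));
        System.out.println(sql2Mixin("mgnl:contact"));
    }
}
```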
Options
There are actually two independent decisions to be taken:
- stay with MetaData as subnode or migrate to MetaData as mixin (details on separate wiki page)
- use new but slower JQOM or stick with deprecated XPATH (details on separate wiki page)
Recommendation
- migrate to MetaData as mixin
  - simplification + performance gain outweigh the effort
- use JQOM
  - slower, but in combination with MetaData as mixin still acceptable
  - will become faster, maybe not for JR 2.4 but at least for OAK (JR 3.0)
  - sorting by name is a must; XPATH would mean we have to introduce a workaround
We might be lucky and JCR-3446 gets implemented soon, in which case we could easily fall back to XPATH if really needed.
8 Comments
Magnolia International
Cool perf analysis. Did you run the same tests with an "SQL" query, by any chance ? It'd support sorting by name with jcr:name, I'd think (and I assumed the same would work with xpath)
Daniel Lipp
There's actually two "special" things we need: sort by name and the possibility to access MetaData properties (hosted on a subnode for now). JCR_SQL1 doesn't support joins so that's why we didn't test it.
Magnolia International
Of course, but it should work with the MetaData as mixin, no? Also, the results would likely be catastrophic, but it was "always" possible to sort by mgnl:lastmod with jcr-sql, with the feature that lets one coerce search results to the nearest parent of a given type, afaik.
Also, i might be dreaming this, but isn't there such a thing as jcr-sql2, which does have support for joins ?
Philipp Bärfuss
SQL2 has all the features we needed but it is terribly slow, which the JR team confirmed. They use collection sorting and not the index as for SQL1. There won't be any improvements on that until JR 3 (Oak).
XPATH and SQL1 are quite equal but don't have sorting by jcr:name. We might get that added by the JR team.
Until now we didn't have the courage to decide for mixin metadata for the 5.0 release, as it will again cause some extra effort.
So XPATH is currently the best option under the current constraints.
Daniel Lipp
MetaData as mixin is faster in any case. On top of that we would no longer need joins, so yes, JCR_SQL1 could then become an option. In general there's JCR_SQL1 + XPATH (since JCR 1.0) and then the new JCR_JQOM + JCR_SQL2 (since JCR 2.0). The latter two use the so-called AQM (Advanced Query Model) implemented in JR 2.x (both support joins), but all ordering is sloooooow in there. According to Jukka this will most likely only change with OAK...
Magnolia International
ho, so using jcr-sql2 is not faster than jqom ? Even with simple queries ?
Daniel Lipp
similar speed on large data set because both currently do sorting based on collections and not using lucene
Didn't check in details but I'm pretty sure they share large parts of code...
Andreas Weder
Please note that some information on this page is outdated: search no longer has its own view, but has been reworked to act as a filter on existing views. The corresponding UI design pages have been updated: