
Basics

Content Apps have different views: Tree, List, Thumbnail and Search. Sorting is only implemented for ListView. Search has a view of its own. (Andreas: please make sure you implement the revised concept (or a first version of it); see also my comment below. I'm ready to answer any UI questions you might have.)

Search

Updated concept

A new concept addressing some issues present in 5.0 final has been created here: Search revisited

Search is a Content App view. More precisely, it is a special case of the List view that displays only the subset of items returned by a given search.

Basic search

  • It is triggered by clicking in the search box in the toolbar, as specified by the UX design: http://wiki.magnolia-cms.com/display/UX/Basic+search+in+apps
  • Performs a JCR full-text search in the App workspace (see JCR 2.0 specification, section 6.7.19).
    • Allows using all JCR wildcards and boolean operators for full-text search. Search terms separated by whitespace are ANDed by default.
  • It is backed by SearchJCRContainer, a subclass of FlatJcrContainer, which implements a generic JCR-SQL2 full-text query by overriding AbstractJcrContainer.constructJCRQuery(boolean).
  • Can be bookmarked and restored.
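The following is a minimal sketch of the kind of statement such a container could build: a JCR-SQL2 query with a single CONTAINS constraint over all properties of the App's node type. The class and method names and the node type argument are hypothetical. This is not the actual SearchJCRContainer code. Only single quotes are escaped, so wildcards and boolean operators typed by the user reach the full-text engine unchanged.

```java
/** Hypothetical helper illustrating a generic JCR-SQL2 full-text query (not Magnolia code). */
final class FullTextQuerySketch {

    private FullTextQuerySketch() {
    }

    /** Escapes a value for use inside a single-quoted JCR-SQL2 string literal. */
    static String escapeLiteral(String value) {
        // In a JCR-SQL2 string literal, a single quote is written as two single quotes.
        return value.replace("'", "''");
    }

    /**
     * Builds a full-text query over all properties of nodes of the given type.
     * Whitespace-separated terms are passed through as-is, so the full-text engine ANDs them.
     */
    static String buildQuery(String nodeType, String userInput) {
        return "SELECT * FROM [" + nodeType + "] AS t"
                + " WHERE CONTAINS(t.*, '" + escapeLiteral(userInput.trim()) + "')";
    }
}
```

The resulting string would then be run through the standard JCR 2.0 API, e.g. `session.getWorkspace().getQueryManager().createQuery(statement, Query.JCR_SQL2).execute()`.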

Advanced search

Indexing

JR indexing configuration (see http://wiki.apache.org/jackrabbit/IndexingConfiguration) is used to aggregate content found in sub-nodes, i.e. content of a mgnl:page that lives under a mgnl:area/mgnl:component, even in nested areas. A search on the page would not normally find that content unless it resorted to JCR_SQL2 joins on descendant nodes. Unfortunately, such joins are quite slow and thus a show-stopper for us.
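To illustrate the trade-off in the paragraph above, here is a sketch of the two query shapes. The helper is hypothetical, and the search term is assumed to be already escaped. Without index aggregates, finding pages by text stored in their components needs a descendant join. With aggregates, the component text is indexed together with the page, so a plain CONTAINS on the page suffices.

```java
/** Hypothetical helper contrasting a descendant join with a query on an aggregated index. */
final class AggregateQuerySketch {

    private AggregateQuerySketch() {
    }

    /**
     * Without index aggregates: join each page with its descendant components.
     * Returns one row per matching (page, component) pair and is slow in JR 2.x.
     */
    static String joinQuery(String escapedTerm) {
        return "SELECT page.* FROM [mgnl:page] AS page"
                + " INNER JOIN [mgnl:component] AS comp ON ISDESCENDANTNODE(comp, page)"
                + " WHERE CONTAINS(comp.*, '" + escapedTerm + "')";
    }

    /** With index aggregates: component text is part of the page's full-text index, no join needed. */
    static String aggregateQuery(String escapedTerm) {
        return "SELECT * FROM [mgnl:page] AS page"
                + " WHERE CONTAINS(page.*, '" + escapedTerm + "')";
    }
}
```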

  • Drawbacks

    • Only one indexing_configuration.xml exists at the moment, and it is for the website workspace. A single generic index aggregate configuration covering mgnl:content and mgnl:contentNode does not seem to work (at least not in the website workspace).
    • Copying the configuration file to the actual workspace folder is currently not automated.
    • The index aggregate produces a wrong number of results in queries. The issue is analyzed in depth here


To be considered

  • Should we provide a way for custom Apps to plug in their own implementation of SearchJcrContainer, e.g. via WorkbenchConfiguration?
  • Query injection. Is that a real issue?
  • Query term escaping. So far only quote escaping has been implemented (see http://wiki.apache.org/jackrabbit/EncodingAndEscaping). Maybe there is more to it?
  • Should search also provide a Thumbnail view besides the List view?
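On query injection: if the user's input is concatenated into the statement unescaped, a term containing a single quote can close the literal and append its own constraints. The sketch below contrasts naive concatenation with the quote escaping mentioned above. The class is hypothetical, and the example input is chosen for illustration. JCR query results are still filtered by the session's read permissions. The practical risk is therefore probably limited to unexpected result sets and failing queries rather than data disclosure, but that assumption is worth verifying.

```java
/** Hypothetical demo: why user input must be escaped before being concatenated into JCR-SQL2. */
final class QueryInjectionSketch {

    private QueryInjectionSketch() {
    }

    /** Unsafe: the input is pasted into the literal verbatim, so a quote in it ends the literal. */
    static String naive(String input) {
        return "SELECT * FROM [mgnl:page] AS t WHERE CONTAINS(t.*, '" + input + "')";
    }

    /** Safer: single quotes are doubled, so the whole input stays inside one string literal. */
    static String escaped(String input) {
        return "SELECT * FROM [mgnl:page] AS t WHERE CONTAINS(t.*, '" + input.replace("'", "''") + "')";
    }
}
```

With the input `x') OR ISDESCENDANTNODE(t, '/secret`, the naive version yields a valid statement with an extra OR constraint. The escaped version keeps the input as one full-text literal.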

Sort

Reasoning

  • ListView uses AbstractJCRContainer, which generates the query based on the available columns
  • default sorting should be configurable (on WorkbenchDefinition)
  • clicks on sortable columns (defined on ColumnDefinition) re-trigger the query with new sorting
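The following is a rough sketch of how a clicked column could be turned into the ORDER BY clause of the generated JCR-SQL2 statement, falling back to the configured default column. The Column class is a hypothetical stand-in, not Magnolia's ColumnDefinition. Sorting by node name uses JCR-SQL2's NAME() operand. Per the measurements below, XPATH does not support sorting by name.

```java
import java.util.List;

/** Hypothetical sketch: deriving a JCR-SQL2 ORDER BY clause from column definitions. */
final class SortClauseSketch {

    /** Minimal stand-in for a column definition; not Magnolia's ColumnDefinition API. */
    static final class Column {
        final String name;          // column id as shown in the list view
        final String propertyName;  // JCR property to sort by, or null to sort by node name
        final boolean sortable;

        Column(String name, String propertyName, boolean sortable) {
            this.name = name;
            this.propertyName = propertyName;
            this.sortable = sortable;
        }
    }

    private SortClauseSketch() {
    }

    /**
     * Returns the ORDER BY clause for the clicked column. Falls back to the default
     * column when the clicked one is unknown or not sortable, and returns "" if neither is usable.
     */
    static String orderBy(List<Column> columns, String clicked, boolean ascending, String defaultColumn) {
        Column target = find(columns, clicked);
        if (target == null || !target.sortable) {
            target = find(columns, defaultColumn);
        }
        if (target == null || !target.sortable) {
            return ""; // nothing sortable: keep the repository's natural order
        }
        String operand = target.propertyName == null
                ? "NAME(t)"                           // sort by node name
                : "t.[" + target.propertyName + "]";  // sort by a property on the node itself
        return " ORDER BY " + operand + (ascending ? " ASC" : " DESC");
    }

    private static Column find(List<Column> columns, String name) {
        for (Column c : columns) {
            if (c.name.equals(name)) {
                return c;
            }
        }
        return null;
    }
}
```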

Data retrieval for ListView

Performance comparison

Tests were done with the Contact App (M5) running on Tomcat 7 with MySQL as the database, all on a single MacBook Pro.

| No. of items | Sort by | XPATH t [s] | JCR_JQOM t [s] | XPATH (MetaData as mixin) t [s] | JCR_JQOM (MetaData as mixin) t [s] |
|---|---|---|---|---|---|
| 20'000 | name | not supported | 1.3 - 1.6 | not supported | 1.8 - 2.4 |
| 20'000 | email | 0.005 - 0.08 | 2.8 - 4 | 0.001 - 0.02 | 1.2 - 2.4 |
| 20'000 | mgnl:lastmodified | 0.7 - 12 | 19 - 26 | 0.002 - 0.03 | 3 - 11 |
| 100'000 | name | not supported | 10 - 11 | not supported | 9 - 14 |
| 100'000 | email | 0.01 - 0.06 | 6 - 11 | 0.06 - 0.06 | 6 - 12 |
| 100'000 | mgnl:lastmodified | 50 - 80 | 210 - 290 | 0.01 - 0.03 | 35 - 100 |
| 500'000 | name | not supported | 65 - 95 | not supported | 9 - 11 |
| 500'000 | email | 0.05 - 1.5 | 40 - 150 | 0.005 - 0.01 | 6 - 10 |
| 500'000 | mgnl:lastmodified | 350 - 430 | 2800 - 3500 | 0.01 - 0.03 | 40 - 65 |

Interpretation:

  • XPATH is far faster than JQOM (but doesn't scale well when sorting by properties on subnodes)
  • MetaData as mixin is a huge performance gain
  • JOINS in JQOM are performance killers
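To put a number on "huge", here is the arithmetic for the 500'000-item rows sorted by mgnl:lastmodified. It uses the midpoint of each measured range, which is a crude summary of the table above, not a new measurement.

```java
/** Worked arithmetic on the measured ranges (midpoints only, so the factors are rough). */
final class MixinSpeedupSketch {

    private MixinSpeedupSketch() {
    }

    /** Midpoint of a measured range [low, high] in seconds. */
    static double mid(double low, double high) {
        return (low + high) / 2.0;
    }

    /** How many times faster the mixin variant is, comparing range midpoints. */
    static double speedup(double subLow, double subHigh, double mixinLow, double mixinHigh) {
        return mid(subLow, subHigh) / mid(mixinLow, mixinHigh);
    }

    public static void main(String[] args) {
        // 500'000 items sorted by mgnl:lastmodified: MetaData as subnode vs. as mixin
        System.out.printf("XPATH:    %.0fx%n", speedup(350, 430, 0.01, 0.03)); // roughly 19500x
        System.out.printf("JCR_JQOM: %.0fx%n", speedup(2800, 3500, 40, 65));   // 60x
    }
}
```

Even for JQOM, where the mixin gain is smallest, the midpoint drops from about 3150 s to about 52.5 s.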

Options

There are actually two independent decisions to make:

  1. stay with MetaData as subnode or migrate to MetaData as mixin (details on a separate wiki page)
  2. use the new but slower JQOM or stick with the deprecated XPATH (details on a separate wiki page)
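To make the two decisions concrete, here is a sketch of the "sort by last modification" query under each option. The node type mgnl:contact, the mgnl:metaData type of the MetaData subnode, the property name and the XPATH relative-path ordering syntax are all assumptions for illustration. The JCR-SQL2 string stands in for the equivalent JQOM object tree, since both map to the same abstract query model.

```java
/** Hypothetical sketch: the "sort by last modification" query under both options. */
final class SortOptionsSketch {

    private SortOptionsSketch() {
    }

    /** Deprecated XPATH syntax; with MetaData as subnode, "order by" uses a relative path into it. */
    static String xpath(boolean metaDataAsMixin) {
        String key = metaDataAsMixin ? "@mgnl:lastmodified" : "MetaData/@mgnl:lastmodified";
        return "//element(*, mgnl:contact) order by " + key + " descending";
    }

    /** JCR-SQL2 form of the query that JQOM would build programmatically. */
    static String sql2(boolean metaDataAsMixin) {
        if (metaDataAsMixin) {
            // The property lives on the node itself: no join needed.
            return "SELECT * FROM [mgnl:contact] AS t ORDER BY t.[mgnl:lastmodified] DESC";
        }
        // The property lives on a child node: a join is required, which is what makes JQOM slow.
        // Assumes mgnl:metaData nodes only ever occur as the MetaData child.
        return "SELECT t.* FROM [mgnl:contact] AS t"
                + " INNER JOIN [mgnl:metaData] AS m ON ISCHILDNODE(m, t)"
                + " ORDER BY m.[mgnl:lastmodified] DESC";
    }
}
```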

Recommendation

  • migrate to MetaData as mixin
    • simplification + performance gain outweigh the effort

  • use JQOM
    • slower but in combination with MetaData as mixin still acceptable
    • will become faster - maybe not for JR 2.4 but at least for OAK (JR 3.0)
    • sorting by name is a must - XPATH would mean we have to introduce a workaround

We might be lucky that JCR-3446 gets implemented soon, so we could easily fall back to XPATH in case of big need.


8 Comments

  1. Cool perf analysis. Did you run the same tests with an "SQL" query, by any chance? It'd support sorting by name with jcr:name, I'd think (and I assumed the same would work with xpath).

  2. There are actually two "special" things we need: sorting by name and access to MetaData properties (hosted on a subnode for now). JCR_SQL1 doesn't support joins, which is why we didn't test it.

    1. Of course, but it should work with the MetaData as mixin, no? Also, the results would likely be catastrophic, but it was "always" possible to sort by mgnl:lastmod with jcr-sql, with the feature that lets one coerce search results to the nearest parent of a given type, afaik.

      Also, I might be dreaming this, but isn't there such a thing as jcr-sql2, which does have support for joins?

      1. SQL2 has all the features we needed, but it is terribly slow, which the JR team confirmed. It sorts using collections rather than the index (which SQL1 uses). There won't be any improvements on that until JR 3 (Oak).

        XPATH and SQL1 are roughly equivalent but don't support sorting by jcr:name. We might get that added by the JR team.

        So far we haven't had the courage to commit to mixin metadata for the 5.0 release, as it will again cause some extra effort.

        So XPATH is currently the best option under the current constraints.

  3. MetaData as mixin is faster in any case. On top of that we would no longer need joins, so yes, JCR_SQL1 could then become an option. In general there are JCR_SQL1 + XPATH (since JCR 1.0) and the new JCR_JQOM + JCR_SQL2 (since JCR 2.0). The latter two use the so-called AQM (Abstract Query Model) implemented in JR 2.x (both support joins), but all ordering there is sloooooow. According to Jukka this will most likely only change with OAK...

    1. Oh, so using jcr-sql2 is not faster than jqom? Even with simple queries?

  4. Similar speed on large data sets, because both currently sort using collections rather than Lucene (sad). I didn't check in detail, but I'm pretty sure they share large parts of code...

  5. Please note that some information on this page is outdated: search no longer has its own view, but has been reworked to act as a filter on existing views. The corresponding UI design pages have been updated: