Implemented in 4.3
Official Documentation Available
This topic is now covered in i18n and l10n > Authoring.
If the property magnolia.utf8.enabled is set to true, UTF8 page names are accepted.
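As a minimal illustration, the switch could look like the fragment below. The property name comes from this page; placing it in a magnolia.properties file is an assumption about the installation, so check where your instance defines its properties.

```properties
# Assumed location: the instance's magnolia.properties file
magnolia.utf8.enabled=true
```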
UTF8 for page names
The goal is simply to allow non-ASCII characters in page names (in fact, in the name of any node in the repository).
The current idea and status are described in the MAGNOLIA-3009 JIRA issue, and we have more info on
JCR supports non-ASCII characters in node names, but we have to make sure that everything is encoded (or decoded) properly.
The task may be broken into two steps:
- review the reading/writing of nodes in the repository (server side), given that input values are properly encoded
- review the decoding/handling of HTTP requests
We will approach item one first, creating a set of unit tests that check the basic operations on nodes with extended characters (read, write, update, delete; from simple West European characters to Chinese). Magnolia can already read nodes with extended characters from the repository, but we will probably have to check carefully the escaping or removal of unwanted characters. At the moment everything is filtered by Path.getValidatedLabel(), which just drops everything.
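To make the "escaping or removal of unwanted chars" question concrete, here is a rough sketch of a label filter that keeps Unicode letters and digits instead of dropping them. It is not Magnolia's Path.getValidatedLabel(): the class name, the method name, the set of allowed punctuation and the '-' replacement are all assumptions made for illustration.

```java
// Hypothetical sketch, NOT the real Path.getValidatedLabel() implementation.
final class Utf8LabelSketch {

    /**
     * Keeps every Unicode letter or digit plus '-', '_' and '.';
     * any other code point is replaced by '-' (an assumed policy).
     */
    static String validatedLabel(String label) {
        StringBuilder out = new StringBuilder(label.length());
        // Iterate over code points, not chars, so characters outside the
        // Basic Multilingual Plane are handled as single units.
        label.codePoints().forEach(cp -> {
            boolean keep = Character.isLetterOrDigit(cp)
                    || cp == '-' || cp == '_' || cp == '.';
            out.appendCodePoint(keep ? cp : '-');
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(validatedLabel("caf\u00e9"));           // precomposed e-acute
        System.out.println(validatedLabel("\u4e2d\u6587 page"));   // Chinese plus a space
        System.out.println(validatedLabel("a/b:c"));               // path-like separators
    }
}
```

Note that a combining mark (as found in NFD input) is not a letter, so this sketch would replace it with '-'. A filter of this kind therefore only behaves well on input that has already been normalized to NFC.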
Item 2 looks a lot more complex. It involves:
- properly decoding requests. Note that in 4.2 URL decoding of the request path was removed, but we will have to put it back, since some browsers need it (it is certainly needed for Firefox and not needed for IE). In any case, this should have nothing to do with UTF8 normalization (URL decoding: the escaped chars are not UTF8)
- properly handling NFC/NFD strings, both in paths and in parameters
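The two points above can be sketched with the JDK alone: first percent-decode the raw request path as UTF-8, then normalize the result to NFC as a separate step. The class and method names are hypothetical, and choosing NFC as the canonical form is an assumption. The `URLDecoder.decode(String, Charset)` overload requires Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

// Hypothetical helper, not part of Magnolia.
final class RequestPathSketch {

    /** Percent-decodes a raw request path as UTF-8, then normalizes it to NFC. */
    static String decodeAndNormalize(String rawPath) {
        // URLDecoder follows form-encoding rules and would turn '+' into a
        // space, which is wrong for paths, so protect literal '+' first.
        String protectedPlus = rawPath.replace("+", "%2B");
        String decoded = URLDecoder.decode(protectedPlus, StandardCharsets.UTF_8);
        // Decoding and Unicode normalization are independent steps:
        // NFC composes "e" + U+0301 into the single code point U+00E9.
        return Normalizer.normalize(decoded, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String composed = decodeAndNormalize("/caf%C3%A9");    // sent precomposed
        String decomposed = decodeAndNormalize("/cafe%CC%81"); // sent as e + combining acute
        System.out.println(composed.equals(decomposed));
    }
}
```

With this approach a path that arrives in NFD form and one that arrives precomposed resolve to the same string, so both would address the same node name. The same normalization step could be applied to request parameters.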