URI encoding issues can be seen by searching for terms that contain umlauts or Chinese characters on a site that uses the STK default search. Jackrabbit stores content in Unicode. Issues are typically due to the character set conversion done in the application server. This page shows how to configure Tomcat to use Unicode.
Option 1: useBodyEncodingForURI
Per Apache Tomcat Configuration Reference:
The useBodyEncodingForURI attribute specifies if the encoding specified in contentType should be used for URI query parameters, instead of using the URIEncoding. This setting is present for compatibility with Tomcat 4.1.x, where the encoding specified in the contentType, or explicitly set using Request.setCharacterEncoding method was also used for the parameters from the URL. The default value is false.
By default, Tomcat uses ISO-8859-1 to decode URIs. Change the encoding to UTF-8, which Magnolia CMS uses by default.
Add the useBodyEncodingForURI attribute to the Connector element in <CATALINA_HOME>/conf/server.xml and set its value to true:
<Connector port="8080" protocol="HTTP/1.1" connectionTimeout="20000" redirectPort="8443" URIEncoding="UTF-8" useBodyEncodingForURI="true"/>
Option 2: Modify MIME mappings
Edit the appropriate MIME mappings, such as .html and .htm. Change their character encoding from UTF-8 to ISO-8859-1.
How it works:
- By default, Tomcat uses ISO-8859-1 to decode URIs.
- Magnolia CMS sets the contentType of pages to UTF-8 by default. This means that the browser encodes GET form parameters using UTF-8. Tomcat then decodes them as ISO-8859-1, and this mismatch is why issues occur.
- If you configure the MIME types as above, pages are served using ISO-8859-1, which means the browser will use ISO-8859-1 to encode GET form parameters, and Tomcat will then be able to decode those properly.
- If you configure the encoding at Tomcat level instead (Option 1), Tomcat also decodes the parameters properly for the same reason: the browser and Tomcat then use the same encoding.
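The mismatch described in the list above can be reproduced outside Tomcat. The following is a minimal sketch, assuming Java 10 or later (for the Charset overloads of URLEncoder and URLDecoder). The class name UriEncodingDemo, its helper methods, and the sample value are hypothetical and are not part of Magnolia CMS or Tomcat. URLEncoder only approximates how a browser encodes GET form parameters.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Magnolia CMS or Tomcat.
public class UriEncodingDemo {

    // Percent-encodes a form parameter value, roughly as a browser would
    // for a page served in the given charset.
    public static String encode(String value, Charset pageCharset) {
        return URLEncoder.encode(value, pageCharset);
    }

    // Decodes a percent-encoded value, roughly as a container would
    // when it is configured with the given URI charset.
    public static String decode(String encoded, Charset uriCharset) {
        return URLDecoder.decode(encoded, uriCharset);
    }

    public static void main(String[] args) {
        // "M", u-umlaut, "ller"; written as an escape so the source file stays ASCII.
        String query = "M\u00FCller";

        // The page is UTF-8, so the browser sends UTF-8 percent-encoding.
        String sent = encode(query, StandardCharsets.UTF_8);
        System.out.println("sent:            " + sent);

        // Decoding with ISO-8859-1 turns the two UTF-8 bytes into two wrong characters.
        System.out.println("ISO-8859-1 side: " + decode(sent, StandardCharsets.ISO_8859_1));

        // Decoding with UTF-8 restores the original value.
        System.out.println("UTF-8 side:      " + decode(sent, StandardCharsets.UTF_8));
    }
}
```

In this sketch the umlaut is sent as the two bytes %C3%BC. A decoder that assumes ISO-8859-1 reads each byte as a separate character, which is why a search for such a term finds nothing. A decoder that assumes UTF-8 recovers the original term.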
Other webapps running in the same container might be affected by these changes. If you have no other webapp in your Tomcat, go ahead and change its URIEncoding to UTF-8. If you do have other webapps, proceed carefully and first read http://wiki.apache.org/tomcat/FAQ/CharacterEncoding