Solving problems with request parameter encoding
When working with Servlets, JSPs or JSF pages, you may reach a time-consuming point where you have to struggle with encoding problems in POST or GET request parameters. This article shows some ways to overcome these problems and save time.
When sending request parameters there are always two options: you can do a GET request, where the parameters are part of the URL, or you can use a POST request, where the parameters are sent within the request body. In both cases, once internationalization comes into play, it is often necessary to transmit characters from different languages or character sets. This often leads to problems because the sender encodes the parameters with a different encoding than the one the receiver uses to decode them.
The reason for such problems is that many specifications (Java Servlet Specification, HTTP 1.1 Protocol, URI Syntax, ...) take part in defining how request parameter encoding has to be handled, and browsers as well as servlet containers interpret these rules differently. In addition, many participants are involved in the request chain, and all of them must use the same encoding for the request to succeed in the end.
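To make the mismatch concrete, here is a minimal sketch in plain Java that encodes a string with one charset and decodes the resulting bytes with another. The class and method names are hypothetical and only serve this illustration; no servlet container is involved.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative only.
final class EncodingMismatchDemo {

    // Encodes the text with the sender's charset and decodes the bytes with
    // the receiver's charset, as happens when both sides disagree.
    static String transmit(String text, Charset senderCharset, Charset receiverCharset) {
        byte[] wire = text.getBytes(senderCharset);
        return new String(wire, receiverCharset);
    }

    public static void main(String[] args) {
        // "hallö", written with an escape so the source file encoding does not matter
        String original = "hall\u00f6";

        String matching = transmit(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        String mismatched = transmit(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        System.out.println(matching.equals(original));   // true
        System.out.println(mismatched.equals(original)); // false
        System.out.println(mismatched.length());         // 6: the single "ö" became two characters
    }
}
```

The two-byte UTF-8 sequence for "ö" is read as two separate ISO-8859-1 characters, which is the typical garbled output seen when sender and receiver do not agree.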
How to solve this?
First, let's have a look at the possible participants that can affect the encoding of request parameters:
- the browser
- JSP / JSF pages
- the servlet container
- servlets with their request and response objects
- servlet filters
To bring them all to the same level, you first have to decide which encoding you want to use in general. While ISO-8859-1 (or US-ASCII) is the default in most of the specifications mentioned above, you should consider UTF-8 the better candidate because it can represent all Unicode characters (internationalization!) and has become the standard in recent years. We think it's a good decision to switch your application environment to UTF-8.
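The difference in coverage can be checked with the standard `java.nio.charset` API. The following is a small sketch; the class name and the sample strings are assumptions chosen for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names and sample strings are illustrative only.
final class CharsetCoverageDemo {

    // Returns true if every character of the text can be represented in the charset.
    static boolean canEncode(String text, Charset charset) {
        // A fresh encoder per call, because CharsetEncoder instances are stateful.
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String german = "hall\u00f6";             // "hallö"
        String euro = "\u20ac";                   // euro sign
        String japanese = "\u65e5\u672c\u8a9e";   // "Japanese language" in Japanese

        System.out.println(canEncode(german, StandardCharsets.US_ASCII));     // false
        System.out.println(canEncode(german, StandardCharsets.ISO_8859_1));   // true
        System.out.println(canEncode(euro, StandardCharsets.ISO_8859_1));     // false
        System.out.println(canEncode(japanese, StandardCharsets.ISO_8859_1)); // false
        System.out.println(canEncode(euro, StandardCharsets.UTF_8));          // true
        System.out.println(canEncode(japanese, StandardCharsets.UTF_8));      // true
    }
}
```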
The following steps show how to achieve this with Tomcat 6 (although some of the rules are not Tomcat-specific) and solve your parameter encoding problems for both GET and POST scenarios:
1.) Change all your JSP or JSF pages to include charset name in their contentType
These settings should be made on each page (ideally in your template, if you have one):
<!-- JSP XML -->
<jsp:directive.page contentType="text/html; charset=UTF-8" />
<!-- JSF/Facelets XHTML -->
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
2.) When using servlets or servlet filters: ensure that request and response are set to UTF-8 character encoding
The character encoding can be set on the ServletRequest and ServletResponse objects:
...
servletRequest.setCharacterEncoding("UTF-8");
servletResponse.setCharacterEncoding("UTF-8");
...
To avoid implementing this in all of your servlets and filters, you can use a character encoding filter that sets the request character encoding to UTF-8 for every request.
- An example can be found in your Tomcat distribution under webapps/examples/WEB-INF/classes/filters/SetCharacterEncodingFilter.java
- When using Spring you can use a special filter that is described in this post.
Make sure that this filter is the first element in the chain that accesses request parameters. Once request parameters have been accessed (servletRequest.getParameter...()), setting the request character encoding no longer has any effect on reading parameters from the request (at least in Tomcat, the parsed parameters seem to be cached).
3.) Configure Tomcat to interpret GET parameters correctly
Decoding GET parameters is not the same as decoding POST parameters, because GET parameters are part of the URL and not of the request body. That's why the filter mentioned above may not solve encoding problems here: the request character encoding refers to the request body. To fix this in Tomcat you must set URIEncoding="UTF-8" on your <Connector> in server.xml (for details see here).
<Connector URIEncoding="UTF-8" connectionTimeout="20000" port="8080" protocol="HTTP/1.1" ...>
This must be done for all connectors!
You'll find more details and explanations on the Tomcat settings mentioned here in the Tomcat Wiki.
4.) Prepare your requests
When sending a GET request, make sure that the parameters are correctly UTF-8 URL-encoded, e.g. "param=hallö" must look like this:
param=hall%C3%B6
If you don't use an encoder function to generate your links, you can also convert your parameters by using an online encoder.
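When links are generated in Java code, the standard `java.net.URLEncoder` can do the conversion. The sketch below assumes a hypothetical helper class; only `URLEncoder.encode` and `URLDecoder.decode` are real JDK calls. Note that `URLEncoder` produces form encoding (application/x-www-form-urlencoded), so a space becomes "+", which is usually fine for query parameter values.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical helper class; the names are illustrative only.
final class QueryParameterEncodingDemo {

    // Builds "name=value" with both parts percent-encoded as UTF-8.
    static String buildQuery(String name, String value) {
        try {
            return URLEncoder.encode(name, "UTF-8") + "=" + URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    // Decodes a percent-encoded value as UTF-8, mirroring what the receiver should do.
    static String decodeValue(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "hallö", written with an escape so the source file encoding does not matter
        String query = buildQuery("param", "hall\u00f6");
        System.out.println(query); // param=hall%C3%B6

        // Round trip: decoding with the same charset restores the original value.
        System.out.println(decodeValue("hall%C3%B6").equals("hall\u00f6")); // true
    }
}
```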
For sending POST requests you should set the form attribute accept-charset to UTF-8. Sometimes this has no effect, because web browsers often appear to send the request body using the encoding of the page that generated the POST (for instance, if the <form> element came from a page with a specific encoding, it is that encoding which is used to submit the POST data for that form). So when using UTF-8 as the page contentType, all should be fine even without setting accept-charset.
When sending POST requests there is also a strange behavior in Internet Explorer: if a page that has ISO-8859-1 as its contentType contains a form that should be sent UTF-8 encoded, you need a special hack to achieve that. You can read about a solution in this article. If you don't use the hack, the form parameters will not be sent in UTF-8 (but in ISO-8859-1). Then again, you should not run into this problem if you follow the rules mentioned above and set the contentType of your pages to UTF-8.