August 09, 2007

UTF-8 encoding with MySQL and Glassfish

All right, after 6 hours of coding, googling, experimenting and swearing, I finally managed to develop a website that fully supports UTF-8. The website runs on the Glassfish Application Server (build V2 50g, to be precise) with a MySQL database underneath. Listed below are all the steps I had to go through to get the website running.

JSPs

  • With the JSP @page directive you can specify the desired encoding by specifying both the page encoding and the content type:
    <%@ page pageEncoding="UTF-8" contentType="text/html;charset=UTF-8" language="java" %>
    pageEncoding specifies the encoding in which the JSP page has been saved. contentType defines the content type (and charset) that is sent to the browser in the response.

  • It is further recommended to provide the content type through the meta-tag within the head-tag of the HTML-document:
    <html>
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
    ...
    </head>
    ...
    </html>

  • To be complete, you can also put a @charset rule at the very top of every external CSS file you are using:
    @charset "utf-8";
    ...

Servlets

  • With every request you get in a Servlet, you'll have to set the encoding on the Request object:
    public void doXXXX(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
        request.setCharacterEncoding("UTF-8");
        ...
    }
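    Beware that setCharacterEncoding only has an effect if it is called before anything is read from the request. If you use a filter that already reads from the Request, you will need to set the character encoding in that filter instead.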

  • At the same time, when you build your response, for example to return XML or JSON data, you set the content type of the Response object:
    response.setContentType("application/json; charset=UTF-8");
    As you probably know, this response header has to be set before you obtain the writer and start writing your data; once the response has been committed, it is ignored.
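
To make the two servlet steps above more tangible, here is a minimal standalone sketch of what goes wrong without them. It uses plain JDK calls only (Java 10 or later) and merely imitates the container, assuming the container falls back to ISO-8859-1 when no encoding is given, as the Servlet specification describes. The class name, method names and sample values are made up for illustration.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it imitates the container's work with plain JDK calls.
class ServletEncodingDemo {

    // Request side: turn a percent-encoded form value into a String, using
    // the charset that request.setCharacterEncoding(...) would select.
    static String decodeParameter(String rawValue, Charset requestEncoding) {
        return URLDecoder.decode(rawValue, requestEncoding);
    }

    // Response side: turn the body into bytes, using the charset named
    // in the Content-Type header.
    static byte[] encodeBody(String body, Charset responseEncoding) {
        return body.getBytes(responseEncoding);
    }

    public static void main(String[] args) {
        // A browser on a UTF-8 page submits "cafe" with an accented e (U+00E9).
        String raw = URLEncoder.encode("caf\u00e9", StandardCharsets.UTF_8);
        System.out.println(raw); // prints caf%C3%A9

        // Decoded as UTF-8 the value survives; decoded as ISO-8859-1 the
        // accented e turns into two garbage characters (U+00C3 U+00A9).
        String good = decodeParameter(raw, StandardCharsets.UTF_8);
        String bad = decodeParameter(raw, StandardCharsets.ISO_8859_1);
        System.out.println(good.equals("caf\u00e9"));      // prints true
        System.out.println(bad.equals("caf\u00c3\u00a9")); // prints true

        // JSON with a Japanese place name (U+6771 U+4EAC).
        String json = "{\"location\":\"\u6771\u4eac\"}";
        byte[] utf8 = encodeBody(json, StandardCharsets.UTF_8);
        byte[] latin1 = encodeBody(json, StandardCharsets.ISO_8859_1);

        // UTF-8 round-trips; ISO-8859-1 cannot represent the characters
        // and replaces each of them with a question mark.
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(json)); // prints true
        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1));       // prints {"location":"??"}
    }
}
```

The garbled request value and the question marks in the response are the two symptoms you typically see when one of these settings is missing.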

Database

  • MySQL has supported Unicode since version 4.1, which makes it possible to store data in the database in UTF-8. You enable UTF-8 on a MySQL table when you create it, by specifying a CHARACTER SET and a COLLATE clause:
    CREATE TABLE `USER` (
    ...
    ) ENGINE=MyISAM CHARACTER SET utf8 COLLATE utf8_unicode_ci;
    For more information about character sets in MySQL, see the Character Set Support chapter of the MySQL manual.

  • At least your data is now stored in UTF-8, but it doesn't end there. In Glassfish, you still have to create a JDBC Connection Pool with the correct settings allowing the JDBC driver to actually read and write your data in UTF-8. In the Admin Console you select the desired Connection Pool and then you navigate to Additional Properties. You will already see a number of properties being filled in (like DatabaseName, url, username and password). To enable UTF-8 support for JDBC, you'll have to add two extra properties:
    useUnicode = true
    characterEncoding = utf8
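
If you connect outside of a Glassfish Connection Pool, the same two driver properties can usually be passed as parameters in the JDBC URL instead. The sketch below only builds such a URL string; the class name, host, port and database name are placeholders I made up, and it does not open a connection.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of the original application.
class JdbcUrlDemo {

    // Builds a MySQL JDBC URL and appends the given driver properties
    // as query parameters, in insertion order.
    static String buildUrl(String host, int port, String database, Map<String, String> properties) {
        String base = "jdbc:mysql://" + host + ":" + port + "/" + database;
        if (properties.isEmpty()) {
            return base;
        }
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> entry : properties.entrySet()) {
            query.add(entry.getKey() + "=" + entry.getValue());
        }
        return base + "?" + query;
    }

    public static void main(String[] args) {
        // The same two properties that were added in the Admin Console.
        Map<String, String> properties = new LinkedHashMap<>();
        properties.put("useUnicode", "true");
        properties.put("characterEncoding", "utf8");

        // Placeholder host, port and database name.
        System.out.println(buildUrl("localhost", 3306, "mydb", properties));
        // prints jdbc:mysql://localhost:3306/mydb?useUnicode=true&characterEncoding=utf8
    }
}
```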

Mail

  • To be able to send e-mail messages encoded in UTF-8, you will also need to provide the encoding type on the subject and the content of the message. Finally, you also need to set the content type of the e-mail message itself.
    MimeMessage msg = new MimeMessage(session);
    // Encode the subject as UTF-8
    msg.setSubject(subject, "UTF-8");
    if (asHtml) {
        // HTML body: the charset parameter tells the mail client how to decode it
        msg.setContent(body, "text/html; charset=UTF-8");
        msg.setHeader("Content-Type", "text/html; charset=UTF-8");
    } else {
        // Plain-text body encoded as UTF-8
        msg.setText(body, "UTF-8");
        msg.setHeader("Content-Type", "text/plain; charset=UTF-8");
    }
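
Mail headers can only carry ASCII, which is why the subject needs its own encoding step. As a rough illustration of what ends up in the Subject header, the sketch below builds an RFC 2047 "encoded word" by hand with plain JDK classes. It is not how JavaMail is implemented: the library may choose the Q encoding instead of Base64 and also folds long subjects, which this sketch ignores. The class and method names are made up.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical demo class; JavaMail does this work for you in setSubject.
class EncodedSubjectDemo {

    // Wraps the subject in a single RFC 2047 encoded word using UTF-8 and
    // Base64 ("B") encoding. No line folding, so only suitable for short text.
    static String encodeSubject(String subject) {
        byte[] utf8 = subject.getBytes(StandardCharsets.UTF_8);
        return "=?UTF-8?B?" + Base64.getEncoder().encodeToString(utf8) + "?=";
    }

    public static void main(String[] args) {
        // "cafe" with an accented e (U+00E9)
        System.out.println(encodeSubject("caf\u00e9")); // prints =?UTF-8?B?Y2Fmw6k=?=
    }
}
```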


And that's it. You should now have a website that has no problems displaying and storing your UTF-8 content. Below you can see a screenshot of my web application that shows data in Georgian (username), in Japanese (Location) and even in the Runic alphabet (Tags):