August 09, 2007

UTF-8 encoding with MySQL and Glassfish

All right, after 6 hours of coding, googling, experimenting and swearing, I finally managed to build a website that fully supports UTF-8. The website runs on Glassfish Application Server (build V2 50g, to be precise) with a MySQL database underneath. Listed below are all the steps I had to go through to get the website running.

JSPs

  • With the JSP @page directive you can set the desired encoding by specifying both the page encoding and the content type:
    <%@ page pageEncoding="UTF-8" contentType="text/html;charset=UTF-8" language="java" %>
    pageEncoding tells the container in which encoding the JSP file itself has been saved. contentType defines the content type (including the charset) that is sent to the browser in the response.

  • It is further recommended to provide the content type through the meta-tag within the head-tag of the HTML-document:
    <html>
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
    ...
    </head>
    ...
    </html>
  • To be complete, you can also put the @charset directive at the top of every external CSS file you are using:
    @charset "utf-8";
    ...

Servlets

  • With every request you get in a Servlet, you'll have to set the encoding on the Request object before reading any parameters:
    public void doXXXX(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
        request.setCharacterEncoding("UTF-8");
        ...
    }
    Beware that if you use a filter that already reads from the Request, you will need to set the character encoding in that filter instead, because the call has no effect once the request has been read.

  • At the same time, when you build your response, for example to return XML or JSON data, you set the content type of the Response object:
    response.setContentType("application/json; charset=UTF-8");
    As you probably know, this response header has to be set before you obtain the writer and start writing your data.
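
To see why the request encoding matters, here is a small stand-alone sketch that needs no servlet container. It only imitates what happens to a form field on its way to the server: the browser sends the UTF-8 bytes percent-encoded, and the container turns them back into a String using the request's character encoding, which defaults to ISO-8859-1 when nothing else is specified. The class and method names (RequestEncodingDemo, browserEncode, containerDecode) are made up for this illustration, and a real container's parameter parsing is more involved.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class: imitates form-parameter decoding without a container.
class RequestEncodingDemo {

    // Roughly what a browser sends for a form field on a UTF-8 page:
    // the UTF-8 bytes of the value, percent-encoded.
    static String browserEncode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Roughly what the container does: percent-decode the raw value and build
    // a String from the bytes using the request's character encoding.
    static String containerDecode(String raw, String requestEncoding) {
        try {
            return URLDecoder.decode(raw, requestEncoding);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // "Nihongo" (Japanese) written with unicode escapes, so the source
        // file encoding does not influence the demo.
        String original = "\u65E5\u672C\u8A9E";
        String raw = browserEncode(original);

        System.out.println("on the wire: " + raw);
        // Without setCharacterEncoding: nine garbage characters, one per byte.
        System.out.println("ISO-8859-1 length: "
                + containerDecode(raw, "ISO-8859-1").length());
        // With setCharacterEncoding("UTF-8"): the original three characters.
        System.out.println("UTF-8 round trip ok: "
                + containerDecode(raw, "UTF-8").equals(original));
    }
}
```

The three Japanese characters travel as nine bytes. Decoded as ISO-8859-1, every byte becomes its own character, which is exactly the garbage you see in the database when the encoding on the Request object was never set.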

Database

  • MySQL supports Unicode as of version 4.1, which makes it possible to store data in the database in UTF-8 encoding. UTF-8 is enabled for a MySQL table at creation time by specifying a CHARACTER SET and a COLLATE clause:
    CREATE TABLE `USER` (
    ...
    ) ENGINE=MyISAM CHARACTER SET utf8 COLLATE utf8_unicode_ci;
    For more information about character sets in MySQL, you can read the document Character Set Support.

  • At least your data is now stored in UTF-8, but it doesn't end there. In Glassfish, you still have to configure a JDBC Connection Pool with the correct settings so that the JDBC driver actually reads and writes your data in UTF-8. In the Admin Console, select the desired Connection Pool and navigate to Additional Properties. You will see a number of properties that are already filled in (like DatabaseName, url, username and password). To enable UTF-8 support for JDBC, you'll have to add two extra properties:
    useUnicode = true
    characterEncoding = utf8
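
The same two settings can also be passed to the MySQL driver as parameters on the JDBC URL, which is handy when you connect outside Glassfish (in a unit test, for example). The sketch below only builds such a URL. The class name Utf8JdbcUrl and the host, port and database values are placeholders made up for this example, not values from my setup.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds a MySQL JDBC URL carrying the two UTF-8 properties.
class Utf8JdbcUrl {

    static String build(String host, int port, String database) {
        // The same two properties that were added to the connection pool.
        Map<String, String> props = new LinkedHashMap<>();
        props.put("useUnicode", "true");
        props.put("characterEncoding", "utf8");

        StringBuilder url = new StringBuilder("jdbc:mysql://")
                .append(host).append(':').append(port)
                .append('/').append(database);

        // The first property is introduced with '?', the following ones with '&'.
        char separator = '?';
        for (Map.Entry<String, String> entry : props.entrySet()) {
            url.append(separator)
               .append(entry.getKey()).append('=').append(entry.getValue());
            separator = '&';
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // prints jdbc:mysql://localhost:3306/mydb?useUnicode=true&characterEncoding=utf8
        System.out.println(build("localhost", 3306, "mydb"));
    }
}
```

The resulting string can be handed to DriverManager.getConnection together with a username and password, provided the MySQL driver is on the classpath.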

Mail

  • To be able to send e-mail messages encoded in UTF-8, you will also need to provide the encoding on both the subject and the content of the message. Finally, you also need to set the content type of the e-mail message itself.
    MimeMessage msg = new MimeMessage(session);
    // Encode subject and plain-text body as UTF-8
    msg.setSubject(subject, "UTF-8");
    msg.setText(body, "UTF-8");
    if (asHtml) {
        // For HTML mail, replace the content and declare it as UTF-8 HTML
        msg.setContent(body, "text/html; charset=UTF-8");
        msg.setHeader("Content-Type", "text/html; charset=UTF-8");
    } else {
        msg.setHeader("Content-Type", "text/plain; charset=UTF-8");
    }
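
Mail headers may only contain ASCII, so a UTF-8 subject has to be wrapped in a so-called encoded word (RFC 2047) before it goes on the wire. The sketch below shows what such an encoded word looks like, using only the JDK. It is a simplified illustration and not what JavaMail literally executes: the class and method names are made up, the library may pick the Q encoding instead of Base64 depending on the text, and it also takes care of the line length limit, which this sketch ignores.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical demo class: builds and reads a Base64 ("B") encoded word.
class EncodedSubjectDemo {

    private static final String PREFIX = "=?UTF-8?B?";
    private static final String SUFFIX = "?=";

    // Wraps the UTF-8 bytes of the text as =?UTF-8?B?<base64>?=
    static String encodeWord(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return PREFIX + Base64.getEncoder().encodeToString(utf8) + SUFFIX;
    }

    // Reverses encodeWord; accepts only the exact form produced above.
    static String decodeWord(String word) {
        if (word.length() < PREFIX.length() + SUFFIX.length()
                || !word.startsWith(PREFIX) || !word.endsWith(SUFFIX)) {
            throw new IllegalArgumentException("not a UTF-8 B-encoded word: " + word);
        }
        String payload = word.substring(PREFIX.length(), word.length() - SUFFIX.length());
        return new String(Base64.getDecoder().decode(payload), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A German greeting with two non-ASCII letters, written with unicode
        // escapes so the source file encoding does not matter.
        String subject = "Gr\u00FC\u00DFe";
        String encoded = encodeWord(subject);

        // prints =?UTF-8?B?R3LDvMOfZQ==?=
        System.out.println(encoded);
        System.out.println("round trip ok: " + decodeWord(encoded).equals(subject));
    }
}
```

If the subject of a received test mail shows up in the raw source in a form like this, the encoding on the subject was applied correctly.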


And that's it. You should now have a website that has no problems displaying and storing your UTF-8 content. Below you can see a screenshot of my web application that shows data in Georgian (username), in Japanese (Location) and even in the Runic alphabet (Tags):

9 comments:

Unknown said...

Man, you saved my life! After two weeks of sucking hard, your post solved my problem. Thanks a lot!

jet said...

Thank you! Thank you! Thank you! I almost gave up solving this problem.

Anonymous said...

many thanks!

Anonymous said...

Awaiting the day that this (UTF-8) is the default encoding for all the elements involved...!

Izzie said...

This did not solve my problem... I followed each step to the '.' and am still getting com.mysql.jdbc.MysqlDataTruncation: Data truncation: Data too long for column. Maybe this is more than just a SQL exception???

Damir said...

It's 2012 and I can't thank you enough! It works now

Varuna Singh said...

Thanks, you solved my problem that I've had for a while. Thanks for posting!
