Which XML encoding should I use?

The short answer is UTF-8. The long answer goes along these lines:

  • UTF-8 is the encoding the XML standard tells a parser to assume when the document declares none (unless a byte order mark signals UTF-16). If you use it, you don't have to declare the encoding manually, which reduces the chances of introducing errors. For example, some browsers get confused if the encoding specified in the XML declaration (the <?xml ... ?> line) is not the same as the one specified in the HTTP Content-Type header.
  • All XML parsers are required to support UTF-8 (and UTF-16); support for any other encoding is optional. By using UTF-8, you avoid compatibility problems now and later. For example, some of my applications that use windows-1250 encoding break on some installations of Internet Explorer on Vista. They work fine in IE on Windows XP and in Firefox on Vista (and sometimes even in IE on Vista).
  • By using UTF-8 you'll never encounter a Unicode character you cannot encode, whereas a single-byte encoding such as windows-1250 covers only a few hundred characters.
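
If you want to see these points in action, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The helper names (parse_text, can_encode) and the sample <note> document are made up for illustration; they are not from any particular application. The sketch assumes the parser treats an undeclared document as UTF-8, which is what the bundled expat parser does.

```python
import xml.etree.ElementTree as ET


def parse_text(data: bytes) -> str:
    """Parse raw XML bytes and return the text of the root element."""
    return ET.fromstring(data).text


def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def is_well_formed(data: bytes) -> bool:
    """Return True if the parser accepts the bytes as an XML document."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


SAMPLE = "<note>Žiga 日本語 €</note>"  # hypothetical sample document

if __name__ == "__main__":
    # UTF-8 bytes need no XML declaration: the parser assumes UTF-8.
    print(parse_text(SAMPLE.encode("utf-8")))

    # windows-1250 has no code points for the Japanese characters.
    print(can_encode(SAMPLE, "windows-1250"))
    print(can_encode(SAMPLE, "utf-8"))

    # windows-1250 bytes without a matching declaration are rejected,
    # because the byte for "Ž" is not valid UTF-8.
    undeclared = "<note>Ž</note>".encode("windows-1250")
    print(is_well_formed(undeclared))

    # The same bytes parse once the encoding is declared explicitly,
    # which is exactly the extra step UTF-8 lets you skip.
    declared = b'<?xml version="1.0" encoding="windows-1250"?>' + undeclared
    print(parse_text(declared))
```

The last two cases show why the declaration becomes one more thing to keep in sync as soon as you leave UTF-8: the same bytes are either rejected or read correctly depending on a single header line.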

This post is part of the "You've asked for it" series of articles.
