XML Encoding

XML documents may contain foreign characters, like Norwegian æ ø å , or French ê è é.

To let your XML parser understand these characters, you should save your XML documents as Unicode.


Windows 2000 Notepad

Windows 2000 Notepad can save files as Unicode.

Save the XML file below as Unicode (note that the document does not contain any encoding attribute):



Jani
Tove
Norwegian: æøå. French: êèé

The file above, note_encode_none_u.xml will NOT generate an error in IE 5+, Firefox, or Opera, but it WILL generate an error in Netscape 6.2.


Windows 2000 Notepad with Encoding

Windows 2000 Notepad files saved as Unicode use "UTF-16" encoding.

If you add an encoding attribute to XML files saved as Unicode, windows encoding values will generate an error.

The following encoding (open it), will NOT give an error message:

The following encoding (open it), will NOT give an error message:

The following encoding (open it), will NOT give an error message:

The following encoding (open it), will NOT generate an error in IE 5+, Firefox, or Opera, but it WILL generate an error in Netscape 6.2.



Error Messages

If you try to load an XML document into Internet Explorer, you can get two different errors indicating encoding problems:

An invalid character was found in text content.

You will get this error message if a character in the XML document does not match the encoding attribute. Normally you will get this error message if your XML document contains "foreign" characters, and the file was saved with a single-byte encoding editor like Notepad, and no encoding attribute was specified.

Switch from current encoding to specified encoding not supported.

You will get this error message if your file was saved as Unicode/UTF-16 but the encoding attribute specified a single-byte encoding like Windows-1252, ISO-8859-1 or UTF-8. You can also get this error message if your document was saved with single-byte encoding, but the encoding attribute specified a double-byte encoding like UTF-16.


Conclusion

The conclusion is that the encoding attribute has to specify the encoding used when the document was saved. My best advice to avoid errors is:

  • Use an editor that supports encoding
  • Make sure you know what encoding it uses
  • Use the same encoding attribute in your XML documents

Mix it