[ID3 Dev] Unicode
Ben Bennett
fiji at ayup.limey.net
Sun Feb 11 19:59:55 PST 2007
On Mon, Feb 12, 2007 at 03:36:58AM +0000, Mark Smith wrote:
> The ID3v2.3 spec only allows for iso-8859-1 (the encoding byte set to
> 0x00) and UCS-2 which is essentially UTF-16 (encoding byte set to
> 0x01), so for completeness, we need to add the BOM when writing
> Unicode strings.
Not just for completeness... for conformance with the spec! "All
Unicode strings use 16-bit unicode 2.0 (ISO/IEC 10646-1:1993,
UCS-2). Unicode strings must begin with the Unicode BOM ($FF FE or $FE
FF) to identify the byte order." [Section 3.3]
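As a rough illustration of what that means when writing a 2.3 text string, here is a minimal Python sketch. The helper name encode_v23_text is made up for this example, it falls back to Unicode only when ISO-8859-1 cannot represent the text, and it leaves out the string terminator and the frame header entirely.

```python
import codecs


def encode_v23_text(text):
    """Return the encoding byte followed by the encoded string.

    Hypothetical helper for illustration only: no terminator, no frame header.
    """
    try:
        # Encoding byte 0x00: ISO-8859-1, written without a BOM.
        return b"\x00" + text.encode("iso-8859-1")
    except UnicodeEncodeError:
        # Encoding byte 0x01: the BOM ($FF FE here) comes first, then the
        # little-endian 16-bit code units.
        # Assumption: Python's UTF-16 codec stands in for UCS-2. It emits
        # surrogate pairs for characters outside the BMP, which strict
        # UCS-2 does not cover.
        return b"\x01" + codecs.BOM_UTF16_LE + text.encode("utf-16-le")
```

Writing $FE FF followed by big-endian code units would satisfy the quoted requirement just as well; little-endian is an arbitrary choice here.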
2.4 adds another text encoding description, 0x02, which is explicitly
big-endian (UTF-16BE), so the BOM is omitted.
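On the reading side, the difference between 0x01 and 0x02 could be handled roughly like this. It is only a sketch: decode_text is a hypothetical name, the input is assumed to be the string bytes with any terminator already stripped, and Python's UTF-16 codecs stand in for UCS-2.

```python
import codecs


def decode_text(encoding_byte, data):
    """Decode string bytes according to the ID3v2 text encoding byte.

    Hypothetical helper; assumes the terminator was already removed.
    """
    if encoding_byte == 0x00:
        return data.decode("iso-8859-1")
    if encoding_byte == 0x01:
        # The BOM tells us which byte order the writer used.
        if data.startswith(codecs.BOM_UTF16_LE):   # $FF FE
            return data[2:].decode("utf-16-le")
        if data.startswith(codecs.BOM_UTF16_BE):   # $FE FF
            return data[2:].decode("utf-16-be")
        raise ValueError("encoding 0x01 requires a BOM")
    if encoding_byte == 0x02:
        # 2.4 only: always big-endian, no BOM to look for.
        return data.decode("utf-16-be")
    raise ValueError("encoding byte not handled in this sketch")
```

Raising on a missing BOM is the strict reading of the spec; a real tag reader may prefer to guess a byte order instead.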
> There is also UTF-8, which (I may be wrong about this) always has
> its multi-byte characters in big-endian order.
UTF-8 does not have endian issues... it is always treated byte by
byte, i.e. it is never accessed more than one byte at a time. But
UTF-8 is only supported in 2.4 (encoding byte 0x03).
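A quick way to see the difference, as a small Python sketch (the function name serializations is made up for this example): the same character has two possible byte sequences in UTF-16, depending on byte order, but only one in UTF-8.

```python
def serializations(text):
    """Return the bytes for text under each encoding (illustrative only)."""
    return {
        "utf-16-le": text.encode("utf-16-le"),
        "utf-16-be": text.encode("utf-16-be"),
        "utf-8": text.encode("utf-8"),
    }


forms = serializations("\u00e9")  # U+00E9, e with acute accent
# forms["utf-16-le"] is b'\xe9\x00'
# forms["utf-16-be"] is b'\x00\xe9'  (same code unit, bytes swapped)
# forms["utf-8"]     is b'\xc3\xa9'  (one fixed byte sequence, no BOM needed)
```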
-ben