[ID3 Dev] Unicode

Sun Feb 11 19:59:55 PST 2007

On Mon, Feb 12, 2007 at 03:36:58AM +0000, Mark Smith wrote:
> The ID3v2.3 spec only allows for iso-8859-1 (the encoding byte set to  
> 0x00) and UCS-2 which is essentially UTF-16 (encoding byte set to  
> 0x01), so for completeness, we need to add the BOM when writing  
> Unicode strings.

Not just for completeness... for conformance with the spec!  "All
Unicode strings use 16-bit unicode 2.0 (ISO/IEC 10646-1:1993,
UCS-2). Unicode strings must begin with the Unicode BOM ($FF FE or $FE
FF) to identify the byte order." [Section 3.3]
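
For illustration, a minimal sketch of how a reader might use that BOM
to choose the byte order. The function name is hypothetical, and it
assumes the encoding byte and any terminator have already been
stripped. Python's UTF-16 codecs are used as a stand-in for UCS-2;
they also accept surrogate pairs, which strict UCS-2 does not have.

```python
import codecs

def decode_id3v23_unicode(data: bytes) -> str:
    """Decode the payload of an encoding-0x01 string (hypothetical helper)."""
    if data.startswith(codecs.BOM_UTF16_LE):   # $FF FE: little endian
        return data[2:].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):   # $FE FF: big endian
        return data[2:].decode("utf-16-be")
    # Section 3.3 requires a BOM, so anything else is non-conforming.
    raise ValueError("Unicode string does not begin with a BOM")
```

The same two code units decode to the same text in either byte order,
as long as the BOM matches.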

2.4 adds another text encoding, 0x02 (UTF-16BE), which is explicitly
big endian and omits the BOM.
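
On the writing side, a rough sketch of the three cases discussed so
far. The function name is made up for this example, it returns only
the string bytes (no encoding byte, no terminator), and choosing
little endian for 0x01 is an arbitrary assumption; either order
conforms as long as the BOM matches.

```python
import codecs

def encode_id3_text(text: str, encoding_byte: int) -> bytes:
    """Encode text for the given ID3v2 encoding byte (hypothetical helper)."""
    if encoding_byte == 0x00:
        # ISO-8859-1; raises UnicodeEncodeError for characters outside it.
        return text.encode("latin-1")
    if encoding_byte == 0x01:
        # 16-bit Unicode with the mandatory BOM ($FF FE here).
        return codecs.BOM_UTF16_LE + text.encode("utf-16-le")
    if encoding_byte == 0x02:
        # 2.4 only: explicitly big endian, no BOM.
        return text.encode("utf-16-be")
    raise ValueError("encoding byte not handled in this sketch")
```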

> There is also UTF-8, which (I may be wrong about this) always has
> its multi-byte characters in big-endian order.

UTF-8 does not have endian issues... it is always treated byte by
byte, i.e. it is never accessed more than one byte at a time.  But
UTF-8 is only supported in 2.4 (encoding byte 0x03).
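
A quick way to see the difference, as a small sketch (the variable
names are only for this example): the 16-bit encodings give two
different byte sequences for the same character depending on byte
order, while UTF-8 defines one fixed sequence of bytes.

```python
text = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

utf16_le = text.encode("utf-16-le")   # one 16-bit unit, low byte first
utf16_be = text.encode("utf-16-be")   # same unit, high byte first
utf8 = text.encode("utf-8")           # two bytes, order fixed by UTF-8 itself

# The 16-bit forms are byte-swapped versions of each other.
assert utf16_le == utf16_be[::-1]
# UTF-8 has a single form, so no BOM is needed to read it back.
assert utf8.decode("utf-8") == text
```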

			-ben
