[ID3 Dev] Unicode

Robert Manson rmanson at gracenote.com
Mon Feb 12 12:22:02 PST 2007


Hi Ben,
With regard to your comment:
>>it will also do the right thing, regardless of platform.
If your intent here was to indicate that the code snippets you posted
are portable, I'm afraid you are mistaken.

The wchar_t type is not portable; you should use ui16_t instead for
UTF-16 (or UCS-2). (wchar_t is 16 bits on Windows but 32 bits on Linux,
and is not guaranteed to be more than 7 bits.) Also note that ui16_t is
only guaranteed to be *at least* 2 bytes, so sizeof() may report more
than the 2 bytes a UTF-16 code unit occupies in the file. You should not
use sizeof(); instead you should use the number 2.
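
A minimal sketch of that advice, under some assumptions: ui16_t is taken
to behave like C99's uint_least16_t from <stdint.h> (at least 16 bits,
possibly wider), and the names UTF16_UNIT_BYTES, put_utf16be and
put_utf16le are hypothetical helpers made up for illustration. The point
is only that the on-disk size is the constant 2 and the bytes are
produced by shifting, not by copying the in-memory representation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes per UTF-16 code unit in the file. This is a property of the
   encoding, so it is spelled out instead of derived from sizeof(). */
#define UTF16_UNIT_BYTES 2

/* Hypothetical helper: store one code unit most significant byte first
   (big endian), independent of the host byte order. */
void put_utf16be(uint_least16_t unit, unsigned char out[UTF16_UNIT_BYTES])
{
    assert(out != NULL);
    out[0] = (unsigned char)((unit >> 8) & 0xFF);
    out[1] = (unsigned char)(unit & 0xFF);
}

/* Hypothetical helper: the little-endian counterpart. */
void put_utf16le(uint_least16_t unit, unsigned char out[UTF16_UNIT_BYTES])
{
    assert(out != NULL);
    out[0] = (unsigned char)(unit & 0xFF);
    out[1] = (unsigned char)((unit >> 8) & 0xFF);
}
```

Calling put_utf16be(0xFEFF, buf) fills buf with FE FF on any host, and
the buffer can then be written with fwrite(buf, 1, UTF16_UNIT_BYTES, fp).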

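Also, while it is true that literal values are big endian, the byte
ordering of the lvalue in memory is system dependent, and fwrite() and
fread() copy bytes in memory order. The code snippets will therefore
produce different results depending on whether the platform is big or
little endian: on a little-endian machine, writing 0xFEFF that way puts
FF first in the file, not FE.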
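
One way to avoid the dependence on host byte order is to handle the BOM
as two explicit bytes. The following is only a sketch, not code from
this thread: enum bom_kind, write_bom and read_bom are hypothetical
names, and it assumes the BOM sits at the current position of an
already-open stdio stream.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

enum bom_kind { BOM_NONE = 0, BOM_BIG_ENDIAN, BOM_LITTLE_ENDIAN };

/* Write the UTF-16 BOM as explicit bytes: FE FF for big endian,
   FF FE for little endian. Returns 0 on success, -1 on error. */
int write_bom(FILE *fp, enum bom_kind kind)
{
    static const unsigned char be[2] = { 0xFE, 0xFF };
    static const unsigned char le[2] = { 0xFF, 0xFE };
    const unsigned char *bytes;

    assert(fp != NULL);
    if (kind == BOM_BIG_ENDIAN)
        bytes = be;
    else if (kind == BOM_LITTLE_ENDIAN)
        bytes = le;
    else
        return -1;
    return fwrite(bytes, 1, 2, fp) == 2 ? 0 : -1;
}

/* Read two bytes and classify them. The comparison is done byte by
   byte, so the result does not depend on the host byte order. */
enum bom_kind read_bom(FILE *fp)
{
    unsigned char b[2];

    assert(fp != NULL);
    if (fread(b, 1, 2, fp) != 2)
        return BOM_NONE;
    if (b[0] == 0xFE && b[1] == 0xFF)
        return BOM_BIG_ENDIAN;
    if (b[0] == 0xFF && b[1] == 0xFE)
        return BOM_LITTLE_ENDIAN;
    return BOM_NONE;
}
```

With this approach the reader learns the byte order of the data from the
file itself and can then decode each following code unit from its two
bytes accordingly, without ever asking what the host's own order is.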

-Rob

-----Original Message-----
From: Ben Allison [mailto:benski at winamp.com] 
Sent: Sunday, February 11, 2007 9:04 PM
To: id3v2 at id3.org
Subject: Re: [ID3 Dev] Unicode

And just a note that C compilers treat literal numbers as big endian.

So if you do this
wchar_t BOM = 0xFEFF;
fwrite(&BOM, 1, sizeof(BOM), fp);

it will "do the right thing" regardless of the platform.

Similarly, if you do this:
wchar_t BOM;
fread(&BOM, 1, sizeof(BOM), fp);
if (BOM == 0xFEFF) // don't switch endian
{
}
else // switch endian
{
}

it will also do the right thing, regardless of platform.

> Mark,
>
> This is backwards.  Little endian BOM is FF FE and big endian BOM is
> FE FF
>
> Mark Smith wrote:
>> 0xFF 0xFE for big-endian, or 0xFE 0xFF for little-endian.
>>


---------------------------------------------------------------------
To unsubscribe, e-mail: id3v2-unsubscribe at id3.org
For additional commands, e-mail: id3v2-help at id3.org

