[ID3 Dev] Encoding UTF-16 (i.e UTF-16 with BOM which is the most compatible choice Little Endian or Big Endian ?)

Steve Dirickson sdirickson at real.com
Thu Apr 28 11:14:50 PDT 2011


Then I guess it depends on your definition of "most compatible". If you use BE, Windows machines, Intel-based Macs, etc. will have to do the byte swapping; if you use LE, other architectures will have to do so. Does that matter?
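
To make the two options concrete, here is a small Python sketch (not taken from the ID3 spec; the helper name utf16_with_bom is made up for illustration) that writes the same text as UTF-16 with a BOM in either byte order. A reader that honours the BOM should get identical text back from both.

```python
import codecs

def utf16_with_bom(text, little_endian=True):
    """Encode text as UTF-16 with an explicit BOM in the requested byte order.

    Hypothetical helper for illustration only.
    """
    if little_endian:
        return codecs.BOM_UTF16_LE + text.encode("utf-16-le")
    return codecs.BOM_UTF16_BE + text.encode("utf-16-be")

le = utf16_with_bom("A", little_endian=True)   # bytes: FF FE 41 00
be = utf16_with_bom("A", little_endian=False)  # bytes: FE FF 00 41

# Python's generic "utf-16" decoder reads the BOM and handles either order.
decoded_le = le.decode("utf-16")
decoded_be = be.decode("utf-16")
```

Either way the byte swapping is a detail of the decoder; the open question in this thread is which BOM the less careful applications cope with.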

WRT unsync, if you aren't using it, the BOM issue is a don't-care. If you are using it: stop! ;-) Since you want to stay 2.3-compatible, you're going to be using a BOM either way; unsync just complicates the issue.

If you're trying to figure out which choice breaks the fewest apps that don't properly handle Unicode tags, that's an exercise in frustration. Unless you know that a significant share of your target user base uses a known-broken app that happens to work with one but not the other, I'd say pick one, and accept that some number of misbehaving apps are going to show garbage to the user.

From: Paul Taylor [mailto:paul_t100 at fastmail.fm]
Sent: Wednesday, 27 April, 2011 11:37
To: id3v2 at id3.org
Cc: Steve Dirickson
Subject: Re: [ID3 Dev] Encoding UTF-16 (i.e UTF-16 with BOM which is the most compatible choice Little Endian or Big Endian ?)

On 27/04/2011 18:03, Steve Dirickson wrote:
I think the key is that UTF-16BE is equivalent to "network byte order". Any app that produces Unicode for external consumption really should provide the BOM. But, if it doesn't, the only reasonable assumption the recipient can make is that the text is in network byte order. The alternative is to try heuristics and look for lots of binary zero values (or lots of the same small value) in every other byte, and then make the call based on whether those recurring values are in the even or odd bytes.
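
The fallback described above might be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the function name guess_utf16_byte_order is made up, only zero bytes are counted (not "lots of the same small value"), and a tie falls back to big endian as the network-byte-order default.

```python
import codecs

def guess_utf16_byte_order(data):
    """Return "be" or "le" for a UTF-16 byte string.

    Hypothetical helper: trust the BOM if present, otherwise count zero
    bytes at even and odd offsets and guess from where they cluster.
    """
    if data.startswith(codecs.BOM_UTF16_BE):
        return "be"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "le"
    # For mostly-Latin text the high byte of each code unit is zero.
    # High bytes sit at even offsets in BE and at odd offsets in LE.
    even_zeros = data[0::2].count(0)
    odd_zeros = data[1::2].count(0)
    if odd_zeros > even_zeros:
        return "le"
    # Tie or more zeros at even offsets: assume network byte order.
    return "be"
```

For text that is not mostly Latin (CJK, for example) there may be few zero bytes at all, so a guess like this is much weaker than a BOM.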

I think you are missing my point. I'm NOT talking about the UTF-16BE encoding but the UTF-16 encoding (which is UTF-16 with a BOM and can contain LE or BE data). I have no problem reading or writing the data, but I would like to know which is the most compatible choice. When embedded within an MP3, UTF-16 is not magically decoded by the operating system; it has to be decoded by the application, and I'm sure there are some applications that can embed BOM LE but not BOM BE, or vice versa. There is also the complication that the byte order mark itself requires unsynchronization in the LE case (if you are using unsynchronization), whereas the BE one does not. Applications such as Windows 7 Explorer don't understand unsynchronization, which makes me think that BOM BE is more compatible.

Paul
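
For reference, the unsynchronization point can be checked with a short Python sketch. It assumes the ID3v2.3 rule as I understand it (insert a zero byte after any FF that is followed by a byte of the form 111xxxxx, and also after any FF followed by 00); the function name unsynchronise is made up for illustration and this is not a full tag writer.

```python
def unsynchronise(data):
    """Apply an ID3v2.3-style unsynchronization pass to a byte string.

    Hypothetical sketch: a 0x00 is inserted after every 0xFF that is
    followed by a byte >= 0xE0 (bit pattern 111xxxxx) or by 0x00.
    """
    out = bytearray()
    for i, byte in enumerate(data):
        out.append(byte)
        if byte == 0xFF and i + 1 < len(data):
            following = data[i + 1]
            if following >= 0xE0 or following == 0x00:
                out.append(0x00)
    return bytes(out)

bom_le = unsynchronise(b"\xff\xfe")  # FF FE looks like a sync pattern
bom_be = unsynchronise(b"\xfe\xff")  # FE FF on its own is left alone
```

Under this reading the LE BOM is always altered, while the BE BOM by itself is not. Because the sketch also pads FF 00, whether BE text stays untouched depends on the byte that follows the BOM.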