<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#ffffff">
On 27/04/2011 18:03, Steve Dirickson wrote:
<blockquote
cite="mid:4A242CD46F4C34418D17941FF9746F9774438C59E2@SEAMBX.corp.real.com"
type="cite">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size: 11pt; font-family:
'Calibri','sans-serif'; color: rgb(31,
73, 125);">I think the key is that UTF-16BE is equivalent to
“network byte order”. Any app that produces Unicode for
external consumption really should provide the BOM. But, if
it doesn’t, the only reasonable assumption the recipient can
make is that the text is in network byte order. The
alternative is to try heuristics and look for lots of binary
zero values (or lots of the same small value) in every other
byte, and then make the call based on whether those
recurring values are in the even or odd bytes.</span></p>
</div>
</blockquote>
I think you are missing my point. I'm NOT talking about the UTF-16BE
encoding but about the UTF-16 encoding (which is UTF-16 with a BOM and
can contain LE or BE data). I have no problem reading or writing the
data, but I would like to know which is the most compatible choice.
When embedded within an MP3, UTF-16 is not magically decoded by the
operating system; it has to be decoded by the application, and I'm
sure there are some applications that can embed BOM LE but not BOM
BE, or vice versa. There is also the complication that the byte order
mark itself requires unsynchronization in the BOM LE case (FF FE), if
you are using unsynchronization, whereas in the BOM BE case (FE FF) it
does not. Applications such as Windows 7 Explorer don't understand
unsynchronization, which makes me think that BOM BE is more compatible. <br>
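<br>
To make the two points above concrete, here is a rough Python sketch, not taken from any real tagger. The function names are made up for illustration, and falling back to big-endian when no BOM is present is only the assumption suggested in the quoted message. It shows an application choosing the byte order from the BOM, and checking whether the two BOM bytes on their own match the false-sync pattern (FF followed by a byte with its top three bits set).<br>

```python
import codecs


def decode_utf16_with_bom(data: bytes) -> str:
    """Decode UTF-16 text, using a leading BOM to choose the byte order."""
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return data[2:].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return data[2:].decode("utf-16-be")
    # Assumption: with no BOM, treat the text as big-endian
    # ("network byte order"), as the quoted message suggests.
    return data.decode("utf-16-be")


def is_false_sync(first: int, second: int) -> bool:
    """True if two consecutive bytes match the pattern 11111111 111xxxxx."""
    return first == 0xFF and (second & 0xE0) == 0xE0


def bom_is_false_sync(bom: bytes) -> bool:
    """Check only the two BOM bytes, ignoring whatever text follows them."""
    return is_false_sync(bom[0], bom[1])


if __name__ == "__main__":
    print(decode_utf16_with_bom(b"\xff\xfeA\x00"))    # LE BOM, then "A"
    print(decode_utf16_with_bom(b"\xfe\xff\x00A"))    # BE BOM, then "A"
    print(bom_is_false_sync(codecs.BOM_UTF16_LE))
    print(bom_is_false_sync(codecs.BOM_UTF16_BE))
```

Note that this only looks at the BOM bytes themselves. Whether the trailing FF of a BE BOM needs any treatment depends on the byte that follows it (the high byte of the first character), which this sketch does not examine.<br>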
<br>
Paul<br>
</body>
</html>