[ID3 Dev] ID3v2.4.0 native frames

Ben Bennett fiji at ayup.limey.net
Wed Aug 24 06:57:00 PDT 2005


On Wed, Aug 24, 2005 at 01:42:19PM +0100, Ion Todirel wrote:
> Ben, look at this frame (or others):
> 
> <COMM>
> Text encoding              $xx
> Language                    $xx xx xx
> Short content descrip.  <text string according to encoding> $00 (00)
> The actual text             <full text string according to encoding>
>  
> Is $00 a one-byte (new byte(); (0)) separator between "Short content descrip." and "The actual text"?

The reason the terminator is specified as $00 (00) is that the
terminator may be one byte or it may be two depending on the encoding.
Section 4 of the "Main Structure" document
(http://www.id3.org/id3v2.4.0-structure.txt) outlines the rules:
     $00   ISO-8859-1 [ISO-8859-1]. Terminated with $00.
     $01   UTF-16 [UTF-16] encoded Unicode [UNICODE] with BOM. All
           strings in the same frame SHALL have the same byteorder.
           Terminated with $00 00.
     $02   UTF-16BE [UTF-16] encoded Unicode [UNICODE] without BOM.
           Terminated with $00 00.
     $03   UTF-8 [UTF-8] encoded Unicode [UNICODE]. Terminated with $00.

So for encodings $00 and $03 the terminator is $00, and for $01 and
$02 (the wide character Unicode encodings) the terminator is $00 $00.
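
As a rough illustration, here is a Python sketch of splitting a COMM frame body on the encoding-dependent terminator. It assumes the frame body has already been extracted from the tag (frame header stripped, any unsynchronisation undone), and `parse_comm` and `split_terminated` are hypothetical helper names, not part of any library. One subtlety it tries to show: for $01 and $02 the $00 00 terminator should only be matched on a 2-byte boundary, because a naive byte search can find $00 00 straddling two UTF-16 characters.

```python
# Text-encoding byte -> (Python codec name, terminator), per the table above.
ENCODINGS = {
    0x00: ("latin-1", b"\x00"),
    0x01: ("utf-16", b"\x00\x00"),     # has a BOM; Python's utf-16 codec consumes it
    0x02: ("utf-16-be", b"\x00\x00"),  # big-endian, no BOM
    0x03: ("utf-8", b"\x00"),
}


def split_terminated(data: bytes, encoding_byte: int) -> tuple[str, bytes]:
    """Decode one terminated string from the front of data.

    Returns (decoded string, bytes remaining after the terminator).
    """
    codec, term = ENCODINGS[encoding_byte]
    width = len(term)
    # Step by the code-unit width so a UTF-16 terminator is only recognised
    # on a 2-byte boundary, never across two adjacent characters.
    for i in range(0, len(data) - width + 1, width):
        if data[i:i + width] == term:
            return data[:i].decode(codec), data[i + width:]
    raise ValueError("string terminator not found")


def parse_comm(body: bytes) -> tuple[str, str, str]:
    """Split a COMM frame body into (language, description, text)."""
    encoding_byte = body[0]
    language = body[1:4].decode("latin-1")  # three-byte language code
    description, rest = split_terminated(body[4:], encoding_byte)
    codec, _ = ENCODINGS[encoding_byte]
    # The final string runs to the end of the frame; drop any trailing
    # NULs in case the writer also terminated it.
    text = rest.decode(codec).rstrip("\x00")
    return language, description, text
```

For example, with encoding $02 the description "\u0100A" encodes as 01 00 00 41, which contains $00 00 at an odd offset; the aligned scan skips it and finds the real terminator that follows.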
 
> For some frames the string encoding is not specified, for example for frame "WXXX":
> 
> <"WXXX">
> Text encoding     $xx
> Description         <text string according to encoding> $00 (00)
> URL                   <text string>
> 
> Is the URL encoded with Latin1Encoding ("ISO-8859-1")?
> 
> Or, for the "UFID" frame, what encoding should be used to decode the "Identifier" from bytes to a string? "ISO-8859-1"?

From the same section of the spec:
  If nothing else is said, strings, including numeric strings and URLs
  [URL], are represented as ISO-8859-1 [ISO-8859-1] characters in the
  range $20 - $FF. Such strings are represented in frame descriptions
  as <text string>, or <full text string> if newlines are allowed. If
  nothing else is said newline character is forbidden. In ISO-8859-1 a
  newline is represented, when allowed, with $0A only.

Note that in the WXXX case the URL should be URL-encoded (percent-encoded) anyway, so in practice it should not need any characters outside ASCII.
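
Putting the two rules together for WXXX, a minimal Python sketch might look like the following: the description follows the frame's text-encoding byte, while the URL, being a plain <text string>, is decoded as ISO-8859-1 whatever that byte says. `parse_wxxx` is a hypothetical helper, and the frame body is assumed to be already extracted from the tag.

```python
# Text-encoding byte -> Python codec name.
CODECS = {0x00: "latin-1", 0x01: "utf-16", 0x02: "utf-16-be", 0x03: "utf-8"}


def parse_wxxx(body: bytes) -> tuple[str, str]:
    """Split a WXXX frame body into (description, url)."""
    encoding_byte = body[0]
    codec = CODECS[encoding_byte]
    # $01 and $02 use a two-byte terminator, $00 and $03 a single byte.
    width = 2 if encoding_byte in (0x01, 0x02) else 1
    data = body[1:]
    # Only match the terminator on a code-unit boundary.
    for i in range(0, len(data) - width + 1, width):
        if data[i:i + width] == b"\x00" * width:
            description = data[:i].decode(codec)
            # The URL is a <text string>, so ISO-8859-1 applies regardless
            # of the frame's text-encoding byte.
            url = data[i + width:].decode("latin-1").rstrip("\x00")
            return description, url
    raise ValueError("description terminator not found")
```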

Also note the rules about newlines and the distinction in the spec
between <text string> and <full text string>.

The spec is unclear about how newlines should be handled when the
encoding is not ISO-8859-1.
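
For the ISO-8859-1 case, where the rules are spelled out, a check might look roughly like this Python sketch. It reads the quoted text literally: bytes must be in the range $20 - $FF, and the only extra byte allowed is $0A, and only in a <full text string>. `check_latin1_string` and its `full` flag are hypothetical names, and the sketch deliberately says nothing about the other encodings.

```python
def check_latin1_string(raw: bytes, full: bool = False) -> str:
    """Validate an ISO-8859-1 ID3v2.4 string and return it decoded.

    full=False models <text string> (newline forbidden);
    full=True models <full text string> (newline allowed, as $0A only).
    """
    for pos, byte in enumerate(raw):
        if byte == 0x0A and full:
            continue  # newline permitted in a <full text string>
        if byte < 0x20:
            # Outside $20 - $FF: control characters, including $0D (CR).
            raise ValueError(f"byte ${byte:02X} at offset {pos} is not allowed")
    return raw.decode("latin-1")
```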

			       -ben
