[ID3 Dev] Synchronised lyrics/text frames and Byte-Order-Markers

Mathias Kunter mathiaskunter at yahoo.de
Wed Feb 10 00:01:56 PST 2010


Hi Martin,

I'd say the spec is clear about this: if the text encoding signals UTF-16, then each sync MUST have its own BOM, because the specification requires a BOM whenever a UTF-16 string is encoded. Since each sync is stored as a terminated string, each string must have a BOM, and each string MAY use a different BOM.

It isn't recommended to mix big and little endian strings within the same frame, but your implementation should be able to handle different BOMs when decoding a SYLT frame (or any other frame with multiple UTF-16 encoded strings) - "Be conservative in what you do; be liberal in what you accept from others."
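
To illustrate the tolerant decoding described above, here is a minimal Python sketch, not a definitive implementation. It assumes the frame's text encoding byte is $01 (UTF-16 with BOM), that `data` holds only the sync list (everything after the content descriptor), and that time stamps are 32 bits, big-endian. The function names `decode_utf16_with_bom` and `parse_sylt_syncs` are made up for this example, and the big-endian fallback for a missing BOM is my own lenient choice, not something the spec prescribes.

```python
import codecs
import struct


def decode_utf16_with_bom(raw: bytes, default: str = "utf-16-be") -> str:
    """Decode one UTF-16 string, honouring its own BOM if present."""
    if raw.startswith(codecs.BOM_UTF16_LE):
        return raw[2:].decode("utf-16-le")
    if raw.startswith(codecs.BOM_UTF16_BE):
        return raw[2:].decode("utf-16-be")
    # Assumption: be liberal and fall back to a default byte order
    # when a writer omitted the BOM.
    return raw.decode(default)


def parse_sylt_syncs(data: bytes):
    """Return a list of (text, timestamp) pairs from a UTF-16 sync list."""
    syncs = []
    pos = 0
    while pos < len(data):
        # Search for the $00 00 terminator in 2-byte steps so that a
        # zero byte inside a code unit is not mistaken for the terminator.
        end = pos
        while end + 1 < len(data) and data[end:end + 2] != b"\x00\x00":
            end += 2
        # Terminator (2 bytes) plus time stamp (4 bytes) must fit.
        if end + 6 > len(data):
            raise ValueError("truncated sync entry")
        text = decode_utf16_with_bom(data[pos:end])
        (timestamp,) = struct.unpack(">I", data[end + 2:end + 6])
        syncs.append((text, timestamp))
        pos = end + 6
    return syncs
```

Each string is decoded on its own, so a frame that mixes little- and big-endian syncs is read without trouble, while a writer can still emit a single byte order throughout.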

> All syncs can have their own individual BOMs.
> This would be crazy - to say the least.

Why? Because it takes, let's say, 25,000 songs * 200 sync strings * 2 bytes BOM = around 10 MB of disk space for an entire music collection? Storing album cover artwork takes much more space.

Best regards,
Mathias K.






________________________________
From: Martin Benkert <martin.benkert at gmail.com>
To: id3v2 at id3.org
Sent: Monday, 8 February 2010, 21:02:31
Subject: [ID3 Dev] Synchronised lyrics/text frames and Byte-Order-Markers

Hi,

the synchronized lyrics frame SLT/SYLT supports text encodings, and also
Byte-Order-Markers (BOMs).

It has a 'Content Descriptor' which is simply an encoded string and might
have a BOM, just like all other encoded frames.

But it also has a binary structure of items called 'syncs', which have
this structure

     Terminated text to be synced (typically a syllable)
     Sync identifier (terminator to above string)   $00 (00)
     Time stamp                                     $xx (xx ...)

The first item is plain text. I guess it is encoded according to the
specified encoding of the frame.

But what about BOMs here? In a typical file there might be some hundred
syncs (if they are syllables). If they all have individual BOMs this will
be a lot of data. However, if they do not have BOMs at all, it is not
clear how for instance Little-Endian encoding might be specified.

There are four ways this might be intended:

- syncs do not have any BOMs at all. This would be nice, but it implies
   Big-Endian byte-order.
- the very first sync might have a BOM that is also applied to all
   following syncs. This appears to be the best way. It also allows
   Little-Endian byte-order to be specified. But the ID3 standards do not
   specify this in any way.
- all syncs can have their own individual BOMs. This would be crazy - to
   say the least. But it would comply with the ID3 standards (at least in
   an implicit way).
- the BOM used by 'Content Descriptor' is also used for the syncs. Might be
   nice - but it breaks all conventions: Other frames with two encoded strings
   (COM/COMM, COMR, GEO/GEOB, ULT/USLT) all might have individual BOMs for
   text fields - only ID3v2.4 specifies 'All strings in the same frame SHALL
   have the same byte order'.

This somehow indicates that the lyrics in an SLT/SYLT frame appear to be a
real oddity with respect to BOMs.

The question here is: Is there a recommended way to specify the byte-order of
the lyrics of synchronized lyrics frames SLT/SYLT?

Thanks
Martin

