[ID3 Dev] UTF16LE BOM FF FE incorrectly being identified as start of MP3 Audio

Paul Taylor paul_t100 at fastmail.fm
Sun Apr 15 03:14:48 PDT 2007


Hi, in my ID3 library I made the decision to find the start of the
MP3 audio first, and then read the ID3 tag from the start of the file
up to the start of the MP3 audio. I did it this way rather than reading
the tag size from the ID3 tag header and using that size as the extent
of the tag. I did this because so many applications make mistakes when
writing ID3 tags, whereas applications that encode MP3s do the encoding
correctly, so as long as I identify the start of the MP3 audio correctly
there is no risk of accidentally overwriting MP3 data with the ID3 tag.
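
For comparison, here is a rough Python sketch of the approach I decided
against: trusting the size declared in the ID3v2 tag header. The function
name id3v2_tag_size is made up for this example, and the sketch only
looks at the 10-byte header (it does not validate the tag body).

```python
def id3v2_tag_size(data: bytes):
    """Return the total tag length the ID3v2 header declares, or None."""
    if len(data) < 10 or data[:3] != b"ID3":
        return None
    size_bytes = data[6:10]
    # The size is "synchsafe": only the low 7 bits of each byte carry data.
    if any(b & 0x80 for b in size_bytes):
        return None
    size = 0
    for b in size_bytes:
        size = (size << 7) | b
    # ID3v2.4 may append a 10-byte footer, signalled by flag bit 0x10.
    has_footer = data[3] == 4 and bool(data[5] & 0x10)
    # The declared size excludes the 10-byte header (and the footer).
    return 10 + size + (10 if has_footer else 0)
```

If the writing application got this size wrong, everything that relies
on it is wrong too, which is why I scan for the audio instead.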

This is how I find the MP3 audio:

WHILE BYTES EXIST
{
    Get next byte
    IF this byte is 0xFF and the next byte is 0xE0 or greater
    {
        IF these and the next two bytes map to an MPEG frame header
           (with valid values for bitrate, sampling rate, ...)
        {
            IF this frame maps to a Xing frame
            {
                FOUND MATCH, break
            }
            ELSE
            {
                Skip to the end of this MPEG frame as reported in the header
                IF the next four bytes map to an MPEG frame header
                {
                    FOUND MATCH, break
                }
                ELSE
                {
                    No match found
                    CONTINUE
                }
            }
        }
        ELSE
        {
            No match found
            CONTINUE
        }
    }
    ELSE
    {
        No match found
        CONTINUE
    }
}
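
The loop above could be sketched in Python roughly as follows. This is a
simplified illustration, not my library's code: the names frame_length,
has_xing and find_audio_start are made up for the example, only MPEG-1
headers are accepted, free-format bitrates are rejected, and the Xing
check assumes a Layer III frame without CRC.

```python
# MPEG-1 bitrates in kbps, keyed by the raw 2-bit layer field.
BITRATES_KBPS = {
    3: [0, 32, 64, 96, 128, 160, 192, 224, 256, 288, 320, 352, 384, 416, 448],  # Layer I
    2: [0, 32, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 384],     # Layer II
    1: [0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320],      # Layer III
}
SAMPLE_RATES = [44100, 48000, 32000]  # MPEG-1 only


def frame_length(data, pos):
    """Return the byte length of the MPEG-1 frame at pos, or None if invalid."""
    if pos + 4 > len(data):
        return None
    b0, b1, b2, _ = data[pos:pos + 4]
    if b0 != 0xFF or (b1 & 0xE0) != 0xE0:
        return None                    # no 11-bit frame sync
    version = (b1 >> 3) & 0x03         # 3 = MPEG-1
    layer = (b1 >> 1) & 0x03           # 3 = Layer I, 2 = Layer II, 1 = Layer III
    bitrate_index = b2 >> 4
    rate_index = (b2 >> 2) & 0x03
    padding = (b2 >> 1) & 0x01
    if version != 3 or layer == 0 or bitrate_index in (0, 15) or rate_index == 3:
        return None                    # reserved, free-format or unsupported
    bitrate = BITRATES_KBPS[layer][bitrate_index] * 1000
    rate = SAMPLE_RATES[rate_index]
    if layer == 3:                     # Layer I counts in 4-byte slots
        return (12 * bitrate // rate + padding) * 4
    return 144 * bitrate // rate + padding


def has_xing(data, pos):
    """Check for a Xing/Info tag after the side info (Layer III, no CRC)."""
    if (data[pos + 1] >> 1) & 0x03 != 1:
        return False
    mono = (data[pos + 3] >> 6) == 3
    offset = pos + 4 + (17 if mono else 32)
    return data[offset:offset + 4] in (b"Xing", b"Info")


def find_audio_start(data):
    """Return the offset of the first frame confirmed by Xing or a second header."""
    pos = 0
    while pos + 4 <= len(data):
        length = frame_length(data, pos)
        if length is not None:
            if has_xing(data, pos):
                return pos
            if frame_length(data, pos + length) is not None:
                return pos
        pos += 1                       # no match here, carry on searching
    return None
```

Note that with these tables the four bytes FF FE 48 00 pass frame_length
(it returns 192), so confirmation depends entirely on what lies 192
bytes further on.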

So in summary: I find a potential match and map it to an MPEG frame. If
it maps to a Xing frame I'm done; if not, I try to match the next frame
to see if it is an audio frame. If it is, I'm done; if not, this is not
valid MP3 audio, so I carry on searching. The only time I find a
potentially invalid match is when the ID3 tag is not unsynchronised. I
thought this happened quite rarely, and only for frames such as APIC
frames, but I've now realised that if the text frames are encoded as
UTF-16 with a little-endian BOM then every text frame will start with
FF FE, and if the tag isn't unsynchronised there is a real chance that
the frame length computed from the text frame (when read as an MPEG
frame header) could lead me to an invalid match on a second frame.
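
To make the problem concrete, here is a small Python sketch that decodes
the first bytes of a UTF-16LE text value as if they were an MPEG frame
header. The function name decode_header_fields and the sample text "Hi"
are made up for this illustration; the bit positions follow the standard
MPEG audio header layout.

```python
VERSIONS = {0: "MPEG-2.5", 1: "reserved", 2: "MPEG-2", 3: "MPEG-1"}
LAYERS = {0: "reserved", 1: "Layer III", 2: "Layer II", 3: "Layer I"}


def decode_header_fields(b: bytes):
    """Interpret the first three bytes of b as MPEG audio header fields."""
    return {
        "sync": b[0] == 0xFF and (b[1] & 0xE0) == 0xE0,
        "version": VERSIONS[(b[1] >> 3) & 0x03],
        "layer": LAYERS[(b[1] >> 1) & 0x03],
        "bitrate_index": b[2] >> 4,
        "sample_rate_index": (b[2] >> 2) & 0x03,
    }


# BOM FF FE followed by "Hi" in UTF-16LE: FF FE 48 00 69 00
text = b"\xff\xfe" + "Hi".encode("utf-16-le")
fields = decode_header_fields(text)
```

Here fields reports a valid sync, MPEG-1, Layer I, bitrate index 4 and
sample rate index 2, so the BOM plus the letter "H" looks like a
plausible header. The version and layer come from the FE byte alone,
while the bitrate and sample rate fields depend on the first character
of the text.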

So my question is: is there something about the byte combination FF FE
that would allow me to identify that it is definitely NOT the start of
an MP3 audio frame?

thanks paul

