[ID3 Dev] Accessibility extension draft is posted

Scott Wheeler wheeler at kde.org
Fri Jun 16 07:23:02 PDT 2006


On Wednesday 14 June 2006 16:02, Chris Newell wrote:
> >Two things that jump out to me as bits that won't work:
> >
> >- You can't embed arbitrary audio into an ID3 tag and expect for it to be
> >decoded properly since it messes up synching.  For instance, with MPEG
> > audio, when you hit an MPEG synch frame, the audio player will just play
> > the stuff there like it's the first bit of audio content.  The rest of
> > the tag will be ignored.
>
> I discovered this to my cost when trying out a prototype:-) However, using
> the ID3v2 unsynchronisation scheme appeared to solve this.
>
> The draft proposal recommends that unsynchronisation is applied but perhaps
> this should be mandatory if AudioText frames are present.

Unfortunately MPEG is the simple case.  Granted, the primary target of ID3 
tags is MPEG audio, but they are used for some other things (FLAC comes to 
mind).  The absence of explicit unsynchronization for non-MPEG formats 
usually doesn't cause problems, simply because a tag is statistically 
unlikely to contain a properly formatted header by accident.  However, if 
you're intentionally embedding audio (and so valid headers) in the tag, it 
could be problematic.
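
For reference, here is a rough Python sketch of the ID3v2 unsynchronisation 
scheme as I understand it: a zero byte is stuffed after any 0xFF that would 
otherwise look like the start of an MPEG sync (or that is followed by 0x00, 
so the step can be reversed).  The function names are made up for 
illustration, and padding a trailing 0xFF is a conservative choice of this 
sketch.  Note that it only guards against the MPEG sync pattern, which is the 
limitation described above.

```python
def unsynchronise(data: bytes) -> bytes:
    """Stuff 0x00 after each 0xFF followed by 0x00 or by a byte 0b111xxxxx."""
    out = bytearray()
    for i, byte in enumerate(data):
        out.append(byte)
        if byte == 0xFF:
            nxt = data[i + 1] if i + 1 < len(data) else None
            # Assumption of this sketch: a trailing 0xFF is padded as well, so
            # it cannot pair up with whatever byte follows the tag.
            if nxt is None or nxt == 0x00 or nxt >= 0xE0:
                out.append(0x00)
    return bytes(out)


def resynchronise(data: bytes) -> bytes:
    """Reverse the scheme: drop a 0x00 that directly follows a 0xFF."""
    out = bytearray()
    i = 0
    while i < len(data):
        out.append(data[i])
        if data[i] == 0xFF and i + 1 < len(data) and data[i + 1] == 0x00:
            i += 1  # skip the stuffed zero
        i += 1
    return bytes(out)


def has_false_sync(data: bytes) -> bool:
    """True if the data contains 0xFF followed by a byte with its top 3 bits set."""
    return any(data[i] == 0xFF and data[i + 1] >= 0xE0
               for i in range(len(data) - 1))


# 0xFF 0xFB looks like an MPEG audio frame header to a naive player.
raw = bytes([0x01, 0xFF, 0xFB, 0x90, 0xFF, 0x00, 0xFF])
encoded = unsynchronise(raw)
```

After encoding, `has_false_sync(encoded)` is false and 
`resynchronise(encoded)` gives back the original bytes.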

I'm not sure what a real solution is -- I'd be tempted to just use 
low-sampling frequency PCM, but that has its downsides too...

Also, speaking of synchs, the image in the draft shows a synchsafe integer 
for the size, which would be really strange in an ID3v2.3 tag, since v2.3 
frame sizes are plain 32-bit integers (synchsafe frame sizes only arrived 
in ID3v2.4).
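
To illustrate why the distinction matters, here is a small Python sketch of 
synchsafe encoding (7 usable bits per byte, top bit always clear) next to a 
plain big-endian read of the same four bytes.  The helper names are my own, 
not from any library.

```python
def to_synchsafe(value: int) -> bytes:
    """Encode a 28-bit value into 4 bytes, using only the low 7 bits of each."""
    if not 0 <= value < (1 << 28):
        raise ValueError("a 4-byte synchsafe integer holds at most 28 bits")
    return bytes((value >> shift) & 0x7F for shift in (21, 14, 7, 0))


def from_synchsafe(raw: bytes) -> int:
    """Decode 4 synchsafe bytes; reject input with any top bit set."""
    if len(raw) != 4 or any(b & 0x80 for b in raw):
        raise ValueError("not a 4-byte synchsafe integer")
    value = 0
    for b in raw:
        value = (value << 7) | b
    return value


size_bytes = to_synchsafe(257)                  # b"\x00\x00\x02\x01"
as_synchsafe = from_synchsafe(size_bytes)       # 257
as_plain = int.from_bytes(size_bytes, "big")    # 513
```

A parser that reads a synchsafe size as a plain integer (or the other way 
round) gets the wrong frame length as soon as the size exceeds 127 bytes, 
which any audio clip will.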

> >- Equivalent Frame ID doesn't work for frames that allow multiple
> > instances of the same frame.
>
> Couldn't you use multiple instances of AudioText frame with the same
> Equivalent Frame ID?
>
> It's true that in the current proposal you cannot assume a one-to-one
> relationship between a specific text frame and a specific AudioText frame
> if there are multiple instances with the same Equivalent Frame ID and
> language code.
>
> Would a satisfactory solution be to imply this relationship (if required)
> from the order in which they are found within the frame?

There's no guarantee that external taggers don't reorder frames when they 
write tags, so a third party tool just updating the tag would break things.

> >You also might want to consider if you want to do anything
> >special for frames that contain multiple, distinct strings.
>
> I had a hard think about this before coming up with the simple solution
> provided for frames like the COMM frame. My conclusion was that multiple
> audio clips were not necessarily helpful to the client user interface so
> the additional complexity might not be worthwhile.

One option might be, instead of creating an individual frame for each 
corresponding text frame, to create a dictionary of string -> audio pairs.

That would, incidentally, get around another problem that I just thought 
of: updating.

With the current draft, if you sent me a file with the genre set to "Jazz" 
and a corresponding audio text frame, and I then set the genre to "Blues", 
the audio content would be out of synch with the text.  Using a dictionary 
approach instead would mean that a lookup for "Blues" would fail and 
(appropriately) there would be no corresponding audio text.
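
A minimal Python sketch of the idea, purely as an in-memory model: the 
names, the frame layout and the placeholder audio bytes are all hypothetical 
and nothing here is taken from the draft.

```python
from typing import Dict, Optional

# Hypothetical model: displayed string -> encoded audio clip.
AudioDictionary = Dict[str, bytes]


def audio_for(text: str, clips: AudioDictionary) -> Optional[bytes]:
    """Return the clip for exactly this string, or None if there isn't one."""
    return clips.get(text)


# A tag as it might arrive: genre "Jazz" plus a matching spoken clip.
text_frames = {"TCON": "Jazz"}
clips: AudioDictionary = {"Jazz": b"<audio for 'Jazz'>"}

before = audio_for(text_frames["TCON"], clips)   # the "Jazz" clip

# A third-party tagger that knows nothing about the dictionary edits the genre.
text_frames["TCON"] = "Blues"

after = audio_for(text_frames["TCON"], clips)    # None: no stale audio is played
```

Because the lookup key is the string itself rather than a frame ID, this 
would presumably also cope with frames that allow multiple instances and 
with frame reordering.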

> >(I must say that I'm somewhat sceptical of the uses of the extension in
> >general, vs., say, screen readers, but I'll take the time to read the
> > paper you just sent later on.)
>
> The paper does give a rationale for the proposed approach compared to the
> use of Computer Generated Speech.
>
> My view (and I'd be happy to be proved wrong) is that producing good
> Computer Generated Speech on low profile devices like MP3 players is quite
> hard whereas the implementation of AudioText frames is really simple.

That occurred to me after I sent this, but the thing that most occurs to me 
is a kind of critical mass argument.  For this sort of information to be 
useful in common practice it would require large scale adoption -- i.e. 
having this on 1% of MP3s wouldn't make devices as a whole usable.  A 
possible solution would be to also develop an application that can do 
speech generation from the text and automatically write these fields.

With all this I don't mean to be overly negative -- just kind of playing 
devil's advocate from an engineering perspective...

-Scott

-- 
The three chief virtues of a programmer are: laziness, impatience and hubris.
--Larry Wall
