[ID3 Dev] Genre suggestion

Pat Furrie pfurrie at hotmail.com
Wed Sep 20 06:22:59 PDT 2006


>http://en.wikipedia.org/wiki/Musicologist

Very interesting... thank you for the link.


>By two genres merging I meant that one of them becomes invalid.  Without
>a versioning scheme you will supply correct genre information if a "bit"
>has been invalidated.

Can you give an example of something like that?

Here is a clumsy analogy of how I see it:
With the changing of the classification of Pluto from planet to just 
plain old solar rock (or whatever they decided to call it), is there a need 
to go remove all references to Pluto being a planet from every book?  No.  It 
is a historical fact that Pluto, while maybe not now having the moniker of 
"planet," was considered a planet at one point, and thus there is valuable 
information in that fact.

A more "music classification specific" example might be:
Later this month a super-hurricane grinds up through the Gulf of Mexico and 
completely removes New Orleans from the earth, with no chance of being 
resurrected.  One song category may have been "New Orleans," but with the 
city gone, do we eliminate that descriptor?

No.  The music that existed before doesn't change just because something 
to which it was related no longer exists.  The idea of "New Orleans" 
still exists, or at least the memory does, and that continues.

There may well be a situation where a genre or classification could become 
invalid, so I need a little help here.  However, my first-blush guess is 
that there is no good reason to eliminate any of the category descriptors from 
such a list.  If a term were to morph into something, that's pretty easy: 
change the name associated with the bit in the bit map.  Same if alterations 
are made to the spelling.


>How do you ensure against invalid combinations of bits being set, for
>example, Male Solo-Vocalist and Female Solo-Vocalist both being set?

Quick answer: the publicly contributed music database.

Couple of thoughts: There doesn't need to be some fancy rulebase which 
spells out some specific logic regarding "if x then not y," at least not 
from a computer programmatic standpoint.  Regarding your example, perhaps 
you have a long piece of music, with two very separate sections, each 
having a solo component, one with a man and the other a woman.  It doesn't 
seem to fit the description of "duet" (though perhaps you also flag it as 
such), but it could easily be seen as having two soloists; I don't see that 
it is necessarily an invalid combination.  Could a piece of music have more 
than one seemingly contradictory mood descriptor set?  What about "happy" 
and "sad"?  Sure.  And that also tells us more about the piece of music, 
and that's good.


>What kinds of scenarios would a bitmap approach address that having
>string based genres [taken from predefined sets] could not address.
>For example:
>Jazz, Female Vocalist, Soundtrack, Mellow

Efficiency and ease-of-use.

If I have a bit-mapped space containing 700 descriptors, and a particular 
piece of music has 20 descriptors set, there is nothing saying I couldn't 
describe it with a string based genre field.  However, 20 descriptors as a 
string would require more space than all 700 descriptor bits would.  Your 
example takes 37 bytes (not counting commas) for four items, which works out 
to roughly 9 bytes per term.  With 20 terms, this would be 180 bytes, which 
already exceeds the total number of bytes (150) which might be recommended 
to be reserved for the bit-mapped space.  That same bitmap space can 
accommodate any number of descriptors up to the limit of defined terms (in 
this example, 700), whereas holding all 700 as strings would require 6300 
bytes (per piece of music).  If I were to reserve that much string space for 
each piece of music in my current database (9,376 total), it would come to 
nearly 60 additional megs, just to hold that data.  The bit-mapped space 
required for all of my current library is just over 1.4 megs.
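
The arithmetic above can be checked with a few lines of Python.  The 
constant names are only labels chosen for this sketch; the figures are the 
ones quoted in the paragraph, including the rough 9-bytes-per-term average.

```python
import math

DEFINED_TERMS = 700          # descriptors in the example scheme
BYTES_PER_TERM = 9           # rough average taken from the four-term example
RESERVED_BITMAP_BYTES = 150  # suggested reservation per piece of music
LIBRARY_SIZE = 9_376         # pieces of music in the library

# 700 bits fit in 88 bytes, well inside the 150 reserved.
min_bitmap_bytes = math.ceil(DEFINED_TERMS / 8)

string_bytes_20_terms = 20 * BYTES_PER_TERM              # 180
string_bytes_all_terms = DEFINED_TERMS * BYTES_PER_TERM  # 6300

# Whole-library totals, in bytes.
library_string_bytes = LIBRARY_SIZE * string_bytes_all_terms  # 59,068,800
library_bitmap_bytes = LIBRARY_SIZE * RESERVED_BITMAP_BYTES   # 1,406,400
```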

It is also more efficient from a public database point of view.  What is 
stored in the database is the bits, but not how they are expressed.  Yes, 
the bits represent particular descriptors, but how is that represented on 
your particular computer?  You can accept default values for the text 
strings that represent the particular bits to you, or you can alter them to 
suit your own methods.  Some people like all caps, some people do all lower 
case, some spell terms differently, and some use a different language.  Not 
important?  Well, it is for some people.
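
A minimal sketch of that separation, assuming hypothetical bit positions 
and label tables (the function name `describe` is also just for this 
example): the shared database holds only the bits, and each user maps them 
to whatever text they prefer.

```python
# Hypothetical bit positions and label tables; none are assigned values.
DEFAULT_LABELS = {0: "Jazz", 1: "Female Vocalist", 2: "Soundtrack", 3: "Mellow"}
MY_LABELS = {0: "jazz", 1: "female vocalist", 2: "OST", 3: "mellow"}


def describe(bitmap: int, labels: dict) -> list:
    """Return the label for every set bit, lowest bit first."""
    return [name for bit, name in sorted(labels.items())
            if (bitmap >> bit) & 1]


# What the shared database would hold: bits 0 and 2 set.
stored = 0b0101

default_view = describe(stored, DEFAULT_LABELS)  # default text for the bits
my_view = describe(stored, MY_LABELS)            # same bits, my own text
```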

Still, hard drive space is cheap, and that may not be a big deal.  
But how does one manage to apply dozens of descriptors (or genres) to a 
whole library of music?  If you have something already labeled "rock" and 
something already labeled "soundtrack" but decide each needs to have the 
other genre as well, it is natural to just append the additional genre to 
the end of each one.  That gives you one with a genre of "rock soundtrack" 
and the other "soundtrack rock".  Yes, doing a search for "soundtrack" will 
pull up both.  That's good.  But you now have two additional listings in 
your genre pick-list.  Who cares about the pick list?  Those who want to 
avoid typos creeping in, and those who can't remember just how something was 
termed.

It may not be too bad with just two terms like rock and soundtrack, but when 
you get into a couple dozen terms in order to fully realize a rich 
description of the songs, this begins to be ridiculous.  Your pick list ends 
up having every combination of many, many words, and becomes unwieldy.  And 
I don't know how a string based method fares in a public database 
environment; only by limiting the number of genres to one does that become 
reasonable (and isn't that what most of the current ones are doing?).
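
To illustrate the pick-list problem, here is a small sketch in Python.  
The bit assignments and the helper `to_bitmap` are made up for this 
example.  The two appended strings are different pick-list entries, yet 
they reduce to a single bitmap value.

```python
# Hypothetical bit assignments, for illustration only.
BITS = {"rock": 1 << 0, "soundtrack": 1 << 1}


def to_bitmap(genre_string: str) -> int:
    """Fold a space-separated genre string into a bitmap."""
    bitmap = 0
    for term in genre_string.split():
        bitmap |= BITS[term]
    return bitmap


appended = ["rock soundtrack", "soundtrack rock"]
string_pick_list = set(appended)                  # two distinct entries
bitmap_values = {to_bitmap(g) for g in appended}  # one shared value

# Even ignoring word order, n terms allow 2**n - 1 non-empty combinations,
# while a per-term pick list only ever needs n entries.
combinations_for_24_terms = 2**24 - 1
```

With a couple dozen terms (24 here), that is over 16 million possible 
combinations against 24 entries in a per-term list.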

What about making changes to your genre scheme?  With a bit-mapped scheme, 
you change the name of the single string which represents a given bit, and 
you are done.  Now all your songs appear with the changed term, no mistakes, 
and super fast, as the files themselves aren't changed.  They don't need to 
be.  With a string based scheme, you have to go write the changes to every 
file.  Which files?  Well, your program will have to scan through the 
database to find them, or maintain a secondary database of all that info, 
but still have to scan through it for each of the terms to be changed... and 
hope that it doesn't inadvertently change some other term that happens to 
contain the same string as the one you are changing.  Ugly, slow, and 
difficult.  Yes, you could do it, but why?
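
A small Python sketch of the difference, using made-up labels, bit 
positions, and song names.  The string version deliberately uses a naive 
substring replace to show the hazard described above; a careful 
implementation would match whole terms, but it would still have to rewrite 
every affected file.

```python
# Hypothetical labels and bit positions, for illustration only.
labels = {0: "rock", 1: "rockabilly", 2: "soundtrack"}
song_bitmaps = {"song_a": 0b001, "song_b": 0b010, "song_c": 0b101}

# Bitmap scheme: rename one label; no per-song data is touched.
bitmaps_before = dict(song_bitmaps)
labels[0] = "rock & roll"

# String scheme: every stored genre string has to be rewritten, and a
# naive substring replace also hits "rockabilly".
song_genres = {
    "song_a": "rock",
    "song_b": "rockabilly",
    "song_c": "rock soundtrack",
}
renamed = {song: genre.replace("rock", "rock & roll")
           for song, genre in song_genres.items()}
```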

Pat


