Tuesday, November 18, 2008

Thesaurus Statistics: November 18, 2008

As part of a continuing series (see my first post on this from July), I present an updated table of thesaurus statistics:

The explanations again, with slight modification:
  • Total number of terms: Every term in the thesaurus (obviously). This includes 20,809 dates (e.g., 2008-07-27). Since these have relationships and notes just like other terms, I have no problem including them in the count.
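  • Non-postable terms: These make up what is known as the lead-in vocabulary: terms that aren't used for indexing themselves but point the reader to a preferred (postable) term. For example, Disneyland Omnibus leads one to the postable term Omnibus.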
  • Top terms: A key part of the thesaurus is the hierarchy, and this is the number of terms at the top of it. With the structure I'm using, though, it shouldn't be as high as it is; I'd like to have only 10-15 top terms. The number is inflated because many of the current top terms are ones I've picked up while going through sources and still need to fit into the conceptual structure.
  • Orphan terms: These are the subset of top terms that have no relationships of any kind to other terms. This number should be 0.
  • Subject categories: This is another means of classification outside of the hierarchy. For instance, for people and organizations in the thesaurus I have a "Role" classification. The Roles include Atmosphere Talent, Cast Member, lessee personnel, and Retlaw personnel.
  • Hierarchical relationships: Hierarchy is used for structure and to facilitate searching. Fantasy on Parade has two broader terms: "events, promotions, and programs by names" (a term that then functions something like an authority list) and "parades."
  • Equivalence relationships: The Omnibus example above is an equivalence relationship, where I've defined that one term is used in place of another. This needn't be confined to nomenclature issues. If I wanted lower specificity, I could have Omnibus (along with Fire Engine, Horseless Carriage (Red), etc.) point to Main Street Vehicles. But with this project I've been much more inclined to draw distinctions than to lump things together.
  • Associative relationships: These relations call attention to other terms that might be of interest. In the Library of Congress Subject Headings, these are simply RTs (related terms). While I do denote some relations this way, I've also defined a fair number of more specific relationships. For example: Dominguez, Ron PFMD [performed as/in] Davy Crockett (walk-around character); Fantasia Gardens PRED [had predecessor] Motor Boat Cruise. (In time I might get more specific and create a Motor Boat Cruise Dock term.)
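For anyone curious how counts like these can be tallied, here is a minimal Python sketch of a thesaurus data model. It is only an illustration, not the software I actually use: the `Thesaurus` class and its method names are hypothetical, and it assumes that the total includes lead-in (non-postable) terms, that each relationship is counted once rather than once per direction, and it leaves subject categories out. Top terms are postable terms with no broader term, and orphans are top terms with no relationships at all.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Thesaurus:
    """Hypothetical in-memory thesaurus model (illustration only)."""

    postable: set[str] = field(default_factory=set)
    # Equivalence / lead-in vocabulary: non-postable term -> postable term.
    use: dict[str, str] = field(default_factory=dict)
    # Hierarchical links stored once each as (narrower, broader) pairs.
    broader: set[tuple[str, str]] = field(default_factory=set)
    # Associative links as (term, relation code, term), e.g. "PRED", "RT".
    associative: set[tuple[str, str, str]] = field(default_factory=set)

    def add_term(self, term: str) -> None:
        self.postable.add(term)

    def add_use(self, lead_in: str, preferred: str) -> None:
        self.add_term(preferred)
        self.use[lead_in] = preferred

    def add_broader(self, narrower: str, broader: str) -> None:
        self.add_term(narrower)
        self.add_term(broader)
        self.broader.add((narrower, broader))

    def add_related(self, a: str, code: str, b: str) -> None:
        self.add_term(a)
        self.add_term(b)
        self.associative.add((a, code, b))

    def _related_terms(self) -> set[str]:
        """Postable terms taking part in at least one relationship of any kind."""
        linked = set(self.use.values())
        for narrower, broader in self.broader:
            linked.update((narrower, broader))
        for a, _code, b in self.associative:
            linked.update((a, b))
        return linked

    def stats(self) -> dict[str, int]:
        has_broader = {narrower for narrower, _ in self.broader}
        top = self.postable - has_broader          # nothing above them
        orphans = top - self._related_terms()      # top terms with no links
        return {
            # Assumption: the total counts lead-in terms as well.
            "total": len(self.postable) + len(self.use),
            "non_postable": len(self.use),
            "top": len(top),
            "orphans": len(orphans),
            "hierarchical": len(self.broader),
            "equivalence": len(self.use),
            "associative": len(self.associative),
        }


t = Thesaurus()
t.add_use("Disneyland Omnibus", "Omnibus")
t.add_broader("Fantasy on Parade", "events, promotions, and programs by names")
t.add_broader("Fantasy on Parade", "parades")
t.add_related("Dominguez, Ron", "PFMD", "Davy Crockett (walk-around character)")
t.add_related("Fantasia Gardens", "PRED", "Motor Boat Cruise")
t.add_term("Main Street Vehicles")  # not yet placed: a top term and an orphan
print(t.stats())
# {'total': 10, 'non_postable': 1, 'top': 8, 'orphans': 1, 'hierarchical': 2, 'equivalence': 1, 'associative': 2}
```

Keeping the lead-in vocabulary in its own mapping makes the non-postable count fall out directly, and defining orphans as top terms minus every term that appears in any relationship matches the definition above: a term sitting at the top with nothing attached to it.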
As you can see from the numbers, I've been quite busy over the past few months!

3 comments:

Vintage Disneyland Tickets said...

This stuff makes my head spin! How in the world do you sleep? I would be endlessly trying to sort those 36,996 terms in my head. Watch those "orphaned" terms young man, I know those are driving you crazy - LOL! Keep up the great work!

Major Pepperidge said...

Whoo-ee! Looks complicated. You'll need to put plenty of photos of hot babes in to make the book sexier!

The Viewliner Limited said...

Happy Thanksgiving... wishing all the best for you and your family, Richard.