Monday, December 31, 2018

Disneyland Thesaurus 2018 Year in Review

Although I have been quiet on the blogging front, this year has been the most productive on the thesaurus itself since it came into existence on August 19, 2007. In raw numbers, in 2018 I added 32,471 terms, or 32% of the current total of 101,868 terms. The number of relationships has more than doubled, from 216,414 to 434,653. But adding terms and relationships is only one aspect of the overall project. The following summary stretches back to the last quarter of 2017, when I resumed work on the thesaurus in earnest.

Identifying and Acquiring Sources
The thesaurus is built around information from reputable sources, so it's always been driven by a need to first build a bibliography of Disneyland sources and then acquire them (mostly as digital surrogates). In years past this has involved a lot of scanning (of Disneyland Lines in particular). This year has been much more focused on acquiring digital objects—tens of thousands of them, in fact.

I maintain two Excel tracking spreadsheets to document my work with sources, one titled Publication Tracking Spreadsheet and the other Web Material Tracking Spreadsheet. Web Material had been stagnant, with only a few of the vintage Disneyland blogs listed and only a little work done years ago to locally save the files. So, one project was to not only expand the index (by copying and pasting blog links and titles and then filling in the dates), but also to save the content for when it eventually goes offline. I am pleased to report that I saved all the Disney-related content from the "big three" vintage Disneyland blogs: Stuff from the Park, Gorillas Don't Blog, and Davelandblog. This involved saving each individual relevant post as an MHT file (which combines the HTML and graphic elements), but also each of the photos individually. I did this for a number of other blogs as well.

I have saved 10% of the Disney Parks Blog posts I've identified as relevant to the Disneyland Resort (4,914). This endeavor is a bit more complicated because of the frequent inclusion of videos, which have to be saved through a separate process. I had initially relied on the sketchy sites where you input the video URL and, after being presented with some ads and viruses, get a download link that often doesn't work. I researched and found a program called YouTube-DLG which has made it much easier to download videos in bulk. (It is still a chore, however, to extract the video URLs and associate the downloaded files with my saved blog posts.) This blog-saving activity has yielded 54,895 files, totaling 35GB. The volume will rise rapidly as I save additional videos.

Another aspect of digital collection has been newspaper articles. For the first time since 2010 I made an effort to augment and bring this source type up to date. In 2018 I saved an additional 1,639 newspaper articles. I now have 18,007 newspaper articles and advertisements, the bulk of them from the Los Angeles Times and Orange County Register; I still need to save the Times from 2001-2018.

Finally, I had long neglected researching in the Anaheim Heritage Room's collection of Disneyland Lines, which they have as a Disney depository of the Walt Disney Archives. Since the Archives wasn't founded until June 1970, and the Lines I was missing were mainly in 1969 and 1970, I didn't think the collection would contain what I needed. I was pleasantly surprised to find how wrong I was. I added images from 21 Disneyland Lines from 1969-1971 and input information from them in the thesaurus.

Thesaurus Work
The bulk of the new terms has come about from two activities. The first is that I decided it would be valuable to include information on the broader Disney universe, both to contextualize existing terms and because you never know when synergy will bring what I call "general Disney" terms into direct Disneyland relevance. (Did Evinrude ever appear at Disneyland prior to Kevin and Jody integrating him into Mickey's Soundsational Parade?) I began this work with the 2016 edition of Disney A-Z and then started fresh with the encyclopedia found on (Although you would expect the online version to be the most complete and up-to-date, I have found in some cases that the 2016 print edition was more thorough. I've been keeping a running tally for a future blog post.)

As of today, the thesaurus has 9,993 General Disney terms. I've tagged them so they can be stripped from any thesaurus outputs, as needed. I still have 3,399 entries to input (which will yield many more terms than that).

The other activity (which contributed to the service anniversary milestone detailed here) has been adding in Cast Member names from lists in Disneyland Resort Lines of recent years. Well, two types of lists. One is the service anniversary list published every month. After five years, the service anniversaries for 15 years and above should be duplicative, but the 10 year names are almost always new to the thesaurus. As of today, the thesaurus has 29,249 Cast Member names. I'm going to go out on a limb to suggest that it is the largest such list outside of a Disney HR system.

I was unfortunate enough to discover that the July 17, 2005, edition of the Disneyland Resort Line contains the names of 18,700 Disneyland Resort Cast Members as of June 11, 2005. In the print edition, you could barely make out the names. But if you had access to an electronic version, you could copy the names and run some find-and-replace actions to end up with an Excel spreadsheet sorted alphabetically by last name. I tried various ways of automating the input of this information, but because I already had so many names in the thesaurus, the tedious prep work led to too many errors for my comfort. I've made use of a program to expand the Windows clipboard (Ditto) and at least make the entry as painless as possible. I've input information on 9,775 people I have linked to a "50th Anniversary Cast Members" term. Many of these people have additional citations from service anniversary lists or telephone directories, or mentions in the Disneyland Line.

I tried a new way of putting the thesaurus to use. One goal is that it could be used to index digital objects, be they photographs or documents or blog posts or what have you. I created a new field (IDX) to try applying the thesaurus in this manner. I indexed photos from 116 posts from Gorillas Don't Blog and 101 posts from Main Gate Admission. It was an interesting exercise, but I want to put more thought into it before continuing further. It did reveal the need for additional terms, such as construction (activity), photographs from Frontierland rivercraft, and photographs from Skyway.

I also took a step toward improving the possibility that I could "complete" the thesaurus. That step was defining for myself what "complete" means, which is going through a defined set of materials to draw out terms and relationships. In rough terms, that includes going through the entire run of some internal publications (chiefly the Disneyland (Resort) Line), all newspaper articles and advertisements, a sampling of Park guidemaps through the years (preferably at least one per year), Disney A-Z, the Disney Parks Blog, and some other publications and web material. There would certainly be missing information, but I'd be comfortable that I had made a legitimate effort of comprehensively examining a substantial collection of reliable sources. (And, of course, if I reach that goal, I would be more than happy to add from other sources.)

When I first started the thesaurus, I did not source where particular information came from. I quickly realized the problem with this approach and reoriented my work to include this information—both for the term itself and the substantive information about each term. This acted sort of like a citation file a lexicographer might use, but with more encyclopedic-type information. The obvious downside is that this takes a very long time. I have come to a compromise that preserves two essential components of the thesaurus: firstly that it contain terms and relationships covering all aspects of Disneyland from the earliest days to the present, and secondly that the source for terms is documented, as a nomenclature reference.

Now as I go through a source I will add its title to the SRC field for any terms which are mentioned. This project is by-and-large about proper nouns, so I wouldn't necessarily SRC the word food every time I saw it. But I would for Ron Dominguez (or R. Dominguez, or Ronald Dominguez, or Ronald K. Dominguez, as I have also found). Where there is substantive information about a term, I will also include that information in a newly created field titled (for now) TT ("to thesaurus"). As of right now, all the sources I haven't used are essentially closed. For the 1,400 Disneyland Lines I haven't yet gone through, I wouldn't be able to tell you which might have an article on the Main Street Magic Shop or on Fantasmic! This new approach promises to make it possible for me to dramatically decrease the amount of time I spend with each source, but still make the information findable.

When we published Jason's Disneyland Almanac in 2011, we were missing Park hours from 773 dates during that time period. I am pleased to report that the number of days missing Park hours is now down to 34, between January 5, 1997, and June 30, 1999. Having access to the Los Angeles Times through allowed me to see the Calendar section from the 1980s and 1990s, which frequently contained the Park hours and is not available in the paper's text database. For more recent years I went through all the Twitter posts of @DisneylandToday. In addition to being a reliable source publishing hours day-of (always preferred to far in advance), the account also posted updates on the rare occasion of a change. Additionally, I brought the weather information in the thesaurus up to date through 2017.

Finally, I made some progress on cleaning up the hierarchy. This is a never-ending battle as new terms and (especially) concepts are introduced. I know some parts of the thesaurus desperately need work. It is very difficult to see the hierarchy from within the thesaurus construction program itself; I was lucky to have a friend create a way for me to expand and contract the hierarchy at various levels to help me as I make decisions in this vein.

What's next for 2019? Well, tying up some of the loose ends I mentioned (in particular the 50th Anniversary Cast Members and Disney A-Z) will be a big help. I intend to continue improving the hierarchy and going through more sources in the new way detailed above. I would like to finish saving the Disney Parks Blog and bring my newspaper corpus up to date. There's still a lot of work to do.

Peter Mark Roget (of Roget's Thesaurus) and I share a birthday, so I have kind of looked to him as a model. Roget began work on his thesaurus in his mid-20s, but didn't publish it until he was in his 70s. (Never mind that I have the benefit of a computer, of course.) That would put me on track to complete it by Disneyland's 100th birthday in 2055. Maybe a preview edition could be ready for the 75th in 2025.

Happy New Year!

Friday, October 26, 2018

100,000 Years

You'd be forgiven for thinking that this post's title referred to the expected wait times next year for the attractions in Star Wars: Galaxy's Edge. Or the amount of time before the Disneyland thesaurus will have a reasonable amount of information. Instead, it refers to another thesaurus milestone.

While it is hard to convey how much information is in the thesaurus, one (incomplete) measure comes from the quantities of certain types of information. There are currently 93,730 terms in the thesaurus and 370,665 relationships among them (hierarchical, equivalent, and associative). Not all terms are created equal, of course. Some will have many relationships and some will have few (possibly because of the sources I have and haven't examined).

One thing that has been very important to me for as long as I've wanted to create the ultimate Disneyland reference work has been recognition of the Disneyland Cast. Back in the late 1990s, there was very little information published for the public from the Cast perspective (with Van France's Window on Main Street autobiography as the best source). When I began the thesaurus project in 2007, I had a feeling that the sheer number of Cast Members would be overwhelming. But I didn't set any sort of a significance threshold for entering names and information about the Cast into the thesaurus as I came across it—and I'm glad I didn't.

As I continued to go through internal sources such as the Disneyland Line and Backstage, I found that some names showed up over and over. It was a small reward to occasionally find a reference to someone starting at the Park, another mention years down the road, and finally a "Golden Ears" retirement profile. (I have also sometimes run across their obituaries in the newspaper articles I've saved.) Listings in telephone directories have also helped to build profiles of individuals through the years.

As of this writing, I have 24,224 Cast Members in the thesaurus. With this many individuals, and with the small amount of information I often have, it is likely that some people appear under more than one name (such as when a maiden name changed to a married name) or that I have inadvertently lumped two distinct people with the same name into one term. I have tried to take these considerations into account, but sometimes I just have to take a guess. Here is how the Cast is broken down by decades:

1950s Cast Members: 2,436
1960s Cast Members: 3,782
1970s Cast Members: 4,272
1980s Cast Members: 3,471
1990s Cast Members: 4,942
2000s Cast Members: 13,669
2010s Cast Members: 4,315

A different way to quantify the Cast information is to look at the service anniversaries I have documented in the thesaurus, which brings us back to the post's title. This morning I reached 100,000 years of service anniversaries. ONE HUNDRED THOUSAND YEARS! That accounts for the anniversaries recognized by Disney at five-year intervals. I have gone off published service lists. I have also used retirement mentions if they include the years of service. I have not assumed service anniversaries, even if I see that somebody has worked at the Resort in different decades. Here's how they break down by length of service (I only assign the longest anniversary):

5-year Cast Members: 949 [this category has not been published in the Line in years]
10-year Cast Members: 2,955
15-year Cast Members: 1,483
20-year Cast Members: 706
25-year Cast Members: 413
30-year Cast Members: 301
35-year Cast Members: 126
40-year Cast Members: 88
45-year Cast Members: 44
50-year Cast Members: 5
55-year Cast Members: 1
60-year Cast Members: 1
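
As a rough cross-check on the milestone, the breakdown above can be totaled by multiplying each anniversary level by the number of Cast Members whose longest documented anniversary falls at that level. This is only a minimal sketch in Python: the variable names are made up for illustration, and it assumes each person contributes exactly their longest anniversary, which may not be precisely how the thesaurus database tallies it.

```python
# Counts copied from the breakdown above:
# anniversary level (years) -> Cast Members whose longest documented
# anniversary is at that level.
longest_anniversary_counts = {
    5: 949,
    10: 2955,
    15: 1483,
    20: 706,
    25: 413,
    30: 301,
    35: 126,
    40: 88,
    45: 44,
    50: 5,
    55: 1,
    60: 1,
}

# Each person is counted once, at their longest anniversary.
people = sum(longest_anniversary_counts.values())

# Years of service = anniversary level times number of people at that level.
total_years = sum(
    years * count for years, count in longest_anniversary_counts.items()
)

print(f"{people:,} Cast Members, {total_years:,} documented years of service")
```

With the counts as listed, the total lands a little above 100,000 (100,290 years across 7,072 people), which fits a milestone that had only just been passed.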

If you've ever worked at Disneyland, thank you for your service!

Sunday, January 28, 2018

Hey, That's My Scan!

Do you remember my blog post from April 17, 2008? Good. It showed what the very second publication of the Disneyland Line looked like. Unfortunately, I still have been unable to track down the very first edition of this longstanding publication. And I recently learned that perhaps Disney doesn't have it available easily internally, either.

The January 13, 2011 Disneyland Resort Line featured this item:

This could, of course, be from any scan of the April 11, 1969, publication. It's cropped a little more than my scan, the color balance is different, the bleed-through from the second page has been corrected, and the hole punch in the "Disneyland" is not there.

What is apparent, however, is the evidence of the lower two hole punches and the overall fading around the edges. (The logo hole punch could have been corrected because it was more visible.) The first thing that caught my attention here was that they "chose" to use an image of the second issue published, rather than the first. When I first saw the item, I was hoping to finally see what the very first issue looked like! It wouldn't make sense to choose the second if you had the first. And then I began examining it and found similarities to the scan I had posted almost three years before this issue.

What do you think?

Wednesday, January 10, 2018

A Preview of a Conceptual Disneyland Resort Thesaurus Output: Matterhorn Mountain

Prompted by Major Pepperidge's question about when the hell this will benefit anyone (paraphrased, to be sure), I threw together a sample of what a print output of the Disneyland Resort Thesaurus could look like. I did manual tweaking to output from the database, but did only minimal review to determine if the quality of the information could be improved.

First, a word about what the output would look like. When most people think of a thesaurus, they think of an alphabetically arranged list of words listing synonyms and, possibly, antonyms. But a thesaurus can actually be much richer, capturing not only this kind of relation, but also hierarchy (broader and narrower terms) and other kinds of relations. A thesaurus can also have text fields to, for example, give a thesaurus user an understanding of how the use of a term has changed over time. My thesaurus is very rich in all of these areas.

The most famous thesaurus, Roget's Thesaurus, had its primary organization by subject hierarchy. Go look at an early version of the book over at the Internet Archive to see for yourself. Since I'm not trying to capture the relations between all of human knowledge as represented by words in the English language, I already have a much more manageable field of knowledge to tackle than Roget. The alphabetical index was more of an afterthought to facilitate use of the hierarchy, but in practice this list became the primary way that the thesaurus was referenced.

I would envision both a hierarchical output and an alphabetical one. For this little project, I focused on Matterhorn Mountain, since it is old enough and important enough to have a fair number of connections. Here is what a hierarchical page which includes the mountain could look like:
(Note: As this is modeled after an actual reference work, you will need to expand the image.)

As I implied, there is still work to be done in refining the classification. The Fantasyland Break Area, for example, could be part of one of the buildings (such as Fantasyland East Building), and I just haven't yet come across that information. I try to go only as far as the facts in the sources I have referenced so far will take me. And if you look closely you'll see that I'm also behind at keeping up with the latest tenant of what I collectively call the Fantasyland East Castle stores.

You would find much more information in the alphabetical output, as seen in two fanciful pages below:

You'll immediately notice that there are a number of different types of things here, from people (Mattey, Bob, Sr.) to ride vehicles (Matterhorn Bobsleds bobsleds) to attractions (Matterhorn Bobsleds) to companies (Mattel) to generic roles (matte painters).

What do all the codes mean? Well, BT (broader term) and NT (narrower term) are somewhat self-explanatory. UF (use for) is the inverse of USE: it appears under the preferred term and lists the terms it stands in for. The preferred term may be based on official Disney nomenclature, literary warrant, or my own determination of the most widely used term to define a concept. The SRC note field is to capture the precise way that something has been written. This accounts for not only different terminology (Matterhorn Bobsled Ride versus Matterhorn Bobsleds), but also spelling variations and misspellings (Matte, Bob versus Mattey, Bob, Sr.). Some of the terms have brief descriptions of what they are, which are drawn from categories within the thesaurus. The decades are drawn from a "Disneyland Resort Era" category, to help me assign terms to their relative periods of significance.

RT stands for related term, and RTI for related term instance. SPOF (which here you'll only find under Mattel) stands for sponsored, formerly. CTBF stands for contributions from, and is a broad relationship to capture any type of contribution to the creation or execution of something, without going into more detail (the inverse is CTBT, contributed to). For example, Mattey, Bob, Sr. contributed to Haunted Mansion, The, but I don't specify what that was. (In this case, it's because I don't know—I got it from his D23 profile.)

DN stands for date note and usually captures either the dates of a person's birth and death or of the operational dates of an attraction.
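
For readers who think in data structures, the codes above can be pictured as labeled links between terms, where adding one side of a reciprocal pair (BT/NT, USE/UF, CTBT/CTBF) implies the other. The following is a minimal sketch in Python, not a description of the actual thesaurus construction program: the `Thesaurus` class, its method names, and the treatment of RT as symmetric are all assumptions made for illustration. The example terms are taken from the posts on this page.

```python
from collections import defaultdict

# Reciprocal codes as described above. Treating RT as its own inverse is an
# assumption; SPOF and RTI are left out because their inverses aren't given.
INVERSE = {
    "BT": "NT", "NT": "BT",
    "USE": "UF", "UF": "USE",
    "CTBT": "CTBF", "CTBF": "CTBT",
    "RT": "RT",
}


class Thesaurus:
    """Hypothetical in-memory model: terms, labeled links, and note fields."""

    def __init__(self):
        # term -> relationship code -> set of related terms
        self.relations = defaultdict(lambda: defaultdict(set))
        # term -> note code (e.g. SRC, DN) -> list of note strings
        self.notes = defaultdict(lambda: defaultdict(list))

    def relate(self, term, code, other):
        """Record a link and its inverse (KeyError for codes not in INVERSE)."""
        inverse = INVERSE[code]
        self.relations[term][code].add(other)
        self.relations[other][inverse].add(term)

    def add_note(self, term, code, text):
        """Attach a free-text note such as an SRC spelling or a DN date."""
        self.notes[term][code].append(text)


t = Thesaurus()
t.relate("Children's Christmas Art Contest", "NT",
         "Children's Christmas Art Contest (1987)")
t.relate("Mattey, Bob, Sr.", "CTBT", "Haunted Mansion, The")
t.add_note("Mattey, Bob, Sr.", "SRC", "Matte, Bob")

# The inverse links were filled in automatically.
print(sorted(t.relations["Haunted Mansion, The"]["CTBF"]))
print(sorted(t.relations["Children's Christmas Art Contest (1987)"]["BT"]))
```

Keeping the inverse table in one place means a link entered from either direction shows up under both terms in an alphabetical output.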

I hope you've enjoyed this peek behind the curtain. If I just focus on prettying up one page per day, I should be nearing completion as Disney adds a fourth gate in Anaheim and I have to start all over again!

Monday, January 8, 2018

5,000 Pages of Disneyland Lines Thesaurused

Tonight I thesaurused my 5,000th page of Disneyland Line material. (I'll get to what that means in a minute.) Back when I was a teenager, I received a collection of Disneyland Lines and other Cast Member materials which stretched back to the late 1970s. I considered myself well-read on Disneyland history, but the Lines had countless stories about things I otherwise knew nothing about. They were a window into a Disneyland I did not know and felt I could know no other way. I doubted then that I would ever have access to a complete collection.

Fast forward a few years and my conception for the Disneyland Compendium (the ultimate Disneyland reference work) included the need to methodically comb the entirety of the Disneyland Line run (then 30 years, now approaching 50), as well as other Cast Member publications. My belief in that approach hasn't wavered, and I was fortunate enough ten years ago to be put in touch with a Cast Member who had saved nearly a complete collection and let me borrow his binders to systematically scan. The more I reviewed, the more certain I became of my approach.

While the Lines have been a major focus of the project (as evidenced by the number of pages reviewed to date), they certainly don't tell the full Disneyland story. Other Cast Member publications on which I have relied include the Disneylander (512 pages, largely completed in 2012); Backstage Disneyland (182 pages, 9 issues); and the Inside Disneyland newsletter (240 pages, 65 issues). Other sources include The Disneyland News (260 pages, 19 issues), The "E" Ticket (620 pages, 19 issues), several books (including the 2006 Disney A-Z and the 2000 Disneyland: The Nickel Tour), and 788 newspaper articles and advertisements.

I learned early on in the thesaurus construction process that it was important to have a variety of types of sources, because it ensures the best chance to capture diverse concepts and terms. But with the benefit of time, some of the depth is also starting to come through. I may be entering information about a Cast Member in the thesaurus only to find that I had previously input a story about their arrival, or their retirement, or even occasionally an obituary. My work carefully documenting the Line's terminology helped reveal when the Park began the shift from "Disneyland employees" to "Cast Members."

I've talked a lot about the sources I've used, but haven't yet addressed my use of "thesaurus" as a verb. It means that I have read, analyzed, and transcribed the text for relevant Disneyland-centric portions and entered it into the thesaurus (database). I have also entered any new relationships that the source suggests.

5,000 pages of Disneyland Lines doesn't mean that everything on each of those pages is relevant and has been entered. The activities sponsored by the Disneyland Recreation Club, for example, occur frequently in the Line but are not quite as represented in the thesaurus. I include mentions of Cast Members, but not what those sporting events were. This linkage ensures that if someone were interested in a particular Cast Member, they could still follow the reference to find out more. This came in handy a number of years back with a co-worker whose mom had worked at Disneyland, before dying of cancer when the co-worker was young. I was able to find reference to her on a DRC bowling team, a fact which was new to her.

Let's take a look at the process, using a particular Line (December 24, 1987) that put me over 5,000 pages and is pedestrian but for one detail. This one represents my first (and I believe only) mention in the Disneyland Line. I do feel that there is some irony in that I appeared at the age of 5 and, in all the years since of researching and writing about Disneyland (and being one of the more voracious readers of the Line), have not made it back.

While I've been writing that I treat the entire issue as a source, the reality is that I break it into manageable units based on the level of "aboutness." Practically for the Disneyland Line, this breaks on an article basis. I generally try to associate the article text with a single term in the thesaurus, which sometimes results in the creation of a new term.

For any publication, my first step is to enter the colophonic information so I know who contributed to the publication and can trace changes in the title and personnel. For example, for a time in the 1970s, the Disneyland Line logo was changed to The Disneyland Line, but other references (such as in the colophon) omitted the definite article. So, in the thesaurus, I've kept the publication as Disneyland Line from its beginnings until it was renamed Disneyland Resort Line with the December 31, 1999 issue.

The first unit following the colophon, as is often the case, is the cover story. For several years, Disneyland had a Children's Christmas Art Contest, for young relatives (children, grandchildren, brothers, sisters, nieces, and nephews) to create Christmas illustrations based on a given theme. I participated a few times, but this was the only time my failure was documented in print.

The cover page, at the beginning of the post, didn't yield any information except for the title.

This Line was a little unusual in that it had pages which folded out, so you see the first- and second-place winners before getting to the story describing them.

Each of these Cast Members has had a new term created or an existing term updated, and each is linked to their work location.

Now we get to the heart of the story. I created a new term Children's Christmas Art Contest to serve as an umbrella term for all instances, and then a narrower term (NT) Children's Christmas Art Contest (1987). They were both assigned categories of Cast Member program, currency of former, and Disneyland Resort era relevance (DLRERA) of 1980s. I noted that I was mentioned in this Line. The other children, with one exception, were entered in the plain text field, but were not entered as new terms. The exception was Nate Jurczyk, because I already had him in the thesaurus as a Food Operations Cast Member.

The next page contributed to three main terms: Candlelight Procession & Ceremony (1987); Jones, Velma; and Ebierno, Fred. Town Square; Candlelight Procession, The (1987); Train Station (Main Street, U.S.A.); Main Street; and Disney Employee Choir were also referenced in the SRC field. Interestingly, although I already had 14,000 Cast Members in the thesaurus, I did not have either of the retirees, who had been there for nine and 19 years.

This page yielded little information—only SRCs for Accounting Control; Area Office; Barbara Warren; Publicity; and Disneyland Community Action Team.

Although the 1988 Tournament of Roses Parade did not have a Disneyland entry, it did have Disneyland involvement through the Float Drivers pictured here, and the float co-sponsored by The Walt Disney Company. I had only previously come across the term Float Drivers in the Disneyland Line of June 10, 1976, and had no associated Cast Members. This article contributed information to the 1988 Tournament of Roses Parade term, as well as those for the named/pictured Cast Members.

The Cast Activities department replaced the Disneyland Recreation Club in coordinating extracurricular activities for the Cast. The only things I used from this page were SRCs for WDI and Studio (Walt Disney Studios).

The final page again contains very little information used in the thesaurus, aside from the colophon which I examined at the beginning of the issue.

The entire process for this issue took about an hour, including transcription. This Line contributed to 100 notes within the thesaurus. Some terms will have multiple notes from this issue, such as a SRC for Candlelight Procession, The (1987) and definitional information in Candlelight Procession & Ceremony (1987). The Line contributed information on 29 Cast Members, including the two children who I know went on to become Cast Members later in life. Fourteen of these Cast Members were new to the thesaurus, while the other 15 had appeared before, sometimes from multiple sources. For example, Jurczyk, Dee appeared frequently because of her role with Cast Activities. My earliest reference to Byrne, Bill came from the November 1970 Inside Disneyland, when he completed an introductory training program.

In all, Disneyland Lines have contributed to information in 21,585 terms across the thesaurus. It's been quite a resource to this project and I look forward to continuing to comb through them. Maybe in the next thirty years I'll have the opportunity to appear again!