Linking Name Authorities and Wikipedia Entries

I love the idea of linked data. As a comic book, science fiction, and fantasy geek, the relationships between different works, titles, series, and the characters in them are fascinating to me, and the idea of being able to explore all those relationships just by following links is basically Nerdvana. I will also confess to having spent entire evenings following link to link to link on TV Tropes (it’s roughly as addictive as breathing oxygen) and reading Night Court/Doctor Who crossover fan fiction on Archive of Our Own (yes, it really is a thing).

But I try to read papers on linked data and they start to talk about FRBR and triples and RDF and my eyes glaze over. I just don’t get it very well without seeing it work. But I found a way to at least create links between data that doesn’t require a massive database or knowing SPARQL or having access to paid subscription tools. I don’t think it’s real linked data because it’s one way and there’s not a described relationship, but it’s a start, and it’s something I understand so I’m going to roll with it for now and try to build on this later.

The short version is that there is a really easy way to add Library of Congress Control Numbers (LCCNs, the unique identifiers for Name Authority Records, or NARs) to Wikidata pages, which are in turn linked to Wikipedia pages. So that’s what I’m going to describe here.

The first thing we need to do is find a Wikipedia entry without an LCCN. The LCCNs are shown in the Authority Control template of the Wikipedia page. As a counter-example of what we’re looking for, I chose the Wikipedia page for Weird Al Yankovic. Of course Weird Al’s page has his LCCN listed, because he’s Polka Megastar Weird Al Yankovic. The arrow shows the location of the LCCN, which links to the Name Authority Record. Some Wikipedia pages don’t have the Authority Control template, and that is fixable, but involves editing the page and is beyond what we’re doing here so for now we’re just looking at pages with that template.

weird_al

In keeping with the novelty music theme, I found a Wikipedia page without an LCCN. That page is for Barnes & Barnes, the musical duo best known for their 1978 hit song “Fish Heads” (the video was directed by Bill Paxton, no joke, which is another reason I want linked data to take off, because the world needs to know this when they Google Bill Paxton). As you can see below, there’s a MusicBrainz ID, but no LCCN for Barnes & Barnes on their Wikipedia page. We’re going to change that.

barnes_and_barnes

We need to get the LCCN, so we search authorities.loc.gov to find the Barnes & Barnes Name Authority Record. Once we find the right record we copy the LCCN, but please make sure you have the right record so you don’t add an incorrect LCCN. In this case the 670 is for Barnes & Barnes’ 1980 album Voobaha (the reissue includes “The Vomit Song”), so we’ve got the right record. I’m trying to keep this open and accessible to everyone, but if you’re not familiar with NARs and searching authorities it might be wise to get a foundational knowledge of those before trying this. Copy the LCCN (shown here as LC control no: ).

barnes_and_barnes_record.PNG

And then go back to the Barnes & Barnes Wikipedia page and click on the Wikidata item link.

barnes_and_barnes_wikidata_item

This takes you to the Barnes & Barnes Wikidata page, which is where we’re going to add the LCCN we copied earlier. Scroll down to Identifiers, and then click on + add statement at the bottom.

barnes_and_barnes_wikidata_page.PNG

This will open up a box. Just type in LCCN and the Library of Congress authority ID option will appear. Click on that option.

barnes_and_barnes_LCCN.PNG

Then paste the LCCN you copied earlier into the text box. For this entry we need to (1) delete the space between the n and the 9, since URLs don’t like spaces. If there is no space in the LCCN you copied, you can just paste the number into the box as is. Then (2) click on save. If you’re not logged in, you’ll have to do a captcha to save.
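If you end up doing this for more than a handful of records, the space-removal step is easy to script. This is just a minimal sketch: the function names are mine, the LCCN below is a made-up placeholder (not the Barnes & Barnes number), and the id.loc.gov URL pattern is my assumption about where these identifiers resolve, so check a link by hand before trusting it.

```python
import re


def normalize_lccn(raw: str) -> str:
    """Remove every whitespace character from an LCCN copied out of a record display."""
    return re.sub(r"\s+", "", raw)


def authority_url(lccn: str) -> str:
    """Build a link to the name authority record.

    The URL pattern is an assumption about the id.loc.gov service,
    not something taken from the steps above.
    """
    return f"https://id.loc.gov/authorities/names/{normalize_lccn(lccn)}.html"


if __name__ == "__main__":
    copied = "n 90000000"  # made-up placeholder, not a real record's LCCN
    print(normalize_lccn(copied))
    print(authority_url(copied))
```

This only strips whitespace, which is all the step above asks for. It does not check that the number actually belongs to the person or group you think it does, so the manual click-through still matters.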

LCCN_added.PNG

And once we’ve saved, we can see the LCCN in the Wikidata page. It’s a link, and you can click on it to make sure you’ve entered the right number (recommended, because an incorrect link is worse than no link). And there it is.

lccn_added.PNG

This won’t appear in the Wikipedia page for Barnes & Barnes right away (it seems to take about 24 hours to refresh, but that’s just a guess), but when it does we get this:

fish_heads_woohoo.PNG

Success! I didn’t add the WorldCat Identities link. I suspect that was done automatically when the LCCN was added, but I’m not sure, and there may be more people interested in adding links to Barnes & Barnes than I suspect. And now the Wikidata page (and the Wikipedia page through it) is linked to the Library of Congress Name Authorities. The link is only one way (we didn’t create a link from the NAR page to the Wikidata page), and there’s no relationship described, so I’m pretty sure it’s not linked data, but it’s at least something I understand and can tinker with for five or ten minutes while I’m waiting on something else.

On the other hand so what, who cares? It’s a fair question. And I don’t know what this accomplishes at the moment, beyond the obvious (anyone who does know, please comment). But it at least connects NARs to a big open database, and hopefully people experimenting with that database will find a use for it. Until then, I’m happy just doing it for the sake of doing it.

If you’re looking to experiment with this yourself, for some reason it seems like a lot of musical groups don’t have their LCCN on their Wikipedia page. And they’re easy to pull out of authorities.loc.gov, just do a keyword authorities (all) search for music group and then pick one and check the Wikipedia page.

Thanks!


Creating Name Authority Records

I’m going to start off by saying what this post is not. It is not comprehensive instructions on creating Name Authority Records (NARs). It is not a how-to guide on creating simple NARs. It is most emphatically not all you need to know to create NARs. It is just a quick description of the mechanical steps that are (roughly) what you would do to create an NAR. So don’t take this as a guide, or even a reference because things change over time, but with that very lengthy disclaimer out of the way if you’re curious about how people create NARs, this might be helpful.

You do need to go through NACO training to add records to the file, but you can experiment with this up to the point of adding them to the authority file if you’re curious. If you try it and like it the training is done remotely for new institutions, and there is an option to become a part of a funnel if that works better for you than becoming a contributing institution (it would depend on how many records you think you’ll generate). You can find more information on all this at https://www.loc.gov/aba/pcc/naco/training/, and everyone I have worked with through NACO has been fantastic.

I work in OCLC Connexion by the way, so that’s where my images are from. Hopefully it generalizes to other people’s cataloging software.

Step 1 is to catalog something that has an author without an existing NAR. For me, this usually means something older, like this 1909 pamphlet in our Abraham Lincoln Collection. The author is a U.S. Representative from Illinois, so when someone put the author in the 100 field they found the life dates and built the 100 field like an NAR. They didn’t create the NAR to control it, but they used the basic formatting and information you would find in an NAR, and they could do that because he was well known enough that information was available. This makes it quite a bit simpler to create the NAR, and sometimes you only get the author’s name (or even less) from the work in hand.

01

So you search the authorities to see if Rodenberg has an existing NAR. I did and he didn’t, but it’s also a good idea to search for personal names in OCLC records to see which name he used when writing his works. He might also have written something under a variant name that might have an NAR attached (Bill Rodenberg, for example). You go through the OCLC records and the statement of responsibility in the 245 field to determine what name he actually used when writing, and once you have that figured out you can begin creating your NAR. I use the macro in OCLC.

02

It’s literally as simple as dropping your cursor in the 100 field (or 700 field if you’re doing an additional creator) and running the macro, which gives you the below. This is the bare minimum NAR, and if we didn’t have those life dates and the name was all we could find, that would satisfy the requirements for an NAR as long as it was a unique string that wasn’t the same as another NAR. If another author already has an NAR with that same name string, you would need to do something to differentiate them (adding a middle name or year of birth and/or death are probably the two most common ways). In this example, we would also need to add a 670 citing OCLC for the life dates, and the 046 field with those years. But that would be enough to create a valid NAR.

03

But that’s no fun. We can add a lot of other information too, and I’m a big enough believer in linked data that I’ll usually take the time to do it, especially if the creator has relevance to my area of Illinois or especially Illinois State University (the people who sign my paychecks). The record below is considerably more fleshed out, and has:

- an 024 field which links it to his entry in the Biographical Directory of the United States Congress (link)
- an 046 field giving his full date of birth and death (should be in EDTF format)
- a 370 field giving his place of birth and (in subfield b) place of death
- a 373 field giving an affiliated institution
- a 374 field giving his profession
- a 375 field giving his gender (be careful with this one, mis-gendering people is not cool)
- a 377 field showing which language he is associated with
- a 378 field showing a fuller form of his name (if he had written some of his works as William Augustus Rodenberg I would have included this in the 100, but I couldn’t find any where he did)
- a 670 field showing the work in hand I generated the NAR for, and a second 670 showing where I got all that other information from
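Since the 046 is the field where formatting trips people up, here is a rough sketch of what “EDTF format” means for full dates: year-month-day with hyphens, which is also what Python’s date.isoformat() produces. The dates below are made up for illustration (they are not Rodenberg’s), and the subfield codes ($f for birth, $g for death, $2 for the date scheme) are how I understand the MARC authority format, so check them against the documentation before relying on this.

```python
from datetime import date
from typing import Optional


def build_046(birth: Optional[date], death: Optional[date]) -> str:
    """Return a mnemonic-style 046 field with full dates written as YYYY-MM-DD."""
    parts = ["046 __"]
    if birth is not None:
        parts.append(f"$f {birth.isoformat()}")  # $f: birth date (my reading of the format)
    if death is not None:
        parts.append(f"$g {death.isoformat()}")  # $g: death date (my reading of the format)
    parts.append("$2 edtf")  # names the date scheme used in the field
    return " ".join(parts)


if __name__ == "__main__":
    # Hypothetical dates, for illustration only.
    print(build_046(date(1900, 1, 2), date(1980, 3, 4)))
```

With those placeholder dates it prints 046 __ $f 1900-01-02 $g 1980-03-04 $2 edtf, which you would still key into your cataloging software yourself.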

How much of this you add depends on how much research you want to do. I had a Wikipedia article handy so I could add things quickly (and could have added more information if I wanted to or thought it was relevant).

04

We cross-check each other’s NARs here, so we save them to the online file and email each other numbers to check, but generally once you’ve done all that optional (but fun) stuff and are satisfied with your record you check again to make sure no one else has added an NAR for the creator you’re working on in the meantime (all this checking is because multiple NARs for the same creator are really bad for authority control), and then you submit it to the authority file. This is as easy as clicking a button in OCLC (normally it would be blue, I was logged into an account that doesn’t have permission to add authority files at the time).

06

It then goes into distribution, gets an ARN and LCCN, and in about a week will be in the public view for the authority file.

05

Once it’s in distribution you get the joy of going through OCLC and controlling the headings in the bib records and seeing them turn blue (I know how to have a good time!). We don’t do it for non-English language bib records (other countries don’t necessarily use Library of Congress NARs and dislike us changing their records), but for some of the more historical authors it can really pull a lot of variant 100 fields together under a common heading.

07

And then we get into the super optional area here, but if I used a Wikipedia article to generate an NAR, I try to remember to drop the LCCN into Wikidata so the two are linked. It’s super simple to do: you just go into Wikidata, paste the LCCN into the field, and hit save.

09

Takes less than a minute and you’re done.

10

So this is super basic and very general (and I omitted or glossed over a huge number of things and special circumstances you need to be aware of if you’re creating NARs), but I think it describes the basic mechanical process of how a creator gets from the item in hand to the Authority File well enough. Questions, comments, suggestions always welcome.

Thanks!

Genre headings for finding aids in the library catalog

We recently had an Archon problem. Due to some compatibility issues with a PHP update (I understand there is an update in the works) finding aids created or edited after a certain date no longer displayed container lists. Staff could still see them by going into the administrator side of things, but they wouldn’t display in the public view.

This was a problem (that we’ve since fixed by migrating to a local platform) and patrons could still see the rest of the information, but it made me worry about what might happen to our ability to display finding aids to the public if Archon ever went down hard. Since we create MARC records for our finding aids anyway, I asked our cataloging unit to add a genre term for “finding aids” to the MARC records in our OPAC.

The Library of Congress Genre/Form Terms (LCGFT) include “finding aids” as an official term. By adding this to our MARC finding aid records using a 655 _ 7 Finding aids $2 lcgft field, we tied our finding aids together with a controlled vocabulary term.

We can’t do a search by just a genre heading, but by using Rayfield (the name of the archives) as a keyword (it shows up in the preferred citation field in every finding aid record) and the genre term finding aids we could create a pretty simple canned search for all of our finding aids (that only returned the finding aids) in the catalog.

For anyone who wants to see it, the search is:
https://vufind.carli.illinois.edu/vf-isu/Search/Home…

A more thorough explanation of the LCGFT can be found here:
http://id.loc.gov/authorities/genreForms.html

And the complete list of forms here:
https://www.loc.gov/a…/publications/FreeLCGFT/freelcgft.html

under the Genre/Form Terms (PDF, 77 pages, 492 KB) link (warning, PDF):
https://www.loc.gov/aba/publications/FreeLCGFT/GENRE.pdf

It’s not an Earth-shattering idea, but I thought it was handy and seemed worth mentioning. Your mileage may also vary depending on how your OPAC and/or display is set up. We had a little less than a hundred finding aids to add the genre term to, and I was able to provide a list of OCLC numbers, so we just did it by hand. If you have a lot of catalog records you might want to look into a more automated way to go about this though.
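For anyone with more records than is comfortable to do by hand, the logic of the change is small enough to sketch. This is only an illustration: the record here is a plain Python dictionary I made up, not a real MARC library object, and the function names are hypothetical. The one thing it shows is adding the 655 only when it isn’t already there, so running it twice doesn’t duplicate the term.

```python
import copy

# The genre field described above: 655, second indicator 7, source code lcgft.
GENRE_FIELD = {
    "tag": "655",
    "ind1": " ",
    "ind2": "7",
    "subfields": [("a", "Finding aids"), ("2", "lcgft")],
}


def has_finding_aid_genre(record: dict) -> bool:
    """Check whether the record already carries the LCGFT finding aids term."""
    for field in record["fields"]:
        if field["tag"] != "655":
            continue
        subs = dict(field["subfields"])
        term = subs.get("a", "").rstrip(".").lower()
        if term == "finding aids" and subs.get("2") == "lcgft":
            return True
    return False


def add_finding_aid_genre(record: dict) -> bool:
    """Add the genre field if missing. Returns True when the record was changed."""
    if has_finding_aid_genre(record):
        return False
    record["fields"].append(copy.deepcopy(GENRE_FIELD))
    return True


if __name__ == "__main__":
    # A made-up stand-in for one finding aid record.
    record = {"fields": [{"tag": "245", "ind1": "0", "ind2": "0",
                          "subfields": [("a", "Example papers")]}]}
    print(add_finding_aid_genre(record))  # first pass adds the field
    print(add_finding_aid_genre(record))  # second pass leaves it alone
```

How you would actually load and save the records depends entirely on your system, which is why that part is left out.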

Monographs in Archival Collections

A lot of archival collections come in with monographs. Either works written by the creator of the collection, or used in their research, or just left in the office/attic/basement whatever and thrown in the boxes with everything else.

Archives usually don’t want to retain mass published monographs. They’re bulky, they’re commonly available (really, who’s going to drive three hours and sit in a reading room to get something they can pick up at the local library or bookstore), and once something is in the archives it’s usually there for a very, very long time. So you’re paying to store those books for a very, very long time and taking up valuable shelf space.

But what do you do with them?

One approach is to photocopy the title pages, put all the title pages in a folder in the collection, and get rid of the books. Maybe you check if it’s the last copy or rare, but the point is the books don’t stay in the collection. Or if you have an attached library you can ship them over to the stacks.

But what if you want to preserve the intellectual place those books had in the collection, still put them in the stacks, and also create links between the finding aid and the catalog records? One of our donors, Bruce “Charlie” Johnson, is a clown who has not only donated his personal papers, but is also in the process of donating his extensive library of books on clowns and clowning.

I wanted to get the books in the library catalog and shelved rather than in boxes. I also wanted to create links between the archival material and the books. That way patrons could find the books through the catalog (instead of just the finding aid), and staff could pull them from the shelves rather than sorting through boxes. Here’s what I ended up doing:

We use Archon for our finding aids. Noting the books in the finding aid was the simple part. Under “Related Publications” in Archon I added a list of the books. For the ones I have cataloged, I also included the call number and a link out to the catalog record. Like so:

3 Ball Juggling by Ken Benge. GV1558.B46 1982 Link to catalog record

That’s a line from the finding aid, or you can see the whole thing here: http://spcfindingaids.library.illinoisstate.edu/index.php?p=collections/findingaid&id=9&q=

This shows someone looking at the finding aid that the book is part of the archival collection, the call number is right there if they want staff to pull it, and if they want more details about the book (edition, publisher, whatever) they can click and go to the catalog record. The books without this information I haven’t gotten cataloged yet, but at least they’re in the finding aid. And I threw in a paragraph in the scope note explaining all of this to patrons.

That was simple enough, because a finding aid is (at least in part) just a big list of stuff anyway. They’re good at showing that sort of information. The trickier part was getting the catalog records to document that a book was part of that collection, but once you know how, it’s not that difficult. If you go to the catalog record (link above) and click on “staff view” (next to the little yellow triangle) in the upper right corner, you’ll see the MARC codes.

The crucial piece here is the 730 field. The 730 is an “added entry-uniform title” field. You can see the full explanation with all the notes at https://www.loc.gov/marc/bibliographic/bd730.html

Basically it’s there so that if the item being cataloged is part of something larger, you can provide that information. In this case, the book 3 Ball Juggling by Ken Benge is part of the Bruce “Charlie” Johnson Collection. But that’s not true of all copies of 3 Ball Juggling, so we add a $5 INS subfield to show it only applies to our local institution. Now the catalog knows that the book at ISU is part of the archival collection (just make sure the title in the 730 matches the title of the collection exactly or the catalog will think you’re referring to something else), but another copy of 3 Ball Juggling at another library isn’t. We’re part of a cataloging consortium, so we have to be careful to make these distinctions even in our local catalog.
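To make the exact-match warning concrete, here is a small sketch of the 730 written out as text and then checked. Everything in it is an assumption for illustration: the collection title string is my guess at the exact form, the 0 and blank indicators are my reading of the 730 documentation, and the helper names are made up. Your catalog’s staff view is the real source of truth.

```python
# Assumed exact collection title; the 730 must match the collection record character for character.
COLLECTION_TITLE = "Bruce “Charlie” Johnson Collection"


def build_730(title: str, institution: str = "INS") -> str:
    """Write a mnemonic-style 730 with a $5 limiting it to one institution."""
    return f"730 0_ $a {title} $5 {institution}"


def subfields(field: str) -> dict:
    """Split a mnemonic field string into {subfield code: value}."""
    parts = field.split(" $")[1:]  # drop the tag and indicators
    return {part[0]: part[2:] for part in parts}


def links_to_collection(field: str, institution: str = "INS") -> bool:
    """True only when the title matches exactly and the $5 names our institution."""
    subs = subfields(field)
    return subs.get("a") == COLLECTION_TITLE and subs.get("5") == institution


if __name__ == "__main__":
    good = build_730(COLLECTION_TITLE)
    typo = build_730("Bruce Charlie Johnson Collection")  # quotation marks dropped
    print(good)
    print(links_to_collection(good))  # exact title and our $5
    print(links_to_collection(typo))  # near miss, so no link
```

The near miss in the last line is the whole point: a title that looks right to a person can still fail to connect in the catalog.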

We also add a 541 field in our holdings record which says “Special Collections copy donated by Bruce ‘Charlie’ Johnson.” The 541 field is the “immediate source of acquisition” note, or in English, donor note. It tells us where the book came from, and we make it public so patrons can see that as well. It’s probably not as useful to the catalog, but it’s highly visible to patrons (shows up even in the brief search results) and it’s another link that might point them towards the archival collection or other books.

So I’ve got the finding aid linking the books in the archival collection to the catalog, and the catalog understands that the books are part of the archival collection. If someone searches our catalog for Bruce Charlie Johnson, it returns the books and the archival collection. And patrons don’t have to go to the finding aid to figure out we have the books. Otherwise a patron searching in the catalog for Bruce Charlie Johnson would just get the archival collection (we put records for our finding aids in the catalog), have to go to the finding aid (in Archon) to find out about the books, and then go from the finding aid back to the catalog for the books. Which seems like a lot of unnecessary steps when I can do this instead.

I want to stress that I’m not arguing that archives should begin keeping monographs because this is possible. Unless you have an associated library and want to retain the books in question anyway, I wouldn’t change current practice. And I haven’t done the studies to prove that patrons even care about this. But I’m cataloging the books anyway because we have a large collection of circus monographs, and it’s not much more time to add them in a list to the finding aid and provide the call numbers and links, and it’s another access point that might get someone to something they didn’t know they were looking for, and it gets them out of boxes and on shelves which is better for our space management. I will probably keep doing this given the minimal extra time involved.

Any questions, comments, or advice? All are welcome, and I hope you found this useful.