Coding Conventions

CAO has a few simple, required coding conventions for the EAD finding aids that it searches. The requirements below ensure that CAO functions as intended. None of these coding conventions runs contrary to best practices for DA:CS or EAD. EADs that are indexed in CAO must be “well-formed” and valid against either the 2002 schema or the v.1 DTD*. We anticipate that CAO will soon be able to accommodate EAD3.

In general, your EADs should be free of unnecessary spaces and line breaks, especially within tags. This is not an issue if you are using a tool like ArchivesSpace to generate your EAD. If you are encoding by hand with a text editor, validate your file.
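For hand-encoded files, a quick first check is whether the file parses as XML at all. The sketch below uses only the Python standard library; the function name is_well_formed is made up for illustration and is not part of CAO. It tests well-formedness only. It does not validate against the EAD 2002 schema or the DTD, which still requires a validating XML editor or similar tool.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML, False otherwise.

    Hypothetical helper: checks well-formedness only, not EAD validity.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True
```

A file with a missing closing tag, for example, would return False here.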

If you are using a tool like ArchivesSpace, AtoM, Archivists’ Toolkit, Archon, etc. to generate EAD, many of the coding conventions are enforced by the software.***

1. Your finding aid must have a creator, title, abstract, biographical/historical note and a scope/content note:

A finding aid/collection must have a creator. If you use ArchivesSpace, this is fulfilled by linking an agent with the role of creator.

In EAD, something like this:

<origination label="Creator"> <persname rules="dacs" source="ulan">Hudiakoff, Andrei, 1894-1985</persname> </origination>

A finding aid/collection must have a title. In EAD, this is the <unittitle> inside the collection-level <did>:

<archdesc level="collection">
    <did>
      <repository>
        <corpname>Western Connecticut State University Archives and Special Collections</corpname>
      </repository>
      <unittitle>Andrei Hudiakoff Published Illustrations</unittitle>
      <origination label="Creator">
        <persname rules="dacs" source="ulan">Hudiakoff, Andrei, 1894-1985</persname>
      </origination>
      <unitid>MS 066</unitid>

Then make sure you have an <abstract>, <scopecontent> and <bioghist>.
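If you want to check a file yourself before submitting it, the requirement above can be sketched roughly as follows. This is an illustration, not CAO's indexer: the function names are invented, and it assumes that the creator, title, and abstract sit directly inside the collection-level <did> and that <bioghist> and <scopecontent> are direct children of <archdesc>. Namespaces are ignored so that schema-based and DTD-based files are treated alike.

```python
import xml.etree.ElementTree as ET


def local_name(element) -> str:
    """Tag name without a namespace such as {urn:isbn:1-931666-22-9}."""
    return element.tag.rsplit("}", 1)[-1]


def children_named(parent, name):
    """Direct children of parent whose local tag name equals name."""
    return [child for child in parent if local_name(child) == name]


def missing_collection_elements(xml_text: str) -> list:
    """List the required collection-level elements that were not found.

    Hypothetical helper; the placement rules are assumptions (see lead-in).
    """
    root = ET.fromstring(xml_text)
    archdesc = next((el for el in root.iter() if local_name(el) == "archdesc"), None)
    if archdesc is None:
        return ["archdesc"]
    missing = []
    dids = children_named(archdesc, "did")
    did = dids[0] if dids else None
    # Creator, title, and abstract are looked for in the collection-level <did>.
    for name in ("origination", "unittitle", "abstract"):
        if did is None or not children_named(did, name):
            missing.append(name)
    # The two notes are looked for directly under <archdesc>.
    for name in ("bioghist", "scopecontent"):
        if not children_named(archdesc, name):
            missing.append(name)
    return missing
```

An empty list means all five elements were found where this sketch expects them.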

2. Have a unique eadid:

<eadid countrycode="US" mainagencycode="US-ctdabn">ctdbn_ms066_hudiakoff</eadid>

Your filename should be the same as your eadid (though the eadid should not have a file extension). For the eadid example at the top of this section, the filename of that EAD should be: ctdbn_ms066_hudiakoff.xml. Also, keep in mind that you are sharing this database with other repositories and that files are identified by their ids and filenames, so your filename/id should be unique across repositories. If it is not, another repository’s finding aid with the same name may replace yours in the database. If this does happen, no worries. We’ll work with you to rename the EAD. If your file has no eadid, it will be rejected by the indexer.**

Filenames and <eadid>s should be primarily alphanumeric; underscores are OK. You should not use spaces, hyphens, quotation marks, question marks, plus signs, periods (except before the file extension), ampersands, etc. Clean filenames/ids (for example “rg045_rogers”, “mss0003_42”, “WillingtonUpton”, etc.) help reduce errors that can occur in the software when filenames look like coding syntax.
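The eadid and filename rules above could be checked with something like the following sketch. The function name eadid_problems is invented for illustration, and the pattern is an assumption: it reads “primarily alphanumeric” strictly, as letters, digits, and underscores only. CAO's own indexer may be more lenient.

```python
import re
from pathlib import Path

# Assumption: letters, digits, and underscores only (a strict reading of the rule).
SAFE_ID = re.compile(r"[A-Za-z0-9_]+")


def eadid_problems(eadid: str, filename: str) -> list:
    """Return a list of problems found with an eadid/filename pair."""
    problems = []
    if not eadid:
        problems.append("eadid is empty")
    elif not SAFE_ID.fullmatch(eadid):
        problems.append("eadid contains characters other than letters, digits, and underscores")
    # The filename without its extension should equal the eadid.
    if Path(filename).stem != eadid:
        problems.append("filename (without extension) does not match eadid")
    return problems
```

For the example in this section, eadid_problems("ctdbn_ms066_hudiakoff", "ctdbn_ms066_hudiakoff.xml") returns an empty list.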

3. It MUST NOT have a normal attribute in a <unitdate> with a start date greater than the end date in a date range, or with a date range greater than 2000 years. Either will cause your file to be rejected by the indexer. A finding aid that contains something like this: <unitdate normal="1890/1871">1890-1871</unitdate> (start after end) or <unitdate normal="10/2023">10-2023</unitdate> (a range of more than 2000 years) will not be indexed.
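The two date checks above can be approximated with a short script. This is a sketch, not the indexer's actual logic: the function names are invented, and it assumes normal values use the usual ISO 8601 style forms (YYYY, YYYY-MM, YYYY-MM-DD, or YYYYMMDD) with a slash separating start and end. Values it cannot parse, including BCE years, are reported rather than guessed at.

```python
import re
from typing import Optional

MAX_SPAN_YEARS = 2000  # limit stated in the convention above


def _parse_date(text: str, default_month: int, default_day: int):
    """Parse one side of a normal value into (year, month, day)."""
    text = text.strip()
    if re.fullmatch(r"\d{8}", text):  # compact YYYYMMDD
        return int(text[:4]), int(text[4:6]), int(text[6:])
    parts = text.split("-")
    if not 1 <= len(parts) <= 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"unrecognized date: {text!r}")
    numbers = [int(p) for p in parts]
    month = numbers[1] if len(numbers) > 1 else default_month
    day = numbers[2] if len(numbers) > 2 else default_day
    return numbers[0], month, day


def normal_range_problem(normal: str) -> Optional[str]:
    """Return a description of the problem, or None if the value looks fine."""
    if "/" not in normal:
        return None  # a single date has no range to check
    start_text, end_text = normal.split("/", 1)
    try:
        # A missing month/day means "earliest" for the start, "latest" for the end.
        start = _parse_date(start_text, 1, 1)
        end = _parse_date(end_text, 12, 31)
    except ValueError as error:
        return str(error)
    if start > end:
        return "start date is later than end date"
    if end[0] - start[0] > MAX_SPAN_YEARS:
        return "date range spans more than 2000 years"
    return None
```

Both examples from the paragraph above are flagged by this sketch: "1890/1871" because the start is later than the end, and "10/2023" because the range runs from year 10 to year 2023.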

4. Have at least one level declared in the inventory and a title and/or date for each component:

  • The inventory must have at least a level designation on your first <c>, like <c01 level="series">, for example (ASpace forces you to set a level).
  • You should have either a title or a date for each component. If you do not, the system will add “untitled” to your component.
  • We have noted a number of repositories using <unitid>s as titles of series or subseries. During testing and rollout, we changed the <unitid>s to <unittitle>s in a number of finding aids with this issue, but we will only do this for the CAO rollout.
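The component rules above can be spot-checked with a sketch like the one below. The function names are invented, and it makes two assumptions: components are tagged <c> or <c01> through <c12>, and the title or date lives in the component's own <did>. A missing title or date is not fatal in CAO (the system adds “untitled”), so treat that message as a warning.

```python
import re
import xml.etree.ElementTree as ET

# Matches <c> and the numbered components <c01> through <c12>.
COMPONENT = re.compile(r"c(0[1-9]|1[0-2])?")


def local_name(element) -> str:
    """Tag name without a namespace such as {urn:isbn:1-931666-22-9}."""
    return element.tag.rsplit("}", 1)[-1]


def component_problems(xml_text: str) -> list:
    """Report a missing level on the first component and untitled, undated components."""
    root = ET.fromstring(xml_text)
    # root.iter() walks the tree in document order, so index 0 is the first component.
    components = [el for el in root.iter() if COMPONENT.fullmatch(local_name(el))]
    problems = []
    if components and not components[0].get("level"):
        problems.append("first component has no level attribute")
    for index, component in enumerate(components, start=1):
        did = next((child for child in component if local_name(child) == "did"), None)
        labels = [] if did is None else [
            child for child in did if local_name(child) in ("unittitle", "unitdate")
        ]
        # A title or date counts only if it contains some text.
        if not any("".join(label.itertext()).strip() for label in labels):
            problems.append(f"component {index} has neither a title nor a date")
    return problems
```

A component whose <did> holds only a <unitid>, as in the series-title issue noted above, is reported as having neither a title nor a date.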

That’s it. If you’re worried that this is a lot of work, you shouldn’t be. Again, the conventions above are all in line with EAD and DA:CS best practice.

——————————————

For CAO ArchivesSpace users: if you are using CAO to harvest your ASpace data, every resource must be public, have an agent with the role set to creator, and include these notes: an abstract, a biographical/historical note, and a scope and contents note. Otherwise the resource will not be exported. ASpace users also need to include an eadid just like everyone else (see the note above). We name your file the same as the eadid.

FOR ASPACE harvests: If your files are being harvested from an ArchivesSpace instance and there is no eadid for a resource, we will generate one based on your repository code and the ASpace resource number, and then save and name the EAD file with that eadid value. For example, ctrepoid_noEADID_3_45 would be the eadid for the EAD file ctrepoid_noEADID_3_45.xml, where ctrepoid=your repository’s id, 3=your repository number in the ASpace instance, and 45=the resource number in the ASpace instance. If, in viewing your files, you see a filename with noEADID, you can figure out which of your ASpace resources lack an eadid from the filename that CAO has assigned to the EAD.
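To trace a generated filename back to its ASpace resource, the naming pattern described above can be taken apart with a small script. This is a sketch: the function name parse_generated_eadid is invented, and the pattern is inferred from the single example given here.

```python
import re
from pathlib import Path
from typing import Optional

# Pattern inferred from the example: <repository id>_noEADID_<repository number>_<resource number>
GENERATED = re.compile(
    r"(?P<repo_id>.+)_noEADID_(?P<repo_number>\d+)_(?P<resource_number>\d+)"
)


def parse_generated_eadid(filename: str) -> Optional[dict]:
    """Split a generated eadid or filename into its parts, or return None if it does not match."""
    match = GENERATED.fullmatch(Path(filename).stem)
    if match is None:
        return None
    return {
        "repo_id": match["repo_id"],
        "repo_number": int(match["repo_number"]),
        "resource_number": int(match["resource_number"]),
    }
```

For ctrepoid_noEADID_3_45.xml this gives repository number 3 and resource number 45, which in ArchivesSpace typically correspond to the resource URI /repositories/3/resources/45.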

* We’ve found a few anomalies with EAD v.1. If you are still using it, you should probably update your encoding to EAD2002.

** If for some reason your file with no eadid gets indexed, it will cause all your repository’s files to fail. We will have to delete that null eadid file from the database.

*** It has come to our attention that ArchivesSpace permits invalid in-line XML tagging within its fields and that such tagging will not prevent the EAD export. The implication is that ASpace may export invalid XML. Invalid XML can’t be indexed by CAO. If you use in-line tagging, try to export a PDF version of the finding aid. If ASpace throws an error, you have invalid tagging somewhere.