XSD schema for MeSH

The National Library of Medicine distributes the entire 25,000-node tree of MeSH descriptors (Medical Subject Headings) in XML form. Although they supply a DTD, many XML tools these days are designed to use XML schemas (aka XSDs) only.

I’ve converted the MeSH DTD into an XSD, starting with an automatic conversion using the W3C’s dtd2xsd.pl script, and then manually correcting the output. I haven’t done much XSD for a while so I’d be grateful for any feedback on it. I’ve included an explanation of the structure of MeSH because it has some subtleties that can make querying it rather confusing.

A note on MeSH’s structure

One thing I didn’t entirely realise about MeSH before I embarked on this project was that it’s actually hierarchical along two entirely orthogonal axes. Say what now? Put another way, there’s a hierarchical structure both between and within the descriptors.

Each descriptor (D) is composed of one or more related concepts (C), which can each in turn be referred to by one or more terms (T). For example, consider Anti-Bacterial Agents, the descriptor at tree number D27.505.954.122.085:

D Anti-Bacterial Agents
  C Anti-Bacterial Agents
    T Anti-Bacterial Agents
    T Antibacterial Agents
  C Anti-Mycobacterial Agents
    T Anti-Mycobacterial Agents
    T Antimycobacterial Agents
  C Antibiotics
    T Antibiotics
  C Bacteriocidal Agents
    T Bacteriocidal Agents
    T Bacteriocides

In general then, the concepts account for subtle differences in meaning that are too fine to use as the basis for a whole new descriptor, and the terms account for different ways of describing the same concept (e.g. spelling or phrasing variations). This hierarchy is represented explicitly within the XML schema.
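To make the descriptor-concept-term nesting concrete, here is a minimal Python sketch that walks one record and prints the same outline as above. The fragment is hand-made for illustration: the element names follow the schema, but identifiers, dates and attributes are left out (so it would not validate), and I have assumed the elements are not in a namespace.

```python
import xml.etree.ElementTree as ET

# Hand-made fragment for illustration only: element names follow the schema,
# but UIs, dates and required attributes are omitted, so it is not valid MeSH.
FRAGMENT = """
<DescriptorRecord>
  <DescriptorName><String>Anti-Bacterial Agents</String></DescriptorName>
  <ConceptList>
    <Concept>
      <ConceptName><String>Anti-Bacterial Agents</String></ConceptName>
      <TermList>
        <Term><String>Anti-Bacterial Agents</String></Term>
        <Term><String>Antibacterial Agents</String></Term>
      </TermList>
    </Concept>
    <Concept>
      <ConceptName><String>Antibiotics</String></ConceptName>
      <TermList>
        <Term><String>Antibiotics</String></Term>
      </TermList>
    </Concept>
  </ConceptList>
</DescriptorRecord>
"""


def outline(record):
    """Return (level, name) pairs for one DescriptorRecord element."""
    rows = [("D", record.findtext("DescriptorName/String"))]
    for concept in record.findall("ConceptList/Concept"):
        rows.append(("C", concept.findtext("ConceptName/String")))
        for term in concept.findall("TermList/Term"):
            rows.append(("T", term.findtext("String")))
    return rows


record = ET.fromstring(FRAGMENT)
for level, name in outline(record):
    indent = {"D": "", "C": "  ", "T": "    "}[level]
    print(f"{indent}{level} {name}")
```

If your copy of the data is namespaced (as the schema below would require), the element paths would need namespace-qualified names instead.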

However, the descriptors themselves are organized into a hierarchical structure of (apparently) unrestricted depth, and these relationships are only implicitly shown in the XML via the ‘tree numbers’ of the descriptors themselves. For example, the parent of Anti-Bacterial Agents [D27.505.954.122.085] is Anti-Infective Agents [D27.505.954.122], and its children include Antitubercular Agents [D27.505.954.122.085.255] and Leprostatic Agents [D27.505.954.122.085.777]. All of these ultimately descend from Chemical Actions and Uses [D27]. It’s all in the numbers, see — a bit like CATH codes and EC numbers.
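Since the parent-child relationship lives entirely in the tree numbers, recovering it is a matter of string handling. A small sketch, assuming a top-level tree number is simply one with no dot in it (like D27 above); the function names are my own:

```python
def parent(tree_number):
    """Return the parent tree number, or None for a top-level number."""
    head, sep, _ = tree_number.rpartition(".")
    return head if sep else None


def ancestors(tree_number):
    """Return all ancestor tree numbers, nearest first."""
    chain = []
    current = parent(tree_number)
    while current is not None:
        chain.append(current)
        current = parent(current)
    return chain


print(parent("D27.505.954.122.085"))     # D27.505.954.122
print(ancestors("D27.505.954.122.085"))  # ends with D27
```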

But wait, there’s one more level of complexity. Viewed from the point of view of the tree numbers, MeSH is a forest of proper trees, but each descriptor can have more than one tree number. This means that from the perspective of the descriptors themselves, MeSH isn’t a tree but a directed graph of parent-child relations. It’s not a tree because a descriptor with multiple tree numbers has multiple parents. And it’s cyclic, because you can follow parent-to-child paths and end up back at the descriptor you started at. This results in some surprising circuits. For example, Morals [F01.829.500] is the parent of Ethics [F01.829.500.519] on the Behavior and Behavior Mechanisms [F01] tree, but Morals [K01.316.630] is a child of Ethics [K01.316] on the Humanities [K01] tree. Watch out for infinite loops if you write any code to traverse MeSH by descriptor.
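Here is one way to traverse by descriptor without looping forever: keep a set of descriptors already visited. This is only a sketch using the four tree numbers from the Morals/Ethics example; the dictionary layout and function names are assumptions for illustration, not anything MeSH provides.

```python
from collections import defaultdict, deque

# Tree numbers taken from the Morals/Ethics example above.
TREE_NUMBERS = {
    "Morals": ["F01.829.500", "K01.316.630"],
    "Ethics": ["F01.829.500.519", "K01.316"],
}


def build_children(tree_numbers):
    """Map each descriptor to the set of its child descriptors."""
    owner = {tn: name for name, tns in tree_numbers.items() for tn in tns}
    children = defaultdict(set)
    for tn, name in owner.items():
        parent_tn = tn.rpartition(".")[0]
        # Only link descriptors whose parent tree number is in our data.
        if parent_tn in owner:
            children[owner[parent_tn]].add(name)
    return children


def descendants(start, children):
    """Breadth-first walk that visits each descriptor at most once."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in children.get(node, ()):
            if child not in seen:  # this check is what breaks the cycle
                seen.add(child)
                queue.append(child)
    return seen - {start}


children = build_children(TREE_NUMBERS)
print(descendants("Morals", children))
print(descendants("Ethics", children))
```

Without the `seen` check, the walk from Morals would bounce between the two descriptors indefinitely.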

Note that while the descriptor-concept-term hierarchy is enforced by the schema, the tree number hierarchy can’t be, although it is pretty easy to check using substring matching. Note also that there are other elements in MeSH that don’t fit into this structure, such as pharmacological actions, related terms and subheadings, but let’s save at least some material for a later post.
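The substring check might look something like the following sketch, which reports any tree number whose parent is missing from the collection. The sample list is built from the prefixes of the numbers quoted earlier, plus one deliberately broken, made-up entry (X99.100.200) to show a failure.

```python
def orphans(tree_numbers):
    """Return tree numbers whose parent is missing from the collection."""
    known = set(tree_numbers)
    missing = []
    for tn in sorted(known):
        parent_tn, sep, _ = tn.rpartition(".")
        # Numbers without a dot are treated as roots and always pass.
        if sep and parent_tn not in known:
            missing.append(tn)
    return missing


sample = [
    "D27",
    "D27.505",
    "D27.505.954",
    "D27.505.954.122",
    "D27.505.954.122.085",
    "D27.505.954.122.085.255",
    "X99.100.200",  # hypothetical entry with no parent, for illustration
]
print(orphans(sample))
```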

Andrew.

The Schema

<xs:schema
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns="uri:mesh"
  targetNamespace="uri:mesh"
  elementFormDefault="qualified">

  <xs:element name="DescriptorRecordSet">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="DescriptorRecord" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="DescriptorRecord">
    <xs:complexType>
      <xs:sequence>
        <xs:sequence>
          <xs:element ref="DescriptorUI"/>
          <xs:element ref="DescriptorName"/>
        </xs:sequence>
        <xs:element ref="DateCreated"/>
        <xs:element ref="DateRevised" minOccurs="0" maxOccurs="1"/>
        <xs:element ref="DateEstablished" minOccurs="0" maxOccurs="1"/>
        <xs:element ref="ActiveMeSHYearList"/>
        <xs:element ref="AllowableQualifiersList" minOccurs="0" maxOccurs="1"/>
        <xs:element ref="Annotation" minOccurs="0" maxOccurs="1"/>
        <xs:element ref="HistoryNote" minOccurs="0" maxOccurs="1"/>
        <xs:element ref="OnlineNote" minOccurs="0" maxOccurs="1"/>
        <xs:element ref="PublicMeSHNote" minOccurs="0" maxOccurs="1"/>
        <xs:element ref="PreviousIndexingList" minOccurs="0" maxOccurs="1"/>
        <xs:element ref="EntryCombinationList" minOccurs="0" maxOccurs="1"/>
        <xs:element ref="SeeRelatedList" minOccurs="0" maxOccurs="1"/>
        <xs:element ref="ConsiderAlso" minOccurs="0" maxOccurs="1"/>
        <xs:element ref="PharmacologicalActionList" minOccurs="0" maxOccurs="1"/>
        <xs:element ref="RunningHead" minOccurs="0" maxOccurs="1"/>
        <xs:element ref="TreeNumberList" minOccurs="0" maxOccurs="1"/>
        <xs:element ref="RecordOriginatorsList"/>
        <xs:element ref="ConceptList"/>
      </xs:sequence>
      <xs:attribute name="DescriptorClass" use="optional">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:enumeration value="1"/>
            <xs:enumeration value="2"/>
            <xs:enumeration value="3"/>
            <xs:enumeration value="4"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
    </xs:complexType>
  </xs:element>

  <xs:element name="ActiveMeSHYearList">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="Year" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="AllowableQualifiersList">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="AllowableQualifier" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="AllowableQualifier">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="QualifierReferredTo"/>
        <xs:element ref="Abbreviation"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="Annotation" type="xs:string"/>

  <xs:element name="ConsiderAlso" type="xs:string"/>

  <xs:element name="Day" type="xs:integer"/>

  <xs:element name="DescriptorUI" type="xs:string"/>

  <xs:element name="DescriptorName">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="String"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="DateCreated">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="Year"/>
        <xs:element ref="Month"/>
        <xs:element ref="Day"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="DateRevised">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="Year"/>
        <xs:element ref="Month"/>
        <xs:element ref="Day"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="DateEstablished">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="Year"/>
        <xs:element ref="Month"/>
        <xs:element ref="Day"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="DescriptorReferredTo">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="DescriptorUI"/>
        <xs:element ref="DescriptorName"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="EntryCombinationList">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="EntryCombination" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="EntryCombination">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="ECIN"/>
        <xs:element ref="ECOUT"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="ECIN">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="DescriptorReferredTo"/>
        <xs:element ref="QualifierReferredTo"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="ECOUT">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="DescriptorReferredTo"/>
        <xs:element ref="QualifierReferredTo" minOccurs="0" maxOccurs="1"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="HistoryNote" type="xs:string"/>

  <xs:element name="Month" type="xs:integer"/>

  <xs:element name="OnlineNote" type="xs:string"/>

  <xs:element name="PublicMeSHNote" type="xs:string"/>

  <xs:element name="PreviousIndexingList">
    <xs:complexType>
      <xs:sequence maxOccurs="unbounded">
        <xs:element ref="PreviousIndexing"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="PreviousIndexing" type="xs:string"/>

  <xs:element name="RecordOriginatorsList">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="RecordOriginator"/>
        <xs:element ref="RecordMaintainer" minOccurs="0" maxOccurs="1"/>
        <xs:element ref="RecordAuthorizer" minOccurs="0" maxOccurs="1"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="RecordOriginator" type="xs:string"/>

  <xs:element name="RecordMaintainer" type="xs:string"/>

  <xs:element name="RecordAuthorizer" type="xs:string"/>

  <xs:element name="RunningHead" type="xs:string"/>

  <xs:element name="QualifierReferredTo">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="QualifierUI"/>
        <xs:element ref="QualifierName"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="QualifierUI" type="xs:string"/>

  <xs:element name="QualifierName">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="String"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="Year" type="xs:string"/>

  <xs:element name="SeeRelatedList">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="SeeRelatedDescriptor" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="SeeRelatedDescriptor">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="DescriptorReferredTo"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="TreeNumberList">
    <xs:complexType>
      <xs:sequence maxOccurs="unbounded">
        <xs:element ref="TreeNumber"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="TreeNumber" type="xs:string"/>

  <xs:element name="ConceptList">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="Concept" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="Concept">
    <xs:complexType>
      <xs:sequence>
        <xs:sequence>
          <xs:element ref="ConceptUI"/>
          <xs:element ref="ConceptName"/>
          <xs:element ref="ConceptUMLSUI" minOccurs="0" maxOccurs="1"/>
        </xs:sequence>
        <xs:element ref="CASN1Name" minOccurs="0" maxOccurs="1"/>
        <xs:element ref="RegistryNumber" minOccurs="0" maxOccurs="1"/>
        <xs:element ref="ScopeNote" minOccurs="0" maxOccurs="1"/>
        <xs:element ref="SemanticTypeList" minOccurs="0" maxOccurs="1"/>
        <xs:element ref="RelatedRegistryNumberList" minOccurs="0" maxOccurs="1"/>
        <xs:element ref="ConceptRelationList" minOccurs="0" maxOccurs="1"/>
        <xs:element ref="TermList"/>
      </xs:sequence>
      <xs:attribute name="PreferredConceptYN" use="required">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:enumeration value="Y"/>
            <xs:enumeration value="N"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
    </xs:complexType>
  </xs:element>

  <xs:element name="ConceptUI" type="xs:string"/>

  <xs:element name="ConceptName">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="String"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="ConceptRelationList">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="ConceptRelation" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="ConceptRelation">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="Concept1UI"/>
        <xs:element ref="Concept2UI"/>
        <xs:element ref="RelationAttribute" minOccurs="0" maxOccurs="1"/>
      </xs:sequence>
      <xs:attribute name="RelationName" use="optional">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:enumeration value="NRW"/>
            <xs:enumeration value="BRD"/>
            <xs:enumeration value="REL"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
    </xs:complexType>
  </xs:element>

  <xs:element name="Concept1UI" type="xs:string"/>

  <xs:element name="Concept2UI" type="xs:string"/>

  <xs:element name="ConceptUMLSUI" type="xs:string"/>

  <xs:element name="CASN1Name" type="xs:string"/>

  <xs:element name="PharmacologicalActionList">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="PharmacologicalAction" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="PharmacologicalAction">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="DescriptorReferredTo"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="RegistryNumber" type="xs:string"/>

  <xs:element name="RelatedRegistryNumberList">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="RelatedRegistryNumber" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="RelatedRegistryNumber" type="xs:string"/>

  <xs:element name="RelationAttribute" type="xs:string"/>

  <xs:element name="ScopeNote" type="xs:string"/>

  <xs:element name="SemanticTypeList">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="SemanticType" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="SemanticType">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="SemanticTypeUI"/>
        <xs:element ref="SemanticTypeName"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="SemanticTypeUI" type="xs:string"/>

  <xs:element name="SemanticTypeName" type="xs:string"/>

  <xs:element name="TermList">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="Term" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="Term">
    <xs:complexType>
      <xs:sequence>
        <xs:sequence>
          <xs:element ref="TermUI"/>
          <xs:element ref="String"/>
        </xs:sequence>
        <xs:element ref="DateCreated" minOccurs="0" maxOccurs="1"/>
        <xs:element ref="Abbreviation" minOccurs="0" maxOccurs="1"/>
        <xs:element ref="SortVersion" minOccurs="0" maxOccurs="1"/>
        <xs:element ref="EntryVersion" minOccurs="0" maxOccurs="1"/>
        <xs:element ref="ThesaurusIDlist" minOccurs="0" maxOccurs="1"/>
      </xs:sequence>
      <xs:attribute name="ConceptPreferredTermYN" use="required">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:enumeration value="Y"/>
            <xs:enumeration value="N"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
      <xs:attribute name="IsPermutedTermYN" use="required">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:enumeration value="Y"/>
            <xs:enumeration value="N"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
      <xs:attribute name="LexicalTag" use="required">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:enumeration value="ABB"/>
            <xs:enumeration value="ABX"/>
            <xs:enumeration value="ACR"/>
            <xs:enumeration value="ACX"/>
            <xs:enumeration value="EPO"/>
            <xs:enumeration value="LAB"/>
            <xs:enumeration value="NAM"/>
            <xs:enumeration value="NON"/>
            <xs:enumeration value="TRD"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
      <xs:attribute name="PrintFlagYN" use="required">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:enumeration value="Y"/>
            <xs:enumeration value="N"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
      <xs:attribute name="RecordPreferredTermYN" use="required">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:enumeration value="Y"/>
            <xs:enumeration value="N"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
    </xs:complexType>
  </xs:element>

  <xs:element name="TermUI" type="xs:string"/>

  <xs:element name="String" type="xs:string"/>

  <xs:element name="Abbreviation" type="xs:string"/>

  <xs:element name="SortVersion" type="xs:string"/>

  <xs:element name="EntryVersion" type="xs:string"/>

  <xs:element name="ThesaurusIDlist">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="ThesaurusID" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="ThesaurusID" type="xs:string"/>

</xs:schema>