ISO 25964-1 schema version 1.4 dated 2011-09-26
Each XML instance is necessarily a full thesaurus specification, occasionally filtered for a language, concept group, …. An XML instance is never an update/change specification. Whether we have a full or a filtered version should be documented in the “current” VersionHistory.
A formal value is set in the @scope attribute on the root element.
The date created and date modified are strongly recommended but formally optional attributes of several elements (classes). They should be given in YYYY-MM-DD format, in line with ISO 8601. In the case of Thesaurus, another option is to use the simple attribute "date", in which case the value will be interpreted as applying to the date when the version to which it is attached was issued. The same format should be used.
There should not be more than one "created" date. See also VersionHistory.
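As a rough illustration of the YYYY-MM-DD format described above, the following sketch checks a date attribute value before it is written to an exchange document. The function name `parse_thesaurus_date` is hypothetical and not part of the schema. The explicit shape check is there because general ISO 8601 parsers may also accept other variants, such as the basic format without hyphens.

```python
import re
from datetime import date

# Exactly four, two and two ASCII digits separated by hyphens.
_DATE_SHAPE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")

def parse_thesaurus_date(value: str) -> date:
    """Parse a YYYY-MM-DD string, rejecting other ISO 8601 variants."""
    if not _DATE_SHAPE.fullmatch(value):
        raise ValueError(f"expected YYYY-MM-DD, got {value!r}")
    year, month, day = (int(part) for part in value.split("-"))
    # date() itself raises ValueError for impossible dates such as 2011-02-30.
    return date(year, month, day)

print(parse_thesaurus_date("2011-09-26"))  # 2011-09-26
```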
The optional attribute @dc:source of Thesaurus and ThesaurusTerm can be used to note the reference work or individual who contributed the term in question.
@language should be given as an alpha-2 code selected from ISO 639-1 if present in that list, or an alpha-3 code from ISO 639-2 if not. These codes may be extended where necessary with the additional codes described in RFC 4646[45] and listed in the IANA subtag registry[35] (see 12.4.5).
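The selection rule above (alpha-2 when one exists, alpha-3 otherwise) can be sketched as a simple lookup. The table below is a tiny hand-written excerpt for illustration only, not the ISO 639 registry, and the names `ALPHA2_BY_ALPHA3` and `preferred_language_code` are hypothetical. The sketch does not check that the alpha-3 input is itself a valid ISO 639-2 code.

```python
# Tiny illustrative excerpt; a real implementation would load the full
# ISO 639-1 / ISO 639-2 tables instead.
ALPHA2_BY_ALPHA3 = {"eng": "en", "fra": "fr", "deu": "de"}

def preferred_language_code(alpha3: str) -> str:
    """Return the alpha-2 code when one exists, else keep the alpha-3 code."""
    return ALPHA2_BY_ALPHA3.get(alpha3, alpha3)

print(preferred_language_code("eng"))  # en
print(preferred_language_code("grc"))  # grc (Ancient Greek has no alpha-2 code)
```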
All ThesaurusConcept-s must have a unique thesaurus concept identifier.
All ThesaurusArray-s must have a unique thesaurus array identifier.
All ConceptGroup-s must have a unique concept group identifier.
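A possible pre-validation check for the uniqueness rules above is sketched below. It assumes, purely for illustration, that each class instance is serialized as an element named after the class with a child element `identifier`, and that no XML namespace is in use; the actual element layout should be taken from the schema itself. The function name `duplicate_identifiers` is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def duplicate_identifiers(root, class_name):
    """Return identifiers that occur more than once among class_name elements."""
    ids = [el.findtext("identifier") for el in root.iter(class_name)]
    counts = Counter(ids)
    return sorted(i for i, n in counts.items() if i is not None and n > 1)

# Hypothetical, heavily simplified instance document.
SAMPLE = """<Thesaurus>
  <ThesaurusConcept><identifier>c1</identifier></ThesaurusConcept>
  <ThesaurusConcept><identifier>c2</identifier></ThesaurusConcept>
  <ThesaurusConcept><identifier>c1</identifier></ThesaurusConcept>
  <ConceptGroup><identifier>g1</identifier></ConceptGroup>
</Thesaurus>"""

root = ET.fromstring(SAMPLE)
print(duplicate_identifiers(root, "ThesaurusConcept"))  # ['c1']
print(duplicate_identifiers(root, "ConceptGroup"))      # []
```

A schema-aware validator would normally enforce these constraints; a check like this is only useful for reporting the offending identifiers in a readable way.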
The classes SplitNonPreferredTerm and CompoundEquivalence enable representation of complex concepts by a combination of terms, as described in 8.5.
@source notes the reference work or individual who contributed the term in question
The full thesaurus is being provided in the exchange document.
Only the updated parts of a complete thesaurus (compared to its previous version) are provided.
Note: The schema is not fully ready to support this value yet.
A custom selection of the thesaurus has been provided.
Details are specified in the VersionHistory with thisVersion = true.
Examples:
- a certain subset of the thesaurus languages
- only specific concept groups are available
All ThesaurusTerm-s (PreferredTerm, SimpleNonPreferredTerm and SplitNonPreferredTerm) have a unique thesaurus term identifier.
Metadata sheet
Identifier for the thesaurus as a whole
Person or organization who contributed to the thesaurus
Spatial or temporal coverage of the thesaurus
Person or organization primarily responsible for making the thesaurus
Any date associated with the thesaurus
The date the thesaurus was created
A date when this version was modified
An account of the thesaurus
The file format or physical medium of the thesaurus
Codes showing languages supported by the thesaurus
It should be given as an alpha-2 code selected from ISO 639-1 if present in that list, or an alpha-3 code from ISO 639-2 if not. These codes may be extended where necessary with the additional codes described in RFC 4646 and listed in the IANA subtag registry (see section 12.4.5 of the standard).
Entity responsible for publication
A related publication
Copyright or other rights information
Resource from which the thesaurus was derived
Used by a thesaurus variant to refer to the URI of the original thesaurus of which it is a variant.
Index terms indicating the subject content
Title of the thesaurus
The genre of the vocabulary, e.g. "thesaurus"
Each concept in the thesaurus is represented by one preferred term per language, and by any number of nonpreferred terms. The notation, scope note and broader/narrower/related term relationships apply to the concept as a whole, rather than to its preferred term. A unique identifier may be assigned to each concept. In some systems, the concept is identified only by its preferred term or by the identifier of its preferred term, but this has disadvantages if the spelling of the term changes.
This schema requires the identifier on the concept.
There shall not be two preferred terms with the same xml:lang attribute value.
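The one-preferred-term-per-language rule could be checked along the following lines. The sketch assumes that the `xml:lang` attribute sits directly on a `PreferredTerm` child of `ThesaurusConcept`, which is a simplification made for this example; the function name `repeated_preferred_languages` is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def repeated_preferred_languages(concept):
    """Return language codes carried by more than one PreferredTerm."""
    langs = [term.get(XML_LANG) for term in concept.findall("PreferredTerm")]
    counts = Counter(langs)
    return sorted(l for l, n in counts.items() if l is not None and n > 1)

# Hypothetical concept with two English preferred terms (a violation).
concept = ET.fromstring(
    "<ThesaurusConcept>"
    '<PreferredTerm xml:lang="en">milk</PreferredTerm>'
    '<PreferredTerm xml:lang="fr">lait</PreferredTerm>'
    '<PreferredTerm xml:lang="en">cow milk</PreferredTerm>'
    "</ThesaurusConcept>"
)
print(repeated_preferred_languages(concept))  # ['en']
```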
A Thesaurus array cannot contain the same member concept twice.
A Thesaurus array cannot contain the same member array twice.
A concept group cannot contain the same member concept twice.
The VersionHistory optionally allows any copy of a thesaurus to carry a record of versions or editions that have been created.
Although the class is optional and might not be needed when only one version exists, adoption is highly recommended as soon as there is more than one. Each version should be identified by an identifier or a date or both.
When a concept has an array of narrower concepts, the relationship to this array is not given here. To avoid redundancy, the relationship is given in one direction only, from ThesaurusArray using the child element hasSuperOrdinateConcept.
This is an optional attribute of ThesaurusConcept and ThesaurusTerm, which records whether they are, for example, approved, candidates, superseded or deprecated (see 13.6.2).
This is an optional attribute of ThesaurusConcept, ThesaurusArray and ConceptGroup (see 12.1.3 and 12.2.5.2).
If the thesaurus uses an expressive notation, then applying it to node labels will allow them to be shown in the correct place in hierarchical displays.
If there is no expressive notation, some other means should be found of outputting the node labels correctly in the display, such as a sort code attribute that is not displayed to users. In this event, the same attribute will be required at the display level for preferred terms.
The system of notation used for ConceptGroup may be quite distinct from that used for ThesaurusConcept, and one of these systems may be present without the other.
A true/false label indicating whether the concept is at the top of a hierarchy, i.e. has no broader concepts
The term used as a label for this concept. There should be one preferred term per language.
Alternative terms by which this concept could be sought
The co-occurrence of a SimpleNonPreferredTerm and a PreferredTerm under the same ThesaurusConcept implies the existence of an Equivalence relation wherever that SimpleNonPreferredTerm has the same language specification as the PreferredTerm.
A note defining or clarifying the scope of the concept within this thesaurus
A note recording changes to this concept within this thesaurus
A note of any other kind relating to this concept
An additional attribute of a concept
The wording of the term.
The identifier and date attributes of ThesaurusTerm are essential for the provision of a good updating service because if the spelling of a term changes, a constant Term identifier facilitates continuity during successive updates. The use of a concept identifier is strongly recommended to promote interoperability among networked search applications.
Notes the reference work or individual who contributed the term in question
This is an optional attribute of ThesaurusConcept and ThesaurusTerm, which records whether they are, for example, approved, candidates, superseded or deprecated (see 13.6.2).
A note giving definitions of a term, not necessarily limited to the scope of the concept labelled by the term in this thesaurus
A note recording changes to this term within this thesaurus
A note for use by the thesaurus editors during the editing process
The model includes classes CustomConceptAttribute and CustomTermAttribute for custom attributes of concepts and terms. These enable recording of custom information about concepts and terms.
These are included as separate classes rather than as normal attributes so that the administrator of the thesaurus management system can specify the values of custom attributes that can be assigned. The classes have an attribute customAttributeType, allowing the administrator to specify which type of attribute is being used. Values of customAttributeType should normally be taken from a controlled list.
A yes/no flag to show whether the term may be excluded from some forms of output, e.g. for misspellings of a term.
Specification of a kind of equivalence relationship. This will normally be USE, linking the source SimpleNonPreferredTerm to the target PreferredTerm
Identifier of the SplitNonPreferredTerm.
Identifier of (one of) the preferred term(s).
The typical (and implied) associative relationship role is: RT
Associative Relationship role types should form a controlled vocabulary.
If subtypes of RT are defined, hierarchical levels should be separated by a solidus (forward slash): /
The identifier of the related thesaurus concept of the associative relationship.
Example: "sport event" RT "sport manifestation"
- ./role = RT
- ./isRelatedConcept = identifier of the concept with Preferred Term "sport event"
- ./hasRelatedConcept = identifier of the concept with Preferred Term "sport manifestation"
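The example above can be sketched as a small builder. The child names `role`, `isRelatedConcept` and `hasRelatedConcept` come from the example; the wrapper element name `AssociativeRelationship`, the serialization of all three as child elements, and the identifiers `c-sport-event` and `c-sport-manifestation` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

def associative_relationship(is_related, has_related, role="RT"):
    """Build one relationship record; RT is the typical (implied) role."""
    rel = ET.Element("AssociativeRelationship")  # assumed wrapper name
    ET.SubElement(rel, "role").text = role
    ET.SubElement(rel, "isRelatedConcept").text = is_related
    ET.SubElement(rel, "hasRelatedConcept").text = has_related
    return rel

# "sport event" RT "sport manifestation", with hypothetical identifiers.
rel = associative_relationship("c-sport-event", "c-sport-manifestation")
print(ET.tostring(rel, encoding="unicode"))
```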
The identifier of the thesaurus concept for which the associative relation is specified.
Example: "sport event" RT "sport manifestation"
- ./role = RT
- ./isRelatedConcept = identifier of the concept with Preferred Term "sport event"
- ./hasRelatedConcept = identifier of the concept with Preferred Term "sport manifestation"
For custom relationship types, the text given in the "role" attribute should be composed of (a) the name of the parent relationship type, followed by (b) the symbol forward slash "/", and finally (c) the name of the custom relationship type. If necessary, custom relationship types can be subdivided further in the same way.
The text in the 'role' attribute of HierarchicalRelationship may be one of the following, where NTX indicates some further subdivision of NTI:
NT
NT/NTP
NT/NTI
NT/NTG
NT/NTI/NTX
BT
BT/BTP
BT/BTI
BT/BTG
BT/BTI/BTX
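The slash-separated role convention listed above might be handled as follows. This is a permissive sketch: it only checks that the path starts with NT or BT and has no empty levels, since custom subdivisions such as NTX are not enumerated in the text. The names `ROOT_ROLES` and `parse_hierarchical_role` are hypothetical.

```python
ROOT_ROLES = {"NT", "BT"}

def parse_hierarchical_role(role: str) -> list[str]:
    """Split a role such as 'NT/NTI/NTX' into its hierarchical levels."""
    levels = role.split("/")
    if levels[0] not in ROOT_ROLES:
        raise ValueError(f"role must start with NT or BT: {role!r}")
    if any(not level for level in levels):
        raise ValueError(f"empty level in role: {role!r}")
    return levels

print(parse_hierarchical_role("NT/NTI/NTX"))  # ['NT', 'NTI', 'NTX']
print(parse_hierarchical_role("BT"))          # ['BT']
```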
The identifier of a thesaurus concept identified by a hierarchical relationship.
Example: in the relationship "cow milk" BT "milk":
- ./role = BT
- ./isHierRelConcept = identifier of concept with Preferred Term "cow milk"
- ./hasHierRelConcept = identifier of concept with Preferred Term "milk"
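Reading the example above, a concept has a broader concept whenever it appears as isHierRelConcept in a BT relationship. The sketch below uses that reading to derive which concepts have no broader concept. It works on plain tuples rather than XML, ignores NT records entirely, and uses hypothetical names (`concepts_with_broader`, `top_concepts`) and identifiers.

```python
def concepts_with_broader(relationships):
    """relationships: iterable of (role, is_hier_rel_concept, has_hier_rel_concept)."""
    return {
        is_concept
        for role, is_concept, _has_concept in relationships
        if role.split("/")[0] == "BT"  # BT and its subtypes, e.g. BT/BTG
    }

def top_concepts(all_ids, relationships):
    """Return the identifiers that never have a broader concept, in input order."""
    narrower = concepts_with_broader(relationships)
    return [cid for cid in all_ids if cid not in narrower]

# "cow milk" BT "milk", with hypothetical identifiers.
rels = [("BT", "c-cow-milk", "c-milk")]
print(top_concepts(["c-milk", "c-cow-milk"], rels))  # ['c-milk']
```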
The identifier of a thesaurus concept for which the hierarchical relationship is defined.
Example: in the relationship "cow milk" BT "milk":
- ./role = BT
- ./isHierRelConcept = identifier of concept with Preferred Term "cow milk"
- ./hasHierRelConcept = identifier of concept with Preferred Term "milk"
The identifier of a top level concept.
This concept is the top level concept of the related concept (see ../isTopConceptOf) according to a hierarchical relationship of the thesaurus.
The identifier of a thesaurus concept.
The related top level concept is identified by ../hasTopConcept.
@language should be given as an alpha-2 code selected from ISO 639-1 if present in that list, or an alpha-3 code from ISO 639-2 if not. These codes may be extended where necessary with the additional codes described in RFC 4646[45] and listed in the IANA subtag registry[35] (see 12.4.5).
The date when the note was last modified
The association between Note and ThesaurusConcept enables any note for one concept to refer to any other concept in the thesaurus. This capability is particularly useful for scope notes (see 5.3).
The person(s) or document(s) from which the definition was taken
The value 'true' indicates that the concepts and sub-arrays of the array are ordered. This imposed order is reflected by the XML document order of the XML document instance.
This is an optional attribute of ThesaurusConcept, ThesaurusArray and ConceptGroup (see 12.1.3 and 12.2.5.2).
If the thesaurus uses an expressive notation, then applying it to node labels will allow them to be shown in the correct place in hierarchical displays.
If there is no expressive notation, some other means should be found of outputting the node labels correctly in the display, such as a sort code attribute that is not displayed to users. In this event, the same attribute will be required at the display level for preferred terms.
The system of notation used for ConceptGroup may be quite distinct from that used for ThesaurusConcept, and one of these systems may be present without the other.
@language should be given as an alpha-2 code selected from ISO 639-1 if present in that list, or an alpha-3 code from ISO 639-2 if not. These codes may be extended where necessary with the additional codes described in RFC 4646[45] and listed in the IANA subtag registry[35] (see 12.4.5).
The identifier of the thesaurus concept under which this array appears.
(Applies when this array is not a sub-array.)
The identifier of the thesaurus array under which this thesaurus sub-array appears.
(Applies when this array is a sub-array.)
If the thesaurus array is ordered, the member identifiers are listed in the order required for the array.
The identifier of a member concept of this thesaurus array.
The identifier of a sub-array of this Thesaurus Array.
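Because the imposed order of an ordered array is carried by document order, a consumer only needs to read the members in sequence. The sketch below assumes, for illustration, a child element `ordered` holding the flag and repeated `hasMemberConcept` children holding the member identifiers; both names and the function name `array_members` are assumptions, not taken from the schema text above.

```python
import xml.etree.ElementTree as ET

def array_members(array):
    """Return (ordered flag, member concept identifiers in document order)."""
    ordered = (array.findtext("ordered") or "").strip() in ("true", "1")
    # findall() yields matching children in document order.
    members = [m.text for m in array.findall("hasMemberConcept")]
    return ordered, members

# Hypothetical ordered array whose order is deliberately not alphabetical.
array = ET.fromstring(
    "<ThesaurusArray>"
    "<ordered>true</ordered>"
    "<hasMemberConcept>c3</hasMemberConcept>"
    "<hasMemberConcept>c1</hasMemberConcept>"
    "<hasMemberConcept>c2</hasMemberConcept>"
    "</ThesaurusArray>"
)
print(array_members(array))  # (True, ['c3', 'c1', 'c2'])
```

When the flag is false, the sequence carries no meaning and a display layer is free to sort the members as it sees fit.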
A label identifying the type of group, e.g. "microthesaurus", "theme", or "subject category"
This is an optional attribute of ThesaurusConcept, ThesaurusArray and ConceptGroup (see 12.1.3 and 12.2.5.2).
If the thesaurus uses an expressive notation, then applying it to node labels will allow them to be shown in the correct place in hierarchical displays.
If there is no expressive notation, some other means should be found of outputting the node labels correctly in the display, such as a sort code attribute that is not displayed to users. In this event, the same attribute will be required at the display level for preferred terms.
The system of notation used for ConceptGroup may be quite distinct from that used for ThesaurusConcept, and one of these systems may be present without the other.
The identifier of a concept belonging to the concept group.
A label providing a verbal description of the group. A group should have one label per language.
The identifier of a ConceptGroup that is a sub-group of ../hasSuperGroup.
The identifier of a ConceptGroup that is a super-group of ../hasSubGroup.
versionNote can be used to explain the nature of the version, e.g. whether it is an updated version, an extract or a translation, or to explain its relationship to other versions.
currentVersion is a Boolean (true/false) flag to indicate for each version whether it is still current or whether it has been superseded or withdrawn. More than one version can be current simultaneously.
thisVersion is a Boolean flag to indicate which of the versions listed is the one to which this history is attached.
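A consumer that needs the version record describing the document at hand can filter on the thisVersion flag. The sketch below assumes one `VersionHistory` element per version, with `identifier`, `currentVersion` and `thisVersion` serialized as child elements; that layout and the function name `attached_versions` are assumptions for illustration. Both lexical forms of an XML Schema boolean true ("true" and "1") are accepted.

```python
import xml.etree.ElementTree as ET

def attached_versions(root):
    """Return the VersionHistory records flagged as thisVersion."""
    return [
        vh for vh in root.iter("VersionHistory")
        if (vh.findtext("thisVersion") or "").strip() in ("true", "1")
    ]

# Hypothetical history with a superseded v1 and the current, attached v2.
SAMPLE = """<Thesaurus>
  <VersionHistory>
    <identifier>v1</identifier>
    <currentVersion>false</currentVersion>
    <thisVersion>false</thisVersion>
  </VersionHistory>
  <VersionHistory>
    <identifier>v2</identifier>
    <currentVersion>true</currentVersion>
    <thisVersion>true</thisVersion>
  </VersionHistory>
</Thesaurus>"""

history_root = ET.fromstring(SAMPLE)
print([vh.findtext("identifier") for vh in attached_versions(history_root)])  # ['v2']
```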
Should be given as an alpha-2 code selected from ISO 639-1 if present in that list, or an alpha-3 code from ISO 639-2 if not. These codes may be extended where necessary with the additional codes described in RFC 4646[45] and listed in the IANA subtag registry[35] (see 12.4.5).
The identifier simple type is specified and used instead of the DC identifier because dc:identifier does not have a simple type. As a result, dc:identifier does not allow the uniqueness constraints to be specified that are needed on all elements except Thesaurus.