Text Encoding Initiative
Tagging Guide for EETS electronic editions
1. Overall structure of an EETS e-edition

An EETS e-edition may contain all of the following components:
These components are organized and encoded so as to facilitate the automatic definition and processing of links between them. It should thus be possible to present on screen corresponding parts of a text in different manuscript versions and in the edited text, to align text and translation automatically, and to associate entries in the bibliography or notes with discussion elsewhere in the text. Linking is most conveniently accomplished using the ID/IDREF mechanism of XML, which normally implies that all links be confined to a single XML document. This is easily accomplished using the standard TEI document structure, which includes support for multiple texts combined into a single object.

The structure proposed for the EETS pilot e-edition may be summarised as follows. The entire edition is tagged as a single <TEI.2> element, containing (as usual) a <teiHeader> followed by a <text> element. The <teiHeader> contains descriptive metadata for the entire edition, including details of any codes used in more than one of its components, such as manuscript hand identifiers. The <text> element groups together, at the top level, the following:
The attributes ID and TYPE are used to identify the functions of each <text> element as follows:
Within each text component, the <div> element is used to mark all further subdivisions. The <divGen> element is used to mark where 'virtual' subdivisions are to be generated. These elements use the ID and TYPE attributes in a similar way, to indicate the identity and function of the subdivisions. They also occasionally use the N attribute where it is necessary to supply a heading or label for a division because one is not otherwise available. The following special values are currently defined for the TYPE attribute on <div> elements:
The kind of 'virtual' subdivision that a <divGen> element generates is indicated by its TYPE attribute. The following special values are currently defined for this attribute:
2. Tagging the editorial part

Each division of the editorial part of the edition should be tagged as a separate <div> element. In general, the presence of a subheading in the printed original source is a clear indication that a new division has begun. If the new division is subordinate to the current division, the new <div> element is enclosed within its parent; if it is a sibling, remember to close the current <div> before opening the new one. The typography or numbering scheme will usually indicate unambiguously whether or not the new division is subordinate. Note, however, that numbering of text divisions is generated automatically and should not be copied into the tagged text.

The ID attribute must be used to supply an identifier for each <div> element to which reference is made by a <ptr> element elsewhere. Identifiers are made up according to a fairly self-evident scheme (see the mssdesc.xml file for examples). Tag the title or heading of each division with the <head> element. Following the <head> element, there must be at least one <p> or <list> or other container element in a division.

Cross references to other parts of the edition should be made using either the <ptr> or the <ref> element, depending on whether the form of the cross reference is to be given explicitly in the tagged text. The value of the TARGET attribute on the <ptr> or <ref> should correspond with the value of the ID attribute on the item being referred to. For example, a reference to the bibliographic item Blenkinsop (1989) will appear as <ref target="Blenkinsop-1989">Blenkinsop (1989)</ref>. The bibliographic description itself will be found in a <bibl> element within the <div type="bklist"> contained by the <back> matter for the whole edition, tagged as <bibl id="Blenkinsop-1989">. A reference to another

3. Tagging the edited text

In the edited text, the logical structure of the text is represented.
Major divisions of the text are marked using the <div> element; at present, only the preface is so marked. Paragraphs are tagged with the <p> element, each carrying an ID attribute. Line number references are tagged with <lb> and page references with <pb>. As in transcripts, textual footnotes are tagged with <ptr> elements; descriptive footnotes are tagged with <pn> elements.

4. Tagging transcriptions

Each transcription is marked as a distinct <text> element. Its unique identifier (in the form AW-MS-x-trans, where x is the manuscript siglum) is supplied as the value of the ID attribute. Each major division is enclosed by a <div> element, with a unique identifier in the form x-yyy, where x is the manuscript siglum and yyy is a code for the section concerned, e.g. pref for the preface. The N attribute is used to supply a heading for the division if one is required and none is present in the transcript itself (if one were present, it would appear as a <head> element within the <div>). For example, the transcript of ms N (the Nero manuscript) begins as follows:

<text id="AW-MS-N-trans">
<body>
<div id="N-pref" n="preface">
...

Each manuscript page being transcribed is enclosed by a <page> element, the ID attribute of which supplies its folio reference in the form x-fzz, where x is the manuscript siglum and zz is the manuscript folio number. This identifier is also used as a key from which references to the page image files are generated. Insert an empty <lb> element at the start of each line, but do not insert line numbers; they will be added automatically. Paragraph divisions within the transcripts are not marked using the <p> element, since these rarely nest properly within pages. Instead, a <pp> element is used to indicate the point at which a paragraph division begins in the edited text, as further discussed below.
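The identifier conventions for transcripts lend themselves to a mechanical check. The following is a minimal sketch in Python, not part of the EETS tooling: the function name check_transcript_ids and the sample markup are hypothetical, and the folio pattern (digits optionally followed by r or v) is an assumption based on the single example N-f1r.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample transcript following the conventions described above.
SAMPLE = """<text id="AW-MS-N-trans">
  <body>
    <div id="N-pref" n="preface">
      <page id="N-f1r">
        <lb/><pp target="P1"/>text of the first line
        <lb/>text of the second line
      </page>
    </div>
  </body>
</text>"""


def check_transcript_ids(xml_source):
    """Return a list of identifier problems found in one transcript."""
    root = ET.fromstring(xml_source)
    match = re.fullmatch(r"AW-MS-(\w+)-trans", root.get("id", ""))
    if root.tag != "text" or match is None:
        return ["top-level element is not <text> with an id of the form AW-MS-x-trans"]
    siglum = re.escape(match.group(1))
    problems = []
    # Division identifiers: x-yyy (siglum, hyphen, section code).
    for div in root.iter("div"):
        if not re.fullmatch(rf"{siglum}-\w+", div.get("id", "")):
            problems.append(f"bad div id: {div.get('id')!r}")
    # Page identifiers: x-fzz. Assumption: zz is digits plus optional r or v.
    for page in root.iter("page"):
        if not re.fullmatch(rf"{siglum}-f\d+[rv]?", page.get("id", "")):
            problems.append(f"bad page id: {page.get('id')!r}")
    return problems


if __name__ == "__main__":
    print(check_transcript_ids(SAMPLE))
```

A transcript containing the special-character entity references would need those entities declared before a standard XML parser could read it; the sample avoids them for that reason.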
For example, the start of the Nero transcript continues as follows:

<page id="N-f1r">
<lb/> <pp target="P1"/>

Within the body of a transcript, use the <ed> tag to enclose editorial additions, e.g. expanded abbreviations or missing letters, which are to be rendered as italics. Use the <add> tag to enclose other editorial additions, e.g. supplied words, which are to be rendered within square brackets. Represent the Tironian et as &et;. Represent the crossed thorn as &yt;. Represent rubricated initial letters using the appropriate entity, e.g. &Ric; for rubricated initial R. Represent the special characters thorn, eth, and yogh by the entity references &thorn;, &eth;, and &yogh; respectively.

Mark the point of attachment for textual footnotes by an empty <ptr> element, supplying the footnote number prefixed by the manuscript identifier, a hyphen, and the letters fn as the value of its target attribute. For example, assuming that footnote 2 in ms N refers to line 12 of page f1r, there will be an element <ptr target="N-fn2"/> at some point following the <lb> element which marks the start of line 12 in the text. The body of the textual note is supplied as the content of a <note> element with the same key used as the value of its id attribute, for example as <note id="N-fn2" n="f1r/12"> .... </note>. All footnote bodies are grouped together in a single <div type="notes"> element within a <back> element for the <text> containing the transcript. The folio/line number reference for the target of the note should be supplied as the value for the N attribute on the <note> element, as shown above. The label indicating the page to which the note belongs will be automatically generated. (Alternatively, the <note> element may be directly inserted in the text at the place of attachment, in which case the <ptr> element may be omitted.)
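Because each footnote pointer and its note body share a key, the pairing can be verified automatically. The sketch below is illustrative only: the function name unresolved_footnotes and the sample markup are hypothetical, and it assumes that every <ptr> inside a transcript is a footnote pointer and that notes inserted directly at the place of attachment are not in use.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one footnote attached to the second line of page f1r.
SAMPLE = """<text id="AW-MS-N-trans">
  <body>
    <div id="N-pref" n="preface">
      <page id="N-f1r">
        <lb/>first line
        <lb/>second line<ptr target="N-fn1"/>
      </page>
    </div>
  </body>
  <back>
    <div type="notes">
      <note id="N-fn1" n="f1r/2">body of the note</note>
    </div>
  </back>
</text>"""


def unresolved_footnotes(xml_source):
    """Return (dangling, unused): pointer targets with no matching note,
    and note ids that no pointer refers to, each as a sorted list."""
    root = ET.fromstring(xml_source)
    note_ids = {n.get("id") for n in root.iter("note") if n.get("id")}
    # Assumption: every <ptr> in a transcript points at a textual footnote.
    targets = {p.get("target") for p in root.iter("ptr") if p.get("target")}
    dangling = sorted(targets - note_ids)
    unused = sorted(note_ids - targets)
    return dangling, unused


if __name__ == "__main__":
    print(unresolved_footnotes(SAMPLE))
```

Two empty lists indicate that every pointer resolves and every note is referenced.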
Mark the point at which each paragraph in the edited text begins with a <pp> element, specifying the paragraph identifier in the form Pn as the value of its target attribute. For example, enter <pp target="P2"/> at the point in the transcript where paragraph P2 begins in the edited text.
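The <pp> markers are what allow a transcript to be aligned with the edited text. As a rough illustration, the Python sketch below builds a table from paragraph identifier to the page and line at which that paragraph begins in a transcript. The function name paragraph_starts and the sample markup are hypothetical, and counting <lb> elements from 1 on each page is an assumption about how the automatically generated line numbers work.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: three paragraph starts spread over two pages.
SAMPLE = """<text id="AW-MS-N-trans">
  <body>
    <div id="N-pref" n="preface">
      <page id="N-f1r">
        <lb/><pp target="P1"/>opening words
        <lb/>more text <pp target="P2"/>a new paragraph
      </page>
      <page id="N-f1v">
        <lb/>continuing text
        <lb/><pp target="P3"/>third paragraph
      </page>
    </div>
  </body>
</text>"""


def paragraph_starts(xml_source):
    """Map each paragraph id to (page id, line number on that page)."""
    root = ET.fromstring(xml_source)
    starts = {}
    for page in root.iter("page"):
        line = 0  # assumption: lines are numbered per page, starting at 1
        for element in page.iter():  # document order within the page
            if element.tag == "lb":
                line += 1
            elif element.tag == "pp":
                starts[element.get("target")] = (page.get("id"), line)
    return starts


if __name__ == "__main__":
    for paragraph, (page_id, line) in paragraph_starts(SAMPLE).items():
        print(paragraph, page_id, line)
```

Building one such table per manuscript would give, for any paragraph of the edited text, the corresponding starting point in each transcript.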