Creating and Documenting Electronic Texts


Chapter 1: Introduction

1.1: Aims and organisation of this Guide

The aim of this Guide is to take users through the basic steps involved in creating and documenting an electronic text or similar digital resource. The notion of 'electronic text' is interpreted very broadly, and discussion is not limited to any particular discipline, genre, language or period — although where space permits, issues that are especially relevant to these areas may be drawn to the reader's attention.

The authors have tended to concentrate on those types of electronic text which, to a greater or lesser extent, represent a transcription (or, if you prefer, a 'rendition', or 'encoding') of a non-electronic source, rather than the category of electronic texts which are primarily composed of digitized images of a source text (e.g. digital facsimile editions). However, there is a growing number of electronic textual resources which support both these approaches; for example, some projects involving the digitization of rare illuminated manuscripts combine high-quality digital images (for those scholars interested in the appearance of the source) with electronic text transcriptions (for those scholars concerned with analysing aspects of the content of the source). We would hope that the creators of every type of electronic textual resource will find something of interest in this short work, especially if they are newcomers to this area of intellectual and academic endeavour.

This Guide assumes that the creators of electronic texts have a number of common concerns. For example, they wish their efforts to remain viable and usable in the long term, and not to be unduly constrained by the limitations of current hardware and software. Similarly, they wish others to be able to reuse their work for the purposes of secondary analysis, extension, or adaptation. They also want the tools, techniques, and standards that they adopt to enable them to capture those aspects of any non-electronic sources which they consider to be significant, whilst at the same time being practical and cost-effective to implement.

The Guide is organised in a broadly linear fashion, following the sequence of actions and decisions which we would expect any electronic text creation project to undertake. Not every electronic text creator will need to consider every stage, but it may be useful to read the Guide through once, if only to establish the most appropriate course of action for one's own work.

1.2: What this Guide does not cover, and why

Creating and processing electronic texts was one of the earliest areas of computational activity, and has been going on for at least half a century. This Guide makes no pretence of being a comprehensive introduction to this complex area of digital resource creation, but the authors have attempted to highlight some of the fundamental issues which will need to be addressed, particularly by anyone working within the community of arts and humanities researchers, teachers, and learners who may never before have undertaken this kind of work.

Crucially, this Guide will not attempt to offer a comprehensive (or even a comparative) overview of the available hardware and software technologies which might form the basis of any electronic text creation project. This is largely because the development of new hardware and software continues at such a rapid pace that anything we might review or recommend here will probably have been superseded by the time this publication becomes available in printed form. Similarly, there would have been little point in providing detailed descriptions of how to combine particular encoding or markup schemes, metadata, and delivery systems, as the needs and abilities of the creators and (anticipated) users of an electronic text should be the major factors influencing its design, construction, and method of delivery.

Instead, the authors have attempted to identify and discuss the underlying issues and key concerns, thereby helping readers to begin to develop their own knowledge and understanding of the whole subject of electronic text creation and publication. Combined with an intimate knowledge of the non-electronic source material, this understanding should enable readers to decide for themselves which approach (and thus which combinations of hardware and software, techniques and design philosophy) will be most appropriate to their needs and the needs of any other prospective users.

Although every functional aspect of computers is based upon the distinctive binary divide between 1s and 0s, true and false, presence and absence, it is rarely so easy to draw such clear distinctions at the higher levels of creating and documenting electronic texts. Therefore, whilst reading this Guide it is important to remember that there are seldom 'right' or 'wrong' ways to prepare an electronic text, although certain decisions will crucially affect the usefulness and likely long-term viability of the final resource. Readers should not assume that any course of action recommended here will necessarily be the 'best' approach in any or all given circumstances; however, everything the authors say is based upon their understanding of what constitutes good practice, and results from almost twenty-five years of experience running the Oxford Text Archive.

1.3: Opening questions — Who will read your text, why, and how?

There are some fundamental questions that will recur throughout this Guide, and all of them focus upon the intended readership (or users) of the electronic text that you are hoping to produce. For example, if your main reason for creating an electronic text is to provide the raw data for computer-assisted analysis (perhaps as part of an authorship attribution study), then completeness and accuracy of the data will probably be far more important than capturing the visual appearance of the source text. Conversely, if you are hoping to produce an electronic text that will have broad functionality and appeal, and the original source contains presentational features which might be considered worthy of note, then you should be attempting to create a very different object, perhaps one where visual fidelity is more important than the absolute accuracy of any transcription. In the former case, the implicit assumption is that no one is likely to read the electronic text (data) from start to finish, whilst in the latter case it is more likely that some readers may wish to use the electronic text as a digital surrogate for the original work. As the nature of the source(s) and/or the intended resource(s) becomes more complex (for example, recording variant readings of a manuscript or discrepancies between different editions of the same printed text), the same fundamental questions remain.

The next chapter of this Guide looks at how you might start to address some of these questions, by subjecting your source(s) to a process that the creators of electronic texts have come to call 'Document Analysis'.

© The right of Alan Morrison, Michael Popham and Karen Wikander to be identified as the Authors of this Work has been asserted by them in accordance with the Copyright, Designs and Patents Act 1988. 

All material supplied via the Arts and Humanities Data Service is protected by copyright, and duplication or sale of all or part of any of it is not permitted, except that material may be duplicated by you for your personal research use or educational purposes in electronic or print form. Permission for any other use must be obtained from the Arts and Humanities Data Service. Electronic or print copies may not be offered, whether for sale or otherwise, to any third party.