Frequently Asked Questions
- What are the different dates that I can use to search for texts?
- How do I download a file rather than opening it in my browser?
- What is the OTA?
- What is the repository?
- Where is the OTA?
- When did the OTA start?
- Is the OTA part of the University of Oxford?
- Is the OTA part of the Oxford University Press?
- Is the OTA part of CLARIN?
- Why can’t I log in?
- What submissions do we accept?
- Do I need to create an account to download or make a submission?
- I see an error message when I try to log in
- Why should I submit my data into your repository?
- What is the PID (handle) good for?
- What is the procedure for depositing and archiving data?
- What if I want or need to update the archived data?
- What if I want to withdraw the resources in the future? Can I delete the data?
- How should I cite resources?
- How safe is my data if I store it with you?
- What licence should I pick for my data/tool?
- Where can I find more information about supported licences?
- How do I get the most out of my searches?
What are the different dates that I can use to search for texts?
Date of publication
This represents, as far as possible, the date when the content of the resource was created. So, for digitized versions of printed works, this is usually the date of original publication. The value of this element is taken from the “dc.date.created” element, which can be seen in the “full item record” view.
Date of digitization
The date of the creation of the digital resource. If it is born digital, then this is the same as the date of publication, but more often it will be a later date. For legacy resources, where we don’t have sufficient information about exactly when a resource was created, the date when it was deposited in the OTA is used in this field, and its value should be interpreted as "created no later than". The value of this element is taken from the “dc.date.issued” element, which can be seen in the “full item record” view.
Date range
This is based on the date of publication, and gives a coarse-grained date range for the resource, for the purposes of grouping resources into larger periods of time. Where the date of publication of a resource is a range, the earlier date is used to assign a date range value. The value of this element is taken from the “otaterms.date.range” element, which can be seen in the “full item record” view.
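As a rough illustration of how the earlier date of a range determines the grouping, here is a minimal Python sketch. The bucket width (100 years), the label format, and the function names are assumptions made for this example only; the actual values used in “otaterms.date.range” may be defined differently.

```python
import re


def earliest_year(date_text):
    """Return the first four-digit year found in a date or a date range."""
    match = re.search(r"\d{4}", date_text)
    if match is None:
        raise ValueError(f"no four-digit year in {date_text!r}")
    return int(match.group())


def date_range_label(date_text, span=100):
    """Place a date in a hypothetical bucket that is `span` years wide.

    For a range such as "1598-1601", only the earlier year is used.
    """
    start = earliest_year(date_text) // span * span
    return f"{start}-{start + span - 1}"


print(date_range_label("1599"))       # 1500-1599
print(date_range_label("1598-1601"))  # 1500-1599 (the earlier date decides)
```

Under these assumptions, a work published across 1598 to 1601 is grouped with sixteenth-century material, because only the earlier date is considered.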
How do I download a file rather than opening it in my browser?
Different browsers behave differently when it comes to downloading or opening different file types. To make sure that a file is downloaded to your machine, right-click on the “Download file” button and select the option to save the file. Alternatively, clicking the darker blue “Download all local files for this item” button should always download all of the files as a single zip file.
What is the OTA?
The Oxford Text Archive (OTA) is a repository of digital literary and linguistic resources for research and teaching. The OTA was created at and remains a part of the University of Oxford. We also offer advice to resource creators about best practice for creating digital resources, and to users of digital resources on how to benefit from existing resources.
What is the “repository”?
It is like a library for digital texts, as well as for some other types of literary and linguistic data. It’s an open, online location where people can search for texts and easily download them. It’s a place where texts can be stored safely and shared with others. The goals are to make it easier to find and use the texts, and to make sure that they remain available and usable for a long time into the future.
Where is the OTA?
The OTA is part of the Bodleian Libraries at the University of Oxford. The staff responsible for the OTA are physically located at Osney One Building in Oxford.
When did the OTA start?
The OTA was founded by Lou Burnard and Susan Hockey in 1976. We celebrated our 30th birthday with a number of events in 2006, and our 40th in 2016.
Is the OTA part of the University of Oxford?
Yes. The OTA was created at the University of Oxford and remains part of it, within the Bodleian Libraries.
Is the OTA part of the Oxford University Press?
Is the OTA part of CLARIN?
Yes. The OTA is one of the key centres that originally planned and started CLARIN, the European Research Infrastructure Consortium for language resources and technologies. The OTA collections can be found via the Virtual Language Observatory, and the OTA is involved in a number of initiatives to share resources via CLARIN. The OTA is a registered CLARIN C Centre, and the repository is based on the CLARIN DSpace platform.
Why can’t I log in?
Some users may be unable to log in when they try to use certain resources in the OTA. Resources marked as being for “Academic use” are only accessible to bona fide members of a university: users must log in via their institution to demonstrate their credentials. (The restrictions are imposed by those who created and deposited the resources in question.)
Access will be granted to log-ins from institutions in countries currently signed up to the CLARIN Federation, or to institutions that have signed up to eduGAIN. If your institution doesn’t show up in the list, you’ll need to ask someone (probably in your institutional library) to register your institution.
What submissions do we accept?
**November 2021 — Please refer to the notice above**
We accept high-quality digital texts and related resources: full text, corpora, lexicons, etc. In the case of collections of letters and other forms of correspondence, we work closely with the Electronic Enlightenment to ensure that the texts are easily found and linked to related collections.
When uploading language resources, please try to use one of the recommended formats mentioned in LRT Standards.
Do I need to create an account to download or make a submission?
- Download without restriction: data with a licence that allows free sharing can be downloaded without restriction; just read the licence and download. This applies to all data with Creative Commons licensing and to tools with open source licences.
- Download with licence restrictions: to download datasets that require you to sign a licence, you need to log in. If you are from the academic world in Europe, you probably don’t need a new account; just click "Login" and search for your academic institution. To sign in, you can use any account with an Identity Provider that is a member of the eduGAIN federation. If you don’t have an academic account that works with us, please contact the OTA Help Desk.
I see an error message when I try to log in
If your institution is eligible for access but you have trouble logging in, please contact the OTA Help Desk.
Occasionally (usually when you are the first one logging in using your home institution) you might see an error stating:
- The authentication was successful; however, your identity provider did not provide either your email, eppn nor targeted id.
This means your home institution did not send the OTA enough data about you to operate our service (probably to protect your personal data). We only require an email address to provide access, which they should provide as we follow the GÉANT Data Protection Code of Conduct.
If you have accounts with multiple providers and you log in with a different one each time, you might see an error stating:
- Your email is already associated with a different user.
Please try to use the same provider each time. If that is not possible, contact the OTA Help Desk to request a change of your default address.
Why should I submit my data into your repository?
**November 2021 — Please refer to the notice above**
- It is free and safe.
- We respect your licence. We encourage open data, and believe it benefits not only users, but also the data providers.
- We also accept restricted-access data. If you need it, we can require users to sign a licence before they are allowed to download the data.
- Deposited data is highly visible — for example via Google, VLO, DataCite, OLAC, Data Citation Index, arXive.org — giving you maximal exposure for your work.
- The data is easy to cite. We provide ready-to-use one-click citations in BibTeX, RIS, and other popular reference formats. All the citations include permanent links created from persistent identifiers: we use handles for PIDs, and these PIDs are future-proof.
What is the PID (handle) good for?
It is a special permanent URL. It provides a permanent link that will resolve correctly even if in some distant future the data is moved: therefore, the PID should always be used as the URL of choice in citations.
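As an illustration, a handle can be turned into a citable URL by prefixing it with the address of the public Handle.Net proxy, `https://hdl.handle.net/`. The Python sketch below assumes handles of the form `prefix/suffix`; the function name and the handle `12345/example-1` are hypothetical and are not real OTA identifiers.

```python
HANDLE_PROXY = "https://hdl.handle.net/"


def handle_to_url(handle):
    """Build a resolvable URL from a handle of the form 'prefix/suffix'."""
    handle = handle.strip()
    # Accept the optional 'hdl:' scheme sometimes written before a handle.
    if handle.lower().startswith("hdl:"):
        handle = handle[4:]
    prefix, separator, suffix = handle.partition("/")
    if not separator or not prefix or not suffix:
        raise ValueError(f"not a handle of the form prefix/suffix: {handle!r}")
    return HANDLE_PROXY + prefix + "/" + suffix


# '12345/example-1' is a made-up handle used only for illustration.
print(handle_to_url("12345/example-1"))
# https://hdl.handle.net/12345/example-1
```

Because the proxy looks up the current location each time the link is followed, the URL in a citation does not need to change if the data is moved.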
How should I cite resources?
See our citation policies.
How safe is my data, if I store it with the OTA?
We constantly review our data preservation policies to ensure that all data are preserved for the long term. As well as the live copy:
- all data in the repository have an on-site backup copy;
- all data in the repository have another off-site copy.
What licence should I pick for my data/tool?
We encourage using a free and open licence. A representative selection of free licences (including Creative Commons licences appropriate for datasets) is available during submission.
Where can I find more information about supported licences?
See our list of currently available licences. If you need a specific licence that is not on the list, do not hesitate to contact us. (Licences can be accompanied by various additional requirements.)
How do I get the most out of my searches?
The search engine is SOLR, which uses “OR” as the default operator on multiple terms in a search (for more on SOLR syntax, see the SOLR documentation).
If you are not satisfied with the results of your searches, you might wish to go beyond plain-text searches. You can restrict a search to certain fields, use negation, boost the score (emphasis) of some parts of the query, and more.
Examples of search queries
When you search on two terms such as “national” and “corpus”, SOLR inserts an implicit “OR” between them, whereas Google inserts an implicit “AND”:
- national corpus
- SOLR returns every item that contains “national” or “corpus” in any text field, whereas Google would return only results that contain both terms.
- dc.title:B?C && -dc.title:corpus
- Returns all items that have “B?C” in the title (“?” stands for any single character, e.g. BNC) and do not have “corpus” in the title
- dc.title:"National Corpus"
- Use double quotes (") for exact matches and multiword expressions
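The query patterns above can also be assembled programmatically. The following Python sketch only builds query strings to paste into the search box; it does not contact the repository. The helper names are hypothetical. It uses `AND`, which is interchangeable with `&&` in the standard SOLR query syntax.

```python
def term(field, pattern):
    """Single-term query on one field; '?' and '*' are kept as wildcards."""
    return f"{field}:{pattern}"


def phrase(field, text):
    """Exact-match query on one field: wraps the text in double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


def exclude(clause):
    """Negate a clause with a leading minus sign."""
    return f"-{clause}"


def all_of(*clauses):
    """Join clauses with an explicit AND, overriding the default OR."""
    return " AND ".join(clauses)


print(all_of(term("dc.title", "B?C"), exclude(term("dc.title", "corpus"))))
# dc.title:B?C AND -dc.title:corpus
print(phrase("dc.title", "National Corpus"))
# dc.title:"National Corpus"
print(all_of("national", "corpus"))
# national AND corpus
```

The last query shows how to get Google-like behaviour for two plain terms: joining them with an explicit `AND` returns only items that contain both.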