-
Corpus
Oxford Text Archive Core Collection
Date of publication:
2020
Description:
The CorCenCC corpus contains over 11 million words (circa 14.4m tokens) from written, spoken and electronic (online, digital texts) Welsh language sources, taken from a range of genres, language varieties (regional and ...
This item contains 1 file (49.41 KB).
Publicly Available
-
-
Corpus
Oxford Text Archive Core Collection
Date of publication:
2004
Description:
Mode of access: Online (OTA website). The rudimentary form of the Sheffield Corpus of Chinese contains a limited body of representative texts from the Medieval (MedC) and Modern Chinese (ModC) periods. They are of two text types: ...
This item contains 2 files (145.39 KB).
Publicly Available
-
-
Corpus
Oxford Text Archive Core Collection
Date of publication:
2004
Author(s):
Unknown author
Description:
The Lancaster Corpus of Mandarin Chinese (LCMC) is designed as a Chinese match for the FLOB and FROWN corpora for modern British and American English. The corpus is suitable for use in both monolingual research into modern ...
This item contains 2 files (6.34 MB).
Publicly Available
-
-
Collection (Sound)
Oxford Text Archive Core Collection
Date of publication:
2015
Description:
The resource is a speech corpus, with digital audio files, text transcripts, and files containing time stamps of the phoneme boundaries.
1813 .wav files containing spoken utterances.
...
This item contains 4 files (1.98 MB).
Publicly Available
-
-
Corpus
Oxford Text Archive Core Collection
Date of publication:
2002-2004
Description:
Mode of access: Online, by application to the OTA.
This corpus contains 979,831 words, made up of 1723 articles taken from three daily French newspapers:
Le Monde (576 articles ...
This item contains 2 files (3.35 MB).
Publicly Available
-
-
Corpus
Oxford Text Archive Core Collection
Date of publication:
2003
Author(s):
Unknown author
Description:
The collection consists of: thirty million words of monolingual written data (Gujarati, Tamil, Hindi, Punjabi: news website articles); 600,000 words of monolingual spoken data (Hindi, Urdu, Punjabi, Bengali, Gujarati: radio ...
This item contains 10 files (108.26 MB).
Publicly Available
-
-
Corpus
Oxford Text Archive Core Collection
Date of publication:
2004
Description:
The BAWE corpus contains 2761 pieces of proficient assessed student writing, ranging in length from about 500 words to about 5000 words. Holdings are fairly evenly distributed across four broad disciplinary ...
This item contains 2 files (107.9 MB).
Publicly Available
-
-
Corpus
Oxford Text Archive Core Collection
Date of publication:
2001-2009
Description:
The download now also includes an updated version of VOICE XML (VOICE 2.0 XML) and a part-of-speech tagged and lemmatized version of VOICE (VOICE POS XML). The primary language of the corpus is English as a lingua franca, ...
This item contains 2 files (48.06 MB).
-
-
Collection (Sound), Collection (Text)
Oxford Text Archive Core Collection
Date of publication:
2001
Description:
The four major objectives of the project were: i) to establish an electronic corpus of (a) conversations, from the British National Corpus (BNC) and (b) oral narratives, from Lancaster's Centre for North Western Regional ...
This item contains 2 files (2.03 MB).
-
-
Collection (Sound)
Oxford Text Archive Core Collection
Date of publication:
2014-2016
Description:
Publications based on the data include:
Ayafor, Miriam and Melanie Green (2017). Cameroon Pidgin English: A comprehensive grammar [London Oriental and African Language Library 20]. Amsterdam: John ...
This item contains 2 files (1.42 MB).
Publicly Available
-