Search

Selected Filters

Date range: 2000-present

Showing 1 to 10 out of 24 results

Corpus

Oxford Text Archive Core Collection

CorCenCC: Corpws Cenedlaethol Cymraeg Cyfoes – the National Corpus of Contemporary Welsh

Date of publication:
2020

Author(s):

Dawn Knight

Description:

The CorCenCC corpus contains over 11 million words (circa 14.4m tokens) from written, spoken and electronic (online, digital texts) Welsh language sources, taken from a range of genres, language varieties (regional and ...

This item contains 1 file (49.41 KB).

Publicly Available
Corpus

Oxford Text Archive Core Collection

Sheffield Corpus of Chinese

Date of publication:
2004

Author(s):

Hu, Xiaoling ; Williamson, Nigel ; McLaughlin, Jamie

Description:

Mode of access: Online, OTA website. The rudimentary form of the Sheffield Corpus of Chinese contains a limited body of representative texts from Medieval (MedC) and Modern Chinese (ModC) periods. They are of two text types: ...

This item contains 2 files (145.39 KB).

Publicly Available
Corpus

Oxford Text Archive Core Collection

The Lancaster Corpus of Mandarin Chinese

Date of publication:
2004

Author(s):

Unknown author

Description:

The Lancaster Corpus of Mandarin Chinese (LCMC) is designed as a Chinese match for the FLOB and FROWN corpora for modern British and American English. The corpus is suitable for use in both monolingual research into modern ...

This item contains 2 files (6.34 MB).

Publicly Available
Collection, Sound

Oxford Text Archive Core Collection

Arabic Speech Corpus

Date of publication:
2015

Author(s):

Nawar Halabi

Description:

The resource is a speech corpus, with digital audio files, text transcripts, and files containing time stamps of the phoneme boundaries. 1813 .wav files containing spoken utterances. ...

This item contains 4 files (1.98 MB).

Publicly Available
Corpus

Oxford Text Archive Core Collection

The Chambers-Rostand Corpus of Journalistic French

Date of publication:
2002-2004

Author(s):

Chambers, Angela ; Rostand, Séverine ; University of Limerick, Ireland

Description:

Mode of access: Online, by application to OTA. This corpus contains 979,831 words, made up of 1723 articles taken from three daily French newspapers: Le Monde (576 articles ...

This item contains 2 files (3.35 MB).

Publicly Available
Corpus

Oxford Text Archive Core Collection

The Emille Corpus (Beta Release Version)

Date of publication:
2003

Author(s):

Unknown author

Description:

The collection consists of: Thirty million words of monolingual written data (Gujarati, Tamil, Hindi, Punjabi - news website articles); 600,000 words of monolingual spoken data (Hindi, Urdu, Punjabi, Bengali, Gujarati - radio ...

This item contains 10 files (108.26 MB).

Publicly Available
Corpus

Oxford Text Archive Core Collection

British Academic Written English Corpus

Date of publication:
2004

Author(s):

Nesi, Hilary ; Gardner, Sheena ; Thompson, Paul ; Wickens, Paul

Description:

The BAWE corpus contains 2761 pieces of proficient assessed student writing, ranging in length from about 500 words to about 5000 words. Holdings are fairly evenly distributed across four broad disciplinary ...

This item contains 2 files (107.9 MB).

Publicly Available
Corpus

Oxford Text Archive Core Collection

VOICE: Vienna-Oxford International Corpus of English

Date of publication:
2001-2009

Author(s):

Barbara Seidlhofer ; Angelika Breiteneder ; Theresa Klimpfinger ; Stefan Majewski ; Ruth Osimk-Teasdale (POS-tagged versions) ; Marie-Luise Pitzl ; Michael Radeka (POS-tagged versions)

Description:

The download now also includes an updated version of VOICE XML (VOICE 2.0 XML) and a part-of-speech tagged and lemmatized version of VOICE (VOICE POS XML). The primary language of the corpus is English as a lingua franca, ...

This item contains 2 files (48.06 MB).

Collection, Sound, Text

Oxford Text Archive Core Collection

The Lancaster Speech, Writing and Thought Presentation Spoken Corpus

Date of publication:
2001

Author(s):

Short, Mick ; Semino, Elena ; McEnery, Tony ; Heywood, John ; McIntyre, Dan

Description:

The four major objectives of the project were: i) to establish an electronic corpus of (a) conversations, from the British National Corpus (BNC) and (b) oral narratives, from Lancaster's Centre for North Western Regional ...

This item contains 2 files (2.03 MB).

Collection, Sound

Oxford Text Archive Core Collection

A Spoken Corpus of Cameroon Pidgin English: pilot study

Date of publication:
2014-2016

Author(s):

Dr. Melanie Green, University of Sussex ; Dr. Miriam Ayafor, University of Yaoundé I ; Dr. Gabriel Ozon, University of Sheffield

Description:

Publications based on the data include: Ayafor, Miriam and Melanie Green (2017). Cameroon Pidgin English: A comprehensive grammar [London Oriental and African Language Library 20]. Amsterdam: John ...

This item contains 2 files (1.42 MB).

Publicly Available