University of Leicester

Speaker sample and metadata for BNClab-M subcorpus (a modified version of the BNClab subcorpus)

posted on 2024-05-13, 09:31 authored by Nicholas Smith, Cristiano Broccias, Cathleen Waters

This file lists the speakers in what we call the BNClab-M subcorpus, together with their characteristics. This subcorpus is a modified version of the BNClab subcorpus, which was created at Lancaster University (Brezina et al. 2018) as a sociolinguistically balanced and comparable subset of two demographically sampled conversation corpora: the British National Corpus of 1994 (Burnard 2007) and the British National Corpus of 2014 (Love et al. 2017).

The speaker sample in BNClab-M largely follows that in BNClab, but has been modified in an attempt to improve cross-time comparability. Modifications include i) restriction to speakers from England only, and to speakers designated as either working class or middle class; ii) reassignment of the social class of some speakers; and iii) addition of a few speakers from the parent BNCs that were not included in BNClab.

Speaker classifications are by year (1994 or 2014), gender (female or male), age (18-44 or 45-and-over), region (five regions of England), and social class (working class or middle class). 
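
As an illustration of this classification scheme, the following minimal Python sketch enumerates the cells implied by the five variables and counts speakers per cell. It rests on assumptions that are not part of the file description: the field names (`year`, `gender`, `age`, `region`, `social_class`) are hypothetical, and the region labels are placeholders because the five regions are not named above. The actual column headers and values in the file may differ.

```python
from collections import Counter
from itertools import product

# Category values taken from the description above. The region labels are
# placeholders: the five regions of England are not named in the description.
YEARS = ("1994", "2014")
GENDERS = ("female", "male")
AGE_BANDS = ("18-44", "45-and-over")
REGIONS = tuple(f"region_{i}" for i in range(1, 6))
SOCIAL_CLASSES = ("working class", "middle class")

# Hypothetical field names; the real column headers in the file may differ.
FIELDS = ("year", "gender", "age", "region", "social_class")


def design_cells():
    """Return every combination of the five classification variables."""
    return list(product(YEARS, GENDERS, AGE_BANDS, REGIONS, SOCIAL_CLASSES))


def speakers_per_cell(speakers):
    """Count speakers in each design cell, keeping empty cells as 0.

    `speakers` is an iterable of dicts keyed by the names in FIELDS.
    """
    counts = Counter(tuple(s[f] for f in FIELDS) for s in speakers)
    return {cell: counts.get(cell, 0) for cell in design_cells()}
```

Under these assumptions the scheme yields 2 × 2 × 2 × 5 × 2 = 80 cells, and any cell with a count of zero would indicate a combination with no speakers in the sample.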

We gratefully acknowledge permission to use content from the Demographic Spoken component of BNC1994, owned by the British National Corpus Consortium, and content from the Spoken BNC2014, owned by Cambridge University Press, in accordance with their respective licences.

A research article (Smith et al. forthcoming) describing the design of the BNClab-M sample, and the rationale for modifications to the original BNClab sample, is to be published in the journal Research in Corpus Linguistics in December 2024. 


Brezina, Vaclav, Dana Gablasova and Susan Reichelt. 2018. BNClab. Lancaster University. (10 April, 2024.)
Burnard, Lou (ed.). 2007. Reference Guide for the British National Corpus (XML Edition). Oxford University. (10 April, 2023.)
Love, Robbie, Claire Dembry, Andrew Hardie, Vaclav Brezina and Tony McEnery. 2017. The Spoken BNC2014: Designing and building a spoken corpus of everyday conversations. International Journal of Corpus Linguistics 22(3): 319–344.
Smith, Nicholas, Cristiano Broccias and Cathleen Waters. 2024. Addressing comparability and retrieval issues in conversation corpora: A case study on the spoken British National Corpora (1994 and 2014), using the past perfect. Research in Corpus Linguistics 12(2): 80–110.