Addressing comparability and retrieval issues in conversation corpora: A case study on the spoken British National Corpora (1994 and 2014), using the past perfect

Smith, Nicholas; Broccias, Cristiano; Waters, Cathleen

Version 2 2024-06-19, 10:58

Version 1 2024-04-25, 13:38

journal contribution

Posted on 2024-06-19, 10:58, authored by Nicholas Smith, Cristiano Broccias, Cathleen Waters

This paper addresses issues in comparison and analysis of conversation corpora. We focus on the demographically-sampled spoken portions of the British National Corpora (BNC), representing British English in 1994 and 2014, for the purposes of studying recent language change and sociolinguistic variation. Issues of comparability and representativeness of the two BNCs have been raised before (see Love 2020), with several measures taken to ensure backwards compatibility of the Spoken BNC2014 with its 1994 counterpart. However, we believe further considerations and solutions merit attention, relating to sampling, transcription, annotation, and corpus querying. The BNClab subcorpus (Brezina et al. 2018a), a sociolinguistic judgment sample derived from the parent BNCs, provides a very promising basis for analysis, although arguably its mixed geographical representativeness affects cross-time comparability. To address this, we make some proposals for modifying the BNClab subcorpus to improve comparability. Then, we use the modified sample to address issues in retrieval and quantification of grammatical constructions in the spoken BNCs, namely a) determining an appropriate frequency metric, b) retrieving a comprehensive but manageable set of examples from ‘messy’ spoken data, and c) handling transcription inaccuracies. Finally, we discuss the case study findings and wider methodological implications for users of these corpora.

History

Author affiliation

College of Social Sci Arts and Humanities/Education

Version

VoR (Version of Record)

Published in

Research in Corpus Linguistics

Publisher

Asociación Española de Lingüística de Corpus

eissn

2243-4712

Copyright date

2024

Available date

2024-06-19

Publisher DOI

http://doi.org/10.32714/ricl.12.02.05

Language

en

Publisher version

https://ricl.aelinco.es/index.php/ricl

Deposited by

Dr Nicholas Smith

Deposit date

2024-04-23

Rights Retention Statement

No

Licence

CC BY 4.0
