University of Leicester

How do I know this Law corpus is reliable and valid? Using a representativeness argument for corpus validation

journal contribution
posted on 2024-07-12, 14:21 authored by Jenny Kemp

Corpus findings are only useful if the corpus adequately represents the content and language of the target domain; yet few studies evaluate or report representativeness. This paper argues that corpus linguists should focus explicitly on the validation process. It introduces the innovative concept of a Representativeness Argument, which is an explicit statement of reliability and validity to enable defensible applications of a corpus for a specifically defined purpose and audience. Adapted from Toulmin's (1958/2003) argument model, its originality lies in its attention to both target domain and linguistic representativeness, and in the critical role played by expert judgements. To illustrate this approach, I present a representativeness argument for the 1.98-million-word ‘DSVC-IL’ corpus, which was compiled to investigate the discipline-specific vocabulary required for reading postgraduate International Law texts. The corpus is demonstrated to adequately represent target domain content, established by analysing modules and reading lists, and confirmed by experts. The language is shown to adequately reflect the domain through analysis of a 1026-flemma Single Word List, extracted using measures of frequency, keyness, range and evenness of distribution. List items are evenly distributed in randomly split corpus halves (rs=.98, p<.00). The list provides similar coverage of the DSVC-IL (26.37%) and other texts from the domain (23.87%). Moreover, Law experts confirmed that the majority of list items were Law words. Together, the evidence supports the usefulness of the corpus and list for their explicitly defined purpose.
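
The two quantitative checks mentioned in the abstract, word-list coverage and the correlation of list-item frequencies across randomly split corpus halves, can be illustrated with a minimal sketch. This is not the author's procedure: it assumes that "rs" denotes Spearman's rank correlation, that the corpus is split by whole documents, and that tokens have already been normalised to the same form as the list items. All function names are hypothetical.

```python
import random
from collections import Counter


def coverage(tokens, word_list):
    """Percentage of running tokens that belong to word_list."""
    if not tokens:
        return 0.0
    items = set(word_list)
    hits = sum(1 for t in tokens if t in items)
    return 100.0 * hits / len(tokens)


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on average ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def split_half_correlation(documents, word_list, seed=0):
    """Shuffle documents, split them into two halves, and correlate
    the frequency of each list item in one half with the other.

    Assumption: the split unit is the document (a list of tokens).
    """
    docs = list(documents)
    random.Random(seed).shuffle(docs)
    mid = len(docs) // 2
    half_a = Counter(t for d in docs[:mid] for t in d)
    half_b = Counter(t for d in docs[mid:] for t in d)
    freqs_a = [half_a[w] for w in word_list]
    freqs_b = [half_b[w] for w in word_list]
    return spearman(freqs_a, freqs_b)


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    toy_docs = [["treaty", "treaty", "state", "the", "court"]] * 4
    toy_list = ["treaty", "state", "court"]
    all_tokens = [t for d in toy_docs for t in d]
    print(coverage(all_tokens, toy_list))
    print(split_half_correlation(toy_docs, toy_list))
```

A coefficient close to 1 indicates that list items occur at similar relative frequencies in both halves, and comparing coverage on the corpus with coverage on held-out texts from the same domain gives a rough check that the list generalises. The sketch reports no p-value.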

History

Author affiliation

College of Social Sci Arts and Humanities Education

Version

  • AM (Accepted Manuscript)

Published in

Applied Corpus Linguistics

Pagination

100099

Publisher

Elsevier BV

issn

2666-7991

Copyright date

2024

Available date

2024-07-12

Language

en

Deposited by

Dr Jenny Kemp

Deposit date

2024-07-11
