University of Leicester

A quantitative analysis of global gazetteers: Patterns of coverage for common feature types

journal contribution
posted on 2017-05-12, 13:43 authored by Elise Acheson, Stefano De Sabbata, Ross Purves
Gazetteers are important tools used in a wide variety of workflows that depend on linking natural language text to geographical space. The spatial properties of these data sources, such as coverage, balance, and completeness, affect the performance of common tasks such as geoparsing and geocoding. However, little attention has focused on how these properties vary in global gazetteers, particularly across country boundaries and according to feature types. In this paper, we present a detailed investigation of the spatial properties of two open gazetteers with worldwide coverage: GeoNames, and the Getty Thesaurus of Geographic Names (TGN). Using point density maps, correlations, and linear regressions, we analyze the global spatial coverage of each data source for the full set of features and for top feature types: populated places, streams, mountains, and hills. Results show wide discrepancies in coverage between the two datasets, sharp changes in feature type coverage across country borders, and idiosyncratic patterns dominated by a few countries for the more sparsely covered natural features. As more and more systems rely on recognizing and grounding named places, these patterns can influence the analysis of growing amounts of online text content and reinforce or amplify existing inequalities.
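The point density and correlation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `density_grid` and `coverage_correlation`, the 1-degree cell size, and the choice to correlate only over cells where at least one source has a feature are all assumptions made for illustration. Cells on a regular latitude/longitude grid are also not equal-area, which a real analysis would need to account for.

```python
import numpy as np

def density_grid(lats, lons, cell_deg=1.0):
    """Count point features per cell on a regular global lat/lon grid.

    cell_deg is an assumed cell size in degrees (hypothetical default).
    Returns an array of shape (180 / cell_deg, 360 / cell_deg).
    """
    n_lat = int(round(180.0 / cell_deg))
    n_lon = int(round(360.0 / cell_deg))
    lat_edges = np.linspace(-90.0, 90.0, n_lat + 1)
    lon_edges = np.linspace(-180.0, 180.0, n_lon + 1)
    counts, _, _ = np.histogram2d(lats, lons, bins=[lat_edges, lon_edges])
    return counts

def coverage_correlation(grid_a, grid_b):
    """Pearson correlation between two density grids.

    Only cells where at least one grid is non-empty are compared
    (an assumption, so that empty ocean cells do not dominate).
    """
    mask = (grid_a > 0) | (grid_b > 0)
    return float(np.corrcoef(grid_a[mask], grid_b[mask])[0, 1])

# Toy coordinates standing in for gazetteer entries (not real data).
lats = np.array([0.5, 0.6, 45.2])
lons = np.array([0.5, 0.7, 10.1])
grid = density_grid(lats, lons, cell_deg=1.0)
```

Given two such grids, one per gazetteer, `coverage_correlation(grid_a, grid_b)` gives a single number summarising how similarly the two sources distribute their features across cells.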

History

Citation

Computers, Environment and Urban Systems, 2017, 64, pp. 309-320

Author affiliation

/Organisation/COLLEGE OF SCIENCE AND ENGINEERING/Department of Geography

Version

  • VoR (Version of Record)

Published in

Computers, Environment and Urban Systems

Publisher

Elsevier

issn

0198-9715

Acceptance date

2017-03-13

Copyright date

2017

Available date

2017-05-12

Publisher version

http://www.sciencedirect.com/science/article/pii/S0198971516302496

Language

en
