QA catalogue for analysing library data

British Library     last data update: 2020-11-26     number of records: 18,787,911

Thompson—Traill completeness

These scores implement the algorithm described in the following paper:

Kelly Thompson and Stacie Traill (2017). Leveraging Python to improve ebook metadata selection, ingest, and management. Code4Lib Journal, Issue 38, 2017-10-18. http://journal.code4lib.org/articles/12828

Their approach calculates the quality of ebook records coming from different data sources.

histogram

  • y axis: number of records
  • x axis: total score of a record
Each record gets a score based on a number of criteria. Each criterion contributes zero or more points, never a penalty. The final score is the sum of these criterion scores.
Record Element | MARC field/position/subfield | How counted
1. ISBN | 020 | 1 point for each occurrence of field
2. Authors | 100, 110, 111 | 1 point for each occurrence of field(s)
3. Alternative Titles | 246 | 1 point for each occurrence of field
4. Edition | 250 | 1 point for each occurrence of field
5. Contributors | 700, 710, 711, 720 | 1 point for each occurrence of field(s)
6. Series | 440, 490, 800, 810, 830 | 1 point for each occurrence of field(s)
7. Table of Contents and Abstract | 505, 520 | 2 points if both fields exist; 1 point if either field exists
8. Date (MARC 008) | 008/7-10 | 1 point if valid coded date exists
9. Date (MARC 26X) | 260$c or 264$c | 1 point if 4-digit date exists; 1 point if matches 008 date
10. LC/NLM Classification | 050, 060, 090 | 1 point if any field exists
11. Subject Headings: Library of Congress | 600, 610, 611, 630, 650, 651 with second indicator 0 | 1 point for each field, up to 10 total points
12. Subject Headings: MeSH | 600, 610, 611, 630, 650, 651 with second indicator 2 | 1 point for each field, up to 10 total points
13. Subject Headings: FAST | 600, 610, 611, 630, 650, 651 with second indicator 7 and $2 fast | 1 point for each field, up to 10 total points
14. Subject Headings: GND (not part of the original algorithm) | 600, 610, 611, 630, 650, 651 with second indicator 7 and $2 gnd | 1 point for each field, up to 10 total points
15. Subject Headings: Other | 600, 610, 611, 630, 650, 651, 653 if above criteria are not met | 1 point for each field, up to 5 total points
16. Description | 008/23=o and 300$a “online resource” | 2 points if both elements exist; 1 point if either exists
17. Language of Resource | 008/35-37 | 1 point if likely language code exists
18. Country of Publication Code | 008/15-17 | 1 point if likely country code exists
19. Language of Cataloging | 040$b | 1 point if either no language is specified, or if English is specified
20. Descriptive cataloging standard | 040$e | 1 point if value is “rda”
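The counting rules in the table can be illustrated with a small Python sketch. It is not the catalogue's own implementation and covers only a subset of the criteria (1, 2, 7, 8, 9, 10, 11 and 20). The record layout (a list of dicts with `tag`, `ind2`, `subfields` and `data` keys), the function names and the sample record are all assumptions made for illustration, and "valid coded date" is simplified here to "four digits".

```python
import re

SUBJECT_TAGS = {"600", "610", "611", "630", "650", "651"}


def count_tags(fields, tags):
    """Number of fields whose tag is in `tags`."""
    return sum(1 for f in fields if f["tag"] in tags)


def subfield_values(fields, tag, code):
    """All values of subfield `code` in fields carrying the given tag."""
    return [v for f in fields if f["tag"] == tag
            for c, v in f.get("subfields", []) if c == code]


def score_record(fields):
    """Per-criterion points for a subset of the criteria (hypothetical layout)."""
    scores = {}
    # Criteria 1 and 2: one point per occurrence
    scores["isbn"] = count_tags(fields, {"020"})
    scores["authors"] = count_tags(fields, {"100", "110", "111"})
    # Criterion 7: 2 points if both 505 and 520 exist, 1 if either exists
    scores["toc_abstract"] = (int(count_tags(fields, {"505"}) > 0)
                              + int(count_tags(fields, {"520"}) > 0))
    # Criterion 8: 008/7-10, simplified to "four digits" (assumption)
    f008 = next((f.get("data", "") for f in fields if f["tag"] == "008"), "")
    date008 = f008[7:11]
    has008 = bool(re.fullmatch(r"\d{4}", date008))
    scores["date_008"] = int(has008)
    # Criterion 9: 1 point for a 4-digit date in 260$c/264$c, 1 more if it matches 008
    dates = subfield_values(fields, "260", "c") + subfield_values(fields, "264", "c")
    years = [m.group(0) for d in dates for m in [re.search(r"\d{4}", d)] if m]
    scores["date_26x"] = int(bool(years)) + int(has008 and date008 in years)
    # Criterion 10: 1 point if any classification field exists
    scores["lc_nlm"] = int(count_tags(fields, {"050", "060", "090"}) > 0)
    # Criterion 11: one point per LCSH field (second indicator 0), capped at 10
    lcsh = sum(1 for f in fields
               if f["tag"] in SUBJECT_TAGS and f.get("ind2") == "0")
    scores["lcsh"] = min(lcsh, 10)
    # Criterion 20: 1 point if 040$e is "rda"
    scores["rda"] = int("rda" in subfield_values(fields, "040", "e"))
    scores["total"] = sum(scores.values())
    return scores


# Invented sample record; the values are placeholders, not real data.
record = [
    {"tag": "008", "data": "170101s2017    xxu"},
    {"tag": "020", "subfields": [("a", "9780000000001")]},
    {"tag": "020", "subfields": [("a", "9780000000002")]},
    {"tag": "040", "subfields": [("b", "eng"), ("e", "rda")]},
    {"tag": "100", "subfields": [("a", "Example, Author")]},
    {"tag": "264", "ind2": "1", "subfields": [("c", "[2017]")]},
    {"tag": "505", "subfields": [("a", "Chapter one")]},
    {"tag": "650", "ind2": "0", "subfields": [("a", "Topic A")]},
    {"tag": "650", "ind2": "0", "subfields": [("a", "Topic B")]},
    {"tag": "650", "ind2": "7", "subfields": [("a", "Topic C"), ("2", "fast")]},
]

result = score_record(record)
print(result)
```

The capped criteria use `min(count, cap)`, so a record with many subject headings cannot dominate the total. The remaining criteria follow the same three patterns: count occurrences, test presence, or count with a cap.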

components

The histograms of the individual components:

1. ISBN
2. Authors
3. Alternative Titles
4. Edition
5. Contributors
6. Series
7. Table of Contents and Abstract
8. Date 008
9. Date 26X
10. LC/NLM Classification
11. Subject Headings: Library of Congress
12. Subject Headings: MeSH
13. Subject Headings: FAST
14. Subject Headings: GND
15. Subject Headings: Other
16. Online
17. Language of Resource
18. Country of Publication
19. Language of Cataloging
20. Descriptive cataloging standard is RDA
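Each histogram counts how many records received a given number of points, either for the total score or for one component. A minimal Python sketch of that tally follows. The function name, the component keys and the three sample score dicts are invented for illustration and do not come from the catalogue.

```python
from collections import Counter


def component_histograms(records_scores):
    """Build one histogram (points -> number of records) per component.

    `records_scores` is a list of dicts, one per record, mapping a
    component name to the points that record received (assumed layout).
    """
    counters = {}
    for rec in records_scores:
        for name, points in rec.items():
            counters.setdefault(name, Counter())[points] += 1
    # Sort by points so each histogram reads left to right along the x axis
    return {name: dict(sorted(c.items())) for name, c in counters.items()}


# Invented per-record scores for three records
sample = [
    {"isbn": 1, "total": 5},
    {"isbn": 1, "total": 7},
    {"isbn": 0, "total": 5},
]

histograms = component_histograms(sample)
for name, hist in histograms.items():
    print(name, hist)
```

In each resulting histogram the keys correspond to the x axis (score) and the values to the y axis (number of records).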