British Library. Last data update: 2020-11-26. Number of records: 18,787,911
These scores implement the algorithm described in the following paper:
Kelly Thompson and Stacie Traill (2017). Leveraging Python to improve ebook metadata selection, ingest, and management. Code4Lib Journal, Issue 38, 2017-10-18. http://journal.code4lib.org/articles/12828
Their approach calculates the quality of ebook records coming from different data sources.
Record Element | MARC field/position/subfield | How counted |
---|---|---|
1. ISBN | 020 | 1 point for each occurrence of field |
2. Authors | 100, 110, 111 | 1 point for each occurrence of field(s) |
3. Alternative Titles | 246 | 1 point for each occurrence of field |
4. Edition | 250 | 1 point for each occurrence of field |
5. Contributors | 700, 710, 711, 720 | 1 point for each occurrence of field(s) |
6. Series | 440, 490, 800, 810, 830 | 1 point for each occurrence of field(s) |
7. Table of Contents and Abstract | 505, 520 | 2 points if both fields exist; 1 point if either field exists |
8. Date (MARC 008) | 008/7-10 | 1 point if valid coded date exists |
9. Date (MARC 26X) | 260$c or 264$c | 1 point if 4-digit date exists; 1 point if matches 008 date. |
10. LC/NLM Classification | 050, 060, 090 | 1 point if any field exists |
11. Subject Headings: Library of Congress | 600, 610, 611, 630, 650, 651 second indicator 0 | 1 point for each field up to 10 total points |
12. Subject Headings: MeSH | 600, 610, 611, 630, 650, 651 second indicator 2 | 1 point for each field up to 10 total points |
13. Subject Headings: FAST | 600, 610, 611, 630, 650, 651 second indicator 7, $2 fast | 1 point for each field up to 10 total points |
14. Subject Headings: GND (not part of the original algorithm) | 600, 610, 611, 630, 650, 651 second indicator 7, $2 gnd | 1 point for each field up to 10 total points |
15. Subject Headings: Other | 600, 610, 611, 630, 650, 651, 653 if above criteria are not met | 1 point for each field up to 5 total points |
16. Description | 008/23=o and 300$a “online resource” | 2 points if both elements exist; 1 point if either exists |
17. Language of Resource | 008/35-37 | 1 point if likely language code exists |
18. Country of Publication Code | 008/15-17 | 1 point if likely country code exists |
19. Language of Cataloging | 040$b | 1 point if either no language is specified, or if English is specified |
20. Descriptive cataloging standard | 040$e | 1 point if value is “rda” |
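
The counting rules in rows 1 to 15 of the table can be sketched in Python as follows. This is a minimal illustration, not the code used to produce these scores. The record layout (a dict with an `"008"` string and a list of field dicts holding `tag`, `ind2`, and `subfields`) and the names `count_fields`, `subject_scheme`, and `score_record` are hypothetical. The sketch also assumes that a "valid coded date" means four digits in 008/7-10 and that GND headings carry `$2 gnd`. Rows 16 to 20 are left out for brevity.

```python
import re
from collections import Counter

SUBJECT_TAGS = {"600", "610", "611", "630", "650", "651"}


def count_fields(fields, tags):
    """Count the fields whose tag is in `tags` (1 point per occurrence)."""
    return sum(1 for field in fields if field["tag"] in tags)


def subject_scheme(field):
    """Return the subject-heading bucket of a field, or None if not a subject."""
    tag = field["tag"]
    if tag == "653":
        return "other"
    if tag not in SUBJECT_TAGS:
        return None
    ind2 = field.get("ind2", " ")
    source = (field.get("subfields", {}).get("2") or [""])[0].strip().lower()
    if ind2 == "0":
        return "lcsh"
    if ind2 == "2":
        return "mesh"
    # Assumption: FAST and GND are told apart by the source code in $2.
    if ind2 == "7" and source in ("fast", "gnd"):
        return source
    return "other"


def score_record(record):
    """Score one record on rows 1-15 of the table; returns component -> points."""
    fields = record["fields"]
    present = {field["tag"] for field in fields}
    scores = {
        "isbn": count_fields(fields, {"020"}),
        "authors": count_fields(fields, {"100", "110", "111"}),
        "alternative_titles": count_fields(fields, {"246"}),
        "edition": count_fields(fields, {"250"}),
        "contributors": count_fields(fields, {"700", "710", "711", "720"}),
        "series": count_fields(fields, {"440", "490", "800", "810", "830"}),
        # 2 points if both 505 and 520 exist, 1 point if only one does.
        "toc_and_abstract": int("505" in present) + int("520" in present),
        "classification": int(bool(present & {"050", "060", "090"})),
    }

    # Assumption: a valid coded date is four digits in 008/7-10.
    date_008 = record.get("008", "")[7:11]
    scores["date_008"] = int(re.fullmatch(r"\d{4}", date_008) is not None)

    # 1 point for a 4-digit year in 260$c or 264$c, 1 more if it matches 008.
    dates_26x = [
        year
        for field in fields
        if field["tag"] in {"260", "264"}
        for value in field.get("subfields", {}).get("c", [])
        for year in re.findall(r"\d{4}", value)
    ]
    matches_008 = bool(scores["date_008"]) and date_008 in dates_26x
    scores["date_26x"] = int(bool(dates_26x)) + int(matches_008)

    # Subject headings: 1 point per field, capped per scheme.
    schemes = Counter(subject_scheme(field) for field in fields)
    for scheme in ("lcsh", "mesh", "fast", "gnd"):
        scores[scheme] = min(schemes[scheme], 10)
    scores["other_subjects"] = min(schemes["other"], 5)
    return scores
```

Summing the values of the returned dict, for example `sum(score_record(record).values())`, gives a partial total over the components covered here.
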
The histograms of the individual components:
2. ISBN
3. Authors
4. Alternative Titles
5. Edition
6. Contributors
7. Series
8. Table of Contents and Abstract
9. Date 008
10. Date 26X
11. LC/NLM Classification
12. Subject Headings: Library of Congress
13. Subject Headings: MeSH
14. Subject Headings: FAST
15. Subject Headings: GND
16. Subject Headings: Other
17. Online
18. Language of Resource
19. Country of Publication
20. Language of Cataloging
21. Descriptive cataloging standard is RDA
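
As an illustration of how such per-component histograms could be tallied, here is a small Python sketch. It assumes each record's scores are available as a dict mapping a component name to its points, which is a hypothetical layout; the function name `component_histograms` is likewise made up for this example.

```python
from collections import Counter


def component_histograms(score_rows):
    """Map each component name to a Counter of {score value: record count}."""
    histograms = {}
    for row in score_rows:
        for component, value in row.items():
            histograms.setdefault(component, Counter())[value] += 1
    return histograms
```

Each resulting `Counter` holds, for one component, how many records received each score value, which is the data a histogram plot would display.
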