Exploring library catalogues as data – a lesson for Programming Historian

This repository supports the writing of a lesson provisionally titled "Exploring library catalogues as data". The work is part of Programming Historian's ENABLAR programme:

ENABling Library and Archive participation in digital Research co-learning communities.

The repository will contain not only the text of the lesson and its supplementary materials (images, tables, etc.) but also some data and scripts.

Planned topics of the tutorial

Methods/tools:

MarcEdit (Argula)
describing some important MARC data elements (Péter)
programming MARC records with Python [PyMarc] (and showing alternatives) (Péter)
small/local LLMs? (Michele)
Gephi for visualisation (Argula)
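The lesson plans to use PyMarc. As one possible alternative, here is a minimal sketch that reads MARCXML with Python's standard-library `xml.etree.ElementTree` instead. It assumes the records are serialised as MARCXML in the `http://www.loc.gov/MARC21/slim` namespace; the helper names `read_records` and `subfield_values` and the sample record are hypothetical, made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of the MARC 21 XML schema (MARCXML) published by the Library of Congress.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def read_records(xml_text):
    """Parse MARCXML text and return a list of <record> elements.

    Accepts either a <collection> wrapper or a single <record> root.
    """
    root = ET.fromstring(xml_text)
    if root.tag == f"{{{MARC_NS['marc']}}}record":
        return [root]
    return root.findall("marc:record", MARC_NS)


def subfield_values(record, tag, code):
    """Return the text of every subfield `code` in the datafields tagged `tag`."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return [sf.text for sf in record.findall(path, MARC_NS) if sf.text]


# A hypothetical, made-up record used only for illustration.
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="100" ind1="1" ind2=" ">
      <subfield code="a">Doe, Jane,</subfield>
    </datafield>
    <datafield tag="260" ind1=" " ind2=" ">
      <subfield code="a">London :</subfield>
      <subfield code="c">1790.</subfield>
    </datafield>
  </record>
</collection>"""

records = read_records(SAMPLE)
print(subfield_values(records[0], "260", "c"))  # ['1790.']
```

With PyMarc the same task would work on binary MARC as well as MARCXML; the standard-library version only shows that the record structure (tag, indicators, subfield codes) can be navigated without extra dependencies.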

Data analyses:

extraction
- place and personal name extraction and normalisation (with roles) (Péter)
  - enriching subject indexing/information (Argula)
  - adding information about the named entities in records (Argula)
  - matching smaller datasets against external data such as FAST, VIAF, etc. (Argula)
  - create a map (Péter)
- date extraction and normalisation (Péter)
  - create a timeline (Péter)
  - Consider using bibliographic data to visualise the history of early book editions from the time of their first printing (geomapping and timelines). (Arnoud)
- subjects
  - analysing subject coverage in a collection (Argula)
how to work across two datasets computationally (Argula)
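For the date and place normalisation steps that would feed the planned timeline and map, a minimal sketch might look like the following. It assumes the raw strings come from the publication statement (260 or 264 $a for place, $c for date); the function names are hypothetical, and real catalogue data will contain forms (Roman numerals, date ranges, abbreviations such as "Washington, D.C.") that these simple rules do not handle well.

```python
import re

# A year between 1000 and 2099 that is not part of a longer run of digits.
YEAR_PATTERN = re.compile(r"(?<!\d)(1\d{3}|20\d{2})(?!\d)")

# Conventional statements meaning "place unknown", compared after cleaning.
UNKNOWN_PLACES = {"s.l", "sine loco", "place of publication not identified"}


def normalise_year(raw):
    """Return the first plausible four-digit year in a 260/264 $c string, or None."""
    match = YEAR_PATTERN.search(raw)
    return int(match.group(1)) if match else None


def normalise_place(raw):
    """Return a cleaned 260/264 $a place name, or None if the place is unknown."""
    # Drop square brackets, which mark information supplied by the cataloguer.
    cleaned = raw.replace("[", "").replace("]", "").strip()
    # Drop trailing ISBD punctuation (" :", " ;", ",", "/") and full stops.
    # Note: this also removes the final full stop of abbreviations like "D.C.".
    cleaned = re.sub(r"[\s:;,/.]+$", "", cleaned)
    if not cleaned or cleaned.lower() in UNKNOWN_PLACES:
        return None
    return cleaned


for date in ["1790.", "[1790?]", "c1890", "s.a."]:
    print(date, "->", normalise_year(date))
for place in ["London :", "[Paris] ;", "[S.l.] :"]:
    print(place, "->", normalise_place(place))
```

Returning `None` rather than guessing keeps uncertain records visible, so you can count and inspect them before plotting.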

Objectives, principles, general questions:

Assist catalogers/metadata librarians in their day-to-day work and specialized projects (Doreen)
How to help book historians in answering their questions? (Péter)
Copy cataloging versus original cataloging depending on the topic of the project (Halie)
Awareness of local cataloging practices that may or may not affect the analysis or the computational methods (Doreen)

Datasets:

a smaller library catalogue (e.g. Estonian nat. bibl.), or a part of a larger one (Belgian, German, Czech etc.) (Péter)
https://github.com/pkiraly/qa-catalogue#datasources
https://about.muse.jhu.edu/muse/open-access-marc (Doreen)
Could you pull records based on a specific MARC category or subject heading, and then build the dataset around that? (Halie)
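Selecting records by subject heading, and counting subject coverage, could be sketched as below. It assumes MARCXML input and looks only at topical subject headings in field 650 $a; the helper names and the two sample records are hypothetical, made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}
SUBJECT_PATH = "marc:datafield[@tag='650']/marc:subfield[@code='a']"


def subjects(record):
    """Topical subject headings (650 $a) of one record, without the final full stop."""
    return [sf.text.strip().rstrip(".")
            for sf in record.findall(SUBJECT_PATH, MARC_NS) if sf.text]


def select_by_subject(records, term):
    """Keep records with at least one 650 $a heading containing `term` (case-insensitive)."""
    term = term.lower()
    return [r for r in records if any(term in s.lower() for s in subjects(r))]


def subject_counts(records):
    """Count how often each heading occurs across a set of records."""
    return Counter(s for r in records for s in subjects(r))


# Two hypothetical records, made up for illustration.
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="650" ind1=" " ind2="0"><subfield code="a">Botany.</subfield></datafield>
    <datafield tag="650" ind1=" " ind2="0"><subfield code="a">Medicine.</subfield></datafield>
  </record>
  <record>
    <datafield tag="650" ind1=" " ind2="0"><subfield code="a">Astronomy.</subfield></datafield>
  </record>
</collection>"""

records = ET.fromstring(SAMPLE).findall("marc:record", MARC_NS)
print(len(select_by_subject(records, "botany")))  # 1
print(subject_counts(records).most_common())
```

Substring matching is deliberately loose; for a real selection you would probably match against controlled headings (and their subdivisions) rather than free text.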

Open questions:

Question for Péter (and anyone else who is interested in contributing): is the goal of the lesson to show researchers how to use MARC records to answer their research questions (i.e. do we need to explain how to navigate a set of MARC records?), starting from sample research questions and then finding datasets and methods to work with them? (I am reminded of this link Péter showed me: https://bibliodata.substack.com/p/an-outline-of-an-imagined-training) Or is it to select a small set of MARC records and build a model, or something similar, that could be used for e.g. lesson 2 or 6 or linked sticky notes? (Doreen)
The focus is on training and fine-tuning a small LLM to make it an expert in dealing with bibliographic data. (Arnoud)

Directory structure

data: the input files containing unmodified library records
data_output: output files of the analyses
fig_output: output images of the analyses
scripts: the scripts used in the data analysis
figures: the figures displayed by the lesson itself

(This structure follows the suggestion of Library Carpentry's Introduction to R lesson.)
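If you want to reproduce this layout locally, a small sketch using the standard-library `pathlib` module could create the folders. The function name `create_layout` is hypothetical, and the folder names are simply those listed in this README.

```python
from pathlib import Path

# Folder names as listed in this README's directory structure.
FOLDERS = ["data", "data_output", "fig_output", "scripts", "figures"]


def create_layout(base="."):
    """Create the lesson folders under `base`; existing folders are left untouched."""
    created = []
    for name in FOLDERS:
        folder = Path(base) / name
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Calling `create_layout()` from the repository root creates only the folders that are missing, so it is safe to run more than once.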

Some Important Reminders:

Tutorials should not exceed 8,000 words (including code).
Keep your tone formal but accessible.
Talk to your reader in the second person (you).
Adopt a widely-used version of English (British, Canadian, Indian, South African etc).
The piece of writing is a "tutorial" or a "lesson" and not an "article".
Adopt open source principles
Write for a global audience
Write sustainably

The full guide: Writing and Formatting a New Lesson.
