Digital humanities tools for manipulating the text of Ἑλλάδος Περιήγησις
All the programs use Python. Lots of digital humanities folks run into
trouble with environments and dependencies, so I've made sure
everything works nicely with uv. Download uv from here:
https://github.com/astral-sh/uv (it's one command, so it's quick, and
it won't disrupt any other installation you might have).
The first time you run a uv command it will output something like this:
Using CPython 3.11.6 interpreter at: /Users/gregb/anaconda3/bin/python3.11
Creating virtual environment at: .venv
Then import the text:
uv run pausanias_importer.py description_of_greece.txt
This should respond with:
Successfully imported 3170 passages into PostgreSQL
I didn't have enough token allocation to run the whole corpus in one go, so
I broke it up into smaller chunks. Schedule cronscript.sh (lower the
--stop parameter if you have less allocation than I do, or raise it if
you don't mind spending money).
Some words that are really proper nouns might slip past the automated
extractor. To make sure they don't influence the TF‑IDF model, you can add
them to a manual_stopwords table in the database:
psql "$PAUSANIAS_DATABASE_URL" -c "INSERT INTO manual_stopwords(word) VALUES ('Athens') ON CONFLICT DO NOTHING;"
When find_predictors.py runs it combines these entries with the proper
noun list and uses the union as stop words for the mythicness model.
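
As a rough illustration of that union (a minimal sketch, not the code in find_predictors.py; the function name and the sample words are made up for the example):

```python
def build_stop_words(proper_nouns, manual_stopwords):
    """Return the union of both word lists as a sorted list."""
    # A set union drops words that appear in both sources.
    return sorted(set(proper_nouns) | set(manual_stopwords))


# Stand-ins for the automated proper noun list and the manual_stopwords rows.
proper_nouns = ["Sparta", "Athens", "Theseus"]
manual_stopwords = ["Athens", "Olympia"]

stop_words = build_stop_words(proper_nouns, manual_stopwords)
print(stop_words)  # ['Athens', 'Olympia', 'Sparta', 'Theseus']
```

A list like this is the sort of thing a TF-IDF vectorizer accepts as its stop word argument, so a word added by hand is excluded in the same way as one found by the extractor.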
For skepticism-specific exclusions, use manual_skepticism_stopwords
instead. These entries are applied only to the passage- and sentence-level
skepticism models, so they will not affect mythicness:
psql "$PAUSANIAS_DATABASE_URL" -c "INSERT INTO manual_skepticism_stopwords(word) VALUES ('δοκεῖν') ON CONFLICT DO NOTHING;"
The live PostgreSQL database is on raksasa, so local checking usually needs an
SSH tunnel:
ssh -N -L 6543:/var/run/postgresql/.s.PGSQL.5432 raksasa
Then run the spelling checker against the tunnel:
uv run check_proper_noun_spellings.py \
--database-url "host=127.0.0.1 port=6543 dbname=pausanias user=gregb"
The checker stores reviewed spelling policies in
proper_noun_spelling_policies and scan results in
proper_noun_spelling_findings. Use --apply to replace deprecated variants in
completed translation text, sentence text, and passage summaries.
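
To give a feel for what replacing deprecated variants involves, here is a minimal sketch. It is an assumption about the general approach, not the checker's actual implementation: apply_spelling_policy and the sample policy are hypothetical, and the spellings are placeholders, not the project's chosen forms.

```python
import re


def apply_spelling_policy(text, replacements):
    """Replace each deprecated variant in text with its preferred spelling."""
    if not replacements:
        return text
    # Try longer variants first so a compound name wins over its base name.
    variants = sorted(replacements, key=len, reverse=True)
    alternation = "|".join(re.escape(v) for v in variants)
    # The lookarounds keep the match from firing inside a longer word.
    pattern = re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)")
    return pattern.sub(lambda m: replacements[m.group(0)], text)


# Hypothetical policy: deprecated variant -> selected spelling.
policy = {"Heracles": "Herakles", "Mycenae": "Mykenai"}
print(apply_spelling_policy("Heracles came to Mycenae.", policy))
# Herakles came to Mykenai.
```

Matching on word boundaries matters here: a plain string replace would also rewrite the variant where it occurs inside a longer name.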
To import the review report as policies, apply corrections, and keep the proper noun registry in step with the selected spellings:
uv run check_proper_noun_spellings.py \
--database-url "host=127.0.0.1 port=6543 dbname=pausanias user=gregb" \
--import-review-tsv tmp/proper_noun_spelling_review.tsv \
--apply \
--sync-registry \
--sync-derived-name-spellings
The review importer chooses the dominant completed-prose spelling for each entity, with a small set of explicit overrides where a base name and compound name would otherwise fight each other.
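
The selection rule can be sketched roughly like this. It is a simplified illustration under my own assumptions: choose_spellings, the entity keys, and the sample spellings are hypothetical and are not taken from check_proper_noun_spellings.py.

```python
from collections import Counter


def choose_spellings(observations, overrides=None):
    """Pick the most frequent spelling per entity, then apply overrides.

    observations: iterable of (entity, spelling) pairs seen in completed prose.
    overrides: optional dict of entity -> spelling that always wins.
    """
    counts = {}
    for entity, spelling in observations:
        counts.setdefault(entity, Counter())[spelling] += 1
    # most_common(1) returns the dominant spelling for each entity.
    chosen = {entity: c.most_common(1)[0][0] for entity, c in counts.items()}
    # Explicit overrides replace whatever the counts suggested.
    chosen.update(overrides or {})
    return chosen


observations = [
    ("asklepios", "Asclepius"),
    ("asklepios", "Asklepios"),
    ("asklepios", "Asklepios"),
    ("asklepieion", "Asclepieion"),
]
overrides = {"asklepieion": "Asklepieion"}
result = choose_spellings(observations, overrides)
print(result)
# {'asklepios': 'Asklepios', 'asklepieion': 'Asklepieion'}
```

In this example the base name settles on one transliteration by count while the compound name would have gone the other way, which is the kind of conflict an override resolves.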