solresol/pausanias

pausanias

Digital humanities tools for manipulating the text of Ἑλλάδος Περιήγησις (Pausanias's Description of Greece)

Tooling

All the programs are written in Python. Lots of digital humanities folks run into trouble with environments and dependencies, so I've made sure everything works nicely with uv. Download uv from here: https://github.com/astral-sh/uv (installing it is one command, so it's quick, and it won't disrupt any other Python installation you might have).

The first time you run a uv command it will output something like this:

Using CPython 3.11.6 interpreter at: /Users/gregb/anaconda3/bin/python3.11
Creating virtual environment at: .venv

Data Loading

uv run pausanias_importer.py description_of_greece.txt

This should respond with

Successfully imported 3170 passages into PostgreSQL

Daily

I didn't have a large enough token allocation to run the whole corpus in one go, so I broke it up into smaller chunks. Schedule cronscript.sh to run daily (for example with cron), and lower its --stop parameter if you have less allocation than me, or raise it if you don't mind spending money.
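The README doesn't say how cronscript.sh decides what to process on each run. Purely as an illustration, the sketch below assumes that --stop acts as a per-run cap on the number of passages and that passages already handled are skipped. `next_batch` and the passage IDs are hypothetical and not taken from the repository.

```python
# Hypothetical sketch: how a daily run *might* choose its next chunk.
# Assumes --stop caps the number of passages attempted per run and that
# passages already processed are skipped. Not taken from cronscript.sh.


def next_batch(passage_ids, done, stop):
    """Return up to `stop` not-yet-processed IDs, in corpus order."""
    if stop < 0:
        raise ValueError("stop must be non-negative")
    batch = []
    for pid in passage_ids:
        if len(batch) >= stop:
            break
        if pid not in done:
            batch.append(pid)
    return batch


if __name__ == "__main__":
    ids = ["1.1.1", "1.1.2", "1.1.3", "1.1.4", "1.1.5"]
    print(next_batch(ids, done={"1.1.1", "1.1.3"}, stop=2))  # ['1.1.2', '1.1.4']
```

Under this assumption, a smaller --stop means each day's run costs fewer tokens, and the whole corpus simply takes more days to get through.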

Manual stop words

Some words that are really proper nouns might slip past the automated extractor. To make sure they don't influence the TF‑IDF model, you can add them to a manual_stopwords table in the database:

psql "$PAUSANIAS_DATABASE_URL" -c "INSERT INTO manual_stopwords(word) VALUES ('Athens') ON CONFLICT DO NOTHING;"

When find_predictors.py runs, it combines these entries with the proper noun list and uses the union as stop words for the mythicness model.
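find_predictors.py isn't reproduced here, so this is only a rough picture of what "using the union as stop words" means for a TF-IDF model. The sketch assumes \w+ tokenization, caseless matching, and the plain tf × log(N / df) weighting; `build_stopwords`, `tokenize`, `fold` and `tfidf` are hypothetical names, and the real script may tokenize and weight differently.

```python
# Minimal TF-IDF sketch with a combined stop-word set. Assumptions (not
# from find_predictors.py): \w+ tokenization, caseless matching, and the
# plain tf * log(N / df) weighting.
import math
import re
import unicodedata
from collections import Counter


def fold(word):
    # Caseless, normalization-insensitive form: decompose, casefold, recompose.
    return unicodedata.normalize("NFC", unicodedata.normalize("NFD", word).casefold())


def build_stopwords(manual_stopwords, proper_nouns):
    # The union of the manual_stopwords rows and the proper noun list.
    return {fold(w) for w in manual_stopwords} | {fold(w) for w in proper_nouns}


def tokenize(text):
    # Recompose first so accented Greek letters stay inside one \w+ token.
    text = unicodedata.normalize("NFC", text)
    return [fold(t) for t in re.findall(r"\w+", text)]


def tfidf(docs, stopwords):
    """Return one {term: weight} dict per document, skipping stop words."""
    token_lists = [[t for t in tokenize(d) if t not in stopwords] for d in docs]
    n_docs = len(token_lists)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    weights = []
    for tokens in token_lists:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    stop = build_stopwords(["Athens"], ["Athena"])
    for row in tfidf(["Athens honours Athena", "The temple at Athens"], stop):
        print(row)
```

If find_predictors.py happens to use scikit-learn's TfidfVectorizer (an assumption), note that it lowercases tokens by default, so an entry such as 'Athens' only takes effect if it is lowercased before being passed as stop_words.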

For skepticism-specific exclusions, use manual_skepticism_stopwords instead. These entries are applied only to the passage- and sentence-level skepticism models, so they will not affect mythicness:

psql "$PAUSANIAS_DATABASE_URL" -c "INSERT INTO manual_skepticism_stopwords(word) VALUES ('δοκεῖν') ON CONFLICT DO NOTHING;"
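The README only guarantees that manual_skepticism_stopwords never reaches the mythicness model. The sketch below keeps the two lists apart under a further assumption, labelled in the code, that every model also drops the proper noun list while the skepticism models ignore manual_stopwords. It also normalizes words before comparing them: polytonic Greek such as δοκεῖν can be stored with a precomposed ῖ or with a combining circumflex, and the two forms only compare equal after Unicode normalization. `stopwords_for` and the model names are hypothetical.

```python
# Hypothetical sketch: one stop-word set per model family.
# Assumption: every model also drops the proper noun list; the README only
# promises that manual_skepticism_stopwords never reaches mythicness.
import unicodedata

SKEPTICISM_MODELS = {"passage_skepticism", "sentence_skepticism"}


def normalize(word):
    # Canonical caseless form: NFD, casefold, then NFC, so a δοκεῖν typed with
    # a combining circumflex matches one typed with a precomposed ῖ.
    return unicodedata.normalize("NFC", unicodedata.normalize("NFD", word).casefold())


def stopwords_for(model, proper_nouns, manual, manual_skepticism):
    base = {normalize(w) for w in proper_nouns}
    if model == "mythicness":
        return base | {normalize(w) for w in manual}
    if model in SKEPTICISM_MODELS:
        # Assumption: manual_stopwords is not applied to skepticism models.
        return base | {normalize(w) for w in manual_skepticism}
    raise ValueError(f"unknown model: {model!r}")


if __name__ == "__main__":
    args = (["Zeus"], ["Athens"], ["δοκεῖν"])
    print(sorted(stopwords_for("mythicness", *args)))           # ['athens', 'zeus']
    print(sorted(stopwords_for("sentence_skepticism", *args)))  # ['zeus', 'δοκεῖν']
```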

Proper noun spelling checks

The live PostgreSQL database is on the host raksasa, so checking from a local machine usually needs an SSH tunnel. This one forwards local port 6543 to PostgreSQL's Unix socket on raksasa:

ssh -N -L 6543:/var/run/postgresql/.s.PGSQL.5432 raksasa
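If the checker can't connect, the first thing to rule out is the tunnel itself. This small helper (hypothetical, not part of the repository) only tests whether something is accepting TCP connections on the local end, 127.0.0.1:6543. Because ssh accepts the local connection before it contacts raksasa, a True result shows that the tunnel process is listening, not that PostgreSQL is reachable or that the credentials work.

```python
# Hypothetical helper: is anything listening on the local end of the tunnel?
import socket


def tunnel_is_up(host="127.0.0.1", port=6543, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused" and timeouts alike.
        return False


if __name__ == "__main__":
    if tunnel_is_up():
        print("Tunnel is listening on 127.0.0.1:6543")
    else:
        print("Nothing on 127.0.0.1:6543; start the ssh -N -L command first")
```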

Then run the spelling checker against the tunnel:

uv run check_proper_noun_spellings.py \
  --database-url "host=127.0.0.1 port=6543 dbname=pausanias user=gregb"
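The --database-url value here is written in libpq's keyword/value format rather than as a postgresql:// URI, and is presumably handed to a libpq-based driver. If you need to build one programmatically (for instance, with a password containing spaces or quotes), libpq's rules are that empty values and values containing spaces go in single quotes, and single quotes and backslashes inside a value are escaped with a backslash. The `conninfo` helper below is a hypothetical sketch of those rules.

```python
# Hypothetical helper: build a libpq keyword/value connection string,
# following libpq's quoting rules for values.


def conninfo(**params):
    parts = []
    for key, value in params.items():
        text = str(value)
        # Backslashes and single quotes inside a value need a backslash escape.
        escaped = text.replace("\\", "\\\\").replace("'", "\\'")
        # Quote empty values, and (to be safe) anything with whitespace or escapes.
        if text == "" or escaped != text or any(ch.isspace() for ch in text):
            escaped = f"'{escaped}'"
        parts.append(f"{key}={escaped}")
    return " ".join(parts)


if __name__ == "__main__":
    print(conninfo(host="127.0.0.1", port=6543, dbname="pausanias", user="gregb"))
    # host=127.0.0.1 port=6543 dbname=pausanias user=gregb
```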

The checker stores reviewed spelling policies in proper_noun_spelling_policies and scan results in proper_noun_spelling_findings. Use --apply to replace deprecated variants in completed translation text, sentence text, and passage summaries.
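As a rough illustration of what --apply has to get right, this sketch rewrites deprecated variants as whole words only and tries longer variants first, so a compound name is handled as a unit before any shorter name inside it is touched. The policies are a plain dict standing in for proper_noun_spelling_policies, `apply_policies` is a hypothetical name, the example spellings are invented, and the checker's actual matching rules may differ.

```python
# Hypothetical sketch of applying reviewed spelling policies to one text.
# `policies` maps deprecated variant -> preferred spelling; the checker
# itself keeps these in proper_noun_spelling_policies.
import re


def apply_policies(text, policies):
    """Replace deprecated variants with preferred spellings, whole words only."""
    if not policies:
        return text
    # Longest first: Python's regex alternation takes the first alternative
    # that matches, so a compound beats a shorter name at the same position.
    variants = sorted(policies, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, variants)) + r")\b")
    return pattern.sub(lambda m: policies[m.group(0)], text)


if __name__ == "__main__":
    policies = {"Apollon": "Apollo", "Apollon Lykeios": "Lycean Apollo", "Pireus": "Piraeus"}
    print(apply_policies("Apollon Lykeios at Pireus; Apollon and Apollonia", policies))
    # Lycean Apollo at Piraeus; Apollo and Apollonia
```

Without the longest-first ordering, "Apollon Lykeios" would come out as "Apollo Lykeios", and the \b anchors are what keep "Apollonia" intact.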

To import the review report as policies, apply corrections, and keep the proper noun registry in step with the selected spellings:

uv run check_proper_noun_spellings.py \
  --database-url "host=127.0.0.1 port=6543 dbname=pausanias user=gregb" \
  --import-review-tsv tmp/proper_noun_spelling_review.tsv \
  --apply \
  --sync-registry \
  --sync-derived-name-spellings

The review importer chooses the dominant completed-prose spelling for each entity, with a small set of explicit overrides where a base name and compound name would otherwise fight each other.
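The selection rule described above, a majority vote over completed prose with a few explicit overrides, can be sketched as follows. The input shapes (a list of (entity, spelling) pairs and an overrides dict) and the name `choose_spellings` are assumptions for illustration, not the checker's data model, and ties here simply go to the spelling seen first.

```python
# Hypothetical sketch: majority-vote spelling per entity, with overrides.
from collections import Counter, defaultdict


def choose_spellings(observed, overrides=None):
    """Map each entity to its preferred spelling.

    observed:  iterable of (entity, spelling) pairs from completed prose.
    overrides: {entity: spelling} for cases where the majority vote would
               clash with a related base or compound name.
    """
    overrides = overrides or {}
    counts = defaultdict(Counter)
    for entity, spelling in observed:
        counts[entity][spelling] += 1
    chosen = {}
    for entity, counter in counts.items():
        if entity in overrides:
            chosen[entity] = overrides[entity]
        else:
            # most_common() lists equal counts in first-seen order.
            chosen[entity] = counter.most_common(1)[0][0]
    return chosen


if __name__ == "__main__":
    observed = [
        ("piraeus", "Piraeus"), ("piraeus", "Peiraieus"), ("piraeus", "Piraeus"),
        ("apollo", "Apollon"), ("apollo", "Apollo"),
    ]
    print(choose_spellings(observed))  # {'piraeus': 'Piraeus', 'apollo': 'Apollon'}
    print(choose_spellings(observed, {"apollo": "Apollo"}))  # {'piraeus': 'Piraeus', 'apollo': 'Apollo'}
```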
