pausanias

Digital humanities tools for manipulating the text of Pausanias's Ἑλλάδος Περιήγησις (Description of Greece)

Tooling

All the programs are written in Python. Lots of digital humanities folks run into trouble with environments and dependencies, so I've made sure everything works with uv. Install uv from https://github.com/astral-sh/uv (it's a single command, so it's quick, and it won't disrupt any Python installation you already have).

The first time you run a uv command it will output something like this:

Using CPython 3.11.6 interpreter at: /Users/gregb/anaconda3/bin/python3.11
Creating virtual environment at: .venv

Data Loading

uv run pausanias_importer.py description_of_greece.txt

This should print:

Successfully imported 3170 passages into PostgreSQL

Daily

I didn't have enough token allocation to run the whole corpus in one go, so I broke it up into smaller chunks. Schedule cronscript.sh to run daily (lower the --stop parameter if you have less allocation than me, or raise it if you don't mind spending money).
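
The chunking idea can be sketched in a few lines of Python. This is only an illustration: it assumes --stop caps how many items a single run handles, which this README does not spell out, and the function name, passage ids, and `done` set are all hypothetical.

```python
def next_batch(passage_ids, done, stop):
    """Return up to `stop` ids that have not been processed yet, in order."""
    pending = [p for p in passage_ids if p not in done]
    return pending[:stop]


# Hypothetical ids; the real scripts read their work list from the database.
all_ids = ["1.1.1", "1.1.2", "1.1.3", "1.1.4", "1.1.5"]
done = {"1.1.1", "1.1.2"}
print(next_batch(all_ids, done, stop=2))
```

Each scheduled run then picks up where the previous one stopped, so a smaller cap simply spreads the same work over more days.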

Manual stop words

Some words that are really proper nouns might slip past the automated extractor. To make sure they don't influence the TF-IDF model, you can add them to a manual_stopwords table in the database:

psql "$PAUSANIAS_DATABASE_URL" -c "INSERT INTO manual_stopwords(word) VALUES ('Athens') ON CONFLICT DO NOTHING;"

When find_predictors.py runs, it combines these entries with the proper noun list and uses the union as stop words for the mythicness model.
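
As a rough illustration of that union, the sketch below merges two word lists into one stop-word set and filters tokens against it. The function names, the sample words, and the case-folding step are assumptions made for the example; find_predictors.py may normalise words differently.

```python
def build_stop_words(manual_words, proper_nouns):
    """Return the union of both lists, case-folded (case-folding is an assumption)."""
    return {w.casefold() for w in manual_words} | {w.casefold() for w in proper_nouns}


def drop_stop_words(tokens, stop_words):
    """Keep only tokens whose case-folded form is not a stop word."""
    return [t for t in tokens if t.casefold() not in stop_words]


# "Athens" appears in both lists; the set union keeps a single copy.
stop_words = build_stop_words(["Athens"], ["Theseus", "Athens"])
tokens = ["Theseus", "sailed", "from", "Athens"]
print(drop_stop_words(tokens, stop_words))  # ['sailed', 'from']
```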

For skepticism-specific exclusions, use manual_skepticism_stopwords instead. These entries are applied only to the passage- and sentence-level skepticism models, so they will not affect mythicness:

psql "$PAUSANIAS_DATABASE_URL" -c "INSERT INTO manual_skepticism_stopwords(word) VALUES ('δοκεῖν') ON CONFLICT DO NOTHING;"

Proper noun spelling checks

The live PostgreSQL database is on raksasa, so local checking usually needs an SSH tunnel:

ssh -N -L 6543:/var/run/postgresql/.s.PGSQL.5432 raksasa
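
Before pointing a script at the tunnel, it can help to confirm that the local end is actually listening. The helper below is a minimal sketch using only the standard library; `port_is_open` is a hypothetical name, not something in this repository.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# 6543 is the local port used in the ssh command above.
print(port_is_open("127.0.0.1", 6543))
```

A result of False usually means the ssh command is not running or was started with a different local port.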

Then run the spelling checker against the tunnel:

uv run check_proper_noun_spellings.py \
  --database-url "host=127.0.0.1 port=6543 dbname=pausanias user=gregb"
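
The --database-url value above is a libpq keyword/value connection string. If you build one programmatically, values that are empty or contain spaces, single quotes, or backslashes need quoting. The sketch below shows one way to do that; `make_conninfo` is a hypothetical helper, not part of this repository.

```python
def make_conninfo(**params):
    """Build a libpq keyword/value connection string, quoting values when needed."""
    parts = []
    for key, value in params.items():
        text = str(value)
        needs_quotes = text == "" or any(c.isspace() or c in "'\\" for c in text)
        if needs_quotes:
            # Escape backslashes first, then single quotes, then wrap in quotes.
            text = "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"
        parts.append(f"{key}={text}")
    return " ".join(parts)


print(make_conninfo(host="127.0.0.1", port=6543, dbname="pausanias", user="gregb"))
```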

The checker stores reviewed spelling policies in proper_noun_spelling_policies and scan results in proper_noun_spelling_findings. Use --apply to replace deprecated variants in completed translation text, sentence text, and passage summaries.
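
To make the replacement step concrete, here is a minimal sketch of swapping deprecated variants for preferred spellings, matching whole words only. The policy dictionary, its sample names, and the function are illustrative assumptions; the checker reads its real policies from proper_noun_spelling_policies and may match text differently.

```python
import re


def replace_variants(text, policy):
    """Replace each deprecated spelling with its preferred form, whole words only."""
    if not policy:
        return text
    # Longest variants first so a short variant never pre-empts a longer one.
    variants = sorted(policy, key=len, reverse=True)
    alternation = "|".join(re.escape(v) for v in variants)
    pattern = re.compile(rf"(?<!\w)(?:{alternation})(?!\w)")
    return pattern.sub(lambda m: policy[m.group(0)], text)


# Hypothetical policy: deprecated variant -> preferred spelling.
policy = {"Heracles": "Herakles", "Aesculapius": "Asklepios"}
print(replace_variants("Heracles visited the temple of Aesculapius.", policy))
```

The word-boundary lookarounds keep the substitution from touching longer words that merely contain a variant.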

To import the review report as policies, apply corrections, and keep the proper noun registry in step with the selected spellings:

uv run check_proper_noun_spellings.py \
  --database-url "host=127.0.0.1 port=6543 dbname=pausanias user=gregb" \
  --import-review-tsv tmp/proper_noun_spelling_review.tsv \
  --apply \
  --sync-registry \
  --sync-derived-name-spellings

The review importer chooses the dominant completed-prose spelling for each entity, with a small set of explicit overrides where a base name and compound name would otherwise fight each other.
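
The selection rule can be sketched as a frequency count with an override table. Everything below is illustrative: the entity ids, the sample spellings, the alphabetical tie-break, and the function name are assumptions, not the importer's actual code.

```python
from collections import Counter


def choose_spellings(observations, overrides=None):
    """Map each entity id to its most frequent spelling, unless overridden."""
    counts = {}
    for entity, spelling in observations:
        counts.setdefault(entity, Counter())[spelling] += 1
    chosen = {}
    for entity, counter in counts.items():
        # Highest count wins; ties fall back to alphabetical order (an assumption).
        chosen[entity] = min(counter, key=lambda s: (-counter[s], s))
    chosen.update(overrides or {})
    return chosen


# Hypothetical data: the base name is mostly spelled with "k", the compound with "c".
observations = [
    ("Q1", "Asklepios"), ("Q1", "Asklepios"), ("Q1", "Asclepius"),
    ("Q2", "Asclepieion"), ("Q2", "Asclepieion"), ("Q2", "Asklepieion"),
]
print(choose_spellings(observations, overrides={"Q2": "Asklepieion"}))
```

Without the override, the compound name would keep its own majority spelling and disagree with the base name, which is the conflict the explicit overrides are there to settle.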

