Quantum trajectory plot #1

@jeremymanning

Let's take a book with long, engaging chapters (e.g., hyperion_djvu.txt). Divide it into chapters/sections (Prologue, Chapter 1, Chapter 2, ..., Chapter 6, Epilogue).
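
A minimal sketch of the chapter split, assuming each section starts with a heading on its own line ("Prologue", "Chapter N", or "Epilogue"). The pattern and the `split_chapters` name are assumptions for illustration; the actual headings in the text dump may look different and need their own pattern.

```python
import re

# Assumed heading format: a line containing only "Prologue", "Epilogue",
# or "Chapter <number>". Adjust for the real text file.
HEADING = re.compile(
    r"^(Prologue|Epilogue|Chapter[ \t]+\d+)[ \t]*$",
    re.MULTILINE | re.IGNORECASE,
)

def split_chapters(text):
    """Return a list of (heading, body) pairs in document order.

    Any text before the first heading is dropped.
    """
    matches = list(HEADING.finditer(text))
    chapters = []
    for i, m in enumerate(matches):
        start = m.end()
        # A chapter runs until the next heading (or the end of the text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        chapters.append((m.group(1).strip(), text[start:end].strip()))
    return chapters
```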

For each chapter (can be parallelized in different threads):

  • Use TinyLlama-1.1b to embed each token. Create a number-of-tokens by number-of-embedding-dimensions matrix (for this chapter).
  • For each of 100 "particles":
    • Project it forward by iteratively predicting next tokens (until we get to a stop token). If we run out of context, just slide the window forward to include only the last <length-of-context-window - 1> tokens.
    • Store the embeddings of the predicted token sequences as a list of length number-of-particles, where each entry is a number-of-tokens-for-that-particle by number-of-embedding-dimensions matrix
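
The particle loop above could be sketched roughly as follows. This is model-agnostic: `sample_next` is a hypothetical callable that takes the current window of token ids and returns one sampled next-token id (in practice it would wrap a TinyLlama forward pass plus sampling). The `max_new_tokens` cap is an added safety assumption, not part of the plan above, and the embedding lookup for the generated tokens is left out.

```python
def rollout(prompt_ids, sample_next, context_len, stop_id, max_new_tokens=1000):
    """Generate tokens one at a time until stop_id is produced.

    sample_next: hypothetical callable, list of token ids -> next token id.
    max_new_tokens: assumed safety cap in case no stop token appears.
    """
    if context_len < 2:
        raise ValueError("context_len must be at least 2")
    keep = context_len - 1
    # Start from the tail of the chapter that fits in the window.
    window = list(prompt_ids)[-keep:]
    generated = []
    for _ in range(max_new_tokens):
        token = sample_next(window)
        generated.append(token)
        if token == stop_id:
            break
        # Slide the window: keep only the last (context_len - 1) tokens.
        window = (window + [token])[-keep:]
    return generated

def run_particles(prompt_ids, sample_next, context_len, stop_id,
                  n_particles=100, max_new_tokens=1000):
    """Run independent rollouts; returns one token-id list per particle."""
    return [
        rollout(prompt_ids, sample_next, context_len, stop_id, max_new_tokens)
        for _ in range(n_particles)
    ]
```

Particles only diverge if `sample_next` is stochastic (e.g., sampling from the predicted distribution rather than taking the argmax); with greedy decoding all 100 particles would be identical.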

Save everything as a pkl file (one per chapter).
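
One possible layout for the per-chapter pkl file, as a sketch. The dictionary keys, file naming, and the `save_chapter` / `load_chapter` helpers are assumptions for illustration.

```python
import pickle
from pathlib import Path

def save_chapter(out_dir, chapter_name, chapter_embeddings, particle_embeddings):
    """Write one chapter's results to <out_dir>/<chapter_name>.pkl."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{chapter_name}.pkl"
    payload = {
        "chapter": chapter_name,
        # tokens x embedding-dimensions matrix for the chapter text
        "chapter_embeddings": chapter_embeddings,
        # list (one entry per particle) of tokens x embedding-dimensions matrices
        "particle_embeddings": particle_embeddings,
    }
    with open(path, "wb") as f:
        pickle.dump(payload, f)
    return path

def load_chapter(path):
    """Read back a file written by save_chapter."""
    with open(path, "rb") as f:
        return pickle.load(f)
```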

Then, once all chapters' pkl files are saved out:

  • Concatenate all of the embedded tokens into one enormous total-number-of-tokens by number-of-embedding-dimensions matrix (across all chapters and particles)
  • Project into 2D using UMAP
  • Split the concatenated matrix back out into separate chapters/particles
  • Save out a pkl file with the 2D projections
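
The concatenate / project / split steps above can be sketched like this. `project_all` and the dictionary keying are assumptions; the reducer is passed in as a function so the bookkeeping can be checked without running UMAP.

```python
import numpy as np

def project_all(embeddings, reduce_fn):
    """Stack all matrices, reduce them together, and split the result back.

    embeddings: dict mapping a key (e.g. (chapter, "text") or
        (chapter, particle_index)) to a tokens x dimensions array.
    reduce_fn: callable mapping an (N, D) array to an (N, 2) array.
    """
    keys = list(embeddings)
    lengths = [embeddings[k].shape[0] for k in keys]
    stacked = np.vstack([embeddings[k] for k in keys])
    projected = np.asarray(reduce_fn(stacked))
    # Row offsets where one matrix ends and the next begins.
    boundaries = np.cumsum(lengths)[:-1]
    pieces = np.split(projected, boundaries, axis=0)
    return dict(zip(keys, pieces))

# With umap-learn installed, the reducer could be:
#   import umap
#   projections = project_all(embeddings, umap.UMAP(n_components=2).fit_transform)
```

Fitting one reducer on the stacked matrix (rather than one per chapter) is what puts every chapter and particle in the same 2D space, so the panels are comparable.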

Then make a plot like this (one panel per chapter):
[Image: example plot]

The blue lines are chapter trajectories. The red lines (projecting forward from the end of each chapter) are particles' predictions. The blue dots are the starts/ends of each chapter.
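
A rough matplotlib sketch of the panel layout described above. The input structure (dicts with `name`, `trajectory`, and `particles` keys) and the `plot_chapters` name are assumptions; joining each red line to the chapter's final point is also an assumption about how the example figure was drawn.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

def plot_chapters(chapters, ncols=3):
    """Draw one panel per chapter and return the figure.

    chapters: list of dicts with assumed keys
        "name": panel title,
        "trajectory": (n, 2) array of the chapter's 2D projection,
        "particles": list of (m, 2) arrays, one per particle.
    """
    nrows = -(-len(chapters) // ncols)  # ceiling division
    fig, axes = plt.subplots(nrows, ncols, figsize=(4 * ncols, 4 * nrows),
                             squeeze=False)
    flat = axes.ravel()
    for ax, ch in zip(flat, chapters):
        traj = np.asarray(ch["trajectory"])
        # Blue line: the chapter trajectory; blue dots: its start and end.
        ax.plot(traj[:, 0], traj[:, 1], color="blue", linewidth=1)
        ax.scatter(traj[[0, -1], 0], traj[[0, -1], 1], color="blue", zorder=3)
        for particle in ch["particles"]:
            # Red lines: predictions, drawn starting from the chapter's end point.
            path = np.vstack([traj[-1:], np.asarray(particle)])
            ax.plot(path[:, 0], path[:, 1], color="red", linewidth=0.5, alpha=0.4)
        ax.set_title(ch["name"])
    for ax in flat[len(chapters):]:
        ax.axis("off")  # hide unused panels
    fig.tight_layout()
    return fig
```

Calling `plot_chapters(chapters).savefig("trajectories.png")` would write the figure to disk (the file name is a placeholder).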
