eriquechen23-cyl/text-semantic-codec

Text Semantic Codec

A lightweight prototype for text semantic communication: encode a sentence into a compact semantic code, transmit the code, and reconstruct a meaning-preserving sentence.

This first prototype intentionally uses interpretable Python rules instead of a trained neural network. The goal is to make the semantic communication pipeline measurable, debuggable, and ready for later model replacement.

Quick Demo

python scripts/run_demo.py --text "The meeting has been postponed because of the heavy rain." --mode discrete --semantic-tokens 4 --codebook-size 256

Expected shape:

Original sentence:
The meeting has been postponed because of the heavy rain.

Semantic code:
[46, 52, 191, 73]

Recovered sentence:
The meeting was delayed due to heavy rain.

Prototype Architecture

flowchart LR
    A["Input sentence"] --> B["Tokenizer"]
    B --> C["Semantic encoder"]
    C --> D["Semantic frame<br/>concepts + flags"]
    D --> E{"Bottleneck mode"}
    E --> F["Discrete codebook<br/>compact token IDs"]
    E --> G["Continuous vector<br/>fixed dimensions"]
    F --> H["Semantic code"]
    G --> H
    H --> I["Semantic decoder"]
    I --> J["Recovered sentence"]
    J --> K["Evaluation metrics"]
    A --> K

    K --> L["Similarity"]
    K --> M["Compression ratio"]
    K --> N["Semantic efficiency"]

Core Method

The core method is semantic-first transmission:

  1. Extract meaning-bearing concepts from the source text.
  2. Preserve critical semantic flags such as time, quantity, location, intent, and negation.
  3. Compress the semantic frame through a bottleneck.
  4. Transmit compact semantic code instead of the original sentence.
  5. Decode the compact code into a meaning-preserving recovered sentence.
  6. Evaluate whether the receiver still understands the intended message.
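
The six steps above can be sketched end to end in a few lines of Python. This is a minimal illustration under stated assumptions, not the prototype's actual implementation: the word rules, the phrase templates, the reserved negation ID, and the concept-to-ID mapping are all hypothetical (the IDs were chosen only to mirror the shape of the demo output).

```python
import re

# ASSUMPTION: toy codebook, concept -> token ID in 0..255.
# The real codebook mapping is not documented here; these IDs are illustrative.
CODEBOOK = {"meeting": 46, "postpone": 52, "cause": 191, "rain": 73}
ID_TO_CONCEPT = {v: k for k, v in CODEBOOK.items()}
NEG_ID = 0  # ASSUMPTION: one reserved ID carries the negation flag

# ASSUMPTION: hand-written rules mapping surface words to concepts.
WORD_TO_CONCEPT = {
    "meeting": "meeting", "postponed": "postpone", "delayed": "postpone",
    "because": "cause", "due": "cause", "rain": "rain",
}
NEGATION_WORDS = {"not", "no", "never"}

# ASSUMPTION: one fixed output phrase per concept.
PHRASES = {"meeting": "The meeting", "postpone": "was delayed",
           "cause": "due to", "rain": "rain"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def encode(text, max_tokens=4):
    """Steps 1-3: extract concepts and flags, then squeeze them into IDs."""
    tokens = tokenize(text)
    concepts = []
    for tok in tokens:
        concept = WORD_TO_CONCEPT.get(tok)
        if concept and concept not in concepts:
            concepts.append(concept)
    negated = any(tok in NEGATION_WORDS for tok in tokens)
    frame = {"concepts": concepts, "negated": negated}
    ids = ([NEG_ID] if negated else []) + [CODEBOOK[c] for c in concepts]
    return frame, ids[:max_tokens]  # the bottleneck keeps at most max_tokens IDs


def decode(code):
    """Step 5: rebuild a sentence from the received IDs."""
    negated = NEG_ID in code
    parts = []
    for token_id in code:
        concept = ID_TO_CONCEPT.get(token_id)
        if concept is None:
            continue  # skip the negation marker and unknown IDs
        phrase = PHRASES[concept]
        if concept == "postpone" and negated:
            phrase = "was not delayed"
        parts.append(phrase)
    return " ".join(parts) + "."


frame, code = encode("The meeting has been postponed because of the heavy rain.")
print(code)          # [46, 52, 191, 73]
print(decode(code))  # The meeting was delayed due to rain.
```

Note that this toy drops "heavy" because it has no rule for it, which is the kind of loss that step 6 is meant to measure.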

Because the current prototype is rule-based and interpretable, it serves as a baseline for deciding when to replace a block with learned embeddings or vector quantization, and when to add channel simulation.

Block             Role
Tokenizer         Normalize text into tokens.
Semantic encoder  Extract concepts and flags.
Bottleneck        Compress meaning into discrete IDs or vectors.
Semantic decoder  Reconstruct a meaning-preserving sentence.
Metrics           Measure similarity, compression, and efficiency.

See docs/core-method.md for the full explanation.

What This Version Proves

  • Text can be converted into compact semantic codes.
  • Reconstruction can preserve meaning without exact wording.
  • BLEU and exact match alone are insufficient for evaluating semantic communication.
  • Sentence-level semantic similarity and compression ratio are better decision metrics for the next stage.
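
To make the last two points concrete, the sketch below scores the Quick Demo pair with exact match, a word-overlap score, and a compression ratio. It rests on assumptions: the function names are hypothetical, Jaccard token overlap stands in for surface-level metrics such as BLEU, and the compression ratio is taken as UTF-8 source bits divided by code bits at ceil(log2(codebook size)) bits per token. The prototype's own metric definitions may differ.

```python
import math
import re


def tokens(text):
    """Lowercase word tokens, punctuation dropped."""
    return re.findall(r"[a-z0-9']+", text.lower())


def exact_match(source, recovered):
    """True only when the two sentences are character-for-character equal."""
    return source.strip() == recovered.strip()


def jaccard_similarity(source, recovered):
    """Word-set overlap: a surface-level score, not a semantic one."""
    a, b = set(tokens(source)), set(tokens(recovered))
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def compression_ratio(source, num_tokens, codebook_size):
    """ASSUMED definition: UTF-8 source bits divided by transmitted code bits."""
    source_bits = 8 * len(source.encode("utf-8"))
    bits_per_token = math.ceil(math.log2(codebook_size))
    return source_bits / (num_tokens * bits_per_token)


source = "The meeting has been postponed because of the heavy rain."
recovered = "The meeting was delayed due to heavy rain."

print(exact_match(source, recovered))                   # False
print(round(jaccard_similarity(source, recovered), 3))  # 0.308
print(compression_ratio(source, 4, 256))                # 14.25
```

The recovered sentence keeps the meaning, yet exact match fails and word overlap is only about 0.31. Surface metrics understate success here, which is why a sentence-level semantic similarity score is needed alongside the compression ratio.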

Current Limitations

  • The encoder and decoder are rule-based.
  • Only English text is supported so far.
  • The semantic vocabulary is intentionally small.
  • Wireless channel simulation is not included yet.

Next Research Stages

  1. Replace rule-based encoder with sentence embeddings.
  2. Add continuous bottleneck experiments across dimensions.
  3. Add learned or clustered discrete codebooks.
  4. Add semantic error test cases for time, negation, quantity, location, entity, and intent.
  5. Add noisy channel simulation.
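
As one possible starting point for stage 5, the sketch below passes a discrete semantic code through a binary symmetric channel, flipping each bit independently with a fixed probability. This is an assumption about how channel noise might be modeled, not a planned design; the function names and the 8-bit token width (matching a 256-entry codebook) are hypothetical.

```python
import random


def flip_bits(token_id, bit_error_rate, rng, bits=8):
    """Flip each of the token's bits independently with the given probability."""
    for i in range(bits):
        if rng.random() < bit_error_rate:
            token_id ^= 1 << i
    return token_id


def transmit(code, bit_error_rate, seed=0, bits=8):
    """Send a list of token IDs through a binary symmetric channel."""
    rng = random.Random(seed)  # seeded so experiments are repeatable
    return [flip_bits(t, bit_error_rate, rng, bits) for t in code]


code = [46, 52, 191, 73]
print(transmit(code, bit_error_rate=0.0))   # unchanged: [46, 52, 191, 73]
print(transmit(code, bit_error_rate=0.05))  # some IDs may differ
```

Feeding the corrupted IDs to the decoder and re-running the evaluation metrics would show how quickly meaning degrades as the bit error rate rises.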

See reports/stage-01-text-prototype/report.html for the first decision report.

Web App Prototype

The repository now includes a deployable web prototype:

  • backend/: FastAPI API for text semantic conversion.
  • frontend/: Angular 21 standalone UI.
  • render.yaml: Render Blueprint with one Python web service and one static site.

Backend Local Run

pip install -r backend/requirements.txt
uvicorn backend.main:app --reload

API endpoint:

POST /api/semantic/convert
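
A request to this endpoint could be built with the Python standard library as sketched below. The request schema is not documented in this README, so the JSON field names ("text", "mode") are assumptions borrowed from the CLI flags, and the base URL assumes uvicorn's default local address. Check the backend code for the actual schema.

```python
import json
import urllib.request


def build_convert_request(text, base_url="http://127.0.0.1:8000"):
    """Build (but do not send) a POST request for the convert endpoint."""
    # ASSUMPTION: the payload field names are hypothetical.
    payload = {"text": text, "mode": "discrete"}
    return urllib.request.Request(
        url=f"{base_url}/api/semantic/convert",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_convert_request("The meeting has been postponed.")
print(req.get_method(), req.full_url)

# With the backend running, the request could be sent like this:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```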

Frontend Local Run

Angular 21 requires Node 20+.

cd frontend
npm install
npm start

The frontend calls the deployed Render backend using a URL set in application code; the user interface does not expose a field for changing the backend API URL.

Render Deployment

Create a Render Blueprint from this GitHub repo. Render will read render.yaml and create:

  • text-semantic-codec-api

After deployment, set the API service ALLOWED_ORIGINS value to the final Vercel frontend URL instead of * for production use.
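
How the backend reads ALLOWED_ORIGINS is not shown in this README. A minimal sketch of one common approach follows, assuming the variable holds a comma-separated list of origins; the helper name and that format are hypothetical.

```python
import os


def parse_allowed_origins(raw):
    """Split a comma-separated origin list; fall back to "*" when empty."""
    # Browsers send the Origin header without a trailing slash, so strip it.
    origins = [o.strip().rstrip("/") for o in raw.split(",") if o.strip()]
    return origins or ["*"]


allowed_origins = parse_allowed_origins(os.environ.get("ALLOWED_ORIGINS", "*"))
print(allowed_origins)

# With FastAPI installed, the list could be passed to its CORS middleware:
# from fastapi.middleware.cors import CORSMiddleware
# app.add_middleware(
#     CORSMiddleware,
#     allow_origins=allowed_origins,
#     allow_methods=["*"],
#     allow_headers=["*"],
# )
```

Restricting the list to the frontend's origin means browsers will only allow cross-origin API calls from that site.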

Current backend deployment:

https://text-semantic-codec-api.onrender.com

Health check:

https://text-semantic-codec-api.onrender.com/health

Vercel Frontend Deployment

The Angular UI is configured for Vercel with frontend/vercel.json.

Current production deployment:

https://frontend-coral-psi-78.vercel.app

See reports/stage-02-render-angular/report.html for the deployment decision report.
