Text Semantic Codec

A lightweight prototype for text semantic communication: encode a sentence into a compact semantic code, transmit the code, and reconstruct a meaning-preserving sentence.

This first prototype intentionally uses interpretable Python rules instead of a trained neural network. The goal is to make the semantic communication pipeline measurable, debuggable, and ready for later model replacement.

Quick Demo

python scripts/run_demo.py --text "The meeting has been postponed because of the heavy rain." --mode discrete --semantic-tokens 4 --codebook-size 256

Expected shape:

Original sentence:
The meeting has been postponed because of the heavy rain.

Semantic code:
[46, 52, 191, 73]

Recovered sentence:
The meeting was delayed due to heavy rain.

Prototype Architecture

flowchart LR
    A["Input sentence"] --> B["Tokenizer"]
    B --> C["Semantic encoder"]
    C --> D["Semantic frame<br/>concepts + flags"]
    D --> E{"Bottleneck mode"}
    E --> F["Discrete codebook<br/>compact token IDs"]
    E --> G["Continuous vector<br/>fixed dimensions"]
    F --> H["Semantic code"]
    G --> H
    H --> I["Semantic decoder"]
    I --> J["Recovered sentence"]
    J --> K["Evaluation metrics"]
    A --> K

    K --> L["Similarity"]
    K --> M["Compression ratio"]
    K --> N["Semantic efficiency"]

Core Method

The core method is semantic-first transmission:

Extract meaning-bearing concepts from the source text.
Preserve critical semantic flags such as time, quantity, location, intent, and negation.
Compress the semantic frame through a bottleneck.
Transmit compact semantic code instead of the original sentence.
Decode the compact code into a meaning-preserving recovered sentence.
Evaluate whether the receiver still understands the intended message.
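
The first four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's implementation: the function names, the stop-word list, the negation cues, and the use of a CRC32 hash in place of a real codebook lookup are all assumptions made for this example.

```python
import re
import zlib

# Hypothetical word lists for illustration; the prototype's actual rules may differ.
STOP_WORDS = {"the", "a", "an", "of", "has", "been", "because", "is", "was", "to"}
NEGATION_CUES = {"not", "no", "never"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def extract_frame(tokens):
    """Keep meaning-bearing concepts and record a negation flag."""
    concepts = [t for t in tokens if t not in STOP_WORDS and t not in NEGATION_CUES]
    flags = {"negation": any(t in NEGATION_CUES for t in tokens)}
    return {"concepts": concepts, "flags": flags}


def to_code(frame, semantic_tokens=4, codebook_size=256):
    """Map the first `semantic_tokens` concepts to stable IDs in [0, codebook_size).

    Hashing stands in for a codebook lookup here (an assumption of this sketch).
    """
    picked = frame["concepts"][:semantic_tokens]
    return [zlib.crc32(c.encode("utf-8")) % codebook_size for c in picked]


sentence = "The meeting has been postponed because of the heavy rain."
frame = extract_frame(tokenize(sentence))
code = to_code(frame)
print(frame)
print(code)
```

A hash is one-way, so a receiver could not decode these IDs without a shared table that maps each ID back to a concept. A real codebook would provide that table.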

The current prototype is intentionally interpretable and rule-based. It is a baseline for deciding when to replace each block with learned embeddings, vector quantization, or channel simulation.

Block	Role
Tokenizer	Normalize text into tokens.
Semantic encoder	Extract concepts and flags.
Bottleneck	Compress meaning into discrete IDs or vectors.
Semantic decoder	Reconstruct a meaning-preserving sentence.
Metrics	Measure similarity, compression, and efficiency.

See docs/core-method.md for the full explanation.

What This Version Proves

Text can be converted into compact semantic codes.
Reconstruction can preserve meaning without exact wording.
BLEU and exact match are insufficient alone for semantic communication.
Sentence-level semantic similarity and compression ratio are better decision metrics for the next stage.
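
To make the compression claim concrete, here is a worked calculation using the Quick Demo values. It assumes that compression ratio means source bits divided by transmitted bits, and that each code ID is sent with the minimum fixed width for the codebook size. The repository's own metric definition may differ.

```python
def bits_per_id(codebook_size):
    """Minimum fixed-width bits needed to index a codebook of this size."""
    return max(1, (codebook_size - 1).bit_length())


def compression_ratio(text, code, codebook_size):
    """Source bits (UTF-8) divided by transmitted code bits (assumed definition)."""
    source_bits = len(text.encode("utf-8")) * 8
    code_bits = len(code) * bits_per_id(codebook_size)
    return source_bits / code_bits


original = "The meeting has been postponed because of the heavy rain."
recovered = "The meeting was delayed due to heavy rain."
code = [46, 52, 191, 73]

ratio = compression_ratio(original, code, codebook_size=256)
exact_match = original == recovered

print(ratio)        # 57 bytes * 8 bits / (4 IDs * 8 bits) = 14.25
print(exact_match)  # False, although the meaning is preserved
```

Exact match fails on the demo pair even though the recovered sentence conveys the same message, which is why exact match alone is a poor decision metric here.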

Current Limitations

The encoder and decoder are rule-based.
Only English text is supported for now.
The semantic vocabulary is intentionally small.
Wireless channel simulation is not included yet.

Next Research Stages

Replace rule-based encoder with sentence embeddings.
Add continuous bottleneck experiments across dimensions.
Add learned or clustered discrete codebooks.
Add semantic error test cases for time, negation, quantity, location, entity, and intent.
Add noisy channel simulation.

See reports/stage-01-text-prototype/report.html for the first decision report.

Web App Prototype

The repository now includes a deployable web prototype:

backend/: FastAPI API for text semantic conversion.
frontend/: Angular 21 standalone UI.
render.yaml: Render Blueprint with one Python web service and one static site.

Backend Local Run

pip install -r backend/requirements.txt
uvicorn backend.main:app --reload

API endpoint:

POST /api/semantic/convert
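
A request to this endpoint from Python might look like the sketch below. The JSON field names (`text`, `mode`, `semantic_tokens`, `codebook_size`) are assumptions that mirror the CLI flags from the Quick Demo; the actual request schema is defined in the backend code. The base URL assumes uvicorn's default local address.

```python
import json
import urllib.request


def build_convert_request(text, base_url="http://127.0.0.1:8000"):
    """Build a POST request for the convert endpoint (payload fields are hypothetical)."""
    payload = {
        "text": text,
        "mode": "discrete",
        "semantic_tokens": 4,
        "codebook_size": 256,
    }
    return urllib.request.Request(
        url=f"{base_url}/api/semantic/convert",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_convert_request("The meeting has been postponed because of the heavy rain.")
print(request.get_method(), request.full_url)

# To send it against a running backend:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```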

Frontend Local Run

Angular 21 requires Node 20+.

cd frontend
npm install
npm start

The frontend calls the deployed Render backend. The backend URL is set in application code, and the user interface does not expose a field for changing it.

Render Deployment

Create a Render Blueprint from this GitHub repo. Render will read render.yaml and create:

text-semantic-codec-api

After deployment, set the API service ALLOWED_ORIGINS value to the final Vercel frontend URL instead of * for production use.
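
One way a backend could read this setting is sketched below. It assumes `ALLOWED_ORIGINS` holds a comma-separated list of origins; the helper name `parse_allowed_origins` is hypothetical, and the repository's backend may parse the value differently.

```python
import os


def parse_allowed_origins(raw):
    """Split a comma-separated origin list, dropping blanks and trailing slashes."""
    origins = [item.strip().rstrip("/") for item in raw.split(",")]
    return [origin for origin in origins if origin]


# Falls back to "*" (any origin) when the variable is unset, as in a development setup.
allowed_origins = parse_allowed_origins(os.environ.get("ALLOWED_ORIGINS", "*"))
print(allowed_origins)
```

The resulting list can be passed to FastAPI's `CORSMiddleware` through its `allow_origins` argument. Browsers compare origins exactly, so the trailing slash is stripped to avoid a mismatch with the `Origin` header.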

Current backend deployment:

https://text-semantic-codec-api.onrender.com

Health check:

https://text-semantic-codec-api.onrender.com/health

Vercel Frontend Deployment

The Angular UI is configured for Vercel with frontend/vercel.json.

Current production deployment:

https://frontend-coral-psi-78.vercel.app

See reports/stage-02-render-angular/report.html for the deployment decision report.
