
Agony5757/mahjong

pymahjong


A Japanese Riichi Mahjong environment for decision AI research, featuring a high-performance C++ backend with Python bindings.

Features

  • Complete Rule Implementation - Full Japanese Riichi Mahjong rules including all standard yaku
  • Gymnasium Compatible - Ready-to-use single-agent environment with pretrained opponents
  • Multi-agent Support - 4-player environment for multi-agent research
  • Oracle Observation - Access to hidden information (opponents' hands) for oracle-guided learning
  • High Performance - C++ game engine with efficient Python bindings via pybind11
  • Cross-platform - Pre-built wheels for Linux, macOS, and Windows (Python 3.10-3.14)

Installation

Using uv (recommended):

uv venv && source .venv/bin/activate
uv pip install pymahjong

Or using pip:

pip install pymahjong

Verify the installation:

from pymahjong.test import test
test()

Quick Start

Single-Agent Environment

Play against 3 opponents with a gym-like interface:

import pymahjong
import numpy as np

env = pymahjong.SingleAgentMahjongEnv(opponent_agent="random")
obs = env.reset()

while True:
    valid_actions = env.get_valid_actions()
    action = np.random.choice(valid_actions)
    obs, reward, done, _ = env.step(action)
    
    if done:
        print(f"Game over! Payoff: {reward}")
        break

Multi-Agent Environment

Four agents take turns in a full game; this example plays 10 games with random policies:

import pymahjong
import numpy as np

env = pymahjong.MahjongEnv()

for game in range(10):
    env.reset()
    
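    while not env.is_over():
        pid = env.get_curr_player_id()      # seat that must act now
        valid_actions = env.get_valid_actions()
        obs = env.get_obs(pid)              # that seat's observation (unused by this random policy)

        action = np.random.choice(valid_actions)  # uniform over legal actions
        env.step(pid, action)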
    
    print(f"Game {game}: payoffs = {env.get_payoffs()}")
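The loop above prints each game's payoffs but does not aggregate them. To compare seats over many games, one option is to collect the results and summarize them per seat. This is a hypothetical helper (summarize_payoffs is not part of pymahjong), and it assumes get_payoffs() returns one number per player in seat order.

```python
from typing import Sequence


def summarize_payoffs(history: Sequence[Sequence[float]]) -> list[dict]:
    """Per-seat mean payoff and fraction of games with a positive payoff.

    Hypothetical helper: `history` holds one entry per game, each entry a
    sequence with one payoff per seat (assumed layout of get_payoffs()).
    """
    if not history:
        raise ValueError("history is empty")
    n_seats = len(history[0])
    summary = []
    for seat in range(n_seats):
        values = [game[seat] for game in history]
        summary.append({
            "seat": seat,
            "mean": sum(values) / len(values),
            "positive_rate": sum(v > 0 for v in values) / len(values),
        })
    return summary
```

In the multi-agent loop you would append list(env.get_payoffs()) to a history list after each game and call summarize_payoffs(history) once all games are done.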

Documentation

Full documentation is available at https://agony5757.github.io/mahjong/

Visualization (Web UI)

The repository also includes a web interface for human-vs-AI play, 4-AI battles, and paipu replay.

cd web

# Using uv (recommended)
uv venv && source .venv/bin/activate
uv pip install -r requirements.txt

# Or using pip
pip install -r requirements.txt

uvicorn server:app --host 0.0.0.0 --port 8000

Then open http://localhost:8000 in your browser:

Page           Route        Description
Human vs AI    /            Play against 3 AI opponents
4 AI Battle    /ai_battle   Watch 4 AI agents compete in real time
Paipu Replay   /replay      Step through a Tenhou XML paipu file
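The replay page reads Tenhou's XML paipu (mjlog) format. For a quick look at what such a file contains, the stdlib sketch below counts round starts and round endings. The tag names (INIT for a round start, AGARI for a win, RYUUKYOKU for a drawn round) follow the commonly documented Tenhou mjlog layout and are an assumption here; this is not how the web UI itself parses files, and tally_rounds is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tally_rounds(xml_text: str) -> Counter:
    """Count round-level events in a Tenhou mjlog XML string.

    Assumed layout: each round starts with an <INIT> element and ends with
    one or more <AGARI> elements (more than one on a multiple ron) or a
    <RYUUKYOKU> element. Per-tile draw/discard tags such as <T12/> are ignored.
    """
    root = ET.fromstring(xml_text)
    counts: Counter = Counter()
    for elem in root.iter():
        if elem.tag in ("INIT", "AGARI", "RYUUKYOKU"):
            counts[elem.tag] += 1
    return counts
```

Comparing counts["INIT"] with the number of rounds the replay page lets you step through is a simple sanity check on a downloaded file.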

For a quick preview without installing, see the Live Demo (embedded in the documentation).

Pretrained Models

Pretrained opponent models are available from the GitHub releases. Download one and pass its path as opponent_agent:

import pymahjong

env = pymahjong.SingleAgentMahjongEnv(opponent_agent="path/to/model.pth")

Offline Dataset

Human demonstration data from Tenhou.net (games by players ranked 6 dan and above) is available for offline RL research; download it from the GitHub releases.

Incremental paipu pipeline

To build your own training cache from raw Tenhou paipu, use the resumable pipeline driven by a single append-only manifest:

# Download year zip → fetch XMLs → encode to token shards
python tools/paipu_pipeline.py run --work data/tenhou --year 2024 \
    --shard-rows 65536

# Inspect / verify state at any time
python tools/paipu_pipeline.py status --work data/tenhou
python tools/paipu_pipeline.py check  --work data/tenhou           # dry run
python tools/paipu_pipeline.py check  --work data/tenhou --repair  # delete corrupt XMLs
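The --shard-rows flag bounds the number of rows per shard. A common way to get the crash-safety described under Resumable is to write each shard to a temporary file, rename it into place, and only then append a manifest record. The sketch below illustrates that pattern only: write_shards, manifest.jsonl, the shard-NNNNN.bin names, and the length-prefixed row format are all hypothetical, and it tracks progress per shard rather than per paipu.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Iterable, Iterator


def chunk(rows: Iterable[bytes], shard_rows: int) -> Iterator[list[bytes]]:
    """Yield consecutive batches of at most `shard_rows` rows."""
    batch: list[bytes] = []
    for row in rows:
        batch.append(row)
        if len(batch) == shard_rows:
            yield batch
            batch = []
    if batch:
        yield batch


def recorded_shards(manifest: Path) -> set[str]:
    """Names of shards already recorded in the append-only manifest."""
    if not manifest.exists():
        return set()
    with manifest.open(encoding="utf-8") as m:
        return {json.loads(line)["path"] for line in m if line.strip()}


def write_shards(rows: Iterable[bytes], out_dir: Path, shard_rows: int) -> int:
    """Write rows into shards; return how many new shards were written.

    Hypothetical layout: shard-NNNNN.bin files plus manifest.jsonl in out_dir.
    Assumes `rows` arrive in the same order on every run, so shard i always
    holds the same rows.
    """
    if shard_rows < 1:
        raise ValueError("shard_rows must be positive")
    out_dir.mkdir(parents=True, exist_ok=True)
    manifest = out_dir / "manifest.jsonl"
    done = recorded_shards(manifest)
    written = 0
    for i, batch in enumerate(chunk(rows, shard_rows)):
        final = out_dir / f"shard-{i:05d}.bin"
        if final.name in done:
            continue  # flushed and recorded on an earlier run
        # 1. Write to a temp file in the same directory and fsync it.
        fd, tmp = tempfile.mkstemp(dir=out_dir, suffix=".tmp")
        with os.fdopen(fd, "wb") as f:
            for row in batch:
                f.write(len(row).to_bytes(4, "little") + row)  # length-prefixed
            f.flush()
            os.fsync(f.fileno())
        # 2. Rename into place; os.replace is atomic on POSIX filesystems.
        os.replace(tmp, final)
        # 3. Only now record the shard, so the manifest never names a partial file.
        with manifest.open("a", encoding="utf-8") as m:
            m.write(json.dumps({"event": "shard", "path": final.name, "rows": len(batch)}) + "\n")
        written += 1
    return written
```

Calling write_shards a second time with the same rows writes nothing, which is the same "no-op once everything is encoded" idea. A crash between steps 2 and 3 leaves an unrecorded shard file that the next call simply overwrites.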

Properties:

  • Resumable: re-running the run command is a no-op once everything is encoded. Each shard is flushed atomically and only then recorded in the manifest, so if a crash interrupts a shard, the next run simply re-encodes the affected paipu.
  • Integrity: on every startup, each paipu is hashed with SHA-256 and compared against the manifest; identical content stored under different game ids is recorded as a duplicate, and the redundant XML is removed.
  • Self-healing: check --repair deletes corrupt XMLs and emits corrupt events; the next run re-fetches and re-encodes them.
  • Hand-dropped XMLs: drop files into <work>/xml/<game_id>.txt and they will be adopted automatically.
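The Integrity and hand-dropped-XML properties can be pictured as one directory scan: hash every <game_id>.txt file with SHA-256, flag files whose hash no longer matches what was recorded, keep the first file for each distinct hash, and treat unrecorded ids as newly adopted. This is a hypothetical sketch; scan_xml_dir, the `recorded` mapping standing in for the manifest, and keeping the alphabetically first duplicate are assumptions, not the pipeline's actual logic.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def scan_xml_dir(xml_dir: Path, recorded: dict[str, str]) -> dict[str, list[str]]:
    """Classify <game_id>.txt files against a game_id -> sha256 mapping.

    `recorded` is a hypothetical stand-in for what the manifest stores.
    Files whose content duplicates an earlier (sorted) id are deleted.
    """
    first_id_by_hash: dict[str, str] = {}
    result: dict[str, list[str]] = {"adopted": [], "duplicates": [], "mismatched": []}
    for path in sorted(xml_dir.glob("*.txt")):
        game_id, digest = path.stem, sha256_file(path)
        if game_id in recorded and recorded[game_id] != digest:
            result["mismatched"].append(game_id)  # content changed since recorded
            continue
        if digest in first_id_by_hash:
            result["duplicates"].append(game_id)  # same bytes as an earlier id
            path.unlink()
            continue
        first_id_by_hash[digest] = game_id
        if game_id not in recorded:
            result["adopted"].append(game_id)  # e.g. a hand-dropped file
    return result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        xml_dir = Path(tmp)
        (xml_dir / "a.txt").write_text("<mjloggm/>")
        (xml_dir / "b.txt").write_text("<mjloggm/>")  # same content as a.txt
        print(scan_xml_dir(xml_dir, recorded={}))
```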

See docs/advanced/paipu_pipeline.md for the full manifest schema, layout on disk, and recovery flows.
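The difference between a dry-run check and check --repair, as described under Self-healing, can be sketched with the standard library alone. The corruption test here (an empty, unreadable, or non-well-formed file) is an assumption, as are the check function and its JSON-lines event format; see docs/advanced/paipu_pipeline.md for the real manifest schema.

```python
import json
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def is_corrupt(path: Path) -> bool:
    """Illustrative criterion: empty, unreadable, or not well-formed XML."""
    try:
        data = path.read_bytes()
        if not data.strip():
            return True
        ET.fromstring(data)
        return False
    except (ET.ParseError, OSError):
        return True


def check(xml_dir: Path, manifest: Path, repair: bool = False) -> list[str]:
    """Return ids of corrupt paipu; with repair=True, delete them and log events."""
    corrupt = [p for p in sorted(xml_dir.glob("*.txt")) if is_corrupt(p)]
    if repair and corrupt:
        with manifest.open("a", encoding="utf-8") as m:
            for p in corrupt:
                p.unlink()
                # A later run sees the id as missing and can re-fetch it.
                m.write(json.dumps({"event": "corrupt", "game_id": p.stem}) + "\n")
    return [p.stem for p in corrupt]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        xml_dir = Path(tmp)
        (xml_dir / "ok.txt").write_text("<mjloggm><INIT/></mjloggm>")
        (xml_dir / "bad.txt").write_text("<mjloggm><INIT>")  # truncated
        print(check(xml_dir, xml_dir / "manifest.jsonl"))  # dry run
```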

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

Project Policies

Citing

If you use pymahjong in your research, please cite:

@inproceedings{han2022variational,
  title     = {Variational Oracle Guiding for Reinforcement Learning},
  author    = {Dongqi Han and Tadashi Kozuno and Xufang Luo and Zhao-Yun Chen 
               and Kenji Doya and Yuqing Yang and Dongsheng Li},
  booktitle = {International Conference on Learning Representations},
  year      = {2022},
  url       = {https://openreview.net/forum?id=pjqqxepwoMy}
}

License

This project is licensed under the Apache License 2.0.

Contact

Acknowledgements

The shanten rewrite prompted by issue #30 benefited from practical feedback from Apricot-S, who explicitly called out the limitations of meld/taatsu-counting shortcuts and pointed to stronger exact approaches.

When revisiting the design, I also consulted the following open-source repositories as algorithm references and prior-art surveys. No code from them is vendored into this repository, but they were useful for evaluating tradeoffs and for guiding the testing and validation strategy: