Follow-up on the w8a8 issue on a Mac M5 with 24GB (#31)

Yes, I can provide more details.

Environment:
- Hardware: MacBook Pro with Apple M5, 24GB RAM
- mano-cua: 1.1.1
- Local model: Mano-P w8a16, path: ~/.mano/models/Mano-P/w8a16
- Python env: ~/.mano/venv
- cider: 0.7.0
- cider.is_available(): True
- vlm_service: OK
- Current config was restored to w8a8=off after the test.

Observed behavior:

1. With w8a8 unset/off:
   - Inference was much faster.
   - Previously observed speeds were roughly:
     - prefill: ~1150 tok/s
     - decode: ~24 tok/s
     - peak memory: ~6.4GB
   - Each GUI step was still not instant, but much more usable.

2. After setting:
   mano-cua config --set w8a8 auto

   The slowdown was reproducible across multiple steps, not only the first step.

   Logs:
   - At startup:
     [cider] Converted 252 layers to CiderLinear in 27.1s

   Step 1:
   - decode: 67 tokens, 2.5 tok/s, peak_mem=6.3GB
   - step time: 48.0s
   - prefill: ~130 tok/s

   Step 2:
   - decode: 61 tokens, 2.8 tok/s, peak_mem=6.3GB
   - step time: 54.8s
   - prefill: ~128-130 tok/s

   Step 3:
   - decode: 64 tokens, 2.8 tok/s
   - step time: 55.3s

So it is not just the first step being slow due to model conversion/prewarm. The first step includes an extra 27.1s Cider conversion, but later steps are still very slow: around 50+ seconds per step, decode only ~2.5-2.8 tok/s.
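To see how much of each step is decode, here is a rough breakdown from the numbers in the logs above. It assumes the logged tok/s is the average rate over the whole decode phase, and it lumps everything else (prefill and whatever else a step does) into a single remainder. The variable and function names are mine, not from mano-cua.

```python
# Figures copied from the logs above:
# (decode tokens, decode tok/s, total step time in seconds)
steps = {
    "Step 1": (67, 2.5, 48.0),
    "Step 2": (61, 2.8, 54.8),
    "Step 3": (64, 2.8, 55.3),
}


def decode_seconds(tokens, tok_per_s):
    """Estimated decode time, assuming tok_per_s is an average over the decode phase."""
    return tokens / tok_per_s


breakdown = {}
for name, (tokens, rate, step_time) in steps.items():
    decode_s = decode_seconds(tokens, rate)
    remainder_s = step_time - decode_s  # prefill plus everything else in the step
    breakdown[name] = (decode_s, remainder_s)
    print(f"{name}: decode ~{decode_s:.1f}s, remainder ~{remainder_s:.1f}s")
```

Under that assumption, decode alone would account for roughly 22 to 27 seconds of every step, and the remainder is of similar size or larger, which fits the observation that prefill is slow as well.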

3. Switching back to:
   mano-cua config --set w8a8 off

   restored the previous faster behavior.

Another thing I noticed:
The config help says w8a8 default is auto, but in visual/agents/local.py the code appears to use:

   w8a8_mode = get_config("w8a8") or "off"

So if the config is unset, it actually behaves as off, not auto. This may be a docs/config mismatch.
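A minimal sketch of why the quoted expression resolves to off when nothing is configured. It uses a plain dict lookup as a hypothetical stand-in for get_config, and assumes get_config returns None (or an empty string) for an unset key; I have not verified how the real function behaves.

```python
def resolve_w8a8_mode(config):
    """Hypothetical stand-in for the quoted line in visual/agents/local.py.

    config.get("w8a8") plays the role of get_config("w8a8"); the assumption is
    that an unset key yields None.
    """
    # `or` falls through to "off" for any falsy value: None or an empty string.
    return config.get("w8a8") or "off"


print(resolve_w8a8_mode({}))                 # unset
print(resolve_w8a8_mode({"w8a8": "auto"}))   # explicitly set to auto
print(resolve_w8a8_mode({"w8a8": ""}))       # empty value
```

If the documented default of auto is the intended one, the fallback literal would need to be "auto" instead of "off". Given the numbers above, it may be safer to correct the help text rather than the code.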

Summary:
- The slowdown with w8a8=auto is reproducible.
- It affects every inference step, not only the first one.
- On this machine, w8a8 makes both prefill and decode much slower:
  - prefill drops from ~1150 tok/s to ~130 tok/s
  - decode drops from ~24 tok/s to ~2.5-2.8 tok/s
- The Cider conversion itself takes ~27.1s, but the main issue is that subsequent steps remain slow.
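For reference, the slowdown factors implied by the approximate figures above can be worked out as follows. The inputs are the rounded values from this report, so the ratios are only indicative.

```python
# Approximate throughput figures from this report, in tok/s.
prefill_off, prefill_auto = 1150, 130
decode_off = 24
decode_auto_low, decode_auto_high = 2.5, 2.8

prefill_slowdown = prefill_off / prefill_auto
decode_slowdown_max = decode_off / decode_auto_low    # against the slowest observed decode
decode_slowdown_min = decode_off / decode_auto_high   # against the fastest observed decode

print(f"prefill: ~{prefill_slowdown:.1f}x slower with w8a8=auto")
print(f"decode:  ~{decode_slowdown_min:.1f}x to ~{decode_slowdown_max:.1f}x slower with w8a8=auto")
```

Both phases come out roughly 9x slower with w8a8=auto on this machine.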
