Qwen3-Embedding-8B produces NaN embeddings for token 474 ("import")

### System Info

Docker image: ghcr.io/huggingface/text-embeddings-inference:cuda-1.9
Start command: `--tokenization-workers=16 --dtype float16 --auto-truncate --max-client-batch-size 128`
Host OS: Ubuntu



### Information

- [x] Docker
- [ ] The CLI directly

### Tasks

- [ ] An officially supported command
- [ ] My own modifications

### Reproduction

I've been running Qwen3-Embedding-8B with float16 dtype and noticed that any input that starts with the token "import" (token ID 474), such as "importance", "import", or "important", returns an all-NaN vector.

```bash
# Reproduction

curl <TEI_URL>/embed \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{"inputs": "importance"}'
# Returns: [[NaN, NaN, NaN, ...]]
```

### Expected behavior

While investigating, I found others reporting exactly the same issue: https://huggingface.co/Qwen/Qwen3-Embedding-8B/discussions/21. Padding the word with a leading space does seem to mitigate it.
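To check the leading-space mitigation programmatically, a minimal sketch along these lines may help. It assumes the `/embed` route and payload shown in the reproduction above; `base_url`, `embed`, `is_corrupt`, and `compare` are hypothetical names chosen for illustration, and treating `null` as a corrupt component is an assumption in case the server serializes NaN that way.

```python
import json
import math
import urllib.request


def embed(base_url: str, text: str) -> list:
    """POST one input to the /embed route and return the first embedding."""
    req = urllib.request.Request(
        f"{base_url}/embed",
        data=json.dumps({"inputs": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())[0]


def is_corrupt(vector: list) -> bool:
    """True if any component is NaN (or null, an assumed alternative encoding)."""
    return any(v is None or math.isnan(v) for v in vector)


def compare(base_url: str, word: str = "importance") -> dict:
    """Report corruption for the bare word and for the space-padded variant."""
    return {repr(t): is_corrupt(embed(base_url, t)) for t in (word, " " + word)}
```

Calling `compare("<TEI_URL>")` against a running server would be expected to flag the bare word and not the padded one if the mitigation holds, though that depends on the deployment.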

I checked out tag v1.9.2, traced through the model layer by layer, and found that the NaN originates from an FP16 overflow in the MLP layers. Here's the chain of events:

1. RMSNorm normalizes hidden states to ~1.0, which is perfectly safe in F16
2. Attention runs fine on these normalized values
3. The MLP's down_proj output, however, reaches values of roughly 2.95 million for this token
4. F16 can only represent finite values up to 65504, so this overflows to Inf
5. The residual add (Inf + finite) stays Inf
6. The next layer's RMSNorm receives Inf and produces NaN
7. NaN propagates through every remaining layer

The overflow first appears at layer 2 and corrupts the entire output from that point on.
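The chain above can be illustrated with a small standard-library sketch. This is not the TEI code path: the 4-element hidden state is made up, `to_f16` and `rms_norm` are hypothetical helpers, the RMSNorm omits the learned weight, and only the 2.95 million magnitude comes from the trace.

```python
import math
import struct


def to_f16(x: float) -> float:
    """Round x to IEEE 754 half precision, going to +/-Inf on overflow."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses values beyond the half-precision range;
        # a hardware cast would produce Inf instead.
        return math.copysign(math.inf, x)


def rms_norm(xs, eps=1e-6):
    """RMSNorm without a learned weight: x / sqrt(mean(x^2) + eps)."""
    mean_sq = sum(x * x for x in xs) / len(xs)
    scale = math.sqrt(mean_sq + eps)
    return [x / scale for x in xs]


# Hypothetical values; only the 2.95e6 magnitude is taken from the report.
residual = [0.5, -1.0, 0.25, 1.5]
down_proj_out = [0.1, 2.95e6, -0.2, 0.3]

mlp_f16 = [to_f16(v) for v in down_proj_out]                  # overflow to Inf
hidden = [to_f16(r + m) for r, m in zip(residual, mlp_f16)]   # Inf + finite = Inf
normed = rms_norm(hidden)                                     # Inf / Inf = NaN

print(mlp_f16)
print(hidden)
print(normed)
```

In this toy version only the overflowing component becomes NaN after the norm. In the real model, the following dense projections mix every input component into every output, which would explain why the whole vector ends up NaN.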

I'm not sure whether this is reasonable, but my hypothesis is that Qwen3-Embedding-8B was trained in BF16, so the MLP weights learned during training produce activations that exceed the F16 range, and running inference in F16 overflows.
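The range difference behind this hypothesis can be checked with a short sketch. BF16 keeps the 8-bit exponent of float32, so it covers a far wider range than F16 at lower precision. Here `fits_f16` and `to_bf16` are hypothetical helpers, and BF16 is emulated by truncating a float32 to its top 16 bits, which is an approximation of a real cast (hardware typically rounds to nearest).

```python
import math
import struct


def fits_f16(x: float) -> bool:
    """True if x can be stored as a finite IEEE 754 half-precision value."""
    try:
        struct.pack("<e", x)
        return True
    except OverflowError:
        return False


def to_bf16(x: float) -> float:
    """Emulate bfloat16 by keeping only the top 16 bits of the float32 encoding."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


value = 2.95e6  # magnitude observed at the down_proj output

print(fits_f16(65504.0))  # largest finite F16 value
print(fits_f16(value))    # beyond the F16 range
print(to_bf16(value))     # stays finite in BF16, with reduced precision
```

If the hypothesis is right, an activation of this size is unremarkable in BF16 and only becomes a problem once the same weights are run in F16.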
