[Bug] [Perf] [SIT] Llama-2-7b-chat-hf Model performance has degraded. #527

@liuxianglong17

Checklist

  • I searched related issues but found no solution.
  • The bug persists in the latest version.
  • Issues without environment info and a minimal reproducible demo are hard to resolve and may receive no feedback.
  • If this is not a bug report but a general question, please start a discussion at https://github.com/sgl-project/sglang/discussions. Otherwise, it will be closed.
  • Please use English. Otherwise, it will be closed.

Describe the bug

[Version Information] main
[Version problem] New version issue
[Assignee] wangyao
[Issue Time] 2026/5/12
[Valid issue] Yes
[Description]
The test case python/sglang/multimodal_gen/test/server/ascend/test_server_1_npu.py fails because the measured performance does not meet the required threshold.

Reproduction

[Test Steps]
Execute the test case python/sglang/multimodal_gen/test/server/ascend/test_server_1_npu.py
[Expected Result]
The test case passes.
[Actual Result]
The test case fails because the measured performance is below the required threshold.
[Opt] [TestCase Path]
python/sglang/multimodal_gen/test/server/ascend/test_server_1_npu.py
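
The test step above does not state how the file was launched. As a minimal sketch, assuming the test is collected by pytest and is run from the repository root, it could be invoked as below. The helper names `build_pytest_command` and `run_repro` are hypothetical and exist only for this illustration; they are not part of sglang.

```python
import subprocess
import sys

# Path copied from the issue; assumed to be relative to the repository root.
TEST_PATH = "python/sglang/multimodal_gen/test/server/ascend/test_server_1_npu.py"


def build_pytest_command(test_path: str = TEST_PATH) -> list[str]:
    """Build a pytest invocation for one test file (assumes pytest is the runner)."""
    # -v lists each test by name; -s lets the test's own output reach the console.
    return [sys.executable, "-m", "pytest", test_path, "-v", "-s"]


def run_repro(test_path: str = TEST_PATH) -> int:
    """Run the test file and return pytest's exit code (0 means every test passed)."""
    return subprocess.run(build_pytest_command(test_path), check=False).returncode
```

Calling `run_repro()` on the affected machine should return a nonzero exit code while the regression is present, and attaching the console output it produces would give maintainers the measured values next to the thresholds.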

Environment

A3, single-node, PD co-deployed
