
Trace File Refactor #829

Open
SkiHatDuckie wants to merge 25 commits into vllm-project:main from SkiHatDuckie:trace-merge


@SkiHatDuckie SkiHatDuckie commented Jun 22, 2026


Summary

This PR refactors the trace formats to separate format-agnostic trace replay functionality from format-specific functionality. Notably, all formats now share the same dataset deserializer. Each format instead implements two abstract classes, TraceDataArgs and TraceFormatBase.

Additional documentation has been added to better cover all supported trace formats and their different requirements.

The unique prefixes for cache resistance originally found in trace_synthetic.py (now trace_minimal.py) were removed because they are incompatible with the new model. They may be re-added as a feature in a future PR through other means.

Details

  • Added trace_common.py
    • All trace formats use the same TraceDatasetDeserializer
    • Moved commonly used functions such as generate_token_ids and decode_prompt to trace_common.py
    • Added TraceDataArgs: an abstract class inherited by all formats
    • Added TraceFormatBase and TraceFormatRegistry, which define an interface for format-specific requirements and functionality on top of TraceExamplesIterable
  • Replaced TraceSyntheticDatasetDeserializer and TraceSyntheticDataArgs with MinimalTraceFormat and MinimalTraceFormatArgs
  • Replaced TraceMooncakeDatasetDeserializer and TraceMooncakeDataArgs with MooncakeTraceFormat and MooncakeTraceFormatArgs
  • Renamed trace_synthetic.py -> trace_minimal.py
  • Renamed test_trace_synthetic.py -> test_trace_minimal.py
  • Added test_trace_common.py, and rearranged preexisting tests accordingly
  • Updated test_replay_profile.py, test_trace_replay.py and test_trace_replay_multiprocess.py
  • Fixed a bug with Mooncake format not working with multiprocessing
  • All trace formats now work with IterableDataset for streaming
  • Added documentation trace_file_formats.md to cover all trace formats supported by GuideLLM
  • Updated documentation in getting_started/benchmark.md and guides/datasets.md
  • Updated inline documentation
  • Updated import registry in data/deserializers/__init__.py
  • Moved common dataset validation checks to load_trace_rows
  • Removed unique prefixes for cache-resistance in trace_minimal.py
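
To illustrate the split described above, here is a minimal sketch of how a shared, format-agnostic loader could sit on top of a format interface and a registry. The class names TraceFormatBase, TraceFormatRegistry, MinimalTraceFormat and MooncakeTraceFormat come from the list above, but every method name, signature, and column name below (including `iter_trace_rows`, `required_columns` and `to_common_row`) is an assumption made for illustration and is not GuideLLM's actual API.

```python
from abc import ABC, abstractmethod
from collections.abc import Iterable, Iterator
from typing import Any


class TraceFormatBase(ABC):
    """Format-specific half (hypothetical interface, not GuideLLM's)."""

    name: str  # registry key, set by each subclass

    @abstractmethod
    def required_columns(self) -> set[str]:
        """Columns a raw row must contain for this format."""

    @abstractmethod
    def to_common_row(self, row: dict[str, Any]) -> dict[str, Any]:
        """Map a raw row onto the shared column names."""


class TraceFormatRegistry:
    """Maps format names to format classes (hypothetical)."""

    _formats: dict[str, type[TraceFormatBase]] = {}

    @classmethod
    def register(cls, fmt: type[TraceFormatBase]) -> type[TraceFormatBase]:
        cls._formats[fmt.name] = fmt
        return fmt  # returned unchanged so this works as a class decorator

    @classmethod
    def create(cls, name: str) -> TraceFormatBase:
        fmt = cls._formats.get(name)
        if fmt is None:
            raise ValueError(f"unknown trace format: {name!r}")
        return fmt()


@TraceFormatRegistry.register
class MinimalTraceFormat(TraceFormatBase):
    name = "minimal"

    def required_columns(self) -> set[str]:
        # Column names are assumed for this sketch.
        return {"timestamp", "prompt_tokens", "output_tokens"}

    def to_common_row(self, row: dict[str, Any]) -> dict[str, Any]:
        return {
            "timestamp": float(row["timestamp"]),
            "prompt_tokens": int(row["prompt_tokens"]),
            "output_tokens": int(row["output_tokens"]),
        }


@TraceFormatRegistry.register
class MooncakeTraceFormat(TraceFormatBase):
    name = "mooncake"

    def required_columns(self) -> set[str]:
        # Column names are assumed for this sketch.
        return {"timestamp", "input_length", "output_length"}

    def to_common_row(self, row: dict[str, Any]) -> dict[str, Any]:
        return {
            "timestamp": float(row["timestamp"]),
            "prompt_tokens": int(row["input_length"]),
            "output_tokens": int(row["output_length"]),
        }


def iter_trace_rows(
    format_name: str, rows: Iterable[dict[str, Any]]
) -> Iterator[dict[str, Any]]:
    """Format-agnostic half: validate and normalize rows lazily."""
    fmt = TraceFormatRegistry.create(format_name)
    required = fmt.required_columns()
    for index, row in enumerate(rows):
        missing = required.difference(row)
        if missing:
            raise ValueError(f"row {index} is missing columns: {sorted(missing)}")
        yield fmt.to_common_row(row)
```

Because `iter_trace_rows` is a generator, rows are validated and converted one at a time, which is the property that lets a loader like this feed a streaming dataset. Adding a format in this sketch means writing one registered subclass; the shared loader does not change.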

Test Plan

  • tox -e test-unit
  • tox -e test-integration
  • tox -e lint-check && tox -e type-check

Related Issues


  • "I certify that all code in this PR is my own, except as noted below."

Use of AI

  • Includes code generated or substantially modified by an AI agent
  • Includes tests generated or substantially modified by an AI agent

NOTE: the Generated-by or Assisted-by trailers should be used in git commit messages when code or tests were generated or substantially modified by an AI agent, as described in the project's DEVELOPING.md file.


git log

commit e6f5007
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Mon Jun 15 12:16:03 2026 -0400

Move dataset validation to load_trace_rows

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit 2e237ec
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Mon Jun 15 13:00:51 2026 -0400

Rename `TraceColumn` in test file to `TraceColumnGenerator`

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit 072cc28
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Mon Jun 15 15:11:25 2026 -0400

Add relative_timestamp column to deserialized dataset

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit 4cd77db
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Tue Jun 16 09:38:05 2026 -0400

Switch to streaming datasets for synthetic trace

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit a91ceb4
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Tue Jun 16 09:58:18 2026 -0400

Update docs

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit b10a590
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Tue Jun 16 16:58:54 2026 -0400

Add trace_common.py + classes

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit d24d8a2
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Wed Jun 17 10:57:16 2026 -0400

Repair broken test files

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit a6cb4b4
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Wed Jun 17 16:34:10 2026 -0400

Instantiate/Validate/Dispatch formats through TraceFormatArgs

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit c119743
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Thu Jun 18 15:09:33 2026 -0400

Rework format handling; flatten data args for CLI

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit a83a6c4
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Thu Jun 18 16:16:15 2026 -0400

Repair tests

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit e9c3d3a
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Thu Jun 18 16:18:38 2026 -0400

Remove TraceDataset from __all__

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit 5010011
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Thu Jun 18 16:25:29 2026 -0400

Move common funcs to trace_common

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit 4acc11c
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Mon Jun 22 11:04:41 2026 -0400

Add test_trace_common.py and rearrange tests

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit 85220c8
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Mon Jun 22 11:15:03 2026 -0400

Refactor test_trace_synthetic

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit 2e49c7e
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Mon Jun 22 11:28:35 2026 -0400

Rename trace_synthetic to trace_minimal

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit 2c1d450
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Mon Jun 22 11:49:30 2026 -0400

Improve text coverage

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit 4af294b
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Mon Jun 22 12:20:30 2026 -0400

Update inline docs

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit 175faeb
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Mon Jun 22 16:12:02 2026 -0400

Update docs

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit 283cfff
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Mon Jun 22 16:17:20 2026 -0400

Cleanup linting & docs

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit bb9c24a
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Wed Jun 24 10:07:04 2026 -0400

Spread `kind`ness

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit cfc23a9
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Wed Jun 24 10:47:16 2026 -0400

Update docs

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit 2f8e9c9
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Wed Jun 24 11:31:31 2026 -0400

Fix: Register formats with deserializer

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit be9e85e
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Wed Jun 24 11:35:31 2026 -0400

Satisfy linting

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit 42c9dcf
Author: SkiHatDuckie SkiHatDuckie@gmail.com
Date: Thu Jun 25 13:00:26 2026 -0400

Move `timestamps` outside the loop

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

Signed-off-by: SkiHatDuckie SkiHatDuckie@gmail.com

@mergify

mergify Bot commented Jun 23, 2026


Hi @SkiHatDuckie, the DCO check has failed. Please click on DCO in the Checks section for instructions on how to resolve this.

@SkiHatDuckie


Sorry, messed up the rebase. Give me a minute while I clean up the history.

@dbutenhof dbutenhof left a comment


Just logging a few doc comments I caught in a quick scan. I'll get to the code tomorrow morning...

Comment thread docs/getting-started/benchmark.md Outdated
Comment thread docs/getting-started/benchmark.md Outdated
Comment thread docs/getting-started/benchmark.md Outdated
Comment thread docs/guides/datasets.md Outdated
Comment thread docs/guides/datasets.md Outdated
Comment thread docs/guides/trace_file_formats.md Outdated
Comment thread docs/guides/trace_file_formats.md Outdated
Comment thread docs/guides/trace_file_formats.md Outdated
@mergify

mergify Bot commented Jun 24, 2026


Hi @SkiHatDuckie, the DCO check has failed. Please click on DCO in the Checks section for instructions on how to resolve this.

@sjmonson


Congrats on breaking the update-description job 😁

mergify Bot pushed a commit that referenced this pull request Jun 25, 2026
…#855)

## Summary
This is a separate PR for the bug fix contained in #829, in case we want to land only the bug fix for v0.7.0. It will be closed if the trace file refactor is merged, or after v0.7.0 is released.

## Details
- Fixed a bug with Mooncake format not working with multiprocessing

## Related Issues
- This is also fixed with #829 

---

- [x] "I certify that all code in this PR is my own, except as noted below."

## Use of AI

- [ ] Includes code generated or substantially modified by an AI agent
- [ ] Includes tests generated or substantially modified by an AI agent

> NOTE: the `Generated-by` or `Assisted-by` trailers should be used in git commit messages when code or tests were generated or substantially modified by an AI agent, as described in the project's [`DEVELOPING.md`](https://github.com/vllm-project/guidellm/blob/main/DEVELOPING.md) file.





---

# git log

commit 3b89ec2
Author: SkiHatDuckie <SkiHatDuckie@gmail.com>
Date:   Thu Jun 25 11:54:24 2026 -0400

    Hotfix: Add relative_timestamp column to output in Mooncake deserializer
    
    Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

commit 8d2cba0
Author: SkiHatDuckie <SkiHatDuckie@gmail.com>
Date:   Thu Jun 25 12:59:10 2026 -0400

    Move `timestamps` outside the loop
    
    Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>

---------

Signed-off-by: SkiHatDuckie <SkiHatDuckie@gmail.com>
Signed-off-by: SkiHatDuckie <63932363+SkiHatDuckie@users.noreply.github.com>

Development

Successfully merging this pull request may close these issues.

Feature Request: Add Mooncake Trace Data Support

3 participants