
feat(website): sort benchmark result. #1284

Merged
samchon merged 1 commit into main from feat/sort on Apr 8, 2026


samchon commented Apr 8, 2026

Member

This pull request introduces improvements to the sorting logic for benchmark experiments, specifically focusing on how vendor/model strings are compared and ordered. It refactors the comparison logic into reusable functions, adds comprehensive tests for vendor comparison, and updates the main sorting call to use the new logic.

Sorting and Comparison Logic Improvements:

  • Added a new compare function to AutoBeReplayComputer that sorts benchmark results first by aggregate score (descending), then by vendor/model string using the new compareVendors function. (packages/benchmark/src/replay/AutoBeReplayComputer.ts, L19-R45)
  • Implemented the compareVendors function, which parses vendor/model strings to compare model families, parameter sizes, and versions in a meaningful order (e.g., larger parameter sizes and higher versions come first; vendors are sorted alphabetically otherwise). Also added a helper parsePart function for parsing string segments as numbers. (packages/benchmark/src/replay/AutoBeReplayComputer.ts [1] [2])
  • Updated the main benchmark publishing script to use the new AutoBeReplayComputer.compare function for sorting experiments, ensuring consistent and correct ordering. (test/src/archive/publish.ts, L47-R47)
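
The two-level ordering described in the bullets above (aggregate score descending, then vendor/model string) can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the `ExperimentLike` shape, its field names, the score values, and the `example/model-a` slug are all assumptions made for the example, and the tie-breaker here is a plain alphabetical placeholder rather than the real compareVendors.

```typescript
// Hypothetical experiment shape; the real type in the repository may differ.
interface ExperimentLike {
  vendor: string; // vendor/model slug
  aggregate: number; // overall benchmark score (assumed field name)
}

// Placeholder tie-breaker: plain alphabetical order.
// The PR's compareVendors additionally understands sizes and versions.
const compareVendorsPlaceholder = (a: string, b: string): number =>
  a < b ? -1 : a > b ? 1 : 0;

// Higher aggregate score first; fall back to the vendor/model string on ties.
const compareSketch = (a: ExperimentLike, b: ExperimentLike): number => {
  const diff: number = b.aggregate - a.aggregate;
  return diff !== 0 ? diff : compareVendorsPlaceholder(a.vendor, b.vendor);
};

// Made-up scores, purely for illustration.
const experiments: ExperimentLike[] = [
  { vendor: "moonshotai/kimi-k2.5", aggregate: 70 },
  { vendor: "minimax/minimax-m2.7", aggregate: 85 },
  { vendor: "example/model-a", aggregate: 70 },
];

// Copy before sorting so the input array is left untouched.
const ranked: string[] = [...experiments]
  .sort(compareSketch)
  .map((e) => e.vendor);

console.log(ranked);
// ["minimax/minimax-m2.7", "example/model-a", "moonshotai/kimi-k2.5"]
```

Keeping the comparator as a standalone function is what lets the publishing script and the tests share one ordering rule instead of duplicating inline sort callbacks.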

Testing Enhancements:

Interface Update:

samchon self-assigned this Apr 8, 2026
samchon added the documentation (Improvements or additions to documentation) label Apr 8, 2026
samchon added this to WrtnLabs Apr 8, 2026
samchon marked this pull request as ready for review April 8, 2026 03:37
Copilot AI review requested due to automatic review settings April 8, 2026 03:37
samchon merged commit 0afd36f into main Apr 8, 2026
7 checks passed
samchon deleted the feat/sort branch April 8, 2026 03:38

Copilot AI left a comment

Contributor


Pull request overview

This PR refines benchmark experiment sorting by centralizing comparison logic in AutoBeReplayComputer, improving vendor/model ordering semantics, and adding targeted tests for vendor comparison behavior.

Changes:

  • Added AutoBeReplayComputer.compare and compareVendors to sort by aggregate score (desc) then vendor/model.
  • Updated benchmark publishing script to use the shared comparator.
  • Added a new test suite covering vendor/model comparison ordering rules.

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| packages/benchmark/src/replay/AutoBeReplayComputer.ts | Introduces shared comparators for benchmark sorting and a helper for parsing numeric parts. |
| test/src/archive/publish.ts | Replaces inline sort logic with AutoBeReplayComputer.compare for consistency. |
| test/src/features/benchmark/test_benchmark_compare_vendors.ts | Adds tests validating vendor/model sorting rules (params, variants, vendor ordering, numeric versions). |


Comment on lines +29 to +32
export const compareVendors = (a: string, b: string): number => {
const pa = a.split("-");
const pb = b.split("-");
const len = Math.max(pa.length, pb.length);

Copilot AI Apr 8, 2026


compareVendors splits the full vendor/model slug on -, which also splits vendor names that contain hyphens (e.g. z-ai/...) and mixes vendor/model into the same token stream. This can produce incorrect vendor-level alphabetical ordering and inconsistent comparisons. Consider splitting on / first (compare vendor segment(s) separately), then apply - tokenization only to the model portion.
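
One way the suggestion above might look in practice is to separate the vendor segment at the first `/` and tokenize only the model portion on `-`. The sketch below is an assumption about how that could be done, not code from this PR; `splitSlug` and the slug `z-ai/example-model-1` are hypothetical names used only for illustration.

```typescript
// Hypothetical helper: split "vendor/model" first, then tokenize the model part.
interface SlugParts {
  vendor: string; // everything before the first "/", hyphens preserved
  modelTokens: string[]; // model portion split on "-"
}

const splitSlug = (slug: string): SlugParts => {
  const index: number = slug.indexOf("/");
  // No vendor segment: treat the whole string as the model portion.
  if (index === -1) return { vendor: "", modelTokens: slug.split("-") };
  return {
    vendor: slug.slice(0, index),
    modelTokens: slug.slice(index + 1).split("-"),
  };
};

// Illustrative slug with a hyphenated vendor name (made up for this example).
const slug: string = "z-ai/example-model-1";

// Splitting the whole slug on "-" breaks the vendor name apart
// and glues part of it to the model name.
const naive: string[] = slug.split("-");
console.log(naive); // ["z", "ai/example", "model", "1"]

// Splitting on "/" first keeps the vendor intact.
const parts: SlugParts = splitSlug(slug);
console.log(parts.vendor); // "z-ai"
console.log(parts.modelTokens); // ["example", "model", "1"]
```

With the vendor isolated, alphabetical vendor ordering and model-token comparison can be applied independently, which avoids the mixed token stream the comment describes.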

Comment on lines +193 to +194
const n = Number(t);
return isNaN(n) ? null : n;

Copilot AI Apr 8, 2026


parsePart only recognizes numeric parts that are plain numbers or ...b/a...b. Real vendor/model slugs in the repo include patterns like moonshotai/kimi-k2.5 and minimax/minimax-m2.7, where the version token has a leading letter (k2.5, m2.7). Those won’t be parsed as numbers, so compareVendors falls back to lexicographic (ascending) ordering for versions, which contradicts the intended “higher versions first” behavior. Extend parsing to handle a leading alphabetic prefix + numeric version and add coverage for these cases.

Suggested change
- const n = Number(t);
- return isNaN(n) ? null : n;
+ const direct: number = Number(t);
+ if (!isNaN(direct)) return direct;
+ const prefixed: RegExpMatchArray | null = t.match(
+   /^([A-Za-z]+)(\d+(?:\.\d+)?)$/,
+ );
+ if (prefixed !== null) {
+   const numeric: number = Number(prefixed[2]);
+   return isNaN(numeric) ? null : numeric;
+ }
+ return null;
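
To see the effect of the suggested change in isolation, the snippet below wraps the same logic in a standalone function and compares lexicographic ordering against "higher versions first" ordering. It is a sketch under assumptions: `parsePartSketch` is a hypothetical name, it omits the parameter-size handling that the real parsePart performs before this point, and the tokens `m2.5` and `m10` are made up for illustration.

```typescript
// Hypothetical standalone version of the suggested parsing logic.
// Returns the numeric value of a token, or null when it is not numeric.
const parsePartSketch = (t: string): number | null => {
  // Plain numbers such as "4" or "2.5".
  const direct: number = Number(t);
  if (!isNaN(direct)) return direct;
  // A leading alphabetic prefix followed by a version, such as "k2.5" or "m2.7".
  const prefixed: RegExpMatchArray | null = t.match(
    /^([A-Za-z]+)(\d+(?:\.\d+)?)$/,
  );
  if (prefixed !== null) {
    const numeric: number = Number(prefixed[2]);
    return isNaN(numeric) ? null : numeric;
  }
  return null;
};

console.log(parsePartSketch("k2.5")); // 2.5
console.log(parsePartSketch("m2.7")); // 2.7
console.log(parsePartSketch("kimi")); // null

// Version tokens; "m2.5" and "m10" are invented for the comparison.
const tokens: string[] = ["m2.7", "m2.5", "m10"];

// Default string sort is lexicographic and ascending.
const lexicographic: string[] = [...tokens].sort();
console.log(lexicographic); // ["m10", "m2.5", "m2.7"]

// Sorting on the parsed number, descending, puts higher versions first.
const byVersion: string[] = [...tokens].sort(
  (a, b) => (parsePartSketch(b) ?? 0) - (parsePartSketch(a) ?? 0),
);
console.log(byVersion); // ["m10", "m2.7", "m2.5"]
```

The regular expression only accepts a purely alphabetic prefix followed by an integer or a single-dot decimal, so tokens with suffixes or multi-part versions still return null in this sketch.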

sunrabbit123 pushed a commit that referenced this pull request Apr 9, 2026