feat(website): sort benchmark result. by samchon · Pull Request #1284 · wrtnlabs/autobe

samchon · 2026-04-08T03:37:43Z

This pull request introduces improvements to the sorting logic for benchmark experiments, specifically focusing on how vendor/model strings are compared and ordered. It refactors the comparison logic into reusable functions, adds comprehensive tests for vendor comparison, and updates the main sorting call to use the new logic.

Sorting and Comparison Logic Improvements:

Added a new compare function to AutoBeReplayComputer that sorts benchmark results first by aggregate score (descending), then by vendor/model string using the new compareVendors function. (packages/benchmark/src/replay/AutoBeReplayComputer.ts, L19-R45)
Implemented the compareVendors function, which parses vendor/model strings to compare model families, parameter sizes, and versions in a meaningful order (e.g., larger parameter sizes and higher versions come first; vendors are sorted alphabetically otherwise). Also added a helper parsePart function for parsing string segments as numbers. (packages/benchmark/src/replay/AutoBeReplayComputer.ts [1] [2])
Updated the main benchmark publishing script to use the new AutoBeReplayComputer.compare function for sorting experiments, ensuring consistent and correct ordering. (test/src/archive/publish.ts, L47-R47)
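To make the ordering rules above concrete, here is a minimal sketch of a two-level comparator: aggregate score descending, then a token-wise vendor/model comparison. It is not the code merged in this PR. The `ExperimentLike` shape, the `Sketch`-suffixed function names, the trailing-`b` parameter-size rule, and the example slugs are all assumptions made for illustration.

```typescript
// Hypothetical record shape; the real IAutoBePlaygroundBenchmark type is not shown on this page.
interface ExperimentLike {
  vendor: string; // vendor/model slug, e.g. "acme/model-70b" (made-up example)
  score: number; // aggregate score (assumed field name)
}

// Parse a token such as "70", "2.5" or "70b" into a number, else null.
// Treating a trailing "b" as a parameter-size suffix is an assumption.
const parsePartSketch = (t: string): number | null => {
  const m = t.match(/^(\d+(?:\.\d+)?)b?$/i);
  return m === null ? null : Number(m[1]);
};

// Token-wise comparison: numeric tokens sort descending (larger first),
// everything else sorts alphabetically ascending.
const compareVendorsSketch = (a: string, b: string): number => {
  const pa = a.split("-");
  const pb = b.split("-");
  const len = Math.max(pa.length, pb.length);
  for (let i = 0; i < len; ++i) {
    if (i >= pa.length) return -1; // shorter slug first (assumption)
    if (i >= pb.length) return 1;
    const x = pa[i];
    const y = pb[i];
    if (x === y) continue;
    const nx = parsePartSketch(x);
    const ny = parsePartSketch(y);
    if (nx !== null && ny !== null) {
      if (nx !== ny) return ny - nx; // larger number first
      continue;
    }
    return x < y ? -1 : 1;
  }
  return 0;
};

// Score descending, then vendor/model as the tie-breaker.
const compareExperimentsSketch = (
  a: ExperimentLike,
  b: ExperimentLike,
): number =>
  a.score !== b.score
    ? b.score - a.score
    : compareVendorsSketch(a.vendor, b.vendor);

const sorted: ExperimentLike[] = [
  { vendor: "acme/model-8b", score: 80 },
  { vendor: "beta/model-3", score: 95 },
  { vendor: "acme/model-70b", score: 80 },
].sort(compareExperimentsSketch);

console.log(sorted.map((e) => e.vendor));
```

With these made-up inputs the highest score comes first, and the two entries tied at 80 are ordered by parameter size, so the 70b slug precedes the 8b one.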

Testing Enhancements:

Added a comprehensive test suite for compareVendors, covering various real-world and edge cases to ensure correct sorting by parameter size, vendor, version, and model variant. (test/src/features/benchmark/test_benchmark_compare_vendors.ts, R1-R95)

Interface Update:

Imported the IAutoBePlaygroundBenchmark interface where needed to support the new comparison logic. (packages/benchmark/src/replay/AutoBeReplayComputer.ts, R5)

Copilot

Pull request overview

This PR refines benchmark experiment sorting by centralizing comparison logic in AutoBeReplayComputer, improving vendor/model ordering semantics, and adding targeted tests for vendor comparison behavior.

Changes:

Added AutoBeReplayComputer.compare and compareVendors to sort by aggregate score (desc) then vendor/model.
Updated benchmark publishing script to use the shared comparator.
Added a new test suite covering vendor/model comparison ordering rules.

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 2 comments.

File	Description
packages/benchmark/src/replay/AutoBeReplayComputer.ts	Introduces shared comparators for benchmark sorting and a helper for parsing numeric parts.
test/src/archive/publish.ts	Replaces inline sort logic with `AutoBeReplayComputer.compare` for consistency.
test/src/features/benchmark/test_benchmark_compare_vendors.ts	Adds tests validating vendor/model sorting rules (params, variants, vendor ordering, numeric versions).

Copilot · 2026-04-08T03:44:22Z

+  export const compareVendors = (a: string, b: string): number => {
+    const pa = a.split("-");
+    const pb = b.split("-");
+    const len = Math.max(pa.length, pb.length);


compareVendors splits the full vendor/model slug on -, which also splits vendor names that contain hyphens (e.g. z-ai/...) and mixes vendor/model into the same token stream. This can produce incorrect vendor-level alphabetical ordering and inconsistent comparisons. Consider splitting on / first (compare vendor segment(s) separately), then apply - tokenization only to the model portion.
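The suggestion in this comment could look roughly like the sketch below: separate the vendor from the model at the first `/`, and tokenize only the model part on `-`. The `splitSlugSketch` helper and the slug `z-ai/example-model-2` are hypothetical and used only for illustration; they are not taken from the repository.

```typescript
interface SlugParts {
  vendor: string; // text before the first "/", hyphens kept intact
  tokens: string[]; // model portion split on "-"
}

// Split on "/" first so that hyphens in the vendor name are never tokenized.
const splitSlugSketch = (slug: string): SlugParts => {
  const index: number = slug.indexOf("/");
  if (index === -1) return { vendor: "", tokens: slug.split("-") };
  return {
    vendor: slug.slice(0, index),
    tokens: slug.slice(index + 1).split("-"),
  };
};

// Naive tokenization mixes vendor and model pieces into one stream.
const naive: string[] = "z-ai/example-model-2".split("-");
console.log(naive); // ["z", "ai/example", "model", "2"]

// Vendor-aware tokenization keeps the vendor whole.
const parts: SlugParts = splitSlugSketch("z-ai/example-model-2");
console.log(parts.vendor, parts.tokens); // "z-ai" ["example", "model", "2"]
```

A comparator built on this split can compare `vendor` strings alphabetically as a unit and apply the numeric rules only to `tokens`.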

Copilot · 2026-04-08T03:44:22Z

+  const n = Number(t);
+  return isNaN(n) ? null : n;


parsePart only recognizes numeric parts that are plain numbers or ...b/a...b. Real vendor/model slugs in the repo include patterns like moonshotai/kimi-k2.5 and minimax/minimax-m2.7, where the version token has a leading letter (k2.5, m2.7). Those won’t be parsed as numbers, so compareVendors falls back to lexicographic (ascending) ordering for versions, which contradicts the intended “higher versions first” behavior. Extend parsing to handle a leading alphabetic prefix + numeric version and add coverage for these cases.

Suggested change

-  const n = Number(t);
-  return isNaN(n) ? null : n;
+  const direct: number = Number(t);
+  if (!isNaN(direct)) return direct;
+  const prefixed: RegExpMatchArray | null = t.match(
+    /^([A-Za-z]+)(\d+(?:\.\d+)?)$/,
+  );
+  if (prefixed !== null) {
+    const numeric: number = Number(prefixed[2]);
+    return isNaN(numeric) ? null : numeric;
+  }
+  return null;
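As a quick illustration of what the suggested change does, the standalone sketch below applies the same two steps (direct `Number` conversion, then a letter-prefix match) to a few tokens. The function name `parseVersionTokenSketch` is hypothetical, the empty-string guard is an addition of this sketch (because `Number("")` evaluates to `0`), and the rest of the real parsePart body is not reproduced here.

```typescript
// Standalone variant of the suggested parsing steps, for illustration only.
const parseVersionTokenSketch = (t: string): number | null => {
  if (t.length === 0) return null; // guard added here: Number("") is 0, not NaN
  const direct: number = Number(t);
  if (!isNaN(direct)) return direct;
  // One or more letters followed by an integer or decimal, e.g. "k2.5".
  const prefixed: RegExpMatchArray | null = t.match(
    /^([A-Za-z]+)(\d+(?:\.\d+)?)$/,
  );
  return prefixed !== null ? Number(prefixed[2]) : null;
};

// Treat unparseable tokens as 0 for this demo comparison only.
const valueOf = (t: string): number => {
  const n: number | null = parseVersionTokenSketch(t);
  return n === null ? 0 : n;
};

console.log(parseVersionTokenSketch("k2.5")); // 2.5
console.log(parseVersionTokenSketch("m2.7")); // 2.7
console.log(parseVersionTokenSketch("32")); // 32
console.log(parseVersionTokenSketch("kimi")); // null

// "Higher versions first": numeric descending instead of lexicographic ascending.
const versionsDesc: string[] = ["k2", "k2.5"].sort(
  (a, b) => valueOf(b) - valueOf(a),
);
console.log(versionsDesc); // ["k2.5", "k2"]
```

One consequence of this approach is that the letter prefix is discarded, so tokens such as `k2.5` and `m2.5` parse to the same number; any tie between them has to be resolved by other parts of the comparison.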

feat(website): sort benchmark result.

841e4ae

samchon self-assigned this Apr 8, 2026

samchon added the documentation label (Improvements or additions to documentation) Apr 8, 2026

samchon added this to WrtnLabs Apr 8, 2026

samchon marked this pull request as ready for review April 8, 2026 03:37

Copilot AI review requested due to automatic review settings April 8, 2026 03:37

samchon merged commit 0afd36f into main Apr 8, 2026
7 checks passed

samchon deleted the feat/sort branch April 8, 2026 03:38

Copilot started reviewing on behalf of samchon April 8, 2026 03:38

Copilot AI reviewed Apr 8, 2026

sunrabbit123 pushed a commit that referenced this pull request Apr 9, 2026

feat(website): sort benchmark result. (#1284)

df74275

2 participants
