Fix xgboost categorical codes #773

Open
andife wants to merge 6 commits into onnx:main from andife:fix-xgboost-categorical-codes
@andife andife commented Jun 9, 2026


No description provided.

andife added 6 commits June 9, 2026 17:11
XGBoost stores category codes (0, 1, 2...) in its tree JSON dump for
categorical splits. The ONNX converter builds BRANCH_EQ nodes that
compare against those codes. However, the tests were passing the actual
category values (e.g. 65, 66, 67 for 'A', 'B', 'C' after ord()) instead
of the pandas category codes (0, 1, 2...).

Since the values never matched any BRANCH_EQ split condition, all samples
fell to the same default leaf, producing a constant ONNX output and causing
assertions to fail with 100% element mismatch.

Fix: use X["f0"].cat.codes instead of X[["f0"]].values in:
- test_xgb_regressor_categorical_hist
- test_xgb_regressor_categorical_hist_native
- test_xgb_regressor_only_categorical_hist

Signed-off-by: Andreas Fehlner <fehlner@arcor.de>
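The mismatch described in the commit message above can be illustrated with a minimal sketch. The toy column, the `split_codes` list, and the `branch_eq_matches` helper are hypothetical and only mimic an equality split; they are not the converter's code or the actual test fixtures.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one categorical feature with three categories.
X = pd.DataFrame(
    {"f0": pd.Categorical(["A", "B", "C", "A"], categories=["A", "B", "C"])}
)

# What the tests fed to ONNX before the fix: values derived via ord().
raw_values = np.array([ord(v) for v in X["f0"]], dtype=np.float32)

# What the fix feeds instead: the pandas category codes (0, 1, 2, ...).
codes = X["f0"].cat.codes.to_numpy().astype(np.float32)


def branch_eq_matches(inputs, split_codes):
    """Per-row check: does the input equal any split code (equality split)?"""
    return np.isin(inputs, split_codes)


# Assumed split codes, as an XGBoost dump would store them for this column.
split_codes = [0, 1, 2]

print(raw_values)                                  # ord()-based values
print(codes)                                       # category codes
print(branch_eq_matches(raw_values, split_codes))  # no row matches
print(branch_eq_matches(codes, split_codes))       # every row matches
```

With the `ord()`-based inputs no equality condition can fire, so every row would take the default path, which matches the constant-output symptom described above.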
LightGBM's predict_proba() with a custom objective returns raw scores
(not probabilities) and emits a warning to that effect. Newer skl2onnx
(installed alongside onnx>=1.18) correctly wraps binary classifier output
with a sigmoid node, so the ONNX model produces probabilities while
LightGBM produces raw scores, which causes the assertion to fail.

Fix: detect at runtime whether the ONNX output is probabilities (rows sum
to 1) or raw scores, and apply scipy.special.expit (sigmoid) to the
LightGBM raw scores before comparing in the probability case. This keeps
backward compatibility with older skl2onnx where ONNX returns raw scores.

Signed-off-by: Andreas Fehlner <fehlner@arcor.de>
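The runtime check described in the commit message above could look roughly like the sketch below. The helper names `looks_like_probabilities` and `expected_from_raw`, the tolerance, and the assumed output shapes (two probability columns versus a single raw-score column) are illustrative assumptions, not the code in this pull request.

```python
import numpy as np
from scipy.special import expit


def looks_like_probabilities(onnx_out, atol=1e-5):
    """Heuristic: 2-D output, values in [0, 1], and every row sums to 1."""
    out = np.asarray(onnx_out, dtype=np.float64)
    if out.ndim != 2 or out.shape[1] < 2:
        return False
    in_range = np.all((out >= 0.0) & (out <= 1.0))
    return bool(in_range and np.allclose(out.sum(axis=1), 1.0, atol=atol))


def expected_from_raw(raw_scores, onnx_out):
    """Map LightGBM raw scores to whatever form the ONNX output uses."""
    raw = np.asarray(raw_scores, dtype=np.float64).ravel()
    if looks_like_probabilities(onnx_out):
        # Newer-converter case: ONNX applied a sigmoid, so do the same here.
        p1 = expit(raw)
        return np.column_stack([1.0 - p1, p1])
    # Older-converter case: ONNX returned raw scores, compare them directly.
    return raw


# Hypothetical raw scores and the two possible ONNX output forms.
raw = np.array([-2.0, 0.0, 3.0])
onnx_probs = np.column_stack([1.0 - expit(raw), expit(raw)])
onnx_raw = raw.reshape(-1, 1)

print(expected_from_raw(raw, onnx_probs))
print(expected_from_raw(raw, onnx_raw))
```

The range check is included alongside the row-sum check so that raw scores which happen to sum to 1 across a row are less likely to be mistaken for probabilities.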