
[fix(docker)]: build fine-tune env from source instead of the unpublished release wheel #2028

Open
devangpratap wants to merge 2 commits into kvcache-ai:main from devangpratap:fix/1762-docker-source-install

Conversation


@devangpratap devangpratap commented May 31, 2026

Contributor

What does this PR do?

Fixes #1762.

The docker/Dockerfile sft build downloads a prebuilt ktransformers wheel:

https://github.com/kvcache-ai/ktransformers/releases/download/v0.5.3/ktransformers-0.5.3+cu128torch28fancy-cp312-cp312-linux_x86_64.whl

No ktransformers release (v0.5.0 through v0.6.2) publishes wheel assets, so that URL returns 404 and curl -fsSL aborts the build.
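The failure mode can be reproduced outside Docker. The sketch below is illustrative only: `download_or_explain` is a hypothetical helper, not something in the Dockerfile. It relies on the documented behavior of curl's `-f` flag, which turns an HTTP error response (status 400 or above, such as a 404) into a nonzero exit status (typically 22), so any `RUN` step that chains the download with `&&` stops there.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Dockerfile): download a file and, on
# failure, report curl's exit status before propagating it to the caller.
download_or_explain() {
    url="$1"
    out="$2"
    # -f: fail on HTTP errors (>= 400); -sS: quiet, but still print errors;
    # -L: follow redirects (GitHub release assets redirect to a CDN).
    if curl -fsSL -o "$out" "$url"; then
        echo "downloaded: $url" >&2
    else
        rc=$?
        echo "download failed (curl exit $rc): $url" >&2
        return "$rc"
    fi
}

# Example with the release URL quoted above (needs network access):
# download_or_explain \
#   "https://github.com/kvcache-ai/ktransformers/releases/download/v0.5.3/ktransformers-0.5.3+cu128torch28fancy-cp312-cp312-linux_x86_64.whl" \
#   /tmp/ktransformers.whl
```

Because the helper returns curl's own status, a caller can distinguish an HTTP error from, for example, a DNS failure by inspecting the exit code.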

On current main the top-level ktransformers package is a thin metapackage (kt-kernel plus [sft] extras), and the repo is already cloned into the image. This PR installs the package from that cloned source instead of the missing wheel, and aligns the fine-tune env with the dependencies that current kt-kernel pins.

Changes

  • Remove the KTRANSFORMERS_VERSION / KTRANSFORMERS_WHEEL ARGs and the dead wheel download.
  • Build the local kt-kernel from source in the fine-tune env (same as the serve env), then install ktransformers[sft] from the cloned /workspace/ktransformers. This uses the checked-out C++ kernels and avoids pip falling back to a PyPI kt-kernel that may not exist for an unreleased version.
  • Bump the fine-tune env to torch==2.9.1, which is what kt-kernel (pulled in by [sft]) pins.
  • Switch FLASH_ATTN_WHEEL to the matching torch2.9 build from the same v2.8.3 release.
  • Add editables to the build tools, required by the editable --no-build-isolation install of LLaMA-Factory.
  • Drop the removed wheel from the cleanup step.

What was tested

Replicated the fine-tune env dependency resolution in a nvidia/cuda:12.8.1-cudnn-devel-ubuntu24.04 container (CUDA 12.8, Python 3.12), running the same commands in the same order as the Dockerfile fine-tune stage:

  • torch==2.9.1 (cu128), then LLaMA-Factory[torch,metrics], then ktransformers[sft], then the flash_attn torch2.9 wheel.
  • All install with exit code 0. torch stays at 2.9.1+cu128 after every step (no silent up/downgrade).
  • kt_kernel, llamafactory, transformers (transformers-kt 5.6.0), and accelerate (accelerate-kt 1.14.0) import.
  • pip check reports no broken requirements.
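The "torch stays pinned" check can be scripted along these lines. This is a minimal sketch, not the exact commands used in the run above: `version_matches_pin` is a hypothetical helper, and the commented usage assumes `python` and `pip` resolve to the fine-tune env.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the reported version is exactly the pin,
# optionally followed by a local version suffix such as "+cu128".
version_matches_pin() {
    reported="$1"   # e.g. 2.9.1+cu128
    pin="$2"        # e.g. 2.9.1
    case "$reported" in
        "$pin"|"$pin"+*) return 0 ;;   # "2.9.1" or "2.9.1+<local>"
        *) return 1 ;;                 # anything else, including "2.9.10"
    esac
}

# Usage after each install step (assumes the fine-tune env is active):
# version_matches_pin "$(python -c 'import torch; print(torch.__version__)')" 2.9.1 \
#     || { echo "torch pin drifted" >&2; exit 1; }
# pip check
```

Matching on the exact pin plus an optional `+` suffix avoids a false pass on a version such as 2.9.10, which a plain prefix comparison would accept.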

This confirms the torch/flash_attn/editables alignment and that ktransformers[sft] plus LLaMA-Factory coexist under torch 2.9.1. In that run kt-kernel came from PyPI as a stand-in for dependency resolution.

What was not tested

  • A full docker build of this Dockerfile was not run. The base image (docker.1ms.run) and the tsinghua conda/pip mirrors are not reachable from the test machine, and the single framework stage also compiles DeepEP, kt-kernel (all CPU variants), and sglang for the serve env, which this PR does not touch.
  • The local kt-kernel source build in the fine-tune env (CPUINFER_BUILD_ALL_VARIANTS=1 ./install.sh build) was not run here. It mirrors the existing serve-env build step exactly, but it is a heavy C++/CUDA compile that was not executed in this verification.
  • The editables addition depends on the hatchling version resolved at build time. It was required to reproduce a clean editable install here; it is harmless if the mirror-resolved hatchling would not have needed it.

Before submitting

  • Read the contributor guideline.
  • Did you write any new necessary tests? No automated test added; a Dockerfile change is verified by reproducing the install sequence as described above.

…shed release wheel

The sft build downloaded a prebuilt ktransformers wheel from a release URL that
404s (no ktransformers release publishes wheel assets), so docker build aborted
at the curl step. Install ktransformers[sft] from the already-cloned source
instead, and align the fine-tune env with what current kt-kernel pins: torch
2.9.1, the matching flash_attn torch2.9 wheel, and editables for the editable
no-build-isolation install of LLaMA-Factory.

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor


Code Review

This pull request updates the Dockerfile to install ktransformers from the cloned source tree instead of downloading a prebuilt wheel, and upgrades PyTorch to version 2.9.1 along with the corresponding flash_attn wheel. A critical issue was identified where installing ktransformers[sft] directly without building the local kt-kernel package first will cause pip to download kt-kernel from PyPI, ignoring local C++ kernel modifications. It is recommended to build and install the local kt-kernel in the fine-tune environment prior to installing ktransformers.

Comment thread docker/Dockerfile
Comment on lines 359 to 362
  RUN --mount=type=cache,target=/root/.cache/pip \
  if [ "$FUNCTIONALITY" = "sft" ]; then \
- /opt/miniconda3/envs/fine-tune/bin/pip install /workspace/${KTRANSFORMERS_WHEEL}; \
+ /opt/miniconda3/envs/fine-tune/bin/pip install "/workspace/ktransformers[sft]"; \
  fi

Contributor


high

Installing ktransformers[sft] directly from /workspace/ktransformers without first building and installing the local kt-kernel package in the fine-tune environment will cause pip to resolve and download kt-kernel from PyPI (or the configured mirror). This means any local modifications to the C++ kernels in kt-kernel/ will be ignored during fine-tuning, and the install could also fail if the matching version of kt-kernel is not yet published.

To ensure consistency and that local changes are correctly built and used, we should build and install the local kt-kernel in the fine-tune environment using the same install.sh script, just as is done for the serve environment.

RUN --mount=type=cache,target=/root/.cache/pip \
    if [ "$FUNCTIONALITY" = "sft" ]; then \
        . /opt/miniconda3/etc/profile.d/conda.sh && conda activate fine-tune \
        && cd /workspace/ktransformers/kt-kernel \
        && CPUINFER_BUILD_ALL_VARIANTS=1 ./install.sh build \
        && pip install "/workspace/ktransformers[sft]"; \
    fi

…ng ktransformers

Per review feedback: building kt-kernel from the cloned source (as the serve env
already does) means the locally checked-out C++ kernels are used and pip does not
fall back to a PyPI kt-kernel, which may not exist for an unreleased version.
@devangpratap

Contributor Author

@jdai0 sorry for the ping, this is ready for review, just needs the run-ci label. Thanks

@jdai0 jdai0 added the run-ci label Jun 8, 2026

jdai0 commented Jun 8, 2026

Collaborator

Thanks, I've added the run-ci label.


Development

Successfully merging this pull request may close these issues.

The whl referenced in the Dockerfile has been removed, so the Docker image cannot be deployed
