Dockerfile: catch errors at every step#238
Conversation
|
Prefer using |
EricLBuehler
left a comment
There was a problem hiding this comment.
Hi @sempervictus - thanks for the PR!
If you could use the SHELL instruction as in EricLBuehler/mistral.rs#1612, that would be amazing!
polarathene
left a comment
There was a problem hiding this comment.
In addition to the SHELL instruction request, there are two other things I noticed:
- Thread args being set to 64 seems unintentional?
- The oneAPI/MKL support should leverage HereDoc too where I've provided two separate suggestions. Those may be better suited as a separate PR.
Also please ensure you retain the comment for the openmpi-bin package only being required to support the NCCL feature.
| ARG RAYON_NUM_THREADS=64 | ||
| ARG RUST_NUM_THREADS=64 |
There was a problem hiding this comment.
I think this was not intended as part of the PR? CI will not like this according to the comment above these lines:
| ARG RAYON_NUM_THREADS=64 | |
| ARG RUST_NUM_THREADS=64 | |
| ARG RAYON_NUM_THREADS=4 | |
| ARG RUST_NUM_THREADS=4 |
| # MKL run dependencies | ||
| RUN curl -s https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB \ | ||
| | gpg --dearmor | tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null \ | ||
| && echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | \ | ||
| tee /etc/apt/sources.list.d/oneAPI.list \ | ||
| && apt-get update \ | ||
| && apt-get install -y intel-oneapi-hpc-toolkit |
There was a problem hiding this comment.
The migration of this block is a tad unnecessary for the focus of the PR, but isolating the support required for the feature is nice 👍
It appears that the repo addition is adapted from the official guidance, although I would suggest a slight adjustment:
| # MKL run dependencies | |
| RUN curl -s https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB \ | |
| | gpg --dearmor | tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null \ | |
| && echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | \ | |
| tee /etc/apt/sources.list.d/oneAPI.list \ | |
| && apt-get update \ | |
| && apt-get install -y intel-oneapi-hpc-toolkit | |
| # Add runtime support for the MKL feature: | |
| RUN <<HEREDOC | |
| curl -fsSL https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB \ | |
| | gpg --dearmor > /usr/share/keyrings/upstream-intel-oneapi.gpg | |
| echo "deb [signed-by=/usr/share/keyrings/upstream-intel-oneapi.gpg] https://apt.repos.intel.com/oneapi all main" > /etc/apt/sources.list.d/upstream-intel-oneapi.list | |
| apt-get -qq update | |
| apt-get -qq install intel-oneapi-hpc-toolkit | |
| HEREDOC |
There was a problem hiding this comment.
Or even better, something like this to adopt the DEB822 format (since the older list format is deprecated):
| # MKL run dependencies | |
| RUN curl -s https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB \ | |
| | gpg --dearmor | tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null \ | |
| && echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | \ | |
| tee /etc/apt/sources.list.d/oneAPI.list \ | |
| && apt-get update \ | |
| && apt-get install -y intel-oneapi-hpc-toolkit | |
| ### Add runtime support for the MKL feature ### | |
| # 1. Add the upstream package repository for Intel oneAPI: | |
| COPY <<HEREDOC /etc/apt/sources.list.d/upstream-intel-oneapi.sources | |
| Types: deb | |
| URIs: https://apt.repos.intel.com/oneapi | |
| Suites: all | |
| Components: main | |
| Signed-By: /usr/share/keyrings/upstream-intel-oneapi.gpg | |
| HEREDOC | |
| # 2. Add public GPG key for verifying trust and install package: | |
| RUN <<HEREDOC | |
| curl -fsSL https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB \ | |
| | gpg --dearmor > /usr/share/keyrings/upstream-intel-oneapi.gpg | |
| apt-get -qq update | |
| apt-get -qq install intel-oneapi-hpc-toolkit | |
| HEREDOC |
This change is not absolutely necessary, the existing .list format remains compatible for the time being but is deprecated in favor of DEB822 .sources.
Ubuntu 24.04+ and Debian 12+ already use the newer format (from /etc/apt/sources.list => /etc/apt/sources.list.d/*.sources), so it is encouraged to migrate to the DEB822 format.
|
Review feedback raised as separate PRs: |
Can change it to use that but then we would lose the ability to run conditional tests within the command chains a la |
|
@EricLBuehler - i dont have time to address the bot-generated spam from @polarathene. Flooding the zone with automated "solutions" buried in walls of LLM-generated text and thought up by an automaton is fairly common practice for racking up GH street cred when applying for jobs these days, we have interviewed a couple of bartenders-come-developers and the like over the last couple of years who are very fond of the practice and none of them got past the first round. It pollutes the source tree, the issues list, and the atmosphere. I am happy to contribute what i know to your projects but this isn't a productive work environment given the tenor and cadence. |
I don't use LLMs to assist with my dev / reviews 😕 I have provided you with legitimate feedback, it's not my problem you cannot handle the verbosity and choose to dismiss it. If anyone is contributing slop that is coming from you, as my feedback has raised concerns over numerous times. Instead of owning up to these errors (as I have done when it happens), you choose to protect your ego by throwing insults my way? 🤷♂️ My solutions aren't automated, I took the time to write everything, I'm just very thorough and lack time to revise my thoughts into something more condense. Any similarity to LLMs is due to a well known issue for autistics, you'll find a variety of resources on it if you genuinely cared, but I don't expect you to given how you've chosen to carry yourself. I apologize if you've been upset by my interactions with you, but I am critical and unbiased with my review feedback which I believe is best for the projects I contribute reviews to. I've always been open to discussing changes with you, but with the interests best aligned with the project. I'm not seeking any GH street cred. I don't find my skills to be in much demand, nor anyone in a hiring context to care about my OSS contributions given they're predominantly Docker, CI, and mail servers. |
It's already been changed via #241 Readability is preferred for maintenance. There is no need to rely on short-circuit chaining when you can use an actual conditional within the HereDoc content. See below however, unless I misunderstood you AFAIK you're mistaken anyway? 🤷♂️ Without the FROM fedora:42
ARG WITH_FEATURES="foo bar"
RUN <<HEREDOC
echo "${WITH_FEATURES}" | grep mpi >/dev/null && echo 'build with mpi'
HEREDOCIt's only a non-issue if there was something else to run as a subsequent command since the last encountered exit status would be returned. This won't happen when you're using an if clause instead of a ternary statement short-circuit behaviour: FROM fedora:42
SHELL ["/bin/bash", "-e", "-o", "pipefail", "-c"]
ARG WITH_FEATURES="foo bar"
RUN <<HEREDOC
if $(grep -q mpi <<< "${WITH_FEATURES}"); then
echo 'build with mpi'
fi
HEREDOCFor reference: FROM fedora:42
SHELL ["/bin/bash", "-e", "-o", "pipefail", "-c"]
ARG WITH_FEATURES="foo bar"
RUN <<HEREDOC
# Silently continues onto the next grep command, even with `-e`:
# This behaviour is explained here: https://github.com/EricLBuehler/candle-vllm/pull/238#discussion_r2258564391
grep -q mpi <<< "${WITH_FEATURES}" && echo 'build with mpi'
# Will fail at this point of the script with `-e`:
grep -q mpi <<< "${WITH_FEATURES}"
# Without the SHELL instruction / `-e`, the script will complete instead of fail:
echo 'done'
HEREDOCYour statement seems to contrast to the above reproduction example, where you claim that first command would have failed and bailed unintentionally as your complaint? So that difference is dependent on the ternary condition being called earlier or last, whereas it would not matter with a proper if statement clause. |
polarathene
left a comment
There was a problem hiding this comment.
I'm still in favour of the two separate focused PRs:
Where you can conditionally opt-out of the MKL package install afterwards with the suggested ARG hoisting and conditional guard:
grep -qv 'mkl' <<< "${WITH_FEATURES}" && exit 0Although at that point it may be better to as my review feedback suggests, use the check to append the package to a list?
|
|
||
| # MKL build dependencies | ||
| RUN curl -s https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB \ | ||
| RUN echo "$WITH_FEATURES" | grep -q "mkl" || exit 0 && curl -s https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB \ |
There was a problem hiding this comment.
FWIW if that exit 0 were echo 'fail' it would continue on with the &&, rather than what might be expected / implied as separate conditional branches.
Even with the requested SHELL instruction, early failure on a non-zero exit code is ignored if && or || follow the command, such that false && true would still produce exit code 1 but not bail even though the && branch was not executed.
As your intent is to bail early but not fail the RUN layer so the build continues if the feature isn't enabled, switching to HereDoc with the conditional skip added to the start would be better / clearer to grok?:
# Skip this step if the expected feature is missing:
if grep -q 'mkl' <<< "${WITH_FEATURES}"; then
exit 0
fi
# Feature enablement here...Slightly more verbose, but far more maintainer friendly?
If the if statement spanning several lines feels like a deal-breaker, you could instead do this:
# Skip this step if the expected feature is missing:
grep -qv 'mkl' <<< "${WITH_FEATURES}" && exit 0There shouldn't be much harm in adding the repo + gpg key regardless though, so I'd still suggest preferring #239
After which a follow-up PR could bring the package install back into the common RUN instruction and use a list of packages with a conditional to append the package to that list instead?
| ARG CUDA_COMPUTE_CAP=70 | ||
| ARG RAYON_NUM_THREADS=4 | ||
| ARG RUST_NUM_THREADS=4 | ||
| ARG RUSTFLAGS="-Z threads=${RUST_NUM_THREADS}" |
There was a problem hiding this comment.
Why did you move all these? They were fine where they were before, co-located with the RUN for building the project where they're relevant. These don't need to be added to each earlier RUN step.
| rm -rf /var/lib/apt/lists/* | ||
| HEREDOC | ||
|
|
||
| ARG WITH_FEATURES="cuda,cudnn,nccl,mkl,mpi" |
There was a problem hiding this comment.
Since you'd like to use this ARG in the runtime stage as well to infer if the package should be installed, I would suggest hoisting the ARG with default assignment to the the start of the Dockerfile, then leave these as ARG WITH_FEATURES to inherit that.
| HEREDOC | ||
|
|
||
| RUN curl https://sh.rustup.rs -sSf | bash -s -- -y | ||
| RUN curl https://sh.rustup.rs -sSf | bash -s -- -y && test -f /root/.cargo/bin/rustup |
There was a problem hiding this comment.
This won't be necessary when using SHELL?: #241
For example (bash subdomain instead of sh subdomain to produce failure):
SHELL ["/bin/bash", "-ex", "-o", "pipefail", "-c"]
RUN <<HEREDOC
curl -fsSL https://bash.rustup.rs | bash -s -- -y
echo 'added with HereDoc for example purposes only'
HEREDOCwould fail like this:
0.601 + curl -fsSL https://bash.rustup.rs
0.601 + bash -s -- -y
0.633 curl: (6) Could not resolve host: bash.rustup.rs
Without SHELL, the subsequent echo command would run.
Something could go wrong while the shell script piped in is run, but that's something the script itself should be responsible at detecting and returning an exit code for.
Were you running into a particular error beyond the network issue with curl that you felt was necessary to better guard against to warrant bailing early?
These changes prevent multiple commands running in any step from failing silently due to subsequent ones succeeding.