The temperature parameter on LocalLLM.__init__ is passed to Llama(...) at
construction, but current versions of llama-cpp-python apply temperature at
sampling time (per call), not at model load. As a result, changing the value
in __init__ has no effect on output: the library's default sampling
temperature is used for every call.
Separately, no seed is passed, so even when temperature is correctly applied,
identical responses can repeat across runs.
This affects Lesson 01's Exercise 2 ("Change the temperature in shared/llm.py
and observe the response"). The change is silently ignored.
Repro:
- Set temperature: float = 1.5 in LocalLLM.__init__.
- Run lesson_01_basic_chat() twice with the same prompt.
- Observe byte-identical responses.
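
The comparison in the last step can be automated with a small helper. This is only a sketch: collect_responses and all_identical are hypothetical names, and it assumes the lesson code can be wrapped in a zero-argument callable that returns the response text (if lesson_01_basic_chat() prints instead of returning, its output would need to be captured first).

```python
from typing import Callable, List


def collect_responses(generate: Callable[[], str], runs: int = 2) -> List[str]:
    """Call `generate` `runs` times and return the responses in order."""
    return [generate() for _ in range(runs)]


def all_identical(responses: List[str]) -> bool:
    """True when every response is the same string (or the list is empty)."""
    return len(set(responses)) <= 1
```

With temperature set to 1.5 and sampling working as intended, all_identical(collect_responses(...)) would be expected to return False for most prompts; a consistent True matches the behaviour reported here.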
Proposed fix:
- Store self.temperature = temperature in __init__.
- In generate(), fall back to self.temperature when no override is passed.
- Add seed=-1 to the Llama(...) call so each load gets fresh randomness.
- Remove the now-unused temperature=temperature from the Llama(...) call.
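
A minimal sketch of the proposed change follows. It is not the actual contents of shared/llm.py. FakeLlama is a hypothetical stand-in for llama_cpp.Llama so the example runs without a model file, the 0.7 default is an assumption, and the sketch assumes the backend is called as llm(prompt, temperature=...) and returns a completion dict with the text under choices[0].

```python
from typing import Any, Dict, Optional


class FakeLlama:
    """Hypothetical stand-in for llama_cpp.Llama; records what it was called with."""

    def __init__(self, seed: int = -1) -> None:
        self.seed = seed
        self.last_temperature: Optional[float] = None

    def __call__(self, prompt: str, temperature: float, **kwargs: Any) -> Dict[str, Any]:
        # Remember the sampling temperature and mimic a completion-style dict.
        self.last_temperature = temperature
        return {"choices": [{"text": f"echo: {prompt}"}]}


class LocalLLM:
    def __init__(self, llm: Any, temperature: float = 0.7) -> None:
        # In the real class this would be a Llama(...) instance built with
        # seed=-1 and without the temperature keyword.
        self.llm = llm
        self.temperature = temperature  # kept for use at sampling time

    def generate(self, prompt: str, temperature: Optional[float] = None) -> str:
        # Use the per-call override if given, otherwise the instance default.
        temp = self.temperature if temperature is None else temperature
        result = self.llm(prompt, temperature=temp)
        return result["choices"][0]["text"]


backend = FakeLlama(seed=-1)
chat = LocalLLM(backend, temperature=1.5)
chat.generate("hello")  # backend.last_temperature is now 1.5
```

The fallback checks for None rather than truthiness so that an explicit temperature=0.0 override is still honoured.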
Happy to submit a PR if this is a confirmed bug.