Skip to content

temperature parameter in shared/llm.py is silently ignored at generation time #6

@howardjaw

Description

@howardjaw

The temperature parameter on LocalLLM.__init__ is passed to Llama(...) at
construction, but current versions of llama-cpp-python apply temperature at
sampling time (per-call), not at model load. As a result, changing the value
in __init__ has no effect on output — the library's internal default takes
over for every call.

Separately, no seed is passed, so even when temperature is correctly applied,
identical responses can repeat across runs.

This affects Lesson 01's Exercise 2 ("Change the temperature in shared/llm.py
and observe the response"). The change is silently ignored.

Repro:

  1. Set temperature: float = 1.5 in LocalLLM.__init__.
  2. Run lesson_01_basic_chat() twice with the same prompt.
  3. Observe byte-identical responses.

Proposed fix:

  • Store self.temperature = temperature in __init__.
  • In generate(), fall back to self.temperature when no override is passed.
  • Add seed=-1 to the Llama(...) call so each load gets fresh randomness.
  • Remove the now-unused temperature=temperature from the Llama(...) call.

Happy to submit a PR if this is a confirmed bug.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions