feat(google-vertex): update model YAMLs [bot] #946
Conversation — command: /test-models — Gateway test results
Failures (49)
Error: Code snippet:
from google import genai
from google.genai import types
_endpoint = "https://internal.devtest.truefoundry.tech/api/llm"
_api_key = "***"
_full_model = "test-v2-vertex/google/gemini-3-flash-preview"
_parts = _full_model.split("/")
_provider_account = _parts[0]
_model_id = "/".join(_parts[1:])
if "/" in _model_id:
_model_id = _model_id.rsplit("/", 1)[-1]
_base_url = f"{_endpoint}/gemini/{_provider_account}/proxy"
client = genai.Client(
api_key=_api_key,
http_options=types.HttpOptions(base_url=_base_url),
)
contents = [
types.Content(role="user", parts=[types.Part.from_text(text="Hi")]),
types.Content(role="model", parts=[types.Part.from_text(text="Hi, how can I help you")]),
types.Content(role="user", parts=[types.Part.from_text(text="How to calculate 3^3^3^3? Think step by step and show all reasoning.")]),
]
config = types.GenerateContentConfig(
system_instruction="You are a helpful assistant. You MUST think step by step and show your reasoning. Never skip reasoning steps.",
thinking_config=types.ThinkingConfig(
include_thoughts=True,
thinking_budget=5000,
),
)
_chunks = []
for chunk in client.models.generate_content_stream(
model=_model_id,
contents=contents,
config=config,
):
_chunks.append(chunk)
if chunk.candidates and chunk.candidates[0].content and chunk.candidates[0].content.parts:
for part in chunk.candidates[0].content.parts:
if not part.text:
continue
if part.thought:
print(f"[Thinking] {part.text}", end="", flush=True)
else:
print(part.text, end="", flush=True)
_thought_detected = False
for _chunk in _chunks:
if not _chunk.candidates or not _chunk.candidates[0].content:
continue
for _part in _chunk.candidates[0].content.parts:
if not _part.text:
continue
if _part.thought:
_thought_detected = True
print(_part.text, end="", flush=True)
else:
print(_part.text, end="", flush=True)
if not _thought_detected:
_usage = getattr(_chunks[-1], "usage_metadata", None) if _chunks else None
if _usage and getattr(_usage, "thoughts_token_count", 0):
_thought_detected = True
if not _thought_detected:
raise Exception("VALIDATION FAILED: reasoning stream - no thinking information in GenAI stream")
print("\nVALIDATION: reasoning stream SUCCESS")
Error: Code snippet:
from openai import OpenAI
import json
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response_schema = json.loads('''{
"title": "CalendarEvent",
"type": "object",
"properties": {
"name": { "type": "string" },
"date": { "type": "string" },
"participants": {
"type": "array",
"items": { "type": "string" }
}
},
"required": ["name", "date", "participants"],
"additionalProperties": false
}''')
response = client.chat.completions.create(
model="test-v2-vertex/deepseek-ai-deepseek-v3-1",
messages=[
{"role": "system", "content": "Extract the event information as JSON."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
],
response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
stream=False,
)
import json as _json
_content = response.choices[0].message.content
print(_content)
if not _content:
raise Exception("VALIDATION FAILED: structured-output - response content is empty")
_parsed = _json.loads(_content)
if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
raise Exception("VALIDATION FAILED: structured-output - missing expected fields (name, date, participants)")
if not isinstance(_parsed.get("participants"), list):
raise Exception("VALIDATION FAILED: structured-output - 'participants' is not a list, schema not enforced")
if set(_parsed.keys()) != {"name", "date", "participants"}:
raise Exception(
f"VALIDATION FAILED: structured-output - unexpected keys present: {set(_parsed.keys())}"
)
print("VALIDATION: structured-output SUCCESS")
Error: Code snippet:
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/deepseek-ai-deepseek-v3-1",
messages=[
{"role": "system", "content": "You are a helpful assistant. You MUST think step by step and show your reasoning. Never skip reasoning steps."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
],
reasoning_effort="medium",
stream=True,
)
_reasoning_detected = False
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
print(delta.content, end="", flush=True)
if getattr(delta, "reasoning_content", None) is not None:
_reasoning_detected = True
if getattr(delta, "reasoning", None) is not None:
_reasoning_detected = True
_usage = getattr(chunk, "usage", None)
if _usage is not None:
_details = getattr(_usage, "completion_tokens_details", None)
if _details and getattr(_details, "reasoning_tokens", 0) > 0:
_reasoning_detected = True
if not _reasoning_detected:
raise Exception("VALIDATION FAILED: reasoning stream - no reasoning information in stream")
print("\nVALIDATION: reasoning stream SUCCESS")
Error: Code snippet:
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a location.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city name, e.g. London",
},
},
"required": ["location"],
"additionalProperties": False,
},
"strict": True,
},
},
]
response = client.chat.completions.create(
model="test-v2-vertex/deepseek-ai-deepseek-v3-1",
messages=[
{"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
],
tools=tools,
tool_choice="auto",
stream=False,
)
_message = response.choices[0].message
if _message.tool_calls:
for _tc in _message.tool_calls:
print(f"Function: {_tc.function.name}")
print(f"Arguments: {_tc.function.arguments}")
else:
print(_message.content)
if not _message.tool_calls or len(_message.tool_calls) == 0:
raise Exception("VALIDATION FAILED: tool-call - no tool calls in response")
print("VALIDATION: tool-call SUCCESS")
Error: Code snippet:
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a location.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city name, e.g. London",
},
},
"required": ["location"],
"additionalProperties": False,
},
"strict": True,
},
},
]
response = client.chat.completions.create(
model="test-v2-vertex/deepseek-ai-deepseek-v3-1",
messages=[
{"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
],
tools=tools,
tool_choice="auto",
stream=True,
)
_tool_calls_made = False
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
print(delta.content, end="", flush=True)
if delta.tool_calls:
_tool_calls_made = True
for _tc in delta.tool_calls:
if _tc.function:
print(_tc.function.arguments or "", end="", flush=True)
if not _tool_calls_made:
raise Exception("VALIDATION FAILED: tool-call stream - no tool calls received")
print("\nVALIDATION: tool-call stream SUCCESS")
Error: Code snippet:
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/deepseek-ai-deepseek-v3-1",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "What is the capital of France?"},
],
max_tokens=256,
temperature=0.7,
stream=True,
)
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
print(delta.content, end="", flush=True)
Error: Code snippet:
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/deepseek-ai-deepseek-v3-1",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "What is the capital of France?"},
],
max_tokens=256,
temperature=0.7,
stream=False,
)
print(response.choices[0].message.content)
Error: Code snippet:
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/deepseek-ai-deepseek-v3-1",
messages=[
{"role": "system", "content": "You are a helpful assistant. You MUST think step by step and show your reasoning. Never skip reasoning steps."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
],
reasoning_effort="medium",
stream=False,
)
_usage = getattr(response, "usage", None)
_reasoning_detected = False
_choices = getattr(response, "choices", None)
if _choices and len(_choices) > 0:
_message = getattr(_choices[0], "message", None)
else:
_message = None
if _message and getattr(_message, "content", None) is not None:
print(_message.content)
if _usage is not None:
_output_token_details = getattr(_usage, "completion_tokens_details", None)
if _output_token_details and getattr(_output_token_details, "reasoning_tokens", 0) > 0:
_reasoning_detected = True
elif getattr(_usage, "reasoning", None) is not None:
_reasoning_detected = True
if getattr(_message, "reasoning_content", None) is not None:
_reasoning_detected = True
elif getattr(_message, "reasoning", None) is not None:
_reasoning_detected = True
if not _reasoning_detected:
print("Response: ", response)
raise Exception("VALIDATION FAILED: reasoning - no reasoning information in response")
print("VALIDATION: reasoning SUCCESS")
Error: Code snippet:
from openai import OpenAI
import json
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response_schema = json.loads('''{
"title": "CalendarEvent",
"type": "object",
"properties": {
"name": { "type": "string" },
"date": { "type": "string" },
"participants": {
"type": "array",
"items": { "type": "string" }
}
},
"required": ["name", "date", "participants"],
"additionalProperties": false
}''')
response = client.chat.completions.create(
model="test-v2-vertex/deepseek-ai-deepseek-v3-1",
messages=[
{"role": "system", "content": "Extract the event information as JSON."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
],
response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
stream=True,
)
import json as _json
_accumulated = ""
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
_accumulated += delta.content
print(delta.content, end="", flush=True)
if not _accumulated:
raise Exception("VALIDATION FAILED: structured-output stream - no content received")
_parsed = _json.loads(_accumulated)
if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
raise Exception("VALIDATION FAILED: structured-output stream - missing expected fields (name, date, participants)")
if not isinstance(_parsed.get("participants"), list):
raise Exception("VALIDATION FAILED: structured-output stream - 'participants' is not a list, schema not enforced")
if set(_parsed.keys()) != {"name", "date", "participants"}:
raise Exception(
f"VALIDATION FAILED: structured-output stream - unexpected keys present: {set(_parsed.keys())}"
)
print("\nVALIDATION: structured-output stream SUCCESS")
Error: Code snippet:
from openai import OpenAI
import json
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response_schema = json.loads('''{
"title": "CalendarEvent",
"type": "object",
"properties": {
"name": { "type": "string" },
"date": { "type": "string" },
"participants": {
"type": "array",
"items": { "type": "string" }
}
},
"required": ["name", "date", "participants"],
"additionalProperties": false
}''')
response = client.chat.completions.create(
model="test-v2-vertex/openai-gpt-oss-120b-maas",
messages=[
{"role": "system", "content": "Extract the event information as JSON."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
],
response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
stream=True,
)
import json as _json
_accumulated = ""
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
_accumulated += delta.content
print(delta.content, end="", flush=True)
if not _accumulated:
raise Exception("VALIDATION FAILED: structured-output stream - no content received")
_parsed = _json.loads(_accumulated)
if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
raise Exception("VALIDATION FAILED: structured-output stream - missing expected fields (name, date, participants)")
if not isinstance(_parsed.get("participants"), list):
raise Exception("VALIDATION FAILED: structured-output stream - 'participants' is not a list, schema not enforced")
if set(_parsed.keys()) != {"name", "date", "participants"}:
raise Exception(
f"VALIDATION FAILED: structured-output stream - unexpected keys present: {set(_parsed.keys())}"
)
print("\nVALIDATION: structured-output stream SUCCESS")
Error: Code snippet:
from openai import OpenAI
import json
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response_schema = json.loads('''{
"title": "CalendarEvent",
"type": "object",
"properties": {
"name": { "type": "string" },
"date": { "type": "string" },
"participants": {
"type": "array",
"items": { "type": "string" }
}
},
"required": ["name", "date", "participants"],
"additionalProperties": false
}''')
response = client.chat.completions.create(
model="test-v2-vertex/openai-gpt-oss-120b-maas",
messages=[
{"role": "system", "content": "Extract the event information as JSON."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
],
response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
stream=False,
)
import json as _json
_content = response.choices[0].message.content
print(_content)
if not _content:
raise Exception("VALIDATION FAILED: structured-output - response content is empty")
_parsed = _json.loads(_content)
if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
raise Exception("VALIDATION FAILED: structured-output - missing expected fields (name, date, participants)")
if not isinstance(_parsed.get("participants"), list):
raise Exception("VALIDATION FAILED: structured-output - 'participants' is not a list, schema not enforced")
if set(_parsed.keys()) != {"name", "date", "participants"}:
raise Exception(
f"VALIDATION FAILED: structured-output - unexpected keys present: {set(_parsed.keys())}"
)
print("VALIDATION: structured-output SUCCESS")
Error: Code snippet:
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a location.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city name, e.g. London",
},
},
"required": ["location"],
"additionalProperties": False,
},
"strict": True,
},
},
]
response = client.chat.completions.create(
model="test-v2-vertex/deepseek-ai-deepseek-v3-2",
messages=[
{"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
],
tools=tools,
tool_choice="auto",
stream=False,
)
_message = response.choices[0].message
if _message.tool_calls:
for _tc in _message.tool_calls:
print(f"Function: {_tc.function.name}")
print(f"Arguments: {_tc.function.arguments}")
else:
print(_message.content)
if not _message.tool_calls or len(_message.tool_calls) == 0:
raise Exception("VALIDATION FAILED: tool-call - no tool calls in response")
print("VALIDATION: tool-call SUCCESS")
Error: Code snippet:
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/deepseek-ai-deepseek-v3-2",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "What is the capital of France?"},
],
max_tokens=256,
temperature=0.7,
stream=True,
)
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
print(delta.content, end="", flush=True)
Error: Code snippet:
from openai import OpenAI
import json
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response_schema = json.loads('''{
"title": "CalendarEvent",
"type": "object",
"properties": {
"name": { "type": "string" },
"date": { "type": "string" },
"participants": {
"type": "array",
"items": { "type": "string" }
}
},
"required": ["name", "date", "participants"],
"additionalProperties": false
}''')
response = client.chat.completions.create(
model="test-v2-vertex/deepseek-ai-deepseek-v3-2",
messages=[
{"role": "system", "content": "Extract the event information as JSON."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
],
response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
stream=False,
)
import json as _json
_content = response.choices[0].message.content
print(_content)
if not _content:
raise Exception("VALIDATION FAILED: structured-output - response content is empty")
_parsed = _json.loads(_content)
if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
raise Exception("VALIDATION FAILED: structured-output - missing expected fields (name, date, participants)")
if not isinstance(_parsed.get("participants"), list):
raise Exception("VALIDATION FAILED: structured-output - 'participants' is not a list, schema not enforced")
if set(_parsed.keys()) != {"name", "date", "participants"}:
raise Exception(
f"VALIDATION FAILED: structured-output - unexpected keys present: {set(_parsed.keys())}"
)
print("VALIDATION: structured-output SUCCESS")
Error: Code snippet:
from openai import OpenAI
import json
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response_schema = json.loads('''{
"title": "CalendarEvent",
"type": "object",
"properties": {
"name": { "type": "string" },
"date": { "type": "string" },
"participants": {
"type": "array",
"items": { "type": "string" }
}
},
"required": ["name", "date", "participants"],
"additionalProperties": false
}''')
response = client.chat.completions.create(
model="test-v2-vertex/deepseek-ai-deepseek-v3-2",
messages=[
{"role": "system", "content": "Extract the event information as JSON."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
],
response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
stream=True,
)
import json as _json
_accumulated = ""
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
_accumulated += delta.content
print(delta.content, end="", flush=True)
if not _accumulated:
raise Exception("VALIDATION FAILED: structured-output stream - no content received")
_parsed = _json.loads(_accumulated)
if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
raise Exception("VALIDATION FAILED: structured-output stream - missing expected fields (name, date, participants)")
if not isinstance(_parsed.get("participants"), list):
raise Exception("VALIDATION FAILED: structured-output stream - 'participants' is not a list, schema not enforced")
if set(_parsed.keys()) != {"name", "date", "participants"}:
raise Exception(
f"VALIDATION FAILED: structured-output stream - unexpected keys present: {set(_parsed.keys())}"
)
print("\nVALIDATION: structured-output stream SUCCESS")
Error: Code snippet:
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a location.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city name, e.g. London",
},
},
"required": ["location"],
"additionalProperties": False,
},
"strict": True,
},
},
]
response = client.chat.completions.create(
model="test-v2-vertex/deepseek-ai-deepseek-v3-2",
messages=[
{"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
],
tools=tools,
tool_choice="auto",
stream=True,
)
_tool_calls_made = False
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
print(delta.content, end="", flush=True)
if delta.tool_calls:
_tool_calls_made = True
for _tc in delta.tool_calls:
if _tc.function:
print(_tc.function.arguments or "", end="", flush=True)
if not _tool_calls_made:
raise Exception("VALIDATION FAILED: tool-call stream - no tool calls received")
print("\nVALIDATION: tool-call stream SUCCESS")
Error: Code snippet:
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/deepseek-ai-deepseek-v3-2",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "What is the capital of France?"},
],
max_tokens=256,
temperature=0.7,
stream=False,
)
print(response.choices[0].message.content)
Error: Code snippet:
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a location.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city name, e.g. London",
},
},
"required": ["location"],
"additionalProperties": False,
},
"strict": True,
},
},
]
response = client.chat.completions.create(
model="test-v2-vertex/google-gemma4",
messages=[
{"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
],
tools=tools,
tool_choice="auto",
stream=True,
)
_tool_calls_made = False
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
print(delta.content, end="", flush=True)
if delta.tool_calls:
_tool_calls_made = True
for _tc in delta.tool_calls:
if _tc.function:
print(_tc.function.arguments or "", end="", flush=True)
if not _tool_calls_made:
raise Exception("VALIDATION FAILED: tool-call stream - no tool calls received")
print("\nVALIDATION: tool-call stream SUCCESS")
Error: Code snippet:
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/google-gemma4",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "What is the capital of France?"},
],
max_tokens=256,
temperature=0.7,
stream=False,
)
print(response.choices[0].message.content)
Error: Code snippet:
from openai import OpenAI
import json
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response_schema = json.loads('''{
"title": "CalendarEvent",
"type": "object",
"properties": {
"name": { "type": "string" },
"date": { "type": "string" },
"participants": {
"type": "array",
"items": { "type": "string" }
}
},
"required": ["name", "date", "participants"],
"additionalProperties": false
}''')
response = client.chat.completions.create(
model="test-v2-vertex/google-gemma4",
messages=[
{"role": "system", "content": "Extract the event information as JSON."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
],
response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
stream=True,
)
import json as _json
_accumulated = ""
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
_accumulated += delta.content
print(delta.content, end="", flush=True)
if not _accumulated:
raise Exception("VALIDATION FAILED: structured-output stream - no content received")
_parsed = _json.loads(_accumulated)
if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
raise Exception("VALIDATION FAILED: structured-output stream - missing expected fields (name, date, participants)")
if not isinstance(_parsed.get("participants"), list):
raise Exception("VALIDATION FAILED: structured-output stream - 'participants' is not a list, schema not enforced")
if set(_parsed.keys()) != {"name", "date", "participants"}:
raise Exception(
f"VALIDATION FAILED: structured-output stream - unexpected keys present: {set(_parsed.keys())}"
)
print("\nVALIDATION: structured-output stream SUCCESS")
Error: Code snippet:
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/google-gemma4",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "What is the capital of France?"},
],
max_tokens=256,
temperature=0.7,
stream=True,
)
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
print(delta.content, end="", flush=True)
Error: Code snippet:
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/google-gemma4",
messages=[
{"role": "system", "content": "You are a helpful assistant. You MUST think step by step and show your reasoning. Never skip reasoning steps."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
],
reasoning_effort="medium",
stream=False,
)
_usage = getattr(response, "usage", None)
_reasoning_detected = False
_choices = getattr(response, "choices", None)
if _choices and len(_choices) > 0:
_message = getattr(_choices[0], "message", None)
else:
_message = None
if _message and getattr(_message, "content", None) is not None:
print(_message.content)
if _usage is not None:
_output_token_details = getattr(_usage, "completion_tokens_details", None)
if _output_token_details and getattr(_output_token_details, "reasoning_tokens", 0) > 0:
_reasoning_detected = True
elif getattr(_usage, "reasoning", None) is not None:
_reasoning_detected = True
if getattr(_message, "reasoning_content", None) is not None:
_reasoning_detected = True
elif getattr(_message, "reasoning", None) is not None:
_reasoning_detected = True
if not _reasoning_detected:
print("Response: ", response)
raise Exception("VALIDATION FAILED: reasoning - no reasoning information in response")
print("VALIDATION: reasoning SUCCESS")
Error: Code snippet:
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/google-gemma4",
messages=[
{"role": "system", "content": "You are a helpful assistant. You MUST think step by step and show your reasoning. Never skip reasoning steps."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
],
reasoning_effort="medium",
stream=True,
)
_reasoning_detected = False
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
print(delta.content, end="", flush=True)
if getattr(delta, "reasoning_content", None) is not None:
_reasoning_detected = True
if getattr(delta, "reasoning", None) is not None:
_reasoning_detected = True
_usage = getattr(chunk, "usage", None)
if _usage is not None:
_details = getattr(_usage, "completion_tokens_details", None)
if _details and getattr(_details, "reasoning_tokens", 0) > 0:
_reasoning_detected = True
if not _reasoning_detected:
raise Exception("VALIDATION FAILED: reasoning stream - no reasoning information in stream")
print("\nVALIDATION: reasoning stream SUCCESS")
Error: Code snippet:
from openai import OpenAI
import json
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response_schema = json.loads('''{
"title": "CalendarEvent",
"type": "object",
"properties": {
"name": { "type": "string" },
"date": { "type": "string" },
"participants": {
"type": "array",
"items": { "type": "string" }
}
},
"required": ["name", "date", "participants"],
"additionalProperties": false
}''')
response = client.chat.completions.create(
model="test-v2-vertex/google-gemma4",
messages=[
{"role": "system", "content": "Extract the event information as JSON."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
],
response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
stream=False,
)
import json as _json
_content = response.choices[0].message.content
print(_content)
if not _content:
raise Exception("VALIDATION FAILED: structured-output - response content is empty")
_parsed = _json.loads(_content)
if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
raise Exception("VALIDATION FAILED: structured-output - missing expected fields (name, date, participants)")
if not isinstance(_parsed.get("participants"), list):
raise Exception("VALIDATION FAILED: structured-output - 'participants' is not a list, schema not enforced")
if set(_parsed.keys()) != {"name", "date", "participants"}:
raise Exception(
f"VALIDATION FAILED: structured-output - unexpected keys present: {set(_parsed.keys())}"
)
print("VALIDATION: structured-output SUCCESS")
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a location.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city name, e.g. London",
},
},
"required": ["location"],
"additionalProperties": False,
},
"strict": True,
},
},
]
response = client.chat.completions.create(
model="test-v2-vertex/google-gemma4",
messages=[
{"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
],
tools=tools,
tool_choice="auto",
stream=False,
)
_message = response.choices[0].message
if _message.tool_calls:
for _tc in _message.tool_calls:
print(f"Function: {_tc.function.name}")
print(f"Arguments: {_tc.function.arguments}")
else:
print(_message.content)
if not _message.tool_calls or len(_message.tool_calls) == 0:
raise Exception("VALIDATION FAILED: tool-call - no tool calls in response")
print("VALIDATION: tool-call SUCCESS")
Error: Code snippet
from openai import OpenAI
import json
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response_schema = json.loads('''{
"title": "CalendarEvent",
"type": "object",
"properties": {
"name": { "type": "string" },
"date": { "type": "string" },
"participants": {
"type": "array",
"items": { "type": "string" }
}
},
"required": ["name", "date", "participants"],
"additionalProperties": false
}''')
response = client.chat.completions.create(
model="test-v2-vertex/minimaxai-minimax-m2",
messages=[
{"role": "system", "content": "Extract the event information as JSON."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
],
response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
stream=True,
)
import json as _json
_accumulated = ""
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
_accumulated += delta.content
print(delta.content, end="", flush=True)
if not _accumulated:
raise Exception("VALIDATION FAILED: structured-output stream - no content received")
_parsed = _json.loads(_accumulated)
if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
raise Exception("VALIDATION FAILED: structured-output stream - missing expected fields (name, date, participants)")
if not isinstance(_parsed.get("participants"), list):
raise Exception("VALIDATION FAILED: structured-output stream - 'participants' is not a list, schema not enforced")
if set(_parsed.keys()) != {"name", "date", "participants"}:
raise Exception(
f"VALIDATION FAILED: structured-output stream - unexpected keys present: {set(_parsed.keys())}"
)
print("\nVALIDATION: structured-output stream SUCCESS")
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a location.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city name, e.g. London",
},
},
"required": ["location"],
"additionalProperties": False,
},
"strict": True,
},
},
]
response = client.chat.completions.create(
model="test-v2-vertex/minimaxai-minimax-m2",
messages=[
{"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
],
tools=tools,
tool_choice="auto",
stream=False,
)
_message = response.choices[0].message
if _message.tool_calls:
for _tc in _message.tool_calls:
print(f"Function: {_tc.function.name}")
print(f"Arguments: {_tc.function.arguments}")
else:
print(_message.content)
if not _message.tool_calls or len(_message.tool_calls) == 0:
raise Exception("VALIDATION FAILED: tool-call - no tool calls in response")
print("VALIDATION: tool-call SUCCESS")
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/minimaxai-minimax-m2",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "What is the capital of France?"},
],
max_tokens=256,
temperature=0.7,
stream=False,
)
print(response.choices[0].message.content)
Error: Code snippet
from openai import OpenAI
import json
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response_schema = json.loads('''{
"title": "CalendarEvent",
"type": "object",
"properties": {
"name": { "type": "string" },
"date": { "type": "string" },
"participants": {
"type": "array",
"items": { "type": "string" }
}
},
"required": ["name", "date", "participants"],
"additionalProperties": false
}''')
response = client.chat.completions.create(
model="test-v2-vertex/minimaxai-minimax-m2",
messages=[
{"role": "system", "content": "Extract the event information as JSON."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
],
response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
stream=False,
)
import json as _json
_content = response.choices[0].message.content
print(_content)
if not _content:
raise Exception("VALIDATION FAILED: structured-output - response content is empty")
_parsed = _json.loads(_content)
if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
raise Exception("VALIDATION FAILED: structured-output - missing expected fields (name, date, participants)")
if not isinstance(_parsed.get("participants"), list):
raise Exception("VALIDATION FAILED: structured-output - 'participants' is not a list, schema not enforced")
if set(_parsed.keys()) != {"name", "date", "participants"}:
raise Exception(
f"VALIDATION FAILED: structured-output - unexpected keys present: {set(_parsed.keys())}"
)
print("VALIDATION: structured-output SUCCESS")
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a location.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city name, e.g. London",
},
},
"required": ["location"],
"additionalProperties": False,
},
"strict": True,
},
},
]
response = client.chat.completions.create(
model="test-v2-vertex/minimaxai-minimax-m2",
messages=[
{"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
],
tools=tools,
tool_choice="auto",
stream=True,
)
_tool_calls_made = False
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
print(delta.content, end="", flush=True)
if delta.tool_calls:
_tool_calls_made = True
for _tc in delta.tool_calls:
if _tc.function:
print(_tc.function.arguments or "", end="", flush=True)
if not _tool_calls_made:
raise Exception("VALIDATION FAILED: tool-call stream - no tool calls received")
print("\nVALIDATION: tool-call stream SUCCESS")
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/minimaxai-minimax-m2",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "What is the capital of France?"},
],
max_tokens=256,
temperature=0.7,
stream=True,
)
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
print(delta.content, end="", flush=True)
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.embeddings.create(
model="test-v2-vertex/gemini-embedding-2-preview",
input="What is the capital of France?",
encoding_format="float",
)
output = [embed.embedding for embed in response.data]
print(output)
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.embeddings.create(
model="test-v2-vertex/google-gemini-embedding-2-preview",
input="What is the capital of France?",
encoding_format="float",
)
output = [embed.embedding for embed in response.data]
print(output)
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/google-translate-llm",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "What is the capital of France?"},
],
max_tokens=256,
temperature=0.7,
stream=True,
)
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
print(delta.content, end="", flush=True)
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/google-translate-llm",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "What is the capital of France?"},
],
max_tokens=256,
temperature=0.7,
stream=False,
)
print(response.choices[0].message.content)
Error: Code snippet
from openai import OpenAI
import json
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response_schema = json.loads('''{
"title": "CalendarEvent",
"type": "object",
"properties": {
"name": { "type": "string" },
"date": { "type": "string" },
"participants": {
"type": "array",
"items": { "type": "string" }
}
},
"required": ["name", "date", "participants"],
"additionalProperties": false
}''')
response = client.chat.completions.create(
model="test-v2-vertex/openai-gpt-oss",
messages=[
{"role": "system", "content": "Extract the event information as JSON."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
],
response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
stream=True,
)
import json as _json
_accumulated = ""
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
_accumulated += delta.content
print(delta.content, end="", flush=True)
if not _accumulated:
raise Exception("VALIDATION FAILED: structured-output stream - no content received")
_parsed = _json.loads(_accumulated)
if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
raise Exception("VALIDATION FAILED: structured-output stream - missing expected fields (name, date, participants)")
if not isinstance(_parsed.get("participants"), list):
raise Exception("VALIDATION FAILED: structured-output stream - 'participants' is not a list, schema not enforced")
if set(_parsed.keys()) != {"name", "date", "participants"}:
raise Exception(
f"VALIDATION FAILED: structured-output stream - unexpected keys present: {set(_parsed.keys())}"
)
print("\nVALIDATION: structured-output stream SUCCESS")
Error: Code snippet
from openai import OpenAI
import json
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response_schema = json.loads('''{
"title": "CalendarEvent",
"type": "object",
"properties": {
"name": { "type": "string" },
"date": { "type": "string" },
"participants": {
"type": "array",
"items": { "type": "string" }
}
},
"required": ["name", "date", "participants"],
"additionalProperties": false
}''')
response = client.chat.completions.create(
model="test-v2-vertex/openai-gpt-oss",
messages=[
{"role": "system", "content": "Extract the event information as JSON."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
],
response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
stream=False,
)
import json as _json
_content = response.choices[0].message.content
print(_content)
if not _content:
raise Exception("VALIDATION FAILED: structured-output - response content is empty")
_parsed = _json.loads(_content)
if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
raise Exception("VALIDATION FAILED: structured-output - missing expected fields (name, date, participants)")
if not isinstance(_parsed.get("participants"), list):
raise Exception("VALIDATION FAILED: structured-output - 'participants' is not a list, schema not enforced")
if set(_parsed.keys()) != {"name", "date", "participants"}:
raise Exception(
f"VALIDATION FAILED: structured-output - unexpected keys present: {set(_parsed.keys())}"
)
print("VALIDATION: structured-output SUCCESS")
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/openai-gpt-oss",
messages=[
{"role": "system", "content": "You are a helpful assistant. You MUST think step by step and show your reasoning. Never skip reasoning steps."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
],
reasoning_effort="medium",
stream=True,
)
_reasoning_detected = False
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
print(delta.content, end="", flush=True)
if getattr(delta, "reasoning_content", None) is not None:
_reasoning_detected = True
if getattr(delta, "reasoning", None) is not None:
_reasoning_detected = True
_usage = getattr(chunk, "usage", None)
if _usage is not None:
_details = getattr(_usage, "completion_tokens_details", None)
if _details and getattr(_details, "reasoning_tokens", 0) > 0:
_reasoning_detected = True
if not _reasoning_detected:
raise Exception("VALIDATION FAILED: reasoning stream - no reasoning information in stream")
print("\nVALIDATION: reasoning stream SUCCESS")
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/openai-gpt-oss",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "What is the capital of France?"},
],
max_tokens=256,
temperature=0.7,
stream=True,
)
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
print(delta.content, end="", flush=True)
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/openai-gpt-oss",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "What is the capital of France?"},
],
max_tokens=256,
temperature=0.7,
stream=False,
)
print(response.choices[0].message.content)
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/openai-gpt-oss",
messages=[
{"role": "system", "content": "You are a helpful assistant. You MUST think step by step and show your reasoning. Never skip reasoning steps."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
],
reasoning_effort="medium",
stream=False,
)
_usage = getattr(response, "usage", None)
_reasoning_detected = False
_choices = getattr(response, "choices", None)
if _choices and len(_choices) > 0:
_message = getattr(_choices[0], "message", None)
else:
_message = None
if _message and getattr(_message, "content", None) is not None:
print(_message.content)
if _usage is not None:
_output_token_details = getattr(_usage, "completion_tokens_details", None)
if _output_token_details and getattr(_output_token_details, "reasoning_tokens", 0) > 0:
_reasoning_detected = True
elif getattr(_usage, "reasoning", None) is not None:
_reasoning_detected = True
if getattr(_message, "reasoning_content", None) is not None:
_reasoning_detected = True
elif getattr(_message, "reasoning", None) is not None:
_reasoning_detected = True
if not _reasoning_detected:
print("Response: ", response)
raise Exception("VALIDATION FAILED: reasoning - no reasoning information in response")
print("VALIDATION: reasoning SUCCESS")
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a location.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city name, e.g. London",
},
},
"required": ["location"],
"additionalProperties": False,
},
"strict": True,
},
},
]
response = client.chat.completions.create(
model="test-v2-vertex/openai-gpt-oss",
messages=[
{"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
],
tools=tools,
tool_choice="auto",
stream=False,
)
_message = response.choices[0].message
if _message.tool_calls:
for _tc in _message.tool_calls:
print(f"Function: {_tc.function.name}")
print(f"Arguments: {_tc.function.arguments}")
else:
print(_message.content)
if not _message.tool_calls or len(_message.tool_calls) == 0:
raise Exception("VALIDATION FAILED: tool-call - no tool calls in response")
print("VALIDATION: tool-call SUCCESS")
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a location.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city name, e.g. London",
},
},
"required": ["location"],
"additionalProperties": False,
},
"strict": True,
},
},
]
response = client.chat.completions.create(
model="test-v2-vertex/openai-gpt-oss",
messages=[
{"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
],
tools=tools,
tool_choice="auto",
stream=True,
)
_tool_calls_made = False
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
print(delta.content, end="", flush=True)
if delta.tool_calls:
_tool_calls_made = True
for _tc in delta.tool_calls:
if _tc.function:
print(_tc.function.arguments or "", end="", flush=True)
if not _tool_calls_made:
raise Exception("VALIDATION FAILED: tool-call stream - no tool calls received")
print("\nVALIDATION: tool-call stream SUCCESS")
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a location.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city name, e.g. London",
},
},
"required": ["location"],
"additionalProperties": False,
},
"strict": True,
},
},
]
response = client.chat.completions.create(
model="test-v2-vertex/zai-org-glm-4.7",
messages=[
{"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
],
tools=tools,
tool_choice="auto",
stream=False,
)
_message = response.choices[0].message
if _message.tool_calls:
for _tc in _message.tool_calls:
print(f"Function: {_tc.function.name}")
print(f"Arguments: {_tc.function.arguments}")
else:
print(_message.content)
if not _message.tool_calls or len(_message.tool_calls) == 0:
raise Exception("VALIDATION FAILED: tool-call - no tool calls in response")
print("VALIDATION: tool-call SUCCESS")
Error: Code snippet
from openai import OpenAI
import json
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response_schema = json.loads('''{
"title": "CalendarEvent",
"type": "object",
"properties": {
"name": { "type": "string" },
"date": { "type": "string" },
"participants": {
"type": "array",
"items": { "type": "string" }
}
},
"required": ["name", "date", "participants"],
"additionalProperties": false
}''')
response = client.chat.completions.create(
model="test-v2-vertex/zai-org-glm-4.7",
messages=[
{"role": "system", "content": "Extract the event information as JSON."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
],
response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
stream=True,
)
import json as _json
_accumulated = ""
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
_accumulated += delta.content
print(delta.content, end="", flush=True)
if not _accumulated:
raise Exception("VALIDATION FAILED: structured-output stream - no content received")
_parsed = _json.loads(_accumulated)
if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
raise Exception("VALIDATION FAILED: structured-output stream - missing expected fields (name, date, participants)")
if not isinstance(_parsed.get("participants"), list):
raise Exception("VALIDATION FAILED: structured-output stream - 'participants' is not a list, schema not enforced")
if set(_parsed.keys()) != {"name", "date", "participants"}:
raise Exception(
f"VALIDATION FAILED: structured-output stream - unexpected keys present: {set(_parsed.keys())}"
)
print("\nVALIDATION: structured-output stream SUCCESS")
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a location.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city name, e.g. London",
},
},
"required": ["location"],
"additionalProperties": False,
},
"strict": True,
},
},
]
response = client.chat.completions.create(
model="test-v2-vertex/zai-org-glm-4.7",
messages=[
{"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
],
tools=tools,
tool_choice="auto",
stream=True,
)
_tool_calls_made = False
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
print(delta.content, end="", flush=True)
if delta.tool_calls:
_tool_calls_made = True
for _tc in delta.tool_calls:
if _tc.function:
print(_tc.function.arguments or "", end="", flush=True)
if not _tool_calls_made:
raise Exception("VALIDATION FAILED: tool-call stream - no tool calls received")
print("\nVALIDATION: tool-call stream SUCCESS")
Error: Code snippet
from openai import OpenAI
import json
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response_schema = json.loads('''{
"title": "CalendarEvent",
"type": "object",
"properties": {
"name": { "type": "string" },
"date": { "type": "string" },
"participants": {
"type": "array",
"items": { "type": "string" }
}
},
"required": ["name", "date", "participants"],
"additionalProperties": false
}''')
response = client.chat.completions.create(
model="test-v2-vertex/zai-org-glm-4.7",
messages=[
{"role": "system", "content": "Extract the event information as JSON."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
],
response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
stream=False,
)
import json as _json
_content = response.choices[0].message.content
print(_content)
if not _content:
raise Exception("VALIDATION FAILED: structured-output - response content is empty")
_parsed = _json.loads(_content)
if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
raise Exception("VALIDATION FAILED: structured-output - missing expected fields (name, date, participants)")
if not isinstance(_parsed.get("participants"), list):
raise Exception("VALIDATION FAILED: structured-output - 'participants' is not a list, schema not enforced")
if set(_parsed.keys()) != {"name", "date", "participants"}:
raise Exception(
f"VALIDATION FAILED: structured-output - unexpected keys present: {set(_parsed.keys())}"
)
print("VALIDATION: structured-output SUCCESS")
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/zai-org-glm-4.7",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "What is the capital of France?"},
],
max_tokens=256,
temperature=0.7,
stream=True,
)
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
print(delta.content, end="", flush=True)
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/zai-org-glm-4.7",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "What is the capital of France?"},
],
max_tokens=256,
temperature=0.7,
stream=False,
)
print(response.choices[0].message.content)
Skipped (25)
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason: |
|
/test-models |
|
to review: google/ onwards |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 1fda97e. Configure here.
Gateway test results
Failures (36)
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/google-gemini-3-flash-preview",
messages=[
{"role": "system", "content": "You are a helpful assistant. You MUST think step by step and show your reasoning. Never skip reasoning steps."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
],
reasoning_effort="medium",
stream=True,
)
_reasoning_detected = False
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
print(delta.content, end="", flush=True)
if getattr(delta, "reasoning_content", None) is not None:
_reasoning_detected = True
if getattr(delta, "reasoning", None) is not None:
_reasoning_detected = True
_usage = getattr(chunk, "usage", None)
if _usage is not None:
_details = getattr(_usage, "completion_tokens_details", None)
if _details and getattr(_details, "reasoning_tokens", 0) > 0:
_reasoning_detected = True
if not _reasoning_detected:
raise Exception("VALIDATION FAILED: reasoning stream - no reasoning information in stream")
print("\nVALIDATION: reasoning stream SUCCESS")
Error: Code snippet
from google import genai
from google.genai import types
_endpoint = "https://internal.devtest.truefoundry.tech/api/llm"
_api_key = "***"
_full_model = "test-v2-vertex/google/gemini-3-flash-preview"
_parts = _full_model.split("/")
_provider_account = _parts[0]
_model_id = "/".join(_parts[1:])
if "/" in _model_id:
_model_id = _model_id.rsplit("/", 1)[-1]
_base_url = f"{_endpoint}/gemini/{_provider_account}/proxy"
client = genai.Client(
api_key=_api_key,
http_options=types.HttpOptions(base_url=_base_url),
)
contents = [
types.Content(role="user", parts=[types.Part.from_text(text="Hi")]),
types.Content(role="model", parts=[types.Part.from_text(text="Hi, how can I help you")]),
types.Content(role="user", parts=[types.Part.from_text(text="How to calculate 3^3^3^3? Think step by step and show all reasoning.")]),
]
config = types.GenerateContentConfig(
system_instruction="You are a helpful assistant. You MUST think step by step and show your reasoning. Never skip reasoning steps.",
thinking_config=types.ThinkingConfig(
include_thoughts=True,
thinking_budget=5000,
),
)
response = client.models.generate_content(
model=_model_id,
contents=contents,
config=config,
)
for part in response.candidates[0].content.parts:
if not part.text:
continue
if part.thought:
print(f"[Thinking] {part.text}")
else:
print(part.text)
_parts = response.candidates[0].content.parts
_thought_detected = False
for _part in _parts:
if not _part.text:
continue
if _part.thought:
_thought_detected = True
print(f"Thinking: {_part.text[:200]}...")
else:
print(_part.text)
_usage = getattr(response, "usage_metadata", None)
if _usage and getattr(_usage, "thoughts_token_count", 0):
_thought_detected = True
if not _thought_detected:
print("Response: ", response)
raise Exception("VALIDATION FAILED: reasoning - no thinking information in GenAI response")
print("VALIDATION: reasoning SUCCESS")
Error: Code snippetfrom google import genai
from google.genai import types
_endpoint = "https://internal.devtest.truefoundry.tech/api/llm"
_api_key = "***"
_full_model = "test-v2-vertex/google/gemini-3-flash-preview"
_parts = _full_model.split("/")
_provider_account = _parts[0]
_model_id = "/".join(_parts[1:])
if "/" in _model_id:
_model_id = _model_id.rsplit("/", 1)[-1]
_base_url = f"{_endpoint}/gemini/{_provider_account}/proxy"
client = genai.Client(
api_key=_api_key,
http_options=types.HttpOptions(base_url=_base_url),
)
contents = [
types.Content(role="user", parts=[types.Part.from_text(text="Hi")]),
types.Content(role="model", parts=[types.Part.from_text(text="Hi, how can I help you")]),
types.Content(role="user", parts=[types.Part.from_text(text="How to calculate 3^3^3^3? Think step by step and show all reasoning.")]),
]
config = types.GenerateContentConfig(
system_instruction="You are a helpful assistant. You MUST think step by step and show your reasoning. Never skip reasoning steps.",
thinking_config=types.ThinkingConfig(
include_thoughts=True,
thinking_budget=5000,
),
)
_chunks = []
for chunk in client.models.generate_content_stream(
model=_model_id,
contents=contents,
config=config,
):
_chunks.append(chunk)
if chunk.candidates and chunk.candidates[0].content and chunk.candidates[0].content.parts:
for part in chunk.candidates[0].content.parts:
if not part.text:
continue
if part.thought:
print(f"[Thinking] {part.text}", end="", flush=True)
else:
print(part.text, end="", flush=True)
_thought_detected = False
for _chunk in _chunks:
if not _chunk.candidates or not _chunk.candidates[0].content:
continue
for _part in _chunk.candidates[0].content.parts:
if not _part.text:
continue
if _part.thought:
_thought_detected = True
print(_part.text, end="", flush=True)
else:
print(_part.text, end="", flush=True)
if not _thought_detected:
_usage = getattr(_chunks[-1], "usage_metadata", None) if _chunks else None
if _usage and getattr(_usage, "thoughts_token_count", 0):
_thought_detected = True
if not _thought_detected:
raise Exception("VALIDATION FAILED: reasoning stream - no thinking information in GenAI stream")
print("\nVALIDATION: reasoning stream SUCCESS")
Error: Code snippetfrom openai import OpenAI
import json
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response_schema = json.loads('''{
"title": "CalendarEvent",
"type": "object",
"properties": {
"name": { "type": "string" },
"date": { "type": "string" },
"participants": {
"type": "array",
"items": { "type": "string" }
}
},
"required": ["name", "date", "participants"],
"additionalProperties": false
}''')
response = client.chat.completions.create(
model="test-v2-vertex/openai-gpt-oss-120b-maas",
messages=[
{"role": "system", "content": "Extract the event information as JSON."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
],
response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
stream=True,
)
import json as _json
_accumulated = ""
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
_accumulated += delta.content
print(delta.content, end="", flush=True)
if not _accumulated:
raise Exception("VALIDATION FAILED: structured-output stream - no content received")
_parsed = _json.loads(_accumulated)
if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
raise Exception("VALIDATION FAILED: structured-output stream - missing expected fields (name, date, participants)")
if not isinstance(_parsed.get("participants"), list):
raise Exception("VALIDATION FAILED: structured-output stream - 'participants' is not a list, schema not enforced")
if set(_parsed.keys()) != {"name", "date", "participants"}:
raise Exception(
f"VALIDATION FAILED: structured-output stream - unexpected keys present: {set(_parsed.keys())}"
)
print("\nVALIDATION: structured-output stream SUCCESS")
Error: Code snippetfrom openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/google-translate-llm",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "What is the capital of France?"},
],
max_tokens=256,
temperature=0.7,
stream=False,
)
print(response.choices[0].message.content)
Error: Code snippetfrom openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/google-translate-llm",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "What is the capital of France?"},
],
max_tokens=256,
temperature=0.7,
stream=True,
)
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
print(delta.content, end="", flush=True)
Error: Code snippetfrom openai import OpenAI
import json
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response_schema = json.loads('''{
"title": "CalendarEvent",
"type": "object",
"properties": {
"name": { "type": "string" },
"date": { "type": "string" },
"participants": {
"type": "array",
"items": { "type": "string" }
}
},
"required": ["name", "date", "participants"],
"additionalProperties": false
}''')
response = client.chat.completions.create(
model="test-v2-vertex/minimaxai-minimax-m2",
messages=[
{"role": "system", "content": "Extract the event information as JSON."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
],
response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
stream=True,
)
import json as _json
_accumulated = ""
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
_accumulated += delta.content
print(delta.content, end="", flush=True)
if not _accumulated:
raise Exception("VALIDATION FAILED: structured-output stream - no content received")
_parsed = _json.loads(_accumulated)
if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
raise Exception("VALIDATION FAILED: structured-output stream - missing expected fields (name, date, participants)")
if not isinstance(_parsed.get("participants"), list):
raise Exception("VALIDATION FAILED: structured-output stream - 'participants' is not a list, schema not enforced")
if set(_parsed.keys()) != {"name", "date", "participants"}:
raise Exception(
f"VALIDATION FAILED: structured-output stream - unexpected keys present: {set(_parsed.keys())}"
)
print("\nVALIDATION: structured-output stream SUCCESS")
Error: Code snippetfrom openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a location.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city name, e.g. London",
},
},
"required": ["location"],
"additionalProperties": False,
},
"strict": True,
},
},
]
response = client.chat.completions.create(
model="test-v2-vertex/minimaxai-minimax-m2",
messages=[
{"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
],
tools=tools,
tool_choice="auto",
stream=True,
)
_tool_calls_made = False
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
print(delta.content, end="", flush=True)
if delta.tool_calls:
_tool_calls_made = True
for _tc in delta.tool_calls:
if _tc.function:
print(_tc.function.arguments or "", end="", flush=True)
if not _tool_calls_made:
raise Exception("VALIDATION FAILED: tool-call stream - no tool calls received")
print("\nVALIDATION: tool-call stream SUCCESS")
Error: Code snippetfrom openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/minimaxai-minimax-m2",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "What is the capital of France?"},
],
max_tokens=256,
temperature=0.7,
stream=False,
)
print(response.choices[0].message.content)
Error: Code snippetfrom openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a location.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city name, e.g. London",
},
},
"required": ["location"],
"additionalProperties": False,
},
"strict": True,
},
},
]
response = client.chat.completions.create(
model="test-v2-vertex/minimaxai-minimax-m2",
messages=[
{"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
],
tools=tools,
tool_choice="auto",
stream=False,
)
_message = response.choices[0].message
if _message.tool_calls:
for _tc in _message.tool_calls:
print(f"Function: {_tc.function.name}")
print(f"Arguments: {_tc.function.arguments}")
else:
print(_message.content)
if not _message.tool_calls or len(_message.tool_calls) == 0:
raise Exception("VALIDATION FAILED: tool-call - no tool calls in response")
print("VALIDATION: tool-call SUCCESS")
Error: Code snippetfrom openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/minimaxai-minimax-m2",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "What is the capital of France?"},
],
max_tokens=256,
temperature=0.7,
stream=True,
)
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
print(delta.content, end="", flush=True)
Error: Code snippetfrom openai import OpenAI
import json
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response_schema = json.loads('''{
"title": "CalendarEvent",
"type": "object",
"properties": {
"name": { "type": "string" },
"date": { "type": "string" },
"participants": {
"type": "array",
"items": { "type": "string" }
}
},
"required": ["name", "date", "participants"],
"additionalProperties": false
}''')
response = client.chat.completions.create(
model="test-v2-vertex/minimaxai-minimax-m2",
messages=[
{"role": "system", "content": "Extract the event information as JSON."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
],
response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
stream=False,
)
import json as _json
_content = response.choices[0].message.content
print(_content)
if not _content:
raise Exception("VALIDATION FAILED: structured-output - response content is empty")
_parsed = _json.loads(_content)
if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
raise Exception("VALIDATION FAILED: structured-output - missing expected fields (name, date, participants)")
if not isinstance(_parsed.get("participants"), list):
raise Exception("VALIDATION FAILED: structured-output - 'participants' is not a list, schema not enforced")
if set(_parsed.keys()) != {"name", "date", "participants"}:
raise Exception(
f"VALIDATION FAILED: structured-output - unexpected keys present: {set(_parsed.keys())}"
)
print("VALIDATION: structured-output SUCCESS")
Error: Code snippetfrom openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a location.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city name, e.g. London",
},
},
"required": ["location"],
"additionalProperties": False,
},
"strict": True,
},
},
]
response = client.chat.completions.create(
model="test-v2-vertex/google-gemma4",
messages=[
{"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
],
tools=tools,
tool_choice="auto",
stream=False,
)
_message = response.choices[0].message
if _message.tool_calls:
for _tc in _message.tool_calls:
print(f"Function: {_tc.function.name}")
print(f"Arguments: {_tc.function.arguments}")
else:
print(_message.content)
if not _message.tool_calls or len(_message.tool_calls) == 0:
raise Exception("VALIDATION FAILED: tool-call - no tool calls in response")
print("VALIDATION: tool-call SUCCESS")
Error: Code snippetfrom openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/google-gemma4",
messages=[
{"role": "system", "content": "You are a helpful assistant. You MUST think step by step and show your reasoning. Never skip reasoning steps."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
],
reasoning_effort="medium",
stream=True,
)
_reasoning_detected = False
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
print(delta.content, end="", flush=True)
if getattr(delta, "reasoning_content", None) is not None:
_reasoning_detected = True
if getattr(delta, "reasoning", None) is not None:
_reasoning_detected = True
_usage = getattr(chunk, "usage", None)
if _usage is not None:
_details = getattr(_usage, "completion_tokens_details", None)
if _details and getattr(_details, "reasoning_tokens", 0) > 0:
_reasoning_detected = True
if not _reasoning_detected:
raise Exception("VALIDATION FAILED: reasoning stream - no reasoning information in stream")
print("\nVALIDATION: reasoning stream SUCCESS")
Error: Code snippetfrom openai import OpenAI
import json
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response_schema = json.loads('''{
"title": "CalendarEvent",
"type": "object",
"properties": {
"name": { "type": "string" },
"date": { "type": "string" },
"participants": {
"type": "array",
"items": { "type": "string" }
}
},
"required": ["name", "date", "participants"],
"additionalProperties": false
}''')
response = client.chat.completions.create(
model="test-v2-vertex/google-gemma4",
messages=[
{"role": "system", "content": "Extract the event information as JSON."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
],
response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
stream=True,
)
import json as _json
_accumulated = ""
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
_accumulated += delta.content
print(delta.content, end="", flush=True)
if not _accumulated:
raise Exception("VALIDATION FAILED: structured-output stream - no content received")
_parsed = _json.loads(_accumulated)
if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
raise Exception("VALIDATION FAILED: structured-output stream - missing expected fields (name, date, participants)")
if not isinstance(_parsed.get("participants"), list):
raise Exception("VALIDATION FAILED: structured-output stream - 'participants' is not a list, schema not enforced")
if set(_parsed.keys()) != {"name", "date", "participants"}:
raise Exception(
f"VALIDATION FAILED: structured-output stream - unexpected keys present: {set(_parsed.keys())}"
)
print("\nVALIDATION: structured-output stream SUCCESS")
Error: Code snippetfrom openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/google-gemma4",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "What is the capital of France?"},
],
max_tokens=256,
temperature=0.7,
stream=True,
)
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
print(delta.content, end="", flush=True)
Error: Code snippetfrom openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/google-gemma4",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "What is the capital of France?"},
],
max_tokens=256,
temperature=0.7,
stream=False,
)
print(response.choices[0].message.content)
Error: Code snippetfrom openai import OpenAI
import json
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response_schema = json.loads('''{
"title": "CalendarEvent",
"type": "object",
"properties": {
"name": { "type": "string" },
"date": { "type": "string" },
"participants": {
"type": "array",
"items": { "type": "string" }
}
},
"required": ["name", "date", "participants"],
"additionalProperties": false
}''')
response = client.chat.completions.create(
model="test-v2-vertex/google-gemma4",
messages=[
{"role": "system", "content": "Extract the event information as JSON."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
],
response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
stream=False,
)
import json as _json
_content = response.choices[0].message.content
print(_content)
if not _content:
raise Exception("VALIDATION FAILED: structured-output - response content is empty")
_parsed = _json.loads(_content)
if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
raise Exception("VALIDATION FAILED: structured-output - missing expected fields (name, date, participants)")
if not isinstance(_parsed.get("participants"), list):
raise Exception("VALIDATION FAILED: structured-output - 'participants' is not a list, schema not enforced")
if set(_parsed.keys()) != {"name", "date", "participants"}:
raise Exception(
f"VALIDATION FAILED: structured-output - unexpected keys present: {set(_parsed.keys())}"
)
print("VALIDATION: structured-output SUCCESS")
Error: Code snippetfrom openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/google-gemma4",
messages=[
{"role": "system", "content": "You are a helpful assistant. You MUST think step by step and show your reasoning. Never skip reasoning steps."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
],
reasoning_effort="medium",
stream=False,
)
_usage = getattr(response, "usage", None)
_reasoning_detected = False
_choices = getattr(response, "choices", None)
if _choices and len(_choices) > 0:
_message = getattr(_choices[0], "message", None)
else:
_message = None
if _message and getattr(_message, "content", None) is not None:
print(_message.content)
if _usage is not None:
_output_token_details = getattr(_usage, "completion_tokens_details", None)
if _output_token_details and getattr(_output_token_details, "reasoning_tokens", 0) > 0:
_reasoning_detected = True
elif getattr(_usage, "reasoning", None) is not None:
_reasoning_detected = True
if getattr(_message, "reasoning_content", None) is not None:
_reasoning_detected = True
elif getattr(_message, "reasoning", None) is not None:
_reasoning_detected = True
if not _reasoning_detected:
print("Response: ", response)
raise Exception("VALIDATION FAILED: reasoning - no reasoning information in response")
print("VALIDATION: reasoning SUCCESS")
Error: Code snippetfrom openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a location.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city name, e.g. London",
},
},
"required": ["location"],
"additionalProperties": False,
},
"strict": True,
},
},
]
response = client.chat.completions.create(
model="test-v2-vertex/google-gemma4",
messages=[
{"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
],
tools=tools,
tool_choice="auto",
stream=True,
)
_tool_calls_made = False
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
print(delta.content, end="", flush=True)
if delta.tool_calls:
_tool_calls_made = True
for _tc in delta.tool_calls:
if _tc.function:
print(_tc.function.arguments or "", end="", flush=True)
if not _tool_calls_made:
raise Exception("VALIDATION FAILED: tool-call stream - no tool calls received")
print("\nVALIDATION: tool-call stream SUCCESS")
Error: Code snippetfrom openai import OpenAI
import json
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response_schema = json.loads('''{
"title": "CalendarEvent",
"type": "object",
"properties": {
"name": { "type": "string" },
"date": { "type": "string" },
"participants": {
"type": "array",
"items": { "type": "string" }
}
},
"required": ["name", "date", "participants"],
"additionalProperties": false
}''')
response = client.chat.completions.create(
model="test-v2-vertex/openai-gpt-oss",
messages=[
{"role": "system", "content": "Extract the event information as JSON."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
],
response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
stream=False,
)
import json as _json
_content = response.choices[0].message.content
print(_content)
if not _content:
raise Exception("VALIDATION FAILED: structured-output - response content is empty")
_parsed = _json.loads(_content)
if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
raise Exception("VALIDATION FAILED: structured-output - missing expected fields (name, date, participants)")
if not isinstance(_parsed.get("participants"), list):
raise Exception("VALIDATION FAILED: structured-output - 'participants' is not a list, schema not enforced")
if set(_parsed.keys()) != {"name", "date", "participants"}:
raise Exception(
f"VALIDATION FAILED: structured-output - unexpected keys present: {set(_parsed.keys())}"
)
print("VALIDATION: structured-output SUCCESS")
Error: Code snippetfrom openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a location.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city name, e.g. London",
},
},
"required": ["location"],
"additionalProperties": False,
},
"strict": True,
},
},
]
response = client.chat.completions.create(
model="test-v2-vertex/openai-gpt-oss",
messages=[
{"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
],
tools=tools,
tool_choice="auto",
stream=True,
)
_tool_calls_made = False
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
print(delta.content, end="", flush=True)
if delta.tool_calls:
_tool_calls_made = True
for _tc in delta.tool_calls:
if _tc.function:
print(_tc.function.arguments or "", end="", flush=True)
if not _tool_calls_made:
raise Exception("VALIDATION FAILED: tool-call stream - no tool calls received")
print("\nVALIDATION: tool-call stream SUCCESS")
Error: Code snippetfrom openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a location.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city name, e.g. London",
},
},
"required": ["location"],
"additionalProperties": False,
},
"strict": True,
},
},
]
response = client.chat.completions.create(
model="test-v2-vertex/openai-gpt-oss",
messages=[
{"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
],
tools=tools,
tool_choice="auto",
stream=False,
)
_message = response.choices[0].message
if _message.tool_calls:
for _tc in _message.tool_calls:
print(f"Function: {_tc.function.name}")
print(f"Arguments: {_tc.function.arguments}")
else:
print(_message.content)
if not _message.tool_calls or len(_message.tool_calls) == 0:
raise Exception("VALIDATION FAILED: tool-call - no tool calls in response")
print("VALIDATION: tool-call SUCCESS")
Error: Code snippetfrom openai import OpenAI
import json
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response_schema = json.loads('''{
"title": "CalendarEvent",
"type": "object",
"properties": {
"name": { "type": "string" },
"date": { "type": "string" },
"participants": {
"type": "array",
"items": { "type": "string" }
}
},
"required": ["name", "date", "participants"],
"additionalProperties": false
}''')
response = client.chat.completions.create(
model="test-v2-vertex/openai-gpt-oss",
messages=[
{"role": "system", "content": "Extract the event information as JSON."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
],
response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
stream=True,
)
import json as _json
_accumulated = ""
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
_accumulated += delta.content
print(delta.content, end="", flush=True)
if not _accumulated:
raise Exception("VALIDATION FAILED: structured-output stream - no content received")
_parsed = _json.loads(_accumulated)
if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
raise Exception("VALIDATION FAILED: structured-output stream - missing expected fields (name, date, participants)")
if not isinstance(_parsed.get("participants"), list):
raise Exception("VALIDATION FAILED: structured-output stream - 'participants' is not a list, schema not enforced")
if set(_parsed.keys()) != {"name", "date", "participants"}:
raise Exception(
f"VALIDATION FAILED: structured-output stream - unexpected keys present: {set(_parsed.keys())}"
)
print("\nVALIDATION: structured-output stream SUCCESS")
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/openai-gpt-oss",
messages=[
{"role": "system", "content": "You are a helpful assistant. You MUST think step by step and show your reasoning. Never skip reasoning steps."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
],
reasoning_effort="medium",
stream=False,
)
_usage = getattr(response, "usage", None)
_reasoning_detected = False
_choices = getattr(response, "choices", None)
if _choices and len(_choices) > 0:
_message = getattr(_choices[0], "message", None)
else:
_message = None
if _message and getattr(_message, "content", None) is not None:
print(_message.content)
if _usage is not None:
_output_token_details = getattr(_usage, "completion_tokens_details", None)
if _output_token_details and getattr(_output_token_details, "reasoning_tokens", 0) > 0:
_reasoning_detected = True
elif getattr(_usage, "reasoning", None) is not None:
_reasoning_detected = True
if getattr(_message, "reasoning_content", None) is not None:
_reasoning_detected = True
elif getattr(_message, "reasoning", None) is not None:
_reasoning_detected = True
if not _reasoning_detected:
print("Response: ", response)
raise Exception("VALIDATION FAILED: reasoning - no reasoning information in response")
print("VALIDATION: reasoning SUCCESS")
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/openai-gpt-oss",
messages=[
{"role": "system", "content": "You are a helpful assistant. You MUST think step by step and show your reasoning. Never skip reasoning steps."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
],
reasoning_effort="medium",
stream=True,
)
_reasoning_detected = False
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
print(delta.content, end="", flush=True)
if getattr(delta, "reasoning_content", None) is not None:
_reasoning_detected = True
if getattr(delta, "reasoning", None) is not None:
_reasoning_detected = True
_usage = getattr(chunk, "usage", None)
if _usage is not None:
_details = getattr(_usage, "completion_tokens_details", None)
if _details and getattr(_details, "reasoning_tokens", 0) > 0:
_reasoning_detected = True
if not _reasoning_detected:
raise Exception("VALIDATION FAILED: reasoning stream - no reasoning information in stream")
print("\nVALIDATION: reasoning stream SUCCESS")
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/openai-gpt-oss",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "What is the capital of France?"},
],
max_tokens=256,
temperature=0.7,
stream=False,
)
print(response.choices[0].message.content)
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/openai-gpt-oss",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "What is the capital of France?"},
],
max_tokens=256,
temperature=0.7,
stream=True,
)
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
print(delta.content, end="", flush=True)
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.embeddings.create(
model="test-v2-vertex/gemini-embedding-2-preview",
input="What is the capital of France?",
encoding_format="float",
)
output = [embed.embedding for embed in response.data]
print(output)
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.embeddings.create(
model="test-v2-vertex/google-gemini-embedding-2-preview",
input="What is the capital of France?",
encoding_format="float",
)
output = [embed.embedding for embed in response.data]
print(output)
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a location.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city name, e.g. London",
},
},
"required": ["location"],
"additionalProperties": False,
},
"strict": True,
},
},
]
response = client.chat.completions.create(
model="test-v2-vertex/zai-org-glm-4.7",
messages=[
{"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
],
tools=tools,
tool_choice="auto",
stream=False,
)
_message = response.choices[0].message
if _message.tool_calls:
for _tc in _message.tool_calls:
print(f"Function: {_tc.function.name}")
print(f"Arguments: {_tc.function.arguments}")
else:
print(_message.content)
if not _message.tool_calls or len(_message.tool_calls) == 0:
raise Exception("VALIDATION FAILED: tool-call - no tool calls in response")
print("VALIDATION: tool-call SUCCESS")
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a location.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city name, e.g. London",
},
},
"required": ["location"],
"additionalProperties": False,
},
"strict": True,
},
},
]
response = client.chat.completions.create(
model="test-v2-vertex/zai-org-glm-4.7",
messages=[
{"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
],
tools=tools,
tool_choice="auto",
stream=True,
)
_tool_calls_made = False
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
print(delta.content, end="", flush=True)
if delta.tool_calls:
_tool_calls_made = True
for _tc in delta.tool_calls:
if _tc.function:
print(_tc.function.arguments or "", end="", flush=True)
if not _tool_calls_made:
raise Exception("VALIDATION FAILED: tool-call stream - no tool calls received")
print("\nVALIDATION: tool-call stream SUCCESS")
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/zai-org-glm-4.7",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "What is the capital of France?"},
],
max_tokens=256,
temperature=0.7,
stream=False,
)
print(response.choices[0].message.content)
Error: Code snippet
from openai import OpenAI
import json
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response_schema = json.loads('''{
"title": "CalendarEvent",
"type": "object",
"properties": {
"name": { "type": "string" },
"date": { "type": "string" },
"participants": {
"type": "array",
"items": { "type": "string" }
}
},
"required": ["name", "date", "participants"],
"additionalProperties": false
}''')
response = client.chat.completions.create(
model="test-v2-vertex/zai-org-glm-4.7",
messages=[
{"role": "system", "content": "Extract the event information as JSON."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
],
response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
stream=True,
)
import json as _json
_accumulated = ""
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
_accumulated += delta.content
print(delta.content, end="", flush=True)
if not _accumulated:
raise Exception("VALIDATION FAILED: structured-output stream - no content received")
_parsed = _json.loads(_accumulated)
if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
raise Exception("VALIDATION FAILED: structured-output stream - missing expected fields (name, date, participants)")
if not isinstance(_parsed.get("participants"), list):
raise Exception("VALIDATION FAILED: structured-output stream - 'participants' is not a list, schema not enforced")
if set(_parsed.keys()) != {"name", "date", "participants"}:
raise Exception(
f"VALIDATION FAILED: structured-output stream - unexpected keys present: {set(_parsed.keys())}"
)
print("\nVALIDATION: structured-output stream SUCCESS")
Error: Code snippet
from openai import OpenAI
import json
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response_schema = json.loads('''{
"title": "CalendarEvent",
"type": "object",
"properties": {
"name": { "type": "string" },
"date": { "type": "string" },
"participants": {
"type": "array",
"items": { "type": "string" }
}
},
"required": ["name", "date", "participants"],
"additionalProperties": false
}''')
response = client.chat.completions.create(
model="test-v2-vertex/zai-org-glm-4.7",
messages=[
{"role": "system", "content": "Extract the event information as JSON."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
],
response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
stream=False,
)
import json as _json
_content = response.choices[0].message.content
print(_content)
if not _content:
raise Exception("VALIDATION FAILED: structured-output - response content is empty")
_parsed = _json.loads(_content)
if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
raise Exception("VALIDATION FAILED: structured-output - missing expected fields (name, date, participants)")
if not isinstance(_parsed.get("participants"), list):
raise Exception("VALIDATION FAILED: structured-output - 'participants' is not a list, schema not enforced")
if set(_parsed.keys()) != {"name", "date", "participants"}:
raise Exception(
f"VALIDATION FAILED: structured-output - unexpected keys present: {set(_parsed.keys())}"
)
print("VALIDATION: structured-output SUCCESS")
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/zai-org-glm-4.7",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "What is the capital of France?"},
],
max_tokens=256,
temperature=0.7,
stream=True,
)
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
            print(delta.content, end="", flush=True)
Skipped (25)
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason: |
|
/test-models |
|
check meta to zai inclusive |
|
/test-models |
Gateway test results
Failures (33)
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/google-gemini-3-flash-preview",
messages=[
{"role": "system", "content": "You are a helpful assistant. You MUST think step by step and show your reasoning. Never skip reasoning steps."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
],
reasoning_effort="medium",
stream=True,
)
_reasoning_detected = False
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
print(delta.content, end="", flush=True)
if getattr(delta, "reasoning_content", None) is not None:
_reasoning_detected = True
if getattr(delta, "reasoning", None) is not None:
_reasoning_detected = True
_usage = getattr(chunk, "usage", None)
if _usage is not None:
_details = getattr(_usage, "completion_tokens_details", None)
if _details and getattr(_details, "reasoning_tokens", 0) > 0:
_reasoning_detected = True
if not _reasoning_detected:
raise Exception("VALIDATION FAILED: reasoning stream - no reasoning information in stream")
print("\nVALIDATION: reasoning stream SUCCESS")
Error: Code snippet
from google import genai
from google.genai import types
_endpoint = "https://internal.devtest.truefoundry.tech/api/llm"
_api_key = "***"
_full_model = "test-v2-vertex/google/gemini-3-flash-preview"
_parts = _full_model.split("/")
_provider_account = _parts[0]
_model_id = "/".join(_parts[1:])
if "/" in _model_id:
_model_id = _model_id.rsplit("/", 1)[-1]
_base_url = f"{_endpoint}/gemini/{_provider_account}/proxy"
client = genai.Client(
api_key=_api_key,
http_options=types.HttpOptions(base_url=_base_url),
)
contents = [
types.Content(role="user", parts=[types.Part.from_text(text="Hi")]),
types.Content(role="model", parts=[types.Part.from_text(text="Hi, how can I help you")]),
types.Content(role="user", parts=[types.Part.from_text(text="How to calculate 3^3^3^3? Think step by step and show all reasoning.")]),
]
config = types.GenerateContentConfig(
system_instruction="You are a helpful assistant. You MUST think step by step and show your reasoning. Never skip reasoning steps.",
thinking_config=types.ThinkingConfig(
include_thoughts=True,
thinking_budget=5000,
),
)
response = client.models.generate_content(
model=_model_id,
contents=contents,
config=config,
)
for part in response.candidates[0].content.parts:
if not part.text:
continue
if part.thought:
print(f"[Thinking] {part.text}")
else:
print(part.text)
_parts = response.candidates[0].content.parts
_thought_detected = False
for _part in _parts:
if not _part.text:
continue
if _part.thought:
_thought_detected = True
print(f"Thinking: {_part.text[:200]}...")
else:
print(_part.text)
_usage = getattr(response, "usage_metadata", None)
if _usage and getattr(_usage, "thoughts_token_count", 0):
_thought_detected = True
if not _thought_detected:
print("Response: ", response)
raise Exception("VALIDATION FAILED: reasoning - no thinking information in GenAI response")
print("VALIDATION: reasoning SUCCESS")
Error: Code snippet
from google import genai
from google.genai import types
_endpoint = "https://internal.devtest.truefoundry.tech/api/llm"
_api_key = "***"
_full_model = "test-v2-vertex/google/gemini-3-flash-preview"
_parts = _full_model.split("/")
_provider_account = _parts[0]
_model_id = "/".join(_parts[1:])
if "/" in _model_id:
_model_id = _model_id.rsplit("/", 1)[-1]
_base_url = f"{_endpoint}/gemini/{_provider_account}/proxy"
client = genai.Client(
api_key=_api_key,
http_options=types.HttpOptions(base_url=_base_url),
)
contents = [
types.Content(role="user", parts=[types.Part.from_text(text="Hi")]),
types.Content(role="model", parts=[types.Part.from_text(text="Hi, how can I help you")]),
types.Content(role="user", parts=[types.Part.from_text(text="How to calculate 3^3^3^3? Think step by step and show all reasoning.")]),
]
config = types.GenerateContentConfig(
system_instruction="You are a helpful assistant. You MUST think step by step and show your reasoning. Never skip reasoning steps.",
thinking_config=types.ThinkingConfig(
include_thoughts=True,
thinking_budget=5000,
),
)
_chunks = []
for chunk in client.models.generate_content_stream(
model=_model_id,
contents=contents,
config=config,
):
_chunks.append(chunk)
if chunk.candidates and chunk.candidates[0].content and chunk.candidates[0].content.parts:
for part in chunk.candidates[0].content.parts:
if not part.text:
continue
if part.thought:
print(f"[Thinking] {part.text}", end="", flush=True)
else:
print(part.text, end="", flush=True)
_thought_detected = False
for _chunk in _chunks:
if not _chunk.candidates or not _chunk.candidates[0].content:
continue
for _part in _chunk.candidates[0].content.parts:
if not _part.text:
continue
if _part.thought:
_thought_detected = True
print(_part.text, end="", flush=True)
else:
print(_part.text, end="", flush=True)
if not _thought_detected:
_usage = getattr(_chunks[-1], "usage_metadata", None) if _chunks else None
if _usage and getattr(_usage, "thoughts_token_count", 0):
_thought_detected = True
if not _thought_detected:
raise Exception("VALIDATION FAILED: reasoning stream - no thinking information in GenAI stream")
print("\nVALIDATION: reasoning stream SUCCESS")
Error: Code snippet
from google import genai
from google.genai import types
_endpoint = "https://internal.devtest.truefoundry.tech/api/llm"
_api_key = "***"
_full_model = "test-v2-vertex/gemini-3.1-flash-image-preview"
_parts = _full_model.split("/")
_provider_account = _parts[0]
_model_id = "/".join(_parts[1:])
if "/" in _model_id:
_model_id = _model_id.rsplit("/", 1)[-1]
_base_url = f"{_endpoint}/gemini/{_provider_account}/proxy"
client = genai.Client(
api_key=_api_key,
http_options=types.HttpOptions(base_url=_base_url),
)
contents = [
types.Content(role="user", parts=[types.Part.from_text(text="Hi")]),
types.Content(role="model", parts=[types.Part.from_text(text="Hi, how can I help you")]),
types.Content(role="user", parts=[types.Part.from_text(text="How to calculate 3^3^3^3? Think step by step and show all reasoning.")]),
]
config = types.GenerateContentConfig(
system_instruction="You are a helpful assistant. You MUST think step by step and show your reasoning. Never skip reasoning steps.",
thinking_config=types.ThinkingConfig(
include_thoughts=True,
thinking_budget=5000,
),
)
_chunks = []
for chunk in client.models.generate_content_stream(
model=_model_id,
contents=contents,
config=config,
):
_chunks.append(chunk)
if chunk.candidates and chunk.candidates[0].content and chunk.candidates[0].content.parts:
for part in chunk.candidates[0].content.parts:
if not part.text:
continue
if part.thought:
print(f"[Thinking] {part.text}", end="", flush=True)
else:
print(part.text, end="", flush=True)
_thought_detected = False
for _chunk in _chunks:
if not _chunk.candidates or not _chunk.candidates[0].content:
continue
for _part in _chunk.candidates[0].content.parts:
if not _part.text:
continue
if _part.thought:
_thought_detected = True
print(_part.text, end="", flush=True)
else:
print(_part.text, end="", flush=True)
if not _thought_detected:
_usage = getattr(_chunks[-1], "usage_metadata", None) if _chunks else None
if _usage and getattr(_usage, "thoughts_token_count", 0):
_thought_detected = True
if not _thought_detected:
raise Exception("VALIDATION FAILED: reasoning stream - no thinking information in GenAI stream")
print("\nVALIDATION: reasoning stream SUCCESS")
Error: Code snippet
from google import genai
from google.genai import types
_endpoint = "https://internal.devtest.truefoundry.tech/api/llm"
_api_key = "***"
_full_model = "test-v2-vertex/gemini-3.1-flash-image-preview"
_parts = _full_model.split("/")
_provider_account = _parts[0]
_model_id = "/".join(_parts[1:])
if "/" in _model_id:
_model_id = _model_id.rsplit("/", 1)[-1]
_base_url = f"{_endpoint}/gemini/{_provider_account}/proxy"
client = genai.Client(
api_key=_api_key,
http_options=types.HttpOptions(base_url=_base_url),
)
contents = [
types.Content(role="user", parts=[types.Part.from_text(text="Hi")]),
types.Content(role="model", parts=[types.Part.from_text(text="Hi, how can I help you")]),
types.Content(role="user", parts=[types.Part.from_text(text="How to calculate 3^3^3^3? Think step by step and show all reasoning.")]),
]
config = types.GenerateContentConfig(
system_instruction="You are a helpful assistant. You MUST think step by step and show your reasoning. Never skip reasoning steps.",
thinking_config=types.ThinkingConfig(
include_thoughts=True,
thinking_budget=5000,
),
)
response = client.models.generate_content(
model=_model_id,
contents=contents,
config=config,
)
for part in response.candidates[0].content.parts:
if not part.text:
continue
if part.thought:
print(f"[Thinking] {part.text}")
else:
print(part.text)
_parts = response.candidates[0].content.parts
_thought_detected = False
for _part in _parts:
if not _part.text:
continue
if _part.thought:
_thought_detected = True
print(f"Thinking: {_part.text[:200]}...")
else:
print(_part.text)
_usage = getattr(response, "usage_metadata", None)
if _usage and getattr(_usage, "thoughts_token_count", 0):
_thought_detected = True
if not _thought_detected:
print("Response: ", response)
raise Exception("VALIDATION FAILED: reasoning - no thinking information in GenAI response")
print("VALIDATION: reasoning SUCCESS")
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a location.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city name, e.g. London",
},
},
"required": ["location"],
"additionalProperties": False,
},
"strict": True,
},
},
]
response = client.chat.completions.create(
model="test-v2-vertex/minimaxai-minimax-m2",
messages=[
{"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
],
tools=tools,
tool_choice="auto",
stream=True,
)
_tool_calls_made = False
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
print(delta.content, end="", flush=True)
if delta.tool_calls:
_tool_calls_made = True
for _tc in delta.tool_calls:
if _tc.function:
print(_tc.function.arguments or "", end="", flush=True)
if not _tool_calls_made:
raise Exception("VALIDATION FAILED: tool-call stream - no tool calls received")
print("\nVALIDATION: tool-call stream SUCCESS")
Error: Code snippet
from openai import OpenAI
import json
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response_schema = json.loads('''{
"title": "CalendarEvent",
"type": "object",
"properties": {
"name": { "type": "string" },
"date": { "type": "string" },
"participants": {
"type": "array",
"items": { "type": "string" }
}
},
"required": ["name", "date", "participants"],
"additionalProperties": false
}''')
response = client.chat.completions.create(
model="test-v2-vertex/minimaxai-minimax-m2",
messages=[
{"role": "system", "content": "Extract the event information as JSON."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
],
response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
stream=False,
)
import json as _json
_content = response.choices[0].message.content
print(_content)
if not _content:
raise Exception("VALIDATION FAILED: structured-output - response content is empty")
_parsed = _json.loads(_content)
if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
raise Exception("VALIDATION FAILED: structured-output - missing expected fields (name, date, participants)")
if not isinstance(_parsed.get("participants"), list):
raise Exception("VALIDATION FAILED: structured-output - 'participants' is not a list, schema not enforced")
if set(_parsed.keys()) != {"name", "date", "participants"}:
raise Exception(
f"VALIDATION FAILED: structured-output - unexpected keys present: {set(_parsed.keys())}"
)
print("VALIDATION: structured-output SUCCESS")
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a location.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city name, e.g. London",
},
},
"required": ["location"],
"additionalProperties": False,
},
"strict": True,
},
},
]
response = client.chat.completions.create(
model="test-v2-vertex/minimaxai-minimax-m2",
messages=[
{"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
],
tools=tools,
tool_choice="auto",
stream=False,
)
_message = response.choices[0].message
if _message.tool_calls:
for _tc in _message.tool_calls:
print(f"Function: {_tc.function.name}")
print(f"Arguments: {_tc.function.arguments}")
else:
print(_message.content)
if not _message.tool_calls or len(_message.tool_calls) == 0:
raise Exception("VALIDATION FAILED: tool-call - no tool calls in response")
print("VALIDATION: tool-call SUCCESS")
Error: Code snippet
from openai import OpenAI
import json
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response_schema = json.loads('''{
"title": "CalendarEvent",
"type": "object",
"properties": {
"name": { "type": "string" },
"date": { "type": "string" },
"participants": {
"type": "array",
"items": { "type": "string" }
}
},
"required": ["name", "date", "participants"],
"additionalProperties": false
}''')
response = client.chat.completions.create(
model="test-v2-vertex/minimaxai-minimax-m2",
messages=[
{"role": "system", "content": "Extract the event information as JSON."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
],
response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
stream=True,
)
import json as _json
_accumulated = ""
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
_accumulated += delta.content
print(delta.content, end="", flush=True)
if not _accumulated:
raise Exception("VALIDATION FAILED: structured-output stream - no content received")
_parsed = _json.loads(_accumulated)
if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
raise Exception("VALIDATION FAILED: structured-output stream - missing expected fields (name, date, participants)")
if not isinstance(_parsed.get("participants"), list):
raise Exception("VALIDATION FAILED: structured-output stream - 'participants' is not a list, schema not enforced")
if set(_parsed.keys()) != {"name", "date", "participants"}:
raise Exception(
f"VALIDATION FAILED: structured-output stream - unexpected keys present: {set(_parsed.keys())}"
)
print("\nVALIDATION: structured-output stream SUCCESS")
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/minimaxai-minimax-m2",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "What is the capital of France?"},
],
max_tokens=256,
temperature=0.7,
stream=False,
)
print(response.choices[0].message.content)
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/minimaxai-minimax-m2",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "What is the capital of France?"},
],
max_tokens=256,
temperature=0.7,
stream=True,
)
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
print(delta.content, end="", flush=True)
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a location.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city name, e.g. London",
},
},
"required": ["location"],
"additionalProperties": False,
},
"strict": True,
},
},
]
response = client.chat.completions.create(
model="test-v2-vertex/google-gemma4",
messages=[
{"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
],
tools=tools,
tool_choice="auto",
stream=False,
)
_message = response.choices[0].message
if _message.tool_calls:
for _tc in _message.tool_calls:
print(f"Function: {_tc.function.name}")
print(f"Arguments: {_tc.function.arguments}")
else:
print(_message.content)
if not _message.tool_calls or len(_message.tool_calls) == 0:
raise Exception("VALIDATION FAILED: tool-call - no tool calls in response")
print("VALIDATION: tool-call SUCCESS")
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/google-gemma4",
messages=[
{"role": "system", "content": "You are a helpful assistant. You MUST think step by step and show your reasoning. Never skip reasoning steps."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
],
reasoning_effort="medium",
stream=False,
)
_usage = getattr(response, "usage", None)
_reasoning_detected = False
_choices = getattr(response, "choices", None)
if _choices and len(_choices) > 0:
_message = getattr(_choices[0], "message", None)
else:
_message = None
if _message and getattr(_message, "content", None) is not None:
print(_message.content)
if _usage is not None:
_output_token_details = getattr(_usage, "completion_tokens_details", None)
if _output_token_details and getattr(_output_token_details, "reasoning_tokens", 0) > 0:
_reasoning_detected = True
elif getattr(_usage, "reasoning", None) is not None:
_reasoning_detected = True
if getattr(_message, "reasoning_content", None) is not None:
_reasoning_detected = True
elif getattr(_message, "reasoning", None) is not None:
_reasoning_detected = True
if not _reasoning_detected:
print("Response: ", response)
raise Exception("VALIDATION FAILED: reasoning - no reasoning information in response")
print("VALIDATION: reasoning SUCCESS")
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/google-gemma4",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "What is the capital of France?"},
],
max_tokens=256,
temperature=0.7,
stream=True,
)
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
print(delta.content, end="", flush=True)
Error: Code snippet
from openai import OpenAI
import json
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response_schema = json.loads('''{
"title": "CalendarEvent",
"type": "object",
"properties": {
"name": { "type": "string" },
"date": { "type": "string" },
"participants": {
"type": "array",
"items": { "type": "string" }
}
},
"required": ["name", "date", "participants"],
"additionalProperties": false
}''')
response = client.chat.completions.create(
model="test-v2-vertex/google-gemma4",
messages=[
{"role": "system", "content": "Extract the event information as JSON."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
],
response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
stream=False,
)
import json as _json
_content = response.choices[0].message.content
print(_content)
if not _content:
raise Exception("VALIDATION FAILED: structured-output - response content is empty")
_parsed = _json.loads(_content)
if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
raise Exception("VALIDATION FAILED: structured-output - missing expected fields (name, date, participants)")
if not isinstance(_parsed.get("participants"), list):
raise Exception("VALIDATION FAILED: structured-output - 'participants' is not a list, schema not enforced")
if set(_parsed.keys()) != {"name", "date", "participants"}:
raise Exception(
f"VALIDATION FAILED: structured-output - unexpected keys present: {set(_parsed.keys())}"
)
print("VALIDATION: structured-output SUCCESS")
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/google-gemma4",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "What is the capital of France?"},
],
max_tokens=256,
temperature=0.7,
stream=False,
)
print(response.choices[0].message.content)
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/google-gemma4",
messages=[
{"role": "system", "content": "You are a helpful assistant. You MUST think step by step and show your reasoning. Never skip reasoning steps."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
],
reasoning_effort="medium",
stream=True,
)
_reasoning_detected = False
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
print(delta.content, end="", flush=True)
if getattr(delta, "reasoning_content", None) is not None:
_reasoning_detected = True
if getattr(delta, "reasoning", None) is not None:
_reasoning_detected = True
_usage = getattr(chunk, "usage", None)
if _usage is not None:
_details = getattr(_usage, "completion_tokens_details", None)
if _details and getattr(_details, "reasoning_tokens", 0) > 0:
_reasoning_detected = True
if not _reasoning_detected:
raise Exception("VALIDATION FAILED: reasoning stream - no reasoning information in stream")
print("\nVALIDATION: reasoning stream SUCCESS")
Error: Code snippet
from openai import OpenAI
import json
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response_schema = json.loads('''{
"title": "CalendarEvent",
"type": "object",
"properties": {
"name": { "type": "string" },
"date": { "type": "string" },
"participants": {
"type": "array",
"items": { "type": "string" }
}
},
"required": ["name", "date", "participants"],
"additionalProperties": false
}''')
response = client.chat.completions.create(
model="test-v2-vertex/google-gemma4",
messages=[
{"role": "system", "content": "Extract the event information as JSON."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
],
response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
stream=True,
)
import json as _json
_accumulated = ""
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
_accumulated += delta.content
print(delta.content, end="", flush=True)
if not _accumulated:
raise Exception("VALIDATION FAILED: structured-output stream - no content received")
_parsed = _json.loads(_accumulated)
if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
raise Exception("VALIDATION FAILED: structured-output stream - missing expected fields (name, date, participants)")
if not isinstance(_parsed.get("participants"), list):
raise Exception("VALIDATION FAILED: structured-output stream - 'participants' is not a list, schema not enforced")
if set(_parsed.keys()) != {"name", "date", "participants"}:
raise Exception(
f"VALIDATION FAILED: structured-output stream - unexpected keys present: {set(_parsed.keys())}"
)
print("\nVALIDATION: structured-output stream SUCCESS")
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a location.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city name, e.g. London",
},
},
"required": ["location"],
"additionalProperties": False,
},
"strict": True,
},
},
]
response = client.chat.completions.create(
model="test-v2-vertex/google-gemma4",
messages=[
{"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
],
tools=tools,
tool_choice="auto",
stream=True,
)
_tool_calls_made = False
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
print(delta.content, end="", flush=True)
if delta.tool_calls:
_tool_calls_made = True
for _tc in delta.tool_calls:
if _tc.function:
print(_tc.function.arguments or "", end="", flush=True)
if not _tool_calls_made:
raise Exception("VALIDATION FAILED: tool-call stream - no tool calls received")
print("\nVALIDATION: tool-call stream SUCCESS")
Error: Code snippet
from openai import OpenAI
import json
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response_schema = json.loads('''{
"title": "CalendarEvent",
"type": "object",
"properties": {
"name": { "type": "string" },
"date": { "type": "string" },
"participants": {
"type": "array",
"items": { "type": "string" }
}
},
"required": ["name", "date", "participants"],
"additionalProperties": false
}''')
response = client.chat.completions.create(
model="test-v2-vertex/openai-gpt-oss",
messages=[
{"role": "system", "content": "Extract the event information as JSON."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
],
response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
stream=True,
)
import json as _json
_accumulated = ""
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
_accumulated += delta.content
print(delta.content, end="", flush=True)
if not _accumulated:
raise Exception("VALIDATION FAILED: structured-output stream - no content received")
_parsed = _json.loads(_accumulated)
if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
raise Exception("VALIDATION FAILED: structured-output stream - missing expected fields (name, date, participants)")
if not isinstance(_parsed.get("participants"), list):
raise Exception("VALIDATION FAILED: structured-output stream - 'participants' is not a list, schema not enforced")
if set(_parsed.keys()) != {"name", "date", "participants"}:
raise Exception(
f"VALIDATION FAILED: structured-output stream - unexpected keys present: {set(_parsed.keys())}"
)
print("\nVALIDATION: structured-output stream SUCCESS")
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/openai-gpt-oss",
messages=[
{"role": "system", "content": "You are a helpful assistant. You MUST think step by step and show your reasoning. Never skip reasoning steps."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
],
reasoning_effort="medium",
stream=False,
)
_usage = getattr(response, "usage", None)
_reasoning_detected = False
_choices = getattr(response, "choices", None)
if _choices and len(_choices) > 0:
_message = getattr(_choices[0], "message", None)
else:
_message = None
if _message and getattr(_message, "content", None) is not None:
print(_message.content)
if _usage is not None:
_output_token_details = getattr(_usage, "completion_tokens_details", None)
if _output_token_details and getattr(_output_token_details, "reasoning_tokens", 0) > 0:
_reasoning_detected = True
elif getattr(_usage, "reasoning", None) is not None:
_reasoning_detected = True
if getattr(_message, "reasoning_content", None) is not None:
_reasoning_detected = True
elif getattr(_message, "reasoning", None) is not None:
_reasoning_detected = True
if not _reasoning_detected:
print("Response: ", response)
raise Exception("VALIDATION FAILED: reasoning - no reasoning information in response")
print("VALIDATION: reasoning SUCCESS")
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/openai-gpt-oss",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "What is the capital of France?"},
],
max_tokens=256,
temperature=0.7,
stream=False,
)
print(response.choices[0].message.content)
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/openai-gpt-oss",
messages=[
{"role": "system", "content": "You are a helpful assistant. You MUST think step by step and show your reasoning. Never skip reasoning steps."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
],
reasoning_effort="medium",
stream=True,
)
_reasoning_detected = False
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
print(delta.content, end="", flush=True)
if getattr(delta, "reasoning_content", None) is not None:
_reasoning_detected = True
if getattr(delta, "reasoning", None) is not None:
_reasoning_detected = True
_usage = getattr(chunk, "usage", None)
if _usage is not None:
_details = getattr(_usage, "completion_tokens_details", None)
if _details and getattr(_details, "reasoning_tokens", 0) > 0:
_reasoning_detected = True
if not _reasoning_detected:
raise Exception("VALIDATION FAILED: reasoning stream - no reasoning information in stream")
print("\nVALIDATION: reasoning stream SUCCESS")
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/openai-gpt-oss",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "What is the capital of France?"},
],
max_tokens=256,
temperature=0.7,
stream=True,
)
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
print(delta.content, end="", flush=True)
Error: Code snippet
from openai import OpenAI
import json
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response_schema = json.loads('''{
"title": "CalendarEvent",
"type": "object",
"properties": {
"name": { "type": "string" },
"date": { "type": "string" },
"participants": {
"type": "array",
"items": { "type": "string" }
}
},
"required": ["name", "date", "participants"],
"additionalProperties": false
}''')
response = client.chat.completions.create(
model="test-v2-vertex/openai-gpt-oss",
messages=[
{"role": "system", "content": "Extract the event information as JSON."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
],
response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
stream=False,
)
import json as _json
_content = response.choices[0].message.content
print(_content)
if not _content:
raise Exception("VALIDATION FAILED: structured-output - response content is empty")
_parsed = _json.loads(_content)
if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
raise Exception("VALIDATION FAILED: structured-output - missing expected fields (name, date, participants)")
if not isinstance(_parsed.get("participants"), list):
raise Exception("VALIDATION FAILED: structured-output - 'participants' is not a list, schema not enforced")
if set(_parsed.keys()) != {"name", "date", "participants"}:
raise Exception(
f"VALIDATION FAILED: structured-output - unexpected keys present: {set(_parsed.keys())}"
)
print("VALIDATION: structured-output SUCCESS")
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a location.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city name, e.g. London",
},
},
"required": ["location"],
"additionalProperties": False,
},
"strict": True,
},
},
]
response = client.chat.completions.create(
model="test-v2-vertex/openai-gpt-oss",
messages=[
{"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
],
tools=tools,
tool_choice="auto",
stream=True,
)
_tool_calls_made = False
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
print(delta.content, end="", flush=True)
if delta.tool_calls:
_tool_calls_made = True
for _tc in delta.tool_calls:
if _tc.function:
print(_tc.function.arguments or "", end="", flush=True)
if not _tool_calls_made:
raise Exception("VALIDATION FAILED: tool-call stream - no tool calls received")
print("\nVALIDATION: tool-call stream SUCCESS")
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a location.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city name, e.g. London",
},
},
"required": ["location"],
"additionalProperties": False,
},
"strict": True,
},
},
]
response = client.chat.completions.create(
model="test-v2-vertex/openai-gpt-oss",
messages=[
{"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
],
tools=tools,
tool_choice="auto",
stream=False,
)
_message = response.choices[0].message
if _message.tool_calls:
for _tc in _message.tool_calls:
print(f"Function: {_tc.function.name}")
print(f"Arguments: {_tc.function.arguments}")
else:
print(_message.content)
if not _message.tool_calls or len(_message.tool_calls) == 0:
raise Exception("VALIDATION FAILED: tool-call - no tool calls in response")
print("VALIDATION: tool-call SUCCESS")
Error: Code snippet
from openai import OpenAI
import json
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response_schema = json.loads('''{
"title": "CalendarEvent",
"type": "object",
"properties": {
"name": { "type": "string" },
"date": { "type": "string" },
"participants": {
"type": "array",
"items": { "type": "string" }
}
},
"required": ["name", "date", "participants"],
"additionalProperties": false
}''')
response = client.chat.completions.create(
model="test-v2-vertex/zai-org-glm-4.7",
messages=[
{"role": "system", "content": "Extract the event information as JSON."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
],
response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
stream=False,
)
import json as _json
_content = response.choices[0].message.content
print(_content)
if not _content:
raise Exception("VALIDATION FAILED: structured-output - response content is empty")
_parsed = _json.loads(_content)
if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
raise Exception("VALIDATION FAILED: structured-output - missing expected fields (name, date, participants)")
if not isinstance(_parsed.get("participants"), list):
raise Exception("VALIDATION FAILED: structured-output - 'participants' is not a list, schema not enforced")
if set(_parsed.keys()) != {"name", "date", "participants"}:
raise Exception(
f"VALIDATION FAILED: structured-output - unexpected keys present: {set(_parsed.keys())}"
)
print("VALIDATION: structured-output SUCCESS")
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/zai-org-glm-4.7",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "What is the capital of France?"},
],
max_tokens=256,
temperature=0.7,
stream=False,
)
print(response.choices[0].message.content)
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a location.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city name, e.g. London",
},
},
"required": ["location"],
"additionalProperties": False,
},
"strict": True,
},
},
]
response = client.chat.completions.create(
model="test-v2-vertex/zai-org-glm-4.7",
messages=[
{"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
],
tools=tools,
tool_choice="auto",
stream=False,
)
_message = response.choices[0].message
if _message.tool_calls:
for _tc in _message.tool_calls:
print(f"Function: {_tc.function.name}")
print(f"Arguments: {_tc.function.arguments}")
else:
print(_message.content)
if not _message.tool_calls or len(_message.tool_calls) == 0:
raise Exception("VALIDATION FAILED: tool-call - no tool calls in response")
print("VALIDATION: tool-call SUCCESS")
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a location.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city name, e.g. London",
},
},
"required": ["location"],
"additionalProperties": False,
},
"strict": True,
},
},
]
response = client.chat.completions.create(
model="test-v2-vertex/zai-org-glm-4.7",
messages=[
{"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
],
tools=tools,
tool_choice="auto",
stream=True,
)
_tool_calls_made = False
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
print(delta.content, end="", flush=True)
if delta.tool_calls:
_tool_calls_made = True
for _tc in delta.tool_calls:
if _tc.function:
print(_tc.function.arguments or "", end="", flush=True)
if not _tool_calls_made:
raise Exception("VALIDATION FAILED: tool-call stream - no tool calls received")
print("\nVALIDATION: tool-call stream SUCCESS")
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/zai-org-glm-4.7",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "What is the capital of France?"},
],
max_tokens=256,
temperature=0.7,
stream=True,
)
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
print(delta.content, end="", flush=True)
Error: Code snippet
from openai import OpenAI
import json
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response_schema = json.loads('''{
"title": "CalendarEvent",
"type": "object",
"properties": {
"name": { "type": "string" },
"date": { "type": "string" },
"participants": {
"type": "array",
"items": { "type": "string" }
}
},
"required": ["name", "date", "participants"],
"additionalProperties": false
}''')
response = client.chat.completions.create(
model="test-v2-vertex/zai-org-glm-4.7",
messages=[
{"role": "system", "content": "Extract the event information as JSON."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
],
response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
stream=True,
)
import json as _json
_accumulated = ""
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
_accumulated += delta.content
print(delta.content, end="", flush=True)
if not _accumulated:
raise Exception("VALIDATION FAILED: structured-output stream - no content received")
_parsed = _json.loads(_accumulated)
if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
raise Exception("VALIDATION FAILED: structured-output stream - missing expected fields (name, date, participants)")
if not isinstance(_parsed.get("participants"), list):
raise Exception("VALIDATION FAILED: structured-output stream - 'participants' is not a list, schema not enforced")
if set(_parsed.keys()) != {"name", "date", "participants"}:
raise Exception(
f"VALIDATION FAILED: structured-output stream - unexpected keys present: {set(_parsed.keys())}"
)
print("\nVALIDATION: structured-output stream SUCCESS")
Skipped (25)
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Gateway test results
Failures (32)
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/google-gemini-3-flash-preview",
messages=[
{"role": "system", "content": "You are a helpful assistant. You MUST think step by step and show your reasoning. Never skip reasoning steps."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
],
reasoning_effort="medium",
stream=True,
)
_reasoning_detected = False
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
print(delta.content, end="", flush=True)
if getattr(delta, "reasoning_content", None) is not None:
_reasoning_detected = True
if getattr(delta, "reasoning", None) is not None:
_reasoning_detected = True
_usage = getattr(chunk, "usage", None)
if _usage is not None:
_details = getattr(_usage, "completion_tokens_details", None)
if _details and getattr(_details, "reasoning_tokens", 0) > 0:
_reasoning_detected = True
if not _reasoning_detected:
raise Exception("VALIDATION FAILED: reasoning stream - no reasoning information in stream")
print("\nVALIDATION: reasoning stream SUCCESS")
Error: Code snippet
from google import genai
from google.genai import types
_endpoint = "https://internal.devtest.truefoundry.tech/api/llm"
_api_key = "***"
_full_model = "test-v2-vertex/google/gemini-3-flash-preview"
_parts = _full_model.split("/")
_provider_account = _parts[0]
_model_id = "/".join(_parts[1:])
if "/" in _model_id:
_model_id = _model_id.rsplit("/", 1)[-1]
_base_url = f"{_endpoint}/gemini/{_provider_account}/proxy"
client = genai.Client(
api_key=_api_key,
http_options=types.HttpOptions(base_url=_base_url),
)
contents = [
types.Content(role="user", parts=[types.Part.from_text(text="Hi")]),
types.Content(role="model", parts=[types.Part.from_text(text="Hi, how can I help you")]),
types.Content(role="user", parts=[types.Part.from_text(text="How to calculate 3^3^3^3? Think step by step and show all reasoning.")]),
]
config = types.GenerateContentConfig(
system_instruction="You are a helpful assistant. You MUST think step by step and show your reasoning. Never skip reasoning steps.",
thinking_config=types.ThinkingConfig(
include_thoughts=True,
thinking_budget=5000,
),
)
response = client.models.generate_content(
model=_model_id,
contents=contents,
config=config,
)
for part in response.candidates[0].content.parts:
if not part.text:
continue
if part.thought:
print(f"[Thinking] {part.text}")
else:
print(part.text)
_parts = response.candidates[0].content.parts
_thought_detected = False
for _part in _parts:
if not _part.text:
continue
if _part.thought:
_thought_detected = True
print(f"Thinking: {_part.text[:200]}...")
else:
print(_part.text)
_usage = getattr(response, "usage_metadata", None)
if _usage and getattr(_usage, "thoughts_token_count", 0):
_thought_detected = True
if not _thought_detected:
print("Response: ", response)
raise Exception("VALIDATION FAILED: reasoning - no thinking information in GenAI response")
print("VALIDATION: reasoning SUCCESS")
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/minimaxai-minimax-m2",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "What is the capital of France?"},
],
max_tokens=256,
temperature=0.7,
stream=True,
)
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
print(delta.content, end="", flush=True)
Error: Code snippet
from openai import OpenAI
import json
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response_schema = json.loads('''{
"title": "CalendarEvent",
"type": "object",
"properties": {
"name": { "type": "string" },
"date": { "type": "string" },
"participants": {
"type": "array",
"items": { "type": "string" }
}
},
"required": ["name", "date", "participants"],
"additionalProperties": false
}''')
response = client.chat.completions.create(
model="test-v2-vertex/minimaxai-minimax-m2",
messages=[
{"role": "system", "content": "Extract the event information as JSON."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
],
response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
stream=True,
)
import json as _json
_accumulated = ""
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
_accumulated += delta.content
print(delta.content, end="", flush=True)
if not _accumulated:
raise Exception("VALIDATION FAILED: structured-output stream - no content received")
_parsed = _json.loads(_accumulated)
if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
raise Exception("VALIDATION FAILED: structured-output stream - missing expected fields (name, date, participants)")
if not isinstance(_parsed.get("participants"), list):
raise Exception("VALIDATION FAILED: structured-output stream - 'participants' is not a list, schema not enforced")
if set(_parsed.keys()) != {"name", "date", "participants"}:
raise Exception(
f"VALIDATION FAILED: structured-output stream - unexpected keys present: {set(_parsed.keys())}"
)
print("\nVALIDATION: structured-output stream SUCCESS")
Error: Code snippet
from openai import OpenAI
import json
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response_schema = json.loads('''{
"title": "CalendarEvent",
"type": "object",
"properties": {
"name": { "type": "string" },
"date": { "type": "string" },
"participants": {
"type": "array",
"items": { "type": "string" }
}
},
"required": ["name", "date", "participants"],
"additionalProperties": false
}''')
response = client.chat.completions.create(
model="test-v2-vertex/minimaxai-minimax-m2",
messages=[
{"role": "system", "content": "Extract the event information as JSON."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
],
response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
stream=False,
)
import json as _json
_content = response.choices[0].message.content
print(_content)
if not _content:
raise Exception("VALIDATION FAILED: structured-output - response content is empty")
_parsed = _json.loads(_content)
if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
raise Exception("VALIDATION FAILED: structured-output - missing expected fields (name, date, participants)")
if not isinstance(_parsed.get("participants"), list):
raise Exception("VALIDATION FAILED: structured-output - 'participants' is not a list, schema not enforced")
if set(_parsed.keys()) != {"name", "date", "participants"}:
raise Exception(
f"VALIDATION FAILED: structured-output - unexpected keys present: {set(_parsed.keys())}"
)
print("VALIDATION: structured-output SUCCESS")
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/minimaxai-minimax-m2",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "What is the capital of France?"},
],
max_tokens=256,
temperature=0.7,
stream=False,
)
print(response.choices[0].message.content)
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a location.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city name, e.g. London",
},
},
"required": ["location"],
"additionalProperties": False,
},
"strict": True,
},
},
]
response = client.chat.completions.create(
model="test-v2-vertex/minimaxai-minimax-m2",
messages=[
{"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
],
tools=tools,
tool_choice="auto",
stream=True,
)
_tool_calls_made = False
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
print(delta.content, end="", flush=True)
if delta.tool_calls:
_tool_calls_made = True
for _tc in delta.tool_calls:
if _tc.function:
print(_tc.function.arguments or "", end="", flush=True)
if not _tool_calls_made:
raise Exception("VALIDATION FAILED: tool-call stream - no tool calls received")
print("\nVALIDATION: tool-call stream SUCCESS")
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a location.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city name, e.g. London",
},
},
"required": ["location"],
"additionalProperties": False,
},
"strict": True,
},
},
]
response = client.chat.completions.create(
model="test-v2-vertex/minimaxai-minimax-m2",
messages=[
{"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
],
tools=tools,
tool_choice="auto",
stream=False,
)
_message = response.choices[0].message
if _message.tool_calls:
for _tc in _message.tool_calls:
print(f"Function: {_tc.function.name}")
print(f"Arguments: {_tc.function.arguments}")
else:
print(_message.content)
if not _message.tool_calls or len(_message.tool_calls) == 0:
raise Exception("VALIDATION FAILED: tool-call - no tool calls in response")
print("VALIDATION: tool-call SUCCESS")
Error: Code snippet
from google import genai
from google.genai import types
_endpoint = "https://internal.devtest.truefoundry.tech/api/llm"
_api_key = "***"
_full_model = "test-v2-vertex/gemini-3.1-flash-image-preview"
_parts = _full_model.split("/")
_provider_account = _parts[0]
_model_id = "/".join(_parts[1:])
if "/" in _model_id:
_model_id = _model_id.rsplit("/", 1)[-1]
_base_url = f"{_endpoint}/gemini/{_provider_account}/proxy"
client = genai.Client(
api_key=_api_key,
http_options=types.HttpOptions(base_url=_base_url),
)
contents = [
types.Content(role="user", parts=[types.Part.from_text(text="Hi")]),
types.Content(role="model", parts=[types.Part.from_text(text="Hi, how can I help you")]),
types.Content(role="user", parts=[types.Part.from_text(text="How to calculate 3^3^3^3? Think step by step and show all reasoning.")]),
]
config = types.GenerateContentConfig(
system_instruction="You are a helpful assistant. You MUST think step by step and show your reasoning. Never skip reasoning steps.",
thinking_config=types.ThinkingConfig(
include_thoughts=True,
thinking_budget=5000,
),
)
_chunks = []
for chunk in client.models.generate_content_stream(
model=_model_id,
contents=contents,
config=config,
):
_chunks.append(chunk)
if chunk.candidates and chunk.candidates[0].content and chunk.candidates[0].content.parts:
for part in chunk.candidates[0].content.parts:
if not part.text:
continue
if part.thought:
print(f"[Thinking] {part.text}", end="", flush=True)
else:
print(part.text, end="", flush=True)
_thought_detected = False
for _chunk in _chunks:
if not _chunk.candidates or not _chunk.candidates[0].content:
continue
for _part in _chunk.candidates[0].content.parts:
if not _part.text:
continue
if _part.thought:
_thought_detected = True
print(_part.text, end="", flush=True)
else:
print(_part.text, end="", flush=True)
if not _thought_detected:
_usage = getattr(_chunks[-1], "usage_metadata", None) if _chunks else None
if _usage and getattr(_usage, "thoughts_token_count", 0):
_thought_detected = True
if not _thought_detected:
raise Exception("VALIDATION FAILED: reasoning stream - no thinking information in GenAI stream")
print("\nVALIDATION: reasoning stream SUCCESS")
Error: Code snippet
from google import genai
from google.genai import types
_endpoint = "https://internal.devtest.truefoundry.tech/api/llm"
_api_key = "***"
_full_model = "test-v2-vertex/gemini-3.1-flash-image-preview"
_parts = _full_model.split("/")
_provider_account = _parts[0]
_model_id = "/".join(_parts[1:])
if "/" in _model_id:
_model_id = _model_id.rsplit("/", 1)[-1]
_base_url = f"{_endpoint}/gemini/{_provider_account}/proxy"
client = genai.Client(
api_key=_api_key,
http_options=types.HttpOptions(base_url=_base_url),
)
contents = [
types.Content(role="user", parts=[types.Part.from_text(text="Hi")]),
types.Content(role="model", parts=[types.Part.from_text(text="Hi, how can I help you")]),
types.Content(role="user", parts=[types.Part.from_text(text="How to calculate 3^3^3^3? Think step by step and show all reasoning.")]),
]
config = types.GenerateContentConfig(
system_instruction="You are a helpful assistant. You MUST think step by step and show your reasoning. Never skip reasoning steps.",
thinking_config=types.ThinkingConfig(
include_thoughts=True,
thinking_budget=5000,
),
)
response = client.models.generate_content(
model=_model_id,
contents=contents,
config=config,
)
for part in response.candidates[0].content.parts:
if not part.text:
continue
if part.thought:
print(f"[Thinking] {part.text}")
else:
print(part.text)
_parts = response.candidates[0].content.parts
_thought_detected = False
for _part in _parts:
if not _part.text:
continue
if _part.thought:
_thought_detected = True
print(f"Thinking: {_part.text[:200]}...")
else:
print(_part.text)
_usage = getattr(response, "usage_metadata", None)
if _usage and getattr(_usage, "thoughts_token_count", 0):
_thought_detected = True
if not _thought_detected:
print("Response: ", response)
raise Exception("VALIDATION FAILED: reasoning - no thinking information in GenAI response")
print("VALIDATION: reasoning SUCCESS")
Error: Code snippet
from openai import OpenAI
import json
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response_schema = json.loads('''{
"title": "CalendarEvent",
"type": "object",
"properties": {
"name": { "type": "string" },
"date": { "type": "string" },
"participants": {
"type": "array",
"items": { "type": "string" }
}
},
"required": ["name", "date", "participants"],
"additionalProperties": false
}''')
response = client.chat.completions.create(
model="test-v2-vertex/openai-gpt-oss",
messages=[
{"role": "system", "content": "Extract the event information as JSON."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
],
response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
stream=True,
)
import json as _json
_accumulated = ""
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
_accumulated += delta.content
print(delta.content, end="", flush=True)
if not _accumulated:
raise Exception("VALIDATION FAILED: structured-output stream - no content received")
_parsed = _json.loads(_accumulated)
if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
raise Exception("VALIDATION FAILED: structured-output stream - missing expected fields (name, date, participants)")
if not isinstance(_parsed.get("participants"), list):
raise Exception("VALIDATION FAILED: structured-output stream - 'participants' is not a list, schema not enforced")
if set(_parsed.keys()) != {"name", "date", "participants"}:
raise Exception(
f"VALIDATION FAILED: structured-output stream - unexpected keys present: {set(_parsed.keys())}"
)
print("\nVALIDATION: structured-output stream SUCCESS")
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a location.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city name, e.g. London",
},
},
"required": ["location"],
"additionalProperties": False,
},
"strict": True,
},
},
]
response = client.chat.completions.create(
model="test-v2-vertex/openai-gpt-oss",
messages=[
{"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
],
tools=tools,
tool_choice="auto",
stream=False,
)
_message = response.choices[0].message
if _message.tool_calls:
for _tc in _message.tool_calls:
print(f"Function: {_tc.function.name}")
print(f"Arguments: {_tc.function.arguments}")
else:
print(_message.content)
if not _message.tool_calls or len(_message.tool_calls) == 0:
raise Exception("VALIDATION FAILED: tool-call - no tool calls in response")
print("VALIDATION: tool-call SUCCESS")
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/openai-gpt-oss",
messages=[
{"role": "system", "content": "You are a helpful assistant. You MUST think step by step and show your reasoning. Never skip reasoning steps."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
],
reasoning_effort="medium",
stream=False,
)
_usage = getattr(response, "usage", None)
_reasoning_detected = False
_choices = getattr(response, "choices", None)
if _choices and len(_choices) > 0:
_message = getattr(_choices[0], "message", None)
else:
_message = None
if _message and getattr(_message, "content", None) is not None:
print(_message.content)
if _usage is not None:
_output_token_details = getattr(_usage, "completion_tokens_details", None)
if _output_token_details and getattr(_output_token_details, "reasoning_tokens", 0) > 0:
_reasoning_detected = True
elif getattr(_usage, "reasoning", None) is not None:
_reasoning_detected = True
if getattr(_message, "reasoning_content", None) is not None:
_reasoning_detected = True
elif getattr(_message, "reasoning", None) is not None:
_reasoning_detected = True
if not _reasoning_detected:
print("Response: ", response)
raise Exception("VALIDATION FAILED: reasoning - no reasoning information in response")
print("VALIDATION: reasoning SUCCESS")
Error: Code snippet
from openai import OpenAI
import json
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response_schema = json.loads('''{
"title": "CalendarEvent",
"type": "object",
"properties": {
"name": { "type": "string" },
"date": { "type": "string" },
"participants": {
"type": "array",
"items": { "type": "string" }
}
},
"required": ["name", "date", "participants"],
"additionalProperties": false
}''')
response = client.chat.completions.create(
model="test-v2-vertex/openai-gpt-oss",
messages=[
{"role": "system", "content": "Extract the event information as JSON."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
],
response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
stream=False,
)
import json as _json
_content = response.choices[0].message.content
print(_content)
if not _content:
raise Exception("VALIDATION FAILED: structured-output - response content is empty")
_parsed = _json.loads(_content)
if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
raise Exception("VALIDATION FAILED: structured-output - missing expected fields (name, date, participants)")
if not isinstance(_parsed.get("participants"), list):
raise Exception("VALIDATION FAILED: structured-output - 'participants' is not a list, schema not enforced")
if set(_parsed.keys()) != {"name", "date", "participants"}:
raise Exception(
f"VALIDATION FAILED: structured-output - unexpected keys present: {set(_parsed.keys())}"
)
print("VALIDATION: structured-output SUCCESS")
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/openai-gpt-oss",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "What is the capital of France?"},
],
max_tokens=256,
temperature=0.7,
stream=False,
)
print(response.choices[0].message.content)
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/openai-gpt-oss",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "What is the capital of France?"},
],
max_tokens=256,
temperature=0.7,
stream=True,
)
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
print(delta.content, end="", flush=True)
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/openai-gpt-oss",
messages=[
{"role": "system", "content": "You are a helpful assistant. You MUST think step by step and show your reasoning. Never skip reasoning steps."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
],
reasoning_effort="medium",
stream=True,
)
_reasoning_detected = False
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
print(delta.content, end="", flush=True)
if getattr(delta, "reasoning_content", None) is not None:
_reasoning_detected = True
if getattr(delta, "reasoning", None) is not None:
_reasoning_detected = True
_usage = getattr(chunk, "usage", None)
if _usage is not None:
_details = getattr(_usage, "completion_tokens_details", None)
if _details and getattr(_details, "reasoning_tokens", 0) > 0:
_reasoning_detected = True
if not _reasoning_detected:
raise Exception("VALIDATION FAILED: reasoning stream - no reasoning information in stream")
print("\nVALIDATION: reasoning stream SUCCESS")
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a location.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city name, e.g. London",
},
},
"required": ["location"],
"additionalProperties": False,
},
"strict": True,
},
},
]
response = client.chat.completions.create(
model="test-v2-vertex/openai-gpt-oss",
messages=[
{"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
],
tools=tools,
tool_choice="auto",
stream=True,
)
_tool_calls_made = False
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
print(delta.content, end="", flush=True)
if delta.tool_calls:
_tool_calls_made = True
for _tc in delta.tool_calls:
if _tc.function:
print(_tc.function.arguments or "", end="", flush=True)
if not _tool_calls_made:
raise Exception("VALIDATION FAILED: tool-call stream - no tool calls received")
print("\nVALIDATION: tool-call stream SUCCESS")
Error: Code snippet
from openai import OpenAI
import json
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response_schema = json.loads('''{
"title": "CalendarEvent",
"type": "object",
"properties": {
"name": { "type": "string" },
"date": { "type": "string" },
"participants": {
"type": "array",
"items": { "type": "string" }
}
},
"required": ["name", "date", "participants"],
"additionalProperties": false
}''')
response = client.chat.completions.create(
model="test-v2-vertex/google-gemma4",
messages=[
{"role": "system", "content": "Extract the event information as JSON."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
],
response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
stream=True,
)
import json as _json
_accumulated = ""
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
_accumulated += delta.content
print(delta.content, end="", flush=True)
if not _accumulated:
raise Exception("VALIDATION FAILED: structured-output stream - no content received")
_parsed = _json.loads(_accumulated)
if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
raise Exception("VALIDATION FAILED: structured-output stream - missing expected fields (name, date, participants)")
if not isinstance(_parsed.get("participants"), list):
raise Exception("VALIDATION FAILED: structured-output stream - 'participants' is not a list, schema not enforced")
if set(_parsed.keys()) != {"name", "date", "participants"}:
raise Exception(
f"VALIDATION FAILED: structured-output stream - unexpected keys present: {set(_parsed.keys())}"
)
print("\nVALIDATION: structured-output stream SUCCESS")
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a location.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city name, e.g. London",
},
},
"required": ["location"],
"additionalProperties": False,
},
"strict": True,
},
},
]
response = client.chat.completions.create(
model="test-v2-vertex/google-gemma4",
messages=[
{"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
],
tools=tools,
tool_choice="auto",
stream=False,
)
_message = response.choices[0].message
if _message.tool_calls:
for _tc in _message.tool_calls:
print(f"Function: {_tc.function.name}")
print(f"Arguments: {_tc.function.arguments}")
else:
print(_message.content)
if not _message.tool_calls or len(_message.tool_calls) == 0:
raise Exception("VALIDATION FAILED: tool-call - no tool calls in response")
print("VALIDATION: tool-call SUCCESS")
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a location.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city name, e.g. London",
},
},
"required": ["location"],
"additionalProperties": False,
},
"strict": True,
},
},
]
response = client.chat.completions.create(
model="test-v2-vertex/google-gemma4",
messages=[
{"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
],
tools=tools,
tool_choice="auto",
stream=True,
)
_tool_calls_made = False
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
print(delta.content, end="", flush=True)
if delta.tool_calls:
_tool_calls_made = True
for _tc in delta.tool_calls:
if _tc.function:
print(_tc.function.arguments or "", end="", flush=True)
if not _tool_calls_made:
raise Exception("VALIDATION FAILED: tool-call stream - no tool calls received")
print("\nVALIDATION: tool-call stream SUCCESS")
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/google-gemma4",
messages=[
{"role": "system", "content": "You are a helpful assistant. You MUST think step by step and show your reasoning. Never skip reasoning steps."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
],
reasoning_effort="medium",
stream=True,
)
_reasoning_detected = False
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
print(delta.content, end="", flush=True)
if getattr(delta, "reasoning_content", None) is not None:
_reasoning_detected = True
if getattr(delta, "reasoning", None) is not None:
_reasoning_detected = True
_usage = getattr(chunk, "usage", None)
if _usage is not None:
_details = getattr(_usage, "completion_tokens_details", None)
if _details and getattr(_details, "reasoning_tokens", 0) > 0:
_reasoning_detected = True
if not _reasoning_detected:
raise Exception("VALIDATION FAILED: reasoning stream - no reasoning information in stream")
print("\nVALIDATION: reasoning stream SUCCESS")
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/google-gemma4",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "What is the capital of France?"},
],
max_tokens=256,
temperature=0.7,
stream=True,
)
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
print(delta.content, end="", flush=True)
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/google-gemma4",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "What is the capital of France?"},
],
max_tokens=256,
temperature=0.7,
stream=False,
)
print(response.choices[0].message.content)
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/google-gemma4",
messages=[
{"role": "system", "content": "You are a helpful assistant. You MUST think step by step and show your reasoning. Never skip reasoning steps."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "How to calculate 3^3^3^3? Think step by step and show all reasoning."},
],
reasoning_effort="medium",
stream=False,
)
_usage = getattr(response, "usage", None)
_reasoning_detected = False
_choices = getattr(response, "choices", None)
if _choices and len(_choices) > 0:
_message = getattr(_choices[0], "message", None)
else:
_message = None
if _message and getattr(_message, "content", None) is not None:
print(_message.content)
if _usage is not None:
_output_token_details = getattr(_usage, "completion_tokens_details", None)
if _output_token_details and getattr(_output_token_details, "reasoning_tokens", 0) > 0:
_reasoning_detected = True
elif getattr(_usage, "reasoning", None) is not None:
_reasoning_detected = True
if getattr(_message, "reasoning_content", None) is not None:
_reasoning_detected = True
elif getattr(_message, "reasoning", None) is not None:
_reasoning_detected = True
if not _reasoning_detected:
print("Response: ", response)
raise Exception("VALIDATION FAILED: reasoning - no reasoning information in response")
print("VALIDATION: reasoning SUCCESS")
Error: Code snippet
from openai import OpenAI
import json
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response_schema = json.loads('''{
"title": "CalendarEvent",
"type": "object",
"properties": {
"name": { "type": "string" },
"date": { "type": "string" },
"participants": {
"type": "array",
"items": { "type": "string" }
}
},
"required": ["name", "date", "participants"],
"additionalProperties": false
}''')
response = client.chat.completions.create(
model="test-v2-vertex/google-gemma4",
messages=[
{"role": "system", "content": "Extract the event information as JSON."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
],
response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
stream=False,
)
import json as _json
_content = response.choices[0].message.content
print(_content)
if not _content:
raise Exception("VALIDATION FAILED: structured-output - response content is empty")
_parsed = _json.loads(_content)
if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
raise Exception("VALIDATION FAILED: structured-output - missing expected fields (name, date, participants)")
if not isinstance(_parsed.get("participants"), list):
raise Exception("VALIDATION FAILED: structured-output - 'participants' is not a list, schema not enforced")
if set(_parsed.keys()) != {"name", "date", "participants"}:
raise Exception(
f"VALIDATION FAILED: structured-output - unexpected keys present: {set(_parsed.keys())}"
)
print("VALIDATION: structured-output SUCCESS")
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a location.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city name, e.g. London",
},
},
"required": ["location"],
"additionalProperties": False,
},
"strict": True,
},
},
]
response = client.chat.completions.create(
model="test-v2-vertex/zai-org-glm-4.7",
messages=[
{"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
],
tools=tools,
tool_choice="auto",
stream=False,
)
_message = response.choices[0].message
if _message.tool_calls:
for _tc in _message.tool_calls:
print(f"Function: {_tc.function.name}")
print(f"Arguments: {_tc.function.arguments}")
else:
print(_message.content)
if not _message.tool_calls or len(_message.tool_calls) == 0:
raise Exception("VALIDATION FAILED: tool-call - no tool calls in response")
print("VALIDATION: tool-call SUCCESS")
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/zai-org-glm-4.7",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "What is the capital of France?"},
],
max_tokens=256,
temperature=0.7,
stream=True,
)
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
print(delta.content, end="", flush=True)
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a location.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city name, e.g. London",
},
},
"required": ["location"],
"additionalProperties": False,
},
"strict": True,
},
},
]
response = client.chat.completions.create(
model="test-v2-vertex/zai-org-glm-4.7",
messages=[
{"role": "system", "content": "You are a helpful assistant with access to tools. You MUST strictly use the provided tools to answer. Never respond with plain text when a tool is available."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Use the get_weather tool to check the weather in London. You must call the tool, do not respond with plain text."},
],
tools=tools,
tool_choice="auto",
stream=True,
)
_tool_calls_made = False
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
print(delta.content, end="", flush=True)
if delta.tool_calls:
_tool_calls_made = True
for _tc in delta.tool_calls:
if _tc.function:
print(_tc.function.arguments or "", end="", flush=True)
if not _tool_calls_made:
raise Exception("VALIDATION FAILED: tool-call stream - no tool calls received")
print("\nVALIDATION: tool-call stream SUCCESS")
Error: Code snippet
from openai import OpenAI
import json
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response_schema = json.loads('''{
"title": "CalendarEvent",
"type": "object",
"properties": {
"name": { "type": "string" },
"date": { "type": "string" },
"participants": {
"type": "array",
"items": { "type": "string" }
}
},
"required": ["name", "date", "participants"],
"additionalProperties": false
}''')
response = client.chat.completions.create(
model="test-v2-vertex/zai-org-glm-4.7",
messages=[
{"role": "system", "content": "Extract the event information as JSON."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
],
response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
stream=True,
)
import json as _json
_accumulated = ""
for chunk in response:
if chunk.choices and len(chunk.choices) > 0:
delta = chunk.choices[0].delta
if delta.content is not None:
_accumulated += delta.content
print(delta.content, end="", flush=True)
if not _accumulated:
raise Exception("VALIDATION FAILED: structured-output stream - no content received")
_parsed = _json.loads(_accumulated)
if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
raise Exception("VALIDATION FAILED: structured-output stream - missing expected fields (name, date, participants)")
if not isinstance(_parsed.get("participants"), list):
raise Exception("VALIDATION FAILED: structured-output stream - 'participants' is not a list, schema not enforced")
if set(_parsed.keys()) != {"name", "date", "participants"}:
raise Exception(
f"VALIDATION FAILED: structured-output stream - unexpected keys present: {set(_parsed.keys())}"
)
print("\nVALIDATION: structured-output stream SUCCESS")
Error: Code snippet
from openai import OpenAI
import json
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response_schema = json.loads('''{
"title": "CalendarEvent",
"type": "object",
"properties": {
"name": { "type": "string" },
"date": { "type": "string" },
"participants": {
"type": "array",
"items": { "type": "string" }
}
},
"required": ["name", "date", "participants"],
"additionalProperties": false
}''')
response = client.chat.completions.create(
model="test-v2-vertex/zai-org-glm-4.7",
messages=[
{"role": "system", "content": "Extract the event information as JSON."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "Alice and Bob are going to a science fair on Friday. Extract the event details as JSON."},
],
response_format={"type": "json_schema", "json_schema": {"name": "CalendarEvent", "schema": response_schema}},
stream=False,
)
import json as _json
_content = response.choices[0].message.content
print(_content)
if not _content:
raise Exception("VALIDATION FAILED: structured-output - response content is empty")
_parsed = _json.loads(_content)
if "name" not in _parsed or "date" not in _parsed or "participants" not in _parsed:
raise Exception("VALIDATION FAILED: structured-output - missing expected fields (name, date, participants)")
if not isinstance(_parsed.get("participants"), list):
raise Exception("VALIDATION FAILED: structured-output - 'participants' is not a list, schema not enforced")
if set(_parsed.keys()) != {"name", "date", "participants"}:
raise Exception(
f"VALIDATION FAILED: structured-output - unexpected keys present: {set(_parsed.keys())}"
)
print("VALIDATION: structured-output SUCCESS")
Error: Code snippet
from openai import OpenAI
client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")
response = client.chat.completions.create(
model="test-v2-vertex/zai-org-glm-4.7",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hi"},
{"role": "assistant", "content": "Hi, how can I help you"},
{"role": "user", "content": "What is the capital of France?"},
],
max_tokens=256,
temperature=0.7,
stream=False,
)
print(response.choices[0].message.content)

Skipped (25)
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason:
Skip reason: |

Auto-generated by poc-agent for provider
google-vertex.

Note
Medium Risk
Large, auto-generated sweep across many Vertex model YAMLs that changes provisioning types, limits, costs, and deprecation/status flags; mistakes could affect model selection, request validation, or pricing displays.
Overview
Updates the Google Vertex provider’s model registry YAMLs with broadly richer and more standardized metadata: adds/normalizes `provisioning`, `status`, `sources`, `removeParams`, and expands `features`, `limits`, `modalities`, and `costs` across many models.

Introduces/updates lifecycle signals (e.g., `deprecationDate`, `isDeprecated`, status transitions to `deprecated`/`retired`) and adjusts a few key specs (notably new/expanded pricing blocks for TTS/video models, added regions like `global`/`eu`/`us`, and token limit corrections such as a higher `max_tokens` for `claude-sonnet-4-6@default`).

Reviewed by Cursor Bugbot for commit c01e203. Bugbot is set up for automated code reviews on this repo. Configure here.