What I was trying to do
I am building a PDF processing pipeline where I use Docling to parse PDFs and then use docling-agent to fix the heading hierarchy of the parsed DoclingDocument before applying HybridChunker for RAG chunking. For the LLM backend I am using Azure OpenAI directly (without a LiteLLM proxy) with the litellm backend type.
from docling.document_converter import DocumentConverter
from docling_agent.agents import DoclingEditingAgent, BackendConfig, ModelConfig, create_backend
import os
result = DocumentConverter().convert("document.pdf")
doc = result.document
backend = create_backend(
BackendConfig(
type="litellm",
base_url=(
f"{os.getenv('AZURE_OPENAI_ENDPOINT').rstrip('/')}"
f"/openai/deployments/{os.getenv('AZURE_OPENAI_CHAT_DEPLOYMENT')}"
),
api_key_env="AZURE_OPENAI_API_KEY",
options={"api_version": os.getenv("AZURE_OPENAI_API_VERSION")},
models=ModelConfig(
reasoning=f"azure/{os.getenv('AZURE_OPENAI_CHAT_DEPLOYMENT')}",
writing=f"azure/{os.getenv('AZURE_OPENAI_CHAT_DEPLOYMENT')}",
),
)
)
agent = DoclingEditingAgent(backend=backend, tools=[])
fixed_doc = agent.run(
task="Ensure that the section headings have the correct level.",
document=doc,
)
The Problem
Azure OpenAI requires api-version as a URL query parameter on every request:
POST https://<resource>.openai.azure.com/openai/deployments//chat/completions?api-version=2024-02-01
OpenAICompatibleSession only supports two destinations for configuration:
| Field |
Where it ends up |
base_url |
httpx client base URL |
options |
JSON request body |
There is no way to pass query parameters. I hit this wall through two failed attempts:
Attempt 1 Pass api_version via options
BackendConfig(
base_url="https://<resource>.openai.azure.com/openai/deployments/<deployment>",
options={"api_version": "2024-02-01"},
...
)
options is spread into the JSON body via **self.options at openai_compatible.py:102, so api_version lands in the request body not the URL. Azure ignores it there.
Actual error:
HTTPStatusError: Client error '404 Resource Not Found'for url
'https://<resource>.openai.azure.com/openai/deployments/gpt-4o/chat/completions'
Notice there is no ?api-version=... in the URL it silently went into the body instead.
Attempt 2 Embed ?api-version in base_url
BackendConfig(
base_url=(
"https://<resource>.openai.azure.com"
"/openai/deployments/<deployment>"
"?api-version=2024-02-01"
),
...
)
httpx appends the string "/chat/completions" literally to base_url. Since ? already terminated the path, /chat/completions becomes part of the query string value instead of the URL path producing a malformed URL.
Actual error:
HTTPStatusError: Client error '404 Resource Not Found' for url
'https://<resource>.openai.azure.com/openai/deployments/gpt-4o?api-version=2024-02-01/chat/completions'
Notice /chat/completions ended up after ?api-version=2024-02-01 completely wrong position.
Root Cause
In openai_compatible.py, instruct() hardcodes the post path with no support for query params:
# openai_compatible.py:97-104
response = self._client.post(
"/chat/completions",
json={
"model": self.model,
"messages": [message.copy() for message in self._messages],
**self.options, # only JSON body, no query params possible
},
)
The options field is documented as "backend-specific options passed through to the provider" a reasonable user would expect api_version to work there, but there is no supported code path to forward anything as a URL query parameter.
Minimal Reproduction Code
Confirmed reproducible on a clean install of docling-agent==0.5.0.
Requirements:
uv add docling-agent
Environment variables required:
AZURE_OPENAI_ENDPOINT e.g. https://<resource>.openai.azure.com
AZURE_OPENAI_CHAT_DEPLOYMENT e.g. gpt-4o
AZURE_OPENAI_API_VERSION e.g. 2024-02-01
AZURE_OPENAI_API_KEY your Azure API key
import os
from docling_core.types.doc.document import DoclingDocument
from docling_core.types.doc.labels import DocItemLabel
from docling_agent.agents import (
DoclingEditingAgent,
BackendConfig,
ModelConfig,
create_backend,
)
# Minimal DoclingDocument no real PDF needed to trigger the bug
doc = DoclingDocument(name="test")
doc.add_heading(text="Introduction", level=2)
doc.add_text(label=DocItemLabel.TEXT, text="This is the introduction section.")
doc.add_heading(text="Background", level=3)
doc.add_text(label=DocItemLabel.TEXT, text="This is background context.")
# -------------------------------------------------------------------
# Attempt 1: api_version via options{} → lands in JSON body, not URL
# Expected URL: .../chat/completions?api-version=2024-02-01
# Actual URL: .../chat/completions <- no query param
# -------------------------------------------------------------------
print("=" * 60)
print("Attempt 1: api_version via options{}")
print("=" * 60)
backend_attempt1 = create_backend(
BackendConfig(
type="litellm",
base_url=(
f"{os.getenv('AZURE_OPENAI_ENDPOINT').rstrip('/')}"
f"/openai/deployments/{os.getenv('AZURE_OPENAI_CHAT_DEPLOYMENT')}"
),
api_key_env="AZURE_OPENAI_API_KEY",
options={"api_version": os.getenv("AZURE_OPENAI_API_VERSION")},
models=ModelConfig(
reasoning=f"azure/{os.getenv('AZURE_OPENAI_CHAT_DEPLOYMENT')}",
writing=f"azure/{os.getenv('AZURE_OPENAI_CHAT_DEPLOYMENT')}",
),
)
)
try:
agent = DoclingEditingAgent(backend=backend_attempt1, tools=[])
agent.run(
task="Ensure that the section headings have the correct level.",
document=doc,
)
print("UNEXPECTED: No error raised")
except Exception as e:
print(f"EXPECTED ERROR: {type(e).__name__}: {e}")
# -------------------------------------------------------------------
# Attempt 2: ?api-version embedded in base_url → malformed URL
# Expected URL: .../chat/completions?api-version=2024-02-01
# Actual URL: ...?api-version=2024-02-01/chat/completions
# -------------------------------------------------------------------
print()
print("=" * 60)
print("Attempt 2: ?api-version embedded in base_url")
print("=" * 60)
backend_attempt2 = create_backend(
BackendConfig(
type="litellm",
base_url=(
f"{os.getenv('AZURE_OPENAI_ENDPOINT').rstrip('/')}"
f"/openai/deployments/{os.getenv('AZURE_OPENAI_CHAT_DEPLOYMENT')}"
f"?api-version={os.getenv('AZURE_OPENAI_API_VERSION')}"
),
api_key_env="AZURE_OPENAI_API_KEY",
options={},
models=ModelConfig(
reasoning=f"azure/{os.getenv('AZURE_OPENAI_CHAT_DEPLOYMENT')}",
writing=f"azure/{os.getenv('AZURE_OPENAI_CHAT_DEPLOYMENT')}",
),
)
)
try:
agent2 = DoclingEditingAgent(backend=backend_attempt2, tools=[])
agent2.run(
task="Ensure that the section headings have the correct level.",
document=doc,
)
print("UNEXPECTED: No error raised")
except Exception as e:
print(f"EXPECTED ERROR: {type(e).__name__}: {e}")
Actual output:
============================================================
Attempt 1: api_version via options{}
============================================================
EXPECTED ERROR: HTTPStatusError: Client error '404 Resource Not Found'
for url 'https://`<resource>`.openai.azure.com/openai/deployments
/gpt-4o/chat/completions'
============================================================
Attempt 2: ?api-version embedded in base_url
============================================================
EXPECTED ERROR: HTTPStatusError: Client error '404 Resource Not Found'
for url 'https://`<resource>`.openai.azure.com/openai/deployments
/gpt-4o?api-version=2024-02-01/chat/completions'
Proposed Fix
Add a general query_params: dict[str, str] field toBackendConfig and pass it to httpx.Client(params=...) at construction time. The httpx client then appends these parameters to every request URL automatically no per-request injection needed.
docling_agent/task_model.py
class BackendConfig(BaseModel):
...
options: Annotated[
dict[str, Any],
Field(description="Backend-specific options passed through to the provider."),
] = {}
+ query_params: Annotated[
+ dict[str, str],
+ Field(
+ description="Query parameters appended to every request URL (e.g. api-version for Azure OpenAI).",
+ ),
+ ] = {}
docling_agent/backends/openai_compatible.py
class OpenAICompatibleSession(BaseSession):
def __init__(
self,
*,
backend_type: str,
model: str,
system_prompt: str | None = None,
base_url: str,
timeout: float,
api_key_env: str | None = None,
options: dict[str, Any] | None = None,
+ query_params: dict[str, str] | None = None,
) -> None:
...
- self._client = httpx.Client(base_url=base_url.rstrip("/"), timeout=timeout, headers=headers)
+ self._client = httpx.Client(
+ base_url=base_url.rstrip("/"),
+ timeout=timeout,
+ headers=headers,
+ params=query_params or {},
+ )
class OpenAICompatibleBackend(BaseBackend):
def __init__(self, *, config: BackendConfig) -> None:
...
self.options = config.options or {}
+ self.query_params = config.query_params or {}
def create_session(self, *, model: str, system_prompt: str | None = None) -> BaseSession:
return OpenAICompatibleSession(
...
options=self.options,
+ query_params=self.query_params,
)
Azure usage then becomes clean and explicit:
BackendConfig(
type="litellm",
base_url="https://<resource>.openai.azure.com/openai/deployments/<deployment>",
api_key_env="AZURE_OPENAI_API_KEY",
query_params={"api-version": "2024-02-01"},
models=ModelConfig(reasoning="azure/gpt-4o", writing="azure/gpt-4o"),
)
Fully backward-compatible existing configs without query_params are unaffected. Other providers that require query params (Cloudflare AI Gateway, AWS Bedrock proxies, etc.) also benefit.
Happy to submit a PR with this approach once confirmed.
What I was trying to do
I am building a PDF processing pipeline where I use Docling to parse PDFs and then use docling-agent to fix the heading hierarchy of the parsed
DoclingDocumentbefore applyingHybridChunkerfor RAG chunking. For the LLM backend I am using Azure OpenAI directly (without a LiteLLM proxy) with thelitellmbackend type.The Problem
Azure OpenAI requires api-version as a URL query parameter on every request:
POST https://
<resource>.openai.azure.com/openai/deployments//chat/completions?api-version=2024-02-01OpenAICompatibleSession only supports two destinations for configuration:
base_urloptionsThere is no way to pass query parameters. I hit this wall through two failed attempts:
Attempt 1 Pass
api_versionviaoptionsoptionsis spread into the JSON body via**self.optionsatopenai_compatible.py:102, soapi_versionlands in the request body not the URL. Azure ignores it there.Notice there is no
?api-version=...in the URL it silently went into the body instead.Attempt 2 Embed
?api-versioninbase_urlhttpx appends the string
"/chat/completions"literally tobase_url. Since?already terminated the path,/chat/completionsbecomes part of the query string value instead of the URL path producing a malformed URL.Notice
/chat/completionsended up after?api-version=2024-02-01completely wrong position.Root Cause
In
openai_compatible.py,instruct()hardcodes the post path with no support for query params:The
optionsfield is documented as "backend-specific options passed through to the provider" a reasonable user would expectapi_versionto work there, but there is no supported code path to forward anything as a URL query parameter.Minimal Reproduction Code
Confirmed reproducible on a clean install of docling-agent==0.5.0.
Proposed Fix
Add a general
query_params: dict[str, str]field toBackendConfigand pass it tohttpx.Client(params=...)at construction time. Thehttpxclient then appends these parameters to every requestURLautomatically no per-request injection needed.docling_agent/task_model.py
docling_agent/backends/openai_compatible.py
Azure usage then becomes clean and explicit:
Fully backward-compatible existing configs without
query_paramsare unaffected. Other providers that require query params (Cloudflare AI Gateway, AWS Bedrock proxies, etc.) also benefit.Happy to submit a PR with this approach once confirmed.