Chat Renderer¶
Unified synchronous HTTP client for the Ollama API with connection pooling.
app/chat_renderer.py¶
Unified synchronous HTTP client for the Ollama API.
Provides three interfaces:
- ChatRenderer.render() – fire-and-forget chat call that swallows errors and returns None on any failure (used by the Chat Translation page, matching mud_server behaviour).
- ChatRenderer.generate() – same transport, but raises on failure (used by the Character Description page, where the route handler maps each exception to an HTTPException).
- ChatRenderer.list_models() – static helper that queries /api/tags and returns a sorted list of pulled model names.
Both generation methods use Ollama’s /api/chat endpoint with the
OpenAI-compatible messages array (system + user roles), which is what the
production MUD translation layer also uses. This ensures that any
model-behaviour differences between /api/generate (flat prompt) and
/api/chat (messages) are visible during lab testing.
Connection pooling¶
HTTP connections are reused across requests via a module-level client pool
keyed by (host, connect_timeout, read_timeout). This avoids the
overhead of a fresh TCP handshake + TLS negotiation on every call and
prevents cold-start failures when Character B fires immediately after
Character A. Call close_all_clients() at shutdown to release
pooled connections cleanly.
Sync rationale¶
The lab’s route handlers are synchronous (FastAPI runs them in a
thread-pool executor), so a blocking httpx call here does not stall the
async event loop. Using an async client would require asyncio.run() or
restructuring the handler, neither of which is worth the complexity for a
single-user tool.
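The reasoning can be illustrated without FastAPI: a blocking call handed to a worker thread leaves the event loop free to run other work. The sketch below uses only asyncio and time. blocking_call is a hypothetical stand-in for a blocking httpx request, and run_in_executor serves as an analogy for how plain def handlers are run, not as a description of FastAPI internals.

```python
import asyncio
import time


def blocking_call(seconds: float) -> str:
    """Hypothetical stand-in for a blocking httpx request to Ollama."""
    time.sleep(seconds)
    return "done"


async def demo(seconds: float = 0.2):
    loop = asyncio.get_running_loop()
    # Hand the blocking call to the default thread-pool executor.
    future = loop.run_in_executor(None, blocking_call, seconds)
    ticks = 0
    # The event loop keeps scheduling this coroutine while the worker
    # thread is blocked inside time.sleep().
    while not future.done():
        await asyncio.sleep(0.01)
        ticks += 1
    return await future, ticks


if __name__ == "__main__":
    result, ticks = asyncio.run(demo())
    print(result, "after", ticks, "event-loop ticks")
```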
Request structure sent to Ollama¶
{
  "model": "<model-tag>",
  "stream": false,
  "keep_alive": "5m",
  "messages": [
    {"role": "system", "content": "<rendered system prompt>"},
    {"role": "user", "content": "<ooc message>"}
  ],
  "options": {
    "temperature": <float>,
    "num_predict": <int>,
    "seed": <int>  // only when seed is not None
  }
}
The stream: false flag is required to get a single JSON response body
rather than a series of newline-delimited chunks.
The keep_alive field tells Ollama how long to keep the model loaded in
memory after responding (default "5m"). This prevents cold-start
latency on back-to-back requests (e.g. Character A then Character B).
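The request body above could be assembled along these lines. This is a sketch under stated assumptions: build_chat_payload is a hypothetical helper name, and the defaults simply mirror the constructor defaults documented on this page. The one detail worth noting is that the seed key is left out entirely when seed is None, rather than being sent as null.

```python
from typing import Any, Dict, Optional


def build_chat_payload(
    model: str,
    system_prompt: str,
    user_message: str,
    *,
    temperature: float = 0.7,
    max_tokens: int = 128,
    seed: Optional[int] = None,
    keep_alive: str = "5m",
) -> Dict[str, Any]:
    """Build a non-streaming /api/chat request body (hypothetical helper)."""
    options: Dict[str, Any] = {
        "temperature": temperature,
        "num_predict": max_tokens,  # Ollama's name for the token ceiling
    }
    if seed is not None:
        # Omit the key when no seed is given so Ollama picks its own.
        options["seed"] = seed
    return {
        "model": model,
        "stream": False,  # one JSON body instead of newline-delimited chunks
        "keep_alive": keep_alive,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
        "options": options,
    }
```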
Environment variables¶
- OLLAMA_HOST – Base URL of the Ollama server (default: http://localhost:11434).
Read once at import time so the value is consistent for the lifetime of the process.
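A sketch of what reading the variable once at import time implies. resolve_host and DEFAULT_OLLAMA_HOST are hypothetical names, and stripping the trailing slash from the fallback is an assumption. Changing the environment variable after import has no effect on the module-level constant.

```python
import os
from typing import Optional

DEFAULT_OLLAMA_HOST = "http://localhost:11434"

# Evaluated once, when the module is first imported.
OLLAMA_HOST = os.environ.get("OLLAMA_HOST", DEFAULT_OLLAMA_HOST)


def resolve_host(host: Optional[str] = None) -> str:
    """Prefer an explicit host; otherwise fall back to the import-time constant."""
    chosen = OLLAMA_HOST if host is None else host
    return chosen.rstrip("/")
```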
- app.chat_renderer.close_all_clients()¶
Close all pooled HTTP clients and clear the pool.
Call this during application shutdown to release TCP connections cleanly.
- class app.chat_renderer.ChatRenderer(*, host, model, timeout_seconds=120.0, temperature=0.7, seed=None, max_tokens=128, keep_alive='5m')¶
Bases: object
Synchronous Ollama client that calls the /api/chat endpoint.
Requests reuse a shared httpx.Client from a module-level pool keyed by (host, connect_timeout, read_timeout). This enables HTTP Keep-Alive across calls and avoids cold-start latency when multiple requests target the same Ollama instance in quick succession.
- Parameters:
host (str) – Ollama server base URL, e.g. 'http://localhost:11434'. A trailing slash is stripped automatically. /api/chat is appended internally.
model (str) – Ollama model tag, e.g. 'gemma2:2b'. Must match a model that has been pulled in Ollama.
timeout_seconds (float) – HTTP read timeout in seconds. Applies to waiting for the model to finish generating. Defaults to 120 s to accommodate slow hardware or large models. The connect timeout is always 10 s.
temperature (float) – Sampling temperature forwarded to Ollama’s options.temperature. 0.0 is deterministic (greedy decoding); higher values add randomness.
seed (int | None) – Optional integer forwarded to Ollama’s options.seed. When provided, Ollama uses this as the random seed for token sampling, which makes the output reproducible for the same input. When None, the seed key is omitted from the options object and Ollama chooses its own seed.
max_tokens (int) – num_predict ceiling for the generation. Ollama stops after this many tokens even if the model would continue.
keep_alive (str) – Duration string telling Ollama how long to keep the model loaded in memory after responding (e.g. "5m", "1h", "0" to unload immediately). Defaults to "5m" to prevent cold-start latency on back-to-back requests.
- __init__(*, host, model, timeout_seconds=120.0, temperature=0.7, seed=None, max_tokens=128, keep_alive='5m')¶
- render(system_prompt, user_message)¶
POST to Ollama /api/chat and return the raw response content.
Builds the request payload, sends it to self._endpoint, and extracts the model’s response from data["message"]["content"].
The system_prompt and user_message are sent as separate entries in the messages array using the "system" and "user" roles respectively. This matches the format used by the production MUD translation layer.
No content-level validation is performed here; that is handled downstream by OutputValidator.
- Parameters:
system_prompt – Content of the "system" message.
user_message – Content of the "user" message.
- Returns:
The stripped message.content string on success, or None on any of the following failure conditions:
TimeoutException: Ollama took longer than timeout_seconds to respond.
ConnectError: Ollama is not reachable at the configured endpoint (wrong host, not running, firewall).
Any other exception: Unexpected HTTP or JSON parsing error.
All failure paths log a warning/error via the module logger. A None return causes the endpoint to report "fallback.api_error" in the translation result.
- Return type:
str | None
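The swallow-and-return-None contract of render() can be sketched as follows. This is an illustration, not the module's code. render_or_none and the injected send callable are hypothetical, and the built-in TimeoutError and ConnectionError stand in for httpx.TimeoutException and httpx.ConnectError so that the sketch runs with the standard library alone.

```python
import logging
from typing import Any, Callable, Dict, Optional

logger = logging.getLogger(__name__)


def render_or_none(send: Callable[[], Dict[str, Any]]) -> Optional[str]:
    """Call send() and return the stripped message content, or None on failure."""
    try:
        data = send()  # would POST the payload and decode the JSON body
        return data["message"]["content"].strip()
    except TimeoutError:  # stand-in for httpx.TimeoutException
        logger.warning("Ollama did not respond within the read timeout")
    except ConnectionError:  # stand-in for httpx.ConnectError
        logger.warning("Ollama is not reachable at the configured endpoint")
    except Exception:  # unexpected HTTP or JSON parsing error
        logger.exception("Unexpected error while calling Ollama")
    return None
```

A caller then only has to test for None, for example to report a fallback status, instead of handling transport exceptions itself. generate() takes the opposite approach and lets the same exceptions propagate.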
- generate(system_prompt, user_message)¶
POST to /api/chat; return (text, usage). Raises on any failure.
Same payload structure as render(), but exceptions propagate to the caller instead of being caught. This matches the contract expected by the /api/generate route handler, which maps each exception type to an HTTPException.
- Parameters:
system_prompt – Content of the "system" message.
user_message – Content of the "user" message.
- Returns:
str – Stripped message.content.
dict – {"prompt_eval_count": int|None, "eval_count": int|None}
- Return type:
tuple[str, dict]
- Raises:
httpx.HTTPStatusError – Non-2xx response from Ollama.
httpx.TimeoutException – Request timed out.
ValueError – Response is missing the "message" key.
- static list_models(host=None)¶
Sorted model names from /api/tags. Returns [] on any error.
- Parameters:
host (str | None) – Optional Ollama server base URL. When None, the module-level OLLAMA_HOST constant is used.
- Returns:
Sorted list of model name strings, e.g. ["gemma2:2b", "llama3.2:1b"]. Returns an empty list if Ollama is unreachable or returns an error, allowing the frontend to degrade gracefully.
- Return type: