# ai

AI functionality

The `ai` module is available in `kite` (all-in-one) and `kiteai`. It is not available in `kitecmd` or `kitecloud`. Install via `make build-ai` (produces `./bin/kiteai`) or download the `kiteai-{os}-{arch}` release binary. See AI Edition for setup.

The `ai` module wraps Firebase Genkit to provide one-shot generation, multi-turn chat, tool calling, and autonomous agent loops across Anthropic, OpenAI, Google AI, and Ollama.
## Functions

| Function | Returns | Description |
|---|---|---|
| `ai.config(default_model=, api_keys=, base_urls=, timeout=)` | `None` | Set module-wide defaults for provider credentials and endpoints |
| `ai.generate(prompt, model=, system=, tools=, stream=, schema=, …)` | `Response` or `StreamValue` | Generate a one-shot completion |
| `ai.chat(model=, system=, tools=, history=, …)` | `Chat` | Create a multi-turn chat session |
| `ai.tool(fn, description=, params=)` | `Tool` | Wrap a Starlark callable as an LLM tool |
| `ai.run_until(chat, initial, stop_when=, max_steps=, follow_up=)` | `Response` | Run a chat to completion driven by a stop predicate |
## Model Strings

Every model is identified by a `provider/model-name` string. The provider prefix selects the backend:

| Prefix | Example | Env var for API key |
|---|---|---|
| `anthropic/` | `anthropic/claude-sonnet-4-5` | `ANTHROPIC_API_KEY` |
| `openai/` | `openai/gpt-4o-mini` | `OPENAI_API_KEY` |
| `googleai/` | `googleai/gemini-1.5-pro` | `GOOGLE_API_KEY` |
| `ollama/` | `ollama/llama3.2` | (none — local) |

The Anthropic, OpenAI, and Google AI plugins read the corresponding env var at first use; set it before invoking `ai.generate()` / `ai.chat()`. For Ollama, the default endpoint is `http://localhost:11434`; override it per call or via `ai.config(base_urls=...)`.
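For example (the remote Ollama host below is a placeholder; any reachable endpoint works):

```python
# Anthropic: the key is read from ANTHROPIC_API_KEY at first use
resp = ai.generate("ping", model="anthropic/claude-sonnet-4-5")

# Ollama: no key needed; point this one call at a non-default endpoint
resp = ai.generate("ping",
                   model="ollama/llama3.2",
                   base_url="http://gpu-box:11434")  # placeholder host
```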
## ai.config()

Sets module-wide defaults. Calling `ai.config()` again replaces previously set values.
```python
ai.config(
    default_model = "anthropic/claude-sonnet-4-5",
    api_keys = {"openai": "sk-..."},
    base_urls = {"ollama": "http://remote-ollama:11434"},
    timeout = "60s",
)
```
| Kwarg | Type | Description |
|---|---|---|
| `default_model` | string | Used when `ai.generate(...)` or `ai.chat(...)` omits `model=` |
| `api_keys` | dict | Per-provider override of env-var keys. Keys are lowercase provider names (`openai`, `anthropic`, `googleai`) |
| `base_urls` | dict | Per-provider override of the endpoint URL (primarily for Ollama or OpenAI-compatible proxies) |
| `timeout` | duration string | Global request timeout (e.g., `"30s"`, `"2m"`) |
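Once a default model is configured, later calls can omit `model=`; a minimal sketch:

```python
ai.config(default_model = "ollama/llama3.2", timeout = "30s")

# model= is omitted, so the configured default is used
resp = ai.generate("What is a CRD?")
print(resp.text)
```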
## ai.generate(prompt, **kwargs)

One-shot completion. The `prompt` positional argument is the user message; all other parameters are keyword-only.
```python
resp = ai.generate("Summarize this changelog in 3 bullets",
                   model="anthropic/claude-sonnet-4-5")
print(resp.text)
```
| Kwarg | Type | Description |
|---|---|---|
| `model` | string | `provider/model-name`. Required unless `ai.config(default_model=...)` is set |
| `system` | string | System prompt |
| `temperature` | float | Sampling temperature |
| `max_tokens` | int | Cap on output tokens |
| `top_p` | float | Nucleus sampling |
| `top_k` | int | Top-K sampling (provider-dependent) |
| `stop` | list[string] | Stop sequences |
| `api_key` | string | Per-call key override |
| `base_url` | string | Per-call endpoint override |
| `stream` | bool | When True, returns a `StreamValue` (iterable of `StreamChunk`) |
| `schema` | dict | JSON Schema; when set, the response's `.data` is parsed structured output |
| `tools` | list | Tool callables (plain functions or `ai.tool(fn)` wrappers) |
| `max_iterations` | int | Max tool-call rounds before halting (default 10) |
| `on_tool_error` | `"feedback"` or `"halt"` | Behavior when a tool raises (default `"feedback"` — send the error back to the model) |
### Response (returned when `stream=False`)

| Attribute | Type | Description |
|---|---|---|
| `.text` | string | The assistant's text response |
| `.model` | string | Model that produced the response |
| `.usage.input` | int | Input tokens consumed |
| `.usage.output` | int | Output tokens generated |
| `.usage.total` | int | `.input` + `.output` |
| `.data` | any or None | Parsed structured output (only set when `schema=` was provided) |
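A sketch that logs token usage from a `Response`, handy for tracking spend:

```python
resp = ai.generate("Explain etcd compaction in one sentence",
                   model="openai/gpt-4o-mini")
print(resp.text)
printf("tokens: in=%d out=%d total=%d\n",
       resp.usage.input, resp.usage.output, resp.usage.total)
```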
### StreamValue (returned when `stream=True`)

An iterable of `StreamChunk` values.

#### StreamChunk

| Attribute | Type | Description |
|---|---|---|
| `.text` | string | The text delta for this chunk. Concatenating `.text` across all chunks reproduces the final response text |

A `StreamChunk` is truthy when `.text` is non-empty and converts to its text via `str()`.

Streaming and `schema=` are mutually exclusive in this version. Streaming with `tools=` is also not yet supported.
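Because a chunk is truthy only when it carries text, empty deltas can be filtered out while accumulating, as in this sketch:

```python
parts = []
for chunk in ai.generate("Draft a one-line commit message",
                         model="anthropic/claude-sonnet-4-5",
                         stream=True):
    if chunk:  # skip empty deltas
        print(str(chunk), end="")  # str(chunk) is the same as chunk.text
        parts.append(chunk.text)

final_text = "".join(parts)  # matches the non-streaming .text
```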
## ai.chat(**kwargs)

Create a stateful multi-turn conversation. Returns a `Chat` object.
```python
chat = ai.chat(
    model = "anthropic/claude-sonnet-4-5",
    system = "You are a concise assistant.",
    tools = [search_docs, run_query],
)
resp = chat.send("Find all production deployments older than 30 days")
print(resp.text)
resp = chat.send("Now delete the three oldest.")
```
| Kwarg | Type | Description |
|---|---|---|
| `model` | string | `provider/model-name`. Required unless `ai.config(default_model=...)` is set |
| `system` | string | System prompt (applied to every turn) |
| `tools` | list | Tools available for every `.send()` |
| `history` | list[dict] | Seed the chat with prior messages (same dict shape `chat.history` returns). Enables resume and forking |
| `temperature`, `max_tokens`, `top_p`, `top_k`, `stop`, `api_key`, `base_url`, `max_iterations`, `on_tool_error` | — | Same meaning as on `ai.generate()`; set as per-chat defaults |
### Chat methods and attributes

| Member | Description |
|---|---|
| `.send(msg, stream=, schema=, tools=, …)` | Advance the conversation. Returns a `Response`. Per-call kwargs override chat defaults |
| `.history` | Read-only snapshot list of message dicts. Mutating the list does not change chat state |
| `.reset()` | Clear history. Defaults (model, system, tools) are preserved |
### History dict shape

Each entry in `chat.history` is a dict with these keys:

| Key | Present on | Value |
|---|---|---|
| `role` | always | `"user"`, `"assistant"`, or `"tool"` |
| `content` | always (may be empty on assistant tool-request turns) | string |
| `tool_name` | assistant tool-request and tool response | string |
| `tool_input` | assistant tool-request | arbitrary (JSON-convertible) |
| `tool_output` | tool response | arbitrary |
| `tool_error` | tool response when the Starlark tool raised | string |
Round-trip example:

```python
old_chat = ai.chat(model="...", system="...")
old_chat.send("Hello")

# later...
new_chat = ai.chat(model="...", system="...", history=old_chat.history)
```
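The same dict shape also makes post-hoc auditing easy; a sketch that surfaces failed tool calls:

```python
for msg in new_chat.history:
    if msg["role"] == "tool" and "tool_error" in msg:
        printf("tool %s failed: %s\n", msg["tool_name"], msg["tool_error"])
```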
## ai.tool(fn, description=, params=)

Wrap a Starlark callable as an LLM tool. In most cases you don't need to call `ai.tool` explicitly — passing a plain function to `ai.chat(tools=[...])` or `ai.generate(tools=[...])` auto-wraps it with inferred metadata.
```python
def check_url(url):
    """Check whether a URL responds successfully (status < 400)."""
    r = http.url(url).get(timeout="5s")
    return {"status": r.status_code, "ok": r.status_code < 400}

# Either pass directly (auto-inferred):
chat = ai.chat(model="...", tools=[check_url])

# Or wrap explicitly for overrides:
tool = ai.tool(check_url,
    description = "Returns the HTTP status of a URL.",
    params = {
        "type": "object",
        "properties": {"url": {"type": "string"}},
        "required": ["url"],
    },
)
chat = ai.chat(model="...", tools=[tool])
```
### Inference rules (when kwargs are omitted)

| Piece | Source |
|---|---|
| `name` | Function name (not overridable) |
| `description` | First line of the docstring; empty if there is no docstring |
| `params` | Inferred from the function signature: parameter names become required properties; types are guessed from default values (`""` → string, `0` → integer, `True` → boolean, `[]` → array, `{}` → object) |

Non-Starlark callables (e.g., builtins) have no introspectable signature — you must supply both `description=` and `params=` explicitly.
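A minimal sketch, assuming a `time.now` builtin exists in your build (substitute whichever builtin you actually expose):

```python
# time.now is a stand-in; builtins need explicit metadata because
# their signatures can't be introspected
clock = ai.tool(time.now,
    description = "Return the current timestamp.",
    params = {"type": "object", "properties": {}},
)
chat = ai.chat(model = "anthropic/claude-sonnet-4-5", tools = [clock])
```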
### Tool attributes

| Attribute | Description |
|---|---|
| `.name` | The tool's name (from `fn.__name__`) |
| `.description` | The description string |
## ai.run_until(chat, initial, **kwargs)

Drive a chat to completion with a stop predicate. Sends `initial` as the first user message, then re-sends `follow_up` (default `"continue"`) each turn until `stop_when(resp)` returns truthy or `max_steps` is reached. Returns the final `Response`.
```python
chat = ai.chat(
    model = "anthropic/claude-sonnet-4-5",
    system = "You are an SRE. Say DONE when finished.",
    tools = [check_service, restart_service],
)
resp = ai.run_until(chat,
    "The 'api' service is reportedly down. Diagnose and fix.",
    stop_when = lambda r: "DONE" in r.text,
    max_steps = 15,
)
print(resp.text)
```
| Kwarg | Type | Description |
|---|---|---|
| `stop_when` | callable | `lambda resp: bool`. When it returns truthy, `resp` is returned immediately. If omitted, runs until `max_steps` |
| `max_steps` | int | Max turns (default 10). Safety cap; prevents runaway loops and unbounded spend |
| `follow_up` | string | User message sent on every turn after the first (default `"continue"`) |
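The predicate isn't limited to text matching; any function of the `Response` works. A sketch that also stops on a cumulative token budget (the 50000 figure is arbitrary), reusing the chat from the example above:

```python
spent = [0]  # mutable cell; Starlark functions can't rebind outer names

def done_or_over_budget(resp):
    spent[0] += resp.usage.total
    return "DONE" in resp.text or spent[0] > 50000

resp = ai.run_until(chat,
    "Audit the staging namespace for failing workloads.",
    stop_when = done_or_over_budget,
    max_steps = 25,
)
```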
See the agents guide for patterns built around `ai.run_until`.
## Examples

### Structured output
```python
schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "bullets": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["title", "bullets"],
}

resp = ai.generate(
    "Summarize this RFC: ...",
    model = "openai/gpt-4o-mini",
    schema = schema,
)
print(resp.data["title"])
for b in resp.data["bullets"]:
    printf("- %s\n", b)
```
### Streaming
```python
for chunk in ai.generate("write a haiku about kubernetes",
                         model="ollama/llama3.2",
                         stream=True):
    print(chunk.text, end="")
```
### Chat with tools
```python
def list_pods(namespace):
    """List pods in a Kubernetes namespace."""
    return [p.name for p in k8s.list("pods", namespace=namespace)]

chat = ai.chat(
    model = "anthropic/claude-sonnet-4-5",
    system = "You help operators investigate clusters.",
    tools = [list_pods],
)
resp = chat.send("Which pods are running in 'default'?")
print(resp.text)
```