
ai

AI functionality

The ai module is available in kite (all-in-one) and kiteai. It is not available in kitecmd or kitecloud. Install via make build-ai (produces ./bin/kiteai) or download the kiteai-{os}-{arch} release binary. See AI Edition for setup.

The ai module wraps Firebase Genkit to provide one-shot generation, multi-turn chat, tool calling, and autonomous agent loops across Anthropic, OpenAI, Google AI, and Ollama.

Functions

| Function | Returns | Description |
| --- | --- | --- |
| ai.config(default_model=, api_keys=, base_urls=, timeout=) | None | Set module-wide defaults for provider credentials and endpoints |
| ai.generate(prompt, model=, system=, tools=, stream=, schema=, …) | Response or StreamValue | Generate a one-shot completion |
| ai.chat(model=, system=, tools=, history=, …) | Chat | Create a multi-turn chat session |
| ai.tool(fn, description=, params=) | Tool | Wrap a Starlark callable as an LLM tool |
| ai.run_until(chat, initial, stop_when=, max_steps=, follow_up=) | Response | Run a chat to completion driven by a stop predicate |

Model Strings

Every model is identified by a provider/model-name string. The provider prefix selects the backend:

| Prefix | Example | Env var for API key |
| --- | --- | --- |
| anthropic/ | anthropic/claude-sonnet-4-5 | ANTHROPIC_API_KEY |
| openai/ | openai/gpt-4o-mini | OPENAI_API_KEY |
| googleai/ | googleai/gemini-1.5-pro | GOOGLE_API_KEY |
| ollama/ | ollama/llama3.2 | (none — local) |

The Anthropic, OpenAI, and Google AI plugins read the corresponding env var at first use; set it before invoking ai.generate() / ai.chat(). For Ollama, the default endpoint is http://localhost:11434; override per-call or via ai.config(base_urls=...).
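
For example, assuming OPENAI_API_KEY is already exported in the environment (the remote Ollama hostname below is illustrative):

# The OpenAI plugin reads OPENAI_API_KEY on first use.
resp = ai.generate("ping", model="openai/gpt-4o-mini")

# Ollama needs no key; point a single call at a remote endpoint.
resp = ai.generate("ping",
    model    = "ollama/llama3.2",
    base_url = "http://gpu-box:11434",  # illustrative hostname
)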

ai.config()

Sets module-wide defaults. Calling ai.config() again replaces previously set values.

ai.config(
    default_model = "anthropic/claude-sonnet-4-5",
    api_keys      = {"openai": "sk-..."},
    base_urls     = {"ollama": "http://remote-ollama:11434"},
    timeout       = "60s",
)

| Kwarg | Type | Description |
| --- | --- | --- |
| default_model | string | Used when ai.generate(...) or ai.chat(...) omits model= |
| api_keys | dict | Per-provider override of env-var keys. Keys are lowercase provider names (openai, anthropic, googleai) |
| base_urls | dict | Per-provider override of the endpoint URL (primarily for Ollama or OpenAI-compatible proxies) |
| timeout | duration string | Global request timeout (e.g., "30s", "2m") |

ai.generate(prompt, **kwargs)

One-shot completion. The prompt positional argument is the user message; all other parameters are keyword-only.

resp = ai.generate("Summarize this changelog in 3 bullets", model="anthropic/claude-sonnet-4-5")
print(resp.text)

| Kwarg | Type | Description |
| --- | --- | --- |
| model | string | provider/model-name. Required unless ai.config(default_model=...) is set |
| system | string | System prompt |
| temperature | float | Sampling temperature |
| max_tokens | int | Cap on output tokens |
| top_p | float | Nucleus sampling |
| top_k | int | Top-K sampling (provider-dependent) |
| stop | list[string] | Stop sequences |
| api_key | string | Per-call API key override |
| base_url | string | Per-call endpoint override |
| stream | bool | When True, returns a StreamValue (iterable of StreamChunk) |
| schema | dict | JSON Schema; when set, the response's .data is parsed structured output |
| tools | list | Tool callables (plain functions or ai.tool(fn) wrappers) |
| max_iterations | int | Max tool-call rounds before halting (default 10) |
| on_tool_error | "feedback" or "halt" | Behavior when a tool raises (default "feedback" — send the error back to the model) |
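
A minimal tool-calling sketch with ai.generate(); the add function and its prompt are illustrative:

def add(a=0, b=0):
    """Add two integers."""
    return a + b

resp = ai.generate(
    "Use the add tool to compute 17 + 25.",
    model          = "anthropic/claude-sonnet-4-5",
    tools          = [add],      # plain function; metadata is inferred
    max_iterations = 3,          # at most 3 tool-call rounds
    on_tool_error  = "halt",     # fail fast instead of feeding errors back
)
print(resp.text)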

Response (returned when stream=False)

| Attribute | Type | Description |
| --- | --- | --- |
| .text | string | Assistant's text response |
| .model | string | Model that produced the response |
| .usage.input | int | Input tokens consumed |
| .usage.output | int | Output tokens generated |
| .usage.total | int | .input + .output |
| .data | any or None | Parsed structured output (only set when schema= was provided) |
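
For example, reading token usage off a response:

resp = ai.generate("Name three HTTP verbs", model="openai/gpt-4o-mini")
print(resp.text)
print("tokens used:", resp.usage.total)  # == resp.usage.input + resp.usage.output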

StreamValue (returned when stream=True)

Iterable of StreamChunk:

for chunk in ai.generate("write a haiku", model="...", stream=True):
    print(chunk.text, end="")

StreamChunk

| Attribute | Type | Description |
| --- | --- | --- |
| .text | string | The text delta for this chunk. Concatenating .text across all chunks reproduces the final response text |

A StreamChunk is truthy when .text is non-empty and converts to its text via str().
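
For example, accumulating the full response text while skipping empty chunks:

parts = []
for chunk in ai.generate("explain DNS in one paragraph",
                         model="ollama/llama3.2", stream=True):
    if chunk:                    # truthy only when .text is non-empty
        parts.append(str(chunk))
full_text = "".join(parts)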

Streaming and schema= are mutually exclusive in this version. Streaming with tools= is also not yet supported.

ai.chat(**kwargs)

Create a stateful multi-turn conversation. Returns a Chat object.

chat = ai.chat(
    model  = "anthropic/claude-sonnet-4-5",
    system = "You are a concise assistant.",
    tools  = [search_docs, run_query],
)

resp = chat.send("Find all production deployments older than 30 days")
print(resp.text)

resp = chat.send("Now delete the three oldest.")

| Kwarg | Type | Description |
| --- | --- | --- |
| model | string | provider/model-name. Required unless ai.config(default_model=...) is set |
| system | string | System prompt (applied to every turn) |
| tools | list | Tools available for every .send() |
| history | list[dict] | Seed the chat with prior messages (same dict shape chat.history returns). Enables resume and forking |
| temperature, max_tokens, top_p, top_k, stop, api_key, base_url, max_iterations, on_tool_error | | Same meaning as on ai.generate(); set as per-chat defaults |

Chat methods and attributes

| Member | Description |
| --- | --- |
| .send(msg, stream=, schema=, tools=, …) | Advance the conversation. Returns a Response. Per-call kwargs override chat defaults |
| .history | Read-only snapshot list of message dicts. Mutating the list does not change chat state |
| .reset() | Clear history. Defaults (model, system, tools) are preserved |
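
For example, overriding a chat default for a single turn, then clearing history; the schema here is illustrative:

chat = ai.chat(model = "anthropic/claude-sonnet-4-5", system = "Be terse.")

# Per-call kwargs apply to this .send() only.
resp = chat.send("List three Unix signals", schema = {
    "type": "object",
    "properties": {"signals": {"type": "array", "items": {"type": "string"}}},
    "required": ["signals"],
})
print(resp.data["signals"])

chat.reset()  # history cleared; model and system prompt kept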

History dict shape

Each entry in chat.history is a dict with these keys:

| Key | Present on | Value |
| --- | --- | --- |
| role | always | "user", "assistant", or "tool" |
| content | always | string (may be empty on assistant tool-request turns) |
| tool_name | assistant tool-request and tool response | string |
| tool_input | assistant tool-request | arbitrary (JSON-convertible) |
| tool_output | tool response | arbitrary |
| tool_error | tool response when the Starlark tool raised | string |
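
For example, auditing the tool calls a conversation made (assumes a chat that has run tools):

for msg in chat.history:
    if msg["role"] == "tool":
        # tool response turns carry tool_output, or tool_error on failure
        print(msg["tool_name"], "->", msg.get("tool_output", msg.get("tool_error")))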

Round-trip example:

old_chat = ai.chat(model="...", system="...")
old_chat.send("Hello")
# later...
new_chat = ai.chat(model="...", system="...", history=old_chat.history)

ai.tool(fn, description=, params=)

Wrap a Starlark callable as an LLM tool. In most cases you don't need to call ai.tool explicitly — passing a plain function to ai.chat(tools=[...]) or ai.generate(tools=[...]) auto-wraps it with inferred metadata.

def check_url(url):
    """Check whether a URL responds with 2xx."""
    r = http.url(url).get(timeout="5s")
    return {"status": r.status_code, "ok": r.status_code < 400}

# Either pass directly (auto-inferred):
chat = ai.chat(model="...", tools=[check_url])

# Or wrap explicitly for overrides:
tool = ai.tool(check_url,
    description = "Returns the HTTP status of a URL.",
    params = {
        "type": "object",
        "properties": {"url": {"type": "string"}},
        "required": ["url"],
    },
)
chat = ai.chat(model="...", tools=[tool])

Inference rules (when kwargs are omitted)

| Piece | Source |
| --- | --- |
| name | Function name (not overridable) |
| description | Docstring (first line); empty if no docstring |
| params | Inferred from the function signature: parameter names become required properties; types guessed from default values ("" → string, 0 → integer, True → boolean, [] → array, {} → object) |
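
For example, given a function with defaulted parameters, the rules above would infer roughly the metadata shown in the comments:

def deploy(service="", replicas=0, dry_run=True):
    """Deploy a service with the given replica count."""
    return {"service": service, "replicas": replicas, "dry_run": dry_run}

# Passing deploy in tools=[...] infers, per the rules above:
#   description: "Deploy a service with the given replica count."
#   params:      service → string, replicas → integer, dry_run → boolean
#                (all three required)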

Non-Starlark callables (e.g., builtins) have no introspectable signature — you must supply both description= and params= explicitly.

Tool attributes

| Attribute | Description |
| --- | --- |
| .name | The tool's name (from fn.__name__) |
| .description | The description string |

ai.run_until(chat, initial, **kwargs)

Drive a chat to completion with a stop predicate. Sends initial as the first user message, then re-sends follow_up (default "continue") each turn until stop_when(resp) returns truthy or max_steps is reached. Returns the final Response.

chat = ai.chat(
    model  = "anthropic/claude-sonnet-4-5",
    system = "You are an SRE. Say DONE when finished.",
    tools  = [check_service, restart_service],
)

resp = ai.run_until(chat,
    "The 'api' service is reportedly down. Diagnose and fix.",
    stop_when = lambda r: "DONE" in r.text,
    max_steps = 15,
)
print(resp.text)

| Kwarg | Type | Description |
| --- | --- | --- |
| stop_when | callable | lambda resp: bool. When truthy, return resp immediately. If omitted, run until max_steps |
| max_steps | int | Max turns (default 10). Safety cap that prevents runaway loops and unbounded spend |
| follow_up | string | User message sent on every turn after the first (default "continue") |
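
Without stop_when, the loop simply runs for max_steps turns; the prompt below is illustrative and reuses the chat from the example above:

resp = ai.run_until(chat,
    "Audit the staging namespace and report anything unusual.",
    max_steps = 5,
    follow_up = "keep going",
)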

See the agents guide for patterns built around ai.run_until.

Examples

Structured output

schema = {
    "type": "object",
    "properties": {
        "title":   {"type": "string"},
        "bullets": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["title", "bullets"],
}

resp = ai.generate(
    "Summarize this RFC: ...",
    model  = "openai/gpt-4o-mini",
    schema = schema,
)
print(resp.data["title"])
for b in resp.data["bullets"]:
    printf("- %s\n", b)

Streaming

for chunk in ai.generate("write a haiku about kubernetes",
                         model="ollama/llama3.2",
                         stream=True):
    print(chunk.text, end="")

Chat with tools

def list_pods(namespace):
    """List pods in a Kubernetes namespace."""
    return [p.name for p in k8s.list("pods", namespace=namespace)]

chat = ai.chat(
    model  = "anthropic/claude-sonnet-4-5",
    system = "You help operators investigate clusters.",
    tools  = [list_pods],
)

resp = chat.send("Which pods are running in 'default'?")
print(resp.text)

Resuming a chat from history

saved = json.decode(fs.read_text("chat.json"))
chat  = ai.chat(model="anthropic/claude-sonnet-4-5", history=saved)
resp  = chat.send("Continue where we left off.")
fs.write_text("chat.json", json.encode(chat.history))