LLM Auto-Capture

Nozle’s LLM wrappers intercept OpenAI and Anthropic API calls, extract token usage, and automatically send billing events, so no manual tracking code is needed. The SDK sends only raw token counts and call metadata; cost calculation happens server-side in the Go engine’s cost model system.

OpenAI

pip install "nozle-sdk[openai]"  # installs openai>=1.0; quotes keep shells like zsh from expanding the brackets

from openai import OpenAI
from nozle import Nozle, wrap_openai

nozle = Nozle(api_key="sk_live_...")
openai = wrap_openai(
    OpenAI(),
    nozle,
    customer_id="cust_123",
    feature="code_completion",   # optional: tag for entitlement tracking
    metric_code="llm_tokens",    # optional: defaults to "llm_tokens"
)

# Use OpenAI normally — tracking happens automatically
response = openai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)

Streaming

Streaming is fully supported. Usage is captured from the final chunk:

stream = openai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    stream=True,
)

for chunk in stream:
    # A usage-only chunk can arrive with an empty choices list, so guard the index
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="")
# Token usage is automatically tracked after the stream completes
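
To illustrate how usage can be read from the final chunk of a stream, here is a minimal sketch. It is not Nozle's actual implementation: `capture_usage` and the `on_usage` callback are hypothetical names, and `fake_stream` yields stand-in objects shaped like OpenAI streaming chunks, where only the last chunk carries `usage` and its `choices` list is empty (the shape OpenAI returns when `stream_options={"include_usage": True}` is set).

```python
from types import SimpleNamespace
from typing import Callable, Iterable, Iterator


def capture_usage(stream: Iterable, on_usage: Callable[[int, int], None]) -> Iterator:
    """Yield chunks unchanged; report token usage once the stream is exhausted."""
    usage = None
    for chunk in stream:
        # Only the final chunk is expected to carry a non-null usage object.
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage
        yield chunk
    if usage is not None:
        on_usage(usage.prompt_tokens, usage.completion_tokens)


def fake_stream():
    """Stand-in for an OpenAI stream: two content chunks, then a usage-only chunk."""
    for piece in ("Hel", "lo"):
        delta = SimpleNamespace(content=piece)
        yield SimpleNamespace(choices=[SimpleNamespace(delta=delta)], usage=None)
    usage = SimpleNamespace(prompt_tokens=9, completion_tokens=2)
    yield SimpleNamespace(choices=[], usage=usage)


reported = []
text = ""
for chunk in capture_usage(fake_stream(), lambda i, o: reported.append((i, o))):
    if chunk.choices:  # the usage-only chunk has no choices
        text += chunk.choices[0].delta.content or ""
```

In this sketch the callback fires only when the consumer reads the stream to the end; a caller that breaks out of the loop early would never trigger it.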

Anthropic

pip install "nozle-sdk[anthropic]"  # installs anthropic>=0.30.0; quotes keep shells like zsh from expanding the brackets

from anthropic import Anthropic
from nozle import Nozle, wrap_anthropic

nozle = Nozle(api_key="sk_live_...")
anthropic = wrap_anthropic(
    Anthropic(),
    nozle,
    customer_id="cust_123",
    feature="code_completion",
)

message = anthropic.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
)

Parameters

Parameter	Type	Required	Description
`customer_id`	str	Yes	Customer to bill for this usage
`metric_code`	str	No	Billable metric code (default: `"llm_tokens"`)
`feature`	str	No	Feature tag for entitlement tracking

What gets tracked

Each LLM call sends a single event via nozle.track() with these properties:

Property	Source	Description
`model`	Response	Model name (e.g. `gpt-4o`, `claude-sonnet-4-20250514`)
`input_tokens`	Response usage	Prompt/input token count
`output_tokens`	Response usage	Completion/output token count
`latency_ms`	Measured	End-to-end call duration
`feature`	wrap options	Feature tag (if provided)
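
As a rough illustration of where each property in the table comes from, the sketch below times a call and builds the event properties from the response. `tracked_call` and `fake_create` are hypothetical names used only for this example, the response is a stand-in object with OpenAI-style usage fields, and the real wrappers may be structured differently.

```python
import time
from types import SimpleNamespace
from typing import Any, Callable, Optional


def tracked_call(call: Callable[[], Any], feature: Optional[str] = None) -> tuple[Any, dict]:
    """Run an LLM call and return (response, event properties). Hypothetical helper."""
    start = time.perf_counter()
    response = call()
    latency_ms = int((time.perf_counter() - start) * 1000)

    # Only metadata is copied; message content is never read.
    properties = {
        "model": response.model,
        "input_tokens": response.usage.prompt_tokens,
        "output_tokens": response.usage.completion_tokens,
        "latency_ms": latency_ms,
    }
    if feature is not None:
        properties["feature"] = feature
    return response, properties


def fake_create():
    """Stand-in for client.chat.completions.create(...)."""
    usage = SimpleNamespace(prompt_tokens=12, completion_tokens=30)
    message = SimpleNamespace(content="Hi there")
    return SimpleNamespace(model="gpt-4o", usage=usage,
                           choices=[SimpleNamespace(message=message)])


response, properties = tracked_call(fake_create, feature="code_completion")
```

The caller still receives the untouched response, while the event carries only the five properties listed above.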

The SDK does not calculate costs. The Go engine matches the `model` property against your cost models of type `per_model` and calculates `cost_cents` server-side. Make sure a cost model is configured for the `llm_tokens` metric with rates for every model you call.
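
The idea of per-model pricing can be sketched as follows. This is an illustration in Python, not the Go engine's code: the rate table layout, the unit (cents per million tokens), the model names, and the `cost_cents` function are all assumptions made for the example, and the numbers are not real prices.

```python
from decimal import Decimal

# Hypothetical rates in cents per 1,000,000 tokens; not real prices.
RATES = {
    "model-a": {"input": Decimal("250"), "output": Decimal("1000")},
    "model-b": {"input": Decimal("300"), "output": Decimal("1500")},
}


def cost_cents(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Look up the rates for `model` and price the raw token counts."""
    try:
        rate = RATES[model]
    except KeyError:
        raise LookupError(f"no rate configured for model {model!r}") from None
    million = Decimal(1_000_000)
    return (input_tokens * rate["input"] + output_tokens * rate["output"]) / million


# 2,000 input tokens and 500 output tokens on the hypothetical "model-a"
example = cost_cents("model-a", input_tokens=2_000, output_tokens=500)
```

The sketch raises on an unknown model to make a missing rate obvious; how the engine itself treats a model with no configured rate is not covered here.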

Privacy

Wrappers never capture prompt content or completion text, only metadata (model name, token counts, latency). Because message content is never read, no PII from prompts or completions passes through the billing pipeline.

Manual tracking

If you prefer manual control or use a provider without a wrapper, you can track LLM usage directly:

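from openai import OpenAI
from nozle import Nozle

nozle = Nozle(api_key="sk_live_...")
client = OpenAI()  # unwrapped client, so this call is not also auto-captured

messages = [{"role": "user", "content": "Hello"}]
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
)

# Send the model name and raw token counts as one event
nozle.track("cust_123", "llm_tokens", metadata={
    "model": response.model,
    "input_tokens": response.usage.prompt_tokens,
    "output_tokens": response.usage.completion_tokens,
})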
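
Providers name their usage fields differently: OpenAI chat completions report `prompt_tokens` and `completion_tokens`, while Anthropic messages report `input_tokens` and `output_tokens`. When tracking manually across several providers, a small normalizing helper can keep the event shape consistent. The sketch below is one possible approach; `usage_metadata` is a hypothetical helper, not part of the Nozle SDK, and the two responses are stand-in objects.

```python
from types import SimpleNamespace


def usage_metadata(response) -> dict:
    """Map a response's usage fields to the input/output names used above."""
    usage = response.usage
    # Anthropic-style responses expose input_tokens/output_tokens directly.
    if hasattr(usage, "input_tokens"):
        input_tokens, output_tokens = usage.input_tokens, usage.output_tokens
    else:  # OpenAI chat-completions style
        input_tokens, output_tokens = usage.prompt_tokens, usage.completion_tokens
    return {
        "model": response.model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
    }


# Stand-in responses shaped like each provider's SDK objects
openai_like = SimpleNamespace(
    model="gpt-4o",
    usage=SimpleNamespace(prompt_tokens=5, completion_tokens=7),
)
anthropic_like = SimpleNamespace(
    model="claude-sonnet-4-20250514",
    usage=SimpleNamespace(input_tokens=3, output_tokens=4),
)

openai_event = usage_metadata(openai_like)
anthropic_event = usage_metadata(anthropic_like)
```

The resulting dict can then be passed as the `metadata` argument of `nozle.track("cust_123", "llm_tokens", metadata=...)`, matching the call shown above.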
