# LLM Auto-Capture

> Automatically track LLM token usage for billing

Nozle's LLM wrappers intercept OpenAI and Anthropic API calls, extract token usage from each response, and send billing events automatically, so you don't need to write tracking code by hand.

Cost calculation happens **server-side** via the Go engine's [cost model](/guides/margin/cost-models) system. The SDK only sends raw token counts.

## OpenAI

```bash theme={null}
pip install "nozle-sdk[openai]"  # installs openai>=1.0; quotes keep shells like zsh from expanding the brackets
```

```python theme={null}
from openai import OpenAI
from nozle import Nozle, wrap_openai

nozle = Nozle(api_key="sk_live_...")
openai = wrap_openai(
    OpenAI(),
    nozle,
    customer_id="cust_123",
    feature="code_completion",   # optional: tag for entitlement tracking
    metric_code="llm_tokens",    # optional: defaults to "llm_tokens"
)

# Use OpenAI normally — tracking happens automatically
response = openai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
```

### Streaming

Streaming is fully supported. Usage is captured from the final chunk:

```python theme={null}
stream = openai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    stream=True,
)

for chunk in stream:
    # A usage-only chunk can arrive with an empty `choices` list, so guard the index
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="")
# Token usage is automatically tracked after the stream completes
```
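
To illustrate what "captured from the final chunk" can mean in practice, here is a minimal sketch of a pass-through generator that forwards chunks untouched and reports usage once the stream ends. This is not Nozle's implementation: `capture_usage` and `fake_stream` are hypothetical names, and the fake stream stands in for a provider so the example runs offline.

```python
from types import SimpleNamespace


def capture_usage(stream, on_usage):
    """Yield chunks unchanged and report usage once the stream is exhausted."""
    usage = None
    for chunk in stream:
        # Assumption: usage, when present, arrives on a late chunk.
        chunk_usage = getattr(chunk, "usage", None)
        if chunk_usage is not None:
            usage = chunk_usage
        yield chunk
    if usage is not None:
        on_usage(usage)


def fake_stream():
    """Hypothetical provider stream: two text chunks, then a usage-only chunk."""
    for text in ("Hel", "lo"):
        delta = SimpleNamespace(content=text)
        yield SimpleNamespace(choices=[SimpleNamespace(delta=delta)], usage=None)
    usage = SimpleNamespace(prompt_tokens=5, completion_tokens=2)
    yield SimpleNamespace(choices=[], usage=usage)


reported = []
text = "".join(
    chunk.choices[0].delta.content or ""
    for chunk in capture_usage(fake_stream(), reported.append)
    if chunk.choices
)
```

In this sketch, usage is reported only when the caller consumes the stream to the end; a loop that breaks early never reaches the reporting step.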

## Anthropic

```bash theme={null}
pip install "nozle-sdk[anthropic]"  # installs anthropic>=0.30.0; quotes keep shells like zsh from expanding the brackets
```

```python theme={null}
from anthropic import Anthropic
from nozle import Nozle, wrap_anthropic

nozle = Nozle(api_key="sk_live_...")
anthropic = wrap_anthropic(
    Anthropic(),
    nozle,
    customer_id="cust_123",
    feature="code_completion",
)

message = anthropic.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
)
```

## Parameters

Options passed to the wrapper functions (`wrap_openai`, `wrap_anthropic`):

| Parameter     | Type | Required | Description                                    |
| ------------- | ---- | -------- | ---------------------------------------------- |
| `customer_id` | str  | Yes      | Customer to bill for this usage                |
| `metric_code` | str  | No       | Billable metric code (default: `"llm_tokens"`) |
| `feature`     | str  | No       | Feature tag for entitlement tracking           |

## What gets tracked

Each LLM call sends a single event via `nozle.track()` with these properties:

| Property        | Source         | Description                                            |
| --------------- | -------------- | ------------------------------------------------------ |
| `model`         | Response       | Model name (e.g. `gpt-4o`, `claude-sonnet-4-20250514`) |
| `input_tokens`  | Response usage | Prompt/input token count                               |
| `output_tokens` | Response usage | Completion/output token count                          |
| `latency_ms`    | Measured       | End-to-end call duration in milliseconds               |
| `feature`       | wrap options   | Feature tag (if provided)                              |
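
As a rough illustration of how the properties in the table above could be assembled, the sketch below maps a response onto that shape. It relies on one known difference between providers: OpenAI chat completions expose `usage.prompt_tokens` and `usage.completion_tokens`, while Anthropic messages expose `usage.input_tokens` and `usage.output_tokens`. The helper name `usage_to_properties` and the canned responses are hypothetical, and the wrappers' actual internals may differ.

```python
from types import SimpleNamespace


def usage_to_properties(response, latency_ms, feature=None):
    """Map a provider response onto the event properties listed above."""
    usage = response.usage
    # OpenAI chat completions report prompt_tokens/completion_tokens;
    # Anthropic messages report input_tokens/output_tokens.
    if hasattr(usage, "prompt_tokens"):
        input_tokens, output_tokens = usage.prompt_tokens, usage.completion_tokens
    else:
        input_tokens, output_tokens = usage.input_tokens, usage.output_tokens
    properties = {
        "model": response.model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "latency_ms": latency_ms,
    }
    if feature is not None:
        properties["feature"] = feature  # only sent when a tag was provided
    return properties


# Canned responses standing in for real API results (values are made up).
openai_like = SimpleNamespace(
    model="gpt-4o",
    usage=SimpleNamespace(prompt_tokens=12, completion_tokens=30),
)
anthropic_like = SimpleNamespace(
    model="claude-sonnet-4-20250514",
    usage=SimpleNamespace(input_tokens=9, output_tokens=21),
)

openai_props = usage_to_properties(openai_like, latency_ms=840, feature="code_completion")
anthropic_props = usage_to_properties(anthropic_like, latency_ms=1210)
```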

<Info>
  The SDK does **not** calculate costs. The Go engine matches the `model` property against your [cost models](/guides/margin/cost-models) with `per_model` type and calculates `cost_cents` server-side. Make sure you have a cost model configured for the `llm_tokens` metric with rates for your models.
</Info>
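
For intuition about what a per-model calculation involves, here is a worked sketch of the arithmetic. It does not reproduce the Go engine's logic: the rate table, its units (cents per million tokens), and the function name `cost_cents` are assumptions chosen for illustration, and the rates are not real prices.

```python
from decimal import Decimal

# Hypothetical per-model rates in cents per 1,000,000 tokens (not real prices).
RATES = {
    "gpt-4o": {"input": Decimal("250"), "output": Decimal("1000")},
}


def cost_cents(model, input_tokens, output_tokens, rates=RATES):
    """Price one event from raw token counts using a per-model rate table."""
    if model not in rates:
        # Assumption: an unmatched model cannot be priced in this sketch.
        raise KeyError(f"no rate configured for model {model!r}")
    rate = rates[model]
    million = Decimal(1_000_000)
    return (input_tokens * rate["input"] + output_tokens * rate["output"]) / million


# 2,000 input tokens and 500 output tokens at the rates above come to 1 cent.
example = cost_cents("gpt-4o", input_tokens=2_000, output_tokens=500)
```

The lookup failure for an unknown model mirrors why a rate needs to exist for every model you call: without one, there is nothing to multiply the token counts by.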

## Privacy

Wrappers **never** capture prompt content or completion text. They record only metadata (model name, token counts, latency), so no PII from prompts or completions passes through the billing pipeline.

## Manual tracking

If you prefer manual control or use a provider without a wrapper, you can track LLM usage directly:

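```python theme={null}
from openai import OpenAI
from nozle import Nozle

nozle = Nozle(api_key="sk_live_...")
client = OpenAI()  # plain, unwrapped client, so the call is not also auto-captured

messages = [{"role": "user", "content": "Hello"}]
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
)

# Report raw token counts; cost is still calculated server-side
nozle.track("cust_123", "llm_tokens", metadata={
    "model": response.model,
    "input_tokens": response.usage.prompt_tokens,
    "output_tokens": response.usage.completion_tokens,
})
```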
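
For a provider without a wrapper, one possible way to keep manual tracking tidy is a small context manager that times the call and reports the counts you fill in. This is a sketch under assumptions: `track_llm_call` and `RecordingTracker` are hypothetical, the stand-in only mimics the `track(customer_id, metric_code, metadata=...)` call shape shown above, and whether `latency_ms` is accepted in manually sent metadata is assumed from the property table rather than confirmed.

```python
import time
from contextlib import contextmanager


class RecordingTracker:
    """Hypothetical stand-in that mimics the track() call shape shown above."""

    def __init__(self):
        self.events = []

    def track(self, customer_id, metric_code, metadata=None):
        self.events.append((customer_id, metric_code, metadata))


@contextmanager
def track_llm_call(tracker, customer_id, model, metric_code="llm_tokens"):
    """Time a provider call and report the token counts the caller fills in."""
    counts = {"input_tokens": 0, "output_tokens": 0}
    start = time.perf_counter()
    yield counts
    # Reached only if the block did not raise, so failed calls are not reported.
    latency_ms = int((time.perf_counter() - start) * 1000)
    tracker.track(
        customer_id,
        metric_code,
        metadata={"model": model, "latency_ms": latency_ms, **counts},
    )


tracker = RecordingTracker()
with track_llm_call(tracker, "cust_123", model="my-local-model") as counts:
    # Call your provider here, then copy its reported token counts.
    counts["input_tokens"] = 40
    counts["output_tokens"] = 8
```

In real use you would pass your `Nozle` instance in place of the recording stand-in.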
