The single provider
All model inference in Studio goes through AWS Bedrock. The desktop app does not call Anthropic’s hosted API directly. It does not call OpenAI, Google, or any other provider. There is no per-user “bring your own API key” path that bypasses Bedrock. This is a deliberate architectural choice with three properties:

- Region locked. Bedrock calls run in us-east-1. There is no path that ships your inference data to another region.
- AWS contractual scope. Anthropic’s terms with AWS for Bedrock prohibit training on inference data and constrain handling. A direct-to-Anthropic-API path would be governed by a different contract surface.
- One audit point. Every inference call is signed by your short-term AWS credential and observable in CloudTrail under the same account that hosts your Studio backend.
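The single audit point means region drift is checkable from logs alone. As a sketch (the record shape follows CloudTrail's event fields; the values are illustrative, not from a real trail):

```python
# Illustrative CloudTrail-style records; field names follow CloudTrail's
# event schema, values are made up for the example.
records = [
    {"eventSource": "bedrock.amazonaws.com",
     "eventName": "InvokeModelWithResponseStream",
     "awsRegion": "us-east-1"},
    {"eventSource": "s3.amazonaws.com",
     "eventName": "GetObject",
     "awsRegion": "us-east-1"},
]

def inference_calls_outside_region(records, region="us-east-1"):
    """Return Bedrock inference events that ran outside the pinned region."""
    return [r for r in records
            if r["eventSource"] == "bedrock.amazonaws.com"
            and r["awsRegion"] != region]

print(inference_calls_outside_region(records))  # -> []
```

An empty result is the expected steady state: every inference event is Bedrock, and every Bedrock event is us-east-1.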
The models
Studio’s model catalog is defined at build time. The active models are:

| Model | Bedrock identifier | Context | Typical use |
|---|---|---|---|
| Claude Opus 4.6 | us.anthropic.claude-opus-4-6-v1 | 200,000 tokens | Default. Long conversations, complex multi-step reasoning, procedure authoring. |
| Claude Sonnet 4.6 | us.anthropic.claude-sonnet-4-6 | 200,000 tokens | Faster than Opus, suitable for most operational work. |
| Claude Sonnet 4.5 | global.anthropic.claude-sonnet-4-5-20250929-v1:0 | 200,000 tokens | Cross-region inference profile. |
| Claude Haiku 4.5 | us.anthropic.claude-haiku-4-5-20251001-v1:0 | 200,000 tokens | Cheapest, fastest. Background tasks, classification, summary. |
| Amazon Nova Micro | amazon.nova-micro-v1:0 | (Bedrock-defined) | Fallback for very small operations. |
The /model slash command overrides the active model for a single conversation.
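A build-time catalog plus a per-conversation override resolves roughly like this (names are illustrative, not Studio's actual code; identifiers mirror the table above):

```python
# Illustrative build-time catalog; keys are hypothetical display names,
# values are the Bedrock identifiers from the table above.
MODEL_CATALOG = {
    "claude-opus-4.6":   "us.anthropic.claude-opus-4-6-v1",
    "claude-sonnet-4.6": "us.anthropic.claude-sonnet-4-6",
    "claude-sonnet-4.5": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
    "claude-haiku-4.5":  "us.anthropic.claude-haiku-4-5-20251001-v1:0",
    "nova-micro":        "amazon.nova-micro-v1:0",
}

DEFAULT_MODEL = "claude-opus-4.6"

def resolve_model(override=None):
    """Resolve a /model override against the catalog, else the default.
    An unknown name fails closed rather than falling through to a guess."""
    key = override or DEFAULT_MODEL
    if key not in MODEL_CATALOG:
        raise ValueError(f"unknown model: {key}")
    return MODEL_CATALOG[key]
```

Because the catalog is fixed at build time, there is no runtime path that can introduce an identifier outside this table.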
There is no open-weight or local LLM in the path. Local-only ML in Studio is limited to the embeddings model, used for semantic search; that model is small enough to run on every Studio install.
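The local embeddings path is ordinary nearest-neighbor search. A toy sketch with three-dimensional vectors (a real embeddings model produces hundreds of dimensions; the index entries here are made up):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy 3-dimensional "embeddings" keyed by document; illustrative only.
index = {
    "reset BGP session": [0.9, 0.1, 0.0],
    "rotate TLS certs":  [0.1, 0.9, 0.2],
}

def search(query_vec, index, top_k=1):
    """Rank indexed documents by similarity to the query vector."""
    ranked = sorted(index, key=lambda k: cosine(query_vec, index[k]),
                    reverse=True)
    return ranked[:top_k]

print(search([0.8, 0.2, 0.1], index))  # -> ['reset BGP session']
```

The point of keeping this local is that the query text never has to leave the device to be searchable.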
The data flow
A single Copilot turn moves through this path:

The user composes a prompt
Local. The prompt sits in the Electron renderer process. Attached context (active tab, terminal selection, image, voice, hosts, memories) is gathered locally.
Pre-send redaction
Local. A redaction pass scrubs known-secret patterns from the assembled context — Authorization headers, bearer tokens, password-like strings. The redaction is conservative: false positives mean the model sees [REDACTED]; false negatives are the known limit. It does not catch a credential the operator deliberately substituted into a procedure prompt.
Cache assembly
Local. The system context is assembled in three tiers: global system prompt, organization-level context, session-level context. Each tier is bounded with a Bedrock cache point so the upstream cache can hit.
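The three-tier assembly can be sketched as follows. The block shape loosely follows Bedrock's Converse API cache-point convention; treat the exact format, and the function itself, as illustrative rather than Studio's real assembly code:

```python
def assemble_context(global_prompt, org_context, session_context, turn):
    """Order context from stable to volatile, marking a cache point
    after each stable tier so the upstream cache can hit on the prefix."""
    blocks = []
    for tier in (global_prompt, org_context, session_context):
        blocks.append({"text": tier})
        blocks.append({"cachePoint": {"type": "default"}})
    blocks.append({"text": turn})  # volatile tail: re-tokenized each call
    return blocks
```

Only the tail block changes turn to turn, so everything before the last cache point is a stable prefix.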
Sign and send to Bedrock
The request is signed with the user’s short-term AWS credential and sent to the Bedrock model invocation endpoint in us-east-1. TLS 1.2+ on the wire.
Bedrock returns streaming tokens
Bedrock streams tokens back. Studio parses tool calls as they arrive and routes them through the approval gate before execution.
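The pre-send redaction step above can be sketched as a pattern sweep. These patterns are illustrative, not Studio's actual set; note how the conservative design shows up as whole-match replacement:

```python
import re

# Illustrative known-secret patterns. False positives are acceptable;
# misses are the known limit (e.g. a deliberately substituted credential).
SECRET_PATTERNS = [
    re.compile(r"(?i)authorization:\s*\S+"),
    re.compile(r"(?i)bearer\s+[A-Za-z0-9._\-]+"),
    re.compile(r"(?i)password\s*[:=]\s*\S+"),
]

def redact(text):
    """Replace every known-secret match with a fixed placeholder."""
    for pat in SECRET_PATTERNS:
        text = pat.sub("[REDACTED]", text)
    return text

print(redact("Authorization: token123"))  # -> [REDACTED]
```

Anything the patterns do not describe passes through untouched, which is exactly the false-negative limit the step documents.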
Layered prompt caching
Bedrock supports prompt caching: marking blocks of context as cacheable so subsequent calls with the same prefix don’t re-tokenize the cached portion. Studio assembles the model’s context in layers — broadly, the parts that change rarely sit before the parts that change per conversation, and the parts that change every turn sit at the end. The cacheable layers are marked so the upstream cache hits across calls; the volatile layer at the tail is what gets re-tokenized. The exact layering and breakpoint placement is a tuning surface we keep refining and don’t publish. The user-visible properties are: long conversations stay fast, repeated work in the same session is cheap, and the model’s per-call cost stays predictable as the workspace scales.
What does and doesn’t reach the model
| Item | Reaches model context? |
|---|---|
| Your prompt text | Yes. |
| Active tab content you attached | Yes. |
| Terminal selection you sent to Copilot | Yes. |
| Image you attached | Yes (multimodal call). |
| Voice transcription you sent | Yes (transcribed via AWS Transcribe). |
| Host inventory metadata (names, addresses, vendor) | Yes, when the model needs it for a tool call. |
| Memories you saved | Yes, when retrieval surfaces them. |
| Procedure body | Yes, when running. |
| Tool descriptions and tool argument schemas | Yes (system prompt). |
| Tool call results | Yes (next user message). |
| Plaintext credentials | No, by design. Credentials live in the vault; the model sees a reference, not the secret. The exception is procedure substitution, which is the known limit. |
| Cached plaintext DEKs | No. Never leaves the OS keychain or sidecar memory. |
| Other organizations’ data | No. Cryptographic isolation; the model can’t see what your AWS credential can’t fetch. |
| Generated artifact source | No. Artifacts are stored encrypted; the model receives only what’s currently attached. |
The boundary between local and cloud
Some computation in Studio is genuinely local. Some is genuinely in the cloud. The boundary is worth being explicit about:

| Computation | Where |
|---|---|
| LLM inference | Bedrock (us-east-1). |
| Voice transcription (when used) | AWS Transcribe (us-east-1). |
| Image processing for multimodal calls | Bedrock (multimodal model). |
| Semantic search index (embeddings) | Local. A small transformer runs inside the Go sidecar. |
| Knowledge graph indexing | Local. SQLite-backed in the agent. |
| Session recording | Local. Stored encrypted; uploaded only when explicitly shared. |
| Packet capture and live diagnostics | Local. Never leaves the device unless you save and share the artifact. |
| Procedure run state | Persisted to your organization’s encrypted store. |
| Conversation transcripts | Persisted to your organization’s encrypted store. |
Extended thinking
Anthropic models in Studio support extended thinking — internal reasoning that the model performs before producing a visible response. When extended thinking is enabled (configurable per conversation and per sub-agent), the thinking trace streams to the right-side panel so you can see what the model is reasoning about. Thinking content is not stored in the conversation transcript by default. It is observable while the run happens; the saved record is the visible response and the tool calls.
Compaction and context management
Conversations grow. When the context window approaches the model limit, Studio runs a compaction pass:

- Old turns are summarized.
- Tool outputs are kept in summary form.
- Pinned context (host inventory snippets, attached artifacts) is preserved.
- The compacted history is the new starting point for subsequent turns.
The /compact slash command runs the same pipeline manually when you want to free room before a heavy turn.
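The compaction pass above reduces to a simple shape: collapse old turns into a summary, keep pinned context verbatim, keep the most recent turns verbatim. A sketch, with the model-backed summarization step stubbed out as a callable (names and record shapes are illustrative):

```python
def compact(turns, pinned, summarize):
    """Collapse older turns into a summary; pinned context and the most
    recent turns survive verbatim. `summarize` stands in for a
    model-backed summarization call."""
    keep = turns[-2:]                # most recent turns stay verbatim
    summary = summarize(turns[:-2])  # older turns collapse to a summary
    return [{"role": "system", "text": summary}] + pinned + keep

turns = [{"role": "user", "text": f"turn {i}"} for i in range(5)]
pinned = [{"role": "system", "text": "host inventory snippet"}]
history = compact(turns, pinned, lambda ts: f"{len(ts)} turns summarized")
```

The returned list becomes the new starting point for subsequent turns, which is why the pass frees context rather than merely hiding it.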
What we do not do
- No fine-tuning on your data. Studio does not run Anthropic fine-tuning on your conversations, your procedures, your memories, or your tool outputs. Bedrock-side, Anthropic does not train on inference traffic.
- No off-region failover. The Bedrock identifier is region-pinned. A regional outage degrades availability; it does not silently route inference to another region.
- No multi-provider mux. There is no path that ships prompts to a non-Bedrock provider. If a future Studio version adds one, it will be opt-in, region-disclosed, and documented.
- No “improve our models with your data” toggle. It does not exist. Your data is not training material.
Related
Human in the loop
The classifier and approval gate that decide which tool calls actually run.
Known limits
Including AI context scrubbing of substituted secrets — the most important honest limit on this page.