The AI surface is the part of Studio that talks to a hosted model on every interaction. This page documents the full data path: which provider, which models, which region, what gets cached, what gets redacted, and what is structurally impossible to send.

The single provider

All model inference in Studio goes through AWS Bedrock. The desktop app does not call Anthropic’s hosted API directly. It does not call OpenAI, Google, or any other provider. There is no per-user “bring your own API key” path that bypasses Bedrock. This is a deliberate architectural choice with three properties:
  • Region locked. Bedrock calls run in us-east-1. There is no path that ships your inference data to another region.
  • AWS contractual scope. Anthropic’s terms with AWS for Bedrock prohibit training on inference data and constrain handling. Direct-to-Anthropic-API would be governed by a different contract surface.
  • One audit point. Every inference call is signed by your short-term AWS credential and observable in CloudTrail under the same account that hosts your Studio backend.
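The single audit point is easy to exercise. A minimal sketch, assuming CloudTrail event records fetched elsewhere (e.g. via boto3's `lookup_events`, not shown); the filter keys on Bedrock's real event source and invocation event names, while the helper name itself is illustrative:

```python
# Bedrock model-invocation event names as they appear in CloudTrail.
BEDROCK_EVENTS = {
    "InvokeModel",
    "InvokeModelWithResponseStream",
    "Converse",
    "ConverseStream",
}

def bedrock_invocations(events: list[dict]) -> list[dict]:
    """Keep only Bedrock model-invocation events from a page of
    CloudTrail records. In practice `events` would come from
    boto3.client("cloudtrail").lookup_events(...); here, plain dicts."""
    return [
        e for e in events
        if e.get("eventSource") == "bedrock.amazonaws.com"
        and e.get("eventName") in BEDROCK_EVENTS
    ]
```

Because every inference call is signed by the user's short-term credential, the `userIdentity` field of each surviving record ties the call back to an operator.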

The models

Studio’s model catalog is defined at build time. The active models are:
| Model | Bedrock identifier | Context | Typical use |
| --- | --- | --- | --- |
| Claude Opus 4.6 | us.anthropic.claude-opus-4-6-v1 | 200,000 tokens | Default. Long conversations, complex multi-step reasoning, procedure authoring. |
| Claude Sonnet 4.6 | us.anthropic.claude-sonnet-4-6 | 200,000 tokens | Faster than Opus, suitable for most operational work. |
| Claude Sonnet 4.5 | global.anthropic.claude-sonnet-4-5-20250929-v1:0 | 200,000 tokens | Cross-region inference profile. |
| Claude Haiku 4.5 | us.anthropic.claude-haiku-4-5-20251001-v1:0 | 200,000 tokens | Cheapest, fastest. Background tasks, classification, summary. |
| Amazon Nova Micro | amazon.nova-micro-v1:0 | (Bedrock-defined) | Fallback for very small operations. |
Model selection is exposed as a default in Settings and as a per-conversation override via the /model slash command. There is no open-weight or local LLM in the path. Local-only ML in Studio is limited to the embeddings model, used for semantic search; that model is small enough to run on every Studio install.

The data flow

A single Copilot turn moves through this path:
1. The user composes a prompt. Local. The prompt sits in the Electron renderer process. Attached context (active tab, terminal selection, image, voice, hosts, memories) is gathered locally.
2. Pre-send redaction. Local. A redaction pass scrubs known-secret patterns from the assembled context: Authorization headers, bearer tokens, password-like strings. The redaction is conservative: false positives mean the model sees [REDACTED]; false negatives are the known limit. It does not catch a credential the operator deliberately substituted into a procedure prompt.
3. Cache assembly. Local. The system context is assembled in three tiers: global system prompt, organization-level context, session-level context. Each tier is bounded with a Bedrock cache point so the upstream cache can hit.
4. Sign and send to Bedrock. The request is signed with the user's short-term AWS credential and sent to the Bedrock model invocation endpoint in us-east-1, over TLS 1.2+.
5. Bedrock returns streaming tokens. Bedrock streams tokens back. Studio parses tool calls as they arrive and routes them through the approval gate before execution.
6. Tool results re-enter the loop. Tool outputs become the next turn's user message, looping back to step 2. The same redaction applies to tool output before it joins the next turn.
There is no path from a Studio renderer to a model that does not go through this flow. There is no out-of-band telemetry channel that ships prompt or response content elsewhere.
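The conservative bias of the redaction step is easy to make concrete. A minimal sketch, assuming regex-based scrubbing; the pattern set below is illustrative, not Studio's actual list:

```python
import re

REDACTED = "[REDACTED]"

# Conservative known-secret patterns: a false positive costs the model
# some context, a false negative leaks a secret, so err toward matching.
_PATTERNS = [
    re.compile(r"(?im)^(authorization:\s*).+$"),         # HTTP Authorization headers
    re.compile(r"(?i)\b(bearer\s+)[a-z0-9\-._~+/]+=*"),  # bearer tokens
    re.compile(r"(?i)\b(password\s*[=:]\s*)\S+"),        # password-like assignments
]

def redact(text: str) -> str:
    """Scrub known-secret patterns, keeping the label, redacting the value.
    Deliberately substituted credentials are NOT caught -- the known limit."""
    for pat in _PATTERNS:
        text = pat.sub(lambda m: m.group(1) + REDACTED, text)
    return text
```

The same function runs twice per loop iteration: once on the operator's assembled context, and once on tool output before it becomes the next user message.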

Layered prompt caching

Bedrock supports prompt caching: marking blocks of context as cacheable so subsequent calls with the same prefix don’t re-tokenize the cached portion. Studio assembles the model’s context in layers — broadly, the parts that change rarely sit before the parts that change per conversation, and the parts that change every turn sit at the end. The cacheable layers are marked so the upstream cache hits across calls; the volatile layer at the tail is what gets re-tokenized. The exact layering and breakpoint placement is a tuning surface we keep refining and don’t publish. The user-visible properties are: long conversations stay fast, repeated work in the same session is cheap, and the model’s per-call cost stays predictable as the workspace scales.
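The tiered layout maps naturally onto the Bedrock Converse API's cachePoint content blocks. A sketch under stated assumptions: the three tiers match the data-flow section, but the breakpoint placement shown here is illustrative, since the real layering is deliberately unpublished:

```python
CACHE_POINT = {"cachePoint": {"type": "default"}}

def build_converse_payload(global_system: str, org_context: str,
                           session_context: str, turns: list[dict]) -> dict:
    """Stable tiers first, each closed with a cache point so the prefix can
    hit upstream; the volatile per-turn messages sit at the tail and are
    the only portion re-tokenized on every call."""
    return {
        "modelId": "us.anthropic.claude-opus-4-6-v1",  # catalog default
        "system": [
            {"text": global_system},  CACHE_POINT,   # changes rarely
            {"text": org_context},    CACHE_POINT,   # changes per organization
            {"text": session_context}, CACHE_POINT,  # changes per session
        ],
        "messages": turns,                           # changes every turn
    }
```

The resulting dict is the shape boto3's bedrock-runtime client accepts, e.g. `client.converse(**build_converse_payload(...))`.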

What does and doesn’t reach the model

| Item | Reaches model context? |
| --- | --- |
| Your prompt text | Yes. |
| Active tab content you attached | Yes. |
| Terminal selection you sent to Copilot | Yes. |
| Image you attached | Yes (multimodal call). |
| Voice transcription you sent | Yes (transcribed via AWS Transcribe). |
| Host inventory metadata (names, addresses, vendor) | Yes, when the model needs it for a tool call. |
| Memories you saved | Yes, when retrieval surfaces them. |
| Procedure body | Yes, when running. |
| Tool descriptions and tool argument schemas | Yes (system prompt). |
| Tool call results | Yes (next user message). |
| Plaintext credentials | No, by design. Credentials live in the vault; the model sees a reference, not the secret. The exception is procedure substitution, which is the known limit. |
| Cached plaintext DEKs | No. They never leave the OS keychain or sidecar memory. |
| Other organizations' data | No. Cryptographic isolation; the model can't see what your AWS credential can't fetch. |
| Generated artifact source | No. Artifacts are stored encrypted; the model receives only what's currently attached. |
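The credential row deserves a concrete shape. A sketch using a hypothetical `vault://` reference syntax (the real reference format is not documented here): the model emits the reference, the approval gate runs, and substitution happens only at execution time.

```python
import re

# Hypothetical reference syntax for illustration only.
VAULT_REF = re.compile(r"vault://([A-Za-z0-9_-]+)")

def resolve_vault_refs(command: str, vault: dict[str, str]) -> str:
    """Swap vault references for secrets at execution time, after approval.
    Everything upstream of this call -- including the model's context and
    the conversation transcript -- only ever holds the reference string."""
    return VAULT_REF.sub(lambda m: vault[m.group(1)], command)
```

The ordering is the guarantee: resolution runs strictly after the model's turn is over, so the secret can never flow back into a prompt.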

The boundary between local and cloud

Some computation in Studio is genuinely local. Some is genuinely in the cloud. The boundary is worth being explicit about:
| Computation | Where |
| --- | --- |
| LLM inference | Bedrock (us-east-1). |
| Voice transcription (when used) | AWS Transcribe (us-east-1). |
| Image processing for multimodal calls | Bedrock (multimodal model). |
| Semantic search index (embeddings) | Local. A small transformer runs inside the Go sidecar. |
| Knowledge graph indexing | Local. SQLite-backed in the agent. |
| Session recording | Local. Stored encrypted; uploaded only when explicitly shared. |
| Packet capture and live diagnostics | Local. Never leaves the device unless you save and share the artifact. |
| Procedure run state | Persisted to your organization's encrypted store. |
| Conversation transcripts | Persisted to your organization's encrypted store. |
The pattern is: anything that has to scale or that has to coordinate (LLM, transcription, sync) goes to AWS. Anything that depends on the local network state (capture, discovery, terminal session, embeddings) stays on your device.

Extended thinking

Anthropic models in Studio support extended thinking — internal reasoning that the model performs before producing a visible response. When extended thinking is enabled (configurable per conversation and per sub-agent), the thinking trace streams to the right-side panel so you can see what the model is reasoning about. Thinking content is not stored in the conversation transcript by default. It is observable while the run happens; the saved record is the visible response and the tool calls.

Compaction and context management

Conversations grow. When the context window approaches the model limit, Studio runs a compaction pass:
  • Old turns are summarized.
  • Tool outputs are kept in summary form.
  • Pinned context (host inventory snippets, attached artifacts) is preserved.
  • The compacted history is the new starting point for subsequent turns.
Compaction is triggered automatically when the context exceeds a threshold below the model limit, so a turn never fails for size. The /compact slash command runs the same pipeline manually when you want to free room before a heavy turn.
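The trigger and the pipeline can be sketched as follows. The 80% threshold, the four-turn recency window, and the summarize placeholder are this sketch's assumptions, not Studio's published tuning:

```python
def summarize(turns: list[dict]) -> str:
    # Placeholder: the real pass would call a cheap model (e.g. Haiku).
    return f"Summary of {len(turns)} earlier turns."

def maybe_compact(turns: list[dict], token_count: int,
                  model_limit: int, headroom: float = 0.8) -> list[dict]:
    """No-op below the threshold. Above it: old turns collapse into one
    summary turn, while pinned context and the most recent turns survive
    verbatim -- so a turn never fails for size."""
    if token_count < int(model_limit * headroom):
        return turns
    pinned = [t for t in turns if t.get("pinned")]
    recent = turns[-4:]
    old = [t for t in turns[:-4] if not t.get("pinned")]
    summary_turn = {"role": "user", "content": summarize(old)}
    return pinned + [summary_turn] + [t for t in recent if t not in pinned]
```

The /compact slash command would simply call the same function with the threshold forced, which is why manual and automatic compaction behave identically.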

What we do not do

  • No fine-tuning on your data. Studio does not run Anthropic fine-tuning on your conversations, your procedures, your memories, or your tool outputs. Bedrock-side, Anthropic does not train on inference traffic.
  • No off-region failover. The Bedrock identifier is region-pinned. A regional outage degrades availability; it does not silently route inference to another region.
  • No multi-provider mux. There is no path that ships prompts to a non-Bedrock provider. If a future Studio version adds one, it will be opt-in, region-disclosed, and documented.
  • No “improve our models with your data” toggle. It does not exist. Your data is not training material.

Human in the loop

The classifier and approval gate that decide which tool calls actually run.

Known limits

Including the limit that pre-send redaction cannot scrub a secret an operator deliberately substitutes into a prompt, the most important honest limit on this page.