Model Configuration

Model configuration is managed through the Settings panel in the browser UI (the gear icon, available in any view). Settings are persisted locally to `.a-society/settings.json`; API keys are stored separately in `.a-society/secrets.json`. Both files are written with mode 0600.

On first launch, the Settings panel opens automatically and blocks navigation until at least one model is configured and activated.

Model fields

Each configured model has the following fields:

Field	Description
`displayName`	A label shown in the UI. Does not affect API calls.
`providerType`	`anthropic` or `openai-compatible`.
`providerBaseUrl`	Required for `openai-compatible`. The base URL of the API endpoint (e.g. `https://api.openai.com/v1`).
`modelId`	The model identifier passed to the API (e.g. `claude-opus-4-5`, `gpt-4o`).
`contextWindow`	The model's context window in tokens. Used to calculate the auto-compaction threshold (80% of this value). Set it to match your model's actual context window.
`maxOutputTokens`	Maximum tokens per response. Defaults: 4096 for Anthropic, 8192 for OpenAI-compatible. For `anthropic-manual` reasoning, `budgetTokens` must be less than this value.
`reasoning`	Reasoning configuration. See Reasoning configuration below.
`cacheTtl`	Anthropic prompt cache TTL: `5m` or `1h`. See Prompt caching below.
`supportedInputTypes`	Optional. Declare which input modalities the model supports: `image`, `audio`, `video`. Stored with the model config but not yet acted on by the runtime.
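The field table can be read as a TypeScript shape. The following is a minimal sketch under stated assumptions: the names `ModelConfig`, `defaultMaxOutputTokens`, and `compactionThreshold` are hypothetical and not taken from the runtime, the `reasoning` field is left out for brevity, and rounding the threshold down is a guess (the text above only gives the 80% figure).

```typescript
// Hypothetical shape for one configured model, derived only from the field table.
type ProviderType = "anthropic" | "openai-compatible";
type InputType = "image" | "audio" | "video";

interface ModelConfig {
  displayName: string;
  providerType: ProviderType;
  providerBaseUrl?: string; // required when providerType is "openai-compatible"
  modelId: string;
  contextWindow: number; // in tokens
  maxOutputTokens?: number; // provider default applies when unset
  cacheTtl?: "5m" | "1h"; // Anthropic only
  supportedInputTypes?: InputType[]; // stored, not yet acted on
}

// Documented defaults: 4096 for Anthropic, 8192 for OpenAI-compatible.
function defaultMaxOutputTokens(providerType: ProviderType): number {
  return providerType === "anthropic" ? 4096 : 8192;
}

// Auto-compaction threshold: 80% of the context window.
// Integer arithmetic plus floor is an assumption about rounding.
function compactionThreshold(contextWindow: number): number {
  return Math.floor((contextWindow * 80) / 100);
}

// Illustrative values only.
const exampleModel: ModelConfig = {
  displayName: "Opus",
  providerType: "anthropic",
  modelId: "claude-opus-4-5",
  contextWindow: 200_000,
};
```

With the illustrative `contextWindow` of 200,000 tokens, `compactionThreshold` yields 160,000, which is why an understated context window triggers compaction earlier than necessary.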

Active model

Only one model can be active at a time. The active model is used for role turns when no role-specific model has been selected, for compaction LLM calls, and for automatic role-configuration selection turns. The first model you add is automatically activated. You can switch the active model at any time from the Settings panel; the change takes effect on the next turn.
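The activation rule can be sketched as a small state update. This is an illustration only: `SettingsState`, `addModel`, and `setActiveModel` are hypothetical names, the real layout of `settings.json` may differ, and rejecting an unknown model ID is an assumption.

```typescript
// Hypothetical in-memory view of the settings; not the real file layout.
interface SettingsState {
  modelIds: string[];
  activeModelId: string | null;
}

// Adding the first model activates it; later additions leave the active model unchanged.
function addModel(state: SettingsState, modelId: string): SettingsState {
  return {
    modelIds: [...state.modelIds, modelId],
    activeModelId: state.activeModelId ?? modelId,
  };
}

// Switching is explicit. Rejecting unknown IDs is an assumption of this sketch.
function setActiveModel(state: SettingsState, modelId: string): SettingsState {
  if (state.modelIds.indexOf(modelId) === -1) {
    throw new Error(`Unknown model: ${modelId}`);
  }
  return { ...state, activeModelId: modelId };
}
```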

Model auto-selection

When more than one model is configured, each role can receive a role-specific model selection for the active flow. The Model selection toggle in the Settings panel controls how that decision is made:

Mode	Behavior
Manual	The runtime prompts you to choose a model for each role when that role first needs configuration in a flow.
Auto	The active model runs a configuration-selection turn and chooses one configured model for the role. If auto-selection fails, the runtime falls back to the normal manual prompt.

Model auto-selection only runs when there is a real choice to make: more than one configured model and no usable model selection already saved for that role in the current flow. With a single configured model, the runtime uses the active model without prompting.

Role-specific model selections are saved in the flow’s runtime state. If the selected model is later deleted or no longer usable, the role re-enters model selection instead of silently falling back.
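The decision described in this section can be summarized as a single function. This is a sketch, not the runtime's code: `resolveRoleModel` and its types are hypothetical, and "usable" is modeled here simply as "still present in the configured list".

```typescript
type SelectionMode = "manual" | "auto";

type Resolution =
  | { kind: "use"; modelId: string } // run the role turn with this model
  | { kind: "select"; mode: SelectionMode }; // enter model selection first

function resolveRoleModel(
  configuredModelIds: string[],
  activeModelId: string,
  savedSelection: string | undefined,
  mode: SelectionMode,
): Resolution {
  // A single configured model leaves no choice: use the active model, no prompt.
  if (configuredModelIds.length <= 1) {
    return { kind: "use", modelId: activeModelId };
  }
  // A saved selection that is still configured is reused for this flow.
  if (savedSelection !== undefined && configuredModelIds.indexOf(savedSelection) !== -1) {
    return { kind: "use", modelId: savedSelection };
  }
  // No selection yet, or the selected model was deleted: select again
  // rather than silently falling back to the active model.
  return { kind: "select", mode };
}
```

In `auto` mode, a failed selection turn would then be handled by calling the manual prompt, as the table above describes.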

Prompt caching

For Anthropic models, the Settings panel includes Prompt cache TTL with two options:

TTL	Behavior
`5m`	Default short-lived prompt cache.
`1h`	Longer-lived prompt cache. Anthropic charges a higher write rate for this cache duration.

During project role turns, the runtime adds Anthropic prompt-cache breakpoints to the system prompt and recent message context. This helps repeated role turns reuse stable context such as required readings and workflow guidance.

Prompt cache TTL does not add cache controls to OpenAI-compatible requests. Automatic role-configuration selection turns and compaction calls run in system mode, so they do not use the project-turn prompt cache by default.
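For orientation, a cache breakpoint in the Anthropic Messages API is a `cache_control` object attached to a content block. The sketch below shows how a system block might carry the configured TTL. The helper name `systemBlock` and the `projectTurn` flag are hypothetical, and where exactly the runtime places its breakpoints is not specified here.

```typescript
type CacheTtl = "5m" | "1h";

// Shape of a text content block in an Anthropic Messages API request.
interface TextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral"; ttl: CacheTtl };
}

// Attach a cache breakpoint only for project role turns.
// System-mode calls (compaction, auto-selection) get a plain block.
function systemBlock(text: string, ttl: CacheTtl, projectTurn: boolean): TextBlock {
  if (!projectTurn) {
    return { type: "text", text };
  }
  return { type: "text", text, cache_control: { type: "ephemeral", ttl } };
}
```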

Reasoning configuration

Reasoning is configured per model. The mode must be compatible with the provider:

`anthropic` provider: `disabled`, `anthropic-adaptive`, `anthropic-manual`
`openai-compatible` provider: `disabled`, `openai-chat`, `custom-openai-compatible`
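The compatibility rule amounts to a lookup table. A minimal sketch, with the hypothetical names `allowedModes` and `isModeCompatible`:

```typescript
type ProviderType = "anthropic" | "openai-compatible";
type ReasoningMode =
  | "disabled"
  | "anthropic-adaptive"
  | "anthropic-manual"
  | "openai-chat"
  | "custom-openai-compatible";

// Reasoning modes permitted for each provider type, as listed above.
const allowedModes: Record<ProviderType, readonly ReasoningMode[]> = {
  anthropic: ["disabled", "anthropic-adaptive", "anthropic-manual"],
  "openai-compatible": ["disabled", "openai-chat", "custom-openai-compatible"],
};

function isModeCompatible(provider: ProviderType, mode: ReasoningMode): boolean {
  return allowedModes[provider].indexOf(mode) !== -1;
}
```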

`disabled`

No reasoning. The model responds with standard output only.

`openai-chat`

OpenAI reasoning models (e.g. o3, o4-mini). Sends `reasoning_effort` and uses `max_completion_tokens` instead of `max_tokens`.

Field	Values	Description
`effort`	`none`, `minimal`, `low`, `medium`, `high`, `xhigh`	Controls how much reasoning the model performs.
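As an illustration, the request body in this mode might be assembled as follows. The field names `reasoning_effort` and `max_completion_tokens` come from the description above; the helper `buildOpenAiChatBody` is hypothetical, and other fields the runtime sends (streaming options, tools) are omitted.

```typescript
type OpenAiEffort = "none" | "minimal" | "low" | "medium" | "high" | "xhigh";

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Builds a Chat Completions body for a reasoning model.
// The output limit goes in max_completion_tokens; max_tokens is not sent.
function buildOpenAiChatBody(
  modelId: string,
  messages: ChatMessage[],
  effort: OpenAiEffort,
  maxOutputTokens: number,
) {
  return {
    model: modelId,
    messages,
    reasoning_effort: effort,
    max_completion_tokens: maxOutputTokens,
  };
}
```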

`anthropic-adaptive`

Anthropic extended thinking in adaptive mode. The model decides how much thinking to use based on the task. Sends `thinking.type: "adaptive"` and `output_config.effort`.

Field	Values	Description
`effort`	`low`, `medium`, `high`, `xhigh`, `max`	Guides the model's thinking depth.
`display`	`omitted`, `summarized`	Whether thinking content is omitted or summarized in the feed.
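A sketch of the thinking-related request fields in this mode, based only on the description above. `buildAdaptiveThinkingFields` is a hypothetical helper, and how the `display` setting is applied is not covered here.

```typescript
type AnthropicEffort = "low" | "medium" | "high" | "xhigh" | "max";

// Request fields for adaptive thinking: no explicit budget, only an effort hint.
function buildAdaptiveThinkingFields(effort: AnthropicEffort, maxOutputTokens: number) {
  return {
    max_tokens: maxOutputTokens,
    thinking: { type: "adaptive" as const },
    output_config: { effort },
  };
}
```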

`anthropic-manual`

Anthropic extended thinking with an explicit token budget. You control exactly how many tokens are allocated for thinking. Sends `thinking.type: "enabled"` with `budget_tokens`.

Field	Values	Description
`effort`	`low`, `medium`, `high`, `xhigh`, `max`	Guides the model's thinking depth.
`display`	`omitted`, `summarized`	Whether thinking content is omitted or summarized in the feed.
`budgetTokens`	positive integer	Thinking token budget. Must be less than `maxOutputTokens`.
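The budget constraint can be checked before a request is built. The following is a sketch with a hypothetical helper name, `buildManualThinkingFields`; throwing on an invalid budget is an assumption about how a violation would be surfaced.

```typescript
// Request fields for manual thinking, with the documented constraint enforced:
// budgetTokens must be a positive integer smaller than maxOutputTokens.
function buildManualThinkingFields(budgetTokens: number, maxOutputTokens: number) {
  if (!Number.isInteger(budgetTokens) || budgetTokens <= 0) {
    throw new Error("budgetTokens must be a positive integer");
  }
  if (budgetTokens >= maxOutputTokens) {
    throw new Error("budgetTokens must be less than maxOutputTokens");
  }
  return {
    max_tokens: maxOutputTokens,
    thinking: { type: "enabled" as const, budget_tokens: budgetTokens },
  };
}
```

With the Anthropic default `maxOutputTokens` of 4096, a budget of 2048 passes, while a budget of 4096 is rejected because nothing would remain for the visible response.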

`custom-openai-compatible`

For providers that expose reasoning through non-standard API fields (e.g. DeepSeek, local models). Lets you inject arbitrary request body fields and optionally configure how the reasoning trace is rendered in the UI.

Request configuration (`request`):

Field	Values	Description
`tokenLimitParam`	`max_tokens`, `max_completion_tokens`	Which request field to use for the output token limit.
`extraBody`	`Record<string, unknown>`	Additional fields merged into the API request body. Cannot override reserved keys: `model`, `messages`, `stream`, `stream_options`, `tools`, `max_tokens`, `max_completion_tokens`.
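The merge rule for `extraBody` can be sketched as below. The names `RESERVED_KEYS` and `buildCustomBody` are hypothetical, and rejecting a reserved key with an error is an assumption; the runtime might instead refuse the setting at save time.

```typescript
// Keys that extraBody may not override, as listed in the table above.
const RESERVED_KEYS: ReadonlySet<string> = new Set([
  "model",
  "messages",
  "stream",
  "stream_options",
  "tools",
  "max_tokens",
  "max_completion_tokens",
]);

interface CustomRequestConfig {
  tokenLimitParam: "max_tokens" | "max_completion_tokens";
  extraBody?: Record<string, unknown>;
}

function buildCustomBody(
  base: { model: string; messages: unknown[] },
  maxOutputTokens: number,
  request: CustomRequestConfig,
): Record<string, unknown> {
  const extra = request.extraBody ?? {};
  const clashes = Object.keys(extra).filter((key) => RESERVED_KEYS.has(key));
  if (clashes.length > 0) {
    throw new Error(`extraBody cannot override reserved keys: ${clashes.join(", ")}`);
  }
  // Extra fields are merged first, so the runtime-owned fields always win.
  return { ...extra, ...base, [request.tokenLimitParam]: maxOutputTokens };
}
```

For example, an `extraBody` of `{ "enable_reasoning": true }` (a made-up provider field) would be passed through unchanged, while `{ "model": "other" }` would be rejected.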

Trace configuration (`trace`, optional):

If the provider streams a reasoning trace in the response, you can configure how it is captured and displayed.

Field	Values	Description
`responseDeltaField`	string	The field name in each streaming delta that carries the reasoning content.
`requestMessageField`	string	The field name used to replay reasoning content back to the model in subsequent messages.
`replay`	`never`, `tool-calls-only`, `always`	When to replay reasoning traces back to the model. `tool-calls-only` (the default) replays traces only on turns that involved tool calls.
`display`	`hidden`, `collapsed`, `expanded`	How reasoning traces appear in the UI feed.
`label`	string	Display label for the reasoning trace in the feed (e.g. `"Thinking"`).

Both `responseDeltaField` and `requestMessageField` must be set for the trace configuration to take effect.
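The trace rules can be sketched as two helpers: one that reads reasoning text from a streaming delta, and one that decides whether to replay it. The function names are hypothetical. The field name `reasoning_content` in the usage note is only an illustration, so check your provider's documentation for the actual name.

```typescript
interface TraceConfig {
  responseDeltaField?: string;
  requestMessageField?: string;
  replay?: "never" | "tool-calls-only" | "always";
}

// The trace configuration takes effect only when both field names are set.
function traceEnabled(trace: TraceConfig | undefined): boolean {
  return Boolean(trace && trace.responseDeltaField && trace.requestMessageField);
}

// Extracts the reasoning text from one streaming delta, or "" if none applies.
function readReasoningDelta(
  delta: Record<string, unknown>,
  trace: TraceConfig | undefined,
): string {
  if (!trace || !trace.responseDeltaField || !trace.requestMessageField) {
    return "";
  }
  const value = delta[trace.responseDeltaField];
  return typeof value === "string" ? value : "";
}

// Decides whether a captured trace is sent back to the model on later requests.
function shouldReplay(trace: TraceConfig | undefined, turnHadToolCalls: boolean): boolean {
  if (!traceEnabled(trace)) {
    return false;
  }
  const replay = trace?.replay ?? "tool-calls-only"; // documented default
  if (replay === "always") return true;
  if (replay === "never") return false;
  return turnHadToolCalls;
}
```

With both fields set to `reasoning_content`, a delta such as `{ content: "", reasoning_content: "step 1" }` yields `"step 1"`; with only one field set, the same delta yields an empty string and nothing is replayed.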

Compaction model

Context compaction uses the active model through a separate `LLMGateway` instance in system mode, with no tools and no consent gate. The active model's reasoning configuration, `maxOutputTokens`, and API key apply. Compaction runs as a standard text turn, and its token usage is not counted against the role session.