11 → Concurrency & tuning

Overview

What you'll build

This final step tunes the runtime for responsiveness: you’ll cap tool-call steps, tune provider options at runtime, and introduce safe concurrency so long-running I/O doesn’t freeze the chat loop.

Why it matters

  • Streaming models can run indefinitely if you don’t limit steps; guardrails prevent runaway tool loops.
  • Tool runtimes start faster when you batch initialization and reuse managed runtimes.
  • Carefully placed concurrency keeps the CLI responsive, while careless concurrency can corrupt storage or interleave terminal output.

Big picture

Every step so far focused on capability. This one focuses on operational quality. It prepares you to ship the CLI for real users: configurable, resilient, and performant.

Core concepts (Effect-TS)

Step Limits & Provider Options

streamChat accepts overrides for maxSteps, temperature, and maxOutputTokens. Wrap those defaults in ConfigService so you can change behaviour with /model or environment variables.
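
A minimal sketch of those defaults, assuming ConfigService reads them from environment variables with Effect's Config module (the variable names and fallback values here are placeholders):

import { Config } from "effect"

// Hypothetical tuning defaults; env var names and fallbacks are assumptions.
const tuningDefaults = Config.all({
  maxSteps: Config.integer("MAX_STEPS").pipe(Config.withDefault(10)),
  temperature: Config.number("TEMPERATURE").pipe(Config.withDefault(0.7)),
  maxTokens: Config.integer("MAX_OUTPUT_TOKENS").pipe(Config.withDefault(4096)),
})

// Inside ConfigService's Effect.gen: const defaults = yield* tuningDefaults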

Lazy Managed Runtimes

ManagedRuntime.make delays tool initialization until the first call. Compose new tool layers once and store them—don’t rebuild per request.
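
A sketch of that reuse, where fileToolsLayer comes from Step 2's registry and readFile stands in for any effect the file tools run:

import { ManagedRuntime } from "effect"

// Built once at service scope; the underlying layer is only constructed on first use.
const fileRuntime = ManagedRuntime.make(fileToolsLayer)

// Every call reuses the same runtime instead of rebuilding the layer.
const executeReadFile = (path: string) => fileRuntime.runPromise(readFile(path))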

Safe Concurrency

Use Effect’s combinators (Effect.all, Effect.forEach) with concurrency options for CPU-bound or I/O-bound work. Avoid parallelism when order matters (like writing session history) or when shared mutable state is involved.
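
For example (urls, fetchUrl, history, and saveEntry are placeholder names):

import { Effect } from "effect"

// Independent I/O: bounded parallelism keeps the loop responsive without flooding the provider.
const fetchAll = Effect.forEach(urls, fetchUrl, { concurrency: 4 })

// Order-sensitive writes: Effect.forEach runs sequentially by default, preserving history order.
const persistHistory = Effect.forEach(history, saveEntry)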


Implementation

Step 1: Expose provider tuning in VercelAI

src/services/VercelAI.ts
const streamChat = <TOOLS extends ToolSet>(request: StreamChatRequest<TOOLS>) =>
  Effect.gen(function* () {
    const config = yield* configService.load
    const { messages, tools, maxSteps, temperature, onStepFinish } = request

    const model = yield* getModel

    return streamText({
      model,
      messages,
      tools,
      stopWhen: stepCountIs(maxSteps ?? config.maxSteps ?? 10),
      temperature: temperature ?? config.temperature,
      maxOutputTokens: config.maxTokens,
      providerOptions: normalizeProviderOptions(config) as never,
      onStepFinish,
    })
  })

Pair this with /model (Step 04) so users can adjust providers mid-session.
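
A hypothetical /model handler on top of that, assuming ConfigService exposes an update method over its Ref-backed state:

// The next streamChat call picks these values up through configService.load.
const handleModelCommand = (overrides: { temperature?: number; maxSteps?: number }) =>
  configService.update((config) => ({
    ...config,
    temperature: overrides.temperature ?? config.temperature,
    maxSteps: overrides.maxSteps ?? config.maxSteps,
  }))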

Step 2: Batch tool runtime setup

src/services/ToolRegistry.ts
const runtimes = yield* Effect.all(
  {
    file: ManagedRuntime.make(fileToolsLayer),
    search: ManagedRuntime.make(searchToolsLayer),
    edit: ManagedRuntime.make(editToolsLayer),
    directory: ManagedRuntime.make(directoryToolsLayer),
    todo: ManagedRuntime.make(todoToolsLayer),
  },
  // Build every tool runtime in parallel rather than one after another.
  { concurrency: "unbounded" },
)

const toolsMap = makeAllTools(
  runtimes.file,
  runtimes.search,
  runtimes.edit,
  runtimes.directory,
  runtimes.todo,
)

return {
  tools: Effect.succeed(toolsMap),
  listToolNames: Effect.succeed(Object.keys(toolsMap).sort()),
} as const

By batching the runtime creation, you avoid sequential startup costs while keeping the code declarative.
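
makeAllTools itself isn't shown in this step; conceptually, each tool's execute callback runs its effect on one of these pre-built runtimes instead of constructing a fresh one. A hypothetical sketch for a single tool (FileTools and readFile are stand-ins):

import { Runtime } from "effect"
import { tool } from "ai"
import { z } from "zod"

const makeReadFileTool = (fileRuntime: Runtime.Runtime<FileTools>) =>
  tool({
    description: "Read a file from the workspace",
    inputSchema: z.object({ path: z.string() }),
    // Reuse the shared runtime; no layer is rebuilt per call.
    execute: ({ path }) => Runtime.runPromise(fileRuntime)(readFile(path)),
  })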

Step 3: Stream and persist without blocking

src/chat/MessageService.ts
const assistantText = yield* handleChatStream(
  messages,
  tools,
  { maxSteps: config.maxSteps ?? 10, temperature: config.temperature },
  vercelAI,
)

const trimmed = assistantText.trim()

yield* Effect.forEach(
  [
    Effect.sync(() => displayComplete()),
    trimmed.length > 0
      ? sessionStore.saveMessage(createAssistantMessage(trimmed, sessionId))
      : Effect.void,
  ],
  // Effect.forEach needs a mapping function; identity runs each effect as-is.
  (effect) => effect,
  { concurrency: 2 },
)

Running terminal cleanup and persistence concurrently keeps the UI snappy without risking race conditions (messages write after streaming completes, not during it).


Caution Notes

  • Terminal output: Don’t render markdown from multiple fibers at once; the output will interleave. Keep rendering on the main fiber and push storage/network I/O into concurrent effects instead.
  • Shared resources: Anything touching Ref state (like ConfigService) should stay sequential; if that work does pass through a concurrent combinator, pin it to { concurrency: 1 } or scope a default with Effect.withConcurrency(1). See the sketch after these notes.
  • Provider limits: A high maxOutputTokens invites long, expensive responses, and some providers reject values above their ceiling; cap it per provider (and by cost) in normalizeProviderOptions.
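
A sketch of that sequential guard, generic over whatever state the Ref holds:

import { Effect, Ref } from "effect"

// Apply updates to shared Ref state one at a time (Effect.forEach is sequential by default).
const applyUpdatesSequentially = <A>(
  ref: Ref.Ref<A>,
  updates: ReadonlyArray<(value: A) => A>,
) => Effect.forEach(updates, (update) => Ref.update(ref, update))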

Testing & Validation
  1. Lower maxSteps to 1 and verify the model stops calling tools after the first invocation.
  2. Increase temperature via /model and confirm the change applies immediately.
  3. Start the CLI and time tool initialization before/after batching runtimes.
  4. Simulate slow disk writes (e.g., wrap persistence in Effect.sleep, as sketched below) and confirm the UI remains responsive while messages persist.
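
One way to fake the slow write for item 4, wrapping the real save in an artificial delay (test-only; SessionMessage is an assumed type name):

// Delay persistence by two seconds to verify the UI stays responsive meanwhile.
const slowSaveMessage = (message: SessionMessage) =>
  Effect.sleep("2 seconds").pipe(Effect.zipRight(sessionStore.saveMessage(message)))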

Common Issues
Problem                        | Likely cause                         | Fix
AI loops forever               | stopWhen not configured              | Use stepCountIs(maxSteps ?? 10) to cap tool calls
CLI freezes during saves       | Persistence runs on same fiber as UI | Offload storage writes with Effect.forEach concurrency
Tool runtimes rebuilt per call | Not caching ManagedRuntime.make      | Create runtimes once in ToolRegistry and reuse
Provider rejects request       | Token limit too high                 | Respect provider-specific max tokens in normalizeProviderOptions

Connections

Builds on:

Next steps:

  • Experiment with caching (FileKeyValueStore) or metrics collection to monitor runtime behaviour.

Related code:

  • src/services/VercelAI.ts
  • src/services/ToolRegistry.ts
  • src/chat/MessageService.ts
  • src/services/ConfigService.ts