
Real-time interaction: streaming and concurrency

Maps to: Real-time interaction: streaming, concurrency control (double-texting).

Scope

Streaming partial output to clients during long runs, keeping long-lived thread streams open, reconnecting without gaps, and setting policies for when users send overlapping messages.

Design questions

  • Stream granularity: tokens, graph deltas, custom events, or combined modes.
  • Client reconnect via last-event identifiers versus full replay policies.
  • Double-texting strategy: enqueue, reject, interrupt, or rollback, and UI copy for each.
  • Cleanup of partial tool calls when interrupting mid-flight.
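The reconnect question can be made concrete with a minimal sketch of resume-by-last-event-id. It assumes each thread keeps a bounded, in-memory buffer of events, each tagged with a monotonically increasing integer id (the value a Server-Sent Events server would put in its `id:` field, which browsers echo back in the `Last-Event-ID` header on reconnect). `StreamEvent`, `ThreadEventLog`, and `ReplayGapError` are hypothetical names, and the `mode` tag stands in for whatever stream granularity is chosen. When the client's last-seen event has already been evicted, the sketch signals that a full replay is required instead of silently skipping events.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class StreamEvent:
    id: int    # monotonically increasing per thread
    mode: str  # hypothetical granularity tag, e.g. "token", "delta", "custom"
    data: Any


class ReplayGapError(Exception):
    """Raised when events the client never saw were evicted from the buffer."""


class ThreadEventLog:
    """Hypothetical per-thread event log supporting resume by last event id."""

    def __init__(self, capacity: int = 1000) -> None:
        self._events: deque = deque(maxlen=capacity)
        self._next_id = 1

    def publish(self, mode: str, data: Any) -> StreamEvent:
        event = StreamEvent(self._next_id, mode, data)
        self._next_id += 1
        self._events.append(event)  # deque drops the oldest event once full
        return event

    def replay_after(self, last_event_id: int) -> list:
        """Return every buffered event with id > last_event_id, in order.

        If the event right after last_event_id is no longer buffered, a
        gap-free resume is impossible, so ask for a full replay instead.
        """
        if self._events and last_event_id < self._events[0].id - 1:
            raise ReplayGapError(
                f"events after {last_event_id} were evicted; full replay needed"
            )
        return [e for e in self._events if e.id > last_event_id]
```

With `capacity=3` and four published tokens, only ids 2 to 4 remain buffered: a client that last saw event 2 receives events 3 and 4 on reconnect, while a client that saw nothing (id 0) gets `ReplayGapError`, because event 1 is gone. The buffer size is therefore the knob that trades memory against how often clients fall back to full replay.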

Tradeoffs

  • Interrupt-on-new-message feels responsive but risks inconsistent tool side effects.
  • Enqueue is safe but can frustrate users who send a quick correction (for example, a fixed typo), since the original run still finishes first.
  • Thread streaming complexity rises when background runs and human-in-the-loop (HITL) steps share one thread.
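The four double-texting policies and their tradeoffs can be compared on a toy, single-process thread model. Everything here (`Thread`, `Run`, `Policy`, `RunRejected`, and the string transcript format) is hypothetical; a real runtime would also have to cancel in-flight work and persist state. Under interrupt, the sketch keeps what the run already wrote and closes each unfinished tool call with a cancellation marker. This assumes the model API expects every tool call to be paired with a result. Under rollback, it restores the snapshot taken when the run started.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Policy(Enum):
    ENQUEUE = "enqueue"
    REJECT = "reject"
    INTERRUPT = "interrupt"
    ROLLBACK = "rollback"


class RunRejected(Exception):
    """Raised under the reject policy when a run is already active."""


@dataclass
class Run:
    message: str
    snapshot: list  # thread state captured when the run started
    pending_tool_calls: list = field(default_factory=list)


@dataclass
class Thread:
    """Hypothetical toy thread whose state is a list of transcript lines."""

    policy: Policy
    state: list = field(default_factory=list)
    active: Optional[Run] = None
    queue: list = field(default_factory=list)

    def _start_run(self, message: str) -> None:
        self.active = Run(message, snapshot=list(self.state))
        self.state.append(f"user: {message}")

    def request_tool(self, name: str) -> None:
        # The model asked for a tool call that has not returned yet.
        if self.active is None:
            raise RuntimeError("no active run")
        self.state.append(f"assistant: call {name}")
        self.active.pending_tool_calls.append(name)

    def finish_run(self, reply: str) -> None:
        self.state.append(f"assistant: {reply}")
        self.active = None
        if self.queue:  # enqueue policy: start the next waiting message
            self._start_run(self.queue.pop(0))

    def on_message(self, message: str) -> None:
        if self.active is None:
            self._start_run(message)
            return
        if self.policy is Policy.REJECT:
            raise RunRejected("a run is already in progress on this thread")
        if self.policy is Policy.ENQUEUE:
            self.queue.append(message)
            return
        if self.policy is Policy.INTERRUPT:
            # Keep what the run already wrote, but close each unfinished
            # tool call so the transcript has no dangling calls.
            for name in self.active.pending_tool_calls:
                self.state.append(f"tool {name}: cancelled")
        else:  # Policy.ROLLBACK: discard everything the run wrote
            self.state = list(self.active.snapshot)
        self._start_run(message)
```

Sending "book a flight to Paris" and then "sorry, Rome" through each policy makes the semantics directly assertable. Interrupt leaves the Paris request and a cancelled `search_flights` call in the transcript, and rollback leaves only the Rome message. Enqueue holds Rome until `finish_run`, and reject raises `RunRejected`. Neither interrupt nor rollback can undo external side effects a tool has already caused; rollback only reverts thread state.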

Evaluation hooks

  • Dropped connection mid-stream resumes without duplicate or missing events.
  • Concurrent messages under each policy; assert state matches documented semantics.
  • Latency metrics from first token to final tool result.
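The first and last evaluation hooks reduce to small checks over what the client recorded. This is a minimal sketch that assumes event ids are consecutive integers starting at 1 and that the test harness logs every id the client receives across reconnects, plus `(timestamp, kind)` pairs. `check_stream`, `StreamCheck`, and `first_token_to_final_tool_ms` are hypothetical helpers, and the `"token"` and `"tool_result"` kinds are assumed labels.

```python
from dataclasses import dataclass


@dataclass
class StreamCheck:
    duplicates: list
    missing: list

    @property
    def ok(self) -> bool:
        # A correct resume delivers every event exactly once.
        return not self.duplicates and not self.missing


def check_stream(received_ids: list, final_id: int) -> StreamCheck:
    """Compare ids received across reconnects against the expected 1..final_id."""
    seen = set()
    duplicates = []
    for event_id in received_ids:
        if event_id in seen:
            duplicates.append(event_id)
        seen.add(event_id)
    missing = [i for i in range(1, final_id + 1) if i not in seen]
    return StreamCheck(duplicates, missing)


def first_token_to_final_tool_ms(events: list) -> float:
    """Milliseconds from the first 'token' event to the last 'tool_result' event.

    events are (timestamp_seconds, kind) pairs, e.g. stamped with time.monotonic().
    """
    token_times = [t for t, kind in events if kind == "token"]
    tool_times = [t for t, kind in events if kind == "tool_result"]
    if not token_times or not tool_times:
        raise ValueError("need at least one token and one tool_result event")
    return (max(tool_times) - min(token_times)) * 1000.0
```

A harness that drops the connection mid-stream and reconnects with the last seen id should end with `check_stream(...).ok` being true. For example, received ids `[1, 2, 3, 3, 4, 6]` against a final id of 6 report 3 as duplicated and 5 as missing.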

Reference notes

See LangChain runtime article (concurrent runs figure).