Streaming AI Responses in React 19: Building Real-Time Claude Integration Without the UX Jank

24 May 2026

Loading spinners are a lie we tell ourselves. When a user asks your AI agent something, they want to see the thinking happen. They want tokens appearing on screen in real time, not a blank void for 3 seconds followed by a wall of text.

I spent two weeks building CitizenApp’s chat feature before I realized most Claude streaming tutorials are incomplete. They show you how to get tokens flowing, then hand-wave away the hard parts: backpressure handling, component unmounts during active streams, network interruptions mid-response, and cancellation that doesn’t leave dangling resources.

This is the post I needed when I started.

The Streaming Problem Most Tutorials Skip

Here’s what happens in a naive implementation:

  1. User sends a prompt
  2. FastAPI opens an SSE connection and streams Claude tokens
  3. Browser receives a token every 50-200ms
  4. User navigates away
  5. Component unmounts
  6. Request keeps streaming in the background, consuming memory
  7. CPU spikes because React’s trying to update an unmounted component
  8. You get 20 errors in the console

I prefer streaming over SSE (Server-Sent Events) instead of WebSockets for AI responses because:

But SSE isn’t magic. You need to actively manage the connection lifecycle.

The FastAPI Backend: Backpressure Matters

Let me show you the right way:

import asyncio
from typing import AsyncGenerator

import anthropic
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

app = FastAPI()


class ChatRequest(BaseModel):
    """JSON body sent by the frontend: {"prompt": "..."}."""
    prompt: str


@app.post("/api/chat/stream")
async def stream_response(body: ChatRequest) -> StreamingResponse:
    """Stream Claude response with proper resource cleanup."""

    async def generate() -> AsyncGenerator[str, None]:
        # Async client: waiting on Claude must not block the event loop
        client = anthropic.AsyncAnthropic()

        try:
            # stream() yields text deltas as they arrive
            async with client.messages.stream(
                model="claude-3-5-sonnet-20241022",
                max_tokens=1024,
                messages=[
                    {"role": "user", "content": body.prompt}
                ]
            ) as stream:
                async for text in stream.text_stream:
                    # SSE framing: prefix every line with "data: " so a
                    # newline inside a token cannot end the event early
                    payload = "\n".join(
                        f"data: {line}" for line in text.split("\n")
                    )
                    yield f"{payload}\n\n"

                    # Explicit yield point so cancellation is noticed promptly
                    await asyncio.sleep(0)

        except asyncio.CancelledError:
            # Client disconnected; the async context manager closes the stream
            raise
        except anthropic.APIError as e:
            # Tell the client what happened before the connection drops
            yield f"data: [ERROR] API Error: {e}\n\n"
            raise

    return StreamingResponse(
        generate(),
        media_type="text/event-stream",
        headers={
            "Cache-Control": "no-cache",
            "Connection": "keep-alive",
            "X-Accel-Buffering": "no",  # Disable nginx buffering
        }
    )

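Why await asyncio.sleep(0) is there: it yields control back to the event loop after every token, so other tasks get a turn, including the one that notices the client has disconnected and cancels this generator. Without a yield point, a closed tab can go unnoticed for several seconds. Strictly speaking this is cooperative scheduling rather than backpressure; flow control comes from the server only pulling the next chunk from the generator once the previous one has been sent.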

The X-Accel-Buffering: no header tells nginx not to buffer the response, so tokens reach the browser immediately. Other proxies and CDNs (Cloudflare, for example) have their own buffering behavior and may need separate configuration.
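
To make the wire format concrete, here is a minimal sketch of how one token becomes one SSE event. It is written in TypeScript so it can sit next to the client code; formatSseEvent is a hypothetical helper, not part of any library. The detail worth noticing is that text containing a newline needs a "data: " prefix on every line, because a blank line ends the event.

```typescript
// Frame one text token as a Server-Sent Event.
// Every line of the token gets its own "data: " prefix; a blank line ends the event.
function formatSseEvent(token: string): string {
  const dataLines = token.split("\n").map((line) => `data: ${line}`);
  return dataLines.join("\n") + "\n\n";
}

console.log(JSON.stringify(formatSseEvent("Hello")));
// "data: Hello\n\n"
console.log(JSON.stringify(formatSseEvent("line one\nline two")));
// "data: line one\ndata: line two\n\n"
```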

React 19: The AbortController Pattern

In React 19, I avoid useEffect for streaming logic when possible. Effects are for synchronizing a component with an external system, not for running work the user explicitly triggers. Instead, I treat streaming as an async operation kicked off from an event handler:

'use client';

import { useState, useRef, useCallback, useEffect } from 'react';

interface StreamMessage {
  role: 'user' | 'assistant';
  content: string;
}

export function ChatStream() {
  const [messages, setMessages] = useState<StreamMessage[]>([]);
  const [isStreaming, setIsStreaming] = useState(false);
  const abortControllerRef = useRef<AbortController | null>(null);

  // Cleanup only: abort whatever stream is in flight when the component unmounts
  useEffect(() => {
    return () => abortControllerRef.current?.abort();
  }, []);

  const handleStreamResponse = useCallback(async (userPrompt: string) => {
    // Abort any existing stream first
    abortControllerRef.current?.abort();
    
    const controller = new AbortController();
    abortControllerRef.current = controller;

    setMessages(prev => [
      ...prev,
      { role: 'user', content: userPrompt }
    ]);
    
    setIsStreaming(true);
    let assistantMessage = '';

    try {
      const response = await fetch('/api/chat/stream', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ prompt: userPrompt }),
        signal: controller.signal, // Pass abort signal
      });

      if (!response.ok) {
        throw new Error(`HTTP ${response.status}`);
      }

      // ReadableStream reader for token-by-token handling
      const reader = response.body?.getReader();
      if (!reader) throw new Error('No response body');

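      const decoder = new TextDecoder();
      let buffer = ''; // Holds a partial event until the rest of it arrives

      while (true) {
        const { done, value } = await reader.read();
        
        if (done) break;

        buffer += decoder.decode(value, { stream: true });

        // Events end with a blank line; a chunk can stop mid-event,
        // so keep the unfinished tail for the next read
        const events = buffer.split('\n\n');
        buffer = events.pop() ?? '';

        for (const event of events) {
          // An event may carry several "data: " lines; rejoin them with newlines
          const dataLines = event
            .split('\n')
            .filter(line => line.startsWith('data: '))
            .map(line => line.slice(6)); // Remove "data: "
          if (dataLines.length === 0) continue;

          const token = dataLines.join('\n');

          if (token.startsWith('[ERROR]')) {
            throw new Error(token);
          }

          assistantMessage += token;
          const content = assistantMessage; // Snapshot for the updater below

          // Replace the last assistant message without mutating state
          setMessages(prev => {
            const last = prev[prev.length - 1];
            return last?.role === 'assistant'
              ? [...prev.slice(0, -1), { ...last, content }]
              : [...prev, { role: 'assistant' as const, content }];
          });
        }
      }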
    } catch (error) {
      if (error instanceof Error && error.name === 'AbortError') {
        // User cancelled—clean, no error message
        return;
      }
      
      setMessages(prev => [
        ...prev,
        {
          role: 'assistant',
          content: `Error: ${error instanceof Error ? error.message : 'Unknown error'}`
        }
      ]);
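    } finally {
      // Only reset if a newer stream has not already taken over
      if (abortControllerRef.current === controller) {
        setIsStreaming(false);
        abortControllerRef.current = null;
      }
    }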
  }, []);

  const handleCancel = useCallback(() => {
    abortControllerRef.current?.abort();
  }, []);

  return (
    <div className="flex flex-col gap-4">
      <div className="space-y-3 h-96 overflow-y-auto">
        {messages.map((msg, i) => (
          <div
            key={i}
            className={`p-3 rounded-lg ${
              msg.role === 'user'
                ? 'bg-blue-100 text-right'
                : 'bg-gray-100'
            }`}
          >
            {msg.content}
          </div>
        ))}
      </div>

      <div className="flex gap-2">
        <input
          type="text"
          placeholder="Ask something..."
          onKeyDown={(e) => {
            if (e.key === 'Enter' && !isStreaming) {
              handleStreamResponse(e.currentTarget.value);
              e.currentTarget.value = '';
            }
          }}
          disabled={isStreaming}
          className="flex-1 px-3 py-2 border rounded-lg"
        />
        {isStreaming && (
          <button
            onClick={handleCancel}
            className="px-4 py-2 bg-red-500 text-white rounded-lg"
          >
            Stop
          </button>
        )}
      </div>
    </div>
  );
}

Key patterns here:

  1. AbortController: Every stream gets its own controller. If a new prompt comes in while streaming, we abort the old one first. This prevents message corruption.

  2. Manual ReadableStream reading: Instead of relying on a library, we read chunks and parse the SSE format ourselves. reader.read() is pull-based, so the next chunk is only consumed once the loop is ready for it, and the state updates made while processing one chunk are batched into a single render.

  3. One assistant message per response: Each token updates the content of the current assistant message rather than appending a new message, so the list stays at one entry per turn.

  4. Graceful AbortError handling: Whenever the controller is aborted (the user clicks “Stop”, or a new prompt replaces the stream), the pending fetch or read rejects with an AbortError. That is expected, so we return quietly instead of showing an error message.
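
The manual parsing in point 2 has one trap: network chunks do not line up with SSE events, so a single read can end halfway through a "data: " line. Below is a minimal sketch of an incremental parser that buffers the unfinished tail between reads. createSseParser is a hypothetical name, and it assumes the server sends only "data: " fields, separated by blank lines.

```typescript
// Incremental SSE parser: feed it decoded chunks, get back complete event payloads.
function createSseParser(): (chunk: string) => string[] {
  let buffer = ""; // Unfinished event carried over between chunks

  return function feed(chunk: string): string[] {
    buffer += chunk;

    // A blank line terminates an event; the last piece is incomplete (or empty)
    const events = buffer.split("\n\n");
    buffer = events.pop() ?? "";

    const payloads: string[] = [];
    for (const event of events) {
      const dataLines = event
        .split("\n")
        .filter((line) => line.startsWith("data: "))
        .map((line) => line.slice("data: ".length));
      // Multiple data lines in one event are rejoined with newlines
      if (dataLines.length > 0) payloads.push(dataLines.join("\n"));
    }
    return payloads;
  };
}

const feed = createSseParser();
console.log(feed("data: Hel"));        // []
console.log(feed("lo\n\ndata: wor"));  // [ 'Hello' ]
console.log(feed("ld\n\n"));           // [ 'world' ]
```

Keeping the parser separate from the component also makes it easy to test without a browser or a running backend.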

Gotcha: The Memory Leak That Bit Me

I initially didn’t use AbortController properly. Every stream would complete fully even if the user navigated away. On CitizenApp with long-running analyses, this meant:

The fix was simple (use AbortController), but it took 48 hours of profiling to realize the problem wasn’t in React—it was the browser still pulling data from the network.
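
To illustrate why the abort has to reach the producer, here is a small self-contained sketch. fakeTokenStream is a hypothetical stand-in for the network stream and consume stands in for the read loop; neither comes from CitizenApp. Once the signal is aborted, the producer stops generating instead of running to completion in the background.

```typescript
// Hypothetical token source standing in for the network stream.
async function* fakeTokenStream(signal: AbortSignal): AsyncGenerator<string> {
  const tokens = ["Hello", " ", "world", "!"];
  for (const token of tokens) {
    if (signal.aborted) return; // Stop producing as soon as we are cancelled
    await new Promise((resolve) => setTimeout(resolve, 10));
    yield token;
  }
}

// Read tokens until the stream ends or the signal is aborted.
async function consume(
  signal: AbortSignal,
  onToken: (token: string) => void,
): Promise<string> {
  let text = "";
  for await (const token of fakeTokenStream(signal)) {
    if (signal.aborted) break;
    text += token;
    onToken(token);
  }
  return text;
}

const controller = new AbortController();
consume(controller.signal, (token) => {
  if (token === "Hello") controller.abort(); // Simulate the user navigating away
}).then((text) => console.log(JSON.stringify(text))); // "Hello"
```

With fetch, passing the same signal does the equivalent job: aborting it rejects the pending read and lets the browser drop the connection.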

One More Thing: Error Recovery

Real production systems have flaky networks. Here’s what I added after the first month:

// Add exponential backoff retry
const maxRetries = 3;
let retryCount = 0;

while (retryCount < maxRetries) {
  try {
    const response = await fetch('/api/chat/
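
A self-contained sketch of the exponential backoff idea follows. It reuses maxRetries = 3 from the snippet; the 500 ms base delay and the names withRetry and backoffDelayMs are assumptions for illustration. It also assumes a retry is only attempted before any tokens have been shown, since retrying mid-response would restart the answer.

```typescript
// Delay before the next attempt (0-based): 500 ms, 1000 ms, 2000 ms, ...
function backoffDelayMs(attempt: number, baseMs = 500): number {
  return baseMs * 2 ** attempt;
}

// Run `operation`, retrying failures with exponentially growing pauses.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxRetries = 3,
  baseMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await operation();
    } catch (error) {
      // A deliberate cancel is not a failure, so never retry it
      if (error instanceof Error && error.name === "AbortError") throw error;
      lastError = error;
      // Wait before the next attempt, but not after the final one
      if (attempt < maxRetries - 1) {
        const delay = backoffDelayMs(attempt, baseMs);
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}
```

A caller would wrap only the initial request, for example withRetry(() => fetch(url, { signal })), and leave the token-reading loop outside the retry.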
