Streaming AI Responses in React 19: Building Real-Time Claude Integration Without the UX Jank
Loading spinners are a lie we tell ourselves. When a user asks your AI agent something, they want to see the thinking happen. They want tokens appearing on screen in real time, not a blank void for 3 seconds followed by a wall of text.
I spent two weeks building CitizenApp’s chat feature before I realized most Claude streaming tutorials are incomplete. They show you how to get tokens flowing, then hand-wave away the hard parts: backpressure handling, component unmounts during active streams, network interruptions mid-response, and cancellation that doesn’t leave dangling resources.
This is the post I needed when I started.
The Streaming Problem Most Tutorials Skip
Here’s what happens in a naive implementation:
- User sends a prompt
- FastAPI opens an SSE connection and streams Claude tokens
- Browser receives a token every 50-200ms
- User navigates away
- Component unmounts
- Request keeps streaming in the background, consuming memory
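- The read loop keeps calling state setters for a component that no longer exists, burning CPU on work nobody sees
- React 18 and later no longer warn about updates to unmounted components, so the console gives you no hint that this is happening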
I prefer streaming over SSE (Server-Sent Events) instead of WebSockets for AI responses because:
- SSE is unidirectional (all we need)
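- Built-in reconnection handling (when consumed through the browser’s EventSource API; a fetch-based reader like the one in this post has to retry on its own)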
- Simpler to debug than WebSocket state machines
- Naturally pairs with HTTP error codes
- No connection pool exhaustion at scale
But SSE isn’t magic. You need to actively manage the connection lifecycle.
The FastAPI Backend: Backpressure Matters
Let me show you the right way:
import asyncio
from typing import AsyncGenerator

import anthropic
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

app = FastAPI()

# Async client: the sync client would block the event loop while waiting on tokens
client = anthropic.AsyncAnthropic()


class ChatRequest(BaseModel):
    prompt: str  # matches the JSON body sent by the React client


def sse_event(text: str) -> str:
    """Frame text as one SSE event, with one data: line per line of text."""
    return "".join(f"data: {line}\n" for line in text.split("\n")) + "\n"


@app.post("/api/chat/stream")
async def stream_response(body: ChatRequest) -> StreamingResponse:
    """Stream Claude response with proper resource cleanup."""

    async def generate() -> AsyncGenerator[str, None]:
        try:
            # Streaming helper yields text deltas as they arrive
            async with client.messages.stream(
                model="claude-3-5-sonnet-20241022",
                max_tokens=1024,
                messages=[{"role": "user", "content": body.prompt}],
            ) as stream:
                async for text in stream.text_stream:
                    yield sse_event(text)
                    # Explicit checkpoint for the event loop between tokens
                    await asyncio.sleep(0)
        except asyncio.CancelledError:
            # Client disconnected; async with closes the upstream request
            raise
        except anthropic.APIError as e:
            # Headers are already sent, so report the failure in-band
            yield sse_event(f"[ERROR] API Error: {e}")

    return StreamingResponse(
        generate(),
        media_type="text/event-stream",
        headers={
            "Cache-Control": "no-cache",
            "Connection": "keep-alive",
            "X-Accel-Buffering": "no",  # Disable nginx buffering
        },
    )
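Why await asyncio.sleep(0) matters: it is an explicit checkpoint that hands control back to the event loop between tokens, which gives the server a chance to deliver the cancellation raised when the client disconnects. A loop with no await point can keep running for several seconds after the user closes the tab. It is not backpressure handling by itself: backpressure comes from the async generator, which the server only advances after it has handed the previous chunk to the transport.
The X-Accel-Buffering: no header tells nginx not to buffer the response. Other intermediaries, Cloudflare included, have their own buffering rules, so test the whole path. You want tokens hitting the browser immediately.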
React 19: The AbortController Pattern
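In React 19, I avoid useEffect for streaming logic when possible: effects are for synchronizing with external systems, not for work a user kicks off. Instead, I treat streaming as an async operation triggered from an event handler, and keep a single effect whose only job is cleanup on unmount: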
'use client';

import { useState, useRef, useCallback, useEffect } from 'react';

interface StreamMessage {
  role: 'user' | 'assistant';
  content: string;
}

export function ChatStream() {
  const [messages, setMessages] = useState<StreamMessage[]>([]);
  const [isStreaming, setIsStreaming] = useState(false);
  const abortControllerRef = useRef<AbortController | null>(null);

  // Abort any in-flight stream when the component unmounts
  useEffect(() => () => abortControllerRef.current?.abort(), []);

  const handleStreamResponse = useCallback(async (userPrompt: string) => {
    // Abort any existing stream first
    abortControllerRef.current?.abort();
    const controller = new AbortController();
    abortControllerRef.current = controller;

    setMessages(prev => [
      ...prev,
      { role: 'user', content: userPrompt }
    ]);
    setIsStreaming(true);

    let assistantMessage = '';

    try {
      const response = await fetch('/api/chat/stream', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ prompt: userPrompt }),
        signal: controller.signal, // Pass abort signal
      });

      if (!response.ok) {
        throw new Error(`HTTP ${response.status}`);
      }

      // ReadableStream reader for token-by-token handling
      const reader = response.body?.getReader();
      if (!reader) throw new Error('No response body');
      const decoder = new TextDecoder();
      let buffer = '';

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        // A chunk can end mid-event, so buffer until the blank-line separator
        buffer += decoder.decode(value, { stream: true });
        const events = buffer.split('\n\n');
        buffer = events.pop() ?? '';

        for (const event of events) {
          const dataLines = event
            .split('\n')
            .filter(line => line.startsWith('data: '))
            .map(line => line.slice(6)); // Remove "data: "
          if (dataLines.length === 0) continue;

          // Several data: lines in one event are joined with newlines
          const token = dataLines.join('\n');
          if (token.startsWith('[ERROR]')) {
            throw new Error(token);
          }

          assistantMessage += token;
          const content = assistantMessage;

          // Replace the last assistant message with a fresh object (no state mutation)
          setMessages(prev => {
            const last = prev[prev.length - 1];
            return last?.role === 'assistant'
              ? [...prev.slice(0, -1), { ...last, content }]
              : [...prev, { role: 'assistant', content }];
          });
        }
      }
    } catch (error) {
      if (error instanceof Error && error.name === 'AbortError') {
        // User cancelled: expected, so show no error message
        return;
      }
      setMessages(prev => [
        ...prev,
        {
          role: 'assistant',
          content: `Error: ${error instanceof Error ? error.message : 'Unknown error'}`
        }
      ]);
    } finally {
      // Reset only if a newer stream has not already taken over the ref
      if (abortControllerRef.current === controller) {
        setIsStreaming(false);
        abortControllerRef.current = null;
      }
    }
  }, []);
  const handleCancel = useCallback(() => {
    abortControllerRef.current?.abort();
  }, []);

  return (
    <div className="flex flex-col gap-4">
      <div className="space-y-3 h-96 overflow-y-auto">
        {messages.map((msg, i) => (
          <div
            key={i}
            className={`p-3 rounded-lg whitespace-pre-wrap ${
              msg.role === 'user'
                ? 'bg-blue-100 text-right'
                : 'bg-gray-100'
            }`}
          >
            {msg.content}
          </div>
        ))}
      </div>
      <div className="flex gap-2">
        <input
          type="text"
          placeholder="Ask something..."
          onKeyDown={(e) => {
            // Ignore empty prompts and keystrokes during an active stream
            const value = e.currentTarget.value.trim();
            if (e.key === 'Enter' && !isStreaming && value) {
              handleStreamResponse(value);
              e.currentTarget.value = '';
            }
          }}
          disabled={isStreaming}
          className="flex-1 px-3 py-2 border rounded-lg"
        />
        {isStreaming && (
          <button
            onClick={handleCancel}
            className="px-4 py-2 bg-red-500 text-white rounded-lg"
          >
            Stop
          </button>
        )}
      </div>
    </div>
  );
}
Key patterns here:
- AbortController: Every stream gets its own controller. If a new prompt comes in while streaming, we abort the old one first. This prevents message corruption.
- Manual ReadableStream reading: Instead of relying on a library, we read chunks and parse the SSE format ourselves. Because the loop awaits each reader.read(), the browser only pulls more data as fast as we consume it.
- In-place message updates: Rather than appending a new message per token, we keep updating the last assistant message, so the list holds one entry per reply.
- Graceful AbortError handling: Whenever the stream is aborted (the user clicks “Stop”, for example), an AbortError is thrown. It’s expected, so we don’t show an error message.
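Updating state once per token is fine for short replies, but a long response can trigger hundreds of renders. If that shows up in profiling, one option is to coalesce tokens and flush them on a timer. The sketch below is an illustration under assumptions, not part of the component above: createTokenBatcher, its schedule parameter, and the 50 ms default are all hypothetical.

```typescript
// Hypothetical helper: collects tokens and hands them over in batches.
type Schedule = (flush: () => void) => void;

function createTokenBatcher(
  onFlush: (text: string) => void,
  // Assumed default: flush roughly every 50 ms. Injectable so tests can drive it.
  schedule: Schedule = (flush) => {
    setTimeout(flush, 50);
  },
) {
  let pending = '';
  let scheduled = false;

  const flush = (): void => {
    scheduled = false;
    if (pending === '') return; // nothing accumulated since the last flush
    const text = pending;
    pending = '';
    onFlush(text);
  };

  return {
    // Queue a token; at most one flush is scheduled at any time.
    add(token: string): void {
      pending += token;
      if (!scheduled) {
        scheduled = true;
        schedule(flush);
      }
    },
    // Call once when the stream ends so the tail is not lost.
    flush,
  };
}
```

In the read loop, batcher.add(token) would take the place of the direct setMessages call, onFlush would append the flushed text to the last assistant message, and batcher.flush() would run once after the loop finishes.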
Gotcha: The Memory Leak That Bit Me
I initially didn’t use AbortController properly. Every stream would complete fully even if the user navigated away. On CitizenApp with long-running analyses, this meant:
- Multiple concurrent streams running invisibly
- Memory climbing as response objects accumulated
- Network requests taking up connection slots
The fix was simple (use AbortController), but it took 48 hours of profiling to realize the problem wasn’t in React—it was the browser still pulling data from the network.
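The controller bookkeeping behind this fix (abort the previous stream, abort on teardown, ignore cleanup from a stale stream) does not depend on React, so it can be pulled into a small helper and tested on its own. A minimal sketch, assuming a runtime with a global AbortController; createStreamGuard and its method names are hypothetical.

```typescript
// Hypothetical helper: owns the AbortController for whichever stream is active.
function createStreamGuard() {
  let current: AbortController | null = null;

  return {
    // Abort whatever is in flight and hand out a fresh signal for the next request.
    begin(): AbortSignal {
      current?.abort();
      current = new AbortController();
      return current.signal;
    },
    // Clear the slot only if this signal still belongs to the active stream.
    end(signal: AbortSignal): void {
      if (current?.signal === signal) current = null;
    },
    // Abort the active stream, e.g. from a Stop button or an unmount cleanup.
    cancel(): void {
      current?.abort();
      current = null;
    },
    // True while a stream started by begin() has not ended or been cancelled.
    get active(): boolean {
      return current !== null;
    },
  };
}
```

In a component, begin() would supply the signal passed to fetch, end(signal) would run in the finally block, and an effect cleanup such as useEffect(() => () => guard.cancel(), []) would cover unmounts.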
One More Thing: Error Recovery
Real production systems have flaky networks. Here’s what I added after the first month:
// Add exponential backoff retry
const maxRetries = 3;
let retryCount = 0;
while (retryCount < maxRetries) {
try {
const response = await fetch('/api/chat/