Integrating LLMs into a React Application
Streaming, server architecture and error handling — what you actually need to get right before shipping an LLM feature in your React product.
Most LLM integrations I've seen in production share the same mistake: they treat the API like a regular REST call. Send request, wait, display response. The result is a frozen UI for 10 to 15 seconds. This guide covers the three things that actually matter: streaming, server-side architecture, and error handling.
Why Streaming Is Non-Negotiable
LLMs generate text token by token. Without streaming, the user waits for the entire generation to finish before seeing anything — often 8 to 15 seconds for a 200-token response. With streaming, the first tokens appear in under a second. The experience is completely different.
OpenAI, Anthropic, and Mistral all support Server-Sent Events. Streaming isn't an optimization you add later — it's the baseline architecture. Building without it means tearing things apart when you inevitably add it.
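To make the Server-Sent Events format concrete, here is a minimal sketch of how a client could split a raw SSE buffer into `data:` payloads. The `parseSseBuffer` name is hypothetical and not part of any SDK, and the parser is deliberately simplified: it only handles `\n` and `\r\n` line endings and ignores fields other than `data:`.

```typescript
// Hypothetical helper (not from any SDK): extracts `data:` payloads from an SSE buffer.
function parseSseBuffer(buffer: string): { events: string[]; rest: string } {
  // Events are separated by a blank line; the last piece may still be incomplete.
  const parts = buffer.split(/\r?\n\r?\n/);
  const rest = parts.pop() ?? '';
  const events = parts
    .map(block =>
      block
        .split(/\r?\n/)
        .filter(line => line.startsWith('data:'))
        // Drop the field name and the single optional space that follows it.
        .map(line => line.slice(5).replace(/^ /, ''))
        .join('\n'),
    )
    .filter(data => data.length > 0);
  return { events, rest };
}
```

The `rest` value is whatever arrived after the last complete event. A caller would prepend it to the next network chunk before parsing again.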
Architecture: Never Call an LLM from the Client
Your API key must never end up in the JavaScript bundle. Putting it in client-side code is the same as making it public. The standard solution is a server-side proxy: here, a Next.js API route that forwards the request, with the key stored in environment variables.
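// app/api/chat/route.ts
import { OpenAI } from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export const runtime = 'edge';

export async function POST(req: Request) {
  const { messages } = await req.json();
  const completion = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages,
    stream: true, // returns an async iterable of completion chunks
  });
  // Forward only the text deltas so the client can append chunks as plain text.
  const encoder = new TextEncoder();
  const body = new ReadableStream<Uint8Array>({
    async start(controller) {
      for await (const chunk of completion) {
        controller.enqueue(encoder.encode(chunk.choices[0]?.delta?.content ?? ''));
      }
      controller.close();
    },
  });
  return new Response(body, { headers: { 'Content-Type': 'text/plain; charset=utf-8' } });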
}
The `runtime = 'edge'` flag deploys this route on Vercel or Netlify's Edge network. Time to first token drops by 30 to 40% compared to a standard serverless function. On an LLM integration, every millisecond of initial latency is felt.
Consuming the Stream in React
React has no native primitive for text streams. The most direct approach: read the ReadableStream in an async function and accumulate chunks into local state.
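async function send(content: string) {
  setStreaming(true);
  setOutput('');
  try {
    const res = await fetch('/api/chat', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ messages: [{ role: 'user', content }] }),
    });
    // Fail fast on HTTP errors (such as 429) instead of streaming an error body into the UI.
    if (!res.ok || !res.body) throw new Error(`Chat request failed: ${res.status}`);
    const reader = res.body.getReader();
    const decoder = new TextDecoder();
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      // Decode outside the updater: the decoder is stateful and React may run updaters twice in Strict Mode.
      const text = decoder.decode(value, { stream: true });
      setOutput(prev => prev + text);
    }
  } finally {
    // Reset the flag even if the request or the stream fails.
    setStreaming(false);
  }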
}
The `{ stream: true }` option on `decode` matters: without it, multi-byte UTF-8 sequences that get split across chunk boundaries produce corrupted characters. It's the kind of bug that only surfaces with accented text or emoji.
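The effect is easy to reproduce in isolation. The sketch below uses a hypothetical `decodeChunks` helper (not part of the article's component) to decode the same byte chunks with and without the streaming option.

```typescript
// Hypothetical helper: decodes a list of byte chunks, with or without streaming mode.
function decodeChunks(chunks: Uint8Array[], stream: boolean): string {
  const decoder = new TextDecoder();
  let text = '';
  for (const chunk of chunks) {
    // With stream: true the decoder buffers an incomplete sequence until the next call.
    text += decoder.decode(chunk, { stream });
  }
  // Flush anything still buffered at the end of the input.
  return text + decoder.decode();
}

// "é" is two bytes in UTF-8 (0xC3 0xA9); split it across two chunks.
const eAcute = new TextEncoder().encode('é');
const splitChunks = [eAcute.slice(0, 1), eAcute.slice(1)];

const withStreaming = decodeChunks(splitChunks, true); // "é"
const withoutStreaming = decodeChunks(splitChunks, false); // two U+FFFD replacement characters
```

Without streaming mode, each half of the sequence is invalid on its own, so each is replaced with U+FFFD.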
Error Handling Before You Ship
Three cases to handle before calling this production-ready:
- Timeout: an AbortController with a 30-second limit. Without it, a stalled request can hang indefinitely on the client side.
- Rate limiting (HTTP 429): catch the status, read the Retry-After header, and surface a clean retry option with exponential backoff.
- Interrupted stream: verify the response is complete at the end of reading. Show a clear error state rather than silently displaying truncated output.
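The first two cases can be sketched as follows. This is one possible shape under stated assumptions, not a definitive implementation: `retryDelayMs` and `fetchChat` are hypothetical names, the `/api/chat` path matches the route above, and the retry count and delays are illustrative defaults. The timeout here only bounds the wait for response headers. A stream that stalls midway needs its own idle timer around `reader.read()`.

```typescript
// Hypothetical helper: converts a Retry-After header into a wait time in milliseconds.
// Only the delta-seconds form is parsed; an HTTP-date or missing header falls back to
// exponential backoff (baseMs * 2^attempt), capped at maxMs.
function retryDelayMs(retryAfter: string | null, attempt: number, baseMs = 1000, maxMs = 30_000): number {
  const seconds = retryAfter ? Number(retryAfter) : NaN;
  if (Number.isFinite(seconds) && seconds >= 0) return Math.min(seconds * 1000, maxMs);
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Hypothetical wrapper: POSTs to the chat route with a timeout and retries on HTTP 429.
async function fetchChat(body: unknown, maxRetries = 3, timeoutMs = 30_000): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const controller = new AbortController();
    // Abort the request if no response headers arrive within timeoutMs.
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    const res = await fetch('/api/chat', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(body),
      signal: controller.signal,
    }).finally(() => clearTimeout(timer));
    // Return anything that is not a 429, or the last 429 once retries are exhausted.
    if (res.status !== 429 || attempt >= maxRetries) return res;
    const delay = retryDelayMs(res.headers.get('Retry-After'), attempt);
    await new Promise(resolve => setTimeout(resolve, delay));
  }
}
```

When the timeout fires, `fetch` rejects with an `AbortError`, which the calling component can catch and turn into a visible error state with a retry button.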
One more thing that gets skipped: log your latencies. Time to first token (TTFT) and total generation time are the two metrics worth tracking. They surface service degradation before users start filing complaints.
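A minimal sketch of capturing both metrics on the client, assuming a plain-text byte stream like the one read above. The `readWithTimings` name and the `StreamTimings` shape are hypothetical, and the time to first chunk is used as an approximation of TTFT. Where the numbers get sent (console, analytics, tracing) is left open.

```typescript
// Hypothetical result shape for the two latency metrics plus the accumulated text.
interface StreamTimings {
  ttftMs: number | null; // null if the stream ended without producing any bytes
  totalMs: number;
  text: string;
}

// Reads a byte stream to completion and records time to first chunk and total time.
// Pass a startedAt timestamp taken just before fetch() so network latency is included.
async function readWithTimings(
  stream: ReadableStream<Uint8Array>,
  startedAt: number = performance.now(),
): Promise<StreamTimings> {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  let ttftMs: number | null = null;
  let text = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // Record the first non-empty chunk only once.
    if (ttftMs === null && value.length > 0) ttftMs = performance.now() - startedAt;
    text += decoder.decode(value, { stream: true });
  }
  text += decoder.decode(); // flush any buffered bytes
  return { ttftMs, totalMs: performance.now() - startedAt, text };
}
```

A caller could record `const startedAt = performance.now()` before `fetch`, then pass `res.body` and `startedAt` to this function and log the returned timings.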
Streaming isn't an optimization to add later. It's the baseline architecture. Build without it and you'll be tearing things apart when you add it.
Written by
Casian Ciorba
Six years shipping production code — software, SaaS, mobile apps and AI integrations. I write about what I actually use on projects, not what makes conference talks.