SQLAlchemy Relationship Lazy Loading Strategies in Multi-Tenant FastAPI: N+1 Queries and the Cost of Joinedload
I burned a week debugging production performance issues at CitizenApp before I realized: the problem wasn’t my query strategy—it was that I was using the same strategy everywhere. I had blindly adopted joinedload across all relationships, thinking “eager is always better than lazy.” I was wrong.
Here’s the uncomfortable truth: there is no universal SQLAlchemy loading strategy. The “best” approach depends on your concurrency model, tenant count, and whether you’re serving a single user or 50 concurrent requests. Let me show you what I learned.
The N+1 Problem Everyone Knows (And Still Gets Wrong)
Let’s say you have a CitizenApp-style multi-tenant structure:
from sqlalchemy import Column, Integer, String, ForeignKey, select
from sqlalchemy.orm import relationship, Session
class Tenant(Base):
__tablename__ = "tenants"
id = Column(Integer, primary_key=True)
name = Column(String)
class Organization(Base):
__tablename__ = "organizations"
id = Column(Integer, primary_key=True)
tenant_id = Column(Integer, ForeignKey("tenants.id"))
name = Column(String)
tenant = relationship("Tenant")
class User(Base):
__tablename__ = "users"
id = Column(Integer, primary_key=True)
organization_id = Column(Integer, ForeignKey("organizations.id"))
name = Column(String)
organization = relationship("Organization")
This naive endpoint will trigger N+1 queries:
@app.get("/users")
async def list_users(session: Session):
users = session.query(User).all() # 1 query
return [
{
"id": u.id,
"name": u.name,
"org_name": u.organization.name # N queries (one per user)
}
for u in users
]
Everyone knows this. But here’s what I missed: the “fix” can be worse than the disease.
The Joinedload Trap I Fell Into
My first instinct was to use joinedload:
from sqlalchemy.orm import joinedload
@app.get("/users")
async def list_users(session: Session):
users = session.query(User).options(
joinedload(User.organization)
).all()
return [...]
This works perfectly for 5 users. For 50 concurrent requests with 100 users each?
The query becomes a 6-table LEFT JOIN with 50 concurrent threads all acquiring locks. Here’s what that query looks like:
SELECT users.id, users.organization_id, organizations.id, organizations.name, organizations.tenant_id
FROM users
LEFT JOIN organizations ON users.organization_id = organizations.id
LEFT JOIN tenants ON organizations.tenant_id = tenants.id
-- ... repeated for each relationship
Under concurrent load, PostgreSQL’s planner starts thrashing. Lock contention explodes. You get better “query counts” but worse wall-clock latency.
The Three Strategies and When They Actually Win
1. Selectinload: The Multi-Tenant Default
I prefer selectinload for most multi-tenant endpoints. Here’s why:
from sqlalchemy.orm import selectinload
@app.get("/users/{tenant_id}")
async def list_users(tenant_id: int, session: Session):
users = session.query(User).where(
User.organization.has(Organization.tenant_id == tenant_id)
).options(
selectinload(User.organization),
selectinload(User.organization).selectinload(Organization.tenant)
).all()
return [...]
What happens:
- Query 1:
SELECT * FROM users WHERE organization_id IN (...) - Query 2:
SELECT * FROM organizations WHERE id IN (...) - Query 3:
SELECT * FROM tenants WHERE id IN (...)
Why this wins for multi-tenant:
- Each query is simple and fast
- No lock contention from massive JOINs
- Scales linearly with result set size, not complexity
- Under concurrent load, each query hits separate index scans
- PostgreSQL planner stays predictable
Benchmark on CitizenApp (1000 users, 50 concurrent requests):
- Joinedload: 2.3s (lock wait time spike)
- Selectinload: 1.1s
- Lazy loading N+1: 8.4s
2. Joinedload: When It Actually Makes Sense
Joinedload wins in specific cases:
@app.get("/users/{user_id}")
async def get_user(user_id: int, session: Session):
# Single record fetch - joinedload is superior
user = session.query(User).where(User.id == user_id).options(
joinedload(User.organization).joinedload(Organization.tenant)
).first()
return user
Single-record scenarios where you need everything in one round-trip. You’re not fighting lock contention, and the JOIN is targeted.
Real-world case: User profile page. You fetch 1 user and always need their org + tenant data. One efficient query beats two round-trips.
3. Lazy Loading + Explicit Filtering
Here’s where I was completely wrong about “lazy is always bad”:
@app.get("/users/{tenant_id}")
async def list_users(tenant_id: int, session: Session):
# Don't eagerly load everything - let the app layer decide
users = session.query(User).where(
User.organization.has(Organization.tenant_id == tenant_id)
).all()
# Only serialize what we actually use
return [
{
"id": u.id,
"name": u.name,
# Skip org loading if response schema doesn't need it
}
for u in users
]
If your response schema doesn’t include the relationship, loading it is waste. I’ve seen endpoints that eagerly load 5 relationships and return 2 in JSON. That’s pure overhead.
The Gotcha: Async Sessions Break Everything
This burned me hard. With FastAPI async endpoints:
@app.get("/users")
async def list_users(session: AsyncSession): # AsyncSession!
users = await session.execute(
select(User).options(selectinload(User.organization))
)
return users.scalars().all() # This fails silently
Async SQLAlchemy has footguns:
- Selectinload may not execute the second query if you don’t access the relationship in the same session context
- Joinedload works better with async because it’s one query, but you lose the concurrency benefit
- You must expunge objects or you’ll get DetachedInstanceError
My solution at CitizenApp: I explicitly fetch related data in separate queries and assemble in Python:
@app.get("/users/{tenant_id}")
async def list_users(tenant_id: int, session: AsyncSession):
# Query users
users_result = await session.execute(
select(User).where(
User.organization.has(Organization.tenant_id == tenant_id)
)
)
users = users_result.scalars().all()
# Query organizations in bulk
org_ids = [u.organization_id for u in users]
orgs_result = await session.execute(
select(Organization).where(Organization.id.in_(org_ids))
)
orgs_map = {o.id: o for o in orgs_result.scalars()}
return [
{
"id": u.id,
"organization": orgs_map[u.organization_id],
}
for u in users
]
This is verbose but predictable. No surprises under load.
What I Actually Do Now
For CitizenApp (9 concurrent AI features, multi-tenant):
- List endpoints:
selectinloadfor 1-2 levels, then lazy-load in serializer if needed - Detail endpoints:
joinedloadto single record - Reporting queries: Raw SQL with explicit SELECT columns (no ORM overhead)
- Async endpoints: Explicit multi-query pattern, assemble in Python
The counterintuitive win: fewer total queries doesn’t always mean faster endpoints. Lock contention, network round-trips, and serialization all matter.
Stop defaulting to joinedload everywhere. Measure. Profile. Use selectinload for multi-tenant list queries.
Comments
All comments are moderated before appearing.
Leave a comment