Building Audit Logs That Don’t Destroy Your Database: A Production Pattern
I added audit logging to CitizenApp after the first paying customer asked “who changed this citizen’s address?” It seemed simple: add a table, insert rows on every mutation, done. Three months later, the audit table had 2.3 million rows and queries were timing out.
The problem wasn’t volume—it was that I’d treated audit logs like regular application data. They’re not. Audit data has different access patterns, retention requirements, and scale characteristics than your core business tables. Here’s the production pattern I landed on after actually breaking things.
Why Standard Audit Approaches Fail at Scale
Most tutorials show this:
# SQLAlchemy model - the naive approach
class AuditLog(Base):
__tablename__ = "audit_logs"
id = Column(UUID, primary_key=True, default=uuid.uuid4)
tenant_id = Column(UUID, ForeignKey("tenants.id"), nullable=False)
user_id = Column(UUID, ForeignKey("users.id"))
action = Column(String, nullable=False)
resource_type = Column(String, nullable=False)
resource_id = Column(UUID, nullable=False)
changes = Column(JSON)
created_at = Column(DateTime, default=datetime.utcnow)
This works until you hit 500k+ rows. Then:
- Queries slow down because indexes grow massive
- Storage explodes because you’re indexing everything
- Pagination breaks because
OFFSET 100000scans all previous rows - Retention cleanup locks the table for minutes
I hit all four in production. The audit table grew to 40% of our total database size, and a customer trying to view their audit history would timeout after 30 seconds.
The Pattern: Date-Partitioned Tables + Selective Indexing
Here’s the CitizenApp audit schema that actually scales:
-- Parent table (partitioned by month)
CREATE TABLE audit_logs (
id UUID DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL,
user_id UUID,
action VARCHAR(50) NOT NULL,
resource_type VARCHAR(50) NOT NULL,
resource_id UUID NOT NULL,
changes JSONB,
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);
-- Create monthly partitions (automate this with a cron job)
CREATE TABLE audit_logs_2026_05 PARTITION OF audit_logs
FOR VALUES FROM ('2026-05-01') TO ('2026-06-01');
CREATE TABLE audit_logs_2026_06 PARTITION OF audit_logs
FOR VALUES FROM ('2026-06-01') TO ('2026-07-01');
-- Index ONLY what you query by
CREATE INDEX idx_audit_tenant_time_2026_05
ON audit_logs_2026_05(tenant_id, created_at DESC);
CREATE INDEX idx_audit_resource_2026_05
ON audit_logs_2026_05(tenant_id, resource_type, resource_id);
Why this works:
- Partitions keep indexes small — PostgreSQL only scans the relevant month
- Retention is trivial —
DROP TABLE audit_logs_2025_01takes milliseconds - Queries are predictable — recent data (what users actually view) is always fast
- Storage is cheaper — old partitions can be moved to cold storage or compressed
The key insight: users almost never query audit logs older than 90 days. Why optimize for something that doesn’t happen?
Making Audit Data Searchable Without Killing Performance
The second mistake I made was trying to make audit logs “fully searchable.” I added a GIN index on the entire changes JSONB column:
-- Don't do this
CREATE INDEX idx_audit_changes ON audit_logs USING gin(changes);
Index size: 8 GB. Query time for “find all changes to email addresses”: still 12 seconds.
GIN indexes on unstructured JSONB are expensive and rarely selective enough to help. Here’s what actually works:
Extract Key Fields for Common Queries
# FastAPI endpoint - store searchable fields explicitly
class AuditLog(Base):
__tablename__ = "audit_logs"
id = Column(UUID, primary_key=True, default=uuid.uuid4)
tenant_id = Column(UUID, nullable=False)
action = Column(String(50), nullable=False) # "created", "updated", "deleted"
resource_type = Column(String(50), nullable=False) # "citizen", "document"
resource_id = Column(UUID, nullable=False)
# Extract common search fields
changed_fields = Column(ARRAY(String)) # ["email", "phone"]
full_changes = Column(JSONB) # Complete before/after
created_at = Column(DateTime, nullable=False, default=datetime.utcnow)
# When logging changes
def log_citizen_update(tenant_id, user_id, citizen_id, old_data, new_data):
changed_fields = [k for k in new_data if old_data.get(k) != new_data[k]]
audit = AuditLog(
tenant_id=tenant_id,
user_id=user_id,
action="updated",
resource_type="citizen",
resource_id=citizen_id,
changed_fields=changed_fields, # Indexed array
full_changes={
"before": old_data,
"after": new_data
}
)
session.add(audit)
Now I can index the array:
CREATE INDEX idx_audit_changed_fields_2026_05
ON audit_logs_2026_05 USING gin(changed_fields);
This index is 90% smaller and queries like “show me all email changes” run in under 200ms.
The AI Feature That Made Audit Logs Useful
Here’s the thing about audit logs: nobody reads them. We built a beautiful audit viewer UI with filters and pagination, and usage was less than 5% of tenants.
Then I added an AI explainer powered by Claude Haiku:
// TypeScript API route - AI audit explanation
async function explainAuditHistory(
tenantId: string,
resourceType: string,
resourceId: string
): Promise<string> {
const logs = await db.auditLog.findMany({
where: {
tenantId,
resourceType,
resourceId,
},
orderBy: { createdAt: 'desc' },
take: 50, // Last 50 changes
include: { user: { select: { name: true } } }
});
const prompt = `Summarize this audit history in 2-3 sentences:
${logs.map(log =>
`${log.createdAt.toISOString()}: ${log.user.name} ${log.action} ${log.resource_type}, changed: ${log.changed_fields.join(', ')}`
).join('\n')}`;
const response = await anthropic.messages.create({
model: 'claude-haiku-3-5',
max_tokens: 150,
messages: [{ role: 'user', content: prompt }]
});
return response.content[0].text;
}
Output: “This citizen record has been updated 8 times in the last 30 days, primarily by Sarah Chen. Most changes were to contact information (phone, email) and address fields. The most recent update added emergency contact details.”
Audit log usage jumped to 40%. Turns out people want insights, not raw logs.
Gotchas I Learned the Hard Way
1. Don’t log SELECT queries
I tried logging all database reads for compliance. The audit table grew to 10 million rows in a week. Now I only log mutations (create/update/delete) and sensitive reads (viewing SSN, downloading reports).
2. Partition creation must be automated
I manually created partitions for the first 3 months. Then I forgot, and inserts started failing with “no partition for value 2026-08-01”. Now a GitHub Action creates next month’s partition on the 25th of each month.
3. JSON querying is still expensive
Even with extracted fields, avoid changes->>'field' = 'value' in WHERE clauses. Use the array of changed fields for filtering, then load full JSON only for display.
Production Results
After implementing this pattern:
- Audit queries under 200ms (p95), down from 8+ seconds
- Storage growth dropped 60% (partitions + selective indexing)
- Retention cleanup is instant (drop old partitions)
- Zero query timeouts in 4 months
The lesson: treat audit logs like time-series data, not relational data. Partition by time, index only what you search, and accept that old data will be archived or deleted. Your database will thank you.