Every week someone asks me: “Do you actually use AI, or is that just marketing copy?”
The honest answer: yes, heavily — but not the way most people imagine. AI doesn’t replace engineering judgment. It amplifies it. Here’s exactly what that looks like on a real project.
The Stack
Three tools, three distinct roles:
- Claude — architecture, code review, documentation, complex reasoning
- Cursor — in-editor generation, refactoring, inline explanation
- GitHub Copilot — fast autocomplete for repetitive patterns
None of them replaces the others. Each has a context where it’s genuinely faster.
Phase 1: Architecture Decisions
Before writing a single line of code, I describe the problem to Claude:
I'm building a multi-tenant SaaS where each organisation's data
must be logically isolated. PostgreSQL, FastAPI. Options:
1. Row-level security with tenant_id FK everywhere
2. Separate schemas per tenant
3. Separate databases
Current scale: <500 tenants, <10k rows per tenant.
GDPR compliance required. Prioritise maintainability over performance.
Which pattern and why?
Claude’s response is usually better than any Stack Overflow thread — because it reasons through my constraints, not a generic scenario. I use this for:
- Schema design tradeoffs
- Auth architecture decisions
- Caching strategy
- Deployment topology
Time saved: 1-2 hours per architecture decision. The thinking I do is genuinely deeper, not shallower.
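To make option 1 from the prompt concrete, here is a minimal sketch of the "tenant_id on every row" idea. It is an illustration only, not a record of which option was chosen. It uses SQLite and an application-level filter so it runs anywhere; in PostgreSQL the same predicate would typically live in a row-level security policy instead. The `invoices` table and the `invoices_for` helper are hypothetical names.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with rows belonging to two tenants."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE invoices ("
        " id INTEGER PRIMARY KEY,"
        " tenant_id INTEGER NOT NULL,"  # every row carries its owner
        " total TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO invoices (tenant_id, total) VALUES (?, ?)",
        [(1, "10.00"), (1, "5.00"), (2, "99.00")],
    )
    return conn


def invoices_for(conn: sqlite3.Connection, tenant_id: int) -> list[str]:
    """Single choke point: every read is scoped to exactly one tenant."""
    rows = conn.execute(
        "SELECT total FROM invoices WHERE tenant_id = ? ORDER BY id",
        (tenant_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

Routing every query through one scoped helper is what keeps this option maintainable: a missing filter then shows up as a missing call rather than a silent data leak.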
Phase 2: Scaffolding the Boring Parts
Once the architecture is decided, there’s a class of code that’s important but not interesting: Pydantic schemas, CRUD endpoints, test fixtures, migration files.
In Cursor, I describe what I need:
Create a FastAPI router for the User resource.
Endpoints: GET /users, GET /users/{id}, POST /users, PATCH /users/{id}, DELETE /users/{id}
Use async SQLAlchemy. Inject AsyncSession via Depends.
Return 404 for missing users, 409 for email conflicts.
Follow the pattern in routers/organisation.py.
The output is ~90% correct and needs 10% adjustment. That’s fine — the adjustment is the interesting part.
What I never do: accept generated code without reading every line. AI generates plausible-looking code that can have subtle bugs. Plausible ≠ correct.
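A hypothetical illustration of "plausible ≠ correct", using the PATCH endpoint as the example. It is not taken from the project: the `User` dataclass and both functions are invented for this sketch, and it leaves FastAPI out so it runs on the standard library. The first version reads fine at a glance but wipes any field the client did not send.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class User:
    id: int
    email: Optional[str]
    name: Optional[str]


def apply_patch_plausible(user: User, email=None, name=None) -> User:
    """Looks reasonable, but omitted fields are overwritten with None."""
    return replace(user, email=email, name=name)


def apply_patch_reviewed(user: User, **changes) -> User:
    """Only touch the fields the caller actually sent."""
    allowed = {"email", "name"}
    unknown = set(changes) - allowed
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    return replace(user, **changes)


original = User(id=1, email="ada@example.com", name="Ada")
broken = apply_patch_plausible(original, name="Ada L.")   # email is lost
fixed = apply_patch_reviewed(original, name="Ada L.")     # email is kept
```

Both versions pass a test that only checks the renamed field, which is why reading every line still matters.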
Phase 3: Test Generation
This is where I get the biggest leverage. Writing tests is important but tedious, and AI is excellent at drafting them.
# I write the function
from decimal import Decimal

from sqlalchemy.ext.asyncio import AsyncSession


async def transfer_credits(
    db: AsyncSession,
    from_user_id: int,
    to_user_id: int,
    amount: Decimal,
) -> Transaction:  # Transaction is defined elsewhere in the project
    ...

# I ask Claude: "Write pytest tests for transfer_credits.
# Cover: happy path, insufficient balance, invalid users,
# concurrent transfers (race condition), decimal precision."
Claude generates the tests I would have written, only faster. More importantly, it sometimes suggests edge cases I hadn’t thought of: the concurrent transfer scenario above was a Claude suggestion.
Result: 107 tests in CitizenApp. Probably 40+ of those were AI-drafted, then reviewed and adjusted by me.
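For a sense of what such drafted tests can look like, here is a minimal sketch. It assumes a simplified, synchronous, in-memory stand-in for `transfer_credits` (a dict of balances instead of an `AsyncSession`), so it is not the project's function or its real test suite. The exception classes and the `raises` helper are hypothetical; in a real pytest suite `pytest.raises` would take the helper's place. The concurrent-transfer case is left out because it only means something against a real database.

```python
from decimal import Decimal


class UnknownUser(Exception):
    """Raised when either side of the transfer does not exist."""


class InsufficientBalance(Exception):
    """Raised when the sender cannot cover the amount."""


def transfer_credits(balances, from_user_id, to_user_id, amount):
    """In-memory stand-in: validate first, then move the credits."""
    if from_user_id not in balances or to_user_id not in balances:
        raise UnknownUser((from_user_id, to_user_id))
    if amount <= 0:
        raise ValueError("amount must be positive")
    if balances[from_user_id] < amount:
        raise InsufficientBalance(from_user_id)
    balances[from_user_id] -= amount
    balances[to_user_id] += amount


def raises(exc_type, func, *args):
    """Return True if func(*args) raises exc_type."""
    try:
        func(*args)
    except exc_type:
        return True
    return False


def test_happy_path():
    balances = {1: Decimal("10.00"), 2: Decimal("0.00")}
    transfer_credits(balances, 1, 2, Decimal("2.50"))
    assert balances == {1: Decimal("7.50"), 2: Decimal("2.50")}


def test_insufficient_balance_changes_nothing():
    balances = {1: Decimal("1.00"), 2: Decimal("0.00")}
    assert raises(InsufficientBalance, transfer_credits, balances, 1, 2, Decimal("1.01"))
    assert balances == {1: Decimal("1.00"), 2: Decimal("0.00")}


def test_unknown_user():
    balances = {1: Decimal("1.00")}
    assert raises(UnknownUser, transfer_credits, balances, 1, 99, Decimal("0.50"))


def test_decimal_precision():
    # Ten transfers of 0.10 must land on exactly 1.00, with no float drift.
    balances = {1: Decimal("1.00"), 2: Decimal("0.00")}
    for _ in range(10):
        transfer_credits(balances, 1, 2, Decimal("0.10"))
    assert balances == {1: Decimal("0.00"), 2: Decimal("1.00")}
```

The review step is where the value is: checking that a failed transfer leaves both balances untouched is the kind of assertion a draft often omits.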
Phase 4: Code Review
Before any PR, I paste the diff into Claude:
Review this authentication middleware. I'm particularly worried about:
1. The JWT validation edge cases
2. Whether the rate limiting is applied before or after auth
3. Any timing attack vectors in the token comparison
[paste diff]
Claude caught a subtle issue in my Fernet key rotation logic that my own review missed — the old key wasn’t being kept available for decrypting existing data during rotation.
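The rotation issue is easier to see in code. The sketch below shows the same "keep the old key around" idea with HMAC signatures rather than Fernet encryption, so that it runs on the standard library alone; it is not the project's middleware. The `KeyRing` class is hypothetical. With the `cryptography` package, `MultiFernet` plays a comparable role for Fernet tokens. The comparison uses `hmac.compare_digest`, which also addresses the timing-attack concern from the review prompt.

```python
import hashlib
import hmac


class KeyRing:
    """Signs with the newest key, verifies against every retained key."""

    def __init__(self, keys: list[bytes]):
        if not keys:
            raise ValueError("at least one key is required")
        self._keys = list(keys)  # newest first

    def rotate(self, new_key: bytes) -> None:
        """Add a new signing key while keeping older keys for verification."""
        self._keys.insert(0, new_key)

    def sign(self, message: bytes) -> bytes:
        return hmac.new(self._keys[0], message, hashlib.sha256).digest()

    def verify(self, message: bytes, signature: bytes) -> bool:
        valid = False
        for key in self._keys:
            expected = hmac.new(key, message, hashlib.sha256).digest()
            # Constant-time comparison; a plain == can leak timing information.
            if hmac.compare_digest(expected, signature):
                valid = True
        return valid


ring = KeyRing([b"old-key"])
old_signature = ring.sign(b"payload")
ring.rotate(b"new-key")
still_valid = ring.verify(b"payload", old_signature)

# The bug pattern: rotating by replacing the key outright.
replaced = KeyRing([b"new-key"])
now_rejected = not replaced.verify(b"payload", old_signature)
```

Old keys should still be retired eventually, once nothing signed or encrypted under them remains in use.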
Where AI Fails (And Human Judgment Is Irreplaceable)
After 18 months of heavy AI-assisted development, here’s where it consistently falls short:
1. Business logic tradeoffs
AI doesn’t know that your SaaS has a “freemium” tier that needs different rate limiting. It doesn’t know your biggest client needs an audit log export in a non-standard format. Domain knowledge is yours.
2. Long-range architectural consequences
AI gives excellent advice for today’s problem. It often misses how today’s decision will constrain you in 6 months.
3. Security audit of its own output
AI can generate code with security vulnerabilities. It’s also good at finding them, but only when you ask explicitly. Never assume AI-generated code is secure without a dedicated security review pass.
4. The judgment call
“Should we add this feature now or defer it?” AI will give you both sides of the argument. The decision is still yours.
The 3× Number
Where does “3× faster” come from?
Rough breakdown on a typical feature:
- Architecture decisions: 50% faster (Claude eliminates most of the research)
- Boilerplate/scaffolding: 80% faster (Cursor generates the structure)
- Test writing: 60% faster (AI drafts, I review/adjust)
- Documentation: 70% faster (Claude explains, I verify)
- Code review: 30% faster (catches more issues, faster)
Weighted by how much of a project each activity takes up, and reading each figure as time saved, that comes to roughly 3× overall. Some tasks are 5×. Some (judgment calls, long-range architectural tradeoffs) are 1.1×.
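The weighting can be worked through explicitly. This sketch assumes each "X% faster" figure means X% of the original time is saved, and it assumes a split of project time across activities. Those shares are hypothetical and not taken from any measurement; only the savings come from the breakdown above.

```python
# activity: (share of original project time, fraction of that time saved)
# Shares are assumed for illustration; savings are the figures quoted above.
ACTIVITIES = {
    "architecture": (0.05, 0.50),
    "scaffolding": (0.45, 0.80),
    "tests": (0.25, 0.60),
    "documentation": (0.15, 0.70),
    "code_review": (0.10, 0.30),
}


def overall_speedup(activities: dict[str, tuple[float, float]]) -> float:
    """Overall speedup = original time / time remaining after the savings."""
    total_share = sum(share for share, _ in activities.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    remaining = sum(share * (1 - saved) for share, saved in activities.values())
    return 1 / remaining


print(f"{overall_speedup(ACTIVITIES):.2f}x")
```

With these shares the result is about 3.0×. With equal shares across the five activities it drops to about 2.4×, so the headline number depends heavily on how much of the project is scaffolding.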
The Mental Model That Works
Think of AI as a very fast, very well-read junior engineer who:
- Knows every library, every pattern, every Stack Overflow answer
- Never gets tired or distracted
- Has no domain knowledge about your business
- Occasionally makes confident, plausible mistakes
- Needs clear briefs and explicit review
Treat it that way, and it’s transformative. Treat it as a magic code generator, and you’ll ship subtle bugs at 3× the speed.
Using AI tools on a project I can help with? Let’s talk.