Every “AI productivity” article I’ve read is either vague advice (“use AI for boilerplate!”) or cherry-picked demos that don’t look like real work. This is neither.
This is a concrete walkthrough of a single feature — adding TOTP 2FA to CitizenApp — showing exactly where Cursor and Claude helped, where they didn’t, and what the actual time savings looked like.
The Feature: TOTP 2FA with Recovery Codes
The requirement: users can enable TOTP-based two-factor authentication (Google Authenticator-compatible). If they lose their device, they can use one of 10 one-time recovery codes.
This is a non-trivial feature: it touches the auth flow, the user model, the session model, email sending, and the frontend in at least 4 places.
Estimated time without AI: 2–3 days. Actual time: ~6 hours.
Here’s how.
Step 1: Architecture Decision (Claude)
Before opening my editor, I opened Claude with context:
I'm adding TOTP 2FA to a FastAPI app. Current auth:
- bcrypt password hashing
- JWT access tokens (15min)
- HTTP-only refresh token cookies (30 days)
- PostgreSQL + SQLAlchemy async
Requirements:
- Users can enable/disable 2FA
- TOTP compatible with Google Authenticator, Authy
- 10 recovery codes on setup, each single-use
- Login flow: password → if 2FA enabled → TOTP prompt
Questions:
1. Where to store TOTP secret? Encrypted at rest?
2. How to handle the "pending 2FA" state during login —
issue a limited token or use session?
3. Recovery code format — hash them or store plain?
Libraries: pyotp seems obvious. Anything better?
Claude’s response clarified three things I hadn’t fully thought through:
- Yes, encrypt the TOTP secret at rest, using the same Fernet key as other PII. If your DB is breached, unencrypted TOTP secrets let an attacker generate valid codes and bypass 2FA for as long as those secrets stay in use.
- Use a short-lived “pending 2FA” JWT rather than a session. It is stateless, so there is no server-side session to store or look up during the TOTP verification step. Include {"sub": user_id, "2fa_pending": true} in the payload with a 5-minute expiry.
- Hash recovery codes instead of storing them in plain text, for the same reason you hash passwords. Bcrypt is overkill (slow); SHA-256 is fine since recovery codes are long random strings.
Total time: 20 minutes. I had a clear, validated architecture before writing any code.
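To make the pending-token idea concrete, here is a minimal standard-library sketch of a signed token carrying the 2fa_pending claim and a 5-minute expiry. It is hand-rolled HS256 purely to stay dependency-free; a real app would more likely use a maintained JWT library. SECRET_KEY, create_pending_token, and this body for decode_pending_token are assumptions for illustration, not CitizenApp's actual code.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

SECRET_KEY = b"change-me"  # hypothetical signing key; load from config in real code
PENDING_TTL_SECONDS = 5 * 60  # the 5-minute expiry from the architecture notes


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str) -> str:
    digest = hmac.new(SECRET_KEY, signing_input.encode("utf-8"), hashlib.sha256).digest()
    return _b64url(digest)


def create_pending_token(user_id: int, now: Optional[float] = None) -> str:
    """Issue a token that only proves the password step succeeded."""
    issued = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": str(user_id), "2fa_pending": True, "exp": issued + PENDING_TTL_SECONDS}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    return f"{signing_input}.{_sign(signing_input)}"


def decode_pending_token(token: str, now: Optional[float] = None) -> Optional[dict]:
    """Return the payload only if the signature, expiry, and pending flag all check out."""
    try:
        header_b64, payload_b64, signature = token.split(".")
        expected = _sign(f"{header_b64}.{payload_b64}")
        if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
            return None
        payload = json.loads(_b64url_decode(payload_b64))
    except ValueError:  # wrong segment count, bad base64, or bad JSON
        return None
    current = time.time() if now is None else now
    if payload.get("2fa_pending") is not True or payload.get("exp", 0) <= current:
        return None
    return payload
```

One design point worth checking in any real implementation: endpoints that accept normal access tokens should reject tokens carrying the 2fa_pending claim, otherwise the pending token could stand in for a full login.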
Step 2: Database Schema (Cursor)
In Cursor, I described the schema to Agent mode:
Add to models/user.py:
- totp_secret field (encrypted string, nullable)
- totp_enabled bool (default false)
- totp_verified_at datetime (nullable)
Create new file models/recovery_code.py:
- id, user_id FK, code_hash (sha256, unique), used bool, used_at datetime
- 10 codes per user, generated on 2FA setup
Cursor generated both files. I reviewed:
- ✅ Correct SQLAlchemy 2.0 mapped_column syntax
- ✅ Proper ForeignKey with cascade delete
- ✅ Index on code_hash for lookup performance
- ⚠️ Missing: used_at timezone handling. I added timezone=True to the datetime column
One manual fix. Two files I didn’t have to write from scratch.
Then Alembic migration:
alembic revision --autogenerate -m "add_totp_2fa"
I reviewed the generated migration, confirmed the column types were correct, ran it.
Step 3: The TOTP Service (Cursor + Manual)
The TOTP logic is self-contained enough for Cursor to handle most of it:
Create services/totp.py with:
- generate_totp_secret() → encrypted secret
- get_totp_uri(user_email, secret) → otpauth:// URI for QR code
- verify_totp(encrypted_secret, code, window=1) → bool
- generate_recovery_codes(user_id, db) → list of raw codes (store hashes)
- verify_recovery_code(user_id, code, db) → bool (mark used)
Use pyotp. Decrypt secret before passing to pyotp.
Generated code was ~90% correct. What I fixed manually:
# Cursor generated this:
totp = pyotp.TOTP(secret)
return totp.verify(code)
# I changed to:
totp = pyotp.TOTP(secret)
# valid_window=1 tolerates one 30s step of clock drift either way, important for mobile TOTP apps
return totp.verify(code, valid_window=1)
Small but important: valid_window=1 means ±1 time step (±30 seconds). Without it, users with slightly drifted clocks fail verification constantly.
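To show what a verification window amounts to, here is a standard-library sketch of TOTP (RFC 6238, built on HOTP from RFC 4226) with a window check. It is not pyotp's implementation: hotp and verify_with_window are hypothetical names, and it takes the raw secret bytes, whereas pyotp works with base32-encoded secrets.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HOTP (RFC 4226): HMAC-SHA1 over the counter, then dynamic truncation."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F
    number = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(number % 10 ** digits).zfill(digits)


def verify_with_window(
    secret: bytes,
    code: str,
    at: Optional[float] = None,
    valid_window: int = 0,
    step: int = 30,
) -> bool:
    """Accept the code for the current time step or any step within the window."""
    now = time.time() if at is None else at
    counter = int(now // step)
    for drift in range(-valid_window, valid_window + 1):
        if counter + drift < 0:
            continue  # no time steps before the epoch
        candidate = hotp(secret, counter + drift)
        if hmac.compare_digest(candidate.encode("utf-8"), code.encode("utf-8")):
            return True
    return False
```

With valid_window=0 only the current 30-second step is accepted; with valid_window=1 the previous and next steps are accepted too, which is the tolerance described above.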
Step 4: The Auth Flow Changes (Manual + Claude Review)
This was the part I wrote mostly by hand — the login flow change is architectural, not boilerplate:
# routes/auth.py: modified login endpoint
@router.post("/login")
async def login(credentials: LoginSchema, db: AsyncSession = Depends(get_async_db)):
    user = await authenticate_user(db, credentials.username, credentials.password)
    if not user:
        raise HTTPException(status_code=401, detail="Invalid credentials")

    # If 2FA is enabled, issue a limited pending token instead of full auth
    if user.totp_enabled:
        pending_token = create_access_token(
            {"sub": str(user.id), "2fa_pending": True},
            expires_delta=timedelta(minutes=5)
        )
        return {"requires_2fa": True, "pending_token": pending_token}

    # Normal flow, no 2FA
    return await issue_full_tokens(user)


@router.post("/verify-2fa")
async def verify_2fa(
    payload: Verify2FASchema,
    db: AsyncSession = Depends(get_async_db),
):
    # Validate the pending token
    token_data = decode_pending_token(payload.pending_token)
    if not token_data or not token_data.get("2fa_pending"):
        raise HTTPException(status_code=401, detail="Invalid pending token")

    user = await get_user(db, int(token_data["sub"]))

    # Try TOTP first, then recovery code
    if not (
        verify_totp(user.totp_secret, payload.code)
        or await verify_recovery_code(user.id, payload.code, db)
    ):
        raise HTTPException(status_code=401, detail="Invalid 2FA code")

    return await issue_full_tokens(user)
After writing this, I pasted it to Claude with: “Review for security issues, particularly around the pending token and the TOTP/recovery code branch.”
Claude flagged one issue: the "or" short-circuits, so if TOTP passes, the recovery code check never runs. That is the correct behaviour, but it means the code doesn't record which method succeeded, and the audit trail should log the authentication method used.
I added it. Three lines. Worth catching.
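One way to capture the method is to replace the bare boolean expression with a helper that reports which check succeeded. This is a sketch under assumptions, not the three lines actually added: second_factor_method, verify_and_audit, and the logger name are hypothetical, and the two checks are passed in as callables so the example runs without the real services.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Optional

logger = logging.getLogger("auth.audit")  # hypothetical audit logger name


async def second_factor_method(
    code: str,
    check_totp: Callable[[str], bool],
    check_recovery: Callable[[str], Awaitable[bool]],
) -> Optional[str]:
    """Return "totp", "recovery_code", or None, keeping the TOTP-first order."""
    if check_totp(code):
        return "totp"
    if await check_recovery(code):
        return "recovery_code"
    return None


async def verify_and_audit(
    user_id: int,
    code: str,
    check_totp: Callable[[str], bool],
    check_recovery: Callable[[str], Awaitable[bool]],
) -> bool:
    """Run the second-factor check and log which method, if any, succeeded."""
    method = await second_factor_method(code, check_totp, check_recovery)
    if method is None:
        logger.warning("2fa_failed user_id=%s", user_id)
        return False
    logger.info("2fa_ok user_id=%s method=%s", user_id, method)
    return True
```

Recovery code use is worth surfacing separately in the log, since it usually means the user has lost their device.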
Step 5: Tests (Cursor + Review)
I wrote a brief test spec in a comment, then let Cursor generate the test cases:
# Tests needed:
# - login without 2FA → full tokens
# - login with 2FA → pending token + requires_2fa flag
# - verify 2FA with valid TOTP → full tokens
# - verify 2FA with invalid TOTP → 401
# - verify 2FA with valid recovery code → full tokens, code marked used
# - verify 2FA with used recovery code → 401
# - verify 2FA with expired pending token → 401
Cursor generated 7 tests. I reviewed each one, fixed 2 issues:
- Mock for pyotp.TOTP.verify wasn’t isolated, so tests were dependent on each other
- Recovery code test wasn’t checking used=True after verification
Fixed, ran pytest. All 7 passed. Added to the existing 107-test suite.
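For illustration, the two recovery-code cases from the spec might look like this when each test builds its own state, so nothing leaks between tests. This is a sketch, not the generated suite: InMemoryRecoveryCodes is a hypothetical stand-in for the recovery_code table and the async service, it ignores user_id, and hash_code is an assumed helper name.

```python
import hashlib
import secrets
from datetime import datetime, timezone
from typing import Dict, List, Optional


def hash_code(code: str) -> str:
    """SHA-256 hex digest, matching the code_hash column described earlier."""
    return hashlib.sha256(code.encode("utf-8")).hexdigest()


class InMemoryRecoveryCodes:
    """Hypothetical stand-in for the table: maps code_hash -> used_at."""

    def __init__(self) -> None:
        self.rows: Dict[str, Optional[datetime]] = {}

    def generate(self, count: int = 10) -> List[str]:
        # Raw codes are returned once for display; only hashes are stored.
        codes = [secrets.token_hex(8) for _ in range(count)]
        for code in codes:
            self.rows[hash_code(code)] = None
        return codes

    def verify(self, code: str) -> bool:
        key = hash_code(code)
        if key not in self.rows or self.rows[key] is not None:
            return False  # unknown code, or already used
        self.rows[key] = datetime.now(timezone.utc)  # timezone-aware used_at
        return True


def test_valid_code_is_marked_used():
    store = InMemoryRecoveryCodes()  # fresh state per test
    code = store.generate()[0]
    assert store.verify(code) is True
    assert store.rows[hash_code(code)] is not None


def test_used_code_is_rejected():
    store = InMemoryRecoveryCodes()
    code = store.generate()[0]
    assert store.verify(code) is True
    assert store.verify(code) is False
```

The first test asserts on the stored row as well as the return value, which is the check the generated recovery code test was missing.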
Step 6: Frontend (Cursor, mostly autonomous)
The React side: Cursor handled it with minimal guidance.
Add to pages/Login.tsx:
- If login response has requires_2fa: true, show 2FA input form
- Submit pending_token + code to /auth/verify-2fa
- Handle recovery code toggle (same input, different UX label)
- Show loading state during verification
Add to pages/AccountSettings.tsx:
- Enable 2FA flow: show QR code + secret + "verify to confirm" step
- Disable 2FA flow: require password confirmation
- Show recovery codes after setup (copy-to-clipboard, one-time display)
- Regenerate recovery codes option
Frontend took ~90 minutes. Cursor did most of the repetitive React + TypeScript. I reviewed for accessibility (focusable inputs, keyboard navigation on the code entry) and UX edge cases (what happens if QR scan fails → show manual entry option).
The Honest Time Breakdown
| Task | Without AI | With AI | Saved |
|---|---|---|---|
| Architecture decision | 2h (research + thinking) | 20min (Claude review) | ~1h 40min |
| DB schema + migration | 30min | 10min | 20min |
| TOTP service | 1h | 25min | 35min |
| Auth flow changes | 1.5h | 1.5h (manual) | 0 |
| Tests | 1h | 30min | 30min |
| Frontend | 3h | 1.5h | 1.5h |
| Total | ~9h | ~4.4h | ~4.6h |
That’s roughly 2× for this feature — not 3×. The architectural work (login flow changes, security review) wasn’t AI-accelerated much.
The 3× number shows up on features with more boilerplate relative to architecture: CRUD endpoints, schema migrations, test fixtures, UI forms.
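The totals and the speedup can be recomputed from the per-task rows. A small sketch, with the durations converted to minutes from the table above:

```python
# (task, minutes without AI, minutes with AI), taken from the breakdown table
rows = [
    ("Architecture decision", 120, 20),
    ("DB schema + migration", 30, 10),
    ("TOTP service", 60, 25),
    ("Auth flow changes", 90, 90),
    ("Tests", 60, 30),
    ("Frontend", 180, 90),
]

without_ai = sum(before for _, before, _ in rows)
with_ai = sum(after for _, _, after in rows)
saved = without_ai - with_ai
speedup = without_ai / with_ai

print(f"without AI: {without_ai} min, with AI: {with_ai} min")
print(f"saved: {saved} min, speedup: {speedup:.2f}x")
```

The unchanged 90 minutes of auth flow work is the largest single item in the with-AI column, which is why the overall ratio stays near 2× rather than 3×.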
What I’d Never Let AI Do Unreviewed
- Any security-sensitive logic: the valid_window issue is a perfect example of plausible-but-wrong AI output
- DB migrations: generated migrations can silently drop columns if you’re not careful
- Auth flow: I wrote the pending token flow entirely by hand
- Error handling: AI usually handles happy path, misses edge cases
The rule I follow: AI drafts, I ship. Everything AI generates gets read line by line before it goes into main.
Cursor Config That Helps
Five project-level rules that materially improve Cursor’s output quality:
// .cursor/rules (project-level)
{
  "rules": [
    "Always use SQLAlchemy 2.0 mapped_column syntax, not Column()",
    "Use async/await throughout — no sync database calls",
    "Follow existing patterns in the codebase before inventing new ones",
    "Always include error handling for database operations",
    "Use Pydantic v2 model_validator not root_validator"
  ]
}
These prevent the most common errors in AI-generated FastAPI code.