Render PostgreSQL Backups to S3: Automating Disaster Recovery for Multi-Tenant SaaS Without Vendor Lock-In
I’ve been building CitizenApp on Render for two years. The platform is solid—zero complaints about uptime or developer experience. But I’ll be honest: the first time I read Render’s backup documentation, I realized I was entirely dependent on their infrastructure for data recovery.
That terrified me.
Here’s the thing: Render’s managed PostgreSQL comes with daily backups, but they’re stored in their infrastructure. If Render experiences a catastrophic failure (unlikely, but possible), or if they discontinue the service, or if I need to migrate to another provider—I’m at their mercy. For CitizenApp, which handles sensitive tenant data across 200+ organizations, this isn’t acceptable.
I’m not paranoid. I’m just someone who’s read enough incident reports.
Why You Can’t Sleep Well With Platform-Only Backups
Render’s backup system is reliable for Render’s purposes. They retain backups for 30 days, support point-in-time recovery (PITR), and honestly, their restoration process works smoothly. But there’s a fundamental asymmetry: they control the backup, they control the restore, and they decide how long data lives.
Here’s what keeps me awake:
- Vendor lock-in: Your data is only as portable as Render’s export capabilities
- Compliance: SOC 2 audits often require evidence that backups exist outside the primary infrastructure
- Recovery time objectives (RTO): If the platform itself is down, you can't restore from backups that live on it, so your RTO is bounded by theirs
- Cost surprises: Render might price backup storage differently next year
- Multi-region redundancy: A single region failure shouldn’t cascade to your recovery capability
For CitizenApp’s clients, especially enterprises, I need to answer this question with confidence: “If Render evaporates tomorrow, how fast can we restore your data?” The honest answer with platform-only backups is: “We’re probably okay, but I can’t guarantee it.”
So I automated offsite backups. Here’s how.
The Architecture: Dump, Upload, Verify
I prefer a push-based model over trying to hook into Render’s backup system directly. Here’s why:
- Simplicity: One scheduled job, no complex WAL archiving setup
- Portability: Works with any PostgreSQL, not Render-specific
- Auditability: I can see exactly what backed up and when
- Cost control: S3/R2 storage is predictable and cheap
The flow looks like this:
PostgreSQL on Render
↓
pg_dump (compressed)
↓
Encrypt with age
↓
Upload to Cloudflare R2 (or AWS S3)
↓
Verify checksum & log
↓
Alert if failed
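The flow above includes an encryption step, but the backup script further down does not implement it. Here is a minimal sketch of how it could slot in between the dump and the upload, assuming the `age` CLI is installed wherever the job runs and that you pass in an age public key (recipient). The helper names `build_age_encrypt_cmd` and `encrypt_file` are hypothetical, not part of the original script.

```python
import subprocess
from pathlib import Path


def build_age_encrypt_cmd(src: str, recipient: str) -> list[str]:
    """Build the age command that encrypts src to src + '.age' for one recipient."""
    # -r names the recipient's public key, -o names the output file
    return ["age", "-r", recipient, "-o", f"{src}.age", src]


def encrypt_file(src: str, recipient: str) -> Path:
    """Encrypt src with age and return the path of the encrypted file.

    Assumes the age binary is on PATH; raises CalledProcessError if it fails.
    """
    subprocess.run(build_age_encrypt_cmd(src, recipient), check=True)
    return Path(f"{src}.age")
```

If you adopt something like this, compute the checksum over the encrypted file, since that is the object you upload and later download. Decryption during a restore would be `age --decrypt -i key.txt -o backup.sql.gz backup.sql.gz.age`, with the private key kept somewhere other than the backup bucket.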
Setting Up Automated Backups
Step 1: Create an S3-Compatible Bucket
I use Cloudflare R2 because egress is free, whereas S3 charges about $0.09/GB for data transferred out to the internet. For CitizenApp, that's the difference between $10/month and $50/month at our current data size.
# AWS S3 (if you prefer)
aws s3 mb s3://citizenapp-postgres-backups-prod
# Cloudflare R2
# Done via dashboard, bucket: citizenapp-postgres-backups-prod
On AWS, create an IAM user whose policy is restricted to this bucket (on R2, create an API token scoped to the bucket instead):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::citizenapp-postgres-backups-prod",
        "arn:aws:s3:::citizenapp-postgres-backups-prod/*"
      ]
    }
  ]
}
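The policy above grants all three actions on both ARNs. A slightly tighter variant scopes `s3:ListBucket` to the bucket ARN and the object actions to the `/*` ARN. The sketch below generates that variant for any bucket name. The function name `build_backup_policy` is hypothetical, and this is one reasonable layout rather than the only correct one.

```python
import json


def build_backup_policy(bucket: str) -> dict:
    """Return an IAM policy document limited to listing, reading, and writing one bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # ListBucket is a bucket-level action, so it targets the bucket ARN
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # PutObject/GetObject are object-level actions, so they target bucket/*
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_backup_policy("citizenapp-postgres-backups-prod"), indent=2))
```

Neither version grants `s3:DeleteObject`, so a leaked backup credential cannot delete existing backups. It can still overwrite a key with `PutObject` unless bucket versioning is enabled.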
Step 2: Python Backup Script
This runs daily via GitHub Actions (or a cron job elsewhere). I prefer GitHub Actions because a short daily job typically fits within the free included minutes, the workflow is version-controlled, and it doesn't require managing another server.
# backup_postgres.py
import hashlib
import os
import subprocess
import sys
from datetime import datetime, timezone
from pathlib import Path

import boto3


def backup_postgres(db_url: str, bucket: str, region: str = "us-east-1") -> bool:
    """
    Back up PostgreSQL to S3-compatible storage.
    Returns True if successful.
    """
    # datetime.utcnow() is deprecated since Python 3.12; use an aware UTC timestamp
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
    object_name = f"{timestamp}.sql.gz"
    backup_file = f"/tmp/citizenapp_backup_{timestamp}.sql.gz"
    checksum_file = f"{backup_file}.sha256"

    try:
        # Step 1: Dump with compression (plain format + --compress writes gzip output)
        print(f"[{timestamp}] Starting pg_dump...")
        # The context manager guarantees the output handle is closed
        with open(backup_file, "wb") as out:
            subprocess.run(
                [
                    "pg_dump",
                    "--format=plain",
                    "--compress=9",
                    "--no-password",
                    f"--dbname={db_url}",
                ],
                stdout=out,
                stderr=subprocess.PIPE,
                check=True,
            )
        file_size_mb = Path(backup_file).stat().st_size / (1024 * 1024)
        print(f"Dump complete: {file_size_mb:.2f} MB")

        # Step 2: Calculate checksum
        print("Calculating checksum...")
        sha256_hash = hashlib.sha256()
        with open(backup_file, "rb") as f:
            for chunk in iter(lambda: f.read(1024 * 1024), b""):
                sha256_hash.update(chunk)
        checksum = sha256_hash.hexdigest()
        # Use the uploaded object's name and two spaces so `sha256sum -c`
        # works against the file as it is named after download
        with open(checksum_file, "w") as f:
            f.write(f"{checksum}  {object_name}\n")

        # Step 3: Upload to S3
        print(f"Uploading to S3 ({bucket})...")
        s3_client = boto3.client(
            "s3",
            region_name=region,
            endpoint_url=os.getenv("S3_ENDPOINT_URL"),  # For R2
            aws_access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
            aws_secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
        )
        s3_client.upload_file(
            backup_file,
            bucket,
            f"backups/{object_name}",
            ExtraArgs={
                "Metadata": {
                    "checksum": checksum,
                    "size-mb": f"{file_size_mb:.2f}",
                }
            },
        )
        s3_client.upload_file(
            checksum_file,
            bucket,
            f"backups/{object_name}.sha256",
        )
        print(f"✓ Backup successful: {timestamp}")
        print(f"  Checksum: {checksum}")
        return True
    except subprocess.CalledProcessError as e:
        print(f"✗ pg_dump failed: {e.stderr.decode()}")
        return False
    except Exception as e:
        print(f"✗ Backup failed: {e}")
        return False
    finally:
        # Step 4: Cleanup, even when the dump or upload failed
        Path(backup_file).unlink(missing_ok=True)
        Path(checksum_file).unlink(missing_ok=True)


if __name__ == "__main__":
    db_url = os.getenv("DATABASE_URL")
    if not db_url:
        print("✗ DATABASE_URL is not set")
        sys.exit(1)
    bucket = os.getenv("BACKUP_BUCKET", "citizenapp-postgres-backups-prod")
    region = os.getenv("BACKUP_REGION", "us-east-1")
    success = backup_postgres(db_url, bucket, region)
    sys.exit(0 if success else 1)
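The flow lists a "verify" step, but the script treats a returned `upload_file` call as success. A minimal post-upload check could ask the bucket what it actually stored, using boto3's `head_object`, and compare that against the local size and the checksum recorded in the object metadata. The function name `verify_upload` is hypothetical. Note that this confirms the object's size and metadata only; it does not re-hash the remote bytes, which would require downloading the object.

```python
def verify_upload(s3_client, bucket: str, key: str, expected_size: int, checksum: str) -> bool:
    """Return True if the stored object has the expected byte size and checksum metadata.

    s3_client is a boto3 S3 client (or anything exposing a compatible head_object).
    """
    head = s3_client.head_object(Bucket=bucket, Key=key)
    # ContentLength is the stored object's size in bytes
    size_ok = head["ContentLength"] == expected_size
    # User metadata set via ExtraArgs={"Metadata": ...} comes back under "Metadata"
    checksum_ok = head.get("Metadata", {}).get("checksum") == checksum
    return size_ok and checksum_ok
```

In the script this would be called right after the first `upload_file`, passing `Path(backup_file).stat().st_size` as `expected_size`, and returning `False` from `backup_postgres` when the check fails so the workflow's failure alert fires.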
Step 3: GitHub Actions Workflow
# .github/workflows/backup-postgres.yml
name: Backup PostgreSQL to S3
on:
  schedule:
    - cron: "0 2 * * *" # 2 AM UTC daily
  workflow_dispatch:
jobs:
  backup:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.11"
      - name: Install dependencies
        # Only boto3 is needed: the script shells out to pg_dump instead of using a Python driver.
        # Assumes the runner's pg_dump is at least as new as the database's major version.
        run: |
          pip install boto3
      - name: Run backup
        env:
          DATABASE_URL: ${{ secrets.DATABASE_URL }}
          AWS_ACCESS_KEY_ID: ${{ secrets.BACKUP_AWS_ACCESS_KEY }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.BACKUP_AWS_SECRET_KEY }}
          S3_ENDPOINT_URL: ${{ secrets.BACKUP_S3_ENDPOINT }}
          BACKUP_BUCKET: ${{ secrets.BACKUP_BUCKET }}
        run: python scripts/backup_postgres.py
      - name: Notify on failure
        if: failure()
        # v7 runs on Node 20, which provides a global fetch; v6 (Node 16) does not
        uses: actions/github-script@v7
        env:
          # Pass the secret through the environment rather than interpolating it into the script
          SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK_BACKUPS }}
        with:
          script: |
            await fetch(process.env.SLACK_WEBHOOK, {
              method: "POST",
              headers: { "Content-Type": "application/json" },
              body: JSON.stringify({
                text: "⚠️ PostgreSQL backup failed for CitizenApp"
              })
            });
Testing Recovery (The Part Everyone Skips)
Here’s the critical thing: a backup you’ve never restored is just hope. I test recovery quarterly.
# Download and verify
aws s3 cp s3://citizenapp-postgres-backups-prod/backups/20240115_020000.sql.gz .
# sha256sum expects two spaces between the hash and the filename
sha256sum -c <(echo "abc123...  20240115_020000.sql.gz")
# Restore to a test database
gunzip -c 20240115_020000.sql.
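The checksum check can also be done in Python, which is convenient inside a scripted restore drill. This is a sketch, assuming the `.sha256` sidecar uploaded next to each backup has been downloaded too and starts with the hex digest followed by whitespace. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large dumps never have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sidecar(backup_path: str, sidecar_path: str) -> bool:
    """Compare the backup's SHA-256 with the first field of the sidecar file."""
    expected = Path(sidecar_path).read_text().split()[0].lower()
    return sha256_of(backup_path) == expected
```

A restore drill would call `matches_sidecar` on the downloaded pair and stop before touching the test database if it returns `False`.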