Database Guide
StudioBrain supports two database modes: SQLite for desktop/single-user deployments and PostgreSQL for multi-tenant cloud deployments. This guide covers the full database architecture, security model, migration workflow, and scaling strategy.
Database Modes
SQLite (Desktop / Single-User)
In desktop mode, StudioBrain uses a single SQLite database file. This is the default for self-hosted and open-core installations.
- Zero configuration. No external database server required.
- Single file. The entire database lives at the path specified by `DATABASE_URL` (e.g., `sqlite:////data/db/city_brains.db`).
- Single tenant. All entities use `tenant_id='local'` as a hardcoded value.
- No RLS. SQLite does not support row-level security; tenant isolation is enforced at the application layer.
Connection string format:
```
DATABASE_URL=sqlite:////data/db/city_brains.db
```
PostgreSQL (Cloud / Multi-Tenant)
In cloud mode, StudioBrain uses a five-service database architecture with strict separation between PII and content data.
- Two PostgreSQL instances. Auth (PII/billing) and Content (entities/templates) run on separate servers.
- Row-Level Security (RLS). Content database enforces tenant isolation at the database level.
- Qdrant. Dedicated vector database for RAG embeddings, one collection per tenant.
- Redis. Sessions, rate limits, real-time presence, and PubSub channels.
Connection string format:
```
AUTH_DATABASE_URL=postgresql://studiobrain_auth:password@auth-host:5432/studiobrain_auth
CONTENT_DATABASE_URL=postgresql://studiobrain_app:password@content-host:5432/studiobrain_content
QDRANT_URL=http://qdrant-host:6333
REDIS_URL=redis://user:password@redis-host:6379/0
```
Five-Service Architecture
The cloud deployment uses five dedicated database services:
```
+---------------------------------------------------+
|  ZONE 1: RESTRICTED (PII)                         |
|                                                   |
|  +-----------------------------------------+      |
|  |  Auth DB (PostgreSQL)                   |      |
|  |                                         |      |
|  |  Tables:                                |      |
|  |    tenants        (company info, PII)   |      |
|  |    users          (emails, names, PII)  |      |
|  |    subscriptions  (Stripe, billing)     |      |
|  |    oauth_tokens   (encrypted)           |      |
|  |    api_keys       (hashed)              |      |
|  |    audit_logs     (immutable)           |      |
|  |    usage_metrics  (billing)             |      |
|  |    revoked_tokens                       |      |
|  |    stripe_events                        |      |
|  |                                         |      |
|  |  Access: App backend ONLY               |      |
|  |  Firewall: Block all except app server  |      |
|  +-----------------------------------------+      |
|                                                   |
+---------------------------------------------------+
```
```
+---------------------------------------------------+
|  ZONE 2: STANDARD (Content)                       |
|                                                   |
|  +-----------------------------------------+      |
|  |  Content DB (PostgreSQL + RLS)          |      |
|  |                                         |      |
|  |  Tables:                                |      |
|  |    entities       (all entity types)    |      |
|  |    assets         (metadata, not files) |      |
|  |    templates      (type definitions)    |      |
|  |    plugin_data    (plugin state)        |      |
|  |    relationships  (entity links)        |      |
|  |    timeline_entries                     |      |
|  |    sync_state                           |      |
|  |                                         |      |
|  |  Access: App backend + AI service (RO)  |      |
|  |  All tables: RLS by tenant_id           |      |
|  |  NO PII. tenant_id is a UUID.           |      |
|  +-----------------------------------------+      |
|                                                   |
|  +-------------------+  +--------------------+    |
|  | Qdrant            |  | Redis              |    |
|  |                   |  |                    |    |
|  | Collection/tenant |  | Sessions (no PII)  |    |
|  | Entity embeddings |  | Rate limits        |    |
|  | HNSW indexes      |  | Presence/PubSub    |    |
|  | No PII in payload |  | BrainBits tracking |    |
|  +-------------------+  +--------------------+    |
|                                                   |
+---------------------------------------------------+
```
Auth Database
The Auth DB stores all personally identifiable information and billing data. It is the only database that maps tenant_id UUIDs to real company names, user emails, and billing information.
Key tables:
| Table | Purpose | Row Estimate (Year 1) |
|---|---|---|
| `tenants` | Organizations with billing, plan, storage config | ~600 |
| `users` | User accounts with email, name, auth credentials | ~2,400 |
| `subscriptions` | Stripe subscription tracking | ~600 |
| `oauth_tokens` | Encrypted Google Drive/OAuth tokens | ~1,000 |
| `api_keys` | Hashed API keys for programmatic access | ~200 |
| `audit_logs` | Immutable append-only security log | ~1M/year |
| `usage_metrics` | Monthly billing metrics per tenant | ~7,200 |
| `revoked_tokens` | Invalidated JWTs (for logout) | Variable |
| `stripe_events` | Webhook idempotency tracking | Variable |
Security measures:
- Encrypted at rest (LUKS or ZFS encryption)
- TLS required for all connections (`sslmode=require`)
- `pgaudit` extension enabled for query logging
- Firewall restricts access to the app backend server only
- OAuth tokens and billing addresses are AES-256 encrypted at the application layer before storage
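The application-layer encryption step can be sketched with AES-256-GCM using the `cryptography` package. This is an illustrative sketch, not StudioBrain's actual code: the helper names are invented, and key management (loading the key from a KMS or environment secret) is omitted.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_field(key: bytes, plaintext: str) -> bytes:
    """Encrypt a sensitive column value (e.g. an OAuth token) before storage."""
    nonce = os.urandom(12)                         # unique nonce per encryption
    ct = AESGCM(key).encrypt(nonce, plaintext.encode(), None)
    return nonce + ct                              # store nonce alongside ciphertext

def decrypt_field(key: bytes, blob: bytes) -> str:
    nonce, ct = blob[:12], blob[12:]
    return AESGCM(key).decrypt(nonce, ct, None).decode()

key = AESGCM.generate_key(bit_length=256)          # in production: from a secret store
blob = encrypt_field(key, "ya29.example-oauth-token")
assert decrypt_field(key, blob) == "ya29.example-oauth-token"
```

GCM also authenticates the ciphertext, so a tampered row fails to decrypt rather than silently returning garbage.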
Content Database
The Content DB stores all entity data, templates, plugin state, and relationships. It contains zero PII. The tenant_id column is a UUID — it never contains names, emails, or company information.
Key tables:
| Table | Purpose | RLS |
|---|---|---|
| `entities` | All entity types in one table (schema-agnostic JSON blob) | Yes |
| `assets` | Asset metadata (filenames, AI descriptions, tags) — not binary files | Yes |
| `templates` | Entity type definitions (parsed from template markdown) | Yes |
| `plugin_data` | Per-plugin key-value state | Yes |
| `relationships` | Entity-to-entity links | Yes |
| `timeline_entries` | Chronological events for entities | Yes |
| `sync_state` | Client sync tracking (last push/pull timestamps) | No |
Row-Level Security:
Every table (except sync_state) has RLS enabled with a tenant isolation policy:
```sql
ALTER TABLE entities ENABLE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON entities
    USING (tenant_id = current_setting('app.current_tenant')::UUID);
```
The backend sets app.current_tenant on each database session from the JWT. Even if application-level filtering fails, the database itself prevents cross-tenant data access.
Qdrant (Vector Store)
Qdrant stores vector embeddings for RAG (Retrieval-Augmented Generation). Each tenant gets its own collection with isolated HNSW indexes.
Sizing (Year 1):
```
Vectors: 240 paid tenants x 500 entities x 10 chunks x 384 dims x 4 bytes
       = ~1.8 GB vectors
HNSW:    ~2x vectors = ~3.6 GB indexes
Total:   ~5.5 GB storage, ~4 GB RAM for hot indexes

Collections: 240 (one per paid tenant)
```
Configuration:
```
QDRANT_URL=http://qdrant-host:6333
QDRANT_API_KEY=your-api-key
```
Redis
Redis handles transient state that benefits from sub-millisecond latency:
| Usage | Data Stored | Persistence |
|---|---|---|
| Sessions | JWT metadata (tenant_id, user_id, role — no PII) | TTL-based eviction |
| Rate limits | Request counts per tenant per endpoint | TTL-based eviction |
| Presence | Online user indicators per tenant | TTL-based eviction |
| BrainBits | Credit balance tracking | Persisted (AOF) |
| PubSub | Real-time event channels per tenant | Transient |
| AI cache | Cached generation results | TTL-based eviction |
Memory usage (Year 1): ~25 MB of active data; the allocated 2 GB of RAM leaves ample headroom.
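The rate-limit entry above, for example, reduces to one small counter per tenant, endpoint, and time window. A fixed-window sketch (the key scheme and window size are illustrative, not StudioBrain's actual convention):

```python
import time

WINDOW_S = 60  # one counter per 60-second window

def rate_limit_key(tenant_id: str, endpoint: str, now=None) -> str:
    """Redis key for the current fixed window: rl:<tenant>:<endpoint>:<window>."""
    now = time.time() if now is None else now
    return f"rl:{tenant_id}:{endpoint}:{int(now // WINDOW_S)}"

# With a live redis-py client r (not shown), the check would be:
#   count = r.incr(key)
#   r.expire(key, WINDOW_S)    # the key evicts itself after the window
#   allowed = count <= limit
print(rate_limit_key("7c9e6679", "/api/entities", now=120.0))
# rl:7c9e6679:/api/entities:2
```

Because every key carries a TTL, Redis reclaims the memory automatically, which is why the table above lists "TTL-based eviction" as the persistence mode.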
Configuration:
```
REDIS_URL=redis://studiobrain_app:password@redis-host:6379/0
```
The AI service uses a separate Redis database number to avoid key collisions:
```
REDIS_URL=redis://studiobrain_ai:password@redis-host:6379/1
```
Security Zones
Zone 1: Restricted (Auth DB)
The Auth DB is in a restricted security zone. Access rules:
- Firewall: Only the app backend server IP can connect on port 5432. All other IPs are blocked.
- AI service is BLOCKED. The AI service cannot reach the Auth DB — enforced by firewall rules.
- Dedicated DB user: `studiobrain_auth` with CRUD permissions. No superuser access.
- All queries logged via the `pgaudit` extension.
- Full disk encryption on the host.
Zone 2: Standard (Content DB, Qdrant, Redis)
Content services are in a standard security zone. Access rules:
- App backend: Full CRUD access with RLS enforcement.
- AI service: Read-only access to the Content DB (`studiobrain_ai_ro` user). Full access to Qdrant and Redis.
- No PII exposure. The content zone never sees user emails, names, or billing information.
JWT Token Design
JWTs bridge the two zones. The token contains only UUIDs and roles — never PII:
```json
{
  "sub": "550e8400-e29b-41d4-a716-446655440000",
  "tenant_id": "7c9e6679-7425-40de-944b-e07fc1f90ae7",
  "role": "editor",
  "plan": "indie",
  "iat": 1771882000,
  "exp": 1771968400
}
```
The content layer never needs to know who you are — only what tenant you belong to and what role you have.
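For illustration, a JWT payload like the one above can be inspected with the standard library alone. This is a sketch only: real services must verify the signature (e.g. with the PyJWT library) before trusting any claim.

```python
import base64
import json

def decode_claims_unverified(token: str) -> dict:
    """Decode a JWT payload WITHOUT verifying the signature.

    Illustration only: production code must verify the signature
    before trusting tenant_id or role.
    """
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)   # restore base64 padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))

# The content layer only ever needs the UUID/role claims:
#   claims = decode_claims_unverified(token)
#   tenant, role = claims["tenant_id"], claims["role"]
```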
Database Router
StudioBrain uses a DatabaseRouter class that routes SQLAlchemy operations to the correct database engine based on the deployment mode.
DeploymentMode Enum
```python
class DeploymentMode(Enum):
    DESKTOP = "desktop"          # SQLite, single user
    SELF_HOSTED = "self_hosted"  # Single PostgreSQL instance
    CLOUD = "cloud"              # Separate Auth + Content databases
```
Routing Logic
```python
class DatabaseRouter:
    AUTH_MODELS = {
        'tenants', 'users', 'subscriptions', 'api_keys',
        'oauth_tokens', 'audit_logs', 'revoked_tokens',
        'usage_metrics', 'stripe_events'
    }

    def get_engine(self, model_class):
        if self.mode == DeploymentMode.DESKTOP:
            return self.default_engine   # SQLite for everything
        if model_class.__tablename__ in self.AUTH_MODELS:
            return self.auth_engine      # Auth DB for PII models
        return self.content_engine       # Content DB for everything else
```
In desktop mode, all models use the same SQLite engine. In cloud mode, auth-related models are routed to the Auth DB and content models to the Content DB.
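The routing rule can be exercised with stand-in engines and model classes. The following is a self-contained restatement for demonstration (the constructor and string "engines" are invented for the example, not the production class):

```python
from enum import Enum

class DeploymentMode(Enum):
    DESKTOP = "desktop"
    CLOUD = "cloud"

class DatabaseRouter:
    AUTH_MODELS = {'tenants', 'users', 'subscriptions', 'api_keys',
                   'oauth_tokens', 'audit_logs', 'revoked_tokens',
                   'usage_metrics', 'stripe_events'}

    def __init__(self, mode, default_engine=None, auth_engine=None, content_engine=None):
        self.mode = mode
        self.default_engine = default_engine
        self.auth_engine = auth_engine
        self.content_engine = content_engine

    def get_engine(self, model_class):
        if self.mode == DeploymentMode.DESKTOP:
            return self.default_engine
        if model_class.__tablename__ in self.AUTH_MODELS:
            return self.auth_engine
        return self.content_engine

# Stand-in model classes carrying only a __tablename__
class User:   __tablename__ = 'users'
class Entity: __tablename__ = 'entities'

router = DatabaseRouter(DeploymentMode.CLOUD, auth_engine="AUTH", content_engine="CONTENT")
assert router.get_engine(User) == "AUTH"       # PII model -> Auth DB
assert router.get_engine(Entity) == "CONTENT"  # content model -> Content DB
```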
Schema Parity
The same SQLAlchemy models work across all modes. The differences are minimal:
| Feature | Desktop (SQLite) | Self-Hosted (PostgreSQL) | Cloud (Split PostgreSQL) |
|---|---|---|---|
| All content tables | Yes (tenant_id='local') | Yes | Yes (Content DB + RLS) |
| Users table | Yes (single user) | Yes | Yes (Auth DB) |
| Tenants table | No | Yes | Yes (Auth DB) |
| Subscriptions | No | No | Yes (Auth DB) |
| Audit logs | No | Yes | Yes (Auth DB) |
| RLS policies | No | No | Yes |
Database Users
Follow the principle of least privilege. Each service gets a dedicated database user with only the permissions it needs.
| DB User | Instance | Permissions | Used By |
|---|---|---|---|
| `studiobrain_auth` | Auth DB | Full CRUD on auth schema | App backend (auth layer) |
| `studiobrain_auth_ro` | Auth DB | SELECT only | Admin dashboard (future) |
| `studiobrain_app` | Content DB | Full CRUD with RLS | App backend (content layer) |
| `studiobrain_ai_ro` | Content DB | SELECT only with RLS | AI service (read-only) |
| `studiobrain_sync` | Content DB | INSERT/UPDATE with RLS | Sync protocol endpoints |
Creating Database Users
```sql
-- Auth DB
CREATE USER studiobrain_auth WITH PASSWORD 'secure_password';
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO studiobrain_auth;

-- Content DB (read-write for app)
CREATE USER studiobrain_app WITH PASSWORD 'secure_password';
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO studiobrain_app;

-- Content DB (read-only for AI service)
CREATE USER studiobrain_ai_ro WITH PASSWORD 'secure_password';
GRANT SELECT ON ALL TABLES IN SCHEMA public TO studiobrain_ai_ro;
```
Alembic Migrations
StudioBrain uses Alembic for database schema migrations. The migration system is deployment-mode-aware: it applies different changes based on whether the target is SQLite, self-hosted PostgreSQL, or cloud PostgreSQL.
Generating a Migration
```bash
# Enter the backend container
docker exec -it studiobrain-backend bash

# Generate a new migration
cd /app/backend
alembic revision --autogenerate -m "add_new_field_to_entities"
```
Reviewing a Migration
Always review the generated migration before applying:
```bash
# View the migration file
cat alembic/versions/xxxx_add_new_field_to_entities.py
```
Check that:
- New columns have sensible defaults
- Indexes are added for frequently queried columns
- Cloud-only changes are wrapped in deployment mode checks
- The `downgrade()` function correctly reverses all changes
Applying Migrations
```bash
# Apply all pending migrations
alembic upgrade head

# Apply to a specific revision
alembic upgrade abc123

# Check current state
alembic current
```
Rolling Back
```bash
# Roll back one migration
alembic downgrade -1

# Roll back to a specific revision
alembic downgrade abc123
```
Mode-Aware Migrations
Migrations can conditionally apply changes based on deployment mode:
```python
import sqlalchemy as sa
from alembic import op
from config import get_deployment_mode, DeploymentMode

def upgrade():
    mode = get_deployment_mode()

    # Applied in ALL modes (including SQLite)
    op.add_column('entities',
        sa.Column('new_field', sa.String(), server_default=''))

    # Applied only in cloud mode (PostgreSQL with RLS)
    if mode == DeploymentMode.CLOUD:
        op.execute('ALTER TABLE entities ENABLE ROW LEVEL SECURITY')
        op.execute("""
            CREATE POLICY tenant_isolation ON entities
            USING (tenant_id = current_setting('app.current_tenant', true)::uuid)
        """)
```
Backup and Restore
Auth Database
Daily full backup:
```bash
pg_dump -h auth-host -U studiobrain_auth -d studiobrain_auth \
  -F c -f /backups/auth/auth_$(date +%Y%m%d_%H%M%S).dump
```
Point-in-time recovery (PITR) setup:
```ini
# postgresql.conf on Auth DB host
wal_level = replica
archive_mode = on
archive_command = 'cp %p /backups/auth/wal/%f'
```
Restore from full backup:
```bash
pg_restore -h auth-host -U postgres -d studiobrain_auth \
  --clean --if-exists /backups/auth/auth_20260224.dump
```
Content Database
Daily full backup:
```bash
pg_dump -h content-host -U studiobrain_app -d studiobrain_content \
  -F c -f /backups/content/content_$(date +%Y%m%d_%H%M%S).dump
```
Per-tenant export (for data portability):
```bash
# pg_dump cannot filter rows, so export one tenant's entities via COPY
psql -h content-host -U studiobrain_app -d studiobrain_content \
  -c "\copy (SELECT * FROM entities WHERE tenant_id = 'tenant-uuid') TO '/backups/tenant_export.csv' CSV HEADER"
```
Qdrant
```bash
# Create a full-storage snapshot
curl -X POST "http://qdrant-host:6333/snapshots"

# List snapshots
curl "http://qdrant-host:6333/snapshots"

# Restore a single collection from a snapshot (full-storage snapshots
# are restored offline at startup, not over HTTP)
curl -X PUT "http://qdrant-host:6333/collections/{collection_name}/snapshots/recover" \
  -H "Content-Type: application/json" \
  -d '{"location": "file:///qdrant/snapshots/{snapshot_name}"}'
```
Redis
```bash
# Trigger snapshot
redis-cli -h redis-host -a password BGSAVE

# Copy RDB file
scp redis-host:/var/lib/redis/dump.rdb /backups/redis/redis_$(date +%Y%m%d).rdb
```
SQLite (Desktop)
```bash
# Online backup (no downtime)
sqlite3 /data/db/city_brains.db ".backup '/backups/city_brains_$(date +%Y%m%d).db'"
```
Backup Strategy Summary
| Service | Frequency | Method | Retention | RPO | RTO |
|---|---|---|---|---|---|
| Auth DB | Continuous + daily full | WAL archiving + pg_dump | 90 days | 0 (WAL) | 15 min |
| Content DB | Hourly WAL + daily full | WAL archiving + pg_dump | 30 days | 0 (WAL) | 15 min |
| Qdrant | Daily | Snapshot API | 7 days | 24h | 30 min |
| Redis | Hourly | RDB snapshot | 24 hours | 1h | 5 min |
| Content files | Hourly | ZFS snapshots or rsync | 30 days | 1h | Instant |
The Auth DB has the most aggressive backup strategy because it is the hardest to rebuild. The Content DB can, in principle, be rebuilt from users' local markdown files via the sync protocol.
Connection Strings Reference
Format
| Database | Format |
|---|---|
| SQLite | sqlite:////absolute/path/to/database.db |
| PostgreSQL | postgresql://user:password@host:5432/database |
| PostgreSQL (SSL) | postgresql://user:password@host:5432/database?sslmode=require |
| Qdrant | http://host:6333 |
| Redis | redis://user:password@host:6379/db_number |
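All of these URL formats can be parsed with the Python standard library, which is handy for config validation at startup. The function name is illustrative, and the password is deliberately left out of the result so it never reaches logs:

```python
from urllib.parse import urlsplit

def describe_dsn(dsn: str) -> dict:
    """Break a connection string into its parts (password omitted)."""
    u = urlsplit(dsn)
    return {
        "scheme": u.scheme,
        "host": u.hostname,
        "port": u.port,
        "database": u.path.lstrip("/"),
    }

print(describe_dsn("postgresql://studiobrain_app:pw@content-host:5432/studiobrain_content"))
# {'scheme': 'postgresql', 'host': 'content-host', 'port': 5432, 'database': 'studiobrain_content'}
```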
Service-to-Database Mapping
| Service | Auth DB | Content DB | Qdrant | Redis |
|---|---|---|---|---|
| App Backend | Read/Write | Read/Write | Read/Write | Read/Write (DB 0) |
| AI Service | BLOCKED | Read-Only | Read/Write | Read/Write (DB 1) |
| Admin Dashboard | Read-Only | --- | --- | --- |
Scaling
Year 1 Sizing (2,400 users, ~600 tenants)
| Service | Storage | RAM | Notes |
|---|---|---|---|
| Auth DB | 10 GB | 2 GB | ~515 MB data, mostly audit logs |
| Content DB | 50 GB | 4 GB | ~1.2 GB data for 240 paid tenants |
| Qdrant | 20 GB | 8 GB | ~5.5 GB vectors + indexes |
| Redis | 5 GB | 2 GB | ~25 MB active data |
Connection Pooling
For deployments beyond 500 concurrent connections, add PgBouncer in front of PostgreSQL:
```ini
# pgbouncer.ini
[databases]
studiobrain_content = host=content-host port=5432 dbname=studiobrain_content

[pgbouncer]
listen_port = 6432
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25
```
Update the connection string to point to PgBouncer:
```
CONTENT_DATABASE_URL=postgresql://studiobrain_app:password@pgbouncer-host:6432/studiobrain_content
```
Scaling Path
| Scale | Auth DB | Content DB | Qdrant | Redis |
|---|---|---|---|---|
| Year 1 (2,400 users) | Single instance, 10 GB | Single instance, 50 GB | Single instance, 20 GB | Single instance, 2 GB |
| Year 2 (10K users) | Add read replica | Add PgBouncer, consider Citus | Add replica for read scaling | Redis Sentinel (HA) |
| Year 3 (50K+ users) | Dedicated HA cluster | Citus distributed by tenant_id | Qdrant distributed mode | Redis Cluster |
| Enterprise tenant | Stays shared (small data) | Dedicated instance option | Dedicated Qdrant node | Shared (per-tenant namespace) |
Qdrant Sharding
As the vector count grows, Qdrant supports horizontal scaling:
```bash
# Create a distributed collection with 3 shards
curl -X PUT "http://qdrant-host:6333/collections/tenant_collection" \
  -H "Content-Type: application/json" \
  -d '{
        "vectors": {"size": 384, "distance": "Cosine"},
        "shard_number": 3,
        "replication_factor": 2
      }'
```
PostgreSQL Read Replicas
For read-heavy workloads (particularly the AI service reading entity content), add streaming replicas:
```ini
# primary postgresql.conf
wal_level = replica
max_wal_senders = 3

# replica postgresql.conf (PostgreSQL 12+; create an empty
# standby.signal file instead of the old recovery.conf)
primary_conninfo = 'host=primary-host port=5432 user=replicator password=password'
```
Point the AI service's read-only connection to the replica:
```
CONTENT_DATABASE_URL=postgresql://studiobrain_ai_ro:password@replica-host:5432/studiobrain_content
```
Desktop-to-Cloud Migration
Users who start on the desktop (SQLite) tier can migrate to cloud (PostgreSQL) when they upgrade their plan.
Migration Flow
1. User upgrades to Indie or Team plan via Stripe checkout.
2. A tenant record is created in the Auth DB.
3. The desktop app prompts: “Import your local project to the cloud?”
4. Desktop sends the SQLite database to `POST /api/migrate/import`.
5. Backend reads SQLite entities and rewrites `tenant_id` from `'local'` to the new tenant UUID.
6. Entities are inserted into the Content DB with RLS.
7. Qdrant embeddings are generated from entity content.
8. Desktop switches to cloud sync mode.
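The tenant_id rewrite in the flow above is the core of the import. A minimal sketch of that step (the helper name is hypothetical; in practice this runs over rows read from the uploaded SQLite file):

```python
LOCAL_TENANT_ID = "local"

def rewrite_tenant(rows, new_tenant_id):
    """Rewrite tenant_id from the desktop sentinel to the tenant's UUID."""
    migrated = []
    for row in rows:
        r = dict(row)                        # copy; leave the source rows intact
        if r.get("tenant_id") == LOCAL_TENANT_ID:
            r["tenant_id"] = new_tenant_id
        migrated.append(r)
    return migrated

rows = [{"id": 1, "tenant_id": "local", "name": "Ayla"}]
print(rewrite_tenant(rows, "7c9e6679-7425-40de-944b-e07fc1f90ae7"))
# [{'id': 1, 'tenant_id': '7c9e6679-7425-40de-944b-e07fc1f90ae7', 'name': 'Ayla'}]
```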
Reverse Export
Cloud data can be exported back to SQLite for offline use:
```bash
curl -H "Authorization: Bearer $TOKEN" \
  "http://localhost:8201/api/migrate/export" \
  -o studiobrain_export.db
```
This creates a standalone SQLite database with all tenant entities, with tenant_id rewritten to 'local' for desktop compatibility.