Why We Chose SQLite
We assumed PostgreSQL for Version 3. After a decade running Postgres in production, why would we even consider something else? We knew the tooling, the failure modes, the operational playbook. Postgres is the safe choice for good reason.
Then we looked at what "self-hosting Postgres" actually involves, compared it to what Litestream does for SQLite, and changed our minds. This post covers the decision, the architecture, the trade-offs, and why we'd make the same call again.
What Version 2 Runs On
Curling IO Version 2 runs on Rails backed by PostgreSQL on Crunchy Bridge, a fantastic managed Postgres service. It works. Crunchy Data handles backups, failover, and tuning. We don't think about shared_buffers or autovacuum settings. We don't run pgBackRest or schedule base backups. (Crunchy Data was acquired by Snowflake in 2025, which adds another reason to reduce our dependency on third-party managed services.)
The trade-off is cost, lock-in, and jurisdiction. Crunchy Bridge is AWS-only, so the app servers have to live on AWS too. That's the full stack dependent on a single US cloud provider. We're a 100% Canadian company and we'd rather keep our infrastructure closer to home, especially given the current political uncertainty south of the border.
For Version 3, we're moving to OVH, a French-owned provider with data centers in Quebec and Ontario. All data stays in Canada. Crunchy Bridge doesn't run on OVH, so we can't bring it along. Self-hosting Postgres on OVH would mean taking on everything Crunchy handles today: connection pooling, server tuning (shared_buffers, effective_cache_size, work_mem, max_connections), autovacuum monitoring, and the full backup story (pgBackRest, WAL archiving, scheduled base backups, tested restore procedures, monitoring backup freshness). That's a lot to take on for a small organization.
The Architecture: Database Per Sport
Version 3 uses a database-per-sport architecture. Each sport gets its own SQLite file with an identical schema:
db/
├── curling.db # All curling club data
├── pickleball.db # All pickleball club data
└── shared.db # Cross-sport reference data, jobs, email suppressions
Each sport database is multi-tenant. Every table with tenant data includes an org_id column, and every query filters by it. Adding a new sport means creating a new database file with the same schema. No cross-sport queries, no shared-database contention.
shared.db holds things that span sports: Canadian tax jurisdictions (17 rows covering GST, HST, PST, and QST rates by province), the background job queue, and email suppression lists from Postmark webhooks.
At startup, the server opens one connection per database and holds it for the lifetime of the app:
pub fn open(path: String) -> Result(sqlight.Connection, sqlight.Error) {
use conn <- result.try(sqlight.open(path))
use _ <- result.try(sqlight.exec("PRAGMA journal_mode=WAL;", conn))
use _ <- result.try(sqlight.exec("PRAGMA busy_timeout=5000;", conn))
use _ <- result.try(sqlight.exec("PRAGMA foreign_keys=ON;", conn))
Ok(conn)
}
Three PRAGMAs on every connection: WAL mode for concurrent reads, a 5-second busy timeout so writes queue instead of failing immediately, and foreign key enforcement (which SQLite disables by default).
One connection per database. No connection pool. No pool configuration. Each incoming HTTP request gets the sport-specific connection (resolved from the hostname) plus the shared connection.
Why In-Process SQLite
SQLite runs inside the application process. There's no socket, no TCP round-trip, no serialization between the app and the database. A query is a function call.
Our workload is primarily simple indexed lookups: single-row fetches by primary key, filtered lists by org_id, a handful of line items joined to an order. SQLite handles tens of thousands of write transactions per second, far beyond what 1,000+ curling clubs will generate. Even during peak registration or when a provincial championship is posting live scores, the load profile is overwhelmingly reads with occasional write bursts.
The cost follows from the architecture. No separate database server means no database hosting bill. The entire v3 stack runs on a single server for less than what managed Postgres alone costs on AWS.
WAL Mode and Concurrency
WAL (Write-Ahead Logging) is what makes SQLite viable for a concurrent web application. Without it, any write locks the entire database for both readers and writers. With WAL, readers proceed concurrently with writes. On the BEAM, where hundreds of lightweight processes might query simultaneously during a traffic spike, this is essential.
Writes are still serialized. One writer at a time, coordinated by a mutex in the SQLite library. If the write lock can't be acquired within the 5-second busy timeout, SQLite returns SQLITE_BUSY. With our read-heavy workload, write contention hasn't been a practical concern. This is why we use a separate database for background jobs. Job processing writes to shared.db, not the sport databases, so it never contends with registration queries. With SQLite's single-writer model, separating write-heavy workloads into different files is a real architectural consideration, not just organization.
Litestream: The Tipping Point
This is what tipped the scales.
SQLite is a file. Litestream watches that file and continuously replicates its WAL frames to S3-compatible object storage. For us, that's OVH Object Storage in two different geographic locations in Canada. Changes sync every 10 seconds to both.
The configuration is one YAML file:
dbs:
- path: /var/lib/curling-io/databases/curling.db
replicas:
- type: s3
bucket: xxxxx
endpoint: xxxxx
sync-interval: 10s
- type: s3
bucket: xxxxx
endpoint: xxxxx
sync-interval: 10s
Recovery is one command:
litestream restore -o /path/to/curling.db s3://xxxxx/curling.db
That produces a ready-to-use database file. Copy it to the right path, start the app.
Now compare that to self-hosting PostgreSQL backup and recovery:
- Install pgBackRest
- Configure a stanza, set
archive_modeandarchive_commandinpostgresql.conf - Schedule base backups via cron
- Monitor backup success, WAL retention, and disk usage
- Periodically test restores (because an untested backup isn't a backup)
- When disaster strikes: stop PostgreSQL, run
pgbackrest restore --type=time --target="...", wait for WAL replay, verify the recovery point, promote to primary
PostgreSQL gives you something Litestream doesn't: exact transaction-level point-in-time recovery. You can roll back to any specific transaction. Litestream gives you "latest snapshot plus WAL frames," which means a worst-case recovery gap of about 10 seconds. For our use case that's more than sufficient.
The operational gap is where it gets decisive. Litestream is one binary, one config file, near-zero ongoing maintenance. PostgreSQL backup is a practice, something you rehearse, monitor, and maintain over the life of the project. Even with perfect tooling, you can't reduce "restore a running server's state" to "download a file and start the app." That gap is architectural. It won't close with better Postgres tooling because the complexity is inherent to backing up a running server process.
The simplicity ceiling for SQLite backup is fundamentally higher than what's possible for any client-server database.
The SQLite Tax
Here's what we pay for that simplicity. Every item below is something PostgreSQL handles natively.
No Real Types
SQLite doesn't have booleans, dates, or enums. Booleans are stored as INTEGER 0/1. Dates are Unix epoch integers. Enums are TEXT strings.
Every boolean parameter needs a db.bool_to_int(value) call. Every boolean decoder needs a != 0 check. All date formatting, timezone conversion, and comparison lives in application code. There's no DATE_TRUNC, no INTERVAL, no AT TIME ZONE.
We use STRICT tables on everything, which enforces that values match declared column types (INTEGER, TEXT, REAL, BLOB). It's a free safety net for anything that bypasses the application layer, but it doesn't help with booleans, dates, or enums. Those stay as application-level concerns, but since Gleam and Parrot are the only source of writes and types are enforced there, it's manageable.
Limited ALTER TABLE
SQLite stores its schema as the original CREATE TABLE text, not as structured system catalogs like PostgreSQL. This means ALTER TABLE is minimal:
ADD COLUMN,RENAME COLUMN,DROP COLUMNall work- Changing a column's type, default, or
NOT NULLdoes not - Adding or modifying
CHECKorFOREIGN KEYconstraints does not - Adding
UNIQUEto an existing column does not
Changing a column post-launch requires a four-step dance: add the new column, copy the data, drop the old column, rename. PostgreSQL handles this with a single ALTER COLUMN.
No CHECK Constraints
We don't use CHECK constraints anywhere. This is a policy decision, and not one we're thrilled about.
CHECK constraints are embedded in the CREATE TABLE DDL string. Modifying one means a full table rebuild. Adding a new enum value shouldn't require that. So all validation lives in Gleam: union types for enums (with exhaustive pattern matching at compile time), validation functions for business logic, and let assert in decoders to crash fast if bad data gets in.
Write Serialization
As covered above, writes are serialized. In practice this barely matters because there's no network round-trip and no protocol overhead. A write is a function call that completes in microseconds. The lock is held for such a short window that writers rarely wait for each other at all. But it does rule out horizontal scaling to multiple app servers writing to the same database.
Single Node Only
SQLite is a file on local disk. Multiple app servers can't share it (NFS breaks SQLite's file locking and leads to corruption). We're locked into a single-server architecture. On a BEAM runtime that handles massive concurrency on one node, this is fine for well over 100,000 clubs. It's a hard ceiling if we ever need more, and if we do, that's a really good problem to have.
There are projects that add distributed replication to SQLite (LiteFS, rqlite, dqlite), but they add significant complexity: FUSE filesystems, consensus protocols, additional infrastructure. At that point you're fighting to make SQLite behave like Postgres, and you should probably just use Postgres.
NIF Risk
Our SQLite driver (esqlite) is an Erlang NIF - compiled C code that runs outside the BEAM scheduler's control. A long-running query blocks a scheduler thread. A segfault takes down the entire VM.
In practice, SQLite's C library is near-bulletproof, and our queries return in microseconds. We mitigate the risk by keeping all queries simple (indexed lookups, no full table scans) and pushing anything heavy to background jobs. But the risk surface exists.
SQL Dialect Gaps
Small things that add friction:
DELETE FROM t WHERE ... LIMIT 1doesn't work. You need a subquery:DELETE FROM t WHERE id = (SELECT id FROM t WHERE ... LIMIT 1).- No
LISTEN/NOTIFYfor real-time push from the database. Change notification goes through application-level pub/sub. NULLcomparisons needIS ?instead of= ?when the parameter can be NULL.
Why We Accept It
Every tax above is an application-level workaround. Type conversions, date formatting, schema evolution dances, validation in Gleam instead of CHECK constraints. None of them change the architecture. They're annoyances, not obstacles.
The things that would actually force a migration (write contention, multi-node scaling, database size) aren't happening at 1,000 clubs with a read-heavy workload on the BEAM.
When We'd Switch to PostgreSQL
Honestly, we don't see it happening anytime soon. Most of the limits people worry about with SQLite (database size, write throughput, busy errors) can be pushed further by upgrading the server. Vertical scaling is cheap and simple, and the BEAM makes excellent use of whatever hardware you give it.
The only real trigger would be needing multiple application servers, and that would mean we've seriously outgrown our predictions or miscalculated. If that happens, the migration path is mechanical, not architectural. Swap sqlight for gleam_pgo. Adjust SQL dialect across all query files: ? becomes $1, $2, ..., INTEGER PRIMARY KEY becomes GENERATED ALWAYS AS IDENTITY, booleans become real booleans, remove the DELETE subquery workarounds. Separate .db files map to PostgreSQL schemas within a single database. Business logic and request handlers stay the same.
What's Next
If SQLite hits a wall, we switch to Postgres. The exit strategy is defined and the migration is a dialect swap, not a rewrite.
But so far, Litestream plus in-process SQLite has been one of the best decisions in the v3 rewrite. And there's one more benefit worth its own post: because SQLite runs in-process, every test gets its own in-memory database. No shared state, no cleanup, no flaky tests. That's next.
This is Part 5 of the Curling IO Foundation series. Next up: test isolation for free with SQLite.