PostgreSQL Database Design Rules: Complete Guide with Examples
Ever had to fix a production problem at 3 a.m.? The real cause is often a database design choice made long ago. Those “we’ll fix it later” and “the app will handle it” decisions can blow up months later, causing downtime and corrupted data.
These 25 rules come from building systems that handle serious amounts of data. Think of it as a guide from the trenches, not a lecture. Each rule comes with a good and a bad example, so you can see the difference at a glance.
I. Basic Schema Stuff
1) Always use a surrogate primary key
Important: Really important.
Real-world keys (like email or username) can change. If they do, you’ll have to update millions of rows and every related foreign key. Surrogate keys (BIGSERIAL / UUID) are stable and won’t cause headaches.
-- ✅ Good:
CREATE TABLE users (
id BIGSERIAL PRIMARY KEY,
email TEXT NOT NULL UNIQUE,
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- ❌ Bad:
CREATE TABLE users (
email TEXT PRIMARY KEY -- This will cause problems down the road!
);
2) Every table needs created_at and updated_at
Important: Super important.
Without these, you can’t easily track what happened when. They’re crucial for debugging, audits, and moving data. Use TIMESTAMPTZ, not basic timestamps.
-- ✅ Good:
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
-- ❌ Bad:
-- No timestamp columns at all
-- Or TIMESTAMP without a timezone
3) Use TIMESTAMPTZ instead of TIMESTAMP
Important: Pretty important.
TIMESTAMP drops timezone info. Once you have servers or users in different timezones, your data gets confusing. TIMESTAMPTZ stores everything as UTC and converts on input and output, so timezones are handled correctly behind the scenes.
-- ✅ Good:
event_time TIMESTAMPTZ NOT NULL
-- ❌ Bad:
event_time TIMESTAMP -- Timezone info is lost!
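The same pitfall exists on the application side: a timezone-naive timestamp is ambiguous in exactly the way plain TIMESTAMP is. A small Python sketch of the difference (the datetime values are made up for illustration):

```python
from datetime import datetime, timezone

# Aware: carries an explicit offset -- the application-side analogue of TIMESTAMPTZ.
aware = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)

# Naive: no timezone attached -- the analogue of plain TIMESTAMP.
# Noon... but noon where?
naive = datetime(2025, 1, 1, 12, 0)

# Python refuses to even compare the two, because the ambiguity is real:
try:
    naive < aware
    mixable = True
except TypeError:
    mixable = False  # comparing naive and aware datetimes raises TypeError
```

Storing TIMESTAMPTZ and keeping application datetimes timezone-aware avoids this whole class of bugs.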
4) Use TEXT instead of VARCHAR(n)
Important: Okay, this one’s good to know.
In PostgreSQL, TEXT and VARCHAR perform the same. VARCHAR(n) just adds a length check that you’ll probably have to change later. If you need limits, use constraints.
-- ✅ Good:
name TEXT NOT NULL,
CONSTRAINT chk_name_len CHECK (length(name) <= 255)
-- ❌ Bad:
name VARCHAR(255) -- You'll probably need to change this later.
5) Use BIGINT / BIGSERIAL for IDs, not INT
Important: This could save you from a major problem.
INT only goes up to about 2.1 billion. Some companies have actually hit this limit! BIGINT gives you way more room to grow.
-- ✅ Good:
id BIGSERIAL PRIMARY KEY
-- ❌ Bad:
id SERIAL PRIMARY KEY -- A potential limit in the future
II. Relationships and Keys
6) Always use foreign keys
Important: Really important.
Without them, you’ll end up with orphaned data. Saying the app will handle it isn’t good enough. Bugs happen!
-- ✅ Good:
user_id BIGINT NOT NULL
REFERENCES users(id)
ON DELETE CASCADE
-- ❌ Bad:
user_id BIGINT -- The app will check it. (It won't always.)
7) Think carefully about ON DELETE
Important: This can prevent data loss or broken features.
The default is NO ACTION, which blocks deletions (like RESTRICT, but checked at the end of the statement). CASCADE deletes related data. SET NULL keeps the row but clears the link. Pick the wrong one, and you’ll either block operations or delete important data.
-- ✅ Good:
REFERENCES orders(id) ON DELETE CASCADE -- Delete order items when the order is removed.
REFERENCES users(id) ON DELETE SET NULL -- Keep the order, but clear the user if the user is deleted.
REFERENCES accounts(id) -- Don't allow deleting accounts that have orders.
-- ❌ Bad:
-- Not thinking about what happens when you delete something.
8) Use junction tables for many-to-many relationships
Important: Seriously, do this.
Don’t store M:N relations as comma-separated stuff. Junction tables let you index, join, and add constraints.
-- ✅ Good:
CREATE TABLE user_roles (
user_id BIGINT REFERENCES users(id),
role_id BIGINT REFERENCES roles(id),
granted_at TIMESTAMPTZ DEFAULT NOW(),
PRIMARY KEY (user_id, role_id)
);
-- ❌ Bad:
role_ids INTEGER[] -- Hard to join, no integrity
roles TEXT -- 'admin,editor' (yuck)
9) Index every foreign key column
Important: Helps performance a lot.
PostgreSQL doesn’t automatically make indexes for foreign keys. Without one, deleting or updating a parent row can be super slow.
-- ✅ Good:
CREATE INDEX idx_orders_user_id
ON orders(user_id);
-- ❌ Bad:
-- Foreign key without an index = slow queries
10) Use soft deletes for important data
Important: You’ll thank yourself later.
Hard deletes are forever. Add a deleted_at column and filter by it. Use partial indexes to keep things fast.
-- ✅ Good:
deleted_at TIMESTAMPTZ DEFAULT NULL;
CREATE INDEX idx_users_active
ON users(email)
WHERE deleted_at IS NULL;
-- ❌ Bad:
DELETE FROM users WHERE id = 42;
-- Gone forever! And no record of it.
III. Keeping Your Data Clean
11) Normalize first; denormalize only when needed
Start with a good, normalized design. Denormalizing (adding redundant data) can speed up reads, but it makes writes harder and can lead to inconsistent data. Document why you did it.
-- ✅ Good:
orders.user_id → users.id → users.email
-- Only denormalize after measuring:
-- Copied email into orders to speed billing report:
-- 800ms → 12ms on 50M rows
-- ❌ Bad:
-- Copying `user_name`, `user_email` everywhere because it's easier.
12) Default to NOT NULL; only use NULL on purpose
NULL causes tricky logic. NULL is not equal to anything, not even itself: comparisons with NULL return NULL, and aggregates silently skip NULL values. Every nullable column adds extra complexity.
-- ✅ Good:
status TEXT NOT NULL DEFAULT 'pending',
deleted_at TIMESTAMPTZ -- NULL means not deleted
-- ❌ Bad:
name TEXT, -- Nullable by accident
price NUMERIC -- Is NULL the same as 0? Who knows?
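These quirks are standard SQL, so they can be demonstrated with Python’s built-in sqlite3 module standing in for PostgreSQL (the NULL semantics shown behave the same in both):

```python
import sqlite3

db = sqlite3.connect(":memory:")

# NULL = NULL is NULL (surfaced as None in Python), not TRUE:
null_eq = db.execute("SELECT NULL = NULL").fetchone()[0]

db.execute("CREATE TABLE items (price REAL)")
db.executemany("INSERT INTO items VALUES (?)", [(10.0,), (None,), (20.0,)])

# Aggregates silently skip NULLs: 3 rows, but COUNT(price) sees only 2,
# and AVG divides by 2, not 3.
total_rows = db.execute("SELECT COUNT(*) FROM items").fetchone()[0]
priced_rows = db.execute("SELECT COUNT(price) FROM items").fetchone()[0]
avg_price = db.execute("SELECT AVG(price) FROM items").fetchone()[0]
```

That AVG result (15.0 instead of 10.0) is exactly the kind of silent surprise an accidentally nullable column produces.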
13) Use CHECK constraints for validation
App validation can be bypassed. Database constraints are the last line of defense.
-- ✅ Good:
CONSTRAINT chk_price_positive CHECK (price > 0),
CONSTRAINT chk_status_valid CHECK (status IN ('active','inactive','suspended'))
-- ❌ Bad:
-- The API validates it.
-- Meanwhile, the database has `price = -500` and `status = 'yolo'`
14) Use NUMERIC for money; never FLOAT/DOUBLE
Floating-point math isn’t exact. 0.1 + 0.2 comes out as 0.30000000000000004. Use NUMERIC(precision, scale) or store cents as BIGINT.
-- ✅ Good:
price NUMERIC(12,2) NOT NULL,
balance NUMERIC(15,2) NOT NULL
-- Or: price_cents BIGINT NOT NULL
-- ❌ Bad:
price FLOAT
price DOUBLE PRECISION
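The rounding error is easy to see in any language with binary floats; a quick Python check, with Decimal playing the role NUMERIC plays in the database:

```python
from decimal import Decimal

# Binary floats cannot represent 0.1 exactly, so errors creep in:
float_sum = 0.1 + 0.2          # 0.30000000000000004, not 0.3

# Decimal is exact for decimal fractions, like NUMERIC in PostgreSQL:
exact_sum = Decimal("0.1") + Decimal("0.2")

# Or sidestep fractions entirely and store integer cents (the BIGINT variant):
cents = 10 + 20
```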
15) Be careful with ENUMs – Use CHECK or lookup tables if needed
PostgreSQL ENUMs are hard to change. You can add values, but removing one isn’t supported without recreating the type. CHECK constraints or reference tables are more flexible.
-- ✅ Good:
status TEXT NOT NULL
CHECK (status IN ('draft','published'));
-- Or lookup table:
-- status_code TEXT REFERENCES statuses(code)
-- ❌ Bad:
CREATE TYPE status AS ENUM ('draft','published');
IV. Making Things Fast
16) Index your filters, joins, and sorts
Without indexes, PostgreSQL scans the whole table. On millions of rows, that’s the difference between milliseconds and minutes.
-- ✅ Good:
CREATE INDEX idx_orders_user_status
ON orders(user_id, status)
WHERE deleted_at IS NULL;
-- ❌ Bad:
-- We'll add indexes when it gets slow. (It will be slow, at 3 a.m.)
17) Use partial indexes when you only need a slice
If most rows are completed and you only query pending, a full index wastes space. Partial indexes are smaller and faster.
-- ✅ Good:
CREATE INDEX idx_orders_pending
ON orders(created_at)
WHERE status = 'pending';
-- ❌ Bad:
CREATE INDEX idx_orders_created
ON orders(created_at);
18) EXPLAIN ANALYZE before you deploy
Don’t guess about performance. EXPLAIN ANALYZE shows the query plan, row counts, and time. A Seq Scan on a big table is a bad sign.
-- ✅ Good:
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE user_id = 42 AND status = 'pending';
-- ❌ Bad:
-- Deploying because it worked on my machine with 10 rows.
19) Use connection pooling (PgBouncer)
Each PostgreSQL connection has real overhead: a dedicated backend process and its memory. Too many direct connections can exhaust the server. PgBouncer shares a small pool of server connections among many clients and prevents that.
-- ✅ Good:
App → PgBouncer (6432) → PostgreSQL
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20
-- ❌ Bad:
App → PostgreSQL directly (500+ connections)
-- And then the server runs out of memory.
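PgBouncer itself is configuration rather than code, but the core idea — many clients sharing a small, fixed set of server connections — fits in a few lines. This is an illustrative toy, not how PgBouncer works internally; `make_conn` here stands in for opening a real PostgreSQL connection:

```python
import queue
from contextlib import contextmanager

class TinyPool:
    """Toy connection pool: many clients share `size` connections."""

    def __init__(self, make_conn, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(make_conn())

    @contextmanager
    def connection(self):
        conn = self._idle.get()      # blocks if every connection is busy
        try:
            yield conn
        finally:
            self._idle.put(conn)     # returned to the pool, never closed

# Stand-in for a real factory such as psycopg.connect(...):
opened = []
pool = TinyPool(lambda: opened.append(1) or object(), size=3)

# 100 "clients" run, yet only 3 connections were ever opened:
for _ in range(100):
    with pool.connection():
        pass
```

The real thing adds timeouts, health checks, and transaction-level pooling, but the memory math is the same: 100 clients, 3 backends.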
V. Changes and Keeping Things Running
20) Plan column changes carefully
ALTER TABLE can lock tables for a while. The safe way: add column → fill in data → switch → remove old column.
-- ✅ Good:
ALTER TABLE users ADD COLUMN name_new TEXT; -- fast
-- Then fill in the data in small batches.
-- Update the app to use `name_new`.
-- Remove the old column later.
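The batched backfill step can be sketched with Python’s sqlite3 standing in for PostgreSQL — the pattern (small UPDATE batches with commits between them) is the same, though in production you would also pause between batches:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.executemany("INSERT INTO users (name) VALUES (?)",
               [(f"user{i}",) for i in range(1000)])

# Step 1: adding a nullable column is fast -- no table rewrite:
db.execute("ALTER TABLE users ADD COLUMN name_new TEXT")

# Step 2: backfill in small batches instead of one giant locking UPDATE:
BATCH = 100
while True:
    cur = db.execute(
        "UPDATE users SET name_new = name "
        "WHERE id IN (SELECT id FROM users WHERE name_new IS NULL LIMIT ?)",
        (BATCH,))
    db.commit()                      # release locks between batches
    if cur.rowcount == 0:
        break                        # nothing left to backfill

remaining = db.execute(
    "SELECT COUNT(*) FROM users WHERE name_new IS NULL").fetchone()[0]
```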
-- ❌ Bad:
ALTER TABLE users RENAME COLUMN name TO full_name;
-- The app breaks right away.
21) UUID v7 for distributed systems; BIGSERIAL for single-node
BIGSERIAL is small and fast. UUIDs are good when coordinating is hard (microservices, multi-region, offline data). UUIDv7 is better than v4 for sorting and indexing.
-- ✅ Good:
-- Single PostgreSQL :
id BIGSERIAL PRIMARY KEY
-- Distributed/microservices:
id UUID PRIMARY KEY DEFAULT gen_random_uuid(); -- v4
-- Or generate UUIDv7 in the application code.
-- ❌ Bad:
-- UUID v4 as the main key on big tables:
-- Random inserts = slow performance
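Since gen_random_uuid() only produces v4, UUIDv7 values are usually generated in the application. Here is a minimal sketch of the RFC 9562 layout (48-bit millisecond timestamp, then version and variant bits, then random bits); a real system would use a maintained library rather than this hand-rolled version:

```python
import os
import time
import uuid

def uuid7() -> uuid.UUID:
    """Sketch of a UUIDv7: time-ordered, so B-tree inserts stay local."""
    ts_ms = time.time_ns() // 1_000_000              # 48-bit Unix ms timestamp
    raw = bytearray(ts_ms.to_bytes(6, "big") + os.urandom(10))
    raw[6] = (raw[6] & 0x0F) | 0x70                  # set version = 7
    raw[8] = (raw[8] & 0x3F) | 0x80                  # set RFC variant
    return uuid.UUID(bytes=bytes(raw))

a = uuid7()
time.sleep(0.002)                                    # move to a later millisecond
b = uuid7()
# Unlike random v4, later IDs sort after earlier ones: a < b holds.
```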
22) Wrap operations in transactions
Without transactions, each statement commits on its own. If step 2 fails, your data is left half-updated.
-- ✅ Good:
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
-- ❌ Bad:
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
-- If it crashes here = money gone
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
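The atomicity guarantee can be demonstrated with Python’s sqlite3 standing in for PostgreSQL — the rollback behavior shown is the same, and the account data is made up:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
db.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 500), (2, 500)])
db.commit()

try:
    db.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 1")
    raise RuntimeError("simulated crash before the matching deposit")
    # The second UPDATE (balance + 100 WHERE id = 2) never runs.
except RuntimeError:
    db.rollback()        # the withdrawal from account 1 is undone too

balances = [row[0] for row in
            db.execute("SELECT balance FROM accounts ORDER BY id")]
# Both accounts are back at 500 -- no money vanished.
```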
23) Partition big tables (100M+ rows)
Partition by time range or by tenant. Queries that touch one partition don’t have to read the others, and maintenance becomes easier.
-- ✅ Good:
CREATE TABLE events (
id BIGSERIAL,
created_at TIMESTAMPTZ NOT NULL,
payload JSONB
) PARTITION BY RANGE (created_at);
CREATE TABLE events_2025_01
PARTITION OF events
FOR VALUES FROM ('2025-01-01')
TO ('2025-02-01');
-- ❌ Bad:
-- 500M rows in one table
-- Maintenance takes forever.
-- Every query is slow.
24) Store JSON as JSONB, not JSON or TEXT
JSONB is binary, indexable, and supports containment operators. JSON is stored as text and re-parsed on every access. TEXT gives you no structure at all.
-- ✅ Good:
metadata JSONB NOT NULL DEFAULT '{}';
CREATE INDEX idx_meta_gin
ON products USING GIN(metadata);
-- ❌ Bad:
metadata JSON -- Gets re-parsed
metadata TEXT -- No check
25) Use Row-Level Security (RLS) for multi-tenant things
A single missed WHERE clause can leak another tenant’s data. RLS enforces tenant isolation inside the database, so every query is filtered automatically.
-- ✅ Good:
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON documents
USING (tenant_id = current_setting('app.tenant_id'));
-- ❌ Bad:
-- `WHERE tenant_id = ?` in every query
-- If you miss it, you'll have leaks.
Naming Tips
Be consistent with names!
- Tables: plural, snake_case → users, order_items
- Primary keys: always id, as BIGSERIAL or UUID
- Foreign keys: {singular_table}_id → user_id, order_id
- Indexes: idx_{table}_{columns} → idx_users_email
- Constraints: chk_{table}_{desc} / uq_{table}_{cols}
- Timestamps: created_at + updated_at everywhere; always TIMESTAMPTZ
Final Words
These rules are based on experience. The bad examples have caused real incidents.
Start by checking your design against the first five rules. You’ll probably find something worth fixing.
Save this guide and share it.
FAQ
1. Why shouldn’t I use email as a primary key in PostgreSQL?
Because real-world values like emails can change. Updating them means touching related foreign keys and potentially millions of rows. A surrogate key like BIGSERIAL or UUID avoids that risk.
2. Is TIMESTAMPTZ really better than TIMESTAMP?
Yes. TIMESTAMPTZ safely handles time zones and prevents data confusion when your app or servers run in different regions. TIMESTAMP can silently cause inconsistent time data.
3. When should I use BIGSERIAL vs UUID for IDs?
Use BIGSERIAL for single-node PostgreSQL setups where performance and simplicity matter. Use UUID (preferably v7) in distributed or microservice systems where coordination is harder.
4. Do I really need foreign keys if my app validates everything?
Yes. Application validation can fail or be bypassed. Foreign keys protect your data at the database level and prevent orphaned or inconsistent records.
5. When should I partition a PostgreSQL table?
Consider partitioning when a table reaches 100M+ rows or when queries consistently filter by time or tenant. Partitioning improves performance and simplifies maintenance.
Source: Umidjon Tojiboyev