Schema Conventions That Let AI Agents Write Correct Code

We use AI coding agents extensively at smplkit. Not as a novelty — as a core part of the development workflow. Claude Code writes implementation code, runs tests, monitors CI, and auto-remediates failures. This is fast when it works. When it doesn’t, the failure mode is usually the same: the agent makes a schema assumption that’s wrong.

A column named userId instead of user_id. A table named accounts instead of account. A VARCHAR(255) where we use TEXT. A missing JSONB data column. Each mistake requires a correction cycle — the agent produces code, it fails, the agent reads the error, it tries again. Multiply this across dozens of tables and hundreds of queries, and inconsistency becomes a serious productivity drag.

So we wrote down the rules. All of them. In enough detail that an AI agent — or a new human developer — can produce correct schema code on the first try.

The Conventions

These are the database schema conventions that govern every table in every smplkit service.

Naming

Table names are singular. account, not accounts. user, not users. A table holds rows, and each row represents one entity. account_user for the join table, not accounts_users.

Column names use snake_case. created_at, account_id, stripe_customer_id. No camelCase, no PascalCase, no abbreviations beyond the universal ones.

Constrained values use SCREAMING_SNAKE_CASE. Values like ACTIVE, OWNER, API_KEY, and MONTHLY are uppercase to visually distinguish them from freeform text. This is a convention enforced in application code, not a database-level enum — which means adding a new value never requires a migration.

Primary Keys

Every table — including join tables — has an id column as its primary key. UUIDs, generated by the application (uuid4()), with a DEFAULT gen_random_uuid() fallback in the database for raw SQL inserts.

Why UUIDs over auto-incrementing integers? They’re globally unique without coordination, safe to generate in application code before the database round-trip, and don’t leak information about table size or insertion order.

Why id on join tables? A composite primary key on (account_id, user_id) is theoretically sufficient, but a surrogate id simplifies ORM usage (SQLAlchemy works more naturally with single-column PKs), API resource identification (single-value URLs), and references from audit logs or activity feeds.

String Columns

All string columns use TEXT, not VARCHAR(n). PostgreSQL stores TEXT and VARCHAR(n) identically — the length parameter is just a check constraint. Length validation belongs in the application layer (Pydantic schemas) where it produces better error messages and doesn’t require a migration when limits change.

Standard Columns

Every non-join table gets these columns:

id (UUID PK), account_id (tenant scoping), created_at (timestamptz), updated_at (timestamptz, managed by SQLAlchemy onupdate), deleted_at (nullable timestamptz for soft delete), version (nullable integer for optimistic concurrency), and data (JSONB, defaults to '{}').

The deleted_at and version columns are present on every table even if the table doesn’t currently use soft deletes or optimistic concurrency. They default to NULL and impose zero overhead until opted into. The alternative — adding them later — requires a migration, which is exactly the kind of schema change that disrupts development flow.

The data column is our document-style escape hatch. Coming from document databases, we value the ability to store evolving fields without migrations. Fields that are experimental or unstable live in data. When they prove important, they get promoted to dedicated columns. The column starts as an empty JSON object and costs nothing until used.

Foreign Keys

Real foreign keys within a service boundary. The database enforces referential integrity for all relationships where both tables live in the same database.

Logical foreign keys across service boundaries. When a product service references an account_id from the app service’s database, the column exists and is always populated, but no FK constraint is created. Integrity is enforced by the provisioning and introspection contracts, not the database.

Every foreign key column gets an explicit index. PostgreSQL doesn’t auto-index foreign keys (unlike some databases), so this must be done manually.

Why This Matters for AI Agents

An AI coding agent writing a new SQLAlchemy model needs to make dozens of micro-decisions: table name format, column types, which standard columns to include, how to handle foreign keys, whether to use an enum type or text with application validation.

Without conventions, each decision is a guess. The agent might produce correct code or it might not. When it doesn’t, the correction cycle burns time.

With conventions, the decisions are predetermined. An agent that knows the conventions produces correct code on the first try because there are no ambiguous choices. “Should this string column be VARCHAR(100) or TEXT?” is not a question when the convention says TEXT always. “Should I add a data JSONB column?” is not a question when the convention says every non-join table gets one.

We’ve found that explicit, comprehensive conventions — the kind that feel unnecessarily detailed when you write them — are exactly what AI agents need. Humans can infer conventions from examples. Agents perform better with explicit rules.

The SQLAlchemy Integration

The DeclarativeBase subclass auto-derives table names from class names: Account maps to account, ProductInstance maps to product_instance. This eliminates the __tablename__ boilerplate on every model and ensures naming consistency.

SQLAlchemy model classes use a Model suffix: UserModel, AccountModel, EnvironmentModel. Pydantic schema classes use clean names: User, Account, Environment. This prevents name collisions between ORM models and API schemas — a common source of import confusion in FastAPI applications.

The Result

Since establishing these conventions, our schema-related error rate in AI-generated code has dropped dramatically. New tables come out right the first time. Queries use the correct column names. Migrations generate cleanly because the models follow predictable patterns.

The conventions also help human developers, but the AI productivity impact is more dramatic because agents make more schema decisions per hour than humans do.

smplkit’s database conventions are designed for both human and AI developers. Read more in the docs.