Python coding standards for three AI agents and a human

Coding standards documents usually feel like bureaucracy — a list of rules that exists to exist, written by someone who felt strongly about snake_case and wanted everyone to know it.

Ours exists for a stranger reason: most of the authors writing code for this platform are not people. On any given day, three AI agents and one human are all writing Python across multiple microservices. That changes what a standards document is for. A human team converges on style through imitation and review — you absorb the house voice by reading the codebase and getting your PRs marked up. An agent doesn’t absorb anything; every session starts fresh, and whatever isn’t written down and mechanically checkable simply doesn’t exist for it. The inconsistencies that creep in between repos under those conditions aren’t trivial formatting differences. They’re maintenance hazards that compound, because nobody involved will ever get tired of introducing them.

So the standards below are not aesthetic preferences. They’re the subset of style we can state precisely enough to check — by a linter where possible, in review where not — and precisely enough for an agent to follow.

Type hints everywhere, in the modern syntax

Python’s type hint syntax has accumulated multiple ways to say the same thing. We standardize on the modern forms:

str | None, not Optional[str]
list[str], not List[str]
dict[str, int], not Dict[str, int]
tuple[str, ...], not Tuple[str, ...]

The union syntax (PEP 604) arrived in Python 3.10, which is our minimum supported version, and the built-in generics (PEP 585) arrived in 3.9. Optional and its typing-module siblings are still valid Python, but they’re idioms for an era we don’t target. One syntax means one pattern for every author to reproduce.

Functions that can return None say so: the signature def get_flag(key: str) -> Flag | None tells every caller exactly what the contract is. You might not get a flag back; handle accordingly. Functions that never return None don’t carry the | None annotation.

Google docstrings

We chose Google-style docstrings over NumPy and Sphinx styles because they’re readable without being rendered:

def get_flag(key: str, environment: str | None = None) -> Flag | None:
    """Retrieve a flag by key.

    Args:
        key: The flag's unique identifier within the account.
        environment: The environment to retrieve the flag for.
            Defaults to the client's configured environment.

    Returns:
        The flag object, or None if no flag with this key exists.

    Raises:
        AuthenticationError: If the API key is invalid.
    """

Docstrings are required on all public functions, classes, and modules; private functions benefit too, but we don’t enforce it. The Raises: section is the part we’re least flexible about — if a caller doesn’t know a function can raise AuthenticationError, they can’t write correct error handling, and no amount of reading the function signature will tell them.

One exception hierarchy

Every exception raised by smplkit code inherits from a base SmplkitError:

SmplkitError
├── ValidationError      # Bad input from the caller
├── NotFoundError        # Resource doesn't exist
├── ConflictError        # Resource state conflict (e.g., duplicate key)
├── AuthenticationError  # Invalid or missing credentials
├── AuthorizationError   # Authenticated but not permitted
└── ServiceError         # Internal service failures

The hierarchy lives in smplkit-core, where the smplcore.exceptions module is the single source of truth, and it earns its place in three ways. Catchability: except SmplkitError is right for a top-level handler that turns any domain failure into a JSON:API error response, while except ValidationError is right for a caller that treats input errors differently from auth errors. HTTP mapping: the FastAPI exception handler maps the hierarchy to status codes (ValidationError → 400, NotFoundError → 404, ConflictError → 409, AuthenticationError → 401, AuthorizationError → 403, ServiceError → 500) once, in one place, for every service. And searchability: grep -r "raise NotFoundError" src/ lists every place in the codebase that raises a 404. That’s a useful audit, and it only works because there’s exactly one way to say “not found.”
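A minimal, framework-free sketch of the hierarchy and its status-code table. The class names and codes come from the text; the STATUS_CODES table and status_for helper are hypothetical, standing in for the FastAPI exception handler the real services use. Looking codes up along the exception class’s MRO means a future subclass (say, a DuplicateKeyError under ConflictError) inherits its parent’s status without a new table entry.

```python
from __future__ import annotations


class SmplkitError(Exception):
    """Base class for every exception raised by smplkit code."""


class ValidationError(SmplkitError):
    """Bad input from the caller."""


class NotFoundError(SmplkitError):
    """Resource doesn't exist."""


class ConflictError(SmplkitError):
    """Resource state conflict (e.g., duplicate key)."""


class AuthenticationError(SmplkitError):
    """Invalid or missing credentials."""


class AuthorizationError(SmplkitError):
    """Authenticated but not permitted."""


class ServiceError(SmplkitError):
    """Internal service failures."""


# Hypothetical table mirroring the mapping described in the text.
STATUS_CODES: dict[type[SmplkitError], int] = {
    ValidationError: 400,
    AuthenticationError: 401,
    AuthorizationError: 403,
    NotFoundError: 404,
    ConflictError: 409,
    ServiceError: 500,
}


def status_for(exc: SmplkitError) -> int:
    """Return the HTTP status of the nearest mapped ancestor of exc's class."""
    for cls in type(exc).__mro__:
        if cls in STATUS_CODES:
            return STATUS_CODES[cls]
    return 500  # assumption: a bare SmplkitError is treated as an internal failure
```

Callers keep the choice of granularity: a top-level handler catches SmplkitError and calls status_for, while code that cares only about bad input writes except ValidationError.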

Naming: `FlagModel` vs `Flag`

SQLAlchemy models get a Model suffix (FlagModel, ConfigModel); Pydantic schemas get clean names (Flag, Config). A FlagModel is a database row — it has id, created_at, version, SQLAlchemy column types, and a place in ORM sessions. A Flag is the API representation, a pure Pydantic model used for serialization — often the same fields, but different semantics.

The payoff is in signatures: a function that takes a Flag is working with API data, and one that takes a FlagModel is working with database state, visibly, without anyone opening the type definition. For an agent generating code from a function signature, that visible distinction is the difference between guessing and knowing.
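A sketch of the boundary the naming makes visible. To stay self-contained it uses plain dataclasses as stand-ins: in the real codebase FlagModel would be a SQLAlchemy model and Flag a Pydantic schema, and the conversion would likely go through Pydantic rather than a hand-written function. The key and enabled fields and the to_api and apply_update helpers are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass
class FlagModel:
    """Stand-in for the SQLAlchemy row: storage identity plus bookkeeping."""

    id: int
    key: str
    enabled: bool
    created_at: datetime
    version: int


@dataclass(frozen=True)
class Flag:
    """Stand-in for the Pydantic schema: only what the API exposes."""

    key: str
    enabled: bool


def to_api(row: FlagModel) -> Flag:
    """Database state in, API data out; the signature says which is which."""
    return Flag(key=row.key, enabled=row.enabled)


def apply_update(row: FlagModel, incoming: Flag) -> FlagModel:
    """Apply API data to database state, bumping the row's version."""
    row.enabled = incoming.enabled
    row.version += 1
    return row
```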

`from __future__ import annotations`, everywhere

Every module starts with from __future__ import annotations. It’s the rule that causes the most initial confusion, so it gets its own section.

The import enables PEP 563 postponed evaluation: annotations are stored as strings instead of being evaluated at definition time, and anything that needs the real types resolves them later (typically through typing.get_type_hints). The practical win is clean forward references. Without it:

class Config:
    def parent(self) -> Config:  # NameError: Config isn't defined yet
        ...

With it:

from __future__ import annotations

class Config:
    def parent(self) -> Config:  # Fine — annotation is a string at this point
        ...

The caveat: postponed evaluation breaks code that reads __annotations__ at runtime and expects type objects rather than strings. Pydantic v1 had this problem; Pydantic v2, which is what we run, resolves the strings correctly. The result is that all type hints in smplkit code are strings at runtime: fine for static analysis, fine for Pydantic v2, and forward references work everywhere without quote marks.

What we deliberately don’t standardize

We have no opinion on formatting beyond what Ruff enforces: line length 100, Ruff’s isort-compatible import ordering, done. We don’t mandate an async style either. Handlers that do I/O are async def and purely computational ones are sync; FastAPI is happy with both, and the convention is pragmatic rather than dogmatic. And we don’t prescribe test structure beyond “tests live in tests/ and use pytest,” because over-specified test structure produces tests that conform to the structure without testing the right things. The requirement is coverage, not structure.

Enforcement

Two levels. Ruff handles formatting and mechanical style in CI, and non-conforming code doesn’t merge. The type hint and docstring standards are enforced in code review — a public function without a docstring or hints gets a comment before merge.

That split is worth noticing in the context this document started with: the rules a linter can check are rules an agent cannot drift from, no matter how many sessions it starts fresh in. Review catches the rest. Between the two, the codebase reads like it had one author — which, in a sense, it did.