The Hidden Cost of “Magic” Schemas
1 The allure of auto-generated schemas
Every engineer eventually meets the siren song of the “single source of truth.” DRY, the holy grail of software engineering: Don’t Repeat Yourself!
So why define models twice in a backend app, once for the database and once for the API? Why not auto-generate the definitions of one from the other?
I thought the same while building my recipe scanner app.
With a neat utility, I could turn my SQLAlchemy model into a Pydantic schema in one line:

```python
from pydantic_sqlalchemy import sqlalchemy_to_pydantic

RecipeOut = sqlalchemy_to_pydantic(RecipeORM)
```

It worked beautifully… until it didn’t.
What started as convenience quietly eroded my trust boundaries. (For a deeper explanation of trust boundaries, see my post Pydantic vs. Dataclass.)
A harmless migration broke the entire API:
- A new internal column appeared in the public OpenAPI docs.
- A new nullable database field caused UI validation failures.
- Even lazy-loaded relationships started firing extra queries during serialization.
Clear symptoms that the automation was not helping me but hollowing out my design.
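To make the first symptom concrete, here is a minimal sketch of how the leak happens (the column name is hypothetical):

```python
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base
from pydantic_sqlalchemy import sqlalchemy_to_pydantic

Base = declarative_base()

class RecipeORM(Base):
    __tablename__ = "recipes"
    id = Column(Integer, primary_key=True)
    title = Column(String, nullable=False)
    # A later migration adds an internal bookkeeping column...
    internal_cost_cents = Column(Integer, default=0)

# ...and the generated schema silently grows with it:
RecipeOut = sqlalchemy_to_pydantic(RecipeORM)
# RecipeOut now exposes id, title, AND internal_cost_cents
# in every response and in the public OpenAPI docs.
```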
For early prototypes this shortcut might be acceptable. But how do you know when to stop?
The rest of this post explores what to do instead.
2 Design the data flow of your application
2.1 The “walking-skeleton” mindset
Define the end-to-end data flow as one vertical slice through the system.
In our case: upload → pipeline → DB → JSON → UI.
1. **Vocabulary first.** List the nouns (“Recipe”, “OCR Block”, “Nutrition”) and their relationships.
2. **Minimal DTOs.** Hand-write just enough Pydantic schemas for that first slice.
3. **Stub everything else.** Fake OCR, canned LLM responses, in-memory cache.
4. **Ship & test.** When the UI renders, your contract is real.
5. **Iterate.** Replace stubs with real logic.
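To give a feel for steps 2 and 3, here is a sketch of the first slice’s hand-written DTO and a stubbed pipeline step (names and fields are illustrative, not the real app’s):

```python
from typing import List
from pydantic import BaseModel

# Minimal DTO: just enough fields for the UI to render a recipe card.
class RecipeRead(BaseModel):
    title: str
    ingredients: List[str]

# Stubbed OCR step: a canned response stands in for the real pipeline
# until the end-to-end contract is proven.
def run_ocr(image_bytes: bytes) -> RecipeRead:
    return RecipeRead(
        title="Pancakes",
        ingredients=["flour", "milk", "eggs"],
    )
```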
2.2 Controlled Divergence ≠ Incoherence
It’s healthy for the storage model and the API schema to diverge. What is needed for database indexing and searching might not even be necessary in your internal services, let alone any public API.
| Layer | Optimised For | Example |
|---|---|---|
| Database | Joins, constraints, indexes, archival columns | recipe_version_id, deleted_at, INT2 for tiny enums |
| API / Pipeline | Clear intent, validation, front-end friendliness | ingredients: List[str], camelCase, enum labels |
You can still auto-convert from the database object and drop every additional field: dto = RecipeRead.from_orm(orm_obj) (in Pydantic v2, RecipeRead.model_validate(orm_obj) with from_attributes enabled). Definitions stay coherent, but are not coupled. Or use model_dump() to write a JSON blob of the Pydantic model to the database, if you must store it.
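A minimal sketch of this controlled divergence, assuming Pydantic v2 (field names are illustrative):

```python
from pydantic import BaseModel, ConfigDict
from sqlalchemy import Column, DateTime, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

# Storage model: optimised for constraints, joins, and archival.
class RecipeORM(Base):
    __tablename__ = "recipes"
    id = Column(Integer, primary_key=True)
    title = Column(String, nullable=False)
    recipe_version_id = Column(Integer, nullable=False)
    deleted_at = Column(DateTime, nullable=True)  # soft delete

# API schema: optimised for intent; internal columns never leak.
class RecipeRead(BaseModel):
    model_config = ConfigDict(from_attributes=True)

    id: int
    title: str

# dto = RecipeRead.model_validate(orm_obj)
# recipe_version_id and deleted_at are simply dropped.
```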
2.3 Triplet Schemas: Create / Update / Read
I thought the triplet-schema approach was utter redundancy when I started developing my backend. Turns out it is not. You will be thankful for every Optional field. And it is nice not to fill in fake ids that the database overwrites on creation of a recipe; this removes clutter from your code.
```python
from typing import Optional

from pydantic import BaseModel

class RecipeCreate(BaseModel):
    title: str  # required
    calories: float

class RecipeUpdate(BaseModel):
    title: Optional[str] = None  # patch semantics
    calories: Optional[float] = None

class RecipeRead(BaseModel):
    id: int
    title: str
    calories: float
```

As for naming, pick a convention early and stick to it. Even if it feels verbose, teammates (and future you) will navigate the repo with ease.
Further Benefits:
- Make fields required later without breaking PATCH routes.
- Disallow mutation of the id field simply by omitting it in RecipeUpdate.
- OpenAPI docs become self-explanatory.
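For example, a PATCH handler can apply only the fields the client actually sent. This sketch assumes FastAPI and hypothetical get_recipe_or_404 / save persistence helpers:

```python
from fastapi import FastAPI

app = FastAPI()

@app.patch("/recipes/{recipe_id}", response_model=RecipeRead)
def update_recipe(recipe_id: int, payload: RecipeUpdate):
    recipe = get_recipe_or_404(recipe_id)  # hypothetical DB lookup
    # exclude_unset: only touch fields the client actually sent
    for field, value in payload.model_dump(exclude_unset=True).items():
        setattr(recipe, field, value)
    save(recipe)  # hypothetical persistence helper
    return recipe
```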
2.4 Wrapper DTOs Keep Lists Clean
Returning bare lists or multiple loose fields from your routes makes debugging in your browser harder and the response awkward to evolve.
Use a wrapper. Here is pagination of recipe search results:
```python
from typing import List

from pydantic import BaseModel

class RecipePage(BaseModel):
    items: List[RecipeRead]
    total: int
    skip: int
    limit: int
    has_more: bool
```

This pattern lets you extend the API response without disturbing the inner DTO. And of course there is no connection to the database schema.
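A usage sketch, assuming FastAPI and a hypothetical search_recipes query function:

```python
from fastapi import FastAPI

app = FastAPI()

@app.get("/recipes", response_model=RecipePage)
def list_recipes(skip: int = 0, limit: int = 20):
    items, total = search_recipes(skip=skip, limit=limit)  # hypothetical DB query
    return RecipePage(
        items=items,
        total=total,
        skip=skip,
        limit=limit,
        has_more=skip + len(items) < total,
    )
```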
3 Final Take-aways
Automatic models are a scaffold, not a contract. Use them only in prototypes.
Model the vocabulary first, ship a thin end-to-end slice, then iterate.
Embrace purposeful divergence between DB and API.
Redundant names and schema triplets pay dividends in clarity, validation, and future refactors.
Good systems age well not because they’re perfectly DRY, but because their trust boundaries are deliberate.
Writing your schemas by hand is an act of intent. You decide what is exposed in the API and what remains inside the backend.
Keep in mind: automation can help you move fast, but clarity is what helps you keep moving.
That’s the real tradeoff behind “auto-generated schemas”: not speed versus redundancy, but convenience versus comprehension.