The Hidden Cost of “Magic” Schemas
1 The allure of auto-generated schemas
Every engineer eventually meets the siren song of the “single source of truth.” DRY, the holy grail of software engineering: Don’t Repeat Yourself!
So why define models twice in a backend app, once for the database and once for the API? Why not auto-generate the definitions of one from the other?
I thought the same while building my recipe scanner app.
With a neat utility, I could turn my SQLAlchemy model into a Pydantic schema in one line:

```python
from pydantic_sqlalchemy import sqlalchemy_to_pydantic

RecipeOut = sqlalchemy_to_pydantic(RecipeORM)
```

It worked beautifully… until it didn’t.
What started as convenience quietly eroded my trust boundaries. (For a deeper explanation of trust boundaries, see my post Pydantic vs. Dataclass.)
A harmless migration broke the entire API:
- A new internal column appeared in the public OpenAPI docs.
- A new nullable database field caused UI validation failures.
- Even lazy-loaded relationships started firing extra queries during serialization.
Clear symptoms that the automation was not helping me but hollowing out my design.
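To make the first symptom concrete, here is a minimal sketch of how the leak happens (the column name is hypothetical):

```python
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base
from pydantic_sqlalchemy import sqlalchemy_to_pydantic

Base = declarative_base()

class RecipeORM(Base):
    __tablename__ = "recipes"
    id = Column(Integer, primary_key=True)
    title = Column(String, nullable=False)
    # A later migration adds an internal bookkeeping column...
    internal_cost_cents = Column(Integer, default=0)

# ...and the generated schema silently grows with it:
RecipeOut = sqlalchemy_to_pydantic(RecipeORM)
# RecipeOut now exposes id, title, AND internal_cost_cents
# in every response and in the public OpenAPI docs.
```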
For early prototypes this shortcut might be acceptable. But how do you know when to stop?
The rest of this post explores what to do instead.
2 Design the data flow of your application
2.1 The “walking-skeleton” mindset
Define the end-to-end data flow as one vertical slice through the system.
In our case: upload → pipeline → DB → JSON → UI.
1. **Vocabulary first.** List the nouns (“Recipe”, “OCR Block”, “Nutrition”) and their relationships.
2. **Minimal DTOs.** Hand-write just enough Pydantic schemas for that first slice.
3. **Stub everything else.** Fake OCR, canned LLM responses, in-memory cache.
4. **Ship & test.** When the UI renders, your contract is real.
5. **Iterate.** Replace stubs with real logic.
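To give a feel for steps 2 and 3, here is a sketch of the first slice’s hand-written DTO and a stubbed pipeline step (names and fields are illustrative, not the real app’s):

```python
from typing import List
from pydantic import BaseModel

# Minimal DTO: just enough fields for the UI to render a recipe card.
class RecipeRead(BaseModel):
    title: str
    ingredients: List[str]

# Stubbed OCR step: a canned response stands in for the real pipeline
# until the end-to-end contract is proven.
def run_ocr(image_bytes: bytes) -> RecipeRead:
    return RecipeRead(
        title="Pancakes",
        ingredients=["flour", "milk", "eggs"],
    )
```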
2.2 Controlled Divergence ≠ Incoherence
It’s healthy for the storage model and the API schema to diverge. What is needed for database indexing and searching might not even be necessary in your internal services, let alone any public API.
| Layer | Optimised For | Example |
|---|---|---|
| Database | Joins, constraints, indexes, archival columns | recipe_version_id, deleted_at, INT2 for tiny enums |
| API / Pipeline | Clear intent, validation, front-end friendliness | ingredients: List[str], camelCase, enum labels |
You can still auto-convert from the database object and drop every additional field: dto = RecipeRead.from_orm(orm_obj) (in Pydantic v2, RecipeRead.model_validate(orm_obj) with from_attributes enabled). Definitions stay coherent, but are not coupled. Or use model_dump() to write a JSON blob of the Pydantic model to the database, if you must store it.
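A minimal sketch of this controlled divergence, assuming Pydantic v2 (field names are illustrative):

```python
from pydantic import BaseModel, ConfigDict
from sqlalchemy import Column, DateTime, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

# Storage model: optimised for constraints, joins, and archival.
class RecipeORM(Base):
    __tablename__ = "recipes"
    id = Column(Integer, primary_key=True)
    title = Column(String, nullable=False)
    recipe_version_id = Column(Integer, nullable=False)
    deleted_at = Column(DateTime, nullable=True)  # soft delete

# API schema: optimised for intent; internal columns never leak.
class RecipeRead(BaseModel):
    model_config = ConfigDict(from_attributes=True)

    id: int
    title: str

# dto = RecipeRead.model_validate(orm_obj)
# recipe_version_id and deleted_at are simply dropped.
```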
2.3 Triplet Schemas: Create / Update / Read
I thought the triplet-schema approach was utter redundancy when I started developing my backend. Turns out it is not. You will be thankful for every Optional field. And it is nice not to fill in fake ids that the database overwrites on creation of a recipe; this removes clutter from your code.
```python
from typing import Optional

from pydantic import BaseModel

class RecipeCreate(BaseModel):
    title: str  # required
    calories: float

class RecipeUpdate(BaseModel):
    title: Optional[str] = None  # patch semantics
    calories: Optional[float] = None

class RecipeRead(BaseModel):
    id: int
    title: str
    calories: float
```

As for naming, pick a convention early and stick to it. Even if it feels verbose, teammates (and future you) will navigate the repo with ease.
Further Benefits:
- Make fields required later without breaking PATCH routes.
- Disallow mutation of the id field simply by omitting it in RecipeUpdate.
- OpenAPI docs become self-explanatory.
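For example, a PATCH handler can apply only the fields the client actually sent. This sketch assumes FastAPI and hypothetical get_recipe_or_404 / save persistence helpers:

```python
from fastapi import FastAPI

app = FastAPI()

@app.patch("/recipes/{recipe_id}", response_model=RecipeRead)
def update_recipe(recipe_id: int, payload: RecipeUpdate):
    recipe = get_recipe_or_404(recipe_id)  # hypothetical DB lookup
    # exclude_unset: only touch fields the client actually sent
    for field, value in payload.model_dump(exclude_unset=True).items():
        setattr(recipe, field, value)
    save(recipe)  # hypothetical persistence helper
    return recipe
```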
2.4 Wrapper DTOs Keep Lists Clean
Returning bare lists or multiple loose fields from your routes makes debugging in your browser harder and the response awkward to evolve.
Use a wrapper. Here is pagination of recipe search results:
```python
from typing import List

from pydantic import BaseModel

class RecipePage(BaseModel):
    items: List[RecipeRead]
    total: int
    skip: int
    limit: int
    has_more: bool
```

This pattern lets you extend the API response without disturbing the inner DTO. And of course there is no connection to the database schema.
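A usage sketch, assuming FastAPI and a hypothetical search_recipes query function:

```python
from fastapi import FastAPI

app = FastAPI()

@app.get("/recipes", response_model=RecipePage)
def list_recipes(skip: int = 0, limit: int = 20):
    items, total = search_recipes(skip=skip, limit=limit)  # hypothetical DB query
    return RecipePage(
        items=items,
        total=total,
        skip=skip,
        limit=limit,
        has_more=skip + len(items) < total,
    )
```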
3 Final Take-aways
Automatic models are a scaffold, not a contract. Use them only in prototypes.
Model the vocabulary first, ship a thin end-to-end slice, then iterate.
Embrace purposeful divergence between DB and API.
Redundant names and schema triplets pay dividends in clarity, validation, and future refactors.
Good systems age well not because they’re perfectly DRY, but because their trust boundaries are deliberate.
Writing your schemas by hand is an act of intent. You decide what is exposed in the API and what remains inside the backend.
Keep in mind: automation can help you move fast, but clarity is what helps you keep moving.
That’s the real tradeoff behind “auto-generated schemas”: not speed versus redundancy, but convenience versus comprehension.