```mermaid
flowchart TD
    A([Start]) --> B{Will external input<br/>populate it?}
    B -->|Yes| P1[Pydantic]
    B -->|No| C{Does it cross a thread,<br/>process or service boundary?}
    C -->|Yes| P1[Pydantic]
    C -->|No| D{Instantiated in a tight loop<br/>thousands of times?}
    D -->|Yes| D1[Dataclass]
    D -->|No| E{Need brainy behaviour:<br/>caching, math ops, etc.?}
    E -->|Yes| CL[Plain class]
    E -->|No| F{Need OpenAPI or<br/>JSON Schema?}
    F -->|Yes| P1[Pydantic]
    F -->|No| CL[Plain class]
```
When to use Pydantic in your app?
“BaseModel or just slap a @dataclass on it?” If that debate keeps popping up in code reviews, this post is for you.
1 A tale of two classes
1.1 The balance between structure and speed
AI agents, LLM API calls and the overall microservice architecture of modern AI applications pass data over many interfaces. The real challenge behind all this is not speed, but managing complexity and structure.
Too much validation at the interfaces and your system drags. Too little, and a single malformed entry can blow everything up.
1.2 A debugging nightmare
I recently started building a recipe-digitization platform for cookbooks.
The data of a single “recipe” object snakes through:
- The frontend app.
- FastAPI (request body validation).
- A LangGraph pipeline that does OCR, LLM classification, nutrition maths.
- PostgreSQL for storage.
- Retrieval from storage.
- Another LangGraph pipeline for recommendations and question answering.
That data crosses six trust boundaries. One malformed field and the whole chain can break. Without proper testing or custom exceptions it will not even be clear where the chain broke.
Imagine the frontend sends "calories": "three hundred" instead of 300.
FastAPI happily passes it along, the LangGraph pipeline classifies it, your nutrition module tries to sum up the numbers — and somewhere deep inside, Python raises TypeError: unsupported operand type(s) for +: 'int' and 'str'.
By then the stack trace is six layers deep and no one remembers which hop introduced the bad data.
1.3 Why Pydantic matters
Interface contracts are where Pydantic shines: every hop can call model_validate(), so invalid or missing fields are caught at every link.
With an ordinary class you’d need a minefield of if not isinstance() checks, or hope that the error explodes somewhere downstream and that you have good exception handling in place.
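A minimal sketch of that boundary check, using Pydantic v2 (the Recipe model and its fields are hypothetical, chosen to match the calories example above):

```python
from pydantic import BaseModel, ValidationError

class Recipe(BaseModel):
    title: str
    calories: int

# A numeric string like "300" is coerced; "three hundred" fails fast
try:
    Recipe.model_validate({"title": "Stew", "calories": "three hundred"})
except ValidationError as exc:
    print(exc)  # the error message names the offending field: calories
```

Instead of a TypeError six layers deep, the very first hop reports exactly which field is malformed.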
1.4 When speed wins
But if your code runs a tight loop (see also this post) that mutates ten thousand bounding boxes per second, the extra validation overhead is waste you can’t afford.
That’s when a plain @dataclass is better.
1.5 The decision that shapes your system
So how do you decide when to pick which? The decision is usually made early in development, as switching later comes at a cost.
Read on for my guide to Pydantic vs. dataclass.
2 When Pydantic has the upper hand
2.1 Anything public
Public means “untrusted”: browser forms, mobile apps, webhook payloads, micro-service messages.
A rule of thumb: If humans or foreign code can hit an endpoint, wrap the payload in Pydantic.
Pydantic will:
- Coerce "42" into int(42) and reject "forty-two".
- Produce gorgeous, self-updating OpenAPI docs for your front-enders.
- Hand you a neat .model_dump() dict ready for JSON or logging.
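The first and third bullet in a minimal sketch (the Recipe model is a hypothetical example):

```python
from pydantic import BaseModel

class Recipe(BaseModel):
    title: str
    calories: int

r = Recipe(title="Stew", calories="42")  # "42" is coerced to int(42)
print(r.model_dump())                    # a plain dict, ready for JSON or logging
```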
2.2 Pipes, DAGs and other data hurricanes
In a LangGraph or Airflow style pipeline, data hops from node to node.
A single corrupted value will keep blowing up later nodes, and it can be hard to track down the root cause.
Declaring each node’s input and output as Pydantic models formalizes the interface contract. If the contract is broken, a validation error is raised immediately.
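One way such a node contract could look; the node names and fields are invented for illustration, and the classifier logic is a stand-in for a real LLM call:

```python
from pydantic import BaseModel

class OcrOutput(BaseModel):         # contract: what the OCR node emits
    raw_text: str

class ClassifierOutput(BaseModel):  # contract: what the classifier node emits
    dish_type: str
    confidence: float

def classify(payload: dict) -> ClassifierOutput:
    data = OcrOutput.model_validate(payload)  # breaks loudly if upstream misbehaved
    # Stand-in for an LLM classification call
    label = "dessert" if "sugar" in data.raw_text else "main"
    return ClassifierOutput(dish_type=label, confidence=0.9)
```

If the OCR node ever emits something other than the agreed shape, the error surfaces at this node's entrance rather than three hops later.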
2.3 Anything you need to document
If your product requires documented module interfaces, maintaining them by hand in docstrings is wasted effort that drifts out of date.
Pydantic (especially inside FastAPI) auto-publishes a living JSON schema. Your docs automatically stay aligned with reality; no code-docs schism.
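The schema is always one method call away (Recipe here is a hypothetical model):

```python
from pydantic import BaseModel

class Recipe(BaseModel):
    title: str
    calories: int

schema = Recipe.model_json_schema()
print(schema["properties"])  # field names and JSON types, straight from the model
```

FastAPI calls this under the hood to build the OpenAPI page, so renaming a field updates the docs in the same commit.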
3 When a dataclass wins
3.1 Performance-critical inner loops
Parsing and validating thousands of objects per second costs CPU.
If you’re:
- Transforming video frames
- Running physics simulations
- Streaming sensor data
…then a bare @dataclass is often 10-15× faster to instantiate. You already trust the numbers; only documenting the interface contract now takes more manual effort.
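You can measure this on your own machine with a quick sketch (the Box type and the iteration count are arbitrary; absolute numbers will vary by hardware and Pydantic version):

```python
from dataclasses import dataclass
from timeit import timeit

@dataclass
class Box:
    x: float
    y: float
    w: float
    h: float

# Rough, machine-dependent timing; swap in an equivalent BaseModel to compare
t = timeit(lambda: Box(1.0, 2.0, 3.0, 4.0), number=100_000)
print(f"dataclass: {t:.3f}s for 100k instantiations")
```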
3.2 Immutable configuration
App-level settings—API keys, directory paths—are loaded once at boot. After that they never change.
A frozen @dataclass is lighter. You can still parse env files with tools like python-dotenv.
3.3 Library code that must stay dependency-free
If you publish a small library, dragging in Pydantic’s dependency tree is overkill for users.
Plain classes keep your footprint tiny and import times fast.
3.4 Behaviour-heavy domain objects
The more your class behaves like an actor (lots of methods, cached properties, custom __eq__), the less benefit you get from declarative validation. Pydantic is just additional formalism with little added value.
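For instance, a behaviour-heavy domain object might look like this sketch (the Nutrition class and its Atwater-factor maths are an invented example):

```python
from functools import cached_property

class Nutrition:
    def __init__(self, protein_g: float, fat_g: float, carbs_g: float):
        self.protein_g = protein_g
        self.fat_g = fat_g
        self.carbs_g = carbs_g

    @cached_property
    def calories(self) -> float:
        # Atwater factors: 4 kcal/g protein, 9 kcal/g fat, 4 kcal/g carbs
        return 4 * self.protein_g + 9 * self.fat_g + 4 * self.carbs_g

    def __eq__(self, other: object) -> bool:
        # Domain equality: same energy content, not same field values
        return isinstance(other, Nutrition) and self.calories == other.calories
```

Nothing here needs validation; the value of the class lies in its behaviour, which a plain class expresses most directly.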
4 Practical tips for your project
4.1 A hybrid pattern that scales
From my research, I found that most large back-ends settle on three layers:
| Layer | Typical choice | Rationale |
|---|---|---|
| Edge / API | Pydantic | Strong validation + OpenAPI |
| Core logic | Mix & match | Pydantic where data hops, dataclass for hot loops |
| Persistence | SQLAlchemy models | Mirrors DB; cast to/from Pydantic with .model_validate() |
That means you can convert like so:
```python
dto = RecipeRead.model_validate(orm_obj, from_attributes=True)  # ORM → DTO
orm_obj.calories = dto.calories                                 # DTO → ORM
```
No reflection required, and the mapping is crystal-clear.
4.2 Naming endings save future tears
Pydantic models frequently back public APIs that support create, update and read operations.
Pick a convention on day one and never break it:
- RecipeCreate, RecipeUpdate, RecipeRead
- Or RecipeIn, RecipePatch, RecipeOut
- Wrappers like RecipePage, RecipeFilters
When I started doing this, I thought three nearly identical models were redundant. Yet when the tar pit of complexity tries to drag you in, you are glad for every stick you can reach; in other words, for having defined an Update class with all-optional fields.
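A minimal sketch of the trio, following the first naming convention above (the fields are hypothetical):

```python
from typing import Optional
from pydantic import BaseModel

class RecipeCreate(BaseModel):   # POST body: everything required
    title: str
    calories: int

class RecipeUpdate(BaseModel):   # PATCH body: everything optional
    title: Optional[str] = None
    calories: Optional[int] = None

class RecipeRead(RecipeCreate):  # response model: adds the server-side id
    id: int
```

The all-optional Update class is what lets a PATCH request change one field without resending the whole recipe.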
4.3 Decision Graph
You can use the flowchart at the top of this post to decide which to use.
5 The bottom line
- Pydantic is not just syntactic sugar; it is a contract, a validator, and a documentation generator baked into one.
- Dataclasses and plain classes are perfect for tight, trusted, logic-heavy cores.
- Large projects thrive when they mix them intentionally: Pydantic at the boundaries, dataclasses for heavy lifting.
The real skill isn’t memorizing which decorator to use; it’s learning to see your system as a chain of trust boundaries.
Pydantic, dataclasses, or plain classes are just different ways of declaring that trust, what you verify, and where errors should surface.
In larger systems, this mindset scales beyond Python: enterprise teams in any language end up building similar constructs to formalize trust at their interfaces. The principle of trust boundaries endures.
In the end the same applies to how people work together: teams thrive on clear contracts and responsibilities.
Structure buys clarity, and clarity buys speed. Pydantic is one tool for executing this philosophy.