For years, we’ve believed in the power of ORMs to accelerate application development, especially in the early stages of most SaaS applications. The speed and ease of refactoring and making quick changes generally trumps the need for the fine-tuned performance that can be achieved with raw SQL or NoSQL queries.

Most developers learn to use an ORM or two - be it Hibernate, Prisma, TypeORM, the Django ORM, or Active Record - but never get the chance to dive into how they work under the hood. In this series of blog posts, we’ll outline the architectural patterns and design choices made by each framework, along with the pros and cons of each approach.

What is an ORM?

An Object-Relational Mapper (ORM) is a programming technique and tool that creates a bridge between object-oriented programming languages and relational databases. It translates data between incompatible type systems - the objects in your application code and the tables in your database. The ORM handles the underlying SQL or NoSQL generation and execution. For SQL databases, this means mapping database tables to classes, rows to objects, and columns to object attributes. For NoSQL databases, this can mean many things. ORMs like Mongoid for Ruby map MongoDB collections to classes, and MongoDB documents to Ruby objects. These are sometimes referred to as ODMs (Object-Document Mappers).

Frequently when first developing an application, you’ll want to do many common operations like creating, reading, updating, and deleting records that represent objects in your system, and ORMs can help quickly scaffold this functionality. They also often come with features like connection pooling, secure parameterization, caching, and lazy loading out of the box. More mature applications may want to spend time developing very specific implementations of these features, but ORMs often provide enough functionality out of the box to keep developers focused on application business logic rather than technical nuances.
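To make the mapping concrete, here is a minimal sketch in plain Python and sqlite3 (not any real ORM; the table and names are hypothetical) of what an ORM is doing for you on every CRUD call: translating between rows and objects, and parameterizing every query.

```python
import sqlite3

class User:
    """Plain object the application works with; the ORM maps it to a row."""
    def __init__(self, id, email):
        self.id = id
        self.email = email

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE)")

# Create: parameterized SQL, never string interpolation
cur = conn.execute("INSERT INTO users (email) VALUES (?)", ("a@example.org",))
user_id = cur.lastrowid

# Read: translate the row back into an object
row = conn.execute("SELECT id, email FROM users WHERE id = ?", (user_id,)).fetchone()
user = User(*row)

# Update and delete follow the same translate-and-parameterize shape
conn.execute("UPDATE users SET email = ? WHERE id = ?", ("b@example.org", user.id))
conn.execute("DELETE FROM users WHERE id = ?", (user.id,))
```

An ORM scaffolds exactly this boilerplate for every model, which is why it saves so much time early in a project.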

Model Definition Patterns

Object-based Model Definition

Django uses a very explicit configuration-based approach, while Rails’ Active Record takes a very Ruby-esque convention-over-configuration-based approach.

# Django
class User(models.Model):
    email = models.EmailField(unique=True)

    class Meta:
        db_table = 'users'

# Rails / Active Record
class User < ApplicationRecord
  validates :email, presence: true, uniqueness: true
end
# Automatically maps to 'users' table, assumes 'id' primary key
bin/rails generate model User first_name:string last_name:string email:string

For Active Record, a developer uses the CLI to generate migrations, which are applied sequentially and tracked via a metadata table (a very common pattern for migration frameworks). In the Ruby code, the User class declares no first_name or last_name attribute; attributes are inferred at runtime from the database schema that the migrations produce. The class only declares an application-level validation, run in Ruby on the email field before it is written to the database. This convention-based approach is extremely terse, and great for smaller applications where you can keep track of the objects and their structure in your head. Additionally, many IDEs and tools can introspect the database structure and migration files to provide autocomplete and typing hints.
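The metadata-table pattern both frameworks use for tracking migrations can be sketched in a few lines of Python and sqlite3. Everything here is hypothetical and simplified - real frameworks load migrations from files and add rollback, locking, and dependency handling - but the core bookkeeping looks like this:

```python
import sqlite3

# Hypothetical migrations, keyed by version, in order; real frameworks load these from files.
MIGRATIONS = {
    "20240101000000_create_users":
        "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)",
    "20240102000000_add_name_to_users":
        "ALTER TABLE users ADD COLUMN first_name TEXT",
}

def migrate(conn):
    """Apply any not-yet-applied migrations, recording each in a metadata table."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_migrations (version TEXT PRIMARY KEY)")
    applied = {v for (v,) in conn.execute("SELECT version FROM schema_migrations")}
    for version, sql in MIGRATIONS.items():  # dicts preserve insertion order
        if version not in applied:
            conn.execute(sql)
            conn.execute("INSERT INTO schema_migrations (version) VALUES (?)", (version,))

conn = sqlite3.connect(":memory:")
migrate(conn)
migrate(conn)  # idempotent: already-applied versions are skipped
```

Because each applied version is recorded, every environment converges on the same schema no matter how far behind it starts.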

The User object is a subclass of ApplicationRecord, which is what provides the “magic” that makes it work.

Django treats the model itself as the source of truth, and migrations are generated from the model. It tracks migrations much like Active Record does, but the columns/fields exist as code on the model class itself. It is less terse, and (in my opinion) easier to manage in larger applications where you may not frequently work with the same set of models, or where you have dozens or hundreds of distinct models.

Both frameworks return “Active Record” objects, which have injected methods for interacting with the database. This allows a syntax like new_user.save() without going through an entity manager or session object.
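The save-on-the-object shape can be sketched in plain Python with sqlite3. All the names here are hypothetical, and real frameworks quote identifiers, pool connections, and build queries far more carefully - the point is only that persistence lives on the model instance via an inherited base class:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")

class ActiveRecordBase:
    """Base class that injects persistence; subclasses set _table and _fields."""
    def save(self):
        values = [getattr(self, f) for f in self._fields]
        if getattr(self, "id", None) is None:
            cols = ", ".join(self._fields)
            marks = ", ".join("?" for _ in self._fields)
            cur = conn.execute(
                f"INSERT INTO {self._table} ({cols}) VALUES ({marks})", values)
            self.id = cur.lastrowid
        else:
            sets = ", ".join(f"{f} = ?" for f in self._fields)
            conn.execute(
                f"UPDATE {self._table} SET {sets} WHERE id = ?", values + [self.id])

class User(ActiveRecordBase):
    _table, _fields = "users", ("email",)
    def __init__(self, email):
        self.email = email

new_user = User("a@example.org")
new_user.save()  # no session or entity manager in sight
```

The subclass contributes almost nothing beyond its field list, which is where the “magic” feel of ApplicationRecord and models.Model comes from.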

Decorator-based Model Definition

Java’s Hibernate and TypeScript’s TypeORM use annotations/decorators on plain classes to define model structure.

// TypeORM
@Entity("users")
export class User {
  @PrimaryGeneratedColumn()
  id: number
  
  @Column({ unique: true })
  email: string
}
// Hibernate
@Entity
@Table(name = "users")
public class User {
  @Id
  @GeneratedValue(strategy = GenerationType.IDENTITY)
  private Long id;
}

For these types of ORMs, the decorators register metadata about the classes at load time, which the runtime then uses to map them to tables. In TypeORM, querying the users table in our example yields entity instances with the columns as simple fields. Saving a modified object to the database does not use a user.save() call; instead you go through an EntityManager via manager.save(user). Hibernate follows a similar pattern with entity manager and session objects, e.g. session.persist(new User("example@example.org")).

Notably, TypeORM supports the “Active Record” pattern as well, if the models explicitly inherit from BaseEntity. The pattern shown above is known as the “Data Mapper” pattern.
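For contrast with the save-on-the-object shape, here is a minimal Data Mapper sketch in plain Python and sqlite3 (hypothetical names throughout): the model is a plain class that knows nothing about persistence, and a separate mapper owns all the SQL.

```python
import sqlite3

class User:
    """Plain domain object; knows nothing about the database."""
    def __init__(self, email, id=None):
        self.id = id
        self.email = email

class UserMapper:
    """All SQL lives here, keeping the domain model database-agnostic."""
    def __init__(self, conn):
        self.conn = conn
        conn.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, email TEXT)")

    def save(self, user):
        if user.id is None:
            cur = self.conn.execute("INSERT INTO users (email) VALUES (?)", (user.email,))
            user.id = cur.lastrowid
        else:
            self.conn.execute("UPDATE users SET email = ? WHERE id = ?", (user.email, user.id))

    def find(self, id):
        row = self.conn.execute("SELECT id, email FROM users WHERE id = ?", (id,)).fetchone()
        return User(row[1], id=row[0]) if row else None

mapper = UserMapper(sqlite3.connect(":memory:"))
user = User("a@example.org")
mapper.save(user)  # mapper.save(user), not user.save()
```

Because User has no reference to a connection, it can be constructed and unit-tested without a database at all - the coupling lives entirely in the mapper.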

Code Generation-based Model Definition

The JavaScript/TypeScript Prisma ORM uses a schema file in the Prisma Schema Language to define the schema, then uses a code generator to generate a type-safe client.

model User {
  id    String @id @default(cuid())
  email String @unique
  posts Post[]
}

This trades some of the run-time magic of frameworks like TypeORM or Rails for work done at build time, when the client is generated. It uses a “Data Mapper” style similar to TypeORM and Hibernate.

Data Patterns: Active Record vs. Data Mapper

We’ve touched on these two patterns already, but let’s dive deeper into the pros and cons.

The Active Record pattern tends to be easier to understand from the get-go. It’s rather simple: you can almost think of the objects in your code and the records in your database as the same thing, with little indirection or abstraction between the two. On the flip side, this means your application is highly coupled to your database. In medium-to-large projects, this frequently means your tests are heavily reliant on your database’s performance. This can be mitigated with creative use of other patterns like Service Objects, or with mocking in tests, but that can introduce unnatural architecture styles where you lose the simplicity you gained in the first place.

Data Mapper ORMs are thought of as more of a layer between your application and your database, keeping them explicitly separated. This can be great for complex scenarios where you store the data in a way that is fundamentally different from how you think about it. Closure tables (a pattern for storing hierarchical data) are a good example: your application code can be rather elegant, while your Data Mapper handles storing the data in a more complex shape that facilitates much faster querying. This kind of bridge between simplicity and performance is a common use of the pattern. On the other hand, as you might expect, the Data Mapper pattern is frequently more complex to implement.
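To illustrate that bridge, here is a minimal closure-table sketch in Python and sqlite3 (hypothetical schema and names): the application-facing API is just “add a child node,” while the mapper-side code maintains every ancestor-descendant pair so a whole subtree comes back in a single indexed query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE closure (ancestor INTEGER, descendant INTEGER, depth INTEGER)")

def add_node(name, parent_id=None):
    """Application-facing API stays simple; the closure rows are maintained here."""
    node_id = conn.execute("INSERT INTO nodes (name) VALUES (?)", (name,)).lastrowid
    # Every node is its own ancestor at depth 0.
    conn.execute("INSERT INTO closure VALUES (?, ?, 0)", (node_id, node_id))
    if parent_id is not None:
        # Copy every ancestor path of the parent, one level deeper.
        conn.execute(
            "INSERT INTO closure SELECT ancestor, ?, depth + 1 "
            "FROM closure WHERE descendant = ?",
            (node_id, parent_id))
    return node_id

def subtree(node_id):
    """Whole subtree in one query - no recursion at read time."""
    rows = conn.execute(
        "SELECT n.name FROM closure c JOIN nodes n ON n.id = c.descendant "
        "WHERE c.ancestor = ?", (node_id,))
    return sorted(name for (name,) in rows)

root = add_node("root")
child = add_node("child", root)
grandchild = add_node("grandchild", child)
```

The extra rows written on every insert are the price paid so that reads never have to walk the tree - exactly the sort of storage complexity a Data Mapper can hide behind an elegant domain model.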

I tend to prefer Active Record ORMs (note: I am a frequent user of Django and Rails) for most use cases because of their simplicity. In very specific cases, I will combine the approaches, choosing Active Record for most functionality and the Data Mapper pattern for specialized data with advanced performance or storage needs.

Conclusion

The various model definition patterns have pros and cons of their own, and they’re often tightly coupled to the underlying framework’s philosophy for data patterns. As with virtually everything in software engineering, there’s no one right answer; each choice is correct in a given context. Most of my work is on greenfield applications or small-to-medium-sized codebases where Active Record ORMs make the most sense, so I tend to use object-based model definitions. However, in some specific cases I find the Data Mapper pattern more elegant for certain classes of data problems, especially when using non-relational databases or representing complex data structures or queries.