Why and How You Do Query Routing With Spring JPA

Jul 1, 2024 | 7 mins read

software-engineering spring-boot

Bunny — Photo by Lilian Velet on Unsplash

First thing first, before you expect anything from this article, this is NOT a general guide to route queries with any ORMs and/or DBMS. This article is focused to:

Spring Boot 2/3 (at least 2.0.1)
Spring JPA 2/3 + Hibernate 5/6
SQL Database, specifically PostgreSQL
So it’s only JDBC. (Spring) R2DBC may follow similar idea but different transaction propagation setup.

In short, this article is for non-reactive Spring Boot app which uses SQL database.
If you’re not using any of the above technologies, you may still gain something by reading.

This article assumes we follow a standard single primary-replication scheme, which should fit most applications that read a lot more data than they write. There are several advantages to separating write and read requests to different database server(s).

First, it improves database performance. Your writes won’t be impacted by your reads. We can also tune each database for its specific purpose.

Second, it improves scalability, easier to scale each database independently. This is particularly useful in situations where the application experiences high read traffic, but low write traffic, or vice versa. Although scaling write database can only be done vertically, scaling read database can be done horizontally by adding more replicas.

Moreover, separate read and write databases can improve fault tolerance.
In the event of write database outage, database system can promote any of the read databases to be the next write database while we restoring the original write database. This can minimize downtime and ensure that the application remains accessible to users.
In the case of read database failure, all read requests can be routed to write database while we restoring the read databases.

The idea

Suppose we have 2 database servers: one is primary - the read-write database, the other is secondary - the read-only database, a replica of the primary. Often, secondary is used only by viewer/reporting app, and also as a failover database when the primary is unavailable, while primary is highly utilized from high reads and writes from the main app.
If the secondary has low utilization, why not using it specifically for processing read-only requests, so primary can focus to handle only read-write requests.

To utilize both servers effectively, we can route every query to either primary or secondary. The idea is simple:

All read-write queries should be routed to primary database
All read-only queries should be routed to secondary database
When secondary fails, all read-only queries should be routed to primary database

The implementation

Following the diagram, the routing logic should be done at app/service level. To do the routing, we need to have at least two datasources: one for read-write, one for read-only. There’s a Spring Boot doc to set up two datasources which fits our requirement. We could simply follow and modify that.
With Postgres and HikariCP, for example:

The *.url property has 2 important points:

Both instances are registered as potential hosts of read-write and read-only data source. This is important if you want to set up connection fail-over.
For read-write the driver must only use primary database, while for read-only the driver should prioritize secondary database over primary database.

This simple connection fail-over is useful to create a high availability database installation.

In case of failure on the primary database, to have minimal downtime, the secondary database could be promoted to be a primary so no further write transaction will be lost.
In case of failure on the secondary database, because there’s no other secondary, the driver will automatically uses the primary database as the next read-only datasource.
If all of the servers are marked as “can’t connect” in the cache, then an attempt will be made to connect to all the hosts in the URL, in order. If none succeeds, a normal connection exception is thrown.

The configuration code is simple, only consists of 2 new classes:

1. Routing Data Source

The core transaction routing logic is quite straightforward. It’s using enum map to map the target datasource, with only two keys. To get the target data source, AbstractRoutingDataSource will determine current lookup key first which can simply be determined by checking if current transaction is read-only or not.

2. DataSource Bean Configuration

Nothing fancy, just configuring a primary data source from previous step that will route the query to either the read-write data source or the read-only data source.

Just that?

Of course! not. If it’s that straightforward, this article won’t exist, really.

TransactionRoutingDataSource needs the read-only flag to be set before acquiring the database connection. The problem is, if we set the readOnly or the isolationLevel attribute on the @Transactional annotation, Hibernate will fetch the database connection eagerly in order to set the read-only flag or the custom isolation level. The read-only flag is set after the connection is acquired, not before.

This means our TransactionRoutingDataSource#determineCurrentLookupKey() will wrongly return the read-write data source for all read-only transactions.
To better understand this, this is the problematic code in HibernateJpaDialect#beginTransaction(...):

If you’re using Spring 6.1.2+ or Spring Boot 3+, scroll down to the next section for a much simpler & better solution.

We need to make that nested if condition evaluates to false. There are two options:

1. Set the connection release mode to another value than ConnectionReleaseMode.ON_CLOSE
For example, by setting hibernate.connection.handling_mode to DELAYED_ACQUISITION_AND_RELEASE_AFTER_TRANSACTION. By default Spring uses DELAYED_ACQUISITION_AND_HOLD.

2. Set the prepareConnection property of the HibernateJpaDialect to false.
Fo example, by providing this custom BeanPostProcessor:

Personally I prefer the first option as it’s more flexible and less code than the latter.
In any case, both options come with a caveat: it will disable custom transaction isolation level support, only default isolation level will work.

Simpler solution for Spring 6.1.2+

Starting from Spring Boot 3.2.1 or Spring Data JPA 3.2.1 (specifically Spring 6.1.2), we can use LazyConnectionDataSourceProxy to eliminate the aforementioned issue. Even better, it makes the TransactionRoutingDataSource obsolete, not needed at all!

We just need to:

Create LazyConnectionDataSourceProxy instance with the read-write data source
Set the read-only data source
Set default auto-commit to false (see also “Suggestions” section below)
Configure and return as the primary data source

Example:

Suggestions

Disable auto-commit mode

When we are using read-write transaction, by default Hibernate will acquire connection eagerly to check and disable the auto-commit mode, making the connection to be held longer than actually needed. This is bad for long-term performance.
To improve on this, we can delay the connection acquisition for read-write transaction until Hibernate executes the first SQL statement.
To do this, we need to instruct the connection pool to disable the auto-commit mode when creating a new database connection and then tell Hibernate the same:

Don’t mix read-write and read-only transactions in a single request session

This is especially if you use connection handling mode that releases connections after each transaction ends (which is a good practice if you don’t need custom isolation level).

The reason is it may cause race condition in a tight cycle of write and read. From my experience, there’s a noticeable database replication delay - even though it’s in milliseconds, which may cause data inconsistencies if you read row X in a read-only transaction right after writing row X on a read-write transaction. This write->read cycle is usually common during bulk processing or parallel processing.

In a single request session, I recommend to either use only read-only transactions (for example, only for GET APIs), or only read-write transactions. If you use both, make sure the outermost transaction is a read-write transaction so all statements will always run on the primary database.

Set default transaction for methods to read-only

It’s a good practice to define @Transactional(readOnly = true) annotation at the class level and only override it for read-write methods. This way, we can make sure that read-only methods are executed by default on the secondary nodes. If we forget to add @Transactional annotation on a read-write method, we will get an exception since read-write transactions can only execute on the primary node.

That’s all Folks!

References

software-engineering spring-boot