Scaling a Spring application with a YugabyteDB cluster
source link: https://vladmihalcea.com/scaling-spring-yugabytedb-cluster/
Introduction
In this article, we are going to see that scaling the data access layer of a Spring application can be done very easily with a YugabyteDB cluster.
As I explained in this article, YugabyteDB is an open-source distributed SQL database that offers all the benefits of a typical relational database (e.g., SQL, strong consistency, ACID transactions) with the advantages of a globally-distributed auto-sharded database system (e.g., NoSQL document databases).
How to create a local YugabyteDB cluster
As I showed in this article, creating a single-node YugabyteDB Docker container is extremely easy.
Configuring a multi-node YugabyteDB cluster with Docker is just as easy.
First, we will create a Docker network that will be used by our YugabyteDB cluster:

    docker network create yugabyte-network

Afterward, we will create the first YugabyteDB node:

    docker run -d --name yugabyte-replica1 --net=yugabyte-network ^
      -p7001:7000 -p9000:9000 -p5433:5433 ^
      yugabytedb/yugabyte:latest bin/yugabyted start ^
      --base_dir=/tmp/yugabyte ^
      --daemon=false

The caret (^) is the line-continuation character of the Windows command prompt. I used it here so that the command continues on the next line instead of being executed at the end of the line. If you're using Linux or macOS, replace ^ with a backslash (\), which is the line-continuation character in Bash.
After creating the first node, we can add two more nodes so that we will end up with a three-node cluster:

    docker run -d --name yugabyte-replica2 --net=yugabyte-network ^
      yugabytedb/yugabyte:latest bin/yugabyted start ^
      --base_dir=/tmp/yugabyte ^
      --join=yugabyte-replica1 ^
      --daemon=false

    docker run -d --name yugabyte-replica3 --net=yugabyte-network ^
      yugabytedb/yugabyte:latest bin/yugabyted start ^
      --base_dir=/tmp/yugabyte ^
      --join=yugabyte-replica1 ^
      --daemon=false

That’s it!
If we open the YugabyteDB Admin web server UI, which, in our case, is available on port 7001 on localhost because of the -p7001:7000 parameter we provided to docker run when the yugabyte-replica1 node was created, we will see our newly created three-node cluster:
![YugabyteDB Cluster](https://vladmihalcea.com/wp-content/uploads/2023/01/YugabyteDBCluster.png)
Configuring Spring to use the YugabyteDB cluster
While you can use the PostgreSQL JDBC Driver to connect to a single-node YugabyteDB database, if you have a cluster of nodes, it’s much better to use the YugabyteDB-specific JDBC Driver, which you can get from Maven Central:

    <dependency>
        <groupId>com.yugabyte</groupId>
        <artifactId>jdbc-yugabytedb</artifactId>
        <version>${yugabytedb.version}</version>
    </dependency>

As I explained in this article, it’s very important to use a connection pool when you are connecting to any database system, and YugabyteDB is no different.
Therefore, our DataSource configuration is going to look as follows:

    @Bean(destroyMethod = "close")
    public DataSource dataSource() {
        // Cluster-aware DataSource provided by the YugabyteDB JDBC Driver
        YBClusterAwareDataSource dataSource = new YBClusterAwareDataSource();
        dataSource.setURL(url());
        dataSource.setUser(username());
        dataSource.setPassword(password());

        // HikariCP pools the physical connections opened by the driver
        HikariConfig hikariConfig = new HikariConfig();
        hikariConfig.setMaximumPoolSize(maxConnections);
        hikariConfig.setAutoCommit(false);
        hikariConfig.setDataSource(dataSource);
        return new HikariDataSource(hikariConfig);
    }

    private String url() {
        return String.format(
            "jdbc:yugabytedb://%s:%d/%s?load-balance=true",
            host,
            port,
            database
        );
    }

Notice that we are wrapping the YBClusterAwareDataSource into a HikariDataSource so that we can allow HikariCP to manage the YugabyteDB physical connections.
Spring batch processing task
To demonstrate how the YugabyteDB cluster works, let’s build a Spring batch processing task.
Consider we have a Post entity that’s mapped as follows:

    @Entity
    @Table(name = "post")
    public class Post {

        @Id
        @GeneratedValue
        private Long id;

        private String title;

        @Enumerated(EnumType.ORDINAL)
        private PostStatus status;

        public Long getId() {
            return id;
        }

        public Post setId(Long id) {
            this.id = id;
            return this;
        }

        public String getTitle() {
            return title;
        }

        public Post setTitle(String title) {
            this.title = title;
            return this;
        }

        public PostStatus getStatus() {
            return status;
        }

        public Post setStatus(PostStatus status) {
            this.status = status;
            return this;
        }
    }

We will define the PostRepository that will provide the data access methods for the Post entity:

    @Repository
    public interface PostRepository
        extends BaseJpaRepository<Post, Long> {

    }

We also need to provide the @EnableJpaRepositories configuration that instructs Spring where the Spring Data Repositories are located and the implementation of the BaseJpaRepository interface:

    @EnableJpaRepositories(
        value = "com.vladmihalcea.book.hpjp.spring.batch.repository",
        repositoryBaseClass = BaseJpaRepositoryImpl.class
    )

Note that the PostRepository extends the BaseJpaRepository from the Hypersistence Utils open-source project and not the default JpaRepository from Spring Data JPA.

While the JpaRepository is a popular choice, few Java developers are aware of its problems. Not only does it provide a save method that doesn’t always translate to the proper JPA operation, but inheriting the findAll method in every single Repository, even the ones that can have hundreds of millions of records in the associated database tables, is a terrible idea.
The PostRepository is used by the ForumService, which defines a createPosts method that can save a large volume of Post records using multiple threads and JDBC-level batching:

    @Service
    @Transactional(readOnly = true)
    public class ForumService {

        private static final Logger LOGGER = LoggerFactory.getLogger(
            ForumService.class
        );

        private static final ExecutorService executorService = Executors
            .newFixedThreadPool(
                Runtime.getRuntime().availableProcessors()
            );

        private final PostRepository postRepository;
        private final TransactionTemplate transactionTemplate;
        private final int batchProcessingSize;

        public ForumService(
                @Autowired PostRepository postRepository,
                @Autowired TransactionTemplate transactionTemplate,
                @Autowired int batchProcessingSize) {
            this.postRepository = postRepository;
            this.transactionTemplate = transactionTemplate;
            this.batchProcessingSize = batchProcessingSize;
        }

        @Transactional(propagation = Propagation.NEVER)
        public void createPosts(List<Post> posts) {
            CollectionUtils.spitInBatches(posts, batchProcessingSize)
                .map(postBatch -> executorService.submit(() -> {
                    try {
                        // Each batch is persisted in its own transaction
                        transactionTemplate.execute((status) ->
                            postRepository.persistAll(postBatch)
                        );
                    } catch (TransactionException e) {
                        LOGGER.error("Batch transaction failure", e);
                    }
                }))
                .forEach(future -> {
                    try {
                        future.get();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } catch (ExecutionException e) {
                        LOGGER.error("Batch execution failure", e);
                    }
                });
        }
    }

There are several aspects worth mentioning about the ForumService class:
- The createPosts method uses @Transactional(propagation = Propagation.NEVER) because transaction management is handled by each batch processing task.
- CollectionUtils.spitInBatches is used to split a List<T> into a Stream<List<T>> where each Stream element has at most batchProcessingSize elements.
- The Post entities are persisted using the persistAll method from the BaseJpaRepository.
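
The split-and-submit pattern described above can be sketched in plain Java, without Spring or Hibernate. This is only an illustration under stated assumptions: BatchSketch, splitInBatches, and processInParallel are hypothetical names that are not part of Hypersistence Utils, and the batch handler stands in for the transactional persistAll call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical helper class, for illustration only.
final class BatchSketch {

    // Splits the list into consecutive chunks of at most batchSize elements.
    static <T> List<List<T>> splitInBatches(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the sublist so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    // Submits every batch to a fixed thread pool, then waits for all of them.
    static <T> void processInParallel(List<T> items, int batchSize,
                                      int threads, Consumer<List<T>> batchHandler) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (List<T> batch : splitInBatches(items, batchSize)) {
                futures.add(pool.submit(() -> batchHandler.accept(batch)));
            }
            for (Future<?> future : futures) {
                try {
                    future.get();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("Batch failed", e.getCause());
                }
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            ids.add(i);
        }
        AtomicInteger processed = new AtomicInteger();
        // The handler stands in for a transactional persistAll call.
        processInParallel(ids, 4, 2, batch -> processed.addAndGet(batch.size()));
        System.out.println("Processed items: " + processed.get());
    }
}
```

The sketch collects every Future before calling get, so the batches can overlap on the thread pool. A sequential Stream pipeline is lazy, so waiting on each Future right after it is produced would process one batch at a time.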
Since Hibernate does not use JDBC-batching by default, we will set the following Hibernate configurations in the Java-based Spring Bean configuration:

    protected Properties additionalProperties() {
        Properties properties = new Properties();
        // Group statements into JDBC batches of the configured size
        properties.setProperty(
            "hibernate.jdbc.batch_size",
            String.valueOf(batchProcessingSize())
        );
        // Order statements so that more of them can share a batch
        properties.setProperty(
            "hibernate.order_inserts",
            "true"
        );
        properties.setProperty(
            "hibernate.order_updates",
            "true"
        );
        return properties;
    }

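
To see why ordering statements helps batching, here is a rough, self-contained sketch. It assumes a simplified model in which a JDBC batch can only hold consecutive statements that share the same SQL, up to the batch size. The InsertOrderingSketch class, its methods, and the statement strings are hypothetical, and sorting the strings alphabetically is only a stand-in for the way Hibernate groups statements by entity type.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of statement batching, for illustration only.
final class InsertOrderingSketch {

    // Counts the batches needed when a batch may only contain consecutive
    // statements with identical SQL and at most batchSize entries.
    static int countBatches(List<String> statements, int batchSize) {
        int batches = 0;
        int currentSize = 0;
        String currentSql = null;
        for (String sql : statements) {
            boolean sameSql = sql.equals(currentSql);
            if (!sameSql || currentSize == batchSize) {
                // A different statement or a full batch starts a new batch.
                batches++;
                currentSize = 0;
                currentSql = sql;
            }
            currentSize++;
        }
        return batches;
    }

    // Groups identical statements together by sorting a copy of the list.
    static List<String> ordered(List<String> statements) {
        List<String> sorted = new ArrayList<>(statements);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        List<String> mixed = List.of(
            "insert into post", "insert into comment",
            "insert into post", "insert into comment"
        );
        System.out.println("Unordered batches: " + countBatches(mixed, 10));
        System.out.println("Ordered batches: " + countBatches(ordered(mixed), 10));
    }
}
```

In this model, interleaved statements force a new batch at every switch, while grouped statements fill each batch before moving on.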
Testing time
When sending a large chunk of Post entities to the createPosts method:

    List<Post> posts = LongStream.rangeClosed(1, POST_COUNT)
        .mapToObj(postId -> new Post()
            .setTitle(
                String.format(
                    "High-Performance Java Persistence - Page %d",
                    postId
                )
            )
            .setStatus(PostStatus.PENDING)
        )
        .toList();

    forumService.createPosts(posts);

In the YugabyteDB Tablet Servers view, the Write ops/sec columns indicate that all three YugabyteDB nodes are used equally to save the post table records:
![YugabyteDB Cluster Tablet Servers Writes](https://vladmihalcea.com/wp-content/uploads/2023/01/YugabyteDBClusterTabletServersWrites.png)
So, the write operations are spread across all nodes, not just sent to a single Primary node, as is the case with the Single-Primary Replication strategy.
This is because YugabyteDB employs a shared-nothing architecture where each node is the primary for a given data subset. For instance, the first node can be the owner of the Post record with the id value of 1 and handle read and write operations associated with this record.
The other two nodes will keep a copy of this Post record with the id value of 1 for high-availability purposes. At the same time, the second node can be the owner of the Post record with the id value of 2, and the other two nodes will keep a copy of this particular record.
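
The ownership idea can be illustrated with a toy model. This sketch assumes a hypothetical placement rule, where the owner is derived from the id modulo the node count, purely to mirror the example above. YugabyteDB's actual sharding and replica placement work differently, and the ShardingSketch class and its methods are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of record ownership in a three-node cluster, for illustration only.
final class ShardingSketch {

    // Returns the zero-based index of the node that owns the record.
    // Hypothetical rule: id 1 maps to node 0, id 2 to node 1, and so on.
    static int ownerOf(long id, int nodeCount) {
        return (int) Math.floorMod(id - 1, (long) nodeCount);
    }

    // Returns the nodes that only keep a copy of the record.
    static List<Integer> replicasOf(long id, int nodeCount) {
        int owner = ownerOf(id, nodeCount);
        List<Integer> replicas = new ArrayList<>();
        for (int node = 0; node < nodeCount; node++) {
            if (node != owner) {
                replicas.add(node);
            }
        }
        return replicas;
    }

    public static void main(String[] args) {
        int nodeCount = 3;
        for (long id = 1; id <= 6; id++) {
            System.out.println("Post " + id
                + " owner node: " + ownerOf(id, nodeCount)
                + ", copies on nodes: " + replicasOf(id, nodeCount));
        }
    }
}
```

Because consecutive ids land on different owners in this model, a workload that touches many records ends up using every node.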
Therefore, the YugabyteDB cluster is not used only for writing. The three nodes are also used when executing read operations because the data is automatically sharded.
For instance, if we fetch 1000 Post entities:

    LongStream.rangeClosed(1, 1000)
        .boxed()
        .forEach(id -> assertNotNull(
            forumService.findById(id)
        ));

We can see in the Tablet Servers view that all nodes were used when fetching these records:
![YugabyteDB Cluster Tablet Servers Reads](https://vladmihalcea.com/wp-content/uploads/2023/01/YugabyteDBClusterTabletServersReads-1.png)
Cool, right?
Conclusion
Most often, a Spring application uses a Single-Primary replicated relational database since this has been the typical replication architecture employed by PostgreSQL, MySQL, Oracle, or SQL Server.
However, scaling a single-primary replicated database requires a read-write and read-only transaction routing strategy. For this reason, reads will be scaled horizontally on Replica nodes, while writes can only be scaled vertically on the Primary node.
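
As a rough idea of what such a routing strategy involves, here is a minimal plain-Java sketch. It assumes hypothetical node names and a simple round-robin choice among replicas. In a Spring application, this decision is typically made inside a routing DataSource such as Spring's AbstractRoutingDataSource rather than in a standalone class like the one below.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical transaction router for a single-primary setup, for illustration only.
final class RoutingSketch {

    private final String primary;
    private final List<String> replicas;
    private final AtomicInteger next = new AtomicInteger();

    RoutingSketch(String primary, List<String> replicas) {
        this.primary = primary;
        this.replicas = List.copyOf(replicas);
    }

    // Read-write transactions always go to the primary node.
    // Read-only transactions rotate across the replica nodes.
    String route(boolean readOnly) {
        if (!readOnly || replicas.isEmpty()) {
            return primary;
        }
        int index = Math.floorMod(next.getAndIncrement(), replicas.size());
        return replicas.get(index);
    }

    public static void main(String[] args) {
        RoutingSketch router = new RoutingSketch(
            "primary", List.of("replica-1", "replica-2")
        );
        System.out.println("Read-write goes to: " + router.route(false));
        System.out.println("Read-only goes to: " + router.route(true));
        System.out.println("Read-only goes to: " + router.route(true));
    }
}
```

No matter how many replicas are added, every read-write transaction in this model still lands on the same node, which is the limitation described above.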
A YugabyteDB cluster doesn’t have this limitation since it can scale both the read and write operations across the entire cluster, making it easier for you to scale a Spring application.