Scaling a Spring application with a YugabyteDB cluster

source link: https://vladmihalcea.com/scaling-spring-yugabytedb-cluster/

Last modified: Jan 24, 2023

Imagine having a tool that can automatically detect JPA and Hibernate performance issues. Wouldn’t that be just awesome?

Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, or Play Framework.

So, enjoy spending your time on the things you love rather than fixing performance issues in your production system on a Saturday night!

You can earn a significant passive income stream from promoting my book, courses, tools, training, or coaching subscriptions.

If you're interested in supplementing your income, then join my affiliate program.

Introduction

In this article, we are going to see that scaling the data access layer of a Spring application can be done very easily with a YugabyteDB cluster.

As I explained in this article, YugabyteDB is an open-source distributed SQL database that offers all the benefits of a typical relational database (e.g., SQL, strong consistency, ACID transactions) with the advantages of a globally-distributed auto-sharded database system (e.g., NoSQL document databases).

How to create a local YugabyteDB cluster

As I showed in this article, creating a single-node YugabyteDB Docker container is extremely easy.

Setting up a multi-node YugabyteDB cluster with Docker is just as easy.

First, we will create a Docker network that will be used by our YugabyteDB cluster:

docker network create yugabyte-network

Afterward, we will create the first YugabyteDB node:

docker run -d --name yugabyte-replica1 --net=yugabyte-network ^
-p7001:7000 -p9000:9000 -p5433:5433 ^
yugabytedb/yugabyte:latest bin/yugabyted start ^
--base_dir=/tmp/yugabyte ^
--daemon=false

The caret (^) I used here is not a bash operator; it's the Windows command-prompt line-continuation character, which tells the shell to continue the command on the next line instead of executing it when reaching the end of the line.

If you're using Linux or macOS, then replace ^ with the \ line-continuation character used by bash.

After creating the first node, we can add two more nodes so that we will end up with a three-node cluster:

docker run -d --name yugabyte-replica2 --net=yugabyte-network ^
yugabytedb/yugabyte:latest bin/yugabyted start ^
--base_dir=/tmp/yugabyte ^
--join=yugabyte-replica1 ^
--daemon=false
docker run -d --name yugabyte-replica3 --net=yugabyte-network ^
yugabytedb/yugabyte:latest bin/yugabyted start ^
--base_dir=/tmp/yugabyte ^
--join=yugabyte-replica1 ^
--daemon=false

That’s it!

If we open the YugabyteDB Admin web UI, which in our case is available on port 7001 on localhost because of the -p7001:7000 parameter we passed to docker run when creating the yugabyte-replica1 node, we can see our newly created three-node cluster:

YugabyteDB Cluster

Configuring Spring to use the YugabyteDB cluster

While you can use the PostgreSQL JDBC Driver to connect to a single-node YugabyteDB database, if you have a cluster of nodes, it’s much better to use the YugabyteDB-specific JDBC Driver, which you can get from Maven Central:

<dependency>
    <groupId>com.yugabyte</groupId>
    <artifactId>jdbc-yugabytedb</artifactId>
    <version>${yugabytedb.version}</version>
</dependency>

As I explained in this article, it’s very important to use a connection pool when you are connecting to any database system, and YugabyteDB is no different.

Therefore, our DataSource configuration is going to look as follows:

@Bean(destroyMethod = "close")
public DataSource dataSource() {
    YBClusterAwareDataSource dataSource = new YBClusterAwareDataSource();
    dataSource.setURL(url());
    dataSource.setUser(username());
    dataSource.setPassword(password());

    HikariConfig hikariConfig = new HikariConfig();
    hikariConfig.setMaximumPoolSize(maxConnections);
    hikariConfig.setAutoCommit(false);
    hikariConfig.setDataSource(dataSource);

    return new HikariDataSource(hikariConfig);
}

private String url() {
    return String.format(
        "jdbc:yugabytedb://%s:%d/%s?load-balance=true",
        host,
        port,
        database
    );
}

Notice that we are wrapping the YBClusterAwareDataSource into a HikariDataSource so that we can allow HikariCP to manage the YugabyteDB physical connections.
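The url(), username(), password(), and maxConnections values referenced above come from the surrounding configuration class, which is not shown here. A minimal sketch of how they could be supplied, assuming standard Spring @Value property injection and hypothetical property keys:

// Hypothetical property keys; adjust them to match your own application.properties.
// The defaults mirror the standard YugabyteDB YSQL settings (port 5433, yugabyte database/user).
@Value("${yugabyte.host:localhost}")
private String host;

@Value("${yugabyte.port:5433}")
private int port;

@Value("${yugabyte.database:yugabyte}")
private String database;

@Value("${yugabyte.username:yugabyte}")
private String username;

@Value("${yugabyte.password:yugabyte}")
private String password;

@Value("${yugabyte.maxConnections:16}")
private int maxConnections;

private String username() {
    return username;
}

private String password() {
    return password;
}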

Spring batch processing task

To demonstrate how the YugabyteDB cluster works, let’s build a Spring batch processing task.

Consider we have a Post entity that’s mapped as follows:

@Entity
@Table(name = "post")
public class Post {

    @Id
    @GeneratedValue
    private Long id;

    private String title;

    @Enumerated(EnumType.ORDINAL)
    private PostStatus status;

    public Long getId() {
        return id;
    }

    public Post setId(Long id) {
        this.id = id;
        return this;
    }

    public String getTitle() {
        return title;
    }

    public Post setTitle(String title) {
        this.title = title;
        return this;
    }

    public PostStatus getStatus() {
        return status;
    }

    public Post setStatus(PostStatus status) {
        this.status = status;
        return this;
    }
}
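The PostStatus enum referenced by the status field is not included in the listing above. Based on the PostStatus.PENDING value used later in the article, it might look like this (the other constants are assumptions for illustration):

// Sketch of the PostStatus enum; PENDING is used later in the article,
// the remaining constants are assumptions.
public enum PostStatus {
    PENDING,
    APPROVED,
    SPAM
}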

We will define the PostRepository that will provide the data access methods for the Post entity:

@Repository
public interface PostRepository extends BaseJpaRepository<Post, Long> {
}

We also need to provide the @EnableJpaRepositories configuration that instructs Spring where the Spring Data Repositories are located and the implementation of the BaseJpaRepository interface:

@EnableJpaRepositories(
    value = "com.vladmihalcea.book.hpjp.spring.batch.repository",
    repositoryBaseClass = BaseJpaRepositoryImpl.class
)

Note that the PostRepository extends the BaseJpaRepository from the Hypersistence Utils open-source project and not the default JpaRepository from Spring Data JPA.

While the JpaRepository is a popular choice, few Java developers are aware of its problems. Not only does it provide a save method that doesn't always translate to the proper JPA operation, but inheriting the findAll method in every single Repository, even those whose associated database tables can hold hundreds of millions of records, is a terrible idea.
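
To see why the save method can be misleading, this is roughly what Spring Data's default SimpleJpaRepository does when you call it (a simplified sketch, not the exact framework source):

// Simplified sketch of SimpleJpaRepository#save
@Transactional
public <S extends T> S save(S entity) {
    if (entityInformation.isNew(entity)) {
        // New entity: translates to an INSERT via persist
        entityManager.persist(entity);
        return entity;
    }
    // Otherwise: merge, which may trigger an extra SELECT before the UPDATE
    return entityManager.merge(entity);
}

So, depending on whether the entity is considered new, save may end up calling persist or merge, which is why the BaseJpaRepository exposes intention-revealing methods such as persist and persistAll instead.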

The PostRepository is used by the ForumService, which defines a createPosts method that can save a large volume of Post records using multiple threads and JDBC-level batching:

@Service
@Transactional(readOnly = true)
public class ForumService {

    private static final Logger LOGGER = LoggerFactory.getLogger(
        ForumService.class
    );

    private static final ExecutorService executorService = Executors
        .newFixedThreadPool(
            Runtime.getRuntime().availableProcessors()
        );

    private final PostRepository postRepository;

    private final TransactionTemplate transactionTemplate;

    private final int batchProcessingSize;

    public ForumService(
            @Autowired PostRepository postRepository,
            @Autowired TransactionTemplate transactionTemplate,
            @Autowired int batchProcessingSize) {
        this.postRepository = postRepository;
        this.transactionTemplate = transactionTemplate;
        this.batchProcessingSize = batchProcessingSize;
    }

    @Transactional(propagation = Propagation.NEVER)
    public void createPosts(List<Post> posts) {
        CollectionUtils.spitInBatches(posts, batchProcessingSize)
            .map(postBatch -> executorService.submit(() -> {
                try {
                    transactionTemplate.execute((status) ->
                        postRepository.persistAll(postBatch)
                    );
                } catch (TransactionException e) {
                    LOGGER.error("Batch transaction failure", e);
                }
            }))
            .forEach(future -> {
                try {
                    future.get();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } catch (ExecutionException e) {
                    LOGGER.error("Batch execution failure", e);
                }
            });
    }
}

There are several aspects worth mentioning about the ForumService class:

  • The createPosts method uses @Transactional(propagation = Propagation.NEVER) because the transaction management is handled by each batch processing task.
  • The CollectionUtils.spitInBatches utility is used to split a List<T> into a Stream<List<T>> where each Stream element has at most batchProcessingSize elements (see the sketch after this list).
  • The Post entities are persisted using the persistAll method from the BaseJpaRepository.
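
The CollectionUtils.spitInBatches helper comes from the article's companion code and is not shown here. An illustrative equivalent could look like this (a sketch, not the actual implementation):

// Illustrative equivalent of the batch-splitting helper; the real
// CollectionUtils from the companion project may differ.
public static <T> Stream<List<T>> spitInBatches(List<T> values, int batchSize) {
    int batchCount = (values.size() + batchSize - 1) / batchSize;
    return IntStream.range(0, batchCount)
        .mapToObj(batch -> values.subList(
            batch * batchSize,
            Math.min((batch + 1) * batchSize, values.size())
        ));
}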

Since Hibernate does not use JDBC batching by default, we will set the following Hibernate configuration properties in the Java-based Spring Bean configuration:

protected Properties additionalProperties() {
    Properties properties = new Properties();
    properties.setProperty(
        "hibernate.jdbc.batch_size",
        String.valueOf(batchProcessingSize())
    );
    properties.setProperty(
        "hibernate.order_inserts",
        "true"
    );
    properties.setProperty(
        "hibernate.order_updates",
        "true"
    );
    return properties;
}
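
These properties are typically passed to the EntityManagerFactory when it is built in the Java-based configuration. A minimal sketch, assuming a standard LocalContainerEntityManagerFactoryBean setup and a hypothetical entity package name:

@Bean
public LocalContainerEntityManagerFactoryBean entityManagerFactory(DataSource dataSource) {
    LocalContainerEntityManagerFactoryBean entityManagerFactory =
        new LocalContainerEntityManagerFactoryBean();
    entityManagerFactory.setDataSource(dataSource);
    // Hypothetical package name; point it at the package containing the Post entity
    entityManagerFactory.setPackagesToScan("com.vladmihalcea.book.hpjp.spring.batch.domain");
    entityManagerFactory.setJpaVendorAdapter(new HibernateJpaVendorAdapter());
    entityManagerFactory.setJpaProperties(additionalProperties());
    return entityManagerFactory;
}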

Testing time

When sending a large chunk of Post entities to the createPosts method:

List<Post> posts = LongStream.rangeClosed(1, POST_COUNT)
    .mapToObj(postId -> new Post()
        .setTitle(
            String.format(
                "High-Performance Java Persistence - Page %d",
                postId
            )
        )
        .setStatus(PostStatus.PENDING)
    )
    .toList();

forumService.createPosts(posts);

In the YugabyteDB Tablet Servers view, the Write ops/sec columns indicate that all three YugabyteDB nodes are used equally to save the post table records:

YugabyteDB Cluster Tablet Servers Writes

So, the write operations are distributed across all the nodes, not just handled by a single Primary node, as is the case with the Single-Primary Replication strategy.

This is because YugabyteDB employs a shared-nothing architecture where each node is the primary for a given data subset. For instance, the first node can be the owner of Post with the id value of 1 and handle read and write operations associated with this record.

The other two nodes will keep a copy of this Post with the id value of 1 for high-availability purposes. At the same time, the second node can be the owner of the Post record with the id value of 2, and the other two nodes will be keeping a copy of this particular record.

Therefore, the YugabyteDB cluster is not used only for writing. The three nodes are also used when executing read operations because the data is automatically sharded.

For instance, if we fetch 1000 Post entities:

LongStream.rangeClosed(1, 1000)
    .boxed()
    .forEach(id ->
        assertNotNull(forumService.findById(id))
    );
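
The forumService.findById call used above is not part of the ForumService listing shown earlier. A minimal sketch of what it might look like, assuming BaseJpaRepository exposes an Optional-returning findById similar to Spring Data's CrudRepository:

// Assumed read-only lookup; the Optional-returning findById signature is an assumption
@Transactional(readOnly = true)
public Post findById(Long id) {
    return postRepository.findById(id).orElse(null);
}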

We can see in the Tablet Servers view that all nodes were used when fetching these records:

YugabyteDB Cluster Tablet Servers Reads

Cool, right?


Conclusion

Most often, a Spring application uses a Single-Primary replicated relational database since this has been the typical replication architecture employed by PostgreSQL, MySQL, Oracle, or SQL Server.

However, scaling a single-primary replicated database requires a read-write and read-only transaction routing strategy. Even then, reads can be scaled horizontally on the Replica nodes, while writes can only be scaled vertically on the Primary node.

A YugabyteDB cluster doesn’t have this limitation since it can scale both the read and write operations across the entire cluster, making it easier for you to scale a Spring application.
