4

The Spring Data findAll Anti-Pattern

 2 years ago
source link: https://vladmihalcea.com/spring-data-findall-anti-pattern/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

The Spring Data findAll Anti-Pattern

Last modified: Sep 13, 2022

Imagine having a tool that can automatically detect JPA and Hibernate performance issues. Wouldn’t that be just awesome?

Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, or Play Framework.

So, enjoy spending your time on the things you love rather than fixing performance issues in your production system on a Saturday night!

Introduction

In this article, we are going to see why the Spring Data findAll method is actually a terrible Anti-Pattern.

A picture is worth 1000 words:

You’d be surprised how many times I’ve seen this issue where tons of data was fetched from the DB just to filter it in Java and throw away the vast majority of records that were not needed for that particular business requirement.

Anti-Pattern

First, let’s start with the Anti-Pattern definition, as given by Wikipedia:

An anti-pattern in software engineering, project management, and business processes is a common response to a recurring problem that is usually ineffective and risks being highly counterproductive.

That’s exactly what you get when you offer a findAll method in the Spring Data Repository base interface that all Data Access Objects will automatically inherit.

The Spring Data Repository hierarchy

Usually, when you create a Spring Data JPA Repository, your custom interface extends the JpaRepository:

@Repository
public interface PostRepository
extends JpaRepository<Post, Long> {
}

However, the JpaRepository provides a series of methods that it either declared directly or inherited from the underlying base interfaces, as illustrated by the following diagram:

Spring Data findAll Anti-Pattern

Even if the JpaReository didn’t provide a findAll method explicitly, it would still inherit one from the CrudRepository that’s shared by all Spring Data modules.

The Spring Data findAll Anti-Pattern

The problem with the findAll method is that, as an API designer, you should not really offer your clients a way to fetch an entire database table because, if you do, some developers are actually going to do that.

Let’s assume we have the following Post and Tag entities in our system:

List<Tag> tags = List.of(
new Tag()
.setId(1L)
.setName("JDBC"),
new Tag()
.setId(2L)
.setName("JPA"),
new Tag()
.setId(3L)
.setName("Hibernate")
);
tags.forEach(tag -> entityManager.persist(tag));
for (long i = 1; i <= POST_SIZE; i++) {
entityManager.persist(
new Post()
.setId(i)
.setTitle(
String.format(
"High-Performance Java Persistence, Part %d",
i
)
)
.addTag(tags.get((int) i % 3))
);
}

We want to get the titles of all the JDBC or Hibernate posts, so how can we do that?

The easiest way is, of course, to use a single SQL query that looks as follows:

public List<String> findPostTitleByTags(List<String> tags) {
return entityManager.createNativeQuery("""
select p.title
from post p
where exists (
select 1
from post_tag pt
join tag t on pt.tag_id = t.id and pt.post_id = p.id
where t.name in (:tags)
)
order by p.id
""")
.setParameter("tags", tags)
.getResultList();
}

But, not every Java developer enjoys solving this problem using a simple SQL query. Some of them will do whatever is humanly possible to avoid writing any SQL query.

And, since we are just one findAll method call away from “solving” this issue, it won’t take long until someone provides the following solution:

List<String> postTitlesStreamRecords = postRepository.findAll()
.stream()
.filter(
post -> post.getTags().stream()
.map(Tag::getName)
.anyMatch(matchingTags::contains)
)
.sorted(Comparator.comparing(Post::getId))
.map(Post::getTitle)
.collect(Collectors.toList());

When executing the findAll workaround, Hibernate will execute the following queries:

Query:["select post0_.id as id1_1_, post0_.title as title2_1_ from post post0_"], Params:[()]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(1)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(2)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(3)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(4)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(5)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(6)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(7)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(8)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(9)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(10)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(11)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(12)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(13)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(14)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(15)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(16)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(17)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(18)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(19)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(20)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(21)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(22)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(23)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(24)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(25)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(26)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(27)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(28)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(29)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(30)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(31)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(32)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(33)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(34)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(35)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(36)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(37)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(38)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(39)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(40)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(41)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(42)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(43)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(44)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(45)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(46)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(47)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(48)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(49)]
Query:["select tags0_.post_id as post_id1_4_0_, tags0_.tag_id as tag_id2_4_0_, tag1_.id as id1_5_1_, tag1_.name as name2_5_1_ from post_tag tags0_ inner join tag tag1_ on tags0_.tag_id=tag1_.id where tags0_.post_id=?"], Params:[(50)]

Not nice!

The reason why you see so many secondary queries is due to filter predicate that needs to fetch the tags collection in order to match the provided tags.

Preventing the Spring Data fetchAll Anti-Pattern

The findAll method should not be offered by default, and that’s why the HibernateRepository from the Hibernate Types project, deprecates it.

More, if you call it, it will throw an UnsupportedOperationException.

To make sure the findAll method is never called by mistake, just make your Repository classes extend the HibernateRepository interface:

@Repository
public interface PostRepository extends
HibernateRepository<Post>,
JpaRepository<Post, Long>, CustomPostRepository {
}

The HibernateRepository needs to precede the JpaRepository as otherwise, IntelliJ IDEA will not strike through the findAll method call.

And, we also have to provide the com.vladmihalcea.spring.repository package to the @EnableJpaRepositories annotation:

@EnableJpaRepositories(
basePackages = {      
"com.vladmihalcea.spring.repository",
...
}
)

This way, you will instantly see that the findAll method is to be avoided since it’s striked through now:

HibernateRepository FindAll

However, if you know for sure that for specific use cases it’s fine to have a findAll method, then you can simply override it as follows:

@Repository
public interface PostRepository extends HibernateRepository<Post>,
JpaRepository<Post, Long>, CustomPostRepository {
@Query("from Post")
@Override
List<Post> findAll();
}

This way, you will explicitly specify that the findAll method is really needed, although I doubt this will be the case in most non-trivial applications.

For more details about the HibernateRepository, check out this article.

The Hibernate implicit polymorphic query Anti-Pattern

The findAll Anti-Pattern reminds me of a lesser-known “feature” of Hibernate that allows you to select the entire database with a single query:

//Don't try this at home!
List<Object> objects = entityManager.createQuery("""
select o
from java.lang.Object o
""")
.getResultList();

The reason why this query works is because of the implicit polymorphic query feature offered by Hibernate. The reason why you haven’t seen this Anti-Pattern used in practice is that the Hibernate developers didn’t think it was a good idea to encourage fetching the entire database at once, so they omitted it from the documentation.

Internally, Hibernate team actually used such a query during testing:

protected void cleanupTestData() {
doInHibernate(this::sessionFactory, session -> {
session.createQuery(
"delete from java.lang.Object"
).executeUpdate();
});
}

But, since this was limited to the Hibernate integration tests, there was no problem using such a dangerous query. And, what’s interesting is that not all Hibernate tests used it. Some of Hibernate integration tests were simply dropping and recreating the DB schema in between tests, so there was no point in deleting all records.

I'm running an online workshop on the 15th of September about High-Performance SQL.

If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.

Conclusion

The Spring Data findAll method follows exactly the Wikipedia definition of an Anti-Pattern, and it’s much better if you avoid it by default because, this way, you can prevent some very inefficient data access operations that will probably be overlooked until you have enough data in your production system.

Nevertheless, just because you deprecated the default findAll method, it doesn’t mean you cannot explicitly declare it. However, this way, you will tell other developers that you really know what you are doing, and you can still enjoy loading an entire database table in your application memory.

So, going back to the definition of an Anti-Pattern:

An anti-pattern is a common response to a recurring problem that is usually ineffective and risks being highly counterproductive.

We can conclude that the findAll method is ineffective and risks being highly counterproductive, and if you offer it by default, then it risks becoming a common response to a recurring problem.

Transactions and Concurrency Control eBook

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Comment *

Before posting the comment, please take the time to read the FAQ page

Name *

Email *

Website

Notify me of follow-up comments by email.

This site uses Akismet to reduce spam. Learn how your comment data is processed.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK