Optimizing Output File Testing in Spring Batch

Jonny Hackett December 12, 2023 Java, Spring, Spring Batch, Testing

It’s quite common to build Spring Batch jobs whose output is a file for distribution to another team or to another business. These text files come in various formats: delimited, fixed length, XML, or some other structure such as an MT950-formatted file (common in financial institutions).

In a previous article, I discussed testing practices using Mockito, but they were primarily focused on testing the business logic that’s usually found in the ItemProcessor or services called by the ItemProcessor.

However, this time, we’ll shift our focus to the testing of the output component of our batch jobs. To illustrate this, we will walk through an example, examining how to efficiently test the results generated by our batch job. Our example centers on a batch job that extracts a catalog of books from a database and produces a file as its output. We will then provide insights into the testing methodology, making the process more robust and less error-prone.

The Example

The batch job I’m going to use as an example in this article will read a catalog of books from a database and generate a file as its output. For the first part of this example code, we’re going to create an ItemReader that calls a service to obtain a list of books.

A disclaimer: I’m not going to list the code for the service, because we’re going to mock the values it returns in our unit test. This service could query a database or obtain the results from an API call. In this example, the implementation isn’t important.
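Even though the real service is out of scope, the reader depends on only one method of it. As a minimal sketch, assuming the contract can be inferred from the call made in the reader, the interface and a throwaway in-memory implementation might look like the following. The `InMemoryBookService` class and the compact `Book` record are hypothetical stand-ins so that the snippet compiles on its own; they are not part of the original project.

```java
import java.util.List;
import java.util.stream.Collectors;

// Compact stand-in for the article's Book class, so this sketch compiles on its own.
record Book(String title, String authorFirstName, String authorLastName, String publicationDate) {}

// Contract inferred from the reader's call; the real interface may declare more methods.
interface BookService {
    List<Book> findAllBooksByAuthorLastName(String lastName);
}

// Hypothetical in-memory implementation, handy for local runs without a database.
class InMemoryBookService implements BookService {
    private final List<Book> catalog;

    InMemoryBookService(List<Book> catalog) {
        this.catalog = List.copyOf(catalog);
    }

    @Override
    public List<Book> findAllBooksByAuthorLastName(String lastName) {
        // Exact, case-sensitive match on the author's last name.
        return catalog.stream()
                .filter(book -> book.authorLastName().equals(lastName))
                .collect(Collectors.toList());
    }
}
```

Because the job only sees the interface, the unit test can swap in a mock without touching the reader.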

package com.keyhole.blog.batch;


import java.util.List;
import org.springframework.batch.core.annotation.BeforeStep;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.NonTransientResourceException;
import org.springframework.batch.item.ParseException;
import org.springframework.batch.item.UnexpectedInputException;
import org.springframework.batch.item.support.IteratorItemReader;
import org.springframework.beans.factory.annotation.Autowired;


public class BookReader implements ItemReader<Book> {
	
	@Autowired
	private BookService bookService;
	
	private IteratorItemReader<Book> delegateReader;
	
	
	@BeforeStep
	public void beforeStep() {
		List<Book> bookResults = this.bookService.findAllBooksByAuthorLastName("King");
		this.delegateReader = new IteratorItemReader<>(bookResults);
	}


	@Override
	public Book read() throws Exception, UnexpectedInputException, ParseException, NonTransientResourceException {
		return this.delegateReader.read();
	}
}

For the Book object, I kept it simple for the sake of our example.

package com.keyhole.blog.batch;


public class Book {


	private String title;
	private String authorFirstName;
	private String authorLastName;
	private String publicationDate;


	public String getTitle() {
		return title;
	}


	public void setTitle(String title) {
		this.title = title;
	}


	public String getAuthorFirstName() {
		return authorFirstName;
	}


	public void setAuthorFirstName(String authorFirstName) {
		this.authorFirstName = authorFirstName;
	}


	public String getAuthorLastName() {
		return authorLastName;
	}


	public void setAuthorLastName(String authorLastName) {
		this.authorLastName = authorLastName;
	}


	public String getPublicationDate() {
		return publicationDate;
	}


	public void setPublicationDate(String publicationDate) {
		this.publicationDate = publicationDate;
	}


	@Override
	public String toString() {
		return "Book [title=" + title + ", authorFirstName=" + authorFirstName + ", authorLastName=" + authorLastName
				+ ", publicationDate=" + publicationDate + "]";
	}


	public Book(String title, String authorFirstName, String authorLastName, String publicationDate) {
		super();
		this.title = title;
		this.authorFirstName = authorFirstName;
		this.authorLastName = authorLastName;
		this.publicationDate = publicationDate;
	}


}

Since we’re not implementing any business logic, we don’t need to create an ItemProcessor. We are going to output the results to a pipe-delimited file using a standard FlatFileItemWriter, so we can build the writer in the job configuration like this:

@Bean
public FlatFileItemWriter<Book> bookExtractWriter()
{
	Resource outputResource = new FileSystemResource(Path.of(System.getProperty("java.io.tmpdir"), "BookListing.txt"));
	return new FlatFileItemWriterBuilder<Book>().delimited()
	.delimiter("|")
	.names("title", "authorFirstName","authorLastName","publicationDate")
	.saveState(false)
	.name("bookExtractWriter")
	.resource(outputResource)
	.build();
}
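To make the expected output concrete, here is a plain-Java approximation of what this writer configuration should produce: the named fields, in the configured order, joined by `|`, with one line per item and a line separator after each. This is only a sketch for reasoning about the contents of the approval file, not the writer's actual implementation, and `PipeDelimitedPreview` is a hypothetical helper name. It also assumes no header or footer is configured, which matches the builder call above.

```java
import java.util.List;
import java.util.stream.Collectors;

class PipeDelimitedPreview {

    // Joins the field values of one item in the order they are passed in.
    static String toLine(String... fields) {
        return String.join("|", fields);
    }

    // One line per row, each terminated by the given line separator.
    static String toFileContent(List<List<String>> rows, String lineSeparator) {
        return rows.stream()
                .map(row -> String.join("|", row) + lineSeparator)
                .collect(Collectors.joining());
    }
}
```

For example, `PipeDelimitedPreview.toLine("The Shining", "Stephen", "King", "1977-01-08")` yields `The Shining|Stephen|King|1977-01-08`, which is the shape each line of the expected file should have.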

And here’s the full job configuration.

package com.keyhole.blog.batch;
import java.nio.file.Path;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.item.file.FlatFileItemWriter;
import org.springframework.batch.item.file.builder.FlatFileItemWriterBuilder;
import org.springframework.batch.support.transaction.ResourcelessTransactionManager;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.FileSystemResource;
import org.springframework.core.io.Resource;

@Configuration
public class BookExtractJobConfig {
		
	
	@Bean
	public BookService bookService() {
		return new BookServiceImpl();
	}
	
	@Bean
	@StepScope
	public BookReader bookReader() {
		return new BookReader();
	}
	
	@Bean
	public Step processBooksStep(StepBuilderFactory stepFactory) {
		return stepFactory.get("processBooksStep")
		.transactionManager(new ResourcelessTransactionManager())
		.<Book,Book>chunk(500)
		.reader(bookReader())
		.writer(bookExtractWriter()).build();
		
	}
	
	@Bean
	public Job bookExtractJob(JobBuilderFactory jobBuilderFactory, Step processBooksStep) {
		// let Spring inject the step bean instead of calling processBooksStep(null)
		return jobBuilderFactory.get("BookExtractJob")
				.start(processBooksStep)
				.build();
	}
	
	@Bean
	public FlatFileItemWriter<Book> bookExtractWriter()
	{
		Resource outputResource = new FileSystemResource(Path.of(System.getProperty("java.io.tmpdir"), "BookListing.txt"));
		return new FlatFileItemWriterBuilder<Book>().delimited()
		.delimiter("|")
		.names("title", "authorFirstName","authorLastName","publicationDate")
		.saveState(false)
		.name("bookExtractWriter")
		.resource(outputResource)
		.build();
	}

}

Testing

As you can see, it’s a pretty simple job that calls a service to receive data and then outputs the results to a pipe-delimited file. To test the output file, I want to introduce a couple of options.

The first is AssertFile, a class provided in Spring Batch Test that lets you assert that two files are the same. Note that the expected file is the first argument. The code looks like this:

AssertFile.assertFileEquals(expectedFile, actualFile);

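The other option, which I think is better, is the IOUtils class from Apache Commons IO. It has two methods for verifying that the contents of two files are identical: contentEquals, which compares everything, and contentEqualsIgnoreEOL, which ignores the end-of-line characters. Each comparison reads its readers to the end, so use fresh readers for each call. The code looks something like this: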

// verify that the contents of the two files match exactly
try (BufferedReader outputReader = Files.newBufferedReader(actualFile.toPath());
		BufferedReader acceptanceReader = Files.newBufferedReader(expectedFile.toPath())) {
	Assertions.assertThat(IOUtils.contentEquals(outputReader, acceptanceReader)).isTrue();
}

// or, to ignore EOL characters, compare again with fresh readers
// (the first comparison has already consumed the readers above)
try (BufferedReader outputReader = Files.newBufferedReader(actualFile.toPath());
		BufferedReader acceptanceReader = Files.newBufferedReader(expectedFile.toPath())) {
	Assertions.assertThat(IOUtils.contentEqualsIgnoreEOL(outputReader, acceptanceReader)).isTrue();
}

This has made comparing the output of text files much easier than what I did previously, which was reading each file and comparing them line by line or, if the output was JSON or XML, parsing those files into objects and comparing the results.
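One thing a boolean comparison does not tell you is where the files differ. If a failing test needs to point at the offending line, a small helper along these lines can complement the approach above. This is a sketch using only the JDK; `FileDiff` and `firstDifference` are hypothetical names, and it assumes the files are UTF-8 text small enough to read into memory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

class FileDiff {

    // Returns a description of the first differing line, or empty if the files match.
    // Reading by line makes the comparison insensitive to \n versus \r\n endings.
    static Optional<String> firstDifference(Path expected, Path actual) throws IOException {
        List<String> expectedLines = Files.readAllLines(expected);
        List<String> actualLines = Files.readAllLines(actual);
        int shared = Math.min(expectedLines.size(), actualLines.size());
        for (int i = 0; i < shared; i++) {
            if (!expectedLines.get(i).equals(actualLines.get(i))) {
                return Optional.of("line " + (i + 1) + ": expected <" + expectedLines.get(i)
                        + "> but was <" + actualLines.get(i) + ">");
            }
        }
        if (expectedLines.size() != actualLines.size()) {
            return Optional.of("expected " + expectedLines.size() + " lines but found "
                    + actualLines.size());
        }
        return Optional.empty();
    }
}
```

In a test, asserting that the returned Optional is empty gives a pass/fail result, and on failure the contained message names the first line that does not match.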

If we did have business logic that was being applied to the data before sending it to the FlatFileItemWriter, we could create multiple scenarios in the mocked data that would exercise all of the variations that would generate different file output results. You still may find that you could benefit from unit tests covering the business logic in services that transform the data, but this is a good place to start.

This strategy works great if the files are text-based, such as delimited, fixed length, XML, JSON, or any other plain text data file. But what if you want to compare MS Excel files? Unfortunately, I haven’t found as simple a solution for Excel at this time. Because Excel files are binary workbooks rather than plain text, you’ll need to read them using Apache POI (or another library suitable for reading MS Excel files) and compare the sheets, rows, and cells for equality.

For your convenience, the full code for the unit test is found at the end of this post.

Conclusion

Testing the output files of your Spring Batch jobs goes a long way toward ensuring the quality and reliability of your data processing workflows. Comparing the generated file against an approved copy, as outlined in this article, can serve as a valuable starting point for building robust and reliable batch-processing applications.

I hope you found this post useful! Please let me know if you have questions in the comments section, and check out my other posts surrounding Spring and Spring Batch on the Keyhole Dev Blog.

Unit Test Full Code

package com.keyhole.blog.batch;
import static org.mockito.Mockito.when;
import java.io.BufferedReader;
import java.io.File;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.io.IOUtils;
import org.assertj.core.api.Assertions;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.springframework.batch.core.ExitStatus;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.test.AssertFile;
import org.springframework.batch.test.JobLauncherTestUtils;
import org.springframework.batch.test.JobRepositoryTestUtils;
import org.springframework.batch.test.context.SpringBatchTest;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.boot.test.mock.mockito.MockBean;
import org.springframework.core.io.ClassPathResource;
import org.springframework.test.context.junit.jupiter.SpringJUnitConfig;
import com.keyhole.blog.DemoApplication;

@SpringBatchTest
@SpringJUnitConfig(classes = DemoApplication.class)
class BookExtractJobTest {

	@Autowired
	@Qualifier(value = "bookExtractJob")
	private Job job;

	@Autowired
	private JobLauncherTestUtils jobLauncherTestUtils;

	@Autowired
	private JobRepositoryTestUtils jobRepositoryTestUtils;

	@MockBean
	private BookService bookService;

	@BeforeEach
	void setUp() throws Exception {
		// setup the test execution
		this.jobLauncherTestUtils.setJob(this.job); // this is optional if the job is unique
		this.jobRepositoryTestUtils.removeJobExecutions();
		// define the mocking for the book service
		when(this.bookService.findAllBooksByAuthorLastName("King")).thenReturn(mockedBookList());

	}

	private List<Book> mockedBookList() {
		// Create the objects used in the results returned by the mocked service
		List<Book> books = new ArrayList<>();
		books.add(new Book("The Shining", "Stephen", "King", "1977-01-08"));
		books.add(new Book("Dreamcatcher", "Stephen", "King", "2001-02-20"));
		books.add(new Book("Pet Sematary", "Stephen", "King", "1983-11-14"));
		books.add(new Book("The Stand", "Stephen", "King", "1978-10-03"));
		books.add(new Book("Salem's Lot", "Stephen", "King", "1975-10-17"));
		books.add(new Book("Joyland", "Stephen", "King", "2013-06-04"));
		return books;
	}

	@Test
	void test() throws Exception {
		// given
		JobParameters jobParameters = this.jobLauncherTestUtils.getUniqueJobParameters();

		// when
		JobExecution jobExecution = this.jobLauncherTestUtils.launchJob(jobParameters);

		// then
		// Verify that the job completed successfully.
		Assertions.assertThat(jobExecution.getExitStatus()).isEqualTo(ExitStatus.COMPLETED);

		// define the files that we'll be comparing for verification.
		// use the parent/child constructor: java.io.tmpdir may not end with a file separator
		File actualFile = new File(System.getProperty("java.io.tmpdir"), "BookListing.txt");
		File expectedFile = new ClassPathResource("BookListing-approval.txt").getFile();

		// verify using the Spring Batch assertion (expected file first, then actual)
		// AssertFile is deprecated in later Spring Batch versions in favor of
		// the assertions offered by libraries such as AssertJ
		AssertFile.assertFileEquals(expectedFile, actualFile);

		// verify that the contents of the two files match exactly
		try (BufferedReader outputReader = Files.newBufferedReader(actualFile.toPath());
				BufferedReader acceptanceReader = Files.newBufferedReader(expectedFile.toPath())) {
			Assertions.assertThat(IOUtils.contentEquals(outputReader, acceptanceReader)).isTrue();
		}

		// if you want to verify that the contents match but ignore EOL characters,
		// open fresh readers: the comparison above has already consumed the first pair
		try (BufferedReader outputReader = Files.newBufferedReader(actualFile.toPath());
				BufferedReader acceptanceReader = Files.newBufferedReader(expectedFile.toPath())) {
			Assertions.assertThat(IOUtils.contentEqualsIgnoreEOL(outputReader, acceptanceReader)).isTrue();
		}

	}

}
