
Scalable Programming


The right algorithm in the right place — binary search and sorting algorithms


15. Feb 2023


Java continuously introduces new, useful features. For instance, Java 8 introduced the Stream API, one of the biggest highlights of the past few years. But is aggregating data with the Stream API a panacea? In this article, I’d like to explore whether, from a complexity perspective, there’s a better alternative for certain cases.

Some of you have probably used the following code in a program to get a quick, ad-hoc measurement of some logic’s runtime:

long start = System.currentTimeMillis();          // timestamp before the call
doSomething();                                    // the logic to be measured
long time = System.currentTimeMillis() - start;   // elapsed wall-clock time in milliseconds

Clearly, this is easy to implement and lets you check the code’s speed quickly. But there are also some disadvantages. First, the measured values are subject to uncertainty, because they can be influenced by other processes running on the same machine. Second, you can’t compare the readings with readings taken in different environments: declaring that one solution is faster than another isn’t helpful if they were measured on machines with different CPUs and RAM. Third, it’s difficult to estimate how the runtime will grow when you work with larger amounts of data in the future. Filtering and aggregating data has become much easier since the Stream API was introduced in Java 8, and the Stream API even opens up the possibility of parallelizing processing [1]. But do these solutions still perform when you need to work with 10 or 100 times the amount of data? Is there a measure we can use to answer this question?
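To make the Stream API case concrete, here is a minimal sketch of the kind of pipeline meant above; the data and the sum aggregation are invented for illustration, and parallelStream() shows the parallelization option mentioned in [1].

import java.util.List;

public class StreamExample {

    public static void main(String[] args) {
        // Invented example data: order amounts in cents
        List<Integer> amounts = List.of(1200, 450, 8900, 300, 15000, 720);

        // Filter and aggregate sequentially
        long total = amounts.stream()
                .filter(a -> a >= 500)             // keep only the larger amounts
                .mapToLong(Integer::longValue)
                .sum();

        // The same pipeline, parallelized over the common fork/join pool
        long totalParallel = amounts.parallelStream()
                .filter(a -> a >= 500)
                .mapToLong(Integer::longValue)
                .sum();

        System.out.println(total + " / " + totalParallel);
    }
}

Both pipelines do the same work; the question raised above is how their runtime behaves once the list no longer has six elements but millions.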

Time complexity

Time complexity is a measure for roughly estimating the time efficiency of an algorithm. It focuses on how the runtime increases as the input gets longer. For example, if you iterate over a list of n elements with a for loop, then n and the runtime have a linear relationship. If you nest several for loops that each execute n times, the runtime grows polynomially; two nested loops, for example, lead to a quadratic relationship.
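As a small illustrative sketch (the class and method names are invented), the first method below scales linearly with the input length, while the nested loops in the second scale quadratically:

public class ComplexityDemo {

    // O(n): the loop body runs once per element
    static long sumLinear(int[] values) {
        long sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // O(n²): the inner body runs n * n times, here counting all pairs of indices
    static long countPairsQuadratic(int[] values) {
        long pairs = 0;
        for (int i = 0; i < values.length; i++) {
            for (int j = 0; j < values.length; j++) {
                pairs++;
            }
        }
        return pairs;
    }
}

Doubling the input length roughly doubles the work in sumLinear, but quadruples it in countPairsQuadratic.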

Big O notation is a way to represent the relationship between the input length and the runtime. A linear relationship is represented by O(n) and a quadratic relationship by O(n²), where n is the input’s length. If the runtime is constant and independent of the input’s length, we write O(1). Figure 1 shows, for typical big O classes, how the runtime grows as the input’s length increases.


Fig. 1: The relationship between runtime and input length per time complexity.

There are two important rules for representation in big O notation:

  • Only the term with the highest degree is considered. For example: if the time complexity is n + n log n + n², simply write O(n²), as that term has the strongest effect on runtime.
  • Coefficients are not considered. For example, the time complexities 2n², 3n², and ½n² are all written as O(n²).

It’s important to emphasize that time complexity only describes scalability. Especially when n is small, an algorithm with a better time complexity may still have a longer runtime than one with a worse complexity.

Space complexity

In addition to time complexity, there’s another measure for representing an algorithm’s efficiency: space complexity. It looks at how memory requirements grow as the input’s length increases. When you copy a list with n elements into a new list, the space complexity is O(n) because the need for additional memory increases linearly when you work with a larger input list. If an algorithm only needs a constant amount of memory, regardless of the input length, then the space complexity is O(1).
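A minimal sketch of the difference (the method names are invented):

import java.util.ArrayList;
import java.util.List;

public class SpaceComplexityDemo {

    // O(n) additional space: a second list of the same length is allocated
    static List<Integer> copyList(List<Integer> input) {
        return new ArrayList<>(input);
    }

    // O(1) additional space: only a single accumulator is needed,
    // no matter how long the input list is
    static long sum(List<Integer> input) {
        long total = 0;
        for (int value : input) {
            total += value;
        }
        return total;
    }
}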

There’s often a trade-off between time complexity and space complexity. When comparing multiple algorithms, it’s therefore important to consider whether runtime or memory matters more in the case at hand.

Binary search

As shown in Figure 1, an algorithm with time complexity O(log n) performs better than one with O(n). Binary search is one of the algorithms with this time complexity. It’s applicable when you want to search for a target value in a sorted list. In each operation, the algorithm checks whether the target value lies in the left or the right half of the current search area. For example, imagine a dictionary. You probably won’t start on the first page to find the word you’re looking for. You’ll open the book somewhere in the middle and continue searching from there.


Fig. 2: Binary search sequence

Figure 2 shows how binary search proceeds when searching for the target value 7 in a list of eleven elements. The element marked in red is the middle of the current operation’s search area. If the number of elements in the search area is even, the “left” of the two middle elements is taken. In each operation, you compare whether the target value (7, in this case) is less than or greater than the middle element, and the search area is cut in half until the target value is reached.
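The procedure from Figure 2 can be sketched roughly as follows; this is a hand-rolled variant for illustration only, since in practice you would use the JDK methods described below. With integer division, lo + (hi - lo) / 2 picks the “left” of the two middle elements when the search area has an even number of elements, matching the description above.

public class BinarySearchSketch {

    // Returns the index of target in the sorted array, or -1 if it is not present
    static int binarySearch(int[] sorted, int target) {
        int lo = 0;
        int hi = sorted.length - 1;
        while (lo <= hi) {
            // overflow-safe middle; the "left" middle for an even-sized search area
            int mid = lo + (hi - lo) / 2;
            if (sorted[mid] == target) {
                return mid;
            } else if (sorted[mid] < target) {
                lo = mid + 1;   // target can only be in the right half
            } else {
                hi = mid - 1;   // target can only be in the left half
            }
        }
        return -1;
    }
}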

log₂ n is the maximum number of comparison operations needed to find the target value with binary search, where n is the length of the input list. Let’s take n = 8 as an example. The length of the search area starts at 8 and decreases to 4 after the first operation. After the second operation, it is halved again to 2, and after the third operation there’s just one value left in the search area. From this example, we can conclude that the number of operations needed is at most the logarithm of 8 to base 2 (log₂ 8 = 3), because 2³ = 8. In big O notation, we omit the base and write simply O(log n).

In Java, binary search is implemented in the methods java.util.Arrays.binarySearch [2] and java.util.Collections.binarySearch [3]. If you work with an array, you can use the method in the java.util.Arrays class; if you work with a list, the method in java.util.Collections is applicable.
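A short usage sketch of the two JDK methods (the values are invented); both require the input to be sorted beforehand:

import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class BinarySearchUsage {

    public static void main(String[] args) {
        int[] numbers = {42, 7, 19, 3, 25};
        Arrays.sort(numbers);                       // binarySearch expects sorted input
        int idx = Arrays.binarySearch(numbers, 7);  // index of 7 in the sorted array

        List<Integer> list = Arrays.asList(3, 7, 19, 25, 42);  // already sorted
        int listIdx = Collections.binarySearch(list, 19);

        // If the key is not present, both methods return a negative value
        System.out.println(idx + " / " + listIdx);
    }
}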

Sorting algorithms

There are several kinds of sorting algorithms, each with different time and space complexities. In practice, the sorting algorithms typically used are Quicksort, Mergesort, and their variants. On average, the time complexity of both is O(n log n) [4], [5]. There are also sorting algorithms with better time complexities, but these often have limitations in the...

