HashMap vs TreeMap: Put

In previous posts, we introduced the get operation, on the Map collection, comparing how  HashMap and TreeMap behaves.

In this post, we are going to compare HashMap and TreeMap performance using the put operation.

TRY IT YOURSELF: You can find the source code of this post here.

Java Collections Map Series

If you like this post and are interested in hearing more about my journey as a Software Engineer, you can follow me on Twitter and travel together.

Environment

  • JDK 11.0.3, OpenJDK 64-Bit Server VM, 11.0.3+7-Ubuntu-1ubuntu218.04.1
  • Ubuntu 18.04
  • Memory 15,6 GiB
  • Processor Intel® Core™ i7-7500U CPU @ 2.70GHz × 4
  • Graphics Intel® HD Graphics 620 (Kaby Lake GT2)
  • OS 64-bit
  • Disk 387,9 GB
  • Intellj IDE Community 2018.3

Approach

  • We are going to use JMH for the micro benchmark.
  • We are going to use a list of 10.000.000 of elements, generated randomly, and we are going to calculate the average response time in 10 iterations.
  • As JMH doesn’t allow sharing data between benchmarks, we are going to create a file with the list of elements, and read from it by each benchmark. In that way, we guarantee we are measuring the same scenario.

Putting an Entry – put(key, value)

Now, let’s see the code for measuring how HashMap and TreeMap behave when we need to put an entry:

    @Test
  public void put()
          throws RunnerException, IOException {
    Utils.populateToFile();

    Options opt = new OptionsBuilder()
            .include(
                    HashMapVsTreeMapPutTest.class
                            .getSimpleName())
            .detectJvmArgs()
            .build();

    Collection runResults = new Runner(opt)
            .run();

    Result hashMapResult = Utils
            .find("hashMap",
                    runResults);
    Result treeMapResult = Utils
            .find("treeMap",
                    runResults);

    assertThat(hashMapResult.getScore())
            .isLessThan(treeMapResult.getScore());
  }

  @Benchmark
  @Measurement(iterations = Utils.amountIterations, batchSize = 1, time = 1)
  @BenchmarkMode(Mode.SingleShotTime)
  @Fork(1)
  @Warmup(iterations = 10)
  @OutputTimeUnit(TimeUnit.MILLISECONDS)
  public HashMap hashMap(
          HashMapExecutionPlan executionPlan) {
    HashMap hashMap = executionPlan.hashMap;

    hashMap.put(amountData, amountData);

    return hashMap;
  }

  @Benchmark
  @Measurement(iterations = Utils.amountIterations, batchSize = 1, time = 1)
  @BenchmarkMode(Mode.SingleShotTime)
  @Fork(1)
  @Warmup(iterations = 10)
  @OutputTimeUnit(TimeUnit.MILLISECONDS)
  public TreeMap treeMap(
          TreeMapExecutionPlan executionPlan) {
    TreeMap treeMap = executionPlan.treeMap;

    treeMap.put(amountData, amountData);

    return treeMap;
  }

Now, let’s see the results:

HashMap vs TreeMap performance, put operation

TreeMap and HashMap are pretty fast, besides, it is hard to say with this data set which one performs better. Let’s see how they internally work.

HashMap

Putting an entry into the HashMap follows the same process as getting a value, however, there are cases where the internal array needs to be resized to adapt to the amount of entries on the Map.

There are two important properties to take into account in the HashMap configuration regarding the resizing: load factor and initial capacity:

  • Initial capacity: it is the initial size of the array. This size changes over time (resizing) depending on the amount of entries inside the HashMap. It usually duplicates itself based on the load factor.
  • Load factor: it is a percentage, telling the max amount of entries allowed inside the HashMap before a resizing. By default, it is 0.75, which means, if the HashMap has 10 buckets in the array, the collection will accept 10 * 0.75 = 7 entries before resizing. If a 8 entry arrives, the array will resize.

This two properties are pretty important when the performance of the HashMap needs to be aligned to some use cases. In the normal cases, you will not need to change the default values as they perform pretty well.

Now, let’s use the following example for HashMap and see how the put operation works:

NOTE: The HashMap state could not be exactly the same for the given entries, this is just an example.

HashMap example

There, we see the current state of the HashMap, where the number is the hash code of the key. Also, we have a initial capacity of 7 and a load factor of 0.75. Now, we want to put the “(98-C, Art)” entry. The following is the general algorithm HashMap does to put the key

Note: The algorithm is pretty similar to the get operation we saw in HashMap vs TreeMap: Get and Contains key.

  1. Get the hash code of the key you are putting in. For this case, as the key is a String, the algorithm uses the String.hashCode() method. The value will be 98.
int code = "C".hashCode()

2. Now, we need to find the right index for the 98 hash code, over the HashMap array. The HashMap array has 7 positions, so, an xor operation is performed (the details of how it works it is not necessary). Let’s suppose we get 0, so, we choose the 0 index to put the “C” entry:

Finding the current index using the hash code

3. Next, we need to check if the linked list on the 0 index, contains the key “C” we are putting in. We use the String.equals() method to check if the key exists or not:

Search the key on the linked list using equals method

If the key doesn’t exist, we add the new entry, at the end of the linked list as show below:

Inserting the new entry

If the key exists, HashMap updated the value.

4. Validate resizing: if the size adding the new entry is greater than the capacity * load factor, a resizing process starts. The capacity is duplicated, and the entries need to be distributed again over the new array. This distribution uses the same put algorithm and its cost is the amount of entries on the HashMap. The initial capacity and load factor helps to define when a resizing process applies, that’s why you should calibrate those values if performance is pretty important for your use cases.

We can conclude:

  • hashCode and equals are pretty important on the key object.
  • Accessing the current index on the array, requires O(1).
  • Searching where to put the entry over the linked list, requires O(M), where M is the size of the linked list. This size should be lower than N, which is the total elements on the Map.
  • If the HashMap doesn’t balance well because the hashCode is not well generated, the worst case scenario could be that all the elements will be in the same index, which will require O(N) to put an entry.
  • Also, the memory space of a HashMap depends on the array size (S) plus the amount of entries (N). The array could have empty positions, which is a waste of space. Initial capacity and load factor are important to define that.
  • And finally, if the HashMap needs to resize, the cost is O(N), as all the entries need to be moved again.

TreeMap

Putting an entry into the TreeMap follows the same process as getting a value, however, as the TreeMap is a red-black tree, a binary tree who balances itself, the tree will require to move entries around to comply to the red-black tree properties like being a binary tree, plus others.

Let’s use the following example for the TreeMap:

TreeMap example

There, we see the current state of the TreeMap. As we see, the tree is not balanced as we are missing one leave on the C node. The following is the general algorithm TreeMap does to put the entry “(E, Bio)”:

  1. We start on the root. We compare “E” with “B” keys using the Comparable interface "E".compareTo("B"). This will return if E is lower, greater or equal to B, and based on that, we choose a side of the binary tree. In this case, we choose right, as E is greater than B:
Next node to check is C

2. We repeat the process from C node. We compare “E” with “C” keys using the Comparable interface "E".compareTo("C") and we choose right, as E is greater than C:

Next node to check is D

3. Again, we repeat the process from D node. We compare “E” with “D” keys using the Comparable interface "E".compareTo("D") and we choose right, as E is greater than D. As D doesn’t have children, we put the entry E as his right child:

Insert entry E as a child of D

5. Now, as you see, the tree is unbalance as D entry must have two children, and only have one, same as C. In this case, the TreeMap needs to re balance itself. In this case, C, D and E nodes will be moved one position to left, ending with the following tree structure:

NOTE: The red-black tree has several conditions on when to re balance itself, and how, what we show here is only one case.

Rotate entries C, D and E to left

3. Finally, we have a balanced binary tree.

NOTE: As you can see, the implementation of Comparable or Comparator are pretty important to get a TreeMap working well.

We can conclude:

  • Comparable or Comparator are pretty important on the key object.
  • An error will be thrown if you try to create a TreeMap where the key is not Comparable, or you didn’t set a Comparetor object.
  • TreeMap is a binary tree, so, it requires O(log n) to put a key.
  • Also, the memory space is N, as the number of entries in the Map.
  • Putting an entry on the map uses the same algorithm than getting a entry, plus the cost of re balancing, which is mostly constant.

Summary

This is a summary of the Big O complexity in put operation between HashMap and TreeMap:

HashMapTreeMap
OperationMethodBestWorstBestWorst
Putput(key, value)O(1)O(n)O(1)O(log n)

Final Thought

As we saw, HashMap and TreeMap are pretty fast regarding a put operation. Also, take into account that if you need to traversal the map in order, TreeMap should be the first option.

If you like this post and are interested in hearing more about my journey as a Software Engineer, you can follow me on Twitter and travel together.

Advertisement

2 comments

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s