Deciding When to Use a Hash Table

2 years ago

source link: https://www.codesd.com/item/deciding-when-to-use-a-hash-table.html
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Deciding When to Use a Hash Table

advertisements

I was soving a competitive programming problem with the following requirements:

I had to maintain a list of unqiue 2d points (x,y), the number of unique points would be less than 500.

My idea was to store them in a hash table (C++ unordered set to be specific) and each time a node turned up i would lookup the table and if the node is not already there i would insert it.

I also know for a fact that i wouldn't be doing more than 500 lookups. So i saw some solutions simply searching through an array (unsorted) and checking if the node was already there before inserting.

My question is is there any reasonable way to guess when should i use a hash table over a manual search over keys without having to benchmark them?

My question is is there any reasonable way to guess when should i use a hash table over a manual search over keys without having to benchmark them?

I am guessing you are familiar with basic algorithmics & time complexity and C++ standard containers and know that with luck hash table access is O(1)

If the hash table code (or some balanced tree code, e.g. using std::map - assuming there is an easy order on keys) is more readable, I would prefer it for that readability reason alone.

Otherwise, you might make some guess taking into account the approximate timing for various operations on a PC. BTW, the entire http:///norvig.com/21-days.html page is worth reading.

Basically, memory accesses are much more slow than everything else in the CPU. The CPU cache is extremely important. A typical memory access with cache fault requiring fetching data from DRAM modules is several hundreds times slower than some elementary arithmetic operation or machine instruction (e.g. adding two integers in registers).

In practice, it does not matter that much, as long as your data is tiny (e.g. less than a thousand elements), since in that case it is likely to sit in L2 cache.

Searching (linearly) in an array is really fast (since very cache friendly), up to several thousand of (small) elements.

IIRC, Herb Sutter mentions in some video that even inserting an element inside a vector is practically -but unintuitively- faster (taking into account the time needed to move slices) than inserting it into some balanced tree (or perhaps some other container, e.g. an hash table), up to a container size of several thousand small elements. This is on typical tablet, desktop or server microprocessor with a multimegabyte cache. YMMV.

If you really care that much, you cannot avoid benchmarking.

Notice that 500 pairs of integers is probably fitting into the L1 cache!

Recommend

nosferalatu.com 4 years ago
Cache

A Simple GPU Hash Table

I released a new project A Simple GPU Hash Table on Github. It is a simple GPU hash table capable of hundreds of mill...

rachelbythebay.com 3 years ago
Cache

Freezers as an analogy for hash table behavior

Freezers as an analogy for hash table behavior While I was out of town on vacation recently, there was a cluster of advisories about malicious HTTP GET and POST requests. In short, crafting the right sort of request and directing...

rachelbythebay.com 3 years ago
Cache

Hash table attacks and overzealous analytics

Hash table attacks and overzealous analytics There's a hash table attack making the rounds this week. It involves what happens inside a bunch of web frameworks and other librarie...

www.pkslow.com 3 years ago
Cache

How to partition the big table in PostgreSQL - Range List Hash

1 Why we need partitioning Table Partitioning can resolve some performance issue caused by big tables, such as query latency. If the size of one table is reaching to the memory of the machine, it suggests that you should partition th...

www.codesd.com 2 years ago
Cache

Using a Date as a Hash Table Key

Using a Date as a Hash Table Key advertisements How can I create a hash table object in JavaSript and use a date as the key? So far I've got t...

Github github.com 2 years ago
Cache

Don't hash the key when searching in an empty table. by nnethercote · Pull Reque...

New issue Don't hash the key when searching in an empty table. #305

realpython.com 2 years ago
Cache

Build a Hash Table in Python With TDD

Build a Hash Table in Python With TDD Invented...

gist.github.com 2 years ago
Cache

hash_table.py

Copy link

nullprogram.com 2 years ago
Cache

The quick and practical "MSI" hash table

The quick and practical "MSI" hash table August 08, 2022 I

www.geeksforgeeks.org 1 year ago
Cache

Hash Table vs Trie

What is Hash Table?An array that stores pointers to records corresponding to a given element. An entry in the hash table is NIL if no existing element has a hash function value equal to the index for the entry. In simple terms, we can sa...

Deciding When to Use a Hash Table

Deciding When to Use a Hash Table

Recommend

About Joyk