7

Deciding When to Use a Hash Table

 2 years ago
source link: https://www.codesd.com/item/deciding-when-to-use-a-hash-table.html
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

Deciding When to Use a Hash Table

advertisements

I was soving a competitive programming problem with the following requirements:

I had to maintain a list of unqiue 2d points (x,y), the number of unique points would be less than 500.

My idea was to store them in a hash table (C++ unordered set to be specific) and each time a node turned up i would lookup the table and if the node is not already there i would insert it.

I also know for a fact that i wouldn't be doing more than 500 lookups. So i saw some solutions simply searching through an array (unsorted) and checking if the node was already there before inserting.

My question is is there any reasonable way to guess when should i use a hash table over a manual search over keys without having to benchmark them?


My question is is there any reasonable way to guess when should i use a hash table over a manual search over keys without having to benchmark them?

I am guessing you are familiar with basic algorithmics & time complexity and C++ standard containers and know that with luck hash table access is O(1)

If the hash table code (or some balanced tree code, e.g. using std::map - assuming there is an easy order on keys) is more readable, I would prefer it for that readability reason alone.

Otherwise, you might make some guess taking into account the approximate timing for various operations on a PC. BTW, the entire http:///norvig.com/21-days.html page is worth reading.

Basically, memory accesses are much more slow than everything else in the CPU. The CPU cache is extremely important. A typical memory access with cache fault requiring fetching data from DRAM modules is several hundreds times slower than some elementary arithmetic operation or machine instruction (e.g. adding two integers in registers).

In practice, it does not matter that much, as long as your data is tiny (e.g. less than a thousand elements), since in that case it is likely to sit in L2 cache.

Searching (linearly) in an array is really fast (since very cache friendly), up to several thousand of (small) elements.

IIRC, Herb Sutter mentions in some video that even inserting an element inside a vector is practically -but unintuitively- faster (taking into account the time needed to move slices) than inserting it into some balanced tree (or perhaps some other container, e.g. an hash table), up to a container size of several thousand small elements. This is on typical tablet, desktop or server microprocessor with a multimegabyte cache. YMMV.

If you really care that much, you cannot avoid benchmarking.

Notice that 500 pairs of integers is probably fitting into the L1 cache!


Recommend

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK