HashWick V8 Vulnerability

22 August 2018
security

Hash Wick

About one year ago, I've discovered a way to do a Denial-of-Service (DoS) attack on a local Node.js instance. The process involved sending huge amounts of data to the HTTP server running on the same machine as the attacker, and measuring the timing differences between various payloads. Given that the scope of attack was limited to the same machine, it was decided by V8 team and myself that the issue wasn't worth looking in yet. Nevertheless, a blog post was published.

This year, I had a chance to revisit the Hash Seed guessing game with restored enthusiasm and new ideas. The results of this experiment are murky, and no fix is available yet in V8. Thus all V8 release lines are vulnerable to the HashWick attack.

(Disclaimer: the issue was disclosed responsibly. This blog post is published after more than 90 days since the initial report)

What is a Hash Seed? #

The Hash Seed is a random number that is used as an initial value for the (non-cryptographic) hash functions inside of V8 instances. Such numbers are used not only in V8 or other VMs, they're used in kernels, databases, and many different kinds of software.

The reason to have seeded hash functions is to prevent collision attacks on hash maps (e.g. JavaScript objects/dictionaries). In ideal scenario, use of random seed should make guessing the hash value of a string/number an impossible endeavor.

Whenever a Node.js instance parses HTTP headers or JSON object, V8 has to create a hash map for it. Each hash map is backed by a list (storage) of length 2 * N. The keys are inserted at even positions, the values are inserted at odd (right after the key). The position index is determined by the hash of the key modulo the storage size. Two equal keys will have two equal hashes, and will point to the same cell in the list.

In ideal scenario the indices for two different keys are always different. It is easy to see that it isn't possible due to the limited list size. The more key/value pairs we insert into the object, the more likely the "collision" to happen. When the hash values are the same, but the keys are different, V8 has to place the key in the next cell... or in the cell after the next, if the next is filled already.

All in all, this provides quite optimal insertion/lookup performance at relatively low memory costs.

What if attacker knows Hash Seed? #

Every publicly exposed system is subject for external attacks. In the worst case, an attacker gains access to the system or data in it. In less worse cases the attacker might be able to overload the resources of the system by repeatedly hitting the "slow" paths in the system's code. Such attacks are called "Denial of Service" attacks or simply DoS.

For the hash maps in V8, the hash collisions are exactly the slow path. Being able to generate a lot of (or precompute) keys with similar hash values (for various hash map storage capacities) without using many resources is a guarantee of success for such an attack. The worst case scenario is that a small JSON object (or HTTP headers) could take seconds to parse. Sending thousands of such objects would render any server unresponsive.

What stands in a way of generating the collisions is a random Hash Seed number. Given sufficient size of that number and a strong hash function, the indices of the different keys would look absolutely random. Guessing the Hash Seed becomes impractical both time-wise and information-wise.

Unfortunately for V8, the hash function isn't strong enough and the hash seed is a small number.

How to find V8's Hash Seed? #

Imagine the following scenario: a Node.js HTTP server is available publicly and accepts JSON bodies for POST requests. It'd be reasonable to say that many public Node.js servers are of this kind. It's important to note that this is also applicable if Node.js is not directly exposed to a public internet port, such as the case of being proxy-passed via Nginx.

How can an attacker find the value of V8's Hash Seed in such case?

This was a subject of my research this and last year. Assuming absence of access to the server itself, the only way seems to be through the timing differences between processing different crafted HTTP requests.

Hitting the slow path in hash map key insertion that we discussed above should give a slightly slower response time. Knowing a lot of key combinations that cause the slowdown (and as many combinations that doesn't cause one) is enough for reconstructing the seed value. The problem is, the difference between slow and fast path is in the order of microseconds, while the network latency is often at least a couple of milliseconds. The network jitter overwrites any minor timing differences without leaving us a chance to measure them.

What is needed for successful attack is an "amplification". The request has to be crafted in such way that the key insertion is triggered many times in a sequence for the same key and same hash map contents. Each insertion would take just a couple of microseconds, but a thousand of them would add up to a millisecond!

Without giving away the complete tools for crafting such an attack, I can say that the JSON body might look like this:

{"100000":0,"101":1,"101":1,...repeat many times...}

It exploits several implementation details in V8 and a few quirks of JavaScript. In particular, a JavaScript number lookup obj[100000] is equivalent to looking up the "stringified" value of the same number obj["100000"].

The next quirk is that JSON objects might contain duplicate keys. Each key will be parsed and looked up in the object, regardless of how many times it is repeated in the JSON. Each lookup triggers either slow or fast path in the insertion code.

The object above essentially becomes a sparse Array backed by a hash map with very shallow storage capacity (just 4 values). Here's how V8 inserts key into it:

Initially storage is [ null, null, null, null ]
The index of the first key (100000) is computed using the hash function and the hash seed: index = hash(100000, seed) % 4
The value 0 is inserted into that slot of storage (e.g. index = 1): [ null, 0, null, null ]
The index of the second key is computed index_2 = hash(random_key, seed) % 4
while (storage[index_2] != null) index_2++
The value is stored in index_2 slot

Step 5 is crucial for my attack. The extra check takes extra time, and this time difference could be measured to get information about the hash seed.

Thus fixing the first key to 100000, and trying out different keys (probes) will give us the list of timings per each random key. Quarter of the probes will have the same index as 100000 and will hit the slow path due to collision. The rest will go through the fast path. The processing script would later go through the list of timings and separate it by 75% percentile of latency. Most probes would be correctly classified as fast/slow. Misclassified probes would increase Hash Seed search time.

The result of processing is a list of probe keys, each with an assigned class: "same index as 100000" or "different index". Each entry of this list is a constraint on the Hash Seed value. An OpenCL script goes through all possible 32-bit Hash Seed values (0 - 0x3fffffff), and finds the one that satisfies the majority of the constraints. That value is the most likely Hash Seed of the attacked V8 instance. The OpenCL computation could take place on either CPU or GPU, and completes the search in about one minute on my MacBook Pro.

My unoptimized PoC sends about 2 gb of JSON data to the server, collects the timing information, and computes the Hash Seed in a couple of minutes (this includes OpenCL brute-force time).

Prevention #

This sounds dreary, but the fixes can be made. First of all, the hash seed size has to be increased from 32 bits to 64 bits (this is already done in V8). Next, the hash function has to be changed to SipHash or other hash function with PRF (Pseudo-Random Function) properties.

Google has an amazing team working on V8. I'm really hopeful that the remaining fixes will be completed soon.

Prior-Art #

The hash collision attacks have a long history, and most of the software projects has moved to SipHash and at least 64-bit Hash Seed since then.

Credits #

Huge thanks to Aaron Heckmann, Greg Wilburn, Guillermo Rauch, and Shaun Warman for providing feedback on this article.

Previous: HyperBloom
Next: Structure of FTS5 Index in SQLite