Context Triggered Piecewise Hashes (CTPH) and SSDEEP

It has been a while since I posted anything here, but life had other plans for me over the past three years. Anyway, cutting the crap short, today I will discuss hashes. To be more precise, CONTEXT TRIGGERED PIECEWISE HASHES (CTPH).

This term came to mind when I was going through the Pyramid of Pain, a simple diagram that shows the relationship between the types of indicators you might use to detect an adversary’s activities and how much pain it causes them when you are able to deny them those indicators. We can discuss the Pyramid of Pain some other day. Let’s talk about hashes first.

We have all used cryptographic hashes to verify the integrity of files; they are widely used in any data forensics investigation. If a single bit of the input changes, the hashed output value changes drastically. That same property works against us, though: an attacker can change a single bit of a piece of malware so that it no longer matches the cryptographic hash a forensic professional is looking for, while keeping the malware’s functionality intact.
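To make the single-bit point concrete, here is a small sketch using Python’s hashlib. The sample sentence is just an arbitrary input I picked for illustration; any bytes would behave the same way.

```python
import hashlib

original = b"The quick brown fox jumps over the lazy dog"
# Flip the lowest bit of the first byte: 'T' (0x54) becomes 'U' (0x55).
modified = bytes([original[0] ^ 0x01]) + original[1:]

md5_original = hashlib.md5(original).hexdigest()
md5_modified = hashlib.md5(modified).hexdigest()

# Count how many of the 32 hex digits differ between the two digests.
differing = sum(a != b for a, b in zip(md5_original, md5_modified))

print(md5_original)
print(md5_modified)
print(f"{differing} of {len(md5_original)} hex digits differ")
```

The two inputs differ by one bit, yet the two digests are unrelated, so an exact-match lookup against a list of known hashes would miss the modified copy.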

To counter such attacks, we will look at a concept called Context Triggered Piecewise Hashes (CTPH) and at how to use a tool called ssdeep for threat attribution.

What is CTPH

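Unlike a cryptographic hash, which produces a single hash for the entire file, CTPH calculates separate hashes for multiple segments of the file. The segments are not cut at fixed offsets: a rolling hash over the content decides where each segment ends, which is the "context triggered" part of the name.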

A few lines from the associated paper:

A rolling hash algorithm produces a pseudo-random value based only on the current context of the input. The rolling hash works by maintaining a state based solely on the last few bytes from the input. Each byte is added to the state as it is processed and removed from the state after a set number of other bytes have been processed.

The current context can be imagined as a moving window across the input. The window length (number of bytes) depends on the implementation of CTPH.
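The moving window is easier to see in code. Below is a minimal sketch of a rolling hash, assuming the simplest possible state (the sum of the last few bytes) and a window of 7 bytes. Both choices, and the function name rolling_values, are my own assumptions for illustration; this is not the rolling hash ssdeep actually uses.

```python
from collections import deque

WINDOW = 7  # assumed window length, for illustration only


def rolling_values(data, window=WINDOW):
    """Return the rolling-hash value after each input byte.

    The state is simply the sum of the last `window` bytes: each byte is
    added as it arrives and subtracted again once `window` newer bytes
    have followed it.
    """
    recent = deque()
    state = 0
    values = []
    for byte in data:
        recent.append(byte)
        state += byte
        if len(recent) > window:
            state -= recent.popleft()
        values.append(state)
    return values


a = rolling_values(b"AAAA-common tail of text")
b = rolling_values(b"ZZZZ-common tail of text")

# The inputs differ only in their first 4 bytes. Once the window has
# moved past them (from index 10 on), the values are identical again.
print(a[10:] == b[10:])
```

The point to take away is that the value at any position depends only on the bytes currently inside the window, so a change early in the file stops affecting the rolling hash a few bytes later.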

Each recorded value in the CTPH signature depends only on part of the input, and changes to the input will result in only localized changes in the CTPH signature.

Two files similar to each other will have large sequences of identical bits in the same order. The main aim of CTPH is to find similarity between binaries.

If a byte of the input is changed, at most two, and in many cases, only one of the traditional hash values will be changed; the majority of the CTPH signature will remain the same. Because the majority of the signature remains the same, files with modifications can still be associated with the CTPH signatures of known files.
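Here is a toy version of the whole idea, to show why a change stays local. It is only a sketch under my own assumptions: a rolling sum over a 7-byte window as the trigger, a trigger modulus of 16, CRC32 from zlib as the stand-in "traditional" hash for each piece, and seeded random bytes as test data. The names trigger_points and toy_signature are made up for this post, and ssdeep’s real rolling hash, piece hash and block-size selection are different.

```python
import os
import random
import string
import zlib

WINDOW = 7       # assumed rolling-window length
BLOCK_SIZE = 16  # assumed trigger modulus, so pieces average roughly 16 bytes
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def trigger_points(data, window=WINDOW, block_size=BLOCK_SIZE):
    """Indices where the rolling sum of the last `window` bytes hits the trigger value."""
    points = []
    state = 0
    for i, byte in enumerate(data):
        state += byte
        if i >= window:
            state -= data[i - window]  # drop the byte that left the window
        if state % block_size == block_size - 1:
            points.append(i)
    return points


def toy_signature(data):
    """Emit one character per piece; a piece ends at each trigger point."""
    chars = []
    start = 0
    for end in trigger_points(data):
        chars.append(ALPHABET[zlib.crc32(data[start:end + 1]) % 64])
        start = end + 1
    if start < len(data):  # leftover bytes after the last trigger
        chars.append(ALPHABET[zlib.crc32(data[start:]) % 64])
    return "".join(chars)


rng = random.Random(0)  # fixed seed so the run is repeatable
original = bytes(rng.randrange(256) for _ in range(400))
position = 200          # change one byte in the middle of the input
modified = (original[:position]
            + bytes([original[position] ^ 0xFF])
            + original[position + 1:])

sig_a = toy_signature(original)
sig_b = toy_signature(modified)
shared_prefix = len(os.path.commonprefix([sig_a, sig_b]))

print(sig_a)
print(sig_b)
print(f"first {shared_prefix} signature characters are identical")
```

Every piece that ends before the changed byte is hashed from exactly the same bytes in both inputs, so the front of the signature cannot change. The same reasoning applies to pieces that start after the window has moved past the change.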

Let’s see how it works. I will use ssdeep for the demo.

Installing ssdeep

I am installing it on OS X, so I am using Homebrew.

I will be using sample text files to compare and see how this tool works. I have created three text files: one.txt, onecopy.txt (which has a minor change from one.txt) and different.txt, which has completely different content.

First, let’s see what output conventional MD5 hashing gives.

As we can see, each hash is completely different from the others. Now, as we see below, ssdeep gives a different kind of output, and the signatures of the last two samples do not differ much.

Next we can compare them in detail rather than observing visually.
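To give a feel for what such a comparison does, here is a rough sketch that scores two signature strings from 0 to 100. It uses SequenceMatcher from Python’s difflib as a stand-in; ssdeep itself scores signatures with a weighted edit distance, so its numbers will not match this sketch. The function name similarity_score and the three signature strings are made up for the example.

```python
from difflib import SequenceMatcher


def similarity_score(sig_a, sig_b):
    """Return a 0-100 score; 100 means the two signature strings are identical."""
    if not sig_a and not sig_b:
        return 100
    return round(100 * SequenceMatcher(None, sig_a, sig_b).ratio())


# Hypothetical signature strings, invented for this example.
known = "kJ8sQpL2mWx9ZtR4"
tweaked = "kJ8sQpL2mWy9ZtR4"    # one character changed
unrelated = "0aB7cD1eF5gH3iK6"  # shares no characters with `known`

print(similarity_score(known, tweaked))
print(similarity_score(known, unrelated))
```

A signature that differs in one character still scores high, while an unrelated one scores at the bottom of the range, which is the behaviour we want from a similarity match.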

ssdeep considered one.txt and onecopy.txt to be 99% similar, which means the two files share long identical sequences. I tried the same with images. Below are two copies of the same image, one in .jpg format and the other in .png.

I have made a couple of changes to the PNG image by adding white rectangles, as below.

Now, let’s repeat the commands.

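As seen above, the .jpg image doesn’t match the .png file even though the picture is the same, so ssdeep can’t always be used to compare images. ssdeep compares the bytes of the files, not the pixels, and the two formats encode the same picture into very different bytes. It did work moderately well between the PNG files and gave comparison details, but even for the slightest change the match is not very good.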

Feel free to drop your questions in the comments section.