I recently had a case involving Discord where the case investigator had observed images within a thread on an iPhone, but they were not appearing in the threads in Cellebrite Physical Analyzer. The investigator described the images to me and I was able to locate them in a folder associated with Discord, so I figured there had to be a way to make the connection. My research started by noting that the images existed in a folder named com.hackemist.SDImageCache within the Discord application container. If you research this folder name you will find that it’s associated with a library called SDWebImage. This library’s main function is to provide a caching mechanism for images. I’m going to explain how this library works, and once we know that we can figure out how to connect the images to their threads. In theory the concepts shown here will apply to any application that uses SDWebImage.
As of the day of writing, Cellebrite Physical Analyzer will show the Discord text threads with attachments indicated by a default placeholder icon and a URL that should lead to the attachment in the cloud. You will notice in these URLs there are “is” and “ex” parameters, which likely stand for “issued” and “expiration”. These are timestamps in a hexadecimal representation that can be decoded and interpreted as Unix seconds. They likely indicate the day the link was generated and the day the link will no longer function. I’ve found in my test cases that this time period is exactly 2 weeks, down to the second. It’s currently unclear if the “issued” time is when the message was initially sent, or when it was first requested (downloaded) by the device you are examining. Further testing needs to be done. So, if you’ve examined this device within the 2 week time period indicated by this link, you will be able to browse to the URL and see the attachment in a web browser, but it’s also likely you will find this same image in your device’s filesystem.
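As a rough illustration of decoding the “is” and “ex” values described above, here is a minimal Python sketch. The URL and its parameter values are hypothetical (made up so that the two timestamps are exactly 2 weeks apart), and the function name is my own; it only assumes the values are hexadecimal Unix seconds, as described above.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit


def decode_link_times(url: str) -> dict:
    """Decode the hex 'is' and 'ex' query values as UTC datetimes."""
    query = parse_qs(urlsplit(url).query)
    times = {}
    for key in ("is", "ex"):
        if key in query:
            seconds = int(query[key][0], 16)  # hex string -> Unix seconds
            times[key] = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return times


# Hypothetical attachment URL, for illustration only.
sample = ("https://cdn.discordapp.com/attachments/1/2/image.jpg"
          "?ex=65501400&is=653d9f00&hm=abcdef&")
times = decode_link_times(sample)
print("issued: ", times["is"].isoformat())
print("expires:", times["ex"].isoformat())
print("window: ", times["ex"] - times["is"])
```

The difference between the two decoded values is the validity window of the link, which is the figure worth comparing against the 2 week period noted above.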
What is SDWebImage
SDWebImage is a library that gives developers a mechanism for downloading images, displaying a placeholder image while the download is in progress, and caching the images so they don’t need to be re-downloaded every time the user views the content. Developers pass the URL of the image they want to display (similar to the one PA is displaying in the text thread) to the SDWebImage function. SDWebImage first calculates an MD5 hash of the URL, adds on the extension of the file type that’s indicated in the URL, and then checks its cache to see if a file exists with this name. If it doesn’t find a file with this name, it downloads it from the server, places a copy of it in the cache folder using the MD5 hash as its name (plus original extension) and displays this image to the user. By default this cache folder is within the application’s container in the folder below.
I created a sample app in Xcode to demonstrate how this works. I chose a random image file from the internet to use as a sample and pasted the URL into the WebImage() call for the SDWebImage library exactly as I found it. See the code below and note the _ga= (Google Analytics) parameter I intentionally left in the URL. Note that in this case I used the SwiftUI version of SDWebImage.
I ran the application twice. The first time the image had not been previously downloaded to my device so you will notice the lag between when the text first appears and then the image appears.
On the second run below you will notice the image loads much faster. This is because the device does not need to re-download the image from the internet and is just displaying a cached version.
After running the app above, I checked the cache path and ended up with an image named: 74f275dd8c44183618195800ca246bff.jpg as shown below.
If you take the full URL: https://dl.fujifilm-x.com/global/products/cameras/gfx100s/sample-images/gfx100s_sample_02_eibw.jpg?_ga=2.74327122.1746275040.1698493688-419314666.1698283461 and calculate its MD5 hash you will end up with 74f275dd8c44183618195800ca246bff which matches our filename. Please do not confuse this MD5 value with the actual MD5 value of the file itself. This is only the MD5 value of the URL not of the image file itself.
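The naming scheme described above can be sketched in a few lines of Python. This is a minimal sketch, not SDWebImage’s own code: the function name is hypothetical, and it assumes the extension is simply whatever follows the last dot in the URL’s path (the query string is ignored for the extension but included in the hash).

```python
import hashlib
import posixpath
from urllib.parse import urlsplit


def sd_cache_filename(url: str) -> str:
    """Return the expected cache filename: MD5 of the full URL plus its extension."""
    # The hash is taken over the complete URL string, query parameters included.
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    # Assumption: the extension comes from the URL path only, e.g. ".jpg".
    extension = posixpath.splitext(urlsplit(url).path)[1]
    return digest + extension


url = ("https://dl.fujifilm-x.com/global/products/cameras/gfx100s/sample-images/"
       "gfx100s_sample_02_eibw.jpg"
       "?_ga=2.74327122.1746275040.1698493688-419314666.1698283461")
print(sd_cache_filename(url))
```

If the assumptions hold, running this against the sample URL should print the same filename that was found in the cache path.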
So this means that for any application (probably not limited to Discord) that uses SDWebImage, we should be able to match up the cached file to the URL that produced it. As long as we have the URL somewhere we should be able to make this connection. However, I found that in Discord it’s not this simple. Let’s demonstrate one more thing. Below, I’ve altered the URL in my code to remove the _ga parameter.
When I run the code, I end up with the same result on my device except this time it generated a new file in the cache.
This new name da0acd336e1ad6256f4cd6e7f17c1221 is derived from the new URL without the Google Analytics parameter on the end. In this scenario both images are identical (visually and binary).
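Since every distinct URL string produces its own cache name, matching works best in the other direction: hash every candidate URL you have, then look for those hashes among the cached filenames. Here is a minimal sketch of that idea; the function name and the directory layout are assumptions for illustration, and the cache folder is simply walked recursively.

```python
import hashlib
from pathlib import Path


def match_cache(cache_dir, urls):
    """Map cached filenames to the candidate URL whose MD5 matches the name."""
    # Index the candidate URLs by the MD5 of their exact string form.
    by_hash = {hashlib.md5(u.encode("utf-8")).hexdigest(): u for u in urls}
    matches = {}
    for path in Path(cache_dir).rglob("*"):
        # path.stem is the filename without its extension, i.e. the hash part.
        if path.is_file() and path.stem in by_hash:
            matches[path.name] = by_hash[path.stem]
    return matches
```

Because the match is on the exact string, a URL that differs by even one parameter (such as the _ga value above) will not match, which is why knowing the precise URL matters.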
Connecting SDWebImage Caches to Discord Threads
Now let’s apply this concept to the Discord attachments. You will find that if you take the URL found in Physical Analyzer for the attachment this will never produce an MD5 hash that matches any images in the Discord cache folder. This is because there are actually 2 URLs listed for each attachment and we need to look into the source files to see this. See example below (Note the proxy_url, width and height values we will be talking more about below).
This file can be found at the following path: private/var/mobile/Containers/Data/Application/[GUID]/Library/Caches/com.hammerandchisel.discord/fsCachedData/[FILE_GUID]
When we look into the above source file you will find a “url” and a “proxy_url” value. PA currently displays the “url”, but I’ve found in my testing that the “proxy_url” is what Discord seems to actually use to load the images. Focusing on the “proxy_url”, you will find that it begins with https://media.discordapp.net and ends with an ‘&’. If you take these “proxy_url” entries and calculate the MD5 hash of them, on occasion you may find a match, but it’s not very common. During my testing I was about to give up because I could not get any of the URLs to produce an MD5 that matched. Additionally, at this stage of my research I still wasn’t sure if I should be using the “url” or “proxy_url”. I finally found one single URL that generated a hash that matched one of my images. After this I just figured I was doing something wrong all the other times and repeated the hashing a little more carefully, but I still could not produce any more that matched except for this one. This got me thinking that I definitely had the right concept but that Discord had to be altering the URL in some way. After all, I knew these images were the ones that belonged in the thread, so there had to be a way to connect them.
It wasn’t until I decided to run a test scenario on one of my devices that I ended up figuring out the solution. On my test device I was able to find an artifact in a file called ‘breadcrumbs.1.state’ that assisted me in determining how this URL was being altered before being passed to the SDWebImage library. See example below.
Full path of this breadcrumbs.1.state file is: private/var/mobile/Containers/Data/Application/[DISCORD_GUID]/Library/Caches/io.sentry/[SOME_HASH_VALUE]/breadcrumbs.1.state
In that file I found that Discord was adding on a width and height parameter to the end of the url. The ending part of the URL ended up looking something like this which for anyone familiar with query strings seems a little abnormal with what appears to be an extra = and &:
For that one example I added this to the end of the “proxy_url” and it worked, so I knew I was on the right track at this point. I initially tried just setting the width and height to the values found in the source files but that didn’t work. When I looked closer I realized in my test case that the width and height found in the breadcrumbs did not actually match the width and height found in the source file, but were some reduced size, possibly like a thumbnail. Now I just needed to know how I could predict what width and height to use in the URL. It seemed somewhat arbitrary but it had to be for a reason. Clearly it was intended to save bandwidth by requesting a version of the image that was smaller than the full size. What I found out after a bunch of testing was that Discord was coming up with a maximum size image that it would request, and it is ultimately based on the device’s resolution. I tested the following steps while holding the device in a portrait orientation. Most display resolutions define the larger dimension first as if you are holding it in landscape mode, so I will be transposing the device resolution to match how the images are defined using width x height. My test device was an iPhone SE (2020) which has a resolution of 1334 × 750 pixels. So again let’s transpose and define this as 750 wide by 1334 high. What I found is that Discord establishes the maximum width at the exact width of the display’s resolution, so in my case it was 750px, and the maximum height as half of the display’s height, so in my case this was 667px (1334 / 2).
Discord first checks if the image is wider than it is tall. If it’s wider, it reduces the width to the maximum width and reduces the height accordingly to retain the aspect ratio. However, if the image is square or taller than it is wide, it first reduces the height to the maximum height and then calculates the new width, keeping the aspect ratio. If the image is smaller than the maximum width and height, Discord does not adjust these values. So knowing this algorithm and knowing the URL format, we can reproduce the full URL that Discord is using to fetch the image from their servers. With this full URL we can now calculate the resulting hash, and ultimately locate the filename in our device’s cache. There’s one last caveat that I didn’t mention. For an image that is already smaller than the maximum width and height, Discord does not pass a width and height, however for some reason it still adds the ‘=’ to the end of the URL.
Original URL found in source file: https://.................../image.jpg?ex=3123123&is=1231435&hm=abcdef1234567890fedcba&
URL for image not needing resize: https://.................../image.jpg?ex=3123123&is=1231435&hm=abcdef1234567890fedcba&=
URL for image needing resize: https://.................../image.jpg?ex=3123123&is=1231435&hm=abcdef1234567890fedcba&=&width=750&height=667
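The sizing rules and URL formats above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the ILEAPP code: the function names are hypothetical, “smaller than the maximum” is read as both dimensions fitting within the limits, and the scaled dimension is rounded to the nearest pixel (the rounding Discord actually uses is not established here).

```python
def max_request_size(screen_w, screen_h):
    """Portrait screen size -> (max width, max height) Discord appears to request."""
    return screen_w, screen_h // 2


def requested_size(img_w, img_h, max_w, max_h):
    """Return the (width, height) to request, or None when no resize is needed."""
    # Assumption: no resize only when BOTH dimensions already fit.
    if img_w <= max_w and img_h <= max_h:
        return None
    if img_w > img_h:
        # Wider than tall: clamp the width, scale the height to keep the ratio.
        return max_w, round(img_h * max_w / img_w)  # rounding is an assumption
    # Square or taller: clamp the height, scale the width to keep the ratio.
    return round(img_w * max_h / img_h), max_h


def cache_url(proxy_url, size):
    """Append the suffix seen in testing to a proxy_url that ends with '&'."""
    if size is None:
        return proxy_url + "="
    width, height = size
    return f"{proxy_url}=&width={width}&height={height}"


# iPhone SE (2020) in portrait: 750 x 1334 -> maximum of 750 x 667.
max_w, max_h = max_request_size(750, 1334)
base = "https://example.invalid/image.jpg?ex=3123123&is=1231435&hm=abcdef1234567890fedcba&"
print(cache_url(base, requested_size(1500, 1334, max_w, max_h)))
print(cache_url(base, requested_size(400, 300, max_w, max_h)))
```

The resulting string is what would then be hashed with MD5 to get the cache filename. The example.invalid host is a placeholder standing in for the elided host in the URLs above.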
So this means that in order to properly decide which width and height to use to construct the URL, we need to know the device’s resolution at decoding time. I have added this decoding capability to ILEAPP by creating a list of all iPhone and iPad devices and their corresponding resolutions. First, I detect the device model by looking at the activation_record.plist file, which is the same file from which Physical Analyzer retrieves the Model Identifier shown on its Extraction Summary page. Then I look up the resolution for this model, decide whether we need to reduce the image size based on the maximum width and height for this model, construct the URL accordingly and generate the MD5 hash of the URL. I also need to extract the original extension used in the URL and add that to the end of the MD5 hash. Hopefully the file is found in the device’s Discord cache folder; if not, I just display the URL of the image.
While this blog primarily focused on Discord, we should now know that anytime we see an image named by what appears to be an MD5 hash, we should look to see if that app might use the SDWebImage library. Another hint that an app uses this library is the presence of the default folder name ‘com.hackemist.SDImageCache’.
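To illustrate the lookup step only, here is a minimal sketch of a model-to-resolution table. The table name and function are hypothetical and this is not the ILEAPP list; it contains a single entry, the iPhone SE (2020) used in testing, whose model identifier is iPhone12,8. Reading the identifier out of activation_record.plist is not shown.

```python
# Portrait resolution (width, height) in pixels, keyed by model identifier.
# Illustrative single entry; a real table would cover every iPhone and iPad.
DISPLAY_PIXELS = {
    "iPhone12,8": (750, 1334),  # iPhone SE (2020)
}


def portrait_resolution(model_id):
    """Return (width, height) for a known model identifier, else None."""
    return DISPLAY_PIXELS.get(model_id)


resolution = portrait_resolution("iPhone12,8")
if resolution is None:
    print("Unknown model: fall back to displaying the URL only")
else:
    width, height = resolution
    print(f"max width {width}px, max height {height // 2}px")
```

Returning None for an unknown model keeps the caller honest: without a resolution there is no way to predict the width and height, so the URL is all that can be reported.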
Notes & Update
The above article is based on Discord versions 197.0 and 201.0; at the time of research there appeared to be no difference in how this worked between these two versions. On 10/31/2023 a user was testing the original ILEAPP script and during that testing we found that the script was seeing a slight variation of the URL where there were no “ex”, “is” or “hm” parameters. That was also found in version 197.0, so it seems that the version may not have anything to do with the parameters’ presence, and as of right now it’s unknown why they are present in some URLs but not others. For URLs without parameters the only variation is that we need to add a “?” character before adding the width and height. Also, in this case we don’t end up with the extra “&=” like we did above. The ILEAPP script has been updated to work with the variations of the URL.
URL format for urls that do not contain the "ex", "is" and "hm" parameters: https://.................../image.jpg?width=750&height=667
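A minimal sketch of handling both variations when a resize is needed might look like this. The function name is hypothetical, and it assumes that a URL with parameters always ends with ‘&’ as observed above. The no-resize case for parameter-less URLs is not covered, since it is not described here.

```python
from urllib.parse import urlsplit


def add_size_params(url, width, height):
    """Append width/height in the form observed for each URL variation."""
    size = f"width={width}&height={height}"
    if urlsplit(url).query:
        # URL already carries ex/is/hm and ends with '&': the extra '=&' appears.
        return f"{url}=&{size}"
    # No existing parameters: start the query string with '?'.
    return f"{url}?{size}"


print(add_size_params("https://example.invalid/image.jpg", 750, 667))
print(add_size_params("https://example.invalid/image.jpg?ex=3123123&is=1231435&hm=abcdef&", 750, 667))
```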
One thing I didn’t directly mention in the article is why this is so complicated, although you probably realize that by now if you’ve read to this point. During Alexis and Heather’s podcast Digital Forensics Now, when they featured this plugin and article, I picked up on Alexis saying people might say “well those are just pictures”, and I agree many might take this for granted. This process adds a bit of a puzzle that needs to be solved first. In most chat applications the attachments are very easy to connect to their thread, for example through a direct file path stored in the database. In this case, and every case that uses SDWebImage, the filenames are obscured because they are an MD5 hash of the URL. So unless you figure out the exact URL that was used to load the image, you will not know how to connect the attachment to the thread. As most people in the DFIR world know, this process is very similar to how passwords are stored in a database. It is a security flaw to store someone’s plaintext password in a database, so a hash value allows a fingerprint of the password to be stored in the database instead. The only way to regenerate that fingerprint and confirm the user entered the correct password is by knowing the password and re-generating the hash. Many modern password algorithms add additional layers to this, either by hashing multiple times (iterations) or by adding some salt (a little bit of extra data). So to compare this process to password storage, the URL is like the “password” and the parameters (including the extra “&” and “=” characters) are like the “salt”. In the beginning we had the password, but not the salt. Then after the first round of testing I realized the salt may be nonexistent or slightly different in some cases. In future versions, if Discord changes the URL or parameters even slightly, we may need to do additional research. After reading this article you should now realize what might go into testifying to this kind of evidence.
If you have an important case you may want to run through this scenario manually for your important images so that you can explain the process and how the tool is making this connection.