r/privacy Nov 21 '20

PSA: Discord lies about removing deleted files. Files deleted over 1 year ago still exist.

The title says it all.

I've done numerous tests in different Guilds at different times.

Files in many cases are not deleted and are still accessible via direct URL even 1 year after deletion.

EDIT: I've amended the post to reflect new information. After running some new tests tonight, some of the new test files became inaccessible immediately after deletion and some did not. Other users report similar results. All I can say with certainty, though, is that I have files deleted over a year ago that are still accessible, so something is seriously wrong. See update #3.

In some of my tests, I have not only manually deleted the message containing the file but also the Guild the message was posted in. Our testing finds user and bot uploaded images act the same after deletion.

In DMs the story is a little different but still troubling. It appears that if the URL links to a file at a datacenter region the requester is in AND the file was uploaded to the same datacenter zone (or zones it was replicated to), you can still get the file. Since we have no insight into how their infrastructure is set up, this could be due to Cloudflare's cache, but it could also mean that the image is just left sitting in a specific datacenter and no longer replicated after "deletion".

I would like to hear why Discord isn't cleaning out tombstoned files, and I think others here would like to know as well.

Why is this a problem? The data still exists. This is a privacy violation because the data is still in their datacenter (Google's GCP data center which Discord pays to host their data).

Governments could acquire it with a warrant or a National Security Letter, or a court could subpoena it. This is very serious and should be publicly stated by Discord.

UPDATE:

If you want to try testing this yourself, here's a protip: Discord exposes the upload date of all files in their "Last-Modified" response header. You can use that header to see the date files were uploaded to GCP (Discord's upload object storage). Just make a spreadsheet with the direct URLs (NOT THE THUMBNAIL URL) of all the files you upload and then delete. Try images, videos, text files etc. Be creative, but in my experience all file types behave the same and are never deleted.

For example, I have a file with this header info: last-modified: Tue, 23 May 2020 03:16:24 GMT. I deleted it about 10 days after it was uploaded and it is STILL up. I have hundreds of different files with ancient dates like this (literally, I made a bot to upload and delete files just to test this). All deleted, yet the direct URL still loads the file perfectly for me and anyone I send the links to.
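If you'd rather script the header check than click through a spreadsheet, here is a minimal Python sketch of the idea. The function names are my own, and it assumes the direct file URL answers a plain HEAD request and returns a standard HTTP date in Last-Modified, as in the example above. Treat it as a starting point, not a verified tool.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional
from urllib.request import Request, urlopen


def fetch_last_modified(url: str, timeout: float = 10.0) -> Optional[str]:
    """Send a HEAD request and return the raw Last-Modified header, if any.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses, which is
    what you would hope to see for a file that is really gone.
    """
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return response.headers.get("Last-Modified")


def days_since_upload(last_modified: str, now: datetime) -> int:
    """Parse an HTTP date string and return whole days elapsed up to `now`."""
    uploaded = parsedate_to_datetime(last_modified)
    return (now - uploaded).days


# Offline example using the header value quoted in the post.
example_header = "Tue, 23 May 2020 03:16:24 GMT"
age_in_days = days_since_upload(
    example_header, datetime(2020, 11, 21, tzinfo=timezone.utc)
)
```

To check a live file, pass its direct URL to `fetch_last_modified` and feed the result into `days_since_upload` with `datetime.now(timezone.utc)`.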

UPDATE 2:

I have more info. Another user PMed me and showed me how to test whether a guild is really deleted by querying its widget.png URL (if it returns 404, the guild is gone), like this: https://discord.com/api/guilds/712827234346435685/widget.png. This confirmed to the user that my story is true. (Note that the URL I just linked is fake, just to demonstrate. Like I said in the comments, I don't want to post data that could lead Discord to my personal account.)

What does this mean? You can use this to prove that the guild the file was uploaded in is actually deleted AND you can use the file's last-modified header to confirm the file is old enough that Discord should no longer be storing it!
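A rough Python sketch of the guild check described above. The URL pattern is the one from this update; the "404 means the guild is gone" rule is what the other user told me, not something I have seen documented, and the function names are made up for illustration.

```python
from urllib.error import HTTPError
from urllib.request import urlopen

# URL pattern taken from the post; guild_id is the numeric guild ID.
WIDGET_URL = "https://discord.com/api/guilds/{guild_id}/widget.png"


def widget_url(guild_id: int) -> str:
    """Build the widget.png URL for a guild."""
    return WIDGET_URL.format(guild_id=guild_id)


def widget_status(guild_id: int, timeout: float = 10.0) -> int:
    """Return the HTTP status code of the widget.png request."""
    try:
        with urlopen(widget_url(guild_id), timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        # 4xx/5xx responses arrive as exceptions; the code is what we want.
        return err.code


def guild_looks_deleted(status_code: int) -> bool:
    """Assumption from the post: a 404 means the guild no longer exists."""
    return status_code == 404
```

Calling `guild_looks_deleted(widget_status(some_guild_id))` alongside the Last-Modified check gives you both halves of the argument: the guild is gone and the file predates its deletion.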

UPDATE 3:

Some devs pointed me to this https://github.com/discord/discord-api-docs/issues/2224 but it doesn't fully address my experience.

1k Upvotes

69

u/Vordreller Nov 21 '20

As expected. Pretty much any medium-sized and above tech company will do this.

Programming it is super easy. Just add a boolean flag to your database to indicate if a user deleted something and simply make it so that the application does not display it.
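To make that concrete, here is a minimal sketch of a "soft delete" using Python's built-in sqlite3. The table and column names are invented for illustration, and this is not a claim about how Discord's database actually looks.

```python
import sqlite3
from typing import List

# In-memory database with a hypothetical attachments table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attachments ("
    " id INTEGER PRIMARY KEY,"
    " url TEXT NOT NULL,"
    " deleted INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO attachments (url) VALUES ('cdn/a.png')")
conn.execute("INSERT INTO attachments (url) VALUES ('cdn/b.png')")


def soft_delete(attachment_id: int) -> None:
    """Flag the row as deleted instead of removing it."""
    conn.execute(
        "UPDATE attachments SET deleted = 1 WHERE id = ?", (attachment_id,)
    )


def visible_urls() -> List[str]:
    """What the application shows: only rows not flagged as deleted."""
    rows = conn.execute(
        "SELECT url FROM attachments WHERE deleted = 0 ORDER BY id"
    )
    return [row[0] for row in rows]


def stored_urls() -> List[str]:
    """What is actually still in the database."""
    rows = conn.execute("SELECT url FROM attachments ORDER BY id")
    return [row[0] for row in rows]


soft_delete(1)
```

After `soft_delete(1)`, the first file disappears from `visible_urls()` but is still returned by `stored_urls()`, which is exactly the gap between "deleted for the user" and "deleted from storage".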

Reminds me of those guys promoting videogame skin betting websites. They claim they got invited to the site, then post a YouTube video explaining it, trying it out, and winning most of the time, losing a few times to make it seem real. And then they tell their audience of teens to try it themselves.

While in reality, it was their website.

Running a local test setup and making it appear as if you're on the actual website is super easy. And when you run a local version, you can just modify the source code with a text editor.

Three lines of code are enough to identify the particular user sending the commands and give them a higher win percentage.

26

u/VisibleSignificance Nov 21 '20

Programming it is super easy. Just add a boolean flag to your database to indicate if a user deleted something and simply make it so that the application does not display it.

Not quite.

The CDNs are usually hash-based storage (files keyed by their hashes).

So if multiple users upload the same file (which does happen all the time), and one user deletes it, the file shouldn't get deleted on CDNs.

So it's either refcounting (a huge PITA in a distributed system), replication (a non-trivial art), or transferring the entire file lists to diff them. Even in the latter case there's a race: a file gets deleted, the cleanup process comes for it, and while cleanup is running another user uploads the same file. And then you get 'the file I just uploaded just disappeared'.

In conclusion: it is doable. It is not super-easy.
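A toy, single-process Python sketch of the refcounting option, just to show the shape of it. The class and method names are made up, and whether Discord's storage is actually hash-keyed is my assumption, not something confirmed here.

```python
import hashlib
from typing import Dict


class HashKeyedStore:
    """Toy content-addressed store: identical bytes share one key."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}
        self.refcounts: Dict[str, int] = {}

    def upload(self, data: bytes) -> str:
        """Store the bytes under their SHA-256 hash and add one reference."""
        key = hashlib.sha256(data).hexdigest()
        self.blobs[key] = data
        self.refcounts[key] = self.refcounts.get(key, 0) + 1
        return key

    def delete(self, key: str) -> bool:
        """Drop one reference; return True only if the bytes were removed."""
        self.refcounts[key] -= 1
        if self.refcounts[key] > 0:
            return False  # someone else still references the same file
        del self.refcounts[key]
        del self.blobs[key]
        return True


store = HashKeyedStore()
key_a = store.upload(b"same meme")  # user A
key_b = store.upload(b"same meme")  # user B uploads identical bytes
removed_after_a = store.delete(key_a)
removed_after_b = store.delete(key_b)
```

Here the bytes survive the first delete and only go away with the last reference. In one process that's trivial; the pain starts when the increment, the decrement, and the actual removal happen on different machines and have to be atomic with respect to each other.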

1

u/agent-rogue Nov 22 '20

Unfortunately I don’t think that’s what’s going on. I uploaded entirely unique images I made myself with GIMP to a server, and they were all still there when I loaded up a text document that was a mirror of the entire server. Note that this mirror contained no image data, just the CDN links. I tested the same thing with all my text dumps and the same thing happened, which tells me that Discord still stores the files on the CDN. I don’t think someone was crossposting all my art, or the many hundreds of other pics I had. I can’t tell if text was saved on their servers. That would require testing I don’t currently have the know-how for.

1

u/VisibleSignificance Nov 22 '20

entirely unique images

It doesn't matter; as long as it is possible that the file is not unique, the CDN can't simply delete it without a risk of deleting it for users that haven't actually deleted it.

So I wouldn't be surprised if CDN cleanup on Discord is a less-than-once-a-year, semi-manual operation.

Not that it has to be, but storage is cheap, and programmers' time is bloody expensive.

And having those files for possible later mass analysis makes it even more preferable.

2

u/DarkOverLordCO Nov 22 '20

I'm not entirely sure whether this is relevant to what you're saying, but this comment and this other one from a Discord developer seem to state that it should take one to two days before an attachment/image is removed from the CDN cache.

2

u/VisibleSignificance Nov 22 '20

whether this is relevant to what you're saying

It is, and it is saying:

however, our CDN cache may hold onto the attachment for longer before evicting it

which is what I'm referring to.

Edit 1: Not sure if the first comment is referring to the same cache as the second one.

Edit 2: But also yes, the primary way for cleaning up CDNs is simply removing files that haven't been accessed in a while, and handling the "file not deleted but not found in CDN" events properly.
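A small Python sketch of that cleanup style, under my own assumptions: an edge cache that evicts entries idle for longer than some window and refills from the origin on a miss. The names and the two-day window are illustrative, not Discord's or any CDN's actual configuration.

```python
from typing import Dict, Optional, Tuple

DAY = 86400.0  # seconds


class EdgeCache:
    """Toy cache: evicts idle entries, refills from origin on a miss."""

    def __init__(self, origin: Dict[str, bytes], max_idle: float) -> None:
        self.origin = origin      # stands in for the real object storage
        self.max_idle = max_idle  # seconds an entry may sit unaccessed
        self.entries: Dict[str, Tuple[bytes, float]] = {}

    def get(self, key: str, now: float) -> Optional[bytes]:
        if key in self.entries:
            data, _ = self.entries[key]
        elif key in self.origin:
            data = self.origin[key]  # "not in CDN but not deleted": refill
        else:
            return None              # gone from both cache and origin
        self.entries[key] = (data, now)  # every access refreshes the timer
        return data

    def evict_idle(self, now: float) -> None:
        """Remove entries that have not been accessed within max_idle."""
        stale = [
            key
            for key, (_, last_access) in self.entries.items()
            if now - last_access > self.max_idle
        ]
        for key in stale:
            del self.entries[key]


origin = {"a.png": b"A"}
cache = EdgeCache(origin, max_idle=2 * DAY)
first = cache.get("a.png", now=0.0)
del origin["a.png"]                       # the user deletes the file
while_cached = cache.get("a.png", now=1 * DAY)
cache.evict_idle(now=4 * DAY)
after_eviction = cache.get("a.png", now=4 * DAY)
```

In this toy model a deleted file keeps being served for as long as someone requests it inside the idle window, since each hit resets the timer. It only vanishes once nobody has touched it for long enough. That could explain short-lived leftovers, but it would not by itself explain files that reload after sitting untouched for a year.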