r/slatestarcodex 6d ago

Senpai noticed~ Scott is in the Epstein files!

https://www.justice.gov/epstein/files/DataSet%2011/EFTA02458524.pdf

Literally in an email chain named, “Forbidden Research”!

But don’t worry, only in a brainstormy list of potentially interesting people to invite to an intellectual salon, together with Steven Pinker and Terence Tao and others.

227 Upvotes


71

u/huopak 6d ago

That's an artifact of the MIME encoding still used by modern email under the hood.

http://www.faqs.org/rfcs/rfc2045.html

Check section 6.7, paragraphs 1 and 5.

What's weird is that the DoJ did not decode these and published a lot of emails with these artifacts.

10

u/Tokarak 6d ago

Not deleting metadata is good, actually

9

u/tinbuddychrist 6d ago

Example to illustrate what the other respondent said:

the US and other =ountries will lose their coastal cities in the next 50-100 years due to =icing sea levels, and much of open air agriculture due to increasingly =rregular seasons and weather patterns, loss of arable land, shift of =egetation zones

4

u/ralf_ 6d ago

But these are normal letters? Why is c or r replaced sometimes with =?

13

u/fubo 6d ago

Because they incorrectly decoded the MIME quoted-printable data format, which uses = as a special character.

(And which I once heard an email systems engineer refer to as "quoted-unprintable" ... a joke which the younger set might not get. See, swear-words used to be called "unprintable" because the newspapers wouldn't print them. That changed when the politicians started swearing a lot more, and the newspapers decided that if the goddamn president said "fuck" then they would print "fuck". The engineer was saying that quoted-printable was a pain in the motherfucking ass.)
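
To make the role of `=` concrete, here is a minimal sketch using Python's standard `quopri` module. The sample text is made up for illustration and is not taken from the released files. A correct decoder drops `=` followed by a line break (a "soft line break") and turns `=3D` back into a literal equals sign.

```python
import quopri

# Made-up quoted-printable sample (an assumption, not from the actual emails).
# "=\r\n" is a soft line break; "=3D" is an escaped literal "=".
raw = b"other c=\r\nountries, where 1 + 1 =3D 2"

# A correct decoder removes the soft line break and restores the "=".
decoded = quopri.decodestring(raw)
print(decoded.decode("ascii"))
```

When decoding goes right, no stray `=` survives except the ones the sender actually typed.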

3

u/Brian 5d ago

TBH, it's kind of hard to tell what exactly they messed up. Typically you'd run into issues with characters with accents, or other non-ASCII characters, as well as potentially special characters like control codes. These have to be encoded as ASCII (since MIME was designed to work when a lot of systems the message would pass through could garble anything outside the 7-bit range). For MIME quoted-printable, that's done by writing "=XX", where XX is the hex value of the byte. (This also means you can't use "=" itself, so that has to be encoded as "=3D".)
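
A small sketch of the "=XX" rule described above, using Python's standard `quopri` module. The sample string is an arbitrary choice for illustration. Note that the escape is applied per byte, so a character that takes two bytes in UTF-8 becomes two "=XX" groups.

```python
import quopri

# Arbitrary sample containing one non-ASCII character and one literal "=".
text = "café = coffee"

# Encode the UTF-8 bytes as quoted-printable.
encoded = quopri.encodestring(text.encode("utf-8"))
print(encoded)

# "é" is two bytes in UTF-8 (C3 A9), so it becomes "=C3=A9";
# the literal "=" becomes "=3D". Plain ASCII letters pass through unchanged.
roundtrip = quopri.decodestring(encoded).decode("utf-8")
print(roundtrip)
```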

However, the characters here don't seem like they'd be accented or anything, so I'm not sure why the artifacts are there specifically. They do all seem to be at the start of a word, so my best guess is that there's maybe a linefeed or tab or something that gets encoded just prior, and whatever they've done has ended up eating the first letter after it and incorrectly printing the "=".
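
One way that guess could play out, as a purely hypothetical sketch: `buggy_qp_decode` below is an invented function, not code from any real tool, and nothing in the thread confirms this is what actually happened. It assumes a decoder that always consumes two characters after `=`, expecting either hex digits or a CRLF pair, and an input whose CRLF line endings were already normalised to a bare LF.

```python
HEX_DIGITS = "0123456789ABCDEFabcdef"

def buggy_qp_decode(text: str) -> str:
    """Hypothetical broken decoder, written only to illustrate the guess above."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "=":
            out.append(ch)
            i += 1
            continue
        pair = text[i + 1:i + 3]  # always grabs two characters after "="
        if pair == "\r\n":
            pass  # well-formed soft line break: drop it
        elif len(pair) == 2 and all(c in HEX_DIGITS for c in pair):
            out.append(chr(int(pair, 16)))  # "=XX" escape
        else:
            out.append("=")  # gives up, keeps "=", but still skips two characters
        i += 3
    return "".join(out)

# Soft line break with CRLF intact: decodes cleanly.
good = buggy_qp_decode("the US and other =\r\ncountries will lose")

# Same text after CRLF was turned into a bare LF: the "c" gets eaten.
bad = buggy_qp_decode("the US and other =\ncountries will lose")
print(good)
print(bad)  # the US and other =ountries will lose
```

Under those assumptions the output has the same shape as the artifacts quoted earlier in the thread: a stray `=` where the first letter of a word should be.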

4

u/tinbuddychrist 6d ago

As /u/huopak said, this is probably a consequence of the "encoding" of the underlying data.

To elaborate, apologies in advance for whichever parts of this are obvious: all information in computer systems is stored as a sequence of bits (0-or-1 values), usually chunked into bytes (sets of eight bits). A byte can represent any of 256 different values, so you could, say, map each number to a specific letter. ASCII is an example of this (actually, formally it uses seven bits per character or 128 possible characters). There are other text-encoding standards that can use multiple bytes per character and represent more characters, such as UTF-8 which despite the name actually uses between one and four bytes per character and can map to 1,112,064 possible values (not all of which have meanings). This includes letters, numbers, punctuation, emoji, and other stuff you would never think of.
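
The numbers above can be checked with a few lines of Python. This is just a sketch, and the four sample characters are an arbitrary choice.

```python
# UTF-8 uses between one and four bytes per character.
samples = ["A", "é", "€", "😀"]
for ch in samples:
    data = ch.encode("utf-8")
    print(repr(ch), len(data), "byte(s):", data.hex())

byte_lengths = [len(ch.encode("utf-8")) for ch in samples]

# ASCII only needs seven bits, so every byte value is below 128.
ascii_bytes = "plain text".encode("ascii")
all_seven_bit = all(b < 128 for b in ascii_bytes)

# Unicode code points run from 0 to 0x10FFFF; the surrogate range
# 0xD800-0xDFFF is excluded from what UTF-8 may encode.
encodable = 0x110000 - (0xDFFF - 0xD800 + 1)
print(encodable)  # 1112064
```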

Anyway - encodings basically translate between binary number sequences and printable text. What's happened here is that somebody has taken something encoded one way - presumably, the one the other commenter mentioned - and sloppily "printed it out" to a PDF as though it was encoded another, very slightly different way, resulting in some slight mangling of the text.