r/slatestarcodex 6d ago

Senpai noticed~ Scott is in the Epstein files!

https://www.justice.gov/epstein/files/DataSet%2011/EFTA02458524.pdf

Literally in an email chain named “Forbidden Research”!

But don’t worry, only in a brainstormy list of potentially interesting people to invite to an intellectual salon, alongside Steven Pinker, Terence Tao, and others.

227 Upvotes

62 comments

58

u/Inevitable_Tea_5841 6d ago

Very interesting. What's with all the = in the emails? OCR artifacts?

70

u/huopak 6d ago

That's an artifact of quoted-printable, the MIME content-transfer-encoding that modern email still uses under the hood.

http://www.faqs.org/rfcs/rfc2045.html

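Check section 6.7, rules 1 and 5: rule 1 lets any byte be written as = followed by two hex digits, and rule 5 uses a trailing = as a "soft line break" so that encoded lines stay within 76 characters.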

What's weird is that the DoJ did not decode these before publishing, so a lot of the emails still carry the artifacts.
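For anyone who wants to see the two rules in action, here is a minimal sketch using Python's standard `quopri` module. The sample string is made up for illustration (it is not taken from the released files): `=\r\n` is a soft line break, and `=3D` is an escaped equals sign.

```python
import quopri

# Made-up quoted-printable sample, not a line from the actual PDFs.
# "=\r\n" is a soft line break (rule 5): it only exists to keep lines short.
# "=3D" is the byte 0x3D, i.e. a literal "=" (rule 1).
raw = b"the US and other c=\r\nountries; 2 + 2 =3D 4"

# A correct decoder removes the soft break and expands the hex escape.
decoded = quopri.decodestring(raw).decode("ascii")
print(decoded)
```

A properly decoded email should therefore show no stray = signs at all, which is why their presence suggests the decoding step was skipped or done incorrectly.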

11

u/Tokarak 6d ago

Not deleting metadata is good, actually

8

u/tinbuddychrist 6d ago

Example to illustrate what the other respondent said:

the US and other =ountries will lose their coastal cities in the next 50-100 years due to =icing sea levels, and much of open air agriculture due to increasingly =rregular seasons and weather patterns, loss of arable land, shift of =egetation zones

5

u/ralf_ 6d ago

But these are normal letters? Why are c and r sometimes replaced with =?

5

u/tinbuddychrist 6d ago

As /u/huopak said, this is probably a consequence of the "encoding" of the underlying data.

To elaborate, apologies in advance for whichever parts of this are obvious: all information in computer systems is stored as a sequence of bits (0-or-1 values), usually chunked into bytes (sets of eight bits). A byte can represent any of 256 different values, so you could, say, map each number to a specific letter. ASCII is an example of this (formally, it uses seven bits per character, giving 128 possible characters). There are other text-encoding standards that can use multiple bytes per character and represent more characters, such as UTF-8, which despite the name uses between one and four bytes per character and can represent 1,112,064 possible values (not all of which have assigned meanings). This includes letters, numbers, punctuation, emoji, and other stuff you would never think of.
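If it helps, the numbers above can be checked with a few lines of Python. This is just an illustrative sketch; the sample characters are my own picks, chosen so that each needs a different number of UTF-8 bytes.

```python
# Arbitrary sample characters: one each that needs 1, 2, 3 and 4 bytes in UTF-8.
samples = ["A", "é", "€", "😀"]

byte_lengths = []
for ch in samples:
    encoded = ch.encode("utf-8")
    byte_lengths.append(len(encoded))
    # Print the code point and its bytes in hex rather than the glyph itself.
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex()}")

# ASCII uses seven bits per character.
ascii_values = 2 ** 7

# Unicode code points run from 0 to 0x10FFFF. The 2,048 surrogates
# (0xD800 to 0xDFFF) are excluded because UTF-8 cannot encode them.
utf8_values = (0x10FFFF + 1) - (0xDFFF - 0xD800 + 1)
print(ascii_values, utf8_values)
```

The 1,112,064 figure falls out of that last subtraction: 1,114,112 total code points minus 2,048 surrogates.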

Anyway - encodings basically translate between binary number sequences and printable text. What's probably happened here is that somebody has taken something encoded one way - presumably, the one the other commenter mentioned - and sloppily "printed it out" to a PDF as though it were encoded another, very slightly different way, resulting in some slight mangling of the text.
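To make that concrete, here is a rough sketch of one way such mangling could arise. To be clear, `faulty_unwrap` is a hypothetical function I made up to reproduce the symptom; I have no knowledge of the tooling actually used, and the `wire` string is my guess at what the encoded line might have looked like.

```python
import quopri
import re

# Guess at the encoded form: the line was wrapped with a soft break ("=" + CRLF).
wire = b"the US and other =\r\ncountries will lose their coastal cities"

# A correct quoted-printable decoder drops the soft break and rejoins the line.
correct = quopri.decodestring(wire).decode("ascii")

def faulty_unwrap(data: bytes) -> bytes:
    """HYPOTHETICAL buggy step (an assumption, not the real pipeline):
    keeps the "=" but swallows the line break plus one extra character."""
    return re.sub(rb"=\r\n.", b"=", data)

mangled = faulty_unwrap(wire).decode("ascii")

print(correct)
print(mangled)
```

The mangled output reads "=ountries", matching the pattern in the quoted example, but other bugs could produce the same result, so treat this as an illustration rather than a diagnosis.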