Bert Hubert 🇺🇦 (@bert_hu_bert): "A fun decryption story! In 1914, The Nethe...

source link: https://nitter.net/bert_hu_bert/status/1539153322321526785

A fun decryption story! In 1914, The Netherlands sent a peace mission to Albania (I did not know this either). The mission commander, Major Lodewijk Thomson, was killed in battle under circumstances that are still unclear. And we'd love to know! https://en.wikipedia.org/wiki/Lodewijk_Thomson

Jun 21, 2022 · 7:48 AM UTC · Twitter Web App

Recently (2009), an encrypted Albanian telegram from that time was found in Dutch military archives. Could this perhaps shed some light on the situation? Intriguingly, no one had ever been able to decrypt the message.
Dutch researcher Florentijn van Kampen, affiliated with Radboud University's iHub, decided to give it a try using modern cryptographic techniques. I mean, 1914 encryption, how hard could it be?! ecp.ep.liu.se/index.php/hist…
First of all, some basic facts: The telegram consists of 736 numbers between 19 and 119. There are 49 different numbers. The frequency per number ranges from 1 to 74. Since there appear to be a maximum of 100 different numbers, we can put this on a 10*10 matrix:
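As a rough illustration of this counting step, here is a minimal Python sketch. The `codes` list is made-up sample data, not the telegram, and the grid layout (a hypothetical `base` of 20, one row per ten codes) is an assumption chosen for the example rather than the layout used in the paper.

```python
from collections import Counter

def frequency_grid(codes, base=20, size=10):
    """Count each code and place the counts on a size x size grid.

    Cell (row, col) holds the count for code base + row * size + col.
    Codes that fall outside the grid are returned separately, not dropped.
    """
    counts = Counter(codes)
    grid = [[0] * size for _ in range(size)]
    outside = {}
    for code, n in counts.items():
        row, col = divmod(code - base, size)
        if 0 <= row < size:
            grid[row][col] = n
        else:
            outside[code] = n
    return counts, grid, outside

# Made-up sample, only to show the shape of the result.
codes = [23, 45, 23, 67, 45, 23, 119, 19]
counts, grid, outside = frequency_grid(codes)
for row in grid:
    print(" ".join(f"{n:2d}" for n in row))
```

Very rare codes show up as isolated 1s on such a grid, which makes them easy to spot.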
Here, the article makes an assumption: the very rare numbers are likely mistakes in transcription. It is a wartime hand-written telegram, so it is entirely possible that errors have crept in. So, do these numbers look like they contain information? Enter the Index of Coincidence:
When the IC is calculated for randomly distributed characters, the result is 1. This is what you'd get looking at modern encryption. If the number is >1, there is an uneven distribution of letters, and you might have something to work on. The Albanian telegram has an IC of 2.18!
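The thread does not spell out the formula in text, so the following is a sketch of one common normalised form of the Index of Coincidence, in Python. The function name and the `alphabet_size` parameter are assumptions for illustration; which alphabet size the paper used to arrive at 2.18 is not stated here.

```python
from collections import Counter

def index_of_coincidence(symbols, alphabet_size):
    """Normalised IC: close to 1.0 for uniformly random symbols, higher when skewed."""
    n = len(symbols)
    if n < 2:
        raise ValueError("need at least two symbols")
    counts = Counter(symbols)
    # Number of ordered pairs of positions that hold the same symbol.
    coincidences = sum(c * (c - 1) for c in counts.values())
    return alphabet_size * coincidences / (n * (n - 1))

# A perfectly even distribution over 100 codes scores just under 1.
even = list(range(100)) * 50
print(round(index_of_coincidence(even, 100), 3))
```

Natural-language text under a simple substitution keeps its uneven letter frequencies, so it scores well above 1 on this measure.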
In a monoalphabetic substitution cipher, every letter is replaced by another letter. In theory these are among the easiest ciphers to decrypt. But, this being real life, things turn out to be way more interesting. What language is the telegram even written in? And what alphabet?
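To make the idea concrete, here is a toy monoalphabetic substitution in Python where each letter maps to one numeric code, as in a numeric telegram. The key `toy_key` is invented for the example and has nothing to do with the real key.

```python
def make_cipher(key):
    """Build encrypt/decrypt functions from a letter -> code mapping."""
    inverse = {code: letter for letter, code in key.items()}
    if len(inverse) != len(key):
        raise ValueError("two letters share a code")

    def encrypt(text):
        return [key[ch] for ch in text]

    def decrypt(codes):
        return "".join(inverse[c] for c in codes)

    return encrypt, decrypt

# Hypothetical four-letter key, just for illustration.
toy_key = {"a": 31, "b": 32, "e": 47, "t": 58}
encrypt, decrypt = make_cipher(toy_key)
print(encrypt("beta"))  # [32, 47, 58, 31]
```

Because every occurrence of a letter always becomes the same code, the letter frequencies of the language survive encryption, and that is what the attack exploits.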
"How hard could this be?!" Well, it turns out the Albanian alphabet is ready to confuse you. It consists of 36 letters! "a b c ç d dh e ë f g gj h i j k l ll m n nj o p q r rr s sh t th u v x xh y z zh".
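The digraphs matter for any frequency analysis: "sh" is one letter, not two. A minimal Python sketch of splitting a word into Albanian letters follows. The greedy rule (always prefer a digraph when one matches) is a simplifying assumption and will mis-split the occasional word where two single letters merely happen to sit next to each other.

```python
ALBANIAN_ALPHABET = ("a b c ç d dh e ë f g gj h i j k l ll m n nj "
                     "o p q r rr s sh t th u v x xh y z zh").split()
DIGRAPHS = {letter for letter in ALBANIAN_ALPHABET if len(letter) == 2}

def split_letters(word):
    """Split a lowercase word into Albanian letters, preferring digraphs.

    Assumes ç and ë are single precomposed characters (Unicode NFC).
    """
    letters = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            letters.append(pair)
            i += 2
        else:
            letters.append(word[i])
            i += 1
    return letters

print(len(ALBANIAN_ALPHABET))   # 36
print(split_letters("shqip"))   # ['sh', 'q', 'i', 'p']
```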
To attack a substitution cipher, we need a corpus of Albanian text. It turns out that one was not readily available, so Florentijn found a bunch of Albanian books on archive.org and used Google Drive's OCR abilities to convert these into text:
In 2009 an attempt had been made to decrypt the telegram, and this had failed. Because of this, Florentijn decided to handcraft a tool chain that would offer full control over the decryption process. This involved tetragram frequencies, hill climbing and simulated annealing:
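For readers who have not seen these techniques, here is a generic Python sketch of tetragram scoring combined with simulated annealing over substitution keys. This is not Florentijn's tool chain: every function name, the cooling schedule, the floor for unseen tetragrams and the default parameters are assumptions picked for a small, readable example.

```python
import math
import random
from collections import Counter

def tetragram_model(corpus):
    """Log-probability of each 4-symbol window in corpus, plus a floor for unseen ones."""
    counts = Counter(tuple(corpus[i:i + 4]) for i in range(len(corpus) - 3))
    total = sum(counts.values())
    logp = {gram: math.log(n / total) for gram, n in counts.items()}
    return logp, math.log(0.01 / total)

def score(symbols, model):
    """Sum of tetragram log-probabilities; higher means more language-like."""
    logp, floor = model
    return sum(logp.get(tuple(symbols[i:i + 4]), floor)
               for i in range(len(symbols) - 3))

def decode(codes, key):
    return [key[c] for c in codes]

def anneal(codes, alphabet, model, steps=5000, t_start=5.0, t_end=0.05, seed=0):
    """Search for a code -> letter key by swapping two letters at a time."""
    rng = random.Random(seed)
    symbols = sorted(set(codes))
    letters = list(alphabet)
    if len(letters) < len(symbols):
        raise ValueError("more distinct codes than letters")
    rng.shuffle(letters)

    def current_key():
        return dict(zip(symbols, letters))  # surplus letters stay unassigned

    current = score(decode(codes, current_key()), model)
    best_key, best = current_key(), current
    for step in range(steps):
        # Geometric cooling from t_start down towards t_end.
        temp = t_start * (t_end / t_start) ** (step / steps)
        i, j = rng.sample(range(len(letters)), 2)
        letters[i], letters[j] = letters[j], letters[i]
        candidate = score(decode(codes, current_key()), model)
        delta = candidate - current
        # Always accept improvements; accept worse keys with falling probability.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current = candidate
            if current > best:
                best_key, best = current_key(), current
        else:
            letters[i], letters[j] = letters[j], letters[i]  # reject: undo swap
    return best_key, best
```

With the temperature held very low, the same loop behaves like plain hill climbing, since worse keys are then almost never accepted. Working on sequences of symbols rather than raw strings means digraph letters such as "sh" can be treated as single units.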
Despite this impressive algorithmic assault, nothing came out! One thing the software did discover, however, was that two of the codes definitely mapped to spaces, which is quite rare for this kind of encryption. And this proved to be helpful! Remember the low-frequency characters?
Since the codes for spaces are now known, it is possible to chop up the telegram into words. Any word containing a super rare character is assumed to have been an error in transcription or encryption. Once these words were removed, the software came up with this:
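The filtering step described here can be sketched in Python as follows. The sample codes, the two space codes and the `min_count` threshold are all invented for illustration; the thread does not say which cut-off the paper used for "super rare".

```python
from collections import Counter

def split_words(codes, space_codes):
    """Cut the code stream into words at any of the space codes."""
    words, current = [], []
    for code in codes:
        if code in space_codes:
            if current:
                words.append(current)
            current = []
        else:
            current.append(code)
    if current:
        words.append(current)
    return words

def drop_suspect_words(words, counts, min_count=3):
    """Keep only words whose codes all occur at least min_count times overall."""
    return [w for w in words if all(counts[c] >= min_count for c in w)]

# Made-up stream: 20 and 21 play the role of the two space codes.
codes = [31, 47, 20, 58, 99, 31, 21, 47, 31, 20, 31, 47]
words = split_words(codes, space_codes={20, 21})
clean = drop_suspect_words(words, Counter(codes))
print(clean)  # [[31, 47], [47, 31], [31, 47]]
```

Dropping whole words rather than single codes keeps the remaining text made of intact words, which matters for any scoring based on runs of adjacent letters.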
Many of the characters in the key are placed in the order of the Albanian alphabet. Because the software used by Florentijn did not know about this order, the fact that it uncovered a key that has this structure is a strong indication something real was found!
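One simple way to put a number on "placed in the order of the alphabet" is sketched below in Python. The measure itself (`ordered_fraction`) is an assumption made up for this example, not something taken from the paper: it looks at numerically adjacent codes and checks how often their letters ascend in alphabet order. A random key should land somewhere around one half, a key built by walking through the alphabet close to 1.

```python
ALBANIAN_ALPHABET = ("a b c ç d dh e ë f g gj h i j k l ll m n nj "
                     "o p q r rr s sh t th u v x xh y z zh").split()
RANK = {letter: i for i, letter in enumerate(ALBANIAN_ALPHABET)}

def ordered_fraction(key):
    """Fraction of numerically adjacent codes whose letters ascend in alphabet order.

    key maps numeric codes to letters and must contain at least two codes.
    """
    codes = sorted(key)
    pairs = list(zip(codes, codes[1:]))
    in_order = sum(RANK[key[a]] < RANK[key[b]] for a, b in pairs)
    return in_order / len(pairs)

# Hypothetical keys, only to show the two extremes.
alphabetical = {30: "a", 31: "b", 32: "c", 33: "d", 34: "e"}
scrambled = {30: "e", 31: "a", 32: "d", 33: "b", 34: "c"}
print(ordered_fraction(alphabetical))  # 1.0
print(ordered_fraction(scrambled))     # 0.5
```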
With some manual work, the most probable key was recovered. However, when this was used, the resulting decrypted text still did not look like normal Albanian. At this point, a native Albanian speaker was found. Turns out, the telegram is written in an extinct Albanian dialect!
In addition, it turns out that the writer of the text was very sloppy, often substituting e for ë and n for nj. The Morse code operator also frequently messed up numbers in a predictable manner (off-by-one errors):
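Both kinds of error can be turned into candidate lists for a human expert to choose from. The Python sketch below assumes that "off by one" means a code was sent one too high or one too low, and that the sloppy spellings only ever go in the direction described (e written for ë, n written for nj). The key and the function names are hypothetical.

```python
def off_by_one_candidates(code, key):
    """Letters the code could stand for if it was sent one too high or too low."""
    return {c: key[c] for c in (code - 1, code, code + 1) if c in key}

# What the writer put down -> what may have been meant.
SLOPPY = {"e": "ë", "n": "nj"}

def spelling_variants(letters):
    """All spellings obtained by optionally restoring ë for e and nj for n."""
    variants = [[]]
    for letter in letters:
        options = [letter]
        if letter in SLOPPY:
            options.append(SLOPPY[letter])
        variants = [v + [o] for v in variants for o in options]
    return ["".join(v) for v in variants]

toy_key = {31: "a", 32: "b", 34: "d"}       # invented key with a gap at 33
print(off_by_one_candidates(33, toy_key))   # {32: 'b', 34: 'd'}
print(spelling_variants(["n", "e"]))        # ['ne', 'në', 'nje', 'një']
```

The number of variants doubles with every ambiguous letter, so in practice this kind of expansion is only useful word by word, with a native speaker picking the reading that makes sense.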
Once these things had been taken into account, and making good use of native Albanian expertise, the plaintext could be reconstructed and translated:
This fun article illustrates a few interesting points. Decryption is not just about algorithms and computers. Like any other kind of reverse engineering, it only works if sufficient context is known & the right expertise is consulted.
For the complete story & many historical details, do head to the actual paper "Thomson’s Telegram, Decrypting a Secret Message from Albania, 1914" -> ecp.ep.liu.se/index.php/hist…
