The Curious Jumbled Text Phenomenon in Python

 2 years ago
source link: https://www.codedrome.com/the-curious-jumbled-text-phenomenon-in-python/

On your travels round the internet you may have seen pieces of text with all but the first and last letters of each word jumbled up but still easily readable. If you are tempted to try jumbling up words yourself then read on - in this article I'll implement a simple program in Python to do just that.

The Jumbled Text Phenomenon

A bit of Googling for phrases like "jumbled text" will bring up loads of examples, explanations and creation myths surrounding paragraphs such as:

“Aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it deosn’t mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the frist and lsat ltteer be at the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe.”

For example, this is just one article from the British newspaper The Independent: "The reason why this 'difficult' puzzle is so easy to read".

Of course it's pretty easy to write a program to carry out such jumblings so let's do so . . .

The Project

This project consists of the following files:

  • textjumbler.py
  • jumbledtextdemo.py

The files can be downloaded as a zip, or you can clone/download the Github repository if you prefer.

Source Code Links

ZIP File
GitHub

Source Code

This is the code for textjumbler.py which does the actual jumbling.

textjumbler.py

import random


def jumble(text, amount):

    """
    Jumbles the letters (except first and last) of the words
    in text. Amount is a decimal between 0.0 and 1.0 and
    specifies the amount of jumbling to be carried out.
    """

    jumbled_words = []
    word = ""

    for index, character in enumerate(text):

        if character.isalpha():

            # still inside a word
            word += character

        else:

            # a non-letter ends the current word
            word = _scramble(word, amount)
            jumbled_words.append(word)
            jumbled_words.append(character)
            word = ""

        if index == (len(text) - 1):

            # flush a word that runs to the very end of the text
            word = _scramble(word, amount)
            jumbled_words.append(word)

    return "".join(jumbled_words)


def _scramble(word, amount):

    if len(word) > 3:

        # The random module seeds itself, so no explicit seed is needed.
        i = 0
        l = len(word)
        max = (l * amount) - 1
        letters = list(word)

        while i < max:

            # pick two positions, never the first or last letter
            c1 = random.randint(1, l - 2)
            c2 = random.randint(1, l - 2)

            if c1 != c2:

                # XOR swap of the two letters
                letters[c1] = chr(ord(letters[c1]) ^ ord(letters[c2]))
                letters[c2] = chr(ord(letters[c1]) ^ ord(letters[c2]))
                letters[c1] = chr(ord(letters[c1]) ^ ord(letters[c2]))

            i += 1

        return "".join(letters)

    else:

        return word

jumble

This is the public function and takes as arguments the text to jumble and the amount of jumbling to be done. The amount is a real value between 0.0 and 1.0, and specifies the proportion of the letters to be mixed up.

The jumbled_words list is used to assemble the finished result, and will consist of both the jumbled words and any other characters such as spaces and punctuation, which are left as they are.

At the heart of this function is a loop which iterates over the characters of the input text. If a character is a letter (tested with the isalpha method) it is added to the word variable. If not then we have come to the end of a word, so the word is scrambled using a separate private function, _scramble, and the result added to jumbled_words. We also append the non-alphabetic character and reset word to an empty string.

The text will probably end with a full stop or other punctuation mark but if not we need to scramble word and add it to jumbled_words.

Finally jumbled_words is joined into a string and returned.
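
The same split between words and other characters can also be expressed with a regular expression. The following is only a minimal sketch, not the article's code: jumble_with_regex and scramble_middle are hypothetical names, and the helper simply shuffles all the middle letters rather than honouring an amount argument.

```python
import random
import re

# Runs of letters only: \W, digits and underscore are excluded,
# which roughly mirrors the str.isalpha test used in jumble.
WORD_PATTERN = re.compile(r"[^\W\d_]+")


def scramble_middle(word):
    """Hypothetical helper: shuffle everything except the first and last letter."""
    if len(word) <= 3:
        return word
    middle = list(word[1:-1])
    random.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]


def jumble_with_regex(text):
    """Replace each run of letters with a scrambled copy; other characters are untouched."""
    return WORD_PATTERN.sub(lambda match: scramble_middle(match.group()), text)


print(jumble_with_regex("Reading jumbled text is surprisingly easy, isn't it?"))
```

Because the pattern only ever matches letters, spaces and punctuation pass through unchanged, just as they do in the character-by-character loop.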

_scramble

This function scrambles the central letters of individual words. If the word has 3 or fewer letters it is just returned unchanged.

These are the variables involved:

  • i - the loop counter
  • l - the length of the word
  • max - the number of letter swaps to carry out, calculated from the length of the word and the amount argument
  • letters - a list of the letters in the word, necessary so that we can swap them.

Next we use a while loop to carry out the required number of character swaps. The characters to swap are chosen at random from the middle of the word and, after checking the two random indexes aren't the same, the letters are swapped using three XOR (exclusive or) operations.
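
To get a feel for how max grows with word length, the calculation can be tried on its own. This is a rough sketch under one assumption: that the counter i starts at 0 and goes up by 1 on every pass of the loop. The function name loop_passes is hypothetical.

```python
import math


def loop_passes(word_length, amount):
    """Number of passes of `while i < max` when i starts at 0 and rises by 1 each pass."""
    limit = (word_length * amount) - 1   # the value the article calls max
    return max(0, math.ceil(limit))


# word length, passes at amount 0.3, passes at amount 0.5
for length in (4, 6, 8, 12):
    print(length, loop_passes(length, 0.3), loop_passes(length, 0.5))
```

If the two random indexes happen to coincide on a pass then no swap takes place, so under this assumption the number of actual swaps can be lower than the number of passes.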

Using XOR for swapping values is a nifty little trick which eliminates the need for a third temporary variable. Exclusive or (XOR) is 1 when exactly one of the two bits being XOR'ed is 1, otherwise it is 0. This is the truth table.

bit 1   bit 2   bit 1 XOR bit 2
0       0       0
1       1       0
1       0       1
0       1       1

So how does this help us swap values? The following snippet makes this clear. Note that in Python (and many other languages) the caret ^ is the XOR operator.

XOR Swapping in Python

a = 45 # 00101101 = 45
b = 99 # 01100011 = 99
a = a ^ b # 01001110 = 78 (irrelevant intermediate value)
b = a ^ b # 00101101 = 45
a = a ^ b # 01100011 = 99

Notice the binary values after each operation, and that after the last two lines the values of a and b are swapped.
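
The three steps can be wrapped in a small function to check them with other numbers. This is just an illustrative sketch; xor_swap is a hypothetical name and is not part of textjumbler.py.

```python
def xor_swap(a, b):
    """Swap two integers using three XOR operations and no temporary variable."""
    a = a ^ b
    b = a ^ b   # (a ^ b) ^ b leaves the original a
    a = a ^ b   # (a ^ b) ^ original a leaves the original b
    return a, b


print(xor_swap(45, 99))          # (99, 45)
print(format(45 ^ 99, "08b"))    # 01001110, the intermediate value
```

The trick works because XOR-ing a value with the same number twice gives back the original value.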

Back to our word-jumbling problem, where we need to swap letters, not numbers. Python won't let you use XOR on characters, so we have to use the ord function to get each letter's integer code point (the same as its ASCII code for plain English letters), do an XOR swap, and then use chr to convert back to a letter.
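
The following sketch isolates the letter swap so that it can be tried on a single word. It also shows why the check that the two indexes differ matters, and compares the XOR approach with ordinary tuple assignment, which is the more usual way to swap in Python. The function name xor_swap_letters is hypothetical.

```python
def xor_swap_letters(letters, c1, c2):
    """Swap two list items in place with the ord/chr XOR steps used in _scramble."""
    letters[c1] = chr(ord(letters[c1]) ^ ord(letters[c2]))
    letters[c2] = chr(ord(letters[c1]) ^ ord(letters[c2]))
    letters[c1] = chr(ord(letters[c1]) ^ ord(letters[c2]))


word = list("python")
xor_swap_letters(word, 1, 4)
print("".join(word))             # pothyn

# With equal indexes the first step XORs a letter with itself, giving chr(0).
same = list("python")
xor_swap_letters(same, 2, 2)
print(repr(same[2]))             # '\x00'

# Tuple assignment swaps the letters directly and is safe for equal indexes.
plain = list("python")
plain[1], plain[4] = plain[4], plain[1]
print("".join(plain))            # pothyn
```

The XOR version is a nice demonstration of the technique, but tuple assignment needs no conversion with ord and chr and cannot wipe out a letter when both indexes are the same.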

After each swap we increment the loop counter i, and after the loop terminates the letters are joined into a freshly scrambled word. The else deals with the situation where the word has 3 or fewer letters, as mentioned above.

Now let's look at jumbledtextdemo.py in which we try out the above code on actual pieces of text.

jumbledtextdemo.py

import textjumbler


def main():

    print("------------------")
    print("| codedrome.com  |")
    print("| Jumbled Text   |")
    print("------------------\n")

    # Adjacent string literals inside parentheses are joined into one string.
    sample = ("It doesn't matter in what order the letters in a word are, the "
              "only important thing is that the first and last letter be at the right "
              "place. The rest can be a total mess and you can still read it without "
              "problem. This is because the human mind doss not read every letter by "
              "itself, but the word as a whole.")

    Ozymandias = ('I met a Traveller from an antique land\nWho said: Two vast '
                  'and trunkless legs of stone\nStand in the desart.  Near them, on the '
                  'sand,\nHalf sunk, a shattered visage lies, whose frown,\nAnd wrinkled lip, '
                  'and sneer of cold command,\nTell that its sculptor well those passions '
                  'read\nWhich yet survive, stamped on these lifeless things,\nThe hand that '
                  'mocked them and the heart that fed:\nAnd on the pedestal these words '
                  'appear:\n"My name is Ozymandias, king of kings:\nLook on my works, ye '
                  'Mighty, and despair!"\nNothing beside remains.  Round the decay\nOf that '
                  'colossal wreck, boundless and bare\nThe lone and level sands stretch far '
                  'away.')

    TaleOfTwoCities = ("It was the best of times, it was the worst of times, it "
                       "was the age of wisdom, it was the age of foolishness, it was the epoch of "
                       "belief, it was the epoch of incredulity, it was the season of Light, it was "
                       "the season of Darkness, it was the spring of hope, it was the winter of "
                       "despair, we had everything before us, we had nothing before us, we were all "
                       "going direct to Heaven, we were all going direct the other way - in short, "
                       "the period was so far like the present period, that some of its noisiest "
                       "authorities insisted on its being received, for good or for evil, in the "
                       "superlative degree of comparison only.")

    # swap in Ozymandias or TaleOfTwoCities here to jumble the other texts
    jumbled = textjumbler.jumble(sample, 0.3)

    print(jumbled)


if __name__ == "__main__":

    main()

The code is dominated by three big chunks of sample text. The second is the poem Ozymandias by Percy Bysshe Shelley, and the third is the opening paragraph of A Tale of Two Cities by Charles Dickens.

After these we simply call textjumbler.jumble and print the result. Here I have used 0.3 for amount; I have found that values between 0.3 and 0.5 work best. Higher values tend to make the text a bit too hard to read comfortably.

Running the Program

Now we can run the program like this.

Running the Program

python3 jumbledtextdemo.py

This is an example of the output but due to the random nature of the jumbling yours will almost certainly be different.
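
If a repeatable result is wanted, for instance when testing, a random number generator with a fixed integer seed gives the same jumble every time. This is a sketch under assumptions rather than part of the project: scramble_with_rng is a hypothetical stand-in for _scramble that shuffles the middle letters, and the seed value 42 is arbitrary.

```python
import random


def scramble_with_rng(word, rng):
    """Hypothetical stand-in for _scramble: shuffle the letters between first and last."""
    if len(word) <= 3:
        return word
    middle = list(word[1:-1])
    rng.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]


# Two generators created with the same seed produce the same sequence.
first = scramble_with_rng("phenomenon", random.Random(42))
second = scramble_with_rng("phenomenon", random.Random(42))
print(first, second, first == second)
```

Passing a random.Random instance around, rather than seeding the shared module-level generator, keeps the repeatable behaviour local to the code that asks for it.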

Program Output

It dseon't mttaer in waht odrer the letetrs in a wrod are, the olny ipmoatrnt tihng is taht the fisrt and lsat leettr be at the rgiht plcae. The rset can be a toatl mses and you can sitll raed it wutohit pobrlem. Tihs is beucase the huamn mnid dsos not raed eevry leettr by iestlf, but the wrod as a whloe.

You can run the program with the other two strings just by changing the variable name in the call to the jumble function. You can of course also use your own text and play around with the amount argument.
