Guess the Gibberish If You Can

 4 years ago
source link: https://towardsdatascience.com/guess-the-gibberish-if-you-can-29433bc658db?gi=527072c49bcf

A simple Algorithm to Create your Gibberish Challenge

Social media is a powerful platform of the 21st century. It has become a primary channel for digital marketing, expressing opinions, making friends, entertainment, and more. Social media apps keep innovating to keep users engaged, showing content tailored to their behavior. New challenges also trend from time to time, often with celebrities posting about them. One such challenge trending right now is the “Guess the Gibberish challenge”.

I am sure you have taken this challenge, or at least seen people take it. Some gibberish text appears on the screen that looks like nonsense but sounds like something meaningful. Let’s see some examples.

[Figures: two examples of the challenge. Image source: self-created]

[Animation: me taking the Gibberish Challenge]

I have personally tried it, and I can vouch that it can be addictive. Being an algorithm enthusiast, the first thing that came to my mind was how I could create such a challenge myself. I did some research and created a simple version of the game. This simple version can be made more sophisticated with a few tweaks, which I will discuss at the end as an open-ended problem. There are many ways of creating it; in this blog post, I will discuss one of them. All the code used in this blog post can be found here. So, let’s get started.

Phonetic Algorithms

As per Wikipedia, phonetics is the science of the sounds of the human voice. Subject matter experts in this field are called phoneticians. The linguistics-based study of phonetics is termed phonology.

A phonetic algorithm is an algorithm for indexing words by their pronunciation. These algorithms make it possible to identify words with similar pronunciation.

The first question that may come to the reader’s mind is why we are discussing phonetics and phonetic algorithms. The answer is that in the problem we are trying to solve, the “Guess the Gibberish challenge”, the gibberish sounds similar to a meaningful phrase, and decoding that phrase is the goal. Intuitively, a phonetic algorithm should help with that. There are many good phonetic algorithms; a popular and simple one is the Soundex algorithm.

Soundex Algorithm

Soundex is a phonetic algorithm for indexing names by sound. The goal is for homophones to be encoded to the same representation so that they can be matched despite minor differences in spelling.

The Soundex algorithm encodes an English word into a code of ‘k’ characters: a letter in the first place, followed by ‘k-1’ numeric digits.

The algorithm is very well explained on Wikipedia. Taking parts from it with slight simplifications, this is what it looks like:

Step 1: Retain the first letter of the word and drop all other occurrences of a, e, i, o, u, y, h, w.

Step 2: Replace consonants with digits as follows (after the first letter):

★ b, f, p, v → 1

★ c, g, j, k, q, s, x, z → 2

★ d, t → 3

★ l → 4

★ m, n → 5

★ r → 6

[Figure omitted. Image source: Wikipedia]

The logic behind Step 2:

Consonants at a similar place of articulation share the same digit so, for example, the labial consonants B, F, P, and V are each encoded as the number 1.

Step 3: If two or more letters with the same number are adjacent in the original word, only retain the first letter; also, two letters with the same number separated by ‘h’ or ‘w’ are coded as a single number, whereas such letters separated by a vowel are coded twice. This rule also applies to the first letter.

Step 4: If the final encoding has fewer than k characters, pad the remaining places with 0’s. If it has more than k characters, retain only the first k.

Using this algorithm, both ‘Robert’ and ‘Rupert’ return the same string ‘R163’ while ‘Rubin’ yields ‘R150’. ‘Ashcraft’ and ‘Ashcroft’ both yield ‘A261’. ‘Tymczak’ yields ‘T522’ not ‘T520’ (the chars ‘z’ and ‘k’ in the name are coded as 2 twice since a vowel lies in between them). ‘Pfister’ yields ‘P236’ not ‘P123’ (the first two letters have the same number and are coded once as ‘P’), and “Honeyman” yields ‘H555’.
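To make Steps 1 to 4 concrete, here is a minimal sketch of the classic Soundex rules in Python. It is not the fuzzy package's implementation; the names `CODES` and `soundex`, and the parameter `k` for the code length, are my own choices for illustration.

```python
# Map each consonant to its Soundex digit (Step 2).
CODES = {}
for letters, digit in [("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")]:
    for ch in letters:
        CODES[ch] = digit


def soundex(word, k=4):
    """Return a k-character Soundex code: one letter plus k-1 digits."""
    word = "".join(ch for ch in word.upper() if ch.isalpha())
    if not word:
        return ""
    result = word[0]                 # Step 1: keep the first letter
    prev = CODES.get(word[0], "")    # Step 3 also applies to the first letter
    for ch in word[1:]:
        if ch in "HW":
            continue                 # h and w do not separate equal digits
        code = CODES.get(ch, "")     # vowels and y have no digit
        if code and code != prev:
            result += code
        prev = code                  # a vowel resets prev, so a repeat is coded again
    return result[:k].ljust(k, "0")  # Step 4: pad with 0's or truncate


for name in ["Robert", "Rupert", "Rubin", "Ashcraft", "Tymczak", "Pfister", "Honeyman"]:
    print(name, soundex(name))
```

Running it on the names discussed above should reproduce the codes quoted from Wikipedia, for example ‘R163’ for both ‘Robert’ and ‘Rupert’.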

In Python, the fuzzy package provides a good implementation of Soundex and other phonetic algorithms.

A Slight Modification To Soundex

We will use the Soundex algorithm to generate the encoding with a slight variation in Step 1.

Instead of deleting all occurrences of a, e, i, o, u, y, h, w, we will further cluster/number them.

★ e, i, y → 7

★ o, u → 8

★ a, h, w → ignored

Reasoning: e, i, and y sound similar. For example, ‘pic’, ‘pec’, and ‘pyc’ sound alike.

I can’t use the fuzzy package, as I wish to make these modifications to the original Soundex algorithm. I found this great implementation by Carlos Delgado. It’s not completely correct, but it is good enough for our use case.

The modified Soundex Algorithm:

def get_soundex_modified(name):
    # Get the modified soundex code for the string
    name = name.upper()
    soundex = ""
    soundex += name[0]

    # Numbering of letters based on phonetic similarity.
    # "." marks the ignored letters (a, h, w) and is stripped at the end.
    dictionary = {"BFPV": "1", "CGJKQSXZ": "2", "DT": "3", "L": "4",
                  "MN": "5", "R": "6", "EIY": "7", "OU": "8", "AHW": "."}

    for char in name[1:]:
        for key in dictionary.keys():
            if char in key:
                code = dictionary[key]
                # Skip a code that repeats the previous one
                if code != soundex[-1]:
                    soundex += code

    soundex = soundex.replace(".", "")

    # We prefer an 8-character output: truncate or pad with 0's
    soundex = soundex[:8].ljust(8, "0")
    return soundex
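As a quick sanity check of the clustering, the sketch below re-expresses the same rules with a per-letter lookup table and runs them on a few words. The names `GROUPS`, `LETTER_TO_CODE`, and `encode_modified` are hypothetical and used only for illustration; the intent is to mirror the function above, not to replace it.

```python
# Letter clusters of the modified Soundex; a, h, w map to "" (ignored).
GROUPS = {"BFPV": "1", "CGJKQSXZ": "2", "DT": "3", "L": "4", "MN": "5",
          "R": "6", "EIY": "7", "OU": "8", "AHW": ""}
LETTER_TO_CODE = {ch: code for letters, code in GROUPS.items() for ch in letters}


def encode_modified(word, k=8):
    """Encode a word as its first letter plus up to k-1 cluster digits."""
    word = word.upper()
    out = word[0]
    prev = ""                      # code of the previously seen letter
    for ch in word[1:]:
        if ch not in LETTER_TO_CODE:
            continue               # skip anything outside A-Z
        code = LETTER_TO_CODE[ch]
        if code and code != prev:
            out += code            # drop only immediate repeats of a code
        prev = code                # a, h, w reset prev, so a repeat is coded again
    return out[:k].ljust(k, "0")


print(encode_modified("under"))    # U5376000
for word in ["pic", "pec", "pyc"]:
    print(word, encode_modified(word))   # all three give P7200000
```

Because e, i, and y share the digit 7, the three spellings ‘pic’, ‘pec’, and ‘pyc’ collapse to the same code, which is the behavior the modification is after.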

Guess The Gibberish Challenge Algorithm

With the slight modification to the Soundex algorithm, we are in a position to complete the “Guess the Gibberish Challenge” algorithm. The steps are:

Step 1: Given the sentence, take one word at a time and generate its 8-character encoding using the modified Soundex algorithm above. For example, the word ‘under’ gets an encoding of ‘U5376000’.

Step 2: Take the encoding from Step 1, go through it one symbol at a time, and do the following:

  • If the symbol is a letter, keep it as it is.
  • If the symbol is a digit, randomly pick a letter from that cluster. For example, if the symbol is 2, we can randomly pick any one of c, g, j, k, q, s, x, and z. The clusters are:

★ b, f, p, v → 1

★ c, g, j, k, q, s, x, z → 2

★ d, t → 3

★ l → 4

★ m, n → 5

★ r → 6

★ e, i, y → 7

★ o, u → 8

  • If the symbol is 0, or no symbols are left in the encoding, we are done with that word.

Step 3: Repeat the same process for all the words in the sentence.
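Steps 2 and 3 can be sketched as follows, assuming the per-word encodings from Step 1 are already available (for instance from get_soundex_modified). The names `CLUSTERS`, `decode_code`, and `gibberish` are hypothetical, and the random generator is passed in explicitly so that a seed can be fixed while experimenting.

```python
import random

# Digit -> candidate letters, as listed in Step 2.
CLUSTERS = {"1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l",
            "5": "mn", "6": "r", "7": "eiy", "8": "ou"}


def decode_code(code, rng):
    """Turn one encoding such as 'U5376000' into a gibberish word."""
    letters = []
    for symbol in code:
        if symbol == "0":
            break                            # padding: the word is finished
        if symbol in CLUSTERS:
            letters.append(rng.choice(CLUSTERS[symbol]))  # random pick from the cluster
        else:
            letters.append(symbol)           # the leading letter is kept as it is
    return "".join(letters)


def gibberish(codes, rng=None):
    """Decode a list of per-word encodings into one gibberish sentence."""
    if rng is None:
        rng = random.Random()
    return " ".join(decode_code(code, rng) for code in codes)


# 'U5376000' is the encoding of 'under' from Step 1.
print(gibberish(["U5376000"], random.Random(0)))
```

The output changes with the seed, but for ‘U5376000’ it always has the shape U + (m or n) + (d or t) + (e, i, or y) + r.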

Guess the Gibberish Challenge Algorithm Implementation

Let’s visualize our gibberish output for some proverbs.

[Figure: Final output, gibberish generated]

The output is gibberish, though it sounds similar to the actual proverbs. That’s our simple “Guess the Gibberish Challenge” solution.

Possible Enhancements

In the present approach, we create the gibberish one word at a time, which could be changed. Also, the first character of each word stays the same in the original input and in the gibberish output. Both limitations can be worked around. I will suggest two possible ways to enhance our “Guess the Gibberish” algorithm and leave the rest to the audience as an open-ended problem to show creativity.

  1. Use a smart heuristic to split and combine the words intelligently, and then perform the encoding using the modified Soundex algorithm. For example, one interesting gibberish for “United Kingdom” is “Ewe night ted king dumb”.
  2. The first character doesn’t have to be the same in the original input and in the gibberish output. For example, one possible way of encoding the letter ‘U’ is ‘Ewe’.

Conclusion

Through this blog post, we developed an interesting and simple algorithm to create our own ‘Guess the Gibberish Challenge’. In the process, we also learned about phonetics, phonology, and phonetic algorithms, and we looked at ways to make the challenge harder. I hope you liked it. All the code used in this blog post can be found here.

If you have any doubts or queries, do reach out to me. I will be interested to know if you think of some more possible enhancements to it.

About the author-:

Abhishek Mungoli is a seasoned Data Scientist with experience in the ML field and a Computer Science background, spanning various domains, with a problem-solving mindset. He has excelled in various Machine Learning and Optimization problems specific to Retail. He is enthusiastic about implementing Machine Learning models at scale and about knowledge sharing via blogs, talks, meetups, and papers.

My motive is always to reduce the toughest of things to their simplest version. I love problem-solving, data science, product development, and scaling solutions. I love exploring new places and working out in my leisure time. Follow me on Medium, LinkedIn, or Instagram and check out my previous posts. I welcome feedback and constructive criticism.

