
Death Match: AI Bernie vs. AI Joe

How anyone can train a model to speak like anyone



Look, I’ve been making a public decision. You know, if you let them tolerate endorsements and our aid. The fact of the matter is I have no possibility of the world standing up and making sure we lead. Look, the fact is that we’re in a position where we have to say, “We’re going to walk with a drug council”.

That was Democratic party candidate AI Joe Biden, character for character. That’s right — AI Joe Biden. He was created in 4 lines of code and can generate passages that are almost identical to Joe Biden. Don’t believe it? Keep on reading! In this article, I’ll cover:

  • What a Recurrent Neural Network is and how it can model language
  • How anyone can create a Recurrent Neural Network in 4 lines of code
  • Using RNNs on a fun death match between AI Joe Biden and AI Bernie Sanders!

What is a Recurrent Neural Network?

A Recurrent Neural Network, or RNN, is a type of neural network that specializes in processing sequences. This is especially helpful when we input some text, say 'cats and ____', and expect the model to output 'dogs'.

One issue with standard neural networks is that they have a fixed input and output size. For example, in a convolutional neural net trained on the MNIST dataset, each training and testing example must be exactly 784 values (a flattened 28 x 28 image), no more, no less. While this is practical for tasks like image recognition, it is certainly not for natural language processing, where the input and output may vary from a few characters to several sentences or even more.

RNNs allow for variable-length inputs and outputs. An RNN can take any of the shapes below, where red is the input, green is the RNN, and blue is the output:

[Figure: possible RNN configurations, from one-to-one through many-to-many]

For our purposes, we will be using a many-to-many RNN. In the below diagram, y denotes the output, h a hidden layer, and x the input.

[Figure: an unrolled many-to-many RNN with inputs x, hidden states h, and outputs y]

At any given step t, the next hidden state h_t is calculated using the previous state h_{t-1} and the next input x_t (subscripts denote the time step).

RNNs use the same weights at every step; this is what makes them recurrent. A typical RNN uses three weight matrices: one for all x_t to h_t links, one for all h_{t-1} to h_t links, and one for all h_t to y_t links. Rephrased, the weights used are input-to-layer, layer-to-layer, and layer-to-output. Two biases are also added, one for calculating h and one for calculating y, alongside an activation function. Concretely, h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h) and y_t = W_hy h_t + b_y, where tanh is a common choice of activation.

This way, any number of inputs x can be used to produce any number of outputs y.
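To make the recurrence concrete, here is a minimal NumPy sketch of the forward pass just described. The sizes and weight names (W_xh, W_hh, W_hy) are illustrative, not taken from any particular library:

import numpy as np

input_size, hidden_size, output_size = 10, 16, 10  # illustrative sizes

# The three shared weights and two biases described above
W_xh = np.random.randn(hidden_size, input_size) * 0.01   # input -> hidden
W_hh = np.random.randn(hidden_size, hidden_size) * 0.01  # hidden -> hidden
W_hy = np.random.randn(output_size, hidden_size) * 0.01  # hidden -> output
b_h = np.zeros(hidden_size)
b_y = np.zeros(output_size)

def rnn_forward(xs):
    # h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h); y_t = W_hy h_t + b_y
    h = np.zeros(hidden_size)
    ys = []
    for x in xs:  # the same weights are reused at every step
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        ys.append(W_hy @ h + b_y)
    return ys

sequence = [np.random.randn(input_size) for _ in range(7)]
print(len(rnn_forward(sequence)))  # 7 outputs for 7 inputs: many-to-many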

RNNs model language especially well because language has an inherent structure to the way words are formed. This inherent structure is representative of how you or I or a politician might talk. Although it might be difficult to pinpoint exactly what it is, most people can intuitively tell which tweet was posted by Donald Trump and which by Barack Obama, not by the content of the ideas but by the way they are expressed:

Did you get it? The first tweet was posted by Barack Obama and the second by Donald Trump. The usage of the words "Great" and "prevail" followed by a trademark "!" are signature parts of the structure of Donald Trump's speech; the way he speaks, regardless of content, is intuitively unique. You were able to identify these tweets not by their content (I guarantee that both politicians support the right to vote and Tennessee) but by how the words were expressed.

RNNs try to develop the intuition you just demonstrated by finding a small but very accurate set of weights and biases that replicates each person's unique structure of speech, something that convolutional neural networks (CNNs) cannot do.

How anyone can make an RNN in 4 lines of code

Recurrent Neural Networks can be built manually with Keras, but a handy module called textgenrnn lets the user create an RNN without having to deal with messy data dimensions, transformations, encodings, and vectorization.

The textgenrnn module can be installed via:

!pip install textgenrnn

Next, the model can be trained on content from a text file:

from textgenrnn import textgenrnn  
textgen = textgenrnn()
textgen.train_from_file('text.txt', num_epochs=50)

This will initiate a familiar Keras training sequence for a pre-built RNN. For a quick (but most likely low-quality) result, just train for one epoch.
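By default, this fine-tunes the weights of textgenrnn's pre-trained network. According to the project's README, passing new_model=True trains a model from scratch instead:

textgen.train_from_file('text.txt', new_model=True, num_epochs=50)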

Finally, the model can generate content!

textgen.generate(5,temperature=0.9)

The first argument is the number of samples the model should generate; 5 will output 5 unique lines of content. Temperature is a measure of how original the generated content is, with a default of 0.5. A high temperature produces fun and interesting results, but they may resemble the training data less closely than those from a low temperature.
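For instance (the sample counts here are illustrative):

textgen.generate(3, temperature=0.2)  # conservative: output stays close to the training data
textgen.generate(3, temperature=1.0)  # adventurous: more surprising, less faithful output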

For documentation and details of the RNN architecture used in textgenrnn, check out its GitHub page.

Finally, the death match

OK… maybe it’s not a death match. But it sure will be entertaining!

The data was taken from the 2020 Democratic Debate Transcript dataset on Kaggle. The code below collects everything Biden and Sanders each said and stores it in NumPy arrays:

import numpy as np
import pandas as pd

# The filename here is assumed; use the CSV from the Kaggle dataset
data = pd.read_csv('debate_transcripts.csv')

joe = np.array(data[data['speaker'] == 'Joe Biden']['speech'].reset_index().drop('index', axis=1))
bernie = np.array(data[data['speaker'] == 'Bernie Sanders']['speech'].reset_index().drop('index', axis=1))

…which can then be converted into a text file.

# Write one speech per line so each response becomes its own training example
with open('joe.txt', 'w') as joe_text:
    for item in joe:
        joe_text.write(item[0])
        joe_text.write('\n')

with open('bernie.txt', 'w') as bernie_text:
    for item in bernie:
        bernie_text.write(item[0])
        bernie_text.write('\n')

When reading ‘joe.txt’, the file reads…

[Screenshot: the first lines of joe.txt]

…where each line is a new training example.
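As a quick sanity check, the first few training examples can be printed back; a minimal sketch:

with open('joe.txt') as f:
    for line in f.readlines()[:3]:  # first three training examples
        print(line.strip())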

Training the model is simple.

!pip install textgenrnn
from textgenrnn import textgenrnn
textgen = textgenrnn()
textgen.train_from_file('joe.txt', num_epochs=10)
textgen.generate()

Another model that replicates how Bernie Sanders talks can be created by simply substituting 'bernie.txt' for 'joe.txt', as sketched below.
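A fresh textgenrnn instance keeps the two models' weights separate; a minimal sketch (the name argument is an assumption here and should only affect the filenames the weights are saved under):

from textgenrnn import textgenrnn

bernie_gen = textgenrnn(name='bernie')  # 'name' assumed to prefix saved weight files
bernie_gen.train_from_file('bernie.txt', num_epochs=10)
bernie_gen.generate()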

We get some interesting results!

Before the results, some things to take note of:

  • The RNN knew nothing about the English language before training. After a few hours, it has learned punctuation, capitalization, and a fundamental grasp of grammar. Given a week of training and a more specialized architecture, the RNN could probably pass the Turing Test (its output would be indistinguishable from the real Joe Biden or Bernie Sanders). That's the power of machines: in a matter of hours, they can learn a language that takes people at least a decade to develop fully.
  • The RNN is able to replicate key aspects of a person's speech, for example the frequent 'the fact of the matter is' and 'look' embedded in Joe Biden's phraseology, but also the issues each candidate talks about. Joe Biden talks a lot about the NRA, which is why it shows up many times.
  • The RNN is clearly not perfect. The downside of textgenrnn's ease of use is a lack of customization; tweaks to the model architecture could likely improve performance. Another issue is running time: these AIs were trained for 50 epochs (about 3 hours of training). That being said, here are some of the best passages (character for character) generated by AI Joe and AI Bernie. Decide for yourself who won the death match! One important note: all of these passages were generated by textgenrnn, and nothing AI Joe and AI Bernie say reflects on the real Joe Biden and Bernie Sanders.
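(For the curious: batches of candidate passages like these can be written straight to a file with textgenrnn's generate_to_file; the filename and counts below are illustrative.)

textgen.generate_to_file('ai_joe_passages.txt', n=25, temperature=0.9)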

AI Joe Biden

  • Look, I would make sure that we get out of the NRA talking about the United States of America. The fact of the matter is the idea that the fact is that I was deeply involved in the NRA talking about that insurance to get the first place, which I was a public option. I said to Afghanistan, I’m the guy.
  • I was part of the existential threat to the American people who are in jail. The fact of the matter is they did not have a consequence of money. And by the way, the fact is that we have to read NATO. On this state on the Paris Climate Accord. I was able to buy into the country.
  • I would have to make sure that we have to start three opioids, the fact of the matter is I was opposed to the United States of America. Look, that’s why we should be putting pressure on China to get a chance to see the tax code the same thing we have to have $800 billion.

AI Bernie Sanders

  • Billionaires have $850 billion more than Mr. Putin, who has given Mr. Pete Medicare For All. Good —thank you. No, I didn’t say that, Pete. No, that’s not the problem. No, let’s talk about Medicare For All.
  • Look, first of all, I think she was talking about my plan, not Xi. Let’s talk about math! Let’s talk about math! Excuse me, is it my turn? I will respond to the attack. Is it my turn?
  • Furthermore, it is my view that the time is now. Joe made this point. Look, at the end of the day of the day. No, here is the good news. Look, because of the mass shootings, the American people rally that we must be aggressive on China.

Who won?

With textgenrnn, anyone can create an AI Joe, an AI Bernie, or an AI anyone, as long as the data is available! It is a great introduction to what machines can do.

If you're interested in a more advanced RNN implementation in TensorFlow, check out this tutorial on creating an AI that writes Shakespearean plays.

