
Text Classification — RNNs or CNNs?

source link: https://www.tuicool.com/articles/jyYFjir

An RNN is a class of artificial neural network in which connections between nodes form a directed graph along a sequence. It is essentially a chain of neural network blocks, each passing a message to its successor. If you want to dive into the internal mechanics, I highly recommend Colah’s blog. This architecture allows RNNs to exhibit temporal behavior and capture sequential dependencies, which makes them a more ‘natural’ approach for textual data, since text is inherently sequential.

A CNN is a class of deep, feed-forward artificial neural network in which connections between nodes do not form a cycle. CNNs are generally used in computer vision; however, they have also shown promising results on various NLP tasks. Again, for diving into the specifics, Colah’s blog is a great place to start.

An RNN is trained to recognize patterns across time, while a CNN learns to recognize patterns across space.

Which type of network performs better on text data depends on how often comprehension of global, long-range semantics is required. For tasks where the full length of the text matters, it makes sense to go with RNN variants. These tasks include question answering, translation, etc.

It turns out that CNNs applied to certain NLP problems perform quite well. Let’s briefly see what happens when we use CNN on text data.

The result of each convolution fires when a particular pattern is detected. By varying the size of the kernels and concatenating their outputs, you allow the network to detect patterns of multiple sizes (2, 3, or 5 adjacent words). Patterns can be expressions (word n-grams) like “I hate” or “very good”, and CNNs can identify them in a sentence regardless of their position.

Based on this, the most natural fit for CNNs seems to be classification tasks such as sentiment analysis, spam detection, or topic categorization. Convolution and pooling operations lose information about the local order of words, so sequence tagging tasks such as PoS tagging or entity extraction are harder to fit into a pure CNN architecture (though not impossible: you can add positional features to the input).

Pooling also reduces the output dimensionality while (hopefully) keeping the most salient information. You can think of each filter as detecting a specific feature, such as whether the sentence contains a negation like “not amazing”. If this phrase occurs somewhere in the sentence, applying the filter to that region yields a large value, and a small value in other regions. By performing the max operation you keep the information about whether or not the feature appeared in the sentence, but you lose the information about where exactly it appeared.
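The mechanism above can be sketched in a few lines of NumPy. This is a minimal illustration, not a trained model: the word embeddings and filter weights are random, and the shapes (7 words, 4-dim embeddings, 3 filters per kernel size) are arbitrary choices for the example. It shows the two key ideas: sliding kernels of sizes 2, 3, and 5 over adjacent words, then max-over-time pooling that keeps only whether each filter fired, not where.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy input: a 7-word sentence, each word a 4-dim embedding (random, for illustration).
seq_len, emb_dim = 7, 4
sentence = rng.normal(size=(seq_len, emb_dim))

def conv_max_pool(x, kernel_size, n_filters, rng):
    """1D convolution over windows of adjacent words, then max-over-time pooling."""
    # One weight vector per filter, spanning `kernel_size` adjacent word embeddings.
    W = rng.normal(size=(n_filters, kernel_size * x.shape[1]))
    windows = np.stack([x[i:i + kernel_size].ravel()
                        for i in range(x.shape[0] - kernel_size + 1)])
    feature_maps = windows @ W.T        # (n_windows, n_filters): response at each position
    return feature_maps.max(axis=0)     # max pooling: "did this pattern fire anywhere?"

# Concatenate the outputs of kernels of sizes 2, 3, and 5
# (bigram, trigram, and 5-gram detectors).
features = np.concatenate([conv_max_pool(sentence, k, n_filters=3, rng=rng)
                           for k in (2, 3, 5)])
print(features.shape)  # (9,) — 3 filters per kernel size; position information is gone
```

Note how the pooled feature vector has a fixed size regardless of sentence length, which is exactly what a downstream classifier needs.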

RNNs are designed to make use of sequential data, where the current step has some relation to the previous steps. This makes them ideal for applications with a time component (audio, time-series data) and for natural language processing. RNNs perform very well in applications where sequential information is clearly important, because without it the meaning could be misinterpreted or the grammar could be incorrect. Applications include image captioning, language modeling, and machine translation.
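The step-by-step dependence described above can be sketched as a vanilla RNN cell in NumPy. Again this is a toy: the weights are random and untrained, and the dimensions are arbitrary. The point is the structure of the loop: step t cannot be computed before step t-1, which is both what lets the network carry context forward and why RNNs are hard to parallelize.

```python
import numpy as np

rng = np.random.default_rng(0)

emb_dim, hidden_dim = 4, 5
W_xh = rng.normal(size=(hidden_dim, emb_dim)) * 0.1   # input-to-hidden weights
W_hh = rng.normal(size=(hidden_dim, hidden_dim)) * 0.1  # hidden-to-hidden (the "chain")
b = np.zeros(hidden_dim)

def rnn_forward(sequence):
    """Process word embeddings one step at a time; each step sees the previous state."""
    h = np.zeros(hidden_dim)
    for x_t in sequence:                      # strictly sequential: step t needs step t-1
        h = np.tanh(W_xh @ x_t + W_hh @ h + b)
    return h                                  # final state summarizes the whole sequence

sentence = rng.normal(size=(7, emb_dim))      # 7 words, 4-dim embeddings
summary = rnn_forward(sentence)
print(summary.shape)  # (5,)
```

Feeding the same words in a different order generally yields a different final state, which is precisely the order sensitivity that the max-pooled CNN features discard.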

CNNs are good at extracting local, position-invariant features, whereas RNNs are better when classification is determined by a long-range semantic dependency rather than by local key phrases. CNNs work well for tasks where feature detection in text matters most, for example searching for angry terms, sadness, abuse, or named entities, whereas RNNs work better for tasks where sequential modeling matters most. Based on this characterization, it makes sense to choose a CNN for a classification task like sentiment classification, since sentiment is usually determined by a few key phrases, and to choose an RNN for a sequence modeling task like language modeling, machine translation, or image captioning, since these require flexible modeling of context dependencies. RNNs are usually good at predicting what comes next in a sequence, while CNNs can learn to classify a sentence or a paragraph.

A big argument for CNNs is that they are fast. Very fast. In terms of computation time, CNNs appear to be much faster (roughly 5x) than RNNs: convolutions are a central part of computer graphics and are implemented at the hardware level on GPUs. Applications like text classification or sentiment analysis don’t necessarily need the information stored in the sequential nature of the data. Take a hypothetical restaurant review: “I was very disappointed with this restaurant. The service was incredibly slow, and the food was mediocre. I will not be back.” While there is sequential information in the data, if you are trying to predict whether the sentiment was good or bad, a CNN model may be sufficient and even better in terms of computation. The important information needed to make the prediction lies in the phrases “very disappointed”, “incredibly slow”, and “mediocre”. If you are only using 2-grams, an RNN may additionally capture that it was the service that was incredibly slow, as opposed to something that is good when slow (music, perhaps?). But often this is not necessary, and the more complex RNN could overfit compared to a simpler model.
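To make the restaurant-review argument concrete, here is a deliberately trivial sketch: a hand-written bigram “filter” that fires on negative key phrases, mimicking what a kernel-size-2 CNN filter can learn. The phrase list is invented for the example; a real model would learn such patterns from data rather than match a fixed set.

```python
# The review from the text; a toy bigram detector stands in for a learned size-2 filter.
review = ("I was very disappointed with this restaurant. The service was "
          "incredibly slow, and the food was mediocre. I will not be back.")

# Hand-picked negative bigrams (an assumption for illustration, not learned weights).
negative_bigrams = {("very", "disappointed"), ("incredibly", "slow")}

tokens = review.lower().replace(".", "").replace(",", "").split()
hits = [bg for bg in zip(tokens, tokens[1:]) if bg in negative_bigrams]
print(hits)  # [('very', 'disappointed'), ('incredibly', 'slow')]
```

Like max pooling, the `hits` list records that the phrases occurred but is indifferent to where in the review they appeared, which is enough to call the sentiment negative.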

