
Mining Order From Chaos: the Ingenious and Creative Fusion of NLP & Graph Theory

source link: https://mc.ai/mining-order-from-chaos-the-ingenious-and-creative-fusion-of-nlp-graph-theory/

A knowledge (semantic) graph is, I daresay, one of the most fascinating concepts in data science. The applications, extensions, and potential of knowledge graphs to mine order from the chaos of unstructured text are truly mind-blowing.

The graph consists of nodes and edges, where a node represents an entity and an edge represents a relationship. No entity appears in the graph more than once, and when a graph holds enough entities, the connections between them can reveal worlds of information.

With just a few entities, interesting relationships begin to emerge. As a general rule, entities are nouns and relationships are verbs; for instance, “the USA is a member of NATO” would correspond to a graph relationship “[entity USA] to [entity NATO] with [relationship member of]”. Using just three or four sentences of text, one could construct a rudimentary knowledge graph:
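As a sketch of this structure (the class and method names here are invented for illustration), the USA–NATO example can be stored as a triple in a minimal graph that keeps every entity unique:

```python
class KnowledgeGraph:
    """Minimal knowledge graph: unique entity nodes, labelled directed edges."""

    def __init__(self):
        self.nodes = set()   # a set guarantees no entity is repeated
        self.edges = []      # (source, relationship, target) triples

    def add_triple(self, source, relation, target):
        self.nodes.update([source, target])
        self.edges.append((source, relation, target))

    def relations_from(self, entity):
        """All (relationship, target) pairs leaving this entity."""
        return [(r, t) for s, r, t in self.edges if s == entity]

kg = KnowledgeGraph()
kg.add_triple("USA", "member of", "NATO")
kg.add_triple("France", "member of", "NATO")   # the NATO node is reused, not duplicated
print(sorted(kg.nodes))            # ['France', 'NATO', 'USA']
print(kg.relations_from("USA"))    # [('member of', 'NATO')]
```

Because nodes live in a set, adding a second edge that touches NATO simply reuses the existing node, which is what lets connections accumulate on shared entities.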

Imagine the sheer amount of knowledge contained in a complete Wikipedia article, or even an entire book! One could perform detailed analyses with this abundance of data; for example, identifying the most important entities, or the relationship an entity is most often on the receiving end of. Unfortunately, while building knowledge graphs is simple for humans, doing it by hand is not scalable. We can, however, build simple rule-based automated graph-builders.
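Both analyses can be sketched with plain Python: given a list of (subject, relation, object) triples, degree counts suggest the most connected entity, and counting relations by object shows what an entity is most often on the receiving end of. The triples below are hand-written for illustration, not extracted from a real text:

```python
from collections import Counter

# Hand-written triples standing in for an extracted knowledge graph
triples = [
    ("Turing", "proposed", "Turing test"),
    ("Turing", "worked at", "Bletchley Park"),
    ("Turing", "studied at", "Cambridge"),
    ("Church", "taught", "Turing"),
    ("Newman", "taught", "Turing"),
]

# "Importance" approximated as degree: how many edges touch each entity
degree = Counter()
for s, r, o in triples:
    degree[s] += 1
    degree[o] += 1
print(degree.most_common(1))        # [('Turing', 5)]

# Most common relationship "Turing" is on the receiving end of
incoming = Counter(r for s, r, o in triples if o == "Turing")
print(incoming.most_common(1))      # [('taught', 2)]
```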

To demonstrate the automation of knowledge-graph building, consider an excerpt from a biography of the great computer scientist and founder of artificial intelligence, Alan Turing. Since we’ve established that entities are nouns and relationships are verbs, let us first split the text into chunks, where each contains a relationship between two objects.

A simple method to do this is to separate by sentence, but a more rigorous method would be to separate by clause, since there may be many clauses and hence relationships in a single sentence (“she walked her dog to the park, then she bought food”).
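A rough sketch of this two-level splitting, using only the standard library (the sentence boundaries and the small list of clause-leading words are assumptions, not a complete grammar):

```python
import re

# Words that often introduce a second clause after a comma (an assumption)
COORDINATORS = ("then ", "and ", "but ")

def split_into_clauses(text):
    """Split text into sentences, then split each sentence at commas,
    dropping a leading coordinating word from each resulting clause."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    clauses = []
    for sentence in sentences:
        for part in sentence.split(","):
            clause = part.strip(" .")
            for word in COORDINATORS:
                if clause.lower().startswith(word):
                    clause = clause[len(word):]
                    break
            if clause:
                clauses.append(clause)
    return clauses

print(split_into_clauses("She walked her dog to the park, then she bought food."))
# ['She walked her dog to the park', 'she bought food']
```

Each returned clause is now a candidate for holding exactly one subject–verb–object relationship.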

Identifying the objects involved — entity extraction — is a more difficult task. Consider “Turing test”: this is an example of a nested entity, or an entity within the name of another entity. While POS (part of speech) tagging is sufficient for single-word nouns, one will need to use dependency parsing for multi-word nouns.
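The limitation can be sketched without a full parser: given POS tags (supplied by hand below; a real pipeline such as spaCy would produce them), merging runs of consecutive noun-like tags recovers a multi-word entity such as “Turing test” that token-by-token lookup would split apart:

```python
# (token, POS) pairs for "Turing proposed the Turing test", tagged by hand
tagged = [
    ("Turing", "PROPN"),
    ("proposed", "VERB"),
    ("the", "DET"),
    ("Turing", "PROPN"),
    ("test", "NOUN"),
]

NOUN_TAGS = {"NOUN", "PROPN"}

def extract_entities(tokens):
    """Merge runs of consecutive noun/proper-noun tokens into one entity."""
    entities, current = [], []
    for word, tag in tokens:
        if tag in NOUN_TAGS:
            current.append(word)
        elif current:
            entities.append(" ".join(current))
            current = []
    if current:
        entities.append(" ".join(current))
    return entities

print(extract_entities(tagged))   # ['Turing', 'Turing test']
```

Note that the nested “Turing” inside “Turing test” is absorbed into the longer entity rather than emitted twice.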

Dependency parsing is the task of analyzing a sentence and assigning a syntactic structure to it. Because dependency trees are built on grammar rather than word-by-word, the parser does not care how many words an object consists of, as long as it is bounded by other structures such as verbs (‘proposed’) or transitional phrases (‘as a…’). Dependency parsing is also used to find the verb that relates two objects, systematically following what it believes to be the syntax of the sentence and the rules of grammar. Similar methods can link pronouns (‘he’, ‘she’, ‘they’) to the people they refer to (pronoun resolution).
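A sketch of how a dependency tree yields a triple: each token below carries a hand-assigned head index and dependency label (the labels follow the common nsubj/dobj/compound convention; a real parser such as spaCy would assign them automatically), and walking the tree picks out the root verb, its subject, and its object together with any compound words attached to them:

```python
# Hand-built dependency parse of "Turing proposed the Turing test".
# Fields: (index, word, head_index, dependency_label); the root points to itself.
parse = [
    (0, "Turing",   1, "nsubj"),
    (1, "proposed", 1, "ROOT"),
    (2, "the",      4, "det"),
    (3, "Turing",   4, "compound"),
    (4, "test",     1, "dobj"),
]

def phrase(parse, head_index):
    """The head word plus any 'compound' children, in sentence order."""
    words = [(i, w) for i, w, h, d in parse
             if i == head_index or (h == head_index and d == "compound")]
    return " ".join(w for i, w in sorted(words))

def extract_triple(parse):
    root = next(i for i, w, h, d in parse if d == "ROOT")
    subj = next(i for i, w, h, d in parse if h == root and d == "nsubj")
    obj  = next(i for i, w, h, d in parse if h == root and d == "dobj")
    verb = next(w for i, w, h, d in parse if i == root)
    return (phrase(parse, subj), verb, phrase(parse, obj))

print(extract_triple(parse))   # ('Turing', 'proposed', 'Turing test')
```

Because the extraction follows grammatical attachments rather than word positions, the multi-word object “Turing test” comes out whole.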

It is worth mentioning that a knowledge graph may also benefit from the addition of synonyms; tutorials will often show examples with the same word repeated many times for simplicity, but among human writers repeating the same word is so frowned upon that they actively seek out synonyms (words that mean the same thing as another word). One way to find related words is with Hearst patterns, named after Marti Hearst, a computational linguistics researcher and professor at UC Berkeley. In her extensive research, she discovered a set of recurring lexical patterns that can reliably be used to extract information.
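One classic Hearst pattern is “X such as Y (and Z)”, which signals that each listed item is a kind of the head noun. A sketch of that single pattern as a regular expression (this is a small illustrative subset of Hearst’s inventory, restricted to a one-word head noun for simplicity):

```python
import re

# "NOUN such as A, B and C" -> each listed item is a kind of the head noun
SUCH_AS = re.compile(r"(\w+)\s+such as\s+([\w ,]+?)(?:[.;]|$)")

def hearst_such_as(text):
    """Extract (item, 'is a', head-noun) triples from 'such as' phrases."""
    pairs = []
    for match in SUCH_AS.finditer(text):
        hypernym = match.group(1)
        items = re.split(r",\s*|\s+and\s+", match.group(2))
        pairs.extend((item.strip(), "is a", hypernym)
                     for item in items if item.strip())
    return pairs

print(hearst_such_as("He broke ciphers such as Enigma and Tunny."))
# [('Enigma', 'is a', 'ciphers'), ('Tunny', 'is a', 'ciphers')]
```

Triples mined this way slot straight into the graph alongside the verb-based relationships extracted earlier.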

