
Transform any text into a patent application

 2 years ago
source link: https://lav.io/2014/05/transform-any-text-into-a-patent-application/

Figure 2 From Paul Scheerbart’s Perpetual Motion Machine

[Hello everyone! Thanks for taking a look at my blog! If you want updates on this project or other things I’m working on, just follow me on twitter]

I wrote a program that transforms literary and philosophical texts into patent applications. In short, it reframes texts as inventions or machines. You can view the code on GitHub.

I was partially inspired by Paul Scheerbart’s Perpetual Motion Machine, a sort of technical/literary diary in which Scheerbart documents and reflects on various failed attempts to create a perpetual motion machine. Scheerbart frequently refers to his machines as “stories” – I wanted to reverse the concept and transform stories into machines.

In this post I’ll provide some details about how I wrote the program, and describe some of the tools that I used.

First, here’s some sample output, listed by invention title and source text:

The program operates in four parts. First it generates a title for the invention, then an abstract, then a list of illustrations, and finally a more detailed description of the “embodiments” of the invention.

In general, my methodology is to find common grammatical structures in patent applications, and then extract sentences containing similar grammatical structures from my input texts. To do this, I make heavy use of the Pattern library, which, among many other wonderful features, allows you to perform regular-expression-like searches using parts of speech. For example, here’s how you can use Pattern to search through a text for all instances of an adjective followed by a plural noun:

Python
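from pattern.en import parsetree
from pattern.search import search

# Parse the sentence into a tree of part-of-speech-tagged words
t = parsetree('A lot of things are ruining a lot of other things')
# 'JJ NNS' matches an adjective followed by a plural noun
print(search('JJ NNS', t))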
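
To see what such a search does without installing Pattern, here is a minimal sketch that runs the same kind of query over a hand-tagged sentence. The tags are assigned by hand and the helper `find_tag_sequence` is my own illustration, not Pattern's API or its implementation.

```python
# Hand-tagged tokens (Penn Treebank tags chosen by hand for this example;
# a tagger such as Pattern's parsetree would normally produce them).
TAGGED = [
    ("A", "DT"), ("lot", "NN"), ("of", "IN"), ("things", "NNS"),
    ("are", "VBP"), ("ruining", "VBG"), ("a", "DT"), ("lot", "NN"),
    ("of", "IN"), ("other", "JJ"), ("things", "NNS"),
]


def find_tag_sequence(tagged, pattern):
    """Return every run of words whose tags equal the space-separated pattern."""
    wanted = pattern.split()
    matches = []
    for start in range(len(tagged) - len(wanted) + 1):
        window = tagged[start:start + len(wanted)]
        if [tag for _, tag in window] == wanted:
            matches.append(" ".join(word for word, _ in window))
    return matches


print(find_tag_sequence(TAGGED, "JJ NNS"))  # ['other things']
```

This only handles exact tag sequences; Pattern's search syntax additionally supports wildcards, optional elements and chunk tags such as NP, which the later sections rely on.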

Title Generation

There are a number of grammatical patterns that I noticed in patent application titles – one that stuck out to me is “[NOUN] (and [NOUN]) for [GERUND] [NOUN PHRASE]”. For example:

To create my invention titles, I simply search through the source text for “VBG * JJ? NP”, which translates to “a gerund, followed by anything, followed by an optional adjective, followed by a noun phrase.” The program selects an arbitrary title from all the options it finds, and then prefixes the title with a random combination of “system”, “method”, “apparatus”, and “device”. Occasionally it’ll add “web-based” into the mix as well. Here are a few of the many possible titles generated from the Communist Manifesto:

  • a web-based method and device for haunting Europe
  • an apparatus and device for rounding of the Cape
  • an apparatus and system for surpassing Egyptian pyramids
  • a web-based method and device for revolutionising the instruments
  • a system and device for clearing of whole continents
  • a web-based apparatus and method for paving the way
  • a system and apparatus for diminishing the means
  • an apparatus and system for fighting the bourgeoisie
  • a method and device for depicting the most general phases
  • a method and device for begetting a new supply
  • a method and apparatus for appropriating material products
  • a device and system for appropriating intellectual products
  • a system and device for springing from your present mode
  • a method and apparatus for having the wives
  • a web-based apparatus and method for desiring to abolish countries and nationality
  • a system and method for fluctuating between proletariat and bourgeoisie
  • a system and method for redressing social grievances
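
The prefixing step can be sketched roughly as follows. This is a guess at the shape of the logic, not the code from the repository: the function name `make_title`, the choice of exactly two nouns, and the 25% chance of "web-based" are all assumptions made for illustration.

```python
import random

DEVICE_NOUNS = ["system", "method", "apparatus", "device"]


def make_title(phrase, rng=random, web_based_chance=0.25):
    """Prefix a gerund phrase with two distinct device nouns and an article."""
    first, second = rng.sample(DEVICE_NOUNS, 2)
    prefix = "{} and {}".format(first, second)
    # The probability is an assumed value; the post only says "occasionally".
    if rng.random() < web_based_chance:
        prefix = "web-based " + prefix
    # "an" only before a vowel, i.e. when the prefix starts with "apparatus".
    article = "an" if prefix[0] in "aeiou" else "a"
    return "{} {} for {}".format(article, prefix, phrase)


rng = random.Random(0)  # seeded only so repeated runs give the same titles
for phrase in ["haunting Europe", "paving the way"]:
    print(make_title(phrase, rng))
```

Passing in a seeded `random.Random` keeps the output repeatable while experimenting; leaving `rng` at its default gives a fresh title on every run.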

Generating an “Abstract”

Typically a patent application will have an abstract that briefly describes what the invention comprises. To generate my abstracts, I follow a similar method to the title generation, searching through my source text for instances of adjectives followed by singular or plural nouns. However, in this case I make a small but significant change: I restrict the possible nouns to those that fit into the category of “artifacts”. For example, here’s the abstract that gets generated from Heidegger’s essay on technology:

The devices comprises a wooden bridge, a technical apparatus, a high-frequency apparatus, a whole structure, a human handiwork, a mere handiwork, an autonomous tool, a hydroelectric plant, an actual chalice, an old windmill, a sacrificial chalice.

In order to do this, I wrote a function that searches first for grammatical patterns, and then filters that output based on hypernyms. A hypernym is a word that fits into a level of categorical abstraction up from another given word. So, a hypernym for “car” is “machine”, and a hypernym for “pigeon” is “bird”. We can consider words sharing hypernyms as belonging to the same abstract category. This in itself is a fun tool to play with. For example, I can enter the following into my program:

python search.py < kafka.txt 'organism' 'JJ NN'

And I get a list of all the adjectives followed by nouns that fit into the “organism” category in Kafka’s The Metamorphosis:

  • old man
  • own mother
  • timorous visitor
  • observant sister
  • tired man
  • junior salesman
  • expressive violinist
  • middle gentleman
  • good man
  • unfortunate son
  • old maid
  • elderly widow
  • horrible vermin
  • elderly mother
  • chief clerk
  • wild man
  • fall victim
  • sensible person
  • lazy son
  • much mother
  • commercial traveller
  • young lady

Illustrations

The next part of the program generates a list of “illustrations”. This time I search for phrases that fit into a grammatical structure that looks like “DT JJ NP IN * NN”, or, “determiner followed by adjective followed by noun phrase followed by a preposition (the IN tag, which also covers subordinating conjunctions) followed by anything, followed by a singular noun”. I attach these phrases to other randomly selected phrases commonly found in descriptions of patent illustrations. These “illustrations” come from Gargantua and Pantagruel by Rabelais:

  • Figure N illustrates the great puffguts of the counsellor.
  • Figure N is a schematic drawing of the first book of this translation.
  • Figure N is a perspective view of the old women in rut and heat.
  • Figure N is an isometric view of the middle finger of his right hand.
  • Figure N schematically illustrates a little peach-coloured bonnet with a great capon.
  • Figure N is a block diagram of the inundation of the urinal deluge.
  • Figure N is a cross section of the perfect image of my body.
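
Attaching the extracted phrases to stock caption wording might look like the sketch below. The lead-ins are copied from the sample output above; the function name `describe_figures` is hypothetical, and numbering the figures (rather than printing a literal "Figure N") is my own variation.

```python
import random

# Caption lead-ins of the kind seen in the sample output above.
FIGURE_TEMPLATES = [
    "illustrates",
    "is a schematic drawing of",
    "is a perspective view of",
    "is an isometric view of",
    "schematically illustrates",
    "is a block diagram of",
    "is a cross section of",
]


def describe_figures(phrases, rng=random):
    """Pair each extracted phrase with a randomly chosen caption lead-in."""
    return [
        "Figure {} {} {}.".format(number, rng.choice(FIGURE_TEMPLATES), phrase)
        for number, phrase in enumerate(phrases, start=1)
    ]


extracted = [
    "the great puffguts of the counsellor",
    "the middle finger of his right hand",
]
for line in describe_figures(extracted, random.Random(0)):
    print(line)
```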

I debated attaching actual illustrations to these descriptions, and even wrote a script to scrape Bing images for various patent illustrations, but in the end I decided the texts alone were better. I might explore the idea of programmatically creating illustrations in the future.

Detailed Description

The last part of my program creates a more detailed description of the invention. It does this by searching for “VB|VBD|VBZ|VBG * NN IN * NN”, or “a verb (base form, past tense, third-person singular present, or gerund), followed by anything, followed by a noun, followed by a preposition, followed by anything, followed by a noun.” I attach these, as in the illustrations section, to commonly found phrases in patent descriptions like “the present invention” and “according to a preferred embodiment”. Here are some excerpted results, from the Communist Manifesto:

The present invention is itself the product of a long course. The present invention finds its fitting complement in the most slothful indolence. The present invention creates a world after its own image. The present invention endangers the existence of bourgeois property. The present invention becomes an appendage of the machine.

According to another embodiment, the device layers the foundation for the sway. The device abolishes the right of personally acquired property. The device is the groundwork of all personal freedom.

According to another embodiment, the device is the miserable character of this appropriation. The device is the non-existence of any property. According to a preferred embodiment, the device deprives no man of power.

In accordance with an alternative specific embodiment, the present invention finds its complement in practical absence. The invention alters the character of intervention.

According to another embodiment, the device keeps even pace with dissolution. The invention is the most radical rupture with traditional property. The present invention is the condition for free development. The present invention comprehends the march of modern history.

According to another embodiment, the device is the necessary offspring of its own form. The present invention conceals the reactionary character of criticism. The present invention is the expression of the struggle.

According to a preferred embodiment, the invention expresses the struggle of one class. The invention presupposed the existence of modern bourgeois society. The invention improves the condition of every member.
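
A rough sketch of how verb-initial phrases could be given an invention as their subject, with an occasional embodiment lead-in. The subjects and lead-ins are taken from the excerpts above; the function name `describe` and the 30% lead-in probability are assumptions, and the repository may well do this differently.

```python
import random

SUBJECTS = ["The present invention", "The invention", "The device"]
LEAD_INS = [
    "According to another embodiment,",
    "According to a preferred embodiment,",
    "In accordance with an alternative specific embodiment,",
]


def describe(verb_phrases, rng=random, lead_in_chance=0.3):
    """Turn verb-initial phrases into sentences with an invention as subject."""
    sentences = []
    for phrase in verb_phrases:
        subject = rng.choice(SUBJECTS)
        if rng.random() < lead_in_chance:
            # After a lead-in the subject is mid-sentence, so lower-case "The".
            subject = rng.choice(LEAD_INS) + " " + subject[0].lower() + subject[1:]
        sentences.append("{} {}.".format(subject, phrase))
    return " ".join(sentences)


matched = [
    "creates a world after its own image",
    "becomes an appendage of the machine",
]
print(describe(matched, random.Random(0)))
```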

Closing Thoughts

Soon I’ll create a little web service that runs the script on any user input. For the moment, though, you can download the code on GitHub. The code base includes a number of (possibly) useful tools:

  • “machine.py” generates patents
  • “search.py” searches texts for parts of speech and hypernym combinations (among other things)
  • “get_illustrations.py” scrapes Bing for patent illustrations
  • “scraper.py” downloads the full text of patent applications based on keywords

BY THE WAY, if you ever want to kill a few hours, just search Google Patents for the dirtiest words you can come up with. You will not be disappointed.

