Python Programming 9: Analyzing the Gettysburg Address with Python
source link: https://iphyer.github.io/blog/2013/01/13/pythongettsburg/
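This post analyzes the Gettysburg Address.
Lincoln's speech is one of the most important in American history, and its closing line, "that government of the people, by the people, for the people, shall not perish from the earth," has been quoted through the ages. What follows is a simple analysis of the speech, which turns out not to use many words.
The exercise is meant to help you get comfortable with the basic techniques of text analysis. I have used quite a few tools for this, R among them, but honestly, when it comes to analyzing text data rather than other kinds of processing, Python is the one that feels most natural.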
Program 1: A simple analysis
# Gettysburg Address analysis
# count words, unique words, common words
def makewordList(gFile):
    """Create a list of the words in the address."""
    speech = []
    for lineString in gFile:
        lineList = lineString.split()
        for word in lineList:
            # skip the standalone dash token
            if word != "--":
                speech.append(word)
    return speech

# Python 3: the "rU" mode is gone and print is a function
with open("gettysburg.txt") as gFile:
    speech = makewordList(gFile)
print(speech)
print("Speech Length:", len(speech))
As the output shows, the whole speech is only 271 words long.
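If gettysburg.txt is not at hand, the counting logic of `makewordList` can be tried on a short in-memory excerpt, since a list of lines can be looped over just like an open file. This is only a minimal sketch, assuming Python 3. The name `make_word_list` and the two-line excerpt are illustrative and not part of the original program.

```python
def make_word_list(lines):
    """Collect whitespace-separated words, skipping the standalone "--" token."""
    speech = []
    for line in lines:
        for word in line.split():
            if word != "--":
                speech.append(word)
    return speech

# Illustrative excerpt; the original program reads gettysburg.txt instead.
excerpt = [
    "-- and that government of the people, by the people,",
    "for the people, shall not perish from the earth.",
]
words = make_word_list(excerpt)
print(words)
print("Excerpt length:", len(words))
```

Punctuation stays attached at this stage, so "people," counts as a word of its own form.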
Program 2: An improved version that removes duplicates
# Gettysburg Address analysis
# count words, unique words, common words
# this version also prints the unique words in the address
def makewordList(gFile):
    """Create a list of the words in the address."""
    speech = []
    for lineString in gFile:
        lineList = lineString.split()
        for word in lineList:
            # normalize case and trim periods and commas
            word = word.lower()
            word = word.strip('.,')
            if word != "--":
                speech.append(word)
    return speech

def makeunique(speech):
    """Pick out the unique words and put them in a list."""
    unique = []
    for word in speech:
        if word not in unique:
            unique.append(word)
    return unique

# Python 3: the "rU" mode is gone and print is a function
with open("gettysburg.txt") as gFile:
    speech = makewordList(gFile)
unique = makeunique(speech)
print(speech)
print("Speech Length:", len(speech))
print(unique)
print("Length of the unique words", len(unique))
The vocabulary turns out to be remarkably small: the speech uses only 138 distinct words. Lincoln's speech really is very concise.
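`makeunique` checks every word against a growing list, which is fine for a speech this short. As an alternative, the sketch below gets the same result by using dictionary keys, which keep insertion order in Python 3.7 and later. The helper names `normalize` and `unique_words` are hypothetical, and the one-sentence excerpt stands in for the file.

```python
def normalize(word):
    """Lowercase a word and strip surrounding periods and commas."""
    return word.lower().strip(".,")

def unique_words(words):
    """Return the distinct words in first-seen order."""
    # dict keys are unique and preserve insertion order (Python 3.7+)
    return list(dict.fromkeys(words))

# Illustrative excerpt; the original program reads gettysburg.txt instead.
excerpt = ("-- and that government of the people, by the people, "
           "for the people, shall not perish from the earth.")

# Normalize first, then drop the standalone dash, mirroring makewordList.
words = [w for w in map(normalize, excerpt.split()) if w != "--"]
unique = unique_words(words)
print(unique)
print("Length of the unique words", len(unique))
```

If the order of first appearance does not matter, `len(set(words))` gives the count alone.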