Quick python code to parse mbox files, specifically those used by GMail. Extract...
source link: https://gist.github.com/benwattsjones/060ad83efd2b3afc8b229d41f9b246c4
Quick python code to parse mbox files, specifically those used by GMail. Extracts sender, date, plain text contents etc., ignores base64 attachments.
Line 68, suggestion: print('Parsing email {0} of {1}'.format(idx + 1, num_entries)) (fixes idx so the count starts at 1)
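A minimal sketch of that suggestion, assuming idx is a zero-based loop index over the mailbox entries. The helper name progress_lines and the sample entries are hypothetical and not taken from the gist:

```python
def progress_lines(entries):
    """Return one progress message per entry, counting from 1."""
    num_entries = len(entries)
    lines = []
    for idx, _entry in enumerate(entries):
        # enumerate() starts at 0, so add 1 for a human-readable count.
        lines.append('Parsing email {0} of {1}'.format(idx + 1, num_entries))
    return lines


# Hypothetical stand-in for the parsed mailbox entries.
for line in progress_lines(['first', 'second', 'third']):
    print(line)
```

Without the + 1, the last message would read "2 of 3" for a three-message mailbox.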
The comment # ~*~ utf-8 ~*~ is useless; the default source code encoding for Python 3 is UTF-8 anyway, and if you wanted to communicate this fact to Emacs etc., the proper format uses dashes, not tildes, and a token coding: before the encoding name. See PEP 263.
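One way to check this is with the standard library's tokenize.detect_encoding, which applies the PEP 263 rules. This is only a sketch: the helper name declared_encoding is hypothetical, and latin-1 is used instead of utf-8 so that an ignored declaration is distinguishable from the UTF-8 default:

```python
import io
import tokenize


def declared_encoding(first_line):
    """Return the source encoding Python detects for a file starting with first_line."""
    source = (first_line + "\n").encode("ascii")
    encoding, _lines = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


# Tilde form without the "coding:" token: not recognised, falls back to the default.
print(declared_encoding("# ~*~ latin-1 ~*~"))

# PEP 263 form with dashes and the "coding:" token: recognised.
print(declared_encoding("# -*- coding: latin-1 -*-"))
```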
Thanks! This helped me out. In parse_email(), would it make more sense to assign the email parts to instance variables? E.g. self.email_from = self.email_data['From'] instead of email_from = self.email_data['From']. Otherwise, how is a user of this class meant to access these?
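A rough sketch of what that could look like, assuming email_data is an email.message.Message. The class name ParsedEmail, the email_date attribute and the sample message are hypothetical and not taken from the gist:

```python
import email


class ParsedEmail:
    """Hypothetical stand-in for the gist's class; names are illustrative only."""

    def __init__(self, raw_text):
        self.email_data = email.message_from_string(raw_text)
        self.email_from = None
        self.email_date = None
        self.parse_email()

    def parse_email(self):
        # Assign to self so the values outlive this method call.
        self.email_from = self.email_data['From']
        self.email_date = self.email_data['Date']


raw = (
    "From: alice@example.com\n"
    "Date: Mon, 1 Jan 2024 10:00:00 +0000\n"
    "Subject: Hi\n"
    "\n"
    "Hello\n"
)
msg = ParsedEmail(raw)
print(msg.email_from)
print(msg.email_date)
```

With plain local variables inside parse_email(), the values are discarded when the method returns, so callers have nothing to read.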
@benwattsjones Brilliant, thanks for posting this!
@redcay yes, that or something like it is needed.