Importing Cookies from a Firefox Profile in Python (Shallow Thoughts)

source link: https://shallowsky.com/blog/programming/python-firefox-cookies.html

I wrote at length about my explorations into selenium to fetch stories from the New York Times (as a subscriber). But I mentioned in Part III that there was a much easier way to fetch those stories, as long as the stories didn't need JavaScript.

That way is to use normal file fetching (using urllib or requests), but with a CookieJar object containing the cookies from a Firefox session where I'd logged in.

FeedMe was already using a CookieJar, since some sites die or go into infinite loops if they can't set cookies. The jar started out empty and just let each site write whatever cookies it saw fit.

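from http.cookiejar import CookieJar
import urllib.request, urllib.error, urllib.parse

# Start with an empty jar; each site fills in whatever cookies it sets.
cookiejar = CookieJar()

# In FeedMe the jar lives on an object (self.cookiejar); a plain variable is used here.
opener = urllib.request.build_opener(
    urllib.request.HTTPCookieProcessor(cookiejar))
# request is a URL string or a urllib.request.Request object.
response = opener.open(request, timeout=100)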

FeedMe uses the built-in urllib rather than requests, because the code is old, and since urllib works fine, I've never gotten around to rewriting it. But it's even easier with requests:

response = requests.get(url, cookies=cookiejar)

That just left importing cookies from a Mozilla profile.

http.cookiejar includes a class called MozillaCookieJar. So it sounds like the functionality is already there, right?

Well, no. From the Python documentation for http.cookiejar:

class http.cookiejar.MozillaCookieJar(filename, delayload=None, policy=None)

A FileCookieJar that can load from and save cookies to disk in the Mozilla cookies.txt file format (which is also used by the Lynx and Netscape browsers).
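To make that format concrete, here is a minimal sketch of what MozillaCookieJar does handle: a cookies.txt file with one header line followed by one tab-separated line per cookie. The domain, cookie name, value, and the helper name load_cookies_txt are all made up for illustration.

```python
import os
import tempfile
from http.cookiejar import MozillaCookieJar

# One header line, then one tab-separated line per cookie:
# domain, domain_specified, path, secure, expiry (epoch seconds), name, value
# The cookie below is invented for this example.
COOKIES_TXT = (
    "# Netscape HTTP Cookie File\n"
    ".example.com\tTRUE\t/\tFALSE\t4102444800\tsessionid\tabc123\n"
)

def load_cookies_txt(text):
    """Write text to a temporary cookies.txt and load it into a jar."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "cookies.txt")
        with open(path, "w") as fp:
            fp.write(text)
        jar = MozillaCookieJar(path)
        jar.load()  # parses the cookies.txt format described above
    return jar

jar = load_cookies_txt(COOKIES_TXT)
for cookie in jar:
    print(cookie.domain, cookie.name, cookie.value)
```

A jar loaded this way can be handed to HTTPCookieProcessor or requests exactly like a plain CookieJar.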

As best I can determine, Firefox stopped using the cookies.txt format around 2008, when it switched to cookies.sqlite. A bug was filed against MozillaCookieJar back then, with a patch, but it was rejected because the Python 2.6/3.0 release was about to happen, and it was closed outright rather than merely being postponed. I filed a new bug hoping to re-raise the issue.

But meanwhile, the only way to use a MozillaCookieJar is to write code to read the sqlite file and translate it to the old cookies.txt format. The best code I've found for doing that comes from a 2009 blog post, "Reading Firefox 3.x cookies in Python", which I found via a StackOverflow thread, "Accessing Firefox 3 cookies in Python". The code is in both places, so I needn't repeat it here.

The method is a little squinchy, using a StringIO to emulate a cookies.txt file, but it works fine, at least until someone sees fit to replace the MozillaCookieJar code, now almost 15 years out of date, with something that actually works.
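For readers who want the general shape of that approach, here is a rough sketch. It is not the code from the linked posts. It assumes the cookies live in a table named moz_cookies with columns host, path, isSecure, expiry, name and value, and it leans on _really_load, a private method of the standard library's cookie jar classes that could change between Python versions. The function name firefox_cookiejar is made up.

```python
import sqlite3
from http.cookiejar import MozillaCookieJar
from io import StringIO

def firefox_cookiejar(sqlite_path):
    """Build a MozillaCookieJar from a Firefox cookies.sqlite file."""
    # Assumption: table moz_cookies with these column names.
    con = sqlite3.connect(sqlite_path)
    try:
        rows = con.execute(
            "SELECT host, path, isSecure, expiry, name, value FROM moz_cookies"
        ).fetchall()
    finally:
        con.close()

    # Emulate a cookies.txt file in memory: header line, then one
    # tab-separated line per cookie.
    buf = StringIO()
    buf.write("# Netscape HTTP Cookie File\n")
    for host, path, is_secure, expiry, name, value in rows:
        fields = [
            host,
            "TRUE" if host.startswith(".") else "FALSE",  # domain_specified
            path,
            "TRUE" if is_secure else "FALSE",
            str(expiry),
            name,
            value,
        ]
        buf.write("\t".join(fields) + "\n")
    buf.seek(0)

    jar = MozillaCookieJar()
    # Private method (an assumption that it stays available): parse the
    # in-memory file, keeping session and expired cookies.
    jar._really_load(buf, sqlite_path, True, True)
    return jar
```

The returned jar can be passed to HTTPCookieProcessor or to requests in place of the empty CookieJar. Firefox may hold a lock on cookies.sqlite while it is running, so working from a copy of the file is a common precaution.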

Tags: programming, python, cookies, firefox, scraping
[ 12:22 Dec 03, 2021 ]

