3

PyPi is Reducing Stored IP Address Data - Slashdot

 1 year ago
source link: https://developers.slashdot.org/story/23/05/27/2122238/pypi-is-reducing-stored-ip-address-data
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

PyPi is Reducing Stored IP Address Data

Become a fan of Slashdot on Facebook

binspamdupenotthebestofftopicslownewsdaystalestupid freshfunnyinsightfulinterestingmaybe offtopicflamebaittrollredundantoverrated insightfulinterestinginformativefunnyunderrated descriptive typodupeerror

Do you develop on GitHub? You can keep using GitHub but automatically sync your GitHub releases to SourceForge quickly and easily with this tool so your projects have a backup location, and get your project in front of SourceForge's nearly 30 million monthly users. It takes less than a minute. Get new users downloading your project releases today!

Sign up for the Slashdot newsletter! or check out the new Slashdot job board to browse remote jobs or jobs in your area
×

PyPi is Reducing Stored IP Address Data (theregister.com) 7

Posted by EditorDavid

on Sunday May 28, 2023 @10:34AM from the Python-Package-Index dept.
The PyPi registry of open source Python packages "began evaluating ways to reduce the amount of identifying information that it stores," reports the Register, "even before the U.S. Justice Department came asking for data on suspect users."

But now, "the Python community package registry wants developers to understand that it's working to minimize the user data that it stores." The goal is not to be unable to respond to lawful requests for information; rather it's to store only the minimum amount of data necessary so as not to expose users to unnecessary privacy intrusion. Coincidentally, data minimization may prevent organizations from becoming a preferred source of on-demand surveillance: having excessive amounts of information about users invites legal demands, which staff then have to handle...

Mike Fiedler, a member of the PyPI admin team, said in a statement on Friday that the organization's effort to improve user privacy and security dates back to 2020. Since the receipt of the subpoenas in March and April, that effort has been reinvigorated.

Much of the concern focuses on IP address data, which gets stored in conjunction with web log access; user events such as logins; project events including uploads; events associated with recently introduced organizations; and administrative PyPI journal entries. According to Fiedler, PyPI was able to stop storing IP data for journal entries — an append-only transaction log — because these were only exposed to administrators... To obscure IP addresses, PyPI is salting them — adding an arbitrary value — and then hashing them — running the data through a one-way scrambling function that creates a value called a hash. This provides a way to store a reference to potentially identifying data without actually storing raw data... PyPI has been using its CDN provider Fastly to pass along a salted hash of the IP address for requests via a custom header, along with broad GeoIP data (the country and city where the user is located), and is using that instead of the raw IP address. In April, the registry adopted code changes for hashing and salting IP addresses for requests that PyPI handles directly in Warehouse, the web application that implements the official Python package index.

And over the past few days, it has been replacing IP addresses in the PyPI user interface with geolocation data. PyPI still relies on IP address information to identify abuse — the creation of malicious packages, harassments, and so on — but Fiedler says even that is being looked at. "We're thinking about how to manage that without storing IP data, but we're not there yet," he said. Fiedler says the PyPI team will be weighing whether it can remove IP data from event history records after a period of time and whether the service can handle all its requests via CDN.

Do you have a GitHub project? Now you can sync your releases automatically with SourceForge and take advantage of both platforms.
Do you have a GitHub project? Now you can automatically sync your releases to SourceForge & take advantage of both platforms. The GitHub Import Tool allows you to quickly & easily import your GitHub project repos, releases, issues, & wiki to SourceForge with a few clicks. Then your future releases will be synced to SourceForge automatically. Your project will reach over 35 million more people per month and you’ll get detailed download statistics.
Sync Now


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK