Diving into Digital Ephemera: Identifying Defunct URLs in the Web Archives

 2 years ago
source link: https://it.slashdot.org/story/22/08/05/1742254/diving-into-digital-ephemera-identifying-defunct-urls-in-the-web-archives
Olivia Meehan, who worked on the web archiving team at the US Library of Congress, evaluates how well the websites captured in the Papal Transition 2005 Collection have survived:

Based on the results I have so far and conversations I've had with other web archivists, the lifecycle of websites is unpredictable to the extent that accurately tracking the status of a site inherently requires nuance, time, and attention -- which is difficult to maintain at scale. This data is valuable, however, and is worth pursuing when possible. Using a sample selection of URLs from larger collections could make this more manageable than comprehensive reviews.

Of the content originally captured in the Papal Transition 2005 Collection, 41% is now offline. Without the archived pages, the information, perspectives, and experiences expressed on those websites would potentially be lost forever. They include blogs, personal websites, individually-maintained web portals, and annotated bibliographies. They frequently represent small voices and unique perspectives that may be overlooked or under-represented by large online publications with the resources to maintain legacy pages and articles.

The internet is impermanent in a way that is difficult to quantify. The constant creation of new information obscures what is routinely deleted, overwritten, and lost. While the scope of this project is small within the context of the wider internet, and even within the context of the Library's Web Archive collections as a whole, I hope that it effectively demonstrates the value of web archives in preserving snapshots of the online world as it moves and changes at a record pace.
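
The sampling idea mentioned in the quote can be illustrated with a short sketch. This is not Meehan's method, only one possible reading of it: the function names `sample_urls`, `is_reachable`, and `offline_share` are hypothetical, and a single HTTP request is a crude stand-in for the nuanced, manual review the quote describes (a domain can still respond while serving unrelated content).

```python
import random
import urllib.error
import urllib.request
from typing import Callable, Iterable, List


def sample_urls(urls: Iterable[str], k: int, seed: int = 0) -> List[str]:
    """Draw up to k URLs at random, without replacement (seeded for repeatability)."""
    pool = list(urls)
    return random.Random(seed).sample(pool, min(k, len(pool)))


def is_reachable(url: str, timeout: float = 10.0) -> bool:
    """Crude liveness check (an assumption, not a full review).

    Returns True if a HEAD request completes without an HTTP or network
    error. Servers that reject HEAD requests will be counted as offline.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except (urllib.error.URLError, OSError, ValueError):
        return False


def offline_share(
    urls: Iterable[str],
    is_online: Callable[[str], bool] = is_reachable,
) -> float:
    """Fraction of the given URLs that the checker reports as offline."""
    checked = list(urls)
    if not checked:
        return 0.0
    offline = sum(1 for url in checked if not is_online(url))
    return offline / len(checked)
```

Passing a stub for `is_online` lets the tally be tried without network access; with the default checker, the result is only a rough estimate for the sampled URLs and would still need manual review.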
