
Lib.rs website improvements

 6 months ago
source link: https://users.rust-lang.org/t/lib-rs-website-improvements/108218

What's new on lib.rs

  • Social media image previews. Links to crates on lib.rs shared on Mastodon, Facebook, etc. look fancier now.

    [image: resvg.svg]

    The previews are dynamically generated. For compatibility they must be raster images, so they're PNGs rendered with resvg from SVG templates. Resvg is awesome, but text layout in SVG was a pain even for just 3-4 lines of text. I want to add more info there, so suggestions for data and design are welcome.

  • Better caching + purging of pages at the CDN. The lag between crate publication and visibility on lib.rs is down from hours to ~15 minutes (I still have work to do to refresh the index more often). Most pages are also compressed with Brotli level 11 to less than 10% of their raw HTML size, and distributed globally. Pages that are pre-cached on the CDN can load so fast that the site feels like an app running locally, and that's not even a trick-laden service-worker webapp, just plain JS-less HTML!

  • I've rewritten automatic keyword guessing. Previously it scraped the README for words that could be keywords (using TF-IDF), but that could pick unrelated words: a README saying "join us on discord" would get "discord" as the crate's keyword. Now it prefers keywords that appear in multiple sources: doc comments, identifiers in the code, the README, and crate and repository metadata. It's also smarter about synonyms and can pick 2-3 compound-word keywords. It's still imperfect, so please give your crates explicit keywords and categories!
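
    The multi-source idea can be sketched as a simple voting scheme. This is an illustrative toy, not lib.rs code: the real pipeline also uses TF-IDF, synonyms, and compound words, and `pick_keywords` is a made-up name.

```rust
use std::collections::HashMap;

/// Count in how many distinct sources each candidate keyword appears,
/// then keep only keywords backed by at least `min_sources` sources.
/// (Hypothetical sketch of the idea described above.)
fn pick_keywords(sources: &[Vec<&str>], min_sources: usize) -> Vec<String> {
    let mut votes: HashMap<&str, usize> = HashMap::new();
    for source in sources {
        // Deduplicate within a single source so each source votes once.
        let mut seen: Vec<&str> = source.clone();
        seen.sort_unstable();
        seen.dedup();
        for kw in seen {
            *votes.entry(kw).or_insert(0) += 1;
        }
    }
    let mut picked: Vec<(usize, &str)> = votes
        .into_iter()
        .filter(|&(_, n)| n >= min_sources)
        .map(|(kw, n)| (n, kw))
        .collect();
    picked.sort_unstable_by(|a, b| b.cmp(a)); // most-supported first
    picked.into_iter().map(|(_, kw)| kw.to_string()).collect()
}

fn main() {
    let readme = vec!["parser", "discord", "json"];
    let doc_comments = vec!["parser", "json"];
    let identifiers = vec!["parser", "token"];
    // "discord" and "token" appear in only one source each, so they're dropped.
    println!("{:?}", pick_keywords(&[readme, doc_comments, identifiers], 2));
}
```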

  • Filtering of bot/mirror traffic from download numbers. I'm denoising download counts by estimating the noise floor from the oldest, least-used versions of each crate. This lessens the impact of the recent change to how crates-io counts downloads.
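
    The noise-floor trick can be sketched in a few lines. Again a hypothetical simplification, assuming the smallest per-version counts are almost entirely bot/mirror traffic:

```rust
/// Estimate a bot "noise floor" from a crate's least-downloaded versions,
/// then subtract it from every version's count.
/// (Illustrative sketch of the idea described above, not lib.rs code.)
fn denoise(downloads_per_version: &[u64]) -> Vec<u64> {
    // Assume the minimum count is pure background noise.
    let floor = downloads_per_version.iter().copied().min().unwrap_or(0);
    downloads_per_version
        .iter()
        .map(|&d| d.saturating_sub(floor))
        .collect()
}

fn main() {
    // Old abandoned versions hover around ~40 downloads: likely all bots.
    let raw = [42, 40, 41, 5_000, 120_000];
    println!("{:?}", denoise(&raw));
}
```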

  • Search ranking improvements. The top few crates are picked using different criteria: some by relevance, some by popularity. When a word can have multiple meanings, I try to include all of them (e.g. searching for "image" gives an image codec, but also docker image and kernel image). I've also tuned handling of exact matches: you don't always want an exact match; e.g. there's an abandoned crate named error which may be older than std::error::Error itself.
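
    Mixing criteria for the top slots could look roughly like this. A toy sketch under my own assumptions (`blended_top` is a made-up name; real search ranking is far more involved):

```rust
/// Fill the first result slot by text relevance, then fill the rest
/// by popularity. Assumes a non-empty hit list and k >= 1.
/// hits: (crate name, relevance score, download count)
fn blended_top(mut hits: Vec<(String, f64, u64)>, k: usize) -> Vec<String> {
    // First slot: best relevance (scores assumed non-NaN).
    hits.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    let mut out = vec![hits.remove(0).0];
    // Remaining slots: most popular of what's left.
    hits.sort_by(|a, b| b.2.cmp(&a.2));
    out.extend(hits.into_iter().take(k - 1).map(|h| h.0));
    out
}

fn main() {
    // Searching "image": the codec crate wins on relevance,
    // the rest of the slots go to the most-downloaded remainder.
    let hits = vec![
        ("image".to_string(), 1.0, 9_000_000),
        ("kernel-image".to_string(), 0.5, 5_000),
        ("docker-image".to_string(), 0.6, 50_000),
    ];
    println!("{:?}", blended_top(hits, 2));
}
```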

  • Category pages can now be sorted by number of downloads or by most recently published crates. Personally I don't think these sorts are useful, but it's one of the oldest feature requests.

  • The /audit subpage notes which crates are available in Debian and Guix. That's better than nothing, but unfortunately that alone is not a safety guarantee (as I've been informed by Debian maintainers), so supply-chain security remains a tough problem.

  • Rendering of Markdown is closer to GitHub's. There's a long tail of quirks and tweaks in GitHub's Markdown flavor (e.g. dark-theme images), so it may still be imperfect. BTW, proper handling of relative URLs in readmes continues to have mindbogglingly complex edge cases of symlinks + relative paths + Cargo fixups + proprietary URL schemes + repos changing layout between releases. Please use absolute URLs whenever you can, and don't use readme = "../README" in Cargo.toml.

  • In addition to stats on which versions of Rust are supported by crates, I now have data on which versions of Rust people actually use. The data is scraped from a still-unofficial source and is likely full of bot and CI build traffic, so take it with a big grain of salt.

  • I'm retiring the libs.rs and crates.rs domains to avoid confusion. They now show a big warning that the site is just lib.rs (lib, singular).

  • I had to do some work on scaling, performance, and memory usage. In the beginning I laughed at how easy it was: I could just load all crates into RAM and compute all the data on the fly. That worked fine with 5-10K crates. Now there are 140K of them, and I track much more data, so soon I'll have to start using a real database instead of serde-ing HashMaps from disk. There are also so many crates now that rate limiting of the GitHub and crates-io APIs is often a bottleneck. At a rate of 1 req/s it takes almost two days to go through all of them, and with several calls per crate, if I cache anything for less than a week I may exceed request quotas!
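
    The back-of-envelope arithmetic behind "almost two days" checks out (numbers from the post; the 1 req/s rate is as stated there):

```rust
fn main() {
    // ~140K crates, one API request per second.
    let crates = 140_000_u64;
    let secs_per_req = 1_u64;
    let total_secs = crates * secs_per_req;
    let hours = total_secs as f64 / 3600.0;
    let days = hours / 24.0;
    // One full pass: ~38.9 hours, i.e. ~1.6 days per API call per crate.
    // Several calls per crate multiply that, so cache lifetimes shorter
    // than about a week risk exceeding request quotas.
    println!("one full pass: {hours:.1} h = {days:.1} days");
}
```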

  • I've got a beefier machine for building crates and estimating their MSRV, so now more crates should have a useful range of versions they likely support. Also many crates specify rust-version now, which is super helpful (but remember to keep that version up to date with code changes!)

  • The "new" page isn't overwhelmed by daily auto-releasing crates any more. It prefers more notable updates, based on how big the semver increase was and how long ago the previous version was released.
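
    A notability score like that could be sketched as follows. The weights and the `notability` function are made up for illustration; the post doesn't give the actual formula:

```rust
/// Score a release's notability: bigger semver bumps and longer gaps
/// since the previous release score higher. Hypothetical weights.
fn notability(bump: (u64, u64, u64), days_since_prev: u64) -> u64 {
    let (major, minor, _patch) = bump;
    let semver_weight = if major > 0 { 100 } else if minor > 0 { 10 } else { 1 };
    // Cap the gap so long-dormant crates don't dominate forever.
    semver_weight * (1 + days_since_prev.min(365))
}

fn main() {
    // A daily auto-released patch scores far below a long-awaited 1.0.
    let daily_patch = notability((0, 0, 1), 1);
    let big_release = notability((1, 0, 0), 90);
    assert!(big_release > daily_patch);
    println!("patch: {daily_patch}, major: {big_release}");
}
```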

  • I've rewritten processing of Cargo.toml [features]; it's now a reusable crate. The maintainer dashboard now warns when you forget to use the dep: syntax in features.
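
    For context, the dep: syntax (stable since Rust 1.60) lets a feature enable an optional dependency without Cargo also creating an implicit feature named after the dependency. A minimal Cargo.toml sketch:

```toml
[dependencies]
serde = { version = "1.0", optional = true }

[features]
# `dep:serde` turns the optional dependency on when this feature is
# enabled, and suppresses the implicit `serde` feature Cargo would
# otherwise derive from the optional dependency's name.
json = ["dep:serde"]
```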

  • See the previous list of what was added last year.

  • Also, a shoutout to the crates-io team for deleting a ton of name-squatted crates. I see waves of crates appearing and disappearing in my logs, so it's not just that one person who took a bunch of crate names, but an ongoing battle to keep the registry clean.

