Optimize File::read_to_end and read_to_string by jkugelman · Pull Request #89582...

 2 years ago
source link: https://github.com/rust-lang/rust/pull/89582

jkugelman commented 15 days ago

Reading a file into an empty vector or string buffer can incur unnecessary read syscalls and memory re-allocations as the buffer "warms up" and grows to its final size. This is perhaps a necessary evil with generic readers, but files can be read more intelligently by checking the file size and reserving that much capacity up front.

std::fs::read and std::fs::read_to_string already perform this optimization: they open the file, read its metadata, and call with_capacity with the file size. This ensures that the buffer does not need to be resized and avoids an initial string of small read syscalls.
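As a rough illustration of the open, stat, pre-allocate, read sequence described above, here is a minimal sketch. The function name read_with_size_hint is hypothetical and this is not the standard library's actual source; it only assumes that File::metadata and Vec::with_capacity behave as documented.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Hypothetical helper (not a std function): read a whole file after
/// pre-allocating the buffer from the size reported by its metadata.
fn read_with_size_hint(path: &Path) -> io::Result<Vec<u8>> {
    let mut file = File::open(path)?;
    // The size is only a hint, so a metadata failure falls back to 0
    // instead of aborting the read.
    let size = file.metadata().map(|m| m.len() as usize).unwrap_or(0);
    let mut bytes = Vec::with_capacity(size);
    file.read_to_end(&mut bytes)?;
    Ok(bytes)
}
```

Because the size is treated as a hint rather than a guarantee, the read still produces the right bytes if the file grows or shrinks in the meantime; only the allocation pattern changes.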

However, if a user opens the File themselves and calls file.read_to_end or file.read_to_string, they do not get this optimization.

let mut buf = Vec::new();
file.read_to_end(&mut buf)?;

I searched through this project's codebase, and even here there are a lot of examples of this. They're found all over in unit tests, which isn't a big deal, but there are also several real instances in the compiler and in Cargo. I've documented the ones I found in a comment here:

#89516 (comment)

Most telling, the documentation for the Read trait and for the Read::read_to_end method both show this exact pattern as an example of how to use readers. What this says to me is that this shouldn't be solved by simply fixing the instances of it in this codebase. If it's here, it's certain to be prevalent in the wider Rust ecosystem.

To that end, this commit adds specializations of read_to_end and read_to_string directly on File. This way it's no longer a minor footgun to start with an empty buffer when reading a file in.
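The idea can be sketched outside the standard library with a wrapper type. Everything named here (PresizedFile, remaining_hint) is hypothetical and only illustrates the approach of reserving capacity before delegating to the normal read loop; it should not be read as the code in this commit.

```rust
use std::fs::File;
use std::io::{self, Read, Seek};

/// Hypothetical wrapper, not a std type: a File whose bulk-read methods
/// reserve buffer space before reading.
struct PresizedFile(File);

impl PresizedFile {
    /// Bytes between the current position and the end of the file.
    /// Used purely as a hint, so any error yields 0.
    fn remaining_hint(&mut self) -> usize {
        let size = self.0.metadata().map(|m| m.len()).unwrap_or(0);
        let pos = self.0.stream_position().unwrap_or(0);
        size.saturating_sub(pos) as usize
    }
}

impl Read for PresizedFile {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.0.read(buf)
    }

    fn read_to_end(&mut self, buf: &mut Vec<u8>) -> io::Result<usize> {
        // Reserve once, then let the ordinary read loop fill the buffer.
        let hint = self.remaining_hint();
        buf.reserve(hint);
        self.0.read_to_end(buf)
    }

    fn read_to_string(&mut self, buf: &mut String) -> io::Result<usize> {
        let hint = self.remaining_hint();
        buf.reserve(hint);
        self.0.read_to_string(buf)
    }
}
```

Subtracting the current position matters when the file has already been partially read or seeked, since only the remaining bytes will be appended to the buffer.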

A nice side effect of this change is that code that accesses a File as impl Read or dyn Read will benefit. For example, this code from compiler/rustc_serialize/src/json.rs:

pub fn from_reader(rdr: &mut dyn Read) -> Result<Json, BuilderError> {
    let mut contents = Vec::new();
    match rdr.read_to_end(&mut contents) {
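The reason callers like this benefit is that read_to_end is a trait method, so a call through dyn Read dispatches to the concrete reader's own implementation. The sketch below demonstrates that with a hypothetical Tracked reader and a hypothetical slurp function shaped like from_reader; neither exists in the compiler.

```rust
use std::io::{self, Read};

/// Hypothetical reader that records whether its own read_to_end ran.
struct Tracked<'a> {
    data: &'a [u8],
    override_used: bool,
}

impl Read for Tracked<'_> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.data.read(buf)
    }

    fn read_to_end(&mut self, buf: &mut Vec<u8>) -> io::Result<usize> {
        self.override_used = true;
        // The reader knows its own size, so it can reserve up front.
        buf.reserve(self.data.len());
        self.data.read_to_end(buf)
    }
}

/// Same shape as from_reader: the function only ever sees `dyn Read`.
fn slurp(rdr: &mut dyn Read) -> io::Result<Vec<u8>> {
    let mut contents = Vec::new();
    rdr.read_to_end(&mut contents)?;
    Ok(contents)
}
```

Passing a Tracked value to slurp sets override_used, which shows that the caller needs no changes to pick up a reader-specific read_to_end.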

Related changes:

  • I also added specializations to BufReader to delegate to self.inner's methods. That way it can call File's optimized implementations if the inner reader is a file.

  • The private std::io::append_to_string function is now marked unsafe.

  • File::read_to_string being more efficient means that the performance note for io::read_to_string can be softened. I've added @camelid's suggested wording from #80218 (comment).
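The BufReader delegation mentioned in the first bullet can be pictured roughly as follows. MiniBufReader is a hypothetical, heavily simplified stand-in (it never refills its buffer) and is not BufReader's real code; it only shows the order of operations: hand over the bytes already buffered, then let the inner reader run its own read_to_end.

```rust
use std::io::{self, Read};

/// Hypothetical stand-in for a buffered reader, not std's BufReader.
struct MiniBufReader<R> {
    inner: R,
    /// Bytes already pulled from `inner` but not yet consumed.
    buffered: Vec<u8>,
}

impl<R: Read> Read for MiniBufReader<R> {
    fn read(&mut self, out: &mut [u8]) -> io::Result<usize> {
        if self.buffered.is_empty() {
            return self.inner.read(out);
        }
        // Serve pending bytes first so nothing is reordered.
        let n = out.len().min(self.buffered.len());
        out[..n].copy_from_slice(&self.buffered[..n]);
        self.buffered.drain(..n);
        Ok(n)
    }

    fn read_to_end(&mut self, buf: &mut Vec<u8>) -> io::Result<usize> {
        // Move the pending bytes out, then delegate so that a specialized
        // inner implementation (for example a file's) gets used.
        let drained = self.buffered.len();
        buf.append(&mut self.buffered);
        Ok(drained + self.inner.read_to_end(buf)?)
    }
}
```

The returned count includes the drained bytes, so callers still see the total number of bytes appended to their buffer.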

r? @joshtriplett

