Conill: How efficient can cat(1) be?

[Posted July 18, 2022 by corbet]

Ariadne Conill explores ways to make the Unix cat utility more efficient on Linux.

The first possible option is the venerable sendfile syscall, which was originally added to improve the file serving performance of web servers. Originally, sendfile required the destination file descriptor to be a socket, but this restriction was removed in Linux 2.6.33. Unfortunately, sendfile is not perfect: because it only supports file descriptors which can be memory mapped, we must use a different strategy when copying from stdin.
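As a rough illustration of the strategy described in the excerpt, the following C sketch tries sendfile first and switches to a plain read/write loop when the kernel rejects the descriptors. It is not the code from the blog post: the function names copy_fd and copy_read_write, the buffer size, and the chunk size are assumptions made for this example.

```c
#include <errno.h>
#include <sys/sendfile.h>
#include <sys/types.h>
#include <unistd.h>

/* Plain read(2)/write(2) loop, used when sendfile(2) rejects the descriptors.
 * Hypothetical helper, named for this sketch only. */
static int copy_read_write(int out_fd, int in_fd)
{
	char buf[65536];
	ssize_t nread;

	while ((nread = read(in_fd, buf, sizeof buf)) != 0) {
		if (nread < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		/* write(2) may be partial, so loop until the chunk is flushed. */
		for (ssize_t off = 0; off < nread; ) {
			ssize_t nwritten = write(out_fd, buf + off, (size_t)(nread - off));
			if (nwritten < 0) {
				if (errno == EINTR)
					continue;
				return -1;
			}
			off += nwritten;
		}
	}
	return 0;
}

/* Copy in_fd to out_fd until EOF. Returns 0 on success, -1 on failure. */
int copy_fd(int out_fd, int in_fd)
{
	for (;;) {
		/* A NULL offset makes the kernel use and advance in_fd's own
		 * file position, so a later fallback resumes where this stopped. */
		ssize_t n = sendfile(out_fd, in_fd, NULL, 1 << 20);
		if (n > 0)
			continue;
		if (n == 0)
			return 0;
		if (errno == EINTR)
			continue;
		/* EINVAL is what sendfile reports for inputs it cannot handle,
		 * such as a pipe or terminal on stdin. */
		if (errno == EINVAL || errno == ENOSYS)
			return copy_read_write(out_fd, in_fd);
		return -1;
	}
}
```

A cat built on this would call copy_fd(STDOUT_FILENO, fd) for each input; when the input is a pipe, the first sendfile call is expected to fail and the loop takes over.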

Conill: How efficient can cat(1) be?

Posted Jul 18, 2022 19:00 UTC (Mon) by qyliss (guest, #131684) [Link]

Since version 9.1, cat in GNU coreutils uses copy_file_range when possible. Surprisingly, it's actually the first program in coreutils to use that API. As a result, for a disk image build of mine that assembles individual partition images into a combined GPT image, writing each partition into place with cat is significantly faster than using dd, which I'd have expected to be the better tool for the job.
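To make the partition use case concrete, here is a minimal C sketch of placing one image inside another at a byte offset with copy_file_range. It is not how GNU cat is implemented; place_image, copy_read_pwrite, and the set of errno values that trigger the fallback are assumptions for this example.

```c
#define _GNU_SOURCE
#define _FILE_OFFSET_BITS 64
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Fallback for kernels or filesystems where copy_file_range(2) is not
 * usable. Hypothetical helper, named for this sketch only. */
static int copy_read_pwrite(int out_fd, off_t out_off, int in_fd)
{
	char buf[65536];
	ssize_t nread;

	while ((nread = read(in_fd, buf, sizeof buf)) > 0) {
		for (ssize_t done = 0; done < nread; ) {
			ssize_t n = pwrite(out_fd, buf + done,
					   (size_t)(nread - done), out_off);
			if (n < 0)
				return -1;
			done += n;
			out_off += n;
		}
	}
	return nread < 0 ? -1 : 0;
}

/* Write the whole of part_path into disk_path starting at byte disk_off.
 * Returns 0 on success, -1 on failure. */
int place_image(const char *disk_path, off_t disk_off, const char *part_path)
{
	int ret = -1;
	int in_fd = open(part_path, O_RDONLY);
	if (in_fd < 0)
		return -1;
	int out_fd = open(disk_path, O_WRONLY | O_CREAT, 0644);
	if (out_fd < 0) {
		close(in_fd);
		return -1;
	}

	for (;;) {
		/* NULL input offset: read from in_fd's file position.
		 * The kernel advances disk_off by the number of bytes copied. */
		ssize_t n = copy_file_range(in_fd, NULL, out_fd, &disk_off,
					    (size_t)1 << 30, 0);
		if (n > 0)
			continue;
		if (n == 0) {
			ret = 0;	/* reached end of the partition image */
			break;
		}
		if (errno == EXDEV || errno == EINVAL ||
		    errno == ENOSYS || errno == EOPNOTSUPP)
			ret = copy_read_pwrite(out_fd, disk_off, in_fd);
		break;
	}

	close(in_fd);
	if (close(out_fd) < 0)
		ret = -1;
	return ret;
}
```

For example, place_image("disk.img", 1048576, "esp.img") would write a partition image starting 1 MiB into the disk image; both file names and the offset are made up for illustration.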

Conill: How efficient can cat(1) be?

Posted Jul 19, 2022 10:45 UTC (Tue) by ddevault (subscriber, #99589) [Link]

Here's my implementation of cat in Hare, for general interest. io::copy uses sendfile when possible. Support for splice could perhaps be added too, but the extra pipe(2) is not great IMO: it could fail if file descriptors are exhausted, and it has other side effects besides. Simplicity trumps efficiency for tools like this in my book.

use fmt;
use fs;
use getopt;
use io;
use main;
use os;

export fn utilmain() (main::error | void) = {
	const cmd = getopt::parse(os::args,
		('u', "POSIX compatibility, ignored"),
		"[file...]");
	defer getopt::finish(&cmd);

	if (len(cmd.args) == 0) {
		io::copy(os::stdout, os::stdin)?;
		return;
	};

	for (let i = 0z; i < len(cmd.args); i += 1z) {
		const file = open(cmd.args[i]);
		match (io::copy(os::stdout, file)) {
		case size => void;
		case let err: io::error =>
			io::close(file): void;
			return err;
		};
		io::close(file)?;
	};
};

fn open(path: str) io::handle = {
	if (path == "-") {
		return os::stdin;
	};

	match (os::open(path)) {
	case let err: fs::error =>
		fmt::fatal("Error opening '{}': {}", path, fs::strerror(err));
	case let file: io::file =>
		return file;
	};
};
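For comparison, the splice approach mentioned above can be sketched in C roughly as follows, which shows where the extra pipe(2) comes in. This is neither the blog post's program nor part of Hare's io::copy; splice_copy and the 64 KiB chunk size are assumptions for this example, and the sketch has no fallback for descriptors that splice rejects (for instance, an output opened in append mode).

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy in_fd to out_fd through an intermediate pipe using splice(2).
 * Returns 0 on success, -1 on failure. Hypothetical name. */
int splice_copy(int out_fd, int in_fd)
{
	int pipefd[2];
	int ret = -1;

	/* The extra pipe(2): it needs two free descriptors and fails
	 * (for example with EMFILE) when the process has none left. */
	if (pipe(pipefd) < 0)
		return -1;

	for (;;) {
		/* Move up to 64 KiB from the input into the pipe. */
		ssize_t filled = splice(in_fd, NULL, pipefd[1], NULL,
					65536, SPLICE_F_MOVE);
		if (filled < 0)
			break;
		if (filled == 0) {
			ret = 0;	/* end of input */
			break;
		}
		/* Drain everything just queued before refilling the pipe. */
		while (filled > 0) {
			ssize_t drained = splice(pipefd[0], NULL, out_fd, NULL,
						 (size_t)filled, SPLICE_F_MOVE);
			if (drained <= 0)
				goto out;
			filled -= drained;
		}
	}
out:
	close(pipefd[0]);
	close(pipefd[1]);
	return ret;
}
```

Compared with the sendfile path, this costs two syscalls per chunk plus the pipe setup and teardown, which is the kind of added complexity being weighed here.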

Conill: How efficient can cat(1) be?

Posted Jul 19, 2022 14:06 UTC (Tue) by mathstuf (subscriber, #69389) [Link]

This looks like Nim or Zig to me. Care to be specific for those less familiar with these languages than even I am? :)

Conill: How efficient can cat(1) be?

Posted Jul 19, 2022 14:12 UTC (Tue) by ddevault (subscriber, #99589) [Link]

Hare is closer to C than to Nim or Zig. You can learn about it here:

https://harelang.org

Conill: How efficient can cat(1) be?

Posted Jul 19, 2022 14:43 UTC (Tue) by zdzichu (subscriber, #17118) [Link]

> Simplicity trumps efficiency for tools like this in my book.

Nevertheless, the original blog post was about efficiency. Could you please benchmark the code from the blog post against your implementation?

Conill: How efficient can cat(1) be?

Posted Jul 19, 2022 15:07 UTC (Tue) by ddevault (subscriber, #99589) [Link]

My point is to suggest that efficiency is not always the correct measure. I am questioning the article's premise. There are vanishingly few situations where cat is your bottleneck, and you would be better off with a simpler (and thus likely to be less buggy!) implementation which can be relied upon to work and be understood.

But sure. I'm not on a particularly good laptop here, so I'm working with a 2G test file instead of 4G. Re-running the benchmark from the programs in the blog post in order, I get 2.7 GB/s, 6.4 GB/s, and 7.1 GB/s. My implementation gets 6.4 GB/s. This is not really surprising since io::copy uses the same sendfile approach as the second program in the blog post and the bottlenecks have everything to do with the syscalls and almost nothing to do with the userspace code.

But what's important is that all of these numbers are way more than "fast enough" for essentially any use-case of cat.

Conill: How efficient can cat(1) be?

Posted Jul 19, 2022 15:27 UTC (Tue) by Wol (subscriber, #4433) [Link]

> There are vanishingly few situations where cat is your bottleneck, and you would be better off with a simpler (and thus likely to be less buggy!) implementation which can be relied upon to work and be understood.

Which may be true for code that is not run that often.

But for a utility like cat that is used again and again, while the gain may be minimal per invocation, it is also colossal over time. A tiny gain in heavily used code can easily trump a huge gain in rarely used code.

And if the code is heavily used, there are (hopefully) a lot of people who understand it. Just because you think the gain isn't worth it - for you - there are plenty of people who will think the gain worth a lot. Especially in datacentres where those few milliseconds add up to lots of £££ ...

Cheers,
Wol

Conill: How efficient can cat(1) be?

Posted Jul 19, 2022 15:33 UTC (Tue) by ddevault (subscriber, #99589) [Link]

The microseconds added up are a rounding error in comparison to the amount of time spent debugging needlessly complex software when it breaks, and brain time costs much, much more than CPU time. All three of the cat implementations in this blog post are pretty simple, but just take a look at GNU cat for a demonstration of gross excess in software design. Generalize this across our entire industry, beyond simple tools like cat(1), and the result is quite serious.
