Toward a better list iterator for the kernel

Linked lists are conceptually straightforward; they tend to be taught toward the beginning of entry-level data-structures classes. It might thus be surprising that the kernel community is concerned about its longstanding linked-list implementation and is not only looking for ways to solve some problems, but has been struggling to find that solution. It now appears that some improvements might be at hand: after more than 30 years, the kernel developers may have found a better way to safely iterate through a linked list.

Kernel linked lists

C, of course, makes the creation of linked lists relatively easy. What it does not do, though, is help in the creation of generic linked lists that can contain any type of structure. By its nature, C lends itself to the creation of ad hoc linked lists in every situation where they are needed, resulting in boilerplate code and duplicated definitions. Every linked-list implementation must be reviewed for correctness. It would be far nicer to have a single implementation that was known to work so that kernel developers could more profitably use their time introducing bugs into code that is truly unique to their problem area.

The kernel, naturally, has a few solutions for linked lists, but the most commonly used is struct list_head:

    struct list_head {
	struct list_head *next, *prev;
    };

This structure can be employed in the obvious way to create doubly linked lists: each entry's next pointer leads to the following entry, and its prev pointer leads back to the preceding one.

struct list_head can represent a linked list nicely, but has one significant disadvantage: it cannot hold any other information. Usually, this kind of data structure is needed to link some other data of interest; the list structure by itself isn't the point. C does not make it easy to create a linked list with an arbitrary payload, but it is easy to embed struct list_head inside the structure that the developer actually wants to organize into a list.

This is how linked lists are typically constructed in the kernel. Macros like container_of() can be used to turn a pointer to a list_head structure into a pointer to the containing structure. Code that works with linked lists will almost always use this macro (often indirectly) to gain access to the larger payload.

One final detail that is worthy of note is that the actual head of the list tends to be a list_head structure that is not embedded within the structure type of interest; it stands by itself and serves only to anchor the list.

For a real-world example of how this infrastructure is used, consider struct inode, which represents a file within a filesystem. Inodes can be on a lot of lists simultaneously, so struct inode contains no less than five separate list_head structures; unfortunately, your editor's meager artistic skills are not up to the task of showing what the resulting data structure looks like. One of those list_head structures, i_sb_list, is used to associate the inode with the superblock of the filesystem it belongs to. The list_head structure that anchors this list is the s_inodes field of struct super_block. That is the one list_head structure in this particular list that is not embedded within an instance of struct inode.

Traversal of a linked list will typically begin at the anchor and follow the next pointers until the head is found again. One can, of course, open-code this traversal, but the kernel also provides a long list of functions and macros for this purpose. One of those is list_for_each_entry(), which will go through the entire list, providing a pointer to the containing structure at each node. Typical code using this macro looks like this:

    struct inode *inode;

    /* ... */
    list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
	/* Process each inode here */
    }
    /* Should not use the iterator here */

Within the loop, the macro uses container_of() to point inode at the containing inode structure for each list entry. The problem is: what is the value of inode on exit from the loop? If code exited the loop with a break statement, inode will point to the element under consideration at that time. If, however, execution passes through the entire list, inode will be the result of using container_of() on the separate list head, which is not contained within an inode structure. That puts the kernel deeply into undefined-behavior territory and could lead to any of a number of bad things.

For this reason, the rule for macros like list_for_each_entry() is that the iterator variable should not be used outside of the loop. If a value needs to be accessed after the loop, it should be saved in a separate variable for that purpose. It's an implicit rule, though; nobody felt the need to actually note this restriction in the documentation for the macros themselves. Unsurprisingly, this rule is thus more of a guideline at best; the kernel is full of code that does, indeed, use the iterator variable after the loop.

The search for a safer iterator

When we last looked at this issue, Jakob Koschel had posted patches fixing some of these sites; he continued this project afterward. Linus Torvalds, however, thought that this approach was inadequate because it did nothing to prevent future problems from being introduced:

So if we have the basic rule being "don't use the loop iterator after the loop has finished, because it can cause all kinds of subtle issues", then in _addition_ to fixing the existing code paths that have this issue, I really would want to (a) get a compiler warning for future cases and (b) make it not actually _work_ for future cases.
Because otherwise it will just happen again.

Along the way, the developers came to the realization that moving to a newer version of the C standard might help, since it would allow the declaration of the iterator variable within the loop itself (thus making it invisible outside of the loop). Torvalds made an initial attempt at a solution that looked like this:

    #define list_for_each_entry(pos, head, member)				\
	for (typeof(pos) __iter = list_first_entry(head, typeof(*pos), member);	\
	     !list_entry_is_head(__iter, head, member) && (((pos)=__iter),1);	\
	     __iter = list_next_entry(__iter, member))

This version of the macro still accepts the iterator variable as an argument, keeping the same prototype as before; this is important, since there are thousands of instances of this macro in the kernel source. But it declares a new variable to do the actual iteration, and only sets the passed-in iterator within the loop itself. Since the loop itself may never be executed (if the list is empty), the possibility exists that it will not set the iterator, so it could be uninitialized afterward.

This version was quickly followed by a second attempt, described as "a work of art":

    #define list_for_each_entry(pos, head, member)				\
	for (typeof(pos) pos = list_first_entry(head, typeof(*pos), member);	\
	     !list_entry_is_head(pos, head, member);				\
	     pos = list_next_entry(pos, member))

Now the loop-scope iteration variable is declared with the same name as the outer variable, shadowing it. With this version, the iterator variable declared in the outer scope will never be used within the loop at all.

Torvalds's hope with both of these attempts was that this would cause the compiler to generate warnings if the (outer) iterator was used outside the loop, since it will no longer have been initialized by the loop itself. That did not work, though; there are places in the code that explicitly initialize the iterator now and, in any case, the "use of uninitialized variable" warning is disabled in the kernel due to excessive false positives.

James Bottomley suggested a different approach:

    #define list_for_each_entry(pos, head, member)				\
	for (pos = list_first_entry(head, typeof(*pos), member);		\
	     !list_entry_is_head(pos, head, member) || ((pos = NULL), 0);	\
	     pos = list_next_entry(pos, member))

This version would explicitly set the iterator variable to NULL on exit from the loop, causing any code that uses it to (presumably) fail. Torvalds pointed out the obvious problem with this attempt: it changes the semantics of a macro that is widely used throughout the kernel and would likely introduce bugs. It would also make life difficult for developers backporting patches to stable kernels that didn't have the newer semantics.

Yet another approach was proposed by Xiaomeng Tong:

    #define list_for_each_entry_inside(pos, type, head, member)		\
	for (type * pos = list_first_entry(head, type, member);		\
	     !list_entry_is_head(pos, head, member);			\
	     pos = list_next_entry(pos, member))

Tong's patch set created a new set of macros, with new names, with the idea that existing code could be converted over one usage at a time. There would be no externally declared iterator at all; instead, the name and type of the iterator are passed as arguments, and the iterator is declared within the scope of the loop itself. Torvalds, however, disliked this approach as well. Its use leads to long, difficult-to-read lines of code in almost every use and, he said, puts the pain in the wrong place: "We should strive for the *bad* cases to have to do extra work, and even there we should really strive for legibility".

A solution at last?

After having rejected various solutions, Torvalds went off to think about what a good solution might look like. Part of the problem, he concluded, is that the type of the containing structure is separate from the list_head structure, making the writing of iterator macros harder. If those two types could be joined somehow, things would be easier. Shortly thereafter, he came up with a solution that implements this idea. It starts with a new declaration macro:

    #define list_traversal_head(type,name,target_member) \
	union { struct list_head name; type *name##_traversal_type; }

This macro would be used to declare the real head of the list — not the list_head entries contained within other structures. Specifically, it declares a variable of this new union type containing a list_head structure called name, and a pointer to the containing structure type called name_traversal_type. The pointer is never used as such; it is just a way of tying the type of the containing structure to the list_head variable.

Then, there is a new iterator:

    #define list_traverse(pos, head, member) \
	for (typeof(*head##_traversal_type) pos = list_first_entry(head, typeof(*pos), member);\
	    !list_entry_is_head(pos, head, member);	\
	    pos = list_next_entry(pos, member))

Code can walk through a list by using list_traverse() instead of list_for_each_entry(). The iterator variable will be pos; it will only exist within the loop itself. The anchor of the list is passed as head, while member is the name of the list_head structure within the containing structure. The patch includes a couple of conversions to show what the usage would look like.

This, Torvalds thinks, is "the way forward". Making this change is probably a years-long project; there are over 15,000 uses of list_for_each_entry() (and variants) within the kernel. Each of those will eventually need to be changed, and the declaration of the list anchor must also change at the same time. So it is not a quick fix, but it could lead to a safer linked-list implementation in the kernel in the long run.

One might argue that all of this is self-inflicted pain caused by the continued use of C in the kernel. That may be true, but better alternatives are in somewhat short supply. For example, since the Rust language, for all of its merits, does not make life easy for anybody wanting to implement a linked list, a switch to that language would not automatically solve the problem. So kernel developers seem likely to have to get by with this kind of tricky infrastructure for some time yet.
