
redefining for loop variable semantics · Discussion #56010 · golang/go · GitHub

source link: https://github.com/golang/go/discussions/56010

We have been looking at what to do about the for loop variable problem (#20733), gathering data about what a change would mean and how we might deploy it. This discussion aims to gather early feedback about this idea, to understand concerns and aspects we have not yet considered. Thanks for keeping this discussion respectful and productive!

To recap #20733 briefly, the problem is that loops like this one don’t do what they look like they do:

var all []*Item
for _, item := range items {
	all = append(all, &item)
}

That is, this code has a bug. After this loop executes, all contains len(items) identical pointers, each pointing at the same Item, holding the last value iterated over. This happens because the item variable is per-loop, not per-iteration: &item is the same on every iteration, and item is overwritten on each iteration. The usual fix is to write this instead:

var all []*Item
for _, item := range items {
	item := item
	all = append(all, &item)
}
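
As a minimal, runnable sketch of the two loops above: the Item type, its ID field, and the function names here are made up for illustration. The loop variable is declared outside the for statement on purpose, so that it is a single per-loop variable no matter which loop semantics the toolchain applies.

```go
package main

import "fmt"

// Item is a hypothetical element type used only for this illustration.
type Item struct{ ID int }

// collectShared appends the address of one shared variable on every
// iteration, so all pointers end up identical.
func collectShared(items []Item) []*Item {
	var all []*Item
	var item Item // one variable for the whole loop
	for _, item = range items {
		all = append(all, &item)
	}
	return all
}

// collectCopied applies the usual fix: a fresh copy for each iteration.
func collectCopied(items []Item) []*Item {
	var all []*Item
	var item Item
	for _, item = range items {
		item := item // new variable, distinct address per iteration
		all = append(all, &item)
	}
	return all
}

// ids dereferences each pointer and returns the IDs it sees.
func ids(ptrs []*Item) []int {
	out := make([]int, 0, len(ptrs))
	for _, p := range ptrs {
		out = append(out, p.ID)
	}
	return out
}

func main() {
	items := []Item{{1}, {2}, {3}}
	fmt.Println(ids(collectShared(items))) // [3 3 3]
	fmt.Println(ids(collectCopied(items))) // [1 2 3]
}
```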

This bug also often happens in code with closures that capture the address of item implicitly, like:

var prints []func()
for _, v := range []int{1, 2, 3} {
	prints = append(prints, func() { fmt.Println(v) })
}
for _, print := range prints {
	print()
}

This code prints 3, 3, 3, because all the closures print the same v, and at the end of the loop, v is set to 3. Note that there is no explicit &v to signal a potential problem. Again the fix is the same: add v := v.

Goroutines are also often involved, although as these examples show, they need not be. See also the Go FAQ entry.

We have talked for a long time about redefining these semantics, to make loop variables per-iteration instead of per-loop. That is, the change would effectively be to add an implicit “x := x” at the start of every loop body for each iteration variable x, just like people do manually today. Making this change would remove the bugs from the programs above.

In the Go 2 transitions document we gave the general rule that language redefinitions like what I just described are not permitted. I believe that is the right general rule, but I have come to also believe that the for loop variable case is strong enough to motivate a one-time exception to that rule. Loop variables being per-loop instead of per-iteration is the only design decision I know of in Go that makes programs incorrect more often than it makes them correct. Since it is the only such design decision, I do not see any plausible candidates for additional exceptions.

To make the breakage completely user controlled, the way the rollout would work is to change the semantics based on the go line in each package’s go.mod file, the same line we already use for enabling language features (you can only use generics in packages whose go.mod says “go 1.18” or later). Just this once, we would use the line for changing semantics instead of for adding a feature or removing a feature.

If we hypothetically made the change in Go 1.30, then modules that say “go 1.30” or later get the per-iteration variables, while modules that say earlier Go versions get the per-loop variables.

In a given code base, the change would be “gradual” in the sense that each module can update to the new semantics independently, avoiding a bifurcation of the ecosystem.

The specific semantics of the redefinition would be that both range loops and three-clause for loops get per-iteration variables. So in addition to the program above being fixed, this one would be fixed too:

var prints []func()
for i := 1; i <= 3; i++ {
	prints = append(prints, func() { fmt.Println(i) })
}
for _, print := range prints {
	print()
}

In the 3-clause form, the start of the iteration body copies the per-loop i into a per-iteration i, and then the end of the body (or any continue statement) copies the current value of the per-iteration i back to the per-loop i. Unless a variable is captured like in the above example, nothing changes about how the loop executes.
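
One way to picture that copy-in and copy-back is to write it out by hand. The sketch below only illustrates the behavior described above; it is not the compiler's actual rewrite, and the names perLoop, perIteration, shared, and call are assumptions. perLoop declares i outside the for statement so that it shares one variable across iterations under either semantics.

```go
package main

import "fmt"

// perLoop shares a single i across all iterations, so every closure
// observes the value i has once the loop has finished.
func perLoop() []int {
	var fns []func() int
	var i int
	for i = 1; i <= 3; i++ {
		fns = append(fns, func() int { return i })
	}
	return call(fns)
}

// perIteration spells out the copy-in and copy-back by hand.
func perIteration() []int {
	var fns []func() int
	var shared int // plays the role of the per-loop i
	for shared = 1; shared <= 3; shared++ {
		i := shared // start of body: copy the per-loop value into a fresh i
		fns = append(fns, func() int { return i })
		shared = i // end of body: copy back before the post statement runs
	}
	return call(fns)
}

// call invokes each closure after the loop is over and collects the results.
func call(fns []func() int) []int {
	out := make([]int, 0, len(fns))
	for _, f := range fns {
		out = append(out, f())
	}
	return out
}

func main() {
	fmt.Println(perLoop())      // [4 4 4]
	fmt.Println(perIteration()) // [1 2 3]
}
```

In a hand-written version like this, any continue inside the body would need the same copy-back immediately before it.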

Adjusting the 3-clause form may seem strange to C programmers, but the same capture problems that happen in range loops also happen in three-clause for loops. Changing both forms eliminates that bug from the entire language, not just one place, and it keeps the loops consistent in their variable semantics. That consistency means that if you change a loop from using range to using a 3-clause form or vice versa, you only have to think about whether the iteration visits the same items, not whether a subtle change in variable semantics will break your code. It is also worth noting that JavaScript uses per-iteration semantics for 3-clause for loops that declare their variable with let, with no problems.

I think the semantics are a smaller issue than the idea of making this one-time gradual breaking change. I’ve posted this discussion to gather early feedback on the idea of making a change here at all, because that’s something we’ve previously treated as off the table.

I’ve outlined the reasons I believe this case merits an exception below. I’m hoping this discussion can surface concerns, good ideas, and other feedback about the idea of making the change at all (not as much the semantics).

I know that C# 5 made this change as well, but I’ve been unable to find any retrospectives about how it was rolled out or how it went. If anyone knows more about how the C# transition went or has links to that information, please post that too. Thanks!

The case for making the change:

A decade of experience shows the cost of the current semantics

I talked at Gophercon once about how we need agreement about the existence of a problem before we move on to solutions. When we examined this issue in the run-up to Go 1, it did not seem like enough of a problem. The general consensus was that it was annoying but not worth changing.

Since then, I suspect every Go programmer in the world has made this mistake in one program or another. I certainly have done it repeatedly over the past decade, despite being the one who argued for the current semantics and then implemented them. (Sorry!)

The current cures for this problem are worse than the disease.

I ran a program to process the git logs of the top 14k modules, from about 12k git repos, and looked for commits with diff hunks that were entirely “x := x” lines being added. I found about 600 such commits. On close inspection, approximately half of the changes were unnecessary, probably done at the insistence of inaccurate static analysis, out of confusion about the semantics, or out of an abundance of caution. Perhaps the most striking was this pair of changes from different projects:

 	for _, informer := range c.informerMap {
+		informer := informer
 		go informer.Run(stopCh)
 	}

 	for _, a := range alarms {
+		a := a
 		go a.Monitor(b)
 	}

One of these two changes is unnecessary and the other is a real bug fix, but you can’t tell which is which without more context. (In one, the loop variable is an interface value, and copying it has no effect; in the other, the loop variable is a struct, and the method takes a pointer receiver, so copying it ensures that the receiver is a different pointer on each iteration.)
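
To make that distinction concrete, here is a small sketch. The types and names (Describer, informer, alarm) are invented stand-ins, not the code from either project, and it stores method values instead of starting goroutines so the result is deterministic. The loop variables are declared outside the loops so that they are per-loop under either semantics.

```go
package main

import "fmt"

// Describer stands in for the interface-typed loop variable.
type Describer interface{ Describe() string }

type informer struct{ name string }

func (in informer) Describe() string { return in.name }

// alarm stands in for the struct-typed loop variable; its method has a
// pointer receiver, so a.Describe is shorthand for (&a).Describe.
type alarm struct{ name string }

func (a *alarm) Describe() string { return a.name }

func fromInterfaces(list []Describer) []string {
	var calls []func() string
	var d Describer
	for _, d = range list {
		calls = append(calls, d.Describe) // saves a copy of the interface value
	}
	return run(calls)
}

func fromStructs(list []alarm) []string {
	var calls []func() string
	var a alarm
	for _, a = range list {
		calls = append(calls, a.Describe) // saves &a, the same pointer each time
	}
	return run(calls)
}

// run invokes the saved method values after the loop has finished.
func run(calls []func() string) []string {
	out := make([]string, 0, len(calls))
	for _, c := range calls {
		out = append(out, c())
	}
	return out
}

func main() {
	fmt.Println(fromInterfaces([]Describer{informer{"x"}, informer{"y"}, informer{"z"}})) // [x y z]
	fmt.Println(fromStructs([]alarm{{"x"}, {"y"}, {"z"}}))                                // [z z z]
}
```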

And then there are changes like this one, which is unnecessary regardless of context (there is no opportunity for hidden address-taking):

 	for _, scheme := range artifact.Schemes {
+		scheme := scheme
 		Runtime.artifactByScheme[scheme.ID] = id
 		Runtime.schemesByID[scheme.ID] = scheme
 	}

This kind of confusion and ambiguity is the exact opposite of the readability we are aiming for in Go.

People are clearly having enough trouble with the current semantics that they are choosing overly conservative tools and adding “x := x” lines by rote in situations not flagged by tools, preferring that to debugging actual problems. This is an entirely rational choice, but it is also an indictment of the current semantics.

We’ve also seen production problems caused in part by these semantics, both inside Google and at other companies (for example, this problem at Let’s Encrypt). It seems likely to me that, world-wide, the current semantics have easily cost many millions of dollars in wasted developer time and production outages.

Old code is unaffected, compiling exactly as before

The go lines in go.mod give us a way to guarantee that all old code is unaffected, even in a build that also contains new code. Only when you change your go.mod line do the packages in that module get the new semantics, and you control that. In general this one reason is not sufficient, as laid out in the Go 2 transitions document. But it is a key property that contributes to the overall rationale, with all the other reasons added in.

Changing the semantics is usually a no-op, and when it’s not, it fixes buggy code far more often than it breaks correct code

We built a toolchain with the change and tested a subset of Google’s Go tests and analyzed the resulting failures. The rate of new test failures was approximately 1 in 2,000, but nearly all were previously undiagnosed actual bugs. The rate of spurious test failures (correct code actually broken by the change) was 1 in 50,000.

To start, there were only 58 failures out of approximately 100,000 tests executed, covering approximately 1.3M for loops. Of the failures, 36 (62%) were tests not testing what they looked like they tested because of bad interactions with t.Parallel: the new semantics made the tests actually run correctly, and then the tests failed because they found actual latent bugs in the code under test. The next most common mistake was appending &v on each iteration to a slice, which makes a slice of N identical pointers. The rest were other kinds of bugs canceling out to make tests pass incorrectly. We found only 2 instances out of the 58 where code correctly depended on per-loop semantics and was actually broken by the change. One involved a handler registered using once.Do that needed access to the current iteration’s values on each invocation. The other involved low-level code running in a context where allocation is disallowed, and the variable escaped the loop (but not the function), so that the old semantics did not allocate while the new semantics did. Both were easily adjusted.
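
The t.Parallel interaction can be simulated without the testing package. In the rough sketch below, a slice of deferred checks stands in for parallel subtests, which do not run their bodies until the loop that created them has finished. Everything here (testCase, buggyIsEven, countFailures) is hypothetical and only meant to show why a table-driven test can pass while checking nothing but its last case.

```go
package main

import "fmt"

type testCase struct {
	in   int
	want bool
}

// buggyIsEven is a deliberately wrong function under test: it reports 1 as even.
func buggyIsEven(n int) bool { return n%2 == 0 || n == 1 }

// countFailures queues one check per test case and runs them only after the
// loop has finished, the way paused parallel subtests would.
func countFailures(perIteration bool) int {
	cases := []testCase{{1, false}, {2, true}, {4, true}}
	var deferred []func() bool
	var tc testCase // declared outside the loop: one variable for the whole loop
	for _, tc = range cases {
		if perIteration {
			tc := tc // fresh copy, as per-iteration semantics would provide
			deferred = append(deferred, func() bool { return buggyIsEven(tc.in) == tc.want })
			continue
		}
		deferred = append(deferred, func() bool { return buggyIsEven(tc.in) == tc.want })
	}
	failures := 0
	for _, check := range deferred {
		if !check() {
			failures++
		}
	}
	return failures
}

func main() {
	fmt.Println(countFailures(false)) // 0: every check saw only the last case
	fmt.Println(countFailures(true))  // 1: the latent bug for input 1 is found
}
```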

Of course, there is always the possibility that Google’s tests may not be representative of the overall ecosystem’s tests in various ways, and perhaps this is one of them. But there is no indication from this analysis of any common idiom at all where per-loop semantics are required. The git log analysis points in the same direction: parts of the ecosystem are adopting tools with very high false positive rates and doing what the tools say, with no apparent problems.

There is also the possibility that while there’s no semantic change, existing loops would, when updated to the new Go version, allocate a variable once per iteration instead of once per loop. This problem would show up in memory profiles and is far easier to track down than the silent corruption we get when things go wrong with today’s semantics. Benchmarking of the public “bent” bench suite showed no statistically significant performance difference overall, so we expect most programs to be unaffected.

Good tooling can help users identify exactly the loops that need the most scrutiny during the transition

Our experience analyzing the failures in Google’s Go tests shows that we can use compiler instrumentation (adjusted -m output) to identify loops that may be compiling differently, because the compiler thinks the loop variables escape. Almost all the time, this identifies a very small number of loops, and one of those loops is right next to the failure. That experience can be wrapped up into a good tool for directing any debugging sessions.

Another possibility is a compilation mode where the compiled code consults an array of bits to decide during execution whether each loop gets old or new semantics. Package testing could provide a mode that implements binary search on that array to identify exactly which loops cause a test to fail. So if a test fails, you run the “loop finding mode” and then it tells you: “applying the semantic change to these specific loops causes the failure”. All the others are fine.
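
The search itself could look something like the sketch below. It is not an existing tool or testing API: findCulprit and the fails callback are hypothetical, and it assumes that exactly one loop is responsible and that the test outcome is deterministic for a given set of bits.

```go
package main

import "fmt"

// findCulprit returns the index of the single loop whose switch to the new
// semantics makes fails report true. newSemantics[i] says whether loop i is
// compiled with per-iteration variables for that run.
func findCulprit(numLoops int, fails func(newSemantics []bool) bool) int {
	lo, hi := 0, numLoops // the culprit is somewhere in [lo, hi)
	for hi-lo > 1 {
		mid := lo + (hi-lo)/2
		bits := make([]bool, numLoops)
		for i := lo; i < mid; i++ {
			bits[i] = true // enable new semantics for the lower half only
		}
		if fails(bits) {
			hi = mid // failure reproduced: culprit is in the lower half
		} else {
			lo = mid // no failure: culprit is in the upper half
		}
	}
	return lo
}

func main() {
	// Stand-in for running the test: it fails only when loop 7 is switched.
	fails := func(newSemantics []bool) bool { return newSemantics[7] }
	fmt.Println(findCulprit(10, fails)) // 7
}
```

A real tool would also have to handle failures that need several loops changed together, which a plain binary search like this one does not cover.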

Static analysis is not a viable alternative

Whether a particular loop is “buggy” due to the current behavior depends on whether the address of an iteration value is taken and then that pointer is used after the next iteration begins. It is impossible in general for analyzers to see where the pointer lands and what will happen to it. In particular, analyzers cannot see clearly through interface method calls or indirect function calls. Different tools have made different approximations. Vet recognizes a few definitely bad patterns, and we are adding a new one checking for mistakes using t.Parallel in Go 1.20. To avoid false positives, it also has many false negatives. Other checkers in the ecosystem err in the other direction. The commit log analysis showed some checkers were producing over 90% false positive rates in real code bases. (That is, when the checker was added to the code base, the “corrections” submitted at the same time were not fixing actual problems over 90% of the time in some commits.)
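
Here is a small sketch of why the call site alone is not enough. The Sink interface and both implementations are hypothetical. The loop in feed is identical in both cases, and whether it is buggy depends entirely on the dynamic type behind the interface. As before, v is declared outside the loop so that it is per-loop under either semantics.

```go
package main

import "fmt"

// Sink receives a pointer; the call site cannot tell what happens to it.
type Sink interface{ Put(p *int) }

// copyingSink reads the value immediately and does not keep the pointer.
type copyingSink struct{ vals []int }

func (s *copyingSink) Put(p *int) { s.vals = append(s.vals, *p) }

// retainingSink keeps the pointer and reads through it later.
type retainingSink struct{ ptrs []*int }

func (s *retainingSink) Put(p *int) { s.ptrs = append(s.ptrs, p) }

func (s *retainingSink) values() []int {
	out := make([]int, 0, len(s.ptrs))
	for _, p := range s.ptrs {
		out = append(out, *p)
	}
	return out
}

// feed passes &v through an interface method call on every iteration.
func feed(s Sink, nums []int) {
	var v int // one variable for the whole loop
	for _, v = range nums {
		s.Put(&v)
	}
}

func main() {
	c := &copyingSink{}
	feed(c, []int{1, 2, 3})
	fmt.Println(c.vals) // [1 2 3]: the loop is fine with this Sink

	r := &retainingSink{}
	feed(r, []int{1, 2, 3})
	fmt.Println(r.values()) // [3 3 3]: the same loop is buggy with this one
}
```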

There is no perfect way to catch these bugs statically. Changing the semantics, on the other hand, eliminates them all.

Changing loop syntax entirely would cause unnecessary churn

We have talked in the past about introducing a different syntax for loops (for example, #24282), and then giving the new syntax the new semantics while deprecating the current syntax. Ultimately this would cause a very significant amount of churn disproportionate to the benefit: the vast majority of existing loops are correct and do not need any fixes. It seems like an extreme response to force an edit of every for loop that exists today while invalidating all existing documentation and then having two different for loops that Go programmers need to understand, especially compared to changing the semantics to match what people overwhelmingly expect when they write the code.

My goal for this discussion is to gather early feedback on the idea of making a change here at all, because that’s something we’ve previously treated as off the table, as well as any feedback on expected impact and what would help users most in a roll-out strategy. Thanks!
