4

discussion: standard iterator interface · Discussion #54245 · golang/go · GitHub

 2 years ago
source link: https://github.com/golang/go/discussions/54245
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

This is a discussion that is intended to lead to a proposal.

This was written with lots of input from @jba and @rsc.

Background

Most languages provide a standardized way to iterate over values stored in containers using an iterator interface (see the appendix below for a discussion of other languages). Go provides for range for use with maps, slices, strings, arrays, and channels, but it does not provide any general mechanism for user-written containers, and it does not provide an iterator interface.

Go does have examples of non-generic iterators:

  • runtime.CallersFrames returns a runtime.Frames that iterates over stack frames; Frames has a Next method that returns a Frame and a bool that reports whether there are more frames.
  • bufio.Scanner is an iterator through an io.Reader, where the Scan method advances to the next value. The value is returned by a Bytes method. Errors are collected and returned by an Err method.
  • database/sql.Rows iterates through the results of a query, where the Next method advances to the next value and the value is returned by a Scan method. The Scan method can return an error.

Even this short list reveals that there are no common patterns in use today. This is in part because before generics were introduced, there was no way to write an interface that described an iterator. And of course there may be no simple pattern that will cover all of these use cases..

Today we can write an interface Iter[E] for an iterator over a container that has elements of type E. The existence of iterators in other languages shows that this is a powerful facility. This proposal is about how to write such an interface in Go.

What we want from Go iterators

Go is of course explicit about errors. Iterators over containers can't fail. For the most common uses it doesn't make any more sense to have iterators return an error than it does for a for range statement to return an error. Algorithms that use iterators should often behave differently when using iterators that can fail. Therefore, rather than try to combine non-failing and failing iterators into the same interface, we should instead return explicit errors from iterators that can fail. These errors can be part of the values returned by the iterator, or perhaps they can be returned as additional values.

Iterators have two fundamental operations: retrieve the current value, and advance to the next value. For Go we can combine these operations into a single method, as runtime.Frames does.

In the general case we may want to implement iterators with some additional state that is not trivially garbage collected, such as an open file or a separate goroutine. In C++, for example, this state would be cleared up by a destructor, but of course Go does not have destructors. Therefore, we should have some explicit way to indicate that we no longer need an iterator. This should be optional, as many iterators do not require any special cleanup. We should encourage iterators to use finalizers if necessary to clean up resources, and also to clean up after themselves when reaching the end of an iteration.

In Go the builtin type map permits values to be inserted and removed while iterating over the map, with well-defined behavior. In general for Go we should be flexible, though of course the program should never simply crash. We should let each container type define how it behaves if the container is modified while iterators are active. For example, container modification may cause arbitrary elements to be skipped or returned two or more times during the iteration. In some cases, hopefully rare, container modification may cause uses of existing iterators to panic, or to return values that have been removed from the container.

Proposal

We define a new package iter that defines a set of interfaces. The expectation is that containers and other types will provide functions and methods that return values that implement these interfaces. Code that wants to work with arbitrary containers will use the interfaces defined in this package. That will permit people to write functions that work with containers but are agnostic to the actual container type being used, much as interfaces like io.Reader permit code to be agnostic as the source of the data stream.

iter.Iter

The core interface in the iterators package is iter.Iter[E].

// Iter supports iterating over a sequence of values of type `E`.
type Iter[E any] interface {
	// Next returns the next value in the iteration if there is one,
	// and reports whether the returned value is valid.
	// Once Next returns ok==false, the iteration is over,
	// and all subsequent calls will return ok==false.
	Next() (elem E, ok bool)
}

We also define a related interface for containers, such as maps, for which elements inherently have two values.

// Iter2 is like Iter but each iteration returns a pair of values.
type Iter2[E1, E2 any] interface {
	Next() (E1, E2, bool)
}

An iterator that can fail will either return a single value that includes an error indication, or it will implement Iter2[E, error]. It's not yet clear which of those options is better.

As mentioned above, some iterators may have additional state that may be discarded when no more values are expected from an iterator (for example, a goroutine that sends values on a channel). Telling the iterator that no more values are expected is done using an optional interface that an iterator may implement.

// StopIter is an optional interface for Iter.
type StopIter[E any] interface {
	Iter[E]

	// Stop indicates that the iterator will no longer be used.
	// After a call to Stop, future calls to Next may panic.
	// Stop may be called multiple times;
	// all calls after the first will have no effect.
	Stop()
}

// StopIter2 is like StopIter, but for Iter2.
type StopIter2[E1, E2 any] interface {
	Iter2[E1, E2]
	Stop()
}

The Stop method should always be considered to be an optimization. The program should work correctly even if Stop is never called. If an iterator is read to the end (until Next returns false) calling Stop should be a no-op. If necessary, iterator implementations should use finalizers to clean up cases where Stop is not called.

As a matter of programming style, the code that calls a function to obtain a StopIter is responsible for calling the Stop method. A function that accepts an Iter should not use a type assertion to detect and call the Stop method. This is similar to the way that a function that accepts an io.Reader should not use a type assertion to detect and call the Close method.

iter.New functions

iter.Iter provides a convenient way for the users of a container to iterate over its contents. We also want to consider the other side of that operation, and provide convenient ways for containers to define iterators.

// NewGen creates a new iterator from a generator function gen.
// The gen function is called once.  It is expected to call
// yield(v) for every value v to be returned by the iterator.
// If yield(v) returns false, gen must stop calling yield and return.
func NewGen[E any](gen func(yield func(E) bool)) StopIter[E]

// NewGen2 is like NewGen for Iter2.
func NewGen2[E1, E2 any](gen func(yield func(E1, E2) bool)) StopIter2[E1, E2]

An appendix below discusses how these functions can be implemented efficiently.

Simpler containers may be able to easily capture all required state in a function.

// NewNext creates a new iterator from a next function.
// The next function is called for each call of the iterator's Next method.
func NewNext[E any](next func (E, bool)) Iter[E]

// NewNext2 is like NewNext for Iter2.
func NewNext2[E1, E2 any](next func (E1, E2, bool)) Iter2[E1, E2]

iterators for standard containers

The iter package will define iterators for the builtin container types.

// FromChan returns an iterator over a channel.
func FromChan[E any](<-chan E) Iter[E]

// FromMap returns an iterator over a map.
func FromMap[K comparable, V any](map[K]V) Iter2[K, V]

// FromSlice returns an iterator over a slice.
func FromSlice[E any]([]E) Iter[E]

Functions that accept iterators

The iter package could define functions that operate on iterators. We should be conservative here to start. It's not yet clear which of these functions will be useful.

// Map returns a new iterator whose elements are f applied to
// the elements of it.
func Map[E1, E2 any](f func(E1) E2, it Iter[E1]) Iter[E2]

// Filter returns a new iterator whose elements are those
// elements of it for which f returns true.
func Filter[E any](f func(E) bool, it Iter[E]) Iter[E]

// Reduce uses a function to reduce the elements of an
// iterator to a single value.  The init parameter is
// passed to the first call of f.  If the input iterator
// is empty, the result is init.
func Reduce[E1, E2 any](f func(E2, E1) E2, it Iter[E1], init E2) E2

// ToSlice collects the elements of the iterator into a slice.
// [ Perhaps this should be slices.FromIter. ]
func ToSlice[E any](it Iter[E]) []E

// ToMap collects the elements of the iterator into a map.
// [ Perhaps this should be maps.FromIter. ]
func ToMap[K comparable, V any](it Iter2[K, V]) map[K]V

// Concat returns the concatenation of two iterators.
// The resulting iterator returns all the elements of the
// first iterator followed by all the elements of the second.
func Concat[E any](it1, it2 Iter[E]) Iter[E]

Range loops

The for range syntax will be expanded to support iterators. Note that this is the only language change in this proposal. Everything else is library code and programming conventions.

If the argument to range implements Iter[E] or Iter2[E1, E2], then the loop will iterate through the elements of the iterator. For example, this code:

for e := range it {
	// statements
}

will be equivalent to this code:

for e, _ok := it.Next(); _ok; e, _ok = it.Next() {
	// statements
}

Here _ok is a hidden variable that is not seen by user code.

Note that breaking out of the loop will leave the iterator at that position, such that Next will return the next elements that the loop would have seen.

Using range with an Iter2[E1, E2] will permit using two variables in the for statement, as with range over a map.

Compatibility note: if the type of it is a slice, array, pointer-to-array, string, map, or channel type, then the Next method will be ignored and the for range will operate in the usual way. This is required for backward compatibility with existing code.

Because it's inconvenient to write for v := range c.Range(), we propose a further extension: we permit range c if c has a method Range that returns a value that implements Iter[E] or Iter2[E]. If the Range method implements StopIter[E] or StopIter2[E] then the range loop will ensure that Stop is called when exiting the loop. (Here whether the result implements StopIter is a static type check, not a dynamic type assertion: if the Range method returns the type Iter[E], Stop will not be called even if the actual type has a Stop method.)

For example:

for v := range c {
	// statements
}

where c.Range returns a value that implements StopIter[E], is roughly equivalent to:

_it := c.Range()
defer _it.Stop()
for e, _ok := it.Next(); _ok; e, _ok = it.Next() {
	// statements
	// Any goto L or continue L statement where L is outside the loop
	// is replaced by
	//   _it.Stop(); goto L (or continue L)
}
_it.Stop()

The compiler will arrange for _it.Stop to be called if the loop statements panic, even if some outer defer in the function recovers the panic. That is, the defer in the roughly equivalent code is run when leaving the loop, not just when leaving the function.

Note that if we adopt this change it will be the first case in which a language construct invokes a user-defined method.

That is all

That completes the proposal.

Optional future extensions

We can use optional interfaces to extend the capabilities of iterators.

For example, some iterators permit deleting an element from a container.

// DeleteIter is an Iter that implements a Delete method.
type DeleteIter[E any] interface {
	Iter[E]

	// Delete deletes the current iterator element;
	// that is, the one returned by the last call to Next.
	// Delete should panic if called before Next or after
	// Next returns false.
	Delete()
}

We could then implement

// Delete removes all elements from it which f returns true.
func Delete[E any](it DeleteIter[E], f func(E) bool)

Similarly some iterators permit setting a value.

// SetIter is an Iter that implements a Set method.
type SetIter[E any] interface {
	Iter[E]

	// Set replaces the current iterator element with v.
	// Set should panic if called before Next or after
	// Next returns false.
	Set(v E)
}

We could then implement

// Replace replaces all elements e with f(e).
func Replace[E any](it SetIter[E], f func(e) E)

Bi-directional iterators can implement a Prev method.

// PrevIter is an iterator with a Prev method.
type PrevIter[E any] interface {
	Iter[E]

	// Prev moves the iterator to the previous position.
	// After calling Prev, Next will return the value at
	// that position in the container. For example, after
	//   it.Next() returning (v, true)
	//   it.Prev()
	// another call to it.Next will again return (v, true).
	// Calling Prev before calling Next may panic.
	// Calling Prev after Next returns false will move
	// to the last element, or, if there are no elements,
	// to the iterator's initial state.
	Prev()
}

This is just a sketch of possible future directions. These ideas are not part of the current proposal. However, we want to deliberately leave open the possibility of defining additional optional interfaces for iterators.

Examples

This is some example code showing how to use and create iterators. If a function in this section is not mentioned above, then it is purely an example, and is not part of the proposal.

// ToSlice returns a slice containing all the elements in an iterator.
// [ This might be in the slices package, as slices.FromIter. ]
func ToSlice[E any](it iter.Iter[E]) []E {
	var r []E
	for v := range it {
		r = append(r, v)
	}
	return r
}

// ToSliceErr returns a slice containing all the elements
// in an iterator, for an iterator that can fail.
// The iteration stops on the first error.
// This is just an example, this may not be the best approach.
func ToSliceErr[E any](it iter.Iter2[E, error]) ([]E, error) {
	var r []E
	for v, err := range it {
		if err != nil {
			return nil, err
		}
		r = append(r, v)
	}
	return r
}

// Map returns a new iterator that applies f to each element of it.
func Map[E1, E2 any](f func(E1) E2, it Iter[E1]) Iter[E2] {
	return iter.NewNext(func() (E2, bool) {
		e, ok := it.Next()
		var r E2
		if ok {
			r = f(e)
		}
		return r, ok
	})
}

// Filter returns a new iterator that only contains the elements of it
// for which f returns true.
func Filter[E any](f func(E) bool, it Iter[E]) Iter[E] {
	return iter.NewNext(func() (E, bool) {
		for {
			e, ok := it.Next()
			if !ok || f(e) {
				return e, ok
			}
		}
	})
}

// Reduce reduces an iterator to a value using a function.
func Reduce[E1, E2 any](f func(E2, E1) E2, it Iter[E1], init E2) E2
	r := init
	for v := range it {
		r = f(r, v)
	}
	return r
}

// iter.FromSlice returns an iterator over a slice.
// For example purposes only, this iterator implements
// some of the optional interfaces mentioned earlier.
func FromSlice[E any](s []E) Iter[E] {
	return &sliceIter[E]{
		s: s,
		i: -1,
	}
}

type sliceIter[E any] struct {
	s []E
	i int
}

func (it *sliceIter[E]) Next() (E, bool) {
	it.i++
	ok := it.i >= 0 && it.i < len(it.s)
	var v E
	if ok {
		v = it.s[it.i]
	}
	return v, ok
}

// Prev implements PrevIter.
func (it *sliceIter[E]) Prev() [
	it.i—-
}

// Set implements SetIter.
func (it *sliceIter[E]) Set(v E) {
	it.s[it.i] = v
}

// FromChan returns an iterator for a channel.
func FromChan[E any](c <-chan E) Iter[E] {
	return iter.NewNext(func() (E, bool) {
		v, ok := <-c
		return v, ok
	})
}

// NewNext takes a function that returns (v, bool) and returns
// an iterator that calls the function until the second result is false.
func NewNext[E any](f func() (E, bool)) Iter[E] {
	return funcIter[E](f)
}

// funcIter is used by NewNext to implement Iter.
type funcIter[E any] func() (E, bool)

// Next implements Iter.
func (f funcIter[E]) Next() (E, bool) {
	return f()
}

// Equal reports whether two iterators have the same values
// in the same order.
func Equal[E comparable](it1, it2 Iter[E]) bool {
	for {
		v1, ok1 := it1.Next()
		v2, ok2 := it2.Next()
		if v1 != v2 || ok1 != ok2 {
			return false
		}
		if !ok1 {
			return true
		}
	}
}

// Merge takes two iterators that are expected to be in sorted order,
// and returns a new iterator that merges the two into a single
// iterator in sorted order.
func MergeIter[E constraints.Ordered](it1, it2 iter.Iter[E]) iter.Iter[E] {
	val1, ok1 := it1.Next()
	val2, ok2 := it2.Next()
	return &mergeIter[E]{
		it1:  it1,
		it2:  it2,
		val1: val1,
		ok1:  ok1,
		val2: val2,
		ok2:  ok2,
	}
}

type mergeIter[E constraints.Ordered] struct {
	it1, it2   iter.Iter[E]
	val1, val2 E
	ok1, ok2   bool
}

func (m *mergeIter[E]) Next() (E, bool) {
	var r E
	if m.ok1 && m.ok2 {
		if m.val1 < m.val2 {
			r = m.val1
			m.val1, m.ok1 = m.it1.Next()
		} else {
			r = m.val2
			m.val2, m.ok2 = m.it2.Next()
		}
		return r, true
	}
	if m.ok1 {
		r = m.val1
		m.val1, m.ok1 = m.it1.Next()
		return r, true
	}
	if m.ok2 {
		r = m.val2
		m.val2, m.ok2 = m.it2.Next()
		return r, true
	}
	return r, false
}

// Tree is a binary tree.
type Tree[E any] struct {
	val         E
	left, right *Tree
}

// Range returns an in-order iterator over the tree.
// This shows how to use iter.NewGen to iterate over a
// complex data structure.
func (t *Tree[E]) Range() iter.StopIter[E] {
	return iter.NewGen(t.iterate)
}

// iterate is used by Range.
func (t *Tree[E]) iterate(yield func(E) bool) bool {
	if t == nil {
		return true
	}
	// Keep providing values until yield returns false
	// or we have finished the tree.
	return t.left.iterate(yield) &&
		yield(t.Val) &&
		t.right.iterate(yield)
}

// SQLRowsRange returns an iterator over a sql.Rows.
// This shows one way to adapt an existing iteration
// mechanism to the new interface.
// We wouldn't design things this way from scratch.
func SQLRowsRange(r *sql.Rows) iter.StopIter[SQLRowVal] {
	it := &SQLRowsIter{r}
	runtime.SetFinalizer(it, r.Close)
	return it
}

// SQLRowsIter implements iter.Iter[SQLRowVal].
type SQLRowsIter struct {
	r *sql.Rows
}

// Next implements iter.Next.
func (it *SQLRowsIter) Next() (SQLRowVal, bool) {
	ok := it.r.Next()
	var rit SQLRowVal
	if ok {
		rit = SQLRowVal{r}
	} else {
		it.r.Close()
		runtime.SetFinalizer(it, nil)
	}
	return rit, ok
}

// Stop implements iter.StopIter.
func (it *SQLRowsIter) Stop() {
	// We don't care about the error result here.
	// It's never a new error from the close itself,
	// just a saved error from earlier.
	// If the caller cares, they should check during the loop.
	it.r.Close()
	runtime.SetFinalizer(it, nil)
}

// SQLRowVal is an iteration value.
type SQLRowVal struct {
	r *sql.Rows
}

// Err returns any error for the current row.
func (i1 SQLRowVal) Err() error {
	return i1.r.Err()
}

// Scan fetches values from the current row.
func (i1 SQLRowVal) Scan(dest ...any) error {
	return i1.r.Scan(dest...)
}

// Total is an example of how SQLRowsRange might be used.
// Note how the function uses the proposed Map and Reduce functions.
func Total(r *sql.Rows) (int, error) {
	var rowsErr error
	toInt := func(i1 SQLRowVal) int {
		if err := i1.Err(); err != nil {
			rowsErr = err
			return 0
		}
		var v int
		if err := i1.Scan(&v); err != nil {
			rowsErr = err
		}
		return v
	}
	it := SQLRowsRange(r)
	defer it.Stop()
	ints := iter.Map(toInt, it)
	r := iter.Reduce(func(v1, v2 int) int { return v1 + v2 }, ints, 0)
	// Capture an error that was not returned by any iteration, if any.
	if rowsErr == nil {
		rowsErr = r.Err()
	}
	return r, rowsErr
}

Appendix: Iterators in other languages

The C++ Standard Template Library defines a variety of iterator APIs. These are consistently implemented by C++ containers and are also used by other types such as files and streams. This makes it possible to write standard algorithms that work with all C++ containers.

C++ containers provide begin and end methods that return iterators. The begin method returns an iterator that refers to the beginning of the container. The end method returns an iterator that refers to the position just past the end of the container. Iterators to the same container may be compared for equality using the == and != operators. Any valid iterator (not the iterator returned by end) refers to a value in the container. That value is accessible using the unary * operator (which is the pointer dereference operator, thus iterators act like pointers into the container, and ordinary pointers act like iterators). The unary ++ operator advances the iterator to refer to the next element in the container. For any C++ container one can loop over all elements in the container by writing

  for (containerType::iterator p = c.begin(); p != c.end(); ++p)
    doSomething(*p);

As of C++11 this pattern is built into the language via the range-based for loop.

  for (auto&& var : container)
    doSomething(var);

This calls the begin and end methods of the container and loops as shown above.

Some C++ iterators have optional additional capabilities. Iterators can be grouped into five types.

  • Input iterators support the operations described above. They can be used to do a single sequential pass over a container. Example: an iterator that reads values from a file.
  • Output iterators permit setting a value through the iterator (*p = v), but do not permit retrieving it. Example: an iterator that writes values to a file.
  • Forward iterators support both input and output operations. Example: an iterator over a singly linked list.
  • Bidirectional iterators additionally support the unary -- operator to move to the preceding element. Example: an iterator over a doubly linked list.
  • Random access iterators additionally support adding or subtracting an integer, and getting the difference between two iterators to get the number of values between them, and comparing two iterators using < and friends, and indexing off an iterator to refer to a value. Example: an iterator over a slice (which C++ calls a vector).

C++ algorithms can use function overloading to implement the same algorithm in different ways depending on the characteristics of the iterator. For example, std::reverse, which reverses the elements in a container, can be implemented with a bidirectional iterator, but uses a more efficient algorithm when called with a random access iterator.

C++ iterators do not provide any form of error handling. Iterators over containers typically can't fail. An iterator associated with a file handles I/O errors by setting the file object into an error state, which can optionally cause an exception to be thrown.

Each C++ container type defines rules for when a modification to the container invalidates existing iterators. For example, inserting an element in a linked list does not invalidate iterators pointing to other elements, but inserting an element in a vector may invalidate them. Using an invalid iterator is undefined behavior, which can cause the program to crash or arbitrarily misbehave.

Java also supports iterators to step through containers. Java iterators are much simpler than C++ iterators. Java defines an interface Iterator<E> that has three main methods: hasNext, next, and remove. Calling next in a situation where hasNext would return false will throw an exception, and in general Java iterators throw an exception for any error. (By the way, in C++ removing an iterator from a container is generally implemented as an erase method on the container type that takes an iterator as an argument.)

A Java container will have an iterator method that returns an Iterator that walks over the elements of the container. This too is described as a Java interface: Iterable<E>.

Java has a iterator using loop syntax like that of C++11 (C++ copied the syntax from Java):

  for (elementType var : container)
    doSomething(var);

This calls the iterator method on the container and then calls hasNext and next in a loop.

As far as I know Java does not have a standard implementation of output iterators or random access iterators. Specific containers will implement iterators with an additional set method that permits changing the value to which the iterator refers.

If a Java iterator is used after a container is modified in some way that the iterator can't support, the iterator methods will throw an exception.

Python

A container will implement an __iter__ method that returns an iterator object. An iterator will implement a __next__ method that returns the next element in the container, and raises a StopIteration exception when done. Code will normally call these methods via the builtin iter and next functions.

The Python for loop supports iterators.

  for var in container:
    doSomething(var)

This calls iter and next, and handles the StopIteration exception, as one would expect.

Python iterators generally don't permit modifying the container while an iterator is being used, but it's not clear to me precisely how they behave when it happens.

Discussion

For C++ and Python, iterators are a matter of convention: any type that implements the appropriate methods can return an iterator, and an iterator itself must simply implement the appropriate methods and (for C++) operator overloads. For Java, this is less true, as iterators explicitly implement the Iterator<E> interface. The for loop in each language just calls the appropriate methods.

These conventions are powerful because they permit separating the details of an algorithm from the details of a container. As long as the container implements the iterator interface, an algorithm written in terms of iterators will work.

Iterators do not handle errors in any of these languages. This is in part because errors can be handled by throwing exceptions. But it is also because iterating over a container doesn't fail. Iteration failure is only possible when a non-container, such as a file, is accessed via the iterator interface.

It's worth noting that the C++ use of paired begin and end iterators permit a kind of sub-slicing, at least for containers that support bidirectional or random access iterators.

Appendix: Efficient implementation of iter.NewGen

The natural way to implement iter.NewGen is to use a separate goroutine and a channel. However, we know from experience that that will be inefficient due to scheduling delays. A more efficient way to implement NewGen will be to use coroutines: let the generator function produce a new value and then do a coroutine switch to the code using the iterator. When that code is ready for the next value, do a coroutine switch back to the generator function. A coroutine switch can be fast: simply change the stack pointer and reload the registers. No need to go through the scheduler.

Of course Go doesn't have coroutines, but we can use compiler optimizations to achieve the same effect without any language changes. This approach, and much of the text below, is entirely due to @rsc.

First, we identify programming idioms that provide concurrency without any opportunity for parallelism, such as a send immediately followed by a receive. Second, we adjust the compiler and runtime to recognize the non-parallel idioms and optimize them to simple coroutine switches instead of using the thread-aware goroutine scheduler.

Coroutine idioms

A coroutine switch must start another goroutine and then immediately stop the current goroutine, so that there is no opportunity for parallelism.

There are three common ways to start another goroutine: a go statement creating a new goroutine, a send on a channel where a goroutine is blocked, and a close on a channel where a goroutine is blocked.

There are three common ways to immediately stop the current goroutine: a receive of one or two values (with or without comma-ok) from a channel with no available data and a return from the top of a goroutine stack, exiting the goroutine.

Optimizations

The three common goroutine starts and three common goroutine stops combine for nine possible start-stop pairs. The compiler can recognize each pair and translate each to a call to a fused runtime operation that does both together. For example a send compiles to chansend1(c, &v) and a receive compiles to chanrecv1(c, &v). A send followed by a receive can compile to chansend1recv1(c1, &v1, c2, &v2).

The compiler fusing the operations creates the opportunity for the runtime to implement them as coroutine switches. Without the fusing, the runtime cannot tell whether the current goroutine is going to keep running on its own (in which case parallelism is warranted) or is going to stop very soon (in which case parallelism is not warranted). Fusing the operations lets the runtime correctly predict the next thing the goroutine will do.

The runtime implements each fused operation by first checking to see if the operation pair would start a new goroutine and stop the current one. If not, it falls back to running the two different operations sequentially, providing exactly the same semantics as the unfused operations. But if the operation pair does start a new goroutine and stop the current one, then the runtime can implement that as a direct switch to the new goroutine, bypassing the scheduler and any possible confusion about waking new threads (Ms) or trying to run the two goroutines in different threads for a split second.

Note that recognizing these coroutine idioms would have potential uses beyond iterators.

NewGen

Here is an implementation of iter.NewGen that takes advantage of this technique.

// NewGen creates a new iterator from a generator function gen.
// The gen function is called once.  It is expected to call
// yield(v) for every value v to be returned by the iterator.
// If yield(v) returns false, gen must stop calling yield and return.
func NewGen[E any](gen func(yield func(E) bool)) StopIter[E] {
	cmore := make(chan bool)
	cnext := make(chan E)

	generator := func() {
		// coroutine switch back to client until Next is called (1)
		var zero E
		cnext <- zero
		if !<-cmore {
			close(cnext)
			return
		}
		gen(func(v E) bool {
			// coroutine switch back to client to deliver v (2)
			cnext <- v
			return <-cmore
		})

		// coroutine switch back to client marking end (3)
		close(cnext)
	}

	// coroutine switch to start generator (4)
	go generator()
	<-cnext

	r := &genIter[E]{cnext: cnext, cmore: cmore}
	runtime.SetFinalizer(r, (*genIter[E].Stop))
	return r
}

// genIter implements Iter[E] for NewGen.
type genIter[E any] struct {
	cnext  chan E
	cmore  chan bool
	closed atomic.Bool
}

// Next implements Iter[E]
func (it *genIter[E]) Next() (E, bool) {
	// coroutine switch to generator for more (5)
	// (This panics if Stop has been called.)
	it.cmore <- true
	v, ok := <-it.cnext
	return v, ok
}

// Stop implements StopIter[E]
func (it *genIter[E]) Stop() {
	// Use the closed field to make Stop idempotent.
	if !it.closed.CompareAndSwap(false, true) {
		return
	}
	runtime.SetFinalizer(it, nil)
	// coroutine switch to generator to stop (6)
	close(it.cmore)
	<-cnext
}

The compiler would need to fuse the commented operation pairs for potential optimization by the runtime: send followed by receive (1, 2), close followed by return (3), go followed by receive (4), send followed by comma-ok receive (5), and close followed by receive (6).

You must be logged in to vote

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK