12

Github Faster list/array computation expressions by dsyme · Pull Request #11592...

 3 years ago
source link: https://github.com/dotnet/fsharp/pull/11592
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

Copy link

Contributor

dsyme commented 12 days ago

edited

This implements RFC FS-1099 - library support for faster computed list and array expressions and improves the performance of computed list and array expressions

This is drawing from the lessons of RFC FS-1087 and FS-1097 that we should (somewhat obviously) have a struct collector and just generate the synchronous code. This is what the implementation does - it looks for the compiled internal form of [ ... ] and [| ... |] and transforms to synchronous code (which is the original code with yield and yield! replaced by calls to the corresponding collector method).

For historical reasons back to F# 1.0 the compiled internal form of [ ... ] and [| ... |] is like this

`Seq.toList (seq { ...Seq.append/Seq.singleton/Seq.empty/...  })`

`Seq.toArray (seq { ...Seq.append/Seq.singleton/Seq.empty/...  })`

Previously we compiled the seq { ... } into a state machine, but still called Seq.toList and Seq.toArray. This was based
on thinking overly-influenced by LINQ and its focus on IEnumerable for everything. However IEnumerable is a
needless inversion of control and computationally expensive with extra allocations, MoveNext etc., - so
much so that LINQ is routinely avoided by some teams.
Instead, when ultimately producing lists and arrays, we can produce near-optimal synchronous code directly.
we should always have compiled these things in this way...

Notes:

  • This is an optimization that preserves the same semantics and execution order, so should work for all existing code. It involves a small addition to FSharp.Core documented in the RFC above. The optimization kicks in if the collectors are present in the referenced FSharp.Core.

  • One particular optimization is that, for lists, a yield! of a list in tailcall position simply stitches that list into the result without copying (AddManyAndClose on ListCollector<T>). This is valid because lists are immutable - and we already do this for List.append for example. In theory this could reduce some O(n) operations to O(1) though I doubt we'll see that in practice.

  • There is also an optimizations in ArrayCollector to avoid creating a ResizeArray for size 0,1 or 2 arrays. This is obviously a good optimization based on any reasonable model of allocation costs. However it may not be optimal and we can adjust this in the future - the relevant struct fields are internal and can be changed. It would be good to further measure the stack-space/allocation/data-copying tradeoffs and decide if it's worth extending this further.

  • I went through tests\walkthroughs\DebugStepping\TheBigFileOfDebugStepping.fsx and made some improvements to debugging of list, array and sequence expressions and checked that all the sample list/sequence expressions in that file debug OK. Specifically the location of the debug points associated with try and with and while and finally keywords is now correctly recovered from the internal form.

The perf results on micro samples are good and pretty much as expected from previous experiments with using state machines for list { ... } and array {... } comprehensions. Note that some other people have experimented with faster builders using reflection emit codegen too.

  • Raw perf of generating 0 or 1 element lists: ~4x faster
  • Raw perf of generating 6-10 element lists: ~4x faster
  • Raw perf of generating 0 or 1 element arrays: ~4x faster
  • Raw perf of generating 6-10 element arrays: ~2x faster

We don't expect any change in fixed size arrays or lists

I don't expect any cases where this will either be slower or use more stack in a signficant way compared to our old way of doing these (which is to create a sequence expression state machine and iterate).

C:\GitHub\dsyme\fsharp>artifacts\bin\fsc\Release\net472\fsc.exe --optimize a.fs && a
PERF: tinyVariableSizeBuiltin : 89
PERF: variableSizeBuiltin : 504
PERF: fixedSizeBase : 504
PERF: tinyVariableSizeBuiltin (array) : 194
PERF: variableSizeBuiltin (array) : 1251
PERF: fixedSizeBase (array) : 260

C:\GitHub\dsyme\fsharp>fsc.exe --optimize a.fs && a
PERF: tinyVariableSizeBuiltin : 356
PERF: variableSizeBuiltin : 1949
PERF: fixedSizeBase : 497
PERF: tinyVariableSizeBuiltin (array) : 717
PERF: variableSizeBuiltin (array) : 2511
PERF: fixedSizeBase (array) : 244
  • For correctness testing the existing tests we have are ok I think - we have zillions of computed list and array expressions in the test suites and compiler that return results of many different sizes.

  • Some IL code generation tests will likely fail, we'll need to update those

  • We need to check debug stepping (it should be possible to make this much improved if it's not alrready)


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK