proposal: add package for using SIMD instructions · Issue #53171 · golang/go ·...
source link: https://github.com/golang/go/issues/53171
Comments
mpldr commented 16 days ago
SIMD has the potential to greatly increase performance of a lot of data processing applications. Since #35307 was closed with the remark
Generics are now available, so I want to use this opportunity to necrobump and suggest a |
As for what the API might look like, the following yenc encoder may provide some insight.
package
package
package
Equivalent Go code would look something like this (assuming the above-mentioned API design)
I think this should be part of the standard library mainly for two reasons:
|
Contributor
ianlancetaylor commented 16 days ago
The difficulty with a general-purpose approach to SIMD, which is what you are suggesting, is that the performance can be dramatically different on different processors. Also, for specific processors, performance is not optimal because not all special-purpose instructions are available. (The difficulty with a processor-specific approach to SIMD is that you have to write different code for each processor.) As a side note, I don't see any reason to have a package like simd/sse2 in your description. Instead, we would arrange to use the appropriate implementation when building the simd package.
Contributor
Zheng-Xu commented 16 days ago
I hope the proposal can support different architectures. Arm SVE has a feature called Vector Length Agnostic (VLA) programming: different hardware may implement a different vector size. See: "A sneak peek into SVE and VLA programming". At the API level, it would be better to decide the vector length at runtime instead of hard-coding it as 8x64. Reference: OpenJDK Vector API. On the compiler side, the Go ABI so far needs the frame size to be decided at compile time, which doesn't support VLA programming very well yet.
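To make the runtime-vector-length point concrete, here is a minimal sketch in plain Go. It is not an API from the proposal: `sumLanes` and its `lanes` parameter are hypothetical names, and a real implementation would presumably query the hardware for the lane count instead of taking it as an argument. The loop is written once and works for any lane count, which is the core of the VLA style.

```go
package main

// sumLanes adds up xs using `lanes` independent partial sums, the way a
// vector unit with `lanes` float64 lanes would. The lane count is an
// ordinary runtime value here (hypothetical; a real package would ask
// the CPU), so the same code works for 128-bit, 256-bit or wider vectors.
func sumLanes(xs []float64, lanes int) float64 {
	if lanes < 1 {
		lanes = 1
	}
	acc := make([]float64, lanes) // one accumulator per lane

	// Main loop: consume one full "vector" of elements per iteration.
	i := 0
	for ; i+lanes <= len(xs); i += lanes {
		for j := 0; j < lanes; j++ {
			acc[j] += xs[i+j]
		}
	}

	// Horizontal reduction of the per-lane accumulators.
	var total float64
	for _, v := range acc {
		total += v
	}

	// Scalar tail for the elements that did not fill a whole vector.
	for ; i < len(xs); i++ {
		total += xs[i]
	}
	return total
}
```

Calling `sumLanes(data, 4)` and `sumLanes(data, 8)` runs the same source with different widths. Note that floating-point results can differ slightly between widths because the summation order changes.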
Author
mpldr commented 16 days ago
That might not be as easy as it sounds since that would make binaries potentially less portable. Adding a fallback to use in case a specific instruction set is not supported would help in offsetting this. (Yes, I'm aware that this is also not as simple as it sounds.)
I think we should provide different packages for different instruction sets.
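The fallback idea can be sketched as a function variable chosen once at startup. Everything below is an assumption made for illustration: `hasWideVectors` stands in for a real CPU feature check, and `addBlocked` is ordinary Go that only mimics the shape of a 4-wide vector kernel, since Go has no SIMD intrinsics today.

```go
package main

// hasWideVectors is a hypothetical feature flag. A real package would
// set it from a CPU feature query at startup.
var hasWideVectors = false

// addScalar is the portable fallback: dst[i] = a[i] + b[i].
// a and b must be at least as long as dst.
func addScalar(dst, a, b []float32) {
	for i := range dst {
		dst[i] = a[i] + b[i]
	}
}

// addBlocked processes four elements per iteration, standing in for a
// kernel that would use vector instructions on capable hardware.
func addBlocked(dst, a, b []float32) {
	i := 0
	for ; i+4 <= len(dst); i += 4 {
		dst[i] = a[i] + b[i]
		dst[i+1] = a[i+1] + b[i+1]
		dst[i+2] = a[i+2] + b[i+2]
		dst[i+3] = a[i+3] + b[i+3]
	}
	for ; i < len(dst); i++ { // leftover elements
		dst[i] = a[i] + b[i]
	}
}

// selectAdd returns the implementation matching the detected features.
func selectAdd(wide bool) func(dst, a, b []float32) {
	if wide {
		return addBlocked
	}
	return addScalar
}

// add is what callers use; the choice is made once, at package init.
var add = selectAdd(hasWideVectors)
```

The binary stays portable because the fallback is always compiled in. The cost is an indirect call per operation, so this pattern suits kernels that work on whole slices, not single values.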
Contributor
ianlancetaylor commented 15 days ago
@mpldr We already have a mechanism for making binaries more or less portable: the |
inkeliz commented 15 days ago
Isn't that the same as writing an assembly file for each CPU? I think that defeats the purpose of a SIMD package. I think the Zig approach is better (and I think Rust is similar). For instance, Zig provides one |
Author
mpldr commented 15 days ago
I think the Zig approach is better (and I think Rust is similar). For instance, Zig [provides one `@Vector`](https://ziglang.org/documentation/master/#Vectors) function which works on any CPU, can [read more about that here](https://zig.news/michalz/fast-multi-platform-simd-math-library-in-zig-2adn). That makes the code as portable as non-simd version.
I also like the approach of just allowing the various operations to be applied to it, but this would require a language change. (Also, adding 60-something new types to the language seems rather contrary to "the Go spirit".)
This seems to make the code more portable, and I support it. |
beoran commented 12 days ago
Indeed, a vectorize package that is portable and that uses the CPU relevant instructions would be the best solution. On platforms that do not have such instructions, the default implementation could still be useful for portability and optimized for performance. |
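As an illustration of what the default implementation of such a portable package might look like, here is a small generic sketch. The names `Float` and `Dot` are hypothetical and do not come from the proposal. The idea is that this pure-Go body would be the baseline, and an architecture-specific build could replace it where suitable instructions exist.

```go
package main

// Float is a hypothetical constraint covering the element types a
// portable vector package might accept.
type Float interface {
	~float32 | ~float64
}

// Dot returns the dot product of a and b, using the shorter length if
// they differ. This is the portable default: plain Go, no special
// instructions, valid on every platform.
func Dot[T Float](a, b []T) T {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	var sum T
	for i := 0; i < n; i++ {
		sum += a[i] * b[i]
	}
	return sum
}
```

With generics, one signature serves both `[]float32` and `[]float64` callers, for example `Dot([]float32{1, 2, 3}, []float32{4, 5, 6})`.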
The current consensus on portability is to use appropriate implementations when building SIMD packages, which I support. |
Contributor
rsc commented 2 days ago
I came across https://github.com/google/highway a few weeks ago. Is there anything we can learn from that about portable API for SIMD? |
Contributor
rsc commented 2 days ago
This proposal has been added to the active column of the proposals project |
Overall, as others have mentioned, a platform-independent implementation of SIMD is at best a half implementation. While some of the aspects are platform independent, and it may be possible to port a fraction, there is simply too much difference between platforms to make anything that would be genuinely useful.

While it is "neat", falling back to a Go implementation of SIMD would in many cases be much worse than straight-up Go, and the overall design of this will just slow down availability. Example: MPSADBW. If there is no HW support, the fallback will be horrible.

SIMD intrinsics should be able to live alongside Go code, with the compiler controlling register use. This would allow using SIMD and other instructions without the function call overhead that is forced today.

SIMD should be guarded by platform tags. SIMD types should be available for all platforms, but the functions shouldn't be abstracted and should match the underlying instructions, maybe with some compound functions.

Here are some of my previous observations when looking at intrinsics for Go:

Feature detection

There is a huge number of individual features. Compile-time feature specification will always just select a very low common denominator, and it should be easy to see (maybe by looking at imports) which features to check for. The detection should be part of the intrinsic offering.

A quite tricky thing is that some instructions come in several "forms" that require different features. For example ANDPS xmm1, xmm2 (SSE) also has a non-destructive 3 register version

Data types

The tricky part about data types is that it will often require type conversion, or be untyped. Data loaded as a Some intrinsics have no clear type. For example I don't have a ready-to-go solution, but having to copy input from a []byte to a []float32 or vice versa must be avoidable.

The compiler cannot enforce forced constant values. Take PSHUFD, which has an 8-bit immediate value. This must be resolvable at compile time. With current function definitions that isn't really doable, so some handling would be needed for this.

Pointer arguments are a little tricky. There aren't that many, but they include prefetch instructions, gather/scatter, and of course loading and saving.

Edit: Final word on portable SIMD: I am not against it, but Go should supply the tools to write portable SIMD packages that cover the feasible subsets.
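On the point about avoiding copies between []byte and []float32: current Go can already reinterpret the memory with `unsafe.Slice` (available since Go 1.17). The sketch below is only an illustration of that mechanism, not a proposed API. The helper names are hypothetical, and the result is in host byte order, so it is not a portable way to decode serialized data.

```go
package main

import "unsafe"

// bytesAsFloat32 returns a []float32 that shares memory with b, without
// copying. It reports false if b is empty, its length is not a multiple
// of 4, or its start is not aligned for float32.
// Hypothetical helper; values are interpreted in host byte order.
func bytesAsFloat32(b []byte) ([]float32, bool) {
	if len(b) == 0 || len(b)%4 != 0 {
		return nil, false
	}
	p := unsafe.Pointer(&b[0])
	if uintptr(p)%unsafe.Alignof(float32(0)) != 0 {
		return nil, false
	}
	return unsafe.Slice((*float32)(p), len(b)/4), true
}

// float32AsBytes is the reverse view: the bytes backing f, no copy.
// Byte alignment is always satisfied, so this direction cannot fail.
func float32AsBytes(f []float32) []byte {
	if len(f) == 0 {
		return nil
	}
	return unsafe.Slice((*byte)(unsafe.Pointer(&f[0])), len(f)*4)
}
```

Both views alias the original slice, so a write through one is visible through the other. A typed intrinsics API would still need a way to express this safely, which is the open question raised above.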
beoran commented 15 hours ago
@rsc Highway seems to be a good idea. Since Go now has generics, and Highway is Apache licensed, is there any reason why someone interested could not port it to Go? That would be the first step, I think.

@klauspost It sounds more like you want to use inline SIMD assembly than have a portable vector API. While the Go compiler already supports assembly in separate files, I don't think inline SIMD assembly in Go is a good idea.

Edit: a third approach would be for the Go compiler to optimize certain array and float operations using SIMD if possible. This should be documented then, though.
@beoran That is the title of the proposal. While portable solutions and automatic vectorization are neat, they only provide a band-aid solution, with quite limited usability.