Github OptimizeFastIntegerForLoop for Span and ReadOnlySpan by thinkbeforecoding...

Copy link

Contributor

Author

thinkbeforecoding commented 8 days ago

The code is now closer from what C# emits, it can be checked using ILSpy C# decompilation:


let sumArray (vs: int ReadOnlySpan) =
    let mutable sum = 0
    for i in 0 .. vs.Length-1 do
        sum <- sum + vs.[i]
    sum

was decompiled as:

public static int sumArray(ReadOnlySpan<int> vs)
{
	int num = 0;
	int num2 = 0;
	int num3 = vs.Length - 1;
	if (num3 >= num2)
	{
		do
		{
			num += vs[num2];
			num2++;
		}
		while (num2 != num3 + 1);
	}
	return num;
}

and is now decompiled as:

public static int sumArray(ReadOnlySpan<int> vs)
{
	int num = 0;
	for (int i = 0; i < vs.Length; i++)
	{
		num += vs[i];
	}
	return num;
}

This is especially visible on JIT output. The old version JITted to:

L0000: sub rsp, 0x28
L0004: mov rdx, [rcx]
L0007: mov ecx, [rcx+8]
L000a: xor eax, eax
L000c: xor r8d, r8d
L000f: lea r9d, [rcx-1]
L0013: test r9d, r9d
L0016: jl short L0032
L0018: inc r9d
L001b: cmp r8d, ecx
L001e: jae short L0037
L0020: movsxd r10, r8d
L0023: lea r10, [rdx+r10*4]
L0027: add eax, [r10]
L002a: inc r8d
L002d: cmp r8d, r9d
L0030: jne short L001b
L0032: add rsp, 0x28
L0036: ret
L0037: call 0x00007ffca43fbc70
L003c: int3

On line L0030, the code is jumping back to L001b doing there another check.

The new code JIT as:

0000: mov rax, [rcx]
L0003: mov edx, [rcx+8]
L0006: xor ecx, ecx
L0008: xor r8d, r8d
L000b: test edx, edx
L000d: jle short L0021
L000f: movsxd r9, r8d
L0012: lea r9, [rax+r9*4]
L0016: add ecx, [r9]
L0019: inc r8d
L001c: cmp r8d, edx
L001f: jl short L000f
L0021: mov eax, ecx
L0023: ret

It jumps now from L001f to L000f doing no extra comparison.

The for v in span do version also benefits from the optimization:

let sumArray' (vs: int ReadOnlySpan) =
    let mutable sum = 0
    for v in vs do
        sum <- sum + v
    sum

was decompiled to C# as:

public static int sumArray'(ReadOnlySpan<int> vs)
{
	int num = 0;
	int num2 = 0;
	int num3 = vs.Length - 1;
	if (num3 >= num2)
	{
		do
		{
			int num4 = vs[num2];
			num += num4;
			num2++;
		}
		while (num2 != num3 + 1);
	}
	return num;
}

It is now decompiled as:

public static int sumArray'(ReadOnlySpan<int> vs)
{
	int num = 0;
	foreach (int num2 in vs)
	{
		num += num2;
	}
	return num;
}

and the JIT produces the following code:

L0000: mov rax, [rcx]
L0003: mov edx, [rcx+8]
L0006: xor ecx, ecx
L0008: xor r8d, r8d
L000b: test edx, edx
L000d: jle short L0024
L000f: movsxd r9, r8d
L0012: lea r9, [rax+r9*4]
L0016: mov r9d, [r9]
L0019: add ecx, r9d
L001c: inc r8d
L001f: cmp r8d, edx
L0022: jl short L000f
L0024: mov eax, ecx
L0026: ret

instead of previously:

L0000: sub rsp, 0x28
L0004: mov rdx, [rcx]
L0007: mov ecx, [rcx+8]
L000a: xor eax, eax
L000c: xor r8d, r8d
L000f: lea r9d, [rcx-1]
L0013: test r9d, r9d
L0016: jl short L0035
L0018: inc r9d
L001b: cmp r8d, ecx
L001e: jae short L003a
L0020: movsxd r10, r8d
L0023: lea r10, [rdx+r10*4]
L0027: mov r10d, [r10]
L002a: add eax, r10d
L002d: inc r8d
L0030: cmp r8d, r9d
L0033: jne short L001b
L0035: add rsp, 0x28
L0039: ret
L003a: call 0x00007ffca43fbc70
L003f: int3

that was performing redundant bounds checks.

thinkbeforecoding commented 8 days ago

Recommend

Apple saves 861,000 tons of precious metals by ditching iPhone chargers

教育培训行业乱象频出监管落地后将何去何从？

Two ‘trash can’ Macs form core of nerdy workstation built to last [Setups]

一条推特引发的暴跌：市场的惶恐亟须合规这剂“猛药”

Elago R4 Retro review

Implementing a simple WebRTC signaling mechanism with FSharp, Fable, and Ably

Happy F# Day!

华为开始「卖车」，续航 1000 公里的赛力斯 SF5 发布

女车主站特斯拉车顶高呼"刹车失灵"维权多地车主反馈质量问题 | GPLP犀牛财...

盒马集市入驻微信小程序，阿里低头了？

About Joyk