4

Github OptimizeFastIntegerForLoop for Span and ReadOnlySpan by thinkbeforecoding...

 3 years ago
source link: https://github.com/dotnet/fsharp/pull/11414
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

Copy link

Contributor

Author

thinkbeforecoding commented 8 days ago

The code is now closer from what C# emits, it can be checked using ILSpy C# decompilation:


let sumArray (vs: int ReadOnlySpan) =
    let mutable sum = 0
    for i in 0 .. vs.Length-1 do
        sum <- sum + vs.[i]
    sum

was decompiled as:

public static int sumArray(ReadOnlySpan<int> vs)
{
	int num = 0;
	int num2 = 0;
	int num3 = vs.Length - 1;
	if (num3 >= num2)
	{
		do
		{
			num += vs[num2];
			num2++;
		}
		while (num2 != num3 + 1);
	}
	return num;
}

and is now decompiled as:

public static int sumArray(ReadOnlySpan<int> vs)
{
	int num = 0;
	for (int i = 0; i < vs.Length; i++)
	{
		num += vs[i];
	}
	return num;
}

This is especially visible on JIT output. The old version JITted to:

L0000: sub rsp, 0x28
L0004: mov rdx, [rcx]
L0007: mov ecx, [rcx+8]
L000a: xor eax, eax
L000c: xor r8d, r8d
L000f: lea r9d, [rcx-1]
L0013: test r9d, r9d
L0016: jl short L0032
L0018: inc r9d
L001b: cmp r8d, ecx
L001e: jae short L0037
L0020: movsxd r10, r8d
L0023: lea r10, [rdx+r10*4]
L0027: add eax, [r10]
L002a: inc r8d
L002d: cmp r8d, r9d
L0030: jne short L001b
L0032: add rsp, 0x28
L0036: ret
L0037: call 0x00007ffca43fbc70
L003c: int3

On line L0030, the code is jumping back to L001b doing there another check.

The new code JIT as:

0000: mov rax, [rcx]
L0003: mov edx, [rcx+8]
L0006: xor ecx, ecx
L0008: xor r8d, r8d
L000b: test edx, edx
L000d: jle short L0021
L000f: movsxd r9, r8d
L0012: lea r9, [rax+r9*4]
L0016: add ecx, [r9]
L0019: inc r8d
L001c: cmp r8d, edx
L001f: jl short L000f
L0021: mov eax, ecx
L0023: ret

It jumps now from L001f to L000f doing no extra comparison.

The for v in span do version also benefits from the optimization:

let sumArray' (vs: int ReadOnlySpan) =
    let mutable sum = 0
    for v in vs do
        sum <- sum + v
    sum

was decompiled to C# as:

public static int sumArray'(ReadOnlySpan<int> vs)
{
	int num = 0;
	int num2 = 0;
	int num3 = vs.Length - 1;
	if (num3 >= num2)
	{
		do
		{
			int num4 = vs[num2];
			num += num4;
			num2++;
		}
		while (num2 != num3 + 1);
	}
	return num;
}

It is now decompiled as:

public static int sumArray'(ReadOnlySpan<int> vs)
{
	int num = 0;
	foreach (int num2 in vs)
	{
		num += num2;
	}
	return num;
}

and the JIT produces the following code:

L0000: mov rax, [rcx]
L0003: mov edx, [rcx+8]
L0006: xor ecx, ecx
L0008: xor r8d, r8d
L000b: test edx, edx
L000d: jle short L0024
L000f: movsxd r9, r8d
L0012: lea r9, [rax+r9*4]
L0016: mov r9d, [r9]
L0019: add ecx, r9d
L001c: inc r8d
L001f: cmp r8d, edx
L0022: jl short L000f
L0024: mov eax, ecx
L0026: ret

instead of previously:

L0000: sub rsp, 0x28
L0004: mov rdx, [rcx]
L0007: mov ecx, [rcx+8]
L000a: xor eax, eax
L000c: xor r8d, r8d
L000f: lea r9d, [rcx-1]
L0013: test r9d, r9d
L0016: jl short L0035
L0018: inc r9d
L001b: cmp r8d, ecx
L001e: jae short L003a
L0020: movsxd r10, r8d
L0023: lea r10, [rdx+r10*4]
L0027: mov r10d, [r10]
L002a: add eax, r10d
L002d: inc r8d
L0030: cmp r8d, r9d
L0033: jne short L001b
L0035: add rsp, 0x28
L0039: ret
L003a: call 0x00007ffca43fbc70
L003f: int3

that was performing redundant bounds checks.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK