If Conversion Within .NET - Part 3
source link: https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/if-conversion-within-dotnet-part-3
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
If Conversion Within .NET - Part 3
This is the final part of the blog series on If Conversion in .NET. In part 1 we walked through how .NET optimizes Boolean operations, then in part 2 went further into the details within the Lowering phase. This part walks through If Conversion phase, building on the knowledge we learnt previously. Finally, we will look at some additional variations. The examples continue to use the same example code from the first part.
If Conversion Within .NET - Part 1 If Conversion Within .NET - Part 2
If Conversion
In the previous posts we looked at reducing all the branches down to a chain of compares ending in a single compare and branch. With If conversion, we aim to remove this final compare and branch.
This is done using a CSEL
(conditional select) Arm64 instruction. Consider:
CSEL x0, x1, x2, lt
This says if the condition flags for lt are set, then set x0
to x1
, otherwise set x0
to x2
. It is simply a MOV
with a source conditionally selected. This can easily be mapped into an If(...) {x=a;} else {x=b;}
statement. Alternatively, an If(...) {x=a;}
statement can be modelled by reusing the destination register:
CSEL x0, x1, x0, lt
If Conversion phase
Going back to our original program, we now enable the optimization, if conversion:
export DOTNET_JitDump=TestAndOrWithElse export DOTNET_TieredCompilation=0 unset DOTNET_JitDoIfConversion
After the optimize bools phase, the IR blocks look like:
------------ BB01 [000..00D) -> BB05 (cond), preds={} succs={BB04,BB05} ***** BB01 STMT00001 ( 0x009[E-] ... 0x00B ) N016 ( 32, 22) [000007] ----------- * JTRUE void $VN.Void N015 ( 30, 20) [000029] ----------- \--* NE int N013 ( 25, 17) [000027] -------N--- +--* AND int N009 ( 18, 12) [000026] ----------- | +--* EQ int N007 ( 13, 9) [000024] -------N--- | | +--* AND int N003 ( 6, 4) [000002] N------N-U- | | | +--* GT int N001 ( 1, 1) [000000] ----------- | | | | +--* LCL_VAR int V00 arg0 u:1 (last use) $80 N002 ( 1, 2) [000001] ----------- | | | | \--* CNS_INT int 3 $43 N006 ( 6, 4) [000018] -------N--- | | | \--* EQ int $101 N004 ( 1, 1) [000016] ----------- | | | +--* LCL_VAR int V01 arg1 u:1 $81 N005 ( 1, 2) [000017] ----------- | | | \--* CNS_INT int 10 $44 N008 ( 1, 2) [000025] ----------- | | \--* CNS_INT int 0 N012 ( 6, 4) [000006] N------N-U- | \--* GT int $102 N010 ( 1, 1) [000004] ----------- | +--* LCL_VAR int V01 arg1 u:1 $81 N011 ( 1, 2) [000005] ----------- | \--* CNS_INT int 4 $45 N014 ( 1, 2) [000028] ----------- \--* CNS_INT int 0 ------------ BB04 [00D..013) -> BB06 (always), preds={BB01} succs={BB06} ***** BB04 STMT00005 ( 0x00D[E-] ... 0x00F ) N002 ( 1, 3) [000015] DA--------- * STORE_LCL_VAR int V00 arg0 d:3 $VN.Void N001 ( 1, 2) [000014] ----------- \--* CNS_INT int 9 $47 ------------ BB05 [013..017), preds={BB01} succs={BB06} ***** BB05 STMT00002 ( 0x013[E-] ... 0x015 ) N002 ( 1, 3) [000009] DA--------- * STORE_LCL_VAR int V00 arg0 d:4 $VN.Void N001 ( 1, 2) [000008] ----------- \--* CNS_INT int 12 $46 ------------ BB06 [017..01F) (return), preds={BB04,BB05} succs={} ***** BB06 STMT00007 ( ??? ... ??? ) N004 ( 0, 0) [000021] DA--------- * STORE_LCL_VAR int V00 arg0 d:2 $VN.Void N003 ( 0, 0) [000020] ----------- \--* PHI int $140 N001 ( 0, 0) [000023] ----------- pred BB05 +--* PHI_ARG int V00 arg0 u:4 $46 N002 ( 0, 0) [000022] ----------- pred BB04 \--* PHI_ARG int V00 arg0 u:3 $47 ***** BB06 STMT00003 ( 0x017[E-] ... 0x01E ) N003 ( 16, 6) [000012] --CXG------ * CALL void CSharpTutorials.Program:Consume[uint](uint,uint) $VN.Void N001 ( 1, 1) [000010] ----------- arg0 in x0 +--* LCL_VAR int V00 arg0 u:2 (last use) $140 N002 ( 1, 1) [000011] ----------- arg1 in x1 \--* LCL_VAR int V01 arg1 u:1 (last use) $81 ***** BB06 STMT00004 ( 0x01E[E-] ... ??? ) N001 ( 0, 0) [000013] ----------- * RETURN void $VN.Void
For the purposes of this optimization, we care about BB01
(the if
statement) - mostly just the JTRUE
and NE
, plus BB04
(the then
statement) and BB05
(the else
statement).
Next, the If Conversion phase starts. Like optimize bools, it iterates through each block in turn.
Starting with BB01
, it checks the last statement is a JTRUE
block with a compare child. Then an If/then
or If/then/else
flow is checked for:
Example if/then
flow:
BB01
(if statement): conditionally jumps toBB03
on failure, otherwise falls through toBB02.
BB02
(then statement): always falls through toBB03.
BB03
(rest of function)
Example if/then/else
flow:
BB01
(if statement): conditionally jumps toBB03
on failure, otherwise falls through toBB02.
BB02
(then statement): always jumps toBB04.
BB03
(else statement): always falls through toBB04.
BB04
(rest of function)
The exact BB number here is irrelevant - an If
could start with BB17
. In addition, the then
statement could be comprised of multiple sequential blocks, the same for the else
block. This would happen if the code inside the then
or else
section comprised of more than one expression.
For our test case, an if/then/else
flow is matched.
Next the contents of the Then
and Else
blocks are checked. Any NOP
statements (those just containing a no operation (NOP
) node) are ignored. Otherwise, the block must contain a single statement ending in a STORE_LCL_VAR
(store local variable). In addition, there must be no side effects, it must be of integer type and cannot include PHI
nodes.
In addition, if this is an if/then/else
flow, then RETURN
(return) nodes are allowed. In all if/then/else
flows, the ASG
in both sides must update the same variable or both must return.
This means the following patterns are allowed:
if (...) { x=...; }
if (...) { x=...; } else { x=...; }
if (...) { return ...; } else { return ...; }
x = (...) ? ... : ...
return (...) ? ... : ...
And the following are rejected:
if (...) { x=...; y=....; }
if (...) { x=...; } else { y=...; }
if (...) { return ...; }
if (...) { x=...; } else { return ...; }
if (...) { return ...; } else { x=...; }
In all the allowed cases, this maps down to a single CSEL
instruction.
There may be cases where is optimal to use CSEL
that are currently rejected. This is left as an exercise for the reader to implement without overly complicating RyuJIT.
Once the conditions have been checked, transformation can begin. To fix control flow the Then
and Else
conditions are moved outside of the loop into the first block. The three statements now in the first block (If
, Then
and Else
) are combined into a single statement using a SELECT
node. This node takes in three arguments and is similar to a CSEL
instruction. The first argument is a condition that sets and checks flags. The second argument is the result to use if the condition passes. The third argument is the result to use if the condition fails. The parent of the CSEL
is set to the root of the Then
statement, and the root of the Else
(which is identical to the root of the Then
) is deleted.
In the example, two STORE_LCL_VAR
s are moved into the first block and the block now always falls through to BB06
. Then the code creates a SELECT
. The NE
child of the JTRUE
, the if
condition, is added as the first child. The childless JTRUE
is discarded and replaced with a NOP
. The children of the two STORE_LCL_VAR
s become the other two children of the SELECT
and the SELECT
node becomes the child of the first STORE_LCL_VAR
. The second STORE_LCL_VAR
can be discarded. That is a lot of shuffling. Afterwards this looks like:
------------ BB01 [000..00D) -> BB05 (always), preds={} succs={BB05} ***** BB01 STMT00001 ( 0x009[E-] ... 0x00B ) N001 ( 0, 0) [000007] ----------- * NOP void ***** BB01 STMT00005 ( 0x00D[E-] ... 0x00F ) N019 ( 33, 25) [000015] DA--------- * STORE_LCL_VAR int V00 arg0 d:3 $VN.Void N018 ( 33, 25) [000030] ----------- \--* SELECT int N015 ( 30, 20) [000029] ----------- +--* NE int N013 ( 25, 17) [000027] -------N--- | +--* AND int N009 ( 18, 12) [000026] ----------- | | +--* EQ int N007 ( 13, 9) [000024] -------N--- | | | +--* AND int N003 ( 6, 4) [000002] N------N-U- | | | | +--* GT int N001 ( 1, 1) [000000] ----------- | | | | | +--* LCL_VAR int V00 arg0 u:1 (last use) $80 N002 ( 1, 2) [000001] ----------- | | | | | \--* CNS_INT int 3 $43 N006 ( 6, 4) [000018] -------N--- | | | | \--* EQ int $101 N004 ( 1, 1) [000016] ----------- | | | | +--* LCL_VAR int V01 arg1 u:1 $81 N005 ( 1, 2) [000017] ----------- | | | | \--* CNS_INT int 10 $44 N008 ( 1, 2) [000025] ----------- | | | \--* CNS_INT int 0 N012 ( 6, 4) [000006] N------N-U- | | \--* GT int $102 N010 ( 1, 1) [000004] ----------- | | +--* LCL_VAR int V01 arg1 u:1 $81 N011 ( 1, 2) [000005] ----------- | | \--* CNS_INT int 4 $45 N014 ( 1, 2) [000028] ----------- | \--* CNS_INT int 0 N016 ( 1, 2) [000008] ----------- +--* CNS_INT int 12 $46 N017 ( 1, 2) [000014] ----------- \--* CNS_INT int 9 $47 ***** BB01 STMT00002 ( 0x013[E-] ... 0x015 ) N001 ( 0, 0) [000009] ----------- * NOP void ------------ BB04 [00D..013) -> BB06 (always), preds={} succs={BB06} ------------ BB05 [013..017), preds={BB01} succs={BB06}
BB04
and BB05
are now empty and no code flows into them. This will be spotted by the sanity checks that automatically run after every phase. The checks will first delete these empty blocks. Then they will spot that there is no requirement for BB06
to be a separate block (as only BB01
flows into it and does so unconditionally). The two blocks will be merged. This leaves a single block, split into six statements, for the entire function:
***** BB01 STMT00007 ( ??? ... ??? ) N004 ( 0, 0) [000021] DA--------- * STORE_LCL_VAR int V00 arg0 d:2 $VN.Void N003 ( 0, 0) [000020] ----------- \--* PHI int $140 N001 ( 0, 0) [000023] ----------- pred BB05 +--* PHI_ARG int V00 arg0 u:4 $46 N002 ( 0, 0) [000022] ----------- pred BB04 \--* PHI_ARG int V00 arg0 u:3 $47 ***** BB01 STMT00001 ( 0x009[E-] ... 0x00B ) N001 ( 0, 0) [000007] ----------- * NOP void ***** BB01 STMT00005 ( 0x00D[E-] ... 0x00F ) N019 ( 33, 25) [000015] DA--------- * STORE_LCL_VAR int V00 arg0 d:3 $VN.Void N018 ( 33, 25) [000030] ----------- \--* SELECT int N015 ( 30, 20) [000029] ----------- +--* NE int N013 ( 25, 17) [000027] -------N--- | +--* AND int N009 ( 18, 12) [000026] ----------- | | +--* EQ int N007 ( 13, 9) [000024] -------N--- | | | +--* AND int N003 ( 6, 4) [000002] N------N-U- | | | | +--* GT int N001 ( 1, 1) [000000] ----------- | | | | | +--* LCL_VAR int V00 arg0 u:1 (last use) $80 N002 ( 1, 2) [000001] ----------- | | | | | \--* CNS_INT int 3 $43 N006 ( 6, 4) [000018] -------N--- | | | | \--* EQ int $101 N004 ( 1, 1) [000016] ----------- | | | | +--* LCL_VAR int V01 arg1 u:1 $81 N005 ( 1, 2) [000017] ----------- | | | | \--* CNS_INT int 10 $44 N008 ( 1, 2) [000025] ----------- | | | \--* CNS_INT int 0 N012 ( 6, 4) [000006] N------N-U- | | \--* GT int $102 N010 ( 1, 1) [000004] ----------- | | +--* LCL_VAR int V01 arg1 u:1 $81 N011 ( 1, 2) [000005] ----------- | | \--* CNS_INT int 4 $45 N014 ( 1, 2) [000028] ----------- | \--* CNS_INT int 0 N016 ( 1, 2) [000008] ----------- +--* CNS_INT int 12 $46 N017 ( 1, 2) [000014] ----------- \--* CNS_INT int 9 $47 ***** BB01 STMT00002 ( 0x013[E-] ... 0x015 ) N001 ( 0, 0) [000009] ----------- * NOP void ***** BB01 STMT00003 ( 0x017[E-] ... 0x01E ) N003 ( 16, 6) [000012] --CXG------ * CALL void CSharpTutorials.Program:Consume[uint](uint,uint) $VN.Void N001 ( 1, 1) [000010] ----------- arg0 in x0 +--* LCL_VAR int V00 arg0 u:2 (last use) $140 N002 ( 1, 1) [000011] ----------- arg1 in x1 \--* LCL_VAR int V01 arg1 u:1 (last use) $81 ***** BB01 STMT00004 ( 0x01E[E-] ... ??? ) N001 ( 0, 0) [000013] ----------- * RETURN void $VN.Void
Note how the PHIs from BB06
have been moved to the start of the single block. There is no branching control flow remaining, so these will be optimized away when the IR is rationalized to LIR.
The rationalized LIR looks as would be expected:
------------ BB01 [000..01F) (return), preds={} succs={} [000031] ----------- IL_OFFSET void INLRT @ 0x009[E-] N001 ( 0, 0) [000007] ----------- NOP void [000032] ----------- IL_OFFSET void INLRT @ 0x00D[E-] N001 ( 1, 1) [000000] ----------- t0 = LCL_VAR int V00 arg0 u:1 (last use) $80 N002 ( 1, 2) [000001] ----------- t1 = CNS_INT int 3 $43 /--* t0 int +--* t1 int N003 ( 6, 4) [000002] N------N-U- t2 = * GT int N004 ( 1, 1) [000016] ----------- t16 = LCL_VAR int V01 arg1 u:1 $81 N005 ( 1, 2) [000017] ----------- t17 = CNS_INT int 10 $44 /--* t16 int +--* t17 int N006 ( 6, 4) [000018] -------N--- t18 = * EQ int $101 /--* t2 int +--* t18 int N007 ( 13, 9) [000024] -------N--- t24 = * AND int N008 ( 1, 2) [000025] ----------- t25 = CNS_INT int 0 /--* t24 int +--* t25 int N009 ( 18, 12) [000026] ----------- t26 = * EQ int N010 ( 1, 1) [000004] ----------- t4 = LCL_VAR int V01 arg1 u:1 $81 N011 ( 1, 2) [000005] ----------- t5 = CNS_INT int 4 $45 /--* t4 int +--* t5 int N012 ( 6, 4) [000006] N------N-U- t6 = * GT int $102 /--* t26 int +--* t6 int N013 ( 25, 17) [000027] -------N--- t27 = * AND int N014 ( 1, 2) [000028] ----------- t28 = CNS_INT int 0 /--* t27 int +--* t28 int N015 ( 30, 20) [000029] ----------- t29 = * NE int N016 ( 1, 2) [000008] ----------- t8 = CNS_INT int 12 $46 N017 ( 1, 2) [000014] ----------- t14 = CNS_INT int 9 $47 /--* t29 int +--* t8 int +--* t14 int N018 ( 33, 25) [000030] ----------- t30 = * SELECT int /--* t30 int N019 ( 33, 25) [000015] DA--------- * STORE_LCL_VAR int V00 arg0 d:3 $VN.Void [000033] ----------- IL_OFFSET void INLRT @ 0x013[E-] N001 ( 0, 0) [000009] ----------- NOP void [000034] ----------- IL_OFFSET void INLRT @ 0x017[E-] N001 ( 1, 1) [000010] ----------- t10 = LCL_VAR int V00 arg0 u:2 (last use) $140 N002 ( 1, 1) [000011] ----------- t11 = LCL_VAR int V01 arg1 u:1 (last use) $81 /--* t10 int arg0 in x0 +--* t11 int arg1 in x1 N003 ( 16, 6) [000012] --CXG------ * CALL void CSharpTutorials.Program:Consume[uint](uint,uint) $VN.Void [000035] ----------- IL_OFFSET void INLRT @ 0x01E[E-] N001 ( 0, 0) [000013] ----------- RETURN void $VN.Void
During lowering the compare nodes are all lowered to a CMP/CCMP/SETCC
chain of nodes as described previously. Note that the first input to the SELECT
(the if condition) is now the SETCC
which produces the result of the entire if statement.
N001 ( 1, 1) [000000] ----------- t0 = LCL_VAR int V00 arg0 u:1 (last use) $80 N002 ( 1, 2) [000001] -c--------- t1 = CNS_INT int 3 $43 N004 ( 1, 1) [000016] ----------- t16 = LCL_VAR int V01 arg1 u:1 $81 N005 ( 1, 2) [000017] -c--------- t17 = CNS_INT int 10 $44 N010 ( 1, 1) [000004] ----------- t4 = LCL_VAR int V01 arg1 u:1 $81 N011 ( 1, 2) [000005] -c--------- t5 = CNS_INT int 4 $45 /--* t0 int +--* t1 int N003 ( 6, 4) [000002] -------N-U- * CMP void /--* t16 int +--* t17 int N006 ( 6, 4) [000018] -------N--- * CCMP void cond=UGT flags=0 /--* t4 int +--* t5 int N012 ( 6, 4) [000006] -------N-U- * CCMP void cond=UNE flags=z N013 ( 25, 17) [000027] -------N--- t27 = SETCC int cond=UGT N016 ( 1, 2) [000008] ----------- t8 = CNS_INT int 12 $46 N017 ( 1, 2) [000014] ----------- t14 = CNS_INT int 9 $47 /--* t27 int +--* t8 int +--* t14 int N018 ( 33, 25) [000030] ----------- t30 = * SELECT int
When lowering gets to SELECT
, it is lowered in a similar way to the CMP nodes. RyuJIT wants to convert the SELECT
to a SELECTCC
node. The SELECTCC
node is like a SELECT
node. However instead of taking in as input the result of a condition after being compared against flags, the input is just the result of the compare. The SELECTCC
then holds the flags to check - just like the SETCC, JCC
or CCMP
nodes.
Skipping SELECT Lowering
Let us see what happens if the lowering on the SELECT is skipped.
export DOTNET_JitDump=TestAndOrWithElse export DOTNET_TieredCompilation=0 export DOTNET_JitDoLowerSelect=0
Generating: N023 ( 6, 4) [000006] -------N-U- * CCMP void cond=UNE flags=z REG NA IN0003: ccmp w1, #4, z, ne Generating: N025 ( 25, 17) [000027] -------N--- t27 = SETCC int cond=UGT REG x0 IN0004: cset x0, hi Generating: N027 ( 1, 2) [000008] ----------- t8 = CNS_INT int 12 REG x2 $46 IN0005: mov w2, #12 Generating: N029 ( 1, 2) [000014] ----------- t14 = CNS_INT int 9 REG x3 $47 IN0006: mov w3, #9 /--* t27 int +--* t8 int +--* t14 int Generating: N031 ( 33, 25) [000030] ----------- t30 = * SELECT int REG x0 IN0007: cmp w0, #0 IN0008: csel w0, w2, w3, ne
The full assembly:
G_M6318_IG01: IN000e: 000000 stp fp, lr, [sp, #-0x10]! IN000f: 000004 mov fp, sp G_M6318_IG02: IN0001: 000008 cmp w0, #3 IN0002: 00000C ccmp w1, #10, 0, hi IN0003: 000010 ccmp w1, #4, z, ne IN0004: 000014 cset x0, hi IN0005: 000018 mov w2, #12 IN0006: 00001C mov w3, #9 IN0007: 000020 cmp w0, #0 IN0008: 000024 csel w0, w2, w3, ne IN0009: 000028 movz x2, #0x6EE0 // code for CSharpTutorials.Program:Consume[uint](uint,uint) IN000a: 00002C movk x2, #0x65A9 LSL #16 IN000b: 000030 movk x2, #0xFFFF LSL #32 IN000c: 000034 ldr x2, [x2] IN000d: 000038 blr x2 G_M6318_IG03: IN0010: 00003C ldp fp, lr, [sp], #0x10 IN0011: 000040 ret lr
Looking at the generation we can see the result of the if goes into x0
during SETCC
. The CSEL
instruction checks the flags, so the result in x0
must go back into flags via an additional CMP
of x0
and 0.
Lowering SELECT to SELECTCC
Looking back at the previous run, during lowering SELECT
is lowered as follows. The then/else
inputs to the SELECT
are checked to make sure they can be moved prior to the compares. Also, the condition input must either be a SETCC
or a compare node. If so, the lowering happens: the inputs are moved; the SELECT
becomes a SELECTCC
; the SETCC
is removed, and its GT
condition is put into the SELECTCC
. This gives:
N001 ( 1, 1) [000000] ----------- t0 = LCL_VAR int V00 arg0 u:1 (last use) $80 N002 ( 1, 2) [000001] -c--------- t1 = CNS_INT int 3 $43 N004 ( 1, 1) [000016] ----------- t16 = LCL_VAR int V01 arg1 u:1 $81 N005 ( 1, 2) [000017] -c--------- t17 = CNS_INT int 10 $44 N010 ( 1, 1) [000004] ----------- t4 = LCL_VAR int V01 arg1 u:1 $81 N011 ( 1, 2) [000005] -c--------- t5 = CNS_INT int 4 $45 N016 ( 1, 2) [000008] ----------- t8 = CNS_INT int 12 $46 N017 ( 1, 2) [000014] ----------- t14 = CNS_INT int 9 $47 /--* t0 int +--* t1 int N003 ( 6, 4) [000002] -------N-U- * CMP void /--* t16 int +--* t17 int N006 ( 6, 4) [000018] -------N--- * CCMP void cond=UGT flags=0 /--* t4 int +--* t5 int N012 ( 6, 4) [000006] -------N-U- * CCMP void cond=UNE flags=z /--* t8 int +--* t14 int N018 ( 33, 25) [000030] ----------- t30 = * SELECTCC int cond=UGT
This is generated as:
Generating: N027 ( 6, 4) [000006] -------N-U- * CCMP void cond=UNE flags=z REG NA IN0005: ccmp w1, #4, z, ne /--* t8 int +--* t14 int Generating: N029 ( 33, 25) [000030] ----------- t30 = * SELECTCC int cond=UGT REG x0 IN0006: csel w0, w2, w3, hi /--* t30 int Generating: N031 ( 33, 25) [000015] DA--------- * STORE_LCL_VAR int V00 arg0 d:3 x0 REG x0 $VN.Void V00 in reg x0 is becoming live [000015] Live regs: 0002 {x1} => 0003 {x0 x1} New debug range: new var or location Live vars after [000015]: {V01} => {V00 V01}
The full function is generated as follows:
G_M6318_IG01: IN000c: 000000 stp fp, lr, [sp, #-0x10]! IN000d: 000004 mov fp, sp G_M6318_IG02: IN0001: 000008 mov w2, #12 IN0002: 00000C mov w3, #9 IN0003: 000010 cmp w0, #3 IN0004: 000014 ccmp w1, #10, 0, hi IN0005: 000018 ccmp w1, #4, z, ne IN0006: 00001C csel w0, w2, w3, hi IN0007: 000020 movz x2, #0x6EE0 // code for CSharpTutorials.Program:Consume[uint](uint,uint) IN0008: 000024 movk x2, #0x35A9 LSL #16 IN0009: 000028 movk x2, #0xFFFF LSL #32 IN000a: 00002C ldr x2, [x2] IN000b: 000030 blr x2 G_M6318_IG03: IN000e: 000034 ldp fp, lr, [sp], #0x10 IN000f: 000038 ret lr
This code is now ideal. CMP
, a sequence of compares, then CSEL
.
Variations on CSEL
Arm64 provides some variations on the CSEL
instruction.
CINC | Conditionally increment register | CINC x0, x1, lt - if LT flags are set, then x0 = x1+1 |
|
CSINC | Conditional select increment | CINC x0, x1, x2 lt - if LT flags are set, then x0 = x1+1 , else x0 = x2 |
When the two input registers are the same, this is aliased to CINC. |
CINV | Conditionally invert register | CINV x0, x1, lt - if LT flags are set, then x0 = ~x1 |
|
CSINV | Conditional select invert | CSINV x0, x1, x2 lt - if LT flags are set, then x0 = ~x1 , else x0 = x2 |
When the two input registers are the same, this is aliased to CINV. |
CNEG | Conditionally invert register | CNEG x0, x1, lt - if LT flags are set, then x0 = NOT(x1) |
|
CSNEG | Conditional select negate | CSNEG x0, x1, x2 lt - if LT flags are set, then x0 = NOT(x1) , else x0 = x2 . |
When the two input registers are the same, this is aliased to CNEG |
The CS instructions look strange at first glance. It is not immediately obvious why they are required. However, they fit into the standard 2 input, 1 output format that most Arm64 instructions use. The other three instructions are simply aliases of the CS instructions that are used when the input registers match. Note there is nothing different in the encoding of the instruction when the alias is used, it is purely a stylistic choice or readability.
RyuJIT will spot patterns that can use CINC
, CINV
and CNEG
. This is performed during the lowering of SELECT
.
Using CINC
Consider the following, also from the same example program:
[MethodImpl(MethodImplOptions.NoInlining)] static void TestAndOrWithElse(uint op1, uint op2) { if (op1 > 3 && op2 == 10 || op2 <= 4) { op1 = 9; } else { op1 = 12; } Consume(op1, op2); }
On entry to lowering, the LIR will be:
------------ BB01 [000..014) (return), preds={} succs={} [000017] ----------- IL_OFFSET void INLRT @ 0x000[E-] N001 ( 0, 0) [000003] ----------- NOP void [000018] ----------- IL_OFFSET void INLRT @ 0x004[E-] N001 ( 1, 1) [000000] ----------- t0 = LCL_VAR int V00 arg0 u:1 $80 N002 ( 1, 2) [000001] ----------- t1 = CNS_INT int 1 $41 /--* t0 int +--* t1 int N003 ( 3, 4) [000002] N------N-U- t2 = * LE int $100 N004 ( 1, 2) [000004] ----------- t4 = CNS_INT int 8 $43 N005 ( 1, 2) [000010] ----------- t10 = CNS_INT int 7 $44 /--* t2 int +--* t4 int +--* t10 int N006 ( 6, 9) [000016] ----------- t16 = * SELECT int /--* t16 int N007 ( 6, 9) [000011] DA--------- * STORE_LCL_VAR int V01 arg1 d:4 $VN.Void [000019] ----------- IL_OFFSET void INLRT @ 0x009[E-] N001 ( 0, 0) [000005] ----------- NOP void [000020] ----------- IL_OFFSET void INLRT @ 0x00C[E-] N001 ( 1, 1) [000006] ----------- t6 = LCL_VAR int V00 arg0 u:1 (last use) $80 N002 ( 1, 1) [000007] ----------- t7 = LCL_VAR int V01 arg1 u:2 (last use) $140 /--* t6 int arg0 in x0 +--* t7 int arg1 in x1 N003 ( 16, 6) [000008] --CXG------ * CALL void CSharpTutorials.Program:Consume[uint](uint,uint) $VN.Void [000021] ----------- IL_OFFSET void INLRT @ 0x013[E-] N001 ( 0, 0) [000009] ----------- RETURN void $VN.Void
This, as we would expect, is lowered to a CMP
and SELECTCC
:
N001 ( 1, 1) [000000] ----------- t0 = LCL_VAR int V00 arg0 u:1 $80 N002 ( 1, 2) [000001] -c--------- t1 = CNS_INT int 1 $41 N004 ( 1, 2) [000004] ----------- t4 = CNS_INT int 8 $43 N005 ( 1, 2) [000010] ----------- t10 = CNS_INT int 7 $44 /--* t0 int +--* t1 int N003 ( 3, 4) [000002] -------N-U- * CMP void /--* t4 int +--* t10 int N006 ( 6, 9) [000016] ----------- t16 = * SELECTCC int cond=ULE
After lowering to SELECTCC
, the same function will check the inputs. If they are both constants and one of them is one bigger than the other, then it can be optimized further. If the first input is the higher one, then then the inputs are swapped, and the condition is inverted. Then the lower input is removed and the SELECT_CC
is changed to the succinctly named SELECT_INCCC
.
N001 ( 1, 1) [000000] ----------- t0 = LCL_VAR int V00 arg0 u:1 $80 N002 ( 1, 2) [000001] -c--------- t1 = CNS_INT int 1 $41 N005 ( 1, 2) [000010] ----------- t10 = CNS_INT int 7 $44 /--* t0 int +--* t1 int N003 ( 3, 4) [000002] -------N-U- * CMP void /--* t10 int N006 ( 6, 9) [000016] ----------- t16 = * SELECT_INCCC int cond=ULE
This then generates exactly as expected:
G_M40289_IG01: IN000a: 000000 stp fp, lr, [sp, #-0x10]! IN000b: 000004 mov fp, sp G_M40289_IG02: IN0001: 000008 mov w2, #7 IN0002: 00000C cmp w0, #1 IN0003: 000010 cinc w2, w2, ls IN0004: 000014 sxtw w1, w2 IN0005: 000018 movz x2, #0x6EE0 // code for CSharpTutorials.Program:Consume[uint](uint,uint) IN0006: 00001C movk x2, #0x55A8 LSL #16 IN0007: 000020 movk x2, #0xFFFF LSL #32 IN0008: 000024 ldr x2, [x2] IN0009: 000028 blr x2 G_M40289_IG03: IN000c: 00002C ldp fp, lr, [sp], #0x10 IN000d: 000030 ret lr
Using CINV
The CINV
transformation acts in a similar way but allows for slightly stranger patterns. First consider:
[MethodImpl(MethodImplOptions.NoInlining)] static void TestInv(uint op1, uint op2) { if (op1 != 7) { op2 = ~op2; } Consume(op1, op2); }
The LIR will be:
------------ BB01 [000..010) (return), preds={} succs={} [000017] ----------- IL_OFFSET void INLRT @ 0x000[E-] N001 ( 0, 0) [000003] ----------- NOP void [000018] ----------- IL_OFFSET void INLRT @ 0x004[E-] N001 ( 1, 1) [000000] ----------- t0 = LCL_VAR int V00 arg0 u:1 $80 N002 ( 1, 2) [000001] ----------- t1 = CNS_INT int 7 $43 /--* t0 int +--* t1 int N003 ( 3, 4) [000002] J------N--- t2 = * EQ int $100 N004 ( 1, 1) [000015] ----------- t15 = LCL_VAR int V01 arg1 N005 ( 1, 1) [000008] ----------- t8 = LCL_VAR int V01 arg1 u:1 (last use) $81 /--* t8 int N006 ( 2, 2) [000009] ----------- t9 = * NOT int $82 /--* t2 int +--* t15 int +--* t9 int N007 ( 7, 8) [000016] ----------- t16 = * SELECT int /--* t16 int N008 ( 7, 8) [000010] DA--------- * STORE_LCL_VAR int V01 arg1 d:3 $VN.Void [000019] ----------- IL_OFFSET void INLRT @ 0x008[E-] N001 ( 1, 1) [000004] ----------- t4 = LCL_VAR int V00 arg0 u:1 (last use) $80 N002 ( 1, 1) [000005] ----------- t5 = LCL_VAR int V01 arg1 u:2 (last use) $140 /--* t4 int arg0 in x0 +--* t5 int arg1 in x1 N003 ( 16, 6) [000006] --CXG------ * CALL void CSharpTutorials.Program:Consume[uint](uint,uint) $VN.Void [000020] ----------- IL_OFFSET void INLRT @ 0x00F[E-] N001 ( 0, 0) [000007] ----------- RETURN void $VN.Void
This will follow the same pattern as the CINC
optimization. After lowering the SELECT
to SELECTCC
, the same function will check the then
and else
statements. If one of them is a NOT
statement, then this can be optimized. A SELECT_INVCC
node is created in the expected way. Note that unlike SELECT_INCCC,
the SELECT_INVCC
has the same then
and else
inputs as the SELECT
node - this distinction will become important later.
N001 ( 1, 1) [000000] ----------- t0 = LCL_VAR int V00 arg0 u:1 $80 N002 ( 1, 2) [000001] -c--------- t1 = CNS_INT int 7 $43 N004 ( 1, 1) [000015] ----------- t15 = LCL_VAR int V01 arg1 N005 ( 1, 1) [000008] ----------- t8 = LCL_VAR int V01 arg1 u:1 (last use) $81 /--* t0 int +--* t1 int N003 ( 3, 4) [000002] -------N--- * CMP void /--* t15 int +--* t8 int N007 ( 7, 8) [000016] ----------- t16 = * SELECT_INVCC int cond=UEQ /--* t16 int N008 ( 7, 8) [000010] DA--------- * STORE_LCL_VAR int V01 arg1 d:3 $VN.Void
This is generated into:
G_M20564_IG01: IN0008: 000000 stp fp, lr, [sp, #-0x10]! IN0009: 000004 mov fp, sp G_M20564_IG02: IN0001: 000008 cmp w0, #7 IN0002: 00000C csinv w1, w1, w1, eq IN0003: 000010 movz x2, #0x6EE0 // code for CSharpTutorials.Program:Consume[uint](uint,uint) IN0004: 000014 movk x2, #0x55A6 LSL #16 IN0005: 000018 movk x2, #0xFFFF LSL #32 IN0006: 00001C ldr x2, [x2] IN0007: 000020 blr x2 G_M20564_IG03: IN000a: 000024 ldp fp, lr, [sp], #0x10 IN000b: 000028 ret lr
Note that the disassembler prints CSINV
instead of CINC
. This is the same functionality, and it is optional for the disassembler to show it as CINC
, although it would help debugging slightly.
Now consider:
[MethodImpl(MethodImplOptions.NoInlining)] static void TestInv2(uint op1, uint op2) { if (op1 != 7) { op2 = ~op2; } else { op2 = 3; } Consume(op1, op2); }
This will still get converted at lowering into SELECT_INVCC
. The phase does not care that the then
and else
cases calculate different inputs, only one of them has a ~ operation. This fits the extended CSINV
instruction exactly, and the function will generate into:
G_M15974_IG01: IN0009: 000000 stp fp, lr, [sp, #-0x10]! IN000a: 000004 mov fp, sp G_M15974_IG02: IN0001: 000008 mov w2, #3 IN0002: 00000C cmp w0, #7 IN0003: 000010 csinv w1, w2, w1, eq IN0004: 000014 movz x2, #0x6EE0 // code for CSharpTutorials.Program:Consume[uint](uint,uint) IN0005: 000018 movk x2, #0x31A6 LSL #16 IN0006: 00001C movk x2, #0xFFFF LSL #32 IN0007: 000020 ldr x2, [x2] IN0008: 000024 blr x2 G_M15974_IG03: IN000b: 000028 ldp fp, lr, [sp], #0x10 IN000c: 00002C ret lr
Imagine a scenario where CSINV does not exists, but CINV does. To use CINV, RyuJIT would have to fully parse the then
and else
cases to confirm they are the same inputs.
Using CNEG
The CNEG
optimization works almost identically to CINV
, including the extending CSNEG
variants, except it applies to subtraction instead of negation. These are left as an exercise for the reader to follow.
Closing Comments
That wraps up everything I wanted to walk through across the series. Over three parts we went through both the optimize bools and if conversion phases. We saw how they interact with each other to create a complete solution. We then looked at how they interact with other optimizations. Finally, we looked at a couple of further optimizations that build on top. While none of these optimizations offer large performance gains, they each build off each other and eventually add up. In addition, these will aid future optimizations, for example the simplification of the blocks simplifies loop optimizations.
-
Leave a comment...
-
Arm A-Profile Architecture Developments 2023
The latest features and updates that have been added to Arm's A-Profile architecture in 2023. -
If Conversion Within .NET - Part 3
This is the final part of the blog series on If Conversion in .NET. In this third post, we will look at some additional variations. -
If Conversion Within .NET - Part 2
This blog series gives an overview of the recently added If Conversion pass, with part two looking at the the lowering phase.
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK