The ARM processor (Thumb-2), part 9: Sign and zero extension
source link: https://devblogs.microsoft.com/oldnewthing/20210610-00/?p=105295
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
The ARM processor (Thumb-2), part 9: Sign and zero extension
Raymond
June 10th, 2021
I noted last time that you could use the bitfield extraction instructions to do zero- and sign-extension of bytes and halfwords to words. But there are dedicated instructions for these operations which have smaller encodings if the source and destination registers are low.
; unsigned extend byte to word uxtb Rd, Rm ; Rd = (uint8_t)Rm ; signed extend byte to word sxtb Rd, Rm ; Rd = (int8_t)Rm ; unsigned extend halfword to word uxth Rd, Rm ; Rd = (uint16_t)Rm ; signed extend halfword to word sxth Rd, Rm ; Rd = (int16_t)Rm
You can optionally apply a rotation to the second register so that you can extract a 8-bit or 16-bit value that sits along a byte boundary.
; unsigned/signed extend byte to word with rotation ; rotation must be a multiple of 8 uxtb Rd, Rm, #rot ; Rd = (uint8_t)(Rm ROR #rot) sxtb Rd, Rm, #rot ; Rd = ( int8_t)(Rm ROR #rot) ; unsigned/signed extend halfword to word with rotation ; rotation must be a multiple of 8 uxth Rd, Rm, #rot ; Rd = (uint16_t)(Rm ROR #rot) sxth Rd, Rm, #rot ; Rd = ( int16_t)(Rm ROR #rot)
It’s kind of weird to apply a 24-bit rotation to extract a halfword, but you can do it if you want to.
You can also zero-extend or sign-extend a word to a doubleword using instructions you already have available:
; zero-extend Rd to Rd/R(d+1) mov R(d+1), #0 ; set to 0 ; sign-extend Rd to Rd/R(d+1) asrs R(d+1), Rd, #31 ; copy sign bit to all bits
The trick is that a signed right-shift by 31 positions ends up filling the entire word with the sign bit. We use the S-version ASRS
because it allows a compact 16-bit encoding if both the source and destination registers are low.
The ASR #31
trick can also be used in the op2
of arithmetic or logical instructions.
; set r0 to zero if r1 is positive or zero and r0, r1, ASR #31
The trick here is that r1, ASR #31
produces 0xFFFFFFFF
if r1 is negative, but 0x00000000
if r1 is positive or zero.
In addition to the straight zero- and sign-extension operations, there are other instructions that combine the extension with another operation. Most of them are focused on multimedia scenarios, but the extend-and-add instructions are more general-purpose, and I have seen the compiler generate the versions with no rotation.
; zero/sign extend and add byte with optional rotation ; rotation must be a multiple of 8 uxtab Rd, Rn, #rot ; Rd = Rd + (uint8_t)(Rn ROR #rot) sxtab Rd, Rn, #rot ; Rd = Rd + ( int8_t)(Rn ROR #rot) ; zero/sign extend and add halfword with optional rotation ; rotation must be a multiple of 8 sxtah Rd, Rn, #rot ; Rd = Rd + ( int16_t)(Rn ROR #rot) uxtah Rd, Rn, #rot ; Rd = Rd + (uint16_t)(Rn ROR #rot)
There’s another instruction that looks like it’d come in handy, particularly in Win32 user interface code that has to pack two 16-bit coordinates into a 32-bit integer, but I haven’t seen any compiler generate it:
; pack halfword bottom-and-top, or top-and-bottom ; shift is optional pkhbt Rd, Rn, Rm, LSL #imm ; Rd = ((Rm LSL #imm) << 16) | (uint16_t)Rn pkhtb Rd, Rn, Rm, ASR #imm ; Rd = (Rn << 16) | (uint16_t)(Rm ASR #imm)
The bottom-and-top version puts the first input register in the bottom part of the output, and the second input parameter goes into the top part. The top-and-bottom version does it the other way. (The top-and-bottom instruction is not redundant because the barrel shifter can be applied only to the second input parameter.)
When the compiler needs to do this, it generates two instructions:
; pack halfword bottom-and-top uxth r12, Rn ; r12 = (uint16_t)Rn orr Rd, r12, Rm, LSL #16 ; Rd = r12 | (Rm << 16) ; = (uint16_t)Rn | (Rm << 16)
Even if it didn’t want to use PKHBT
, it could have used BFI
to pack the values in a single instruction:
; pack halfword bottom-and-top (in place) bfi Rd, Rm, #16, #16 ; Rd[31:16] = Rm[15:0]
Maybe there’s some dirty secret about the PKHBT
and BFI
instructions that the compiler knows but I don’t.
Recommend
-
3
The ARM processor (Thumb-2), part 2: Differences between classic ARM and Thumb-2
-
7
The ARM processor (Thumb-2), part 4: Single-instruction constants
-
2
The ARM processor (Thumb-2), part 6: The lie hiding inside the CMN instruction
-
6
The ARM processor (Thumb-2), part 8: Bit shifting and bitfield access
-
5
The ARM processor (Thumb-2), part 10: Memory access and alignment
-
0
The ARM processor (Thumb-2), part 11: Atomic access and barriers ...
-
9
The ARM processor (Thumb-2), part 15: Miscellaneous instructions ...
-
7
The ARM processor (Thumb-2), part 16: The calling convention R...
-
5
The ARM processor (Thumb-2), part 18: Other kinds of prologues and epilogues
-
4
The ARM processor (Thumb-2), part 17: Prologues and epilogues ...
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK