The ARM processor (Thumb-2), part 9: Sign and zero extension

Raymond

June 10th, 2021

I noted last time that you could use the bitfield extraction instructions to do zero- and sign-extension of bytes and halfwords to words. But there are dedicated instructions for these operations which have smaller encodings if the source and destination registers are both low registers (r0 through r7).

    ; unsigned extend byte to word
    uxtb    Rd, Rm      ; Rd = (uint8_t)Rm

    ; signed extend byte to word
    sxtb    Rd, Rm      ; Rd = (int8_t)Rm

    ; unsigned extend halfword to word
    uxth    Rd, Rm      ; Rd = (uint16_t)Rm

    ; signed extend halfword to word
    sxth    Rd, Rm      ; Rd = (int16_t)Rm
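
If it helps to see the four operations outside of assembly, here is a rough C model. The function names are my own invention for illustration (they are not part of any Arm toolchain), and each register is modeled as a `uint32_t`. The sign-extending versions use an XOR-and-subtract step so that the result does not depend on how a particular compiler handles out-of-range signed conversions.

```c
#include <stdint.h>

/* Hypothetical models of the extension instructions.
   A register is represented as a uint32_t. */

/* uxtb: keep bits 7:0, clear bits 31:8. */
uint32_t model_uxtb(uint32_t rm)
{
    return rm & 0xFFu;
}

/* uxth: keep bits 15:0, clear bits 31:16. */
uint32_t model_uxth(uint32_t rm)
{
    return rm & 0xFFFFu;
}

/* sxtb: keep bits 7:0, copy bit 7 into bits 31:8. */
uint32_t model_sxtb(uint32_t rm)
{
    /* Flipping the sign bit and subtracting its weight maps
       0x00..0x7F to 0..127 and 0x80..0xFF to -128..-1. */
    int32_t value = (int32_t)((rm & 0xFFu) ^ 0x80u) - 0x80;
    return (uint32_t)value;
}

/* sxth: keep bits 15:0, copy bit 15 into bits 31:16. */
uint32_t model_sxth(uint32_t rm)
{
    int32_t value = (int32_t)((rm & 0xFFFFu) ^ 0x8000u) - 0x8000;
    return (uint32_t)value;
}
```

For example, `model_uxtb(0x123456F0)` gives `0x000000F0`, whereas `model_sxtb(0x123456F0)` gives `0xFFFFFFF0`.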

You can optionally apply a rotation to the second register so that you can extract an 8-bit or 16-bit value that sits along a byte boundary.

    ; unsigned/signed extend byte to word with rotation
    ; rotation must be a multiple of 8
    uxtb    Rd, Rm, ROR #rot ; Rd = (uint8_t)(Rm ROR #rot)
    sxtb    Rd, Rm, ROR #rot ; Rd = ( int8_t)(Rm ROR #rot)

    ; unsigned/signed extend halfword to word with rotation
    ; rotation must be a multiple of 8
    uxth    Rd, Rm, ROR #rot ; Rd = (uint16_t)(Rm ROR #rot)
    sxth    Rd, Rm, ROR #rot ; Rd = ( int16_t)(Rm ROR #rot)

It’s kind of weird to apply a 24-bit rotation to extract a halfword, but you can do it if you want to.
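
To see why the 24-bit case is weird, here is a small C sketch of rotate-then-extend. The helper names are hypothetical, and the model assumes the rotation is one of 0, 8, 16, or 24. With a 24-bit rotation, the extracted halfword is stitched together from the top byte and the bottom byte of the source.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate right by 0, 8, 16, or 24 bits (the amounts modeled here). */
uint32_t ror32(uint32_t value, unsigned rot)
{
    assert(rot == 0 || rot == 8 || rot == 16 || rot == 24);
    /* A zero rotation is handled separately to avoid a shift by 32. */
    return rot == 0 ? value : (value >> rot) | (value << (32 - rot));
}

/* Hypothetical models: rotate the source, then extend. */
uint32_t model_uxtb_ror(uint32_t rm, unsigned rot)
{
    return ror32(rm, rot) & 0xFFu;
}

uint32_t model_uxth_ror(uint32_t rm, unsigned rot)
{
    return ror32(rm, rot) & 0xFFFFu;
}

uint32_t model_sxtb_ror(uint32_t rm, unsigned rot)
{
    int32_t value = (int32_t)((ror32(rm, rot) & 0xFFu) ^ 0x80u) - 0x80;
    return (uint32_t)value;
}

uint32_t model_sxth_ror(uint32_t rm, unsigned rot)
{
    int32_t value = (int32_t)((ror32(rm, rot) & 0xFFFFu) ^ 0x8000u) - 0x8000;
    return (uint32_t)value;
}
```

With a source of `0x12345678`, `model_uxtb_ror(..., 8)` picks out `0x56`, and `model_uxth_ror(..., 24)` produces `0x7812`: the low byte `0x78` lands in bits 15:8 and the high byte `0x12` wraps around into bits 7:0.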

You can also zero-extend or sign-extend a word to a doubleword using instructions you already have available:

    ; zero-extend Rd to Rd/R(d+1)
    mov     R(d+1), #0          ; set to 0

    ; sign-extend Rd to Rd/R(d+1)
    asrs    R(d+1), Rd, #31     ; copy sign bit to all bits

The trick is that a signed right-shift by 31 positions ends up filling the entire word with the sign bit. We use the S-version ASRS because it allows a compact 16-bit encoding if both the source and destination registers are low.
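
Here is a sketch of the two doubleword extensions in C. The `reg_pair` type and the function names are made up for illustration. The model computes the shifted value from the sign bit directly rather than right-shifting a negative signed integer, since C leaves that shift implementation-defined.

```c
#include <stdint.h>

/* Hypothetical stand-in for the register pair Rd/R(d+1). */
typedef struct {
    uint32_t lo;    /* Rd     */
    uint32_t hi;    /* R(d+1) */
} reg_pair;

/* Model of "asr Rx, Ry, #31": every bit becomes a copy of bit 31. */
uint32_t model_asr31(uint32_t value)
{
    return (value & 0x80000000u) ? 0xFFFFFFFFu : 0x00000000u;
}

/* Zero extension: the upper word is simply set to zero. */
reg_pair model_zero_extend(uint32_t rd)
{
    reg_pair result = { rd, 0 };
    return result;
}

/* Sign extension: the upper word is the lower word shifted right
   arithmetically by 31. */
reg_pair model_sign_extend(uint32_t rd)
{
    reg_pair result = { rd, model_asr31(rd) };
    return result;
}

/* Glue the pair back into a single 64-bit value for inspection. */
uint64_t pair_to_u64(reg_pair pair)
{
    return ((uint64_t)pair.hi << 32) | pair.lo;
}
```

Sign-extending `0xFFFFFFFE` (that is, -2) yields the pair `0xFFFFFFFF:0xFFFFFFFE`, which is -2 as a 64-bit value, whereas zero-extending it yields `0x00000000:0xFFFFFFFE`.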

The ASR #31 trick can also be used in the op2 of arithmetic or logical instructions.

    ; set r0 to zero if r1 is positive or zero
    and     r0, r1, ASR #31

The trick here is that r1, ASR #31 produces 0xFFFFFFFF if r1 is negative, but 0x00000000 if r1 is positive or zero, so the AND either leaves r0 unchanged or clears it.
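
The masking idea carries over to C fairly directly. This is only a sketch with a made-up function name; the mask is built from the sign bit explicitly so that the code does not rely on how the compiler shifts negative values.

```c
#include <stdint.h>

/* Hypothetical model of "and r0, r1, ASR #31":
   returns the new value of r0. */
uint32_t model_and_asr31(uint32_t r0, uint32_t r1)
{
    /* All ones if r1 is negative, all zeros otherwise. */
    uint32_t mask = (r1 & 0x80000000u) ? 0xFFFFFFFFu : 0x00000000u;
    return r0 & mask;
}
```

So `model_and_asr31(0x1234, 5)` is zero, but `model_and_asr31(0x1234, 0xFFFFFFFF)` is still `0x1234`.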

In addition to the straight zero- and sign-extension operations, there are other instructions that combine the extension with another operation. Most of them are focused on multimedia scenarios, but the extend-and-add instructions are more general-purpose, and I have seen the compiler generate the versions with no rotation.

    ; zero/sign extend and add byte with optional rotation
    ; rotation must be a multiple of 8
    uxtab   Rd, Rn, Rm, ROR #rot ; Rd = Rn + ( uint8_t)(Rm ROR #rot)
    sxtab   Rd, Rn, Rm, ROR #rot ; Rd = Rn + (  int8_t)(Rm ROR #rot)

    ; zero/sign extend and add halfword with optional rotation
    ; rotation must be a multiple of 8
    uxtah   Rd, Rn, Rm, ROR #rot ; Rd = Rn + (uint16_t)(Rm ROR #rot)
    sxtah   Rd, Rn, Rm, ROR #rot ; Rd = Rn + ( int16_t)(Rm ROR #rot)
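
As a sanity check on the extend-and-add family, here is a C sketch. The function and parameter names are hypothetical: `acc` is the register being added to, and `src` is the register that gets rotated and extended. The addition wraps modulo 2³², as unsigned arithmetic does in C.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate right by 0, 8, 16, or 24 bits (the amounts modeled here). */
uint32_t ror32(uint32_t value, unsigned rot)
{
    assert(rot == 0 || rot == 8 || rot == 16 || rot == 24);
    return rot == 0 ? value : (value >> rot) | (value << (32 - rot));
}

/* Sign-extend the low 8 or 16 bits to a full 32-bit pattern. */
uint32_t sign_extend8(uint32_t value)
{
    return (uint32_t)((int32_t)((value & 0xFFu) ^ 0x80u) - 0x80);
}

uint32_t sign_extend16(uint32_t value)
{
    return (uint32_t)((int32_t)((value & 0xFFFFu) ^ 0x8000u) - 0x8000);
}

/* Hypothetical models: rotate src, extend it, then add it to acc. */
uint32_t model_uxtab(uint32_t acc, uint32_t src, unsigned rot)
{
    return acc + (ror32(src, rot) & 0xFFu);
}

uint32_t model_sxtab(uint32_t acc, uint32_t src, unsigned rot)
{
    return acc + sign_extend8(ror32(src, rot));
}

uint32_t model_uxtah(uint32_t acc, uint32_t src, unsigned rot)
{
    return acc + (ror32(src, rot) & 0xFFFFu);
}

uint32_t model_sxtah(uint32_t acc, uint32_t src, unsigned rot)
{
    return acc + sign_extend16(ror32(src, rot));
}
```

Starting from an accumulator of 100 and a source of `0x0000FF00` rotated by 8, the unsigned version adds 255 to give 355, and the signed version adds -1 to give 99.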

There’s another instruction that looks like it’d come in handy, particularly in Win32 user interface code that has to pack two 16-bit coordinates into a 32-bit integer, but I haven’t seen any compiler generate it:

    ; pack halfword bottom-and-top, or top-and-bottom
    ; shift is optional
    pkhbt   Rd, Rn, Rm, LSL #imm ; Rd = ((Rm LSL #imm) & 0xFFFF0000) | (uint16_t)Rn
    pkhtb   Rd, Rn, Rm, ASR #imm ; Rd = (Rn & 0xFFFF0000) | (uint16_t)(Rm ASR #imm)

The bottom-and-top version takes the bottom halfword of the first input register and puts it in the bottom half of the output, and the top halfword of the shifted second input register goes into the top half. The top-and-bottom version keeps the top halfword of the first input register and takes the bottom halfword of the shifted second input register. (The top-and-bottom instruction is not redundant because the barrel shifter can be applied only to the second input register.)
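
Here is my reading of the two pack instructions as a C sketch, based on how the Arm architecture documentation describes them: each half of the result is selected in place from the corresponding half of its source, so packing two low halfwords calls for `LSL #16` on the second operand. The function names are hypothetical, and the model assumes a left shift of 0 to 31 and an arithmetic right shift of 1 to 32.

```c
#include <assert.h>
#include <stdint.h>

/* Arithmetic shift right by 1 to 32 bits, written without relying on
   the implementation-defined behavior of >> on negative values. */
uint32_t asr32(uint32_t value, unsigned shift)
{
    uint32_t sign = (value & 0x80000000u) ? 0xFFFFFFFFu : 0x00000000u;
    assert(shift >= 1 && shift <= 32);
    if (shift == 32) {
        return sign;
    }
    return (value >> shift) | (sign << (32 - shift));
}

/* pkhbt: bottom half from rn, top half from (rm LSL #lsl). */
uint32_t model_pkhbt(uint32_t rn, uint32_t rm, unsigned lsl)
{
    assert(lsl <= 31);
    return ((rm << lsl) & 0xFFFF0000u) | (rn & 0x0000FFFFu);
}

/* pkhtb: top half from rn, bottom half from (rm ASR #asr). */
uint32_t model_pkhtb(uint32_t rn, uint32_t rm, unsigned asr)
{
    return (rn & 0xFFFF0000u) | (asr32(rm, asr) & 0x0000FFFFu);
}
```

With `rn = 0x1111AAAA` and `rm = 0x2222BBBB`, `model_pkhbt(rn, rm, 16)` gives `0xBBBBAAAA` (the two low halfwords packed together), and `model_pkhtb(rn, rm, 16)` gives `0x11112222` (the two high halfwords packed together).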

When the compiler needs to do this, it generates two instructions:

    ; pack halfword bottom-and-top
    uxth    r12, Rn                 ; r12 = (uint16_t)Rn
    orr     Rd, r12, Rm, LSL #16    ; Rd = r12 | (Rm << 16)
                                    ;    = (uint16_t)Rn | (Rm << 16)

Even if it didn’t want to use PKHBT, it could have used BFI to pack the values in a single instruction:

    ; pack halfword bottom-and-top (in place)
    bfi     Rd, Rm, #16, #16        ; Rd[31:16] = Rm[15:0]
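
To convince yourself that the two approaches agree, here is a C sketch of both. The names are hypothetical. One difference to keep in mind is that the BFI version works in place: it preserves whatever is already in the low 16 bits of the destination, so the destination has to be holding the low halfword before the instruction runs.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the two-instruction sequence:
       uxth r12, Rn
       orr  Rd, r12, Rm, LSL #16 */
uint32_t pack_uxth_orr(uint32_t rn, uint32_t rm)
{
    uint32_t r12 = rn & 0xFFFFu;    /* uxth */
    return r12 | (rm << 16);        /* orr with shifted operand */
}

/* Model of "bfi Rd, Rm, #lsb, #width": replace width bits of rd,
   starting at bit lsb, with the low width bits of rm. */
uint32_t model_bfi(uint32_t rd, uint32_t rm, unsigned lsb, unsigned width)
{
    uint32_t field;
    assert(width >= 1 && lsb <= 31 && lsb + width <= 32);
    field = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (rd & ~(field << lsb)) | ((rm & field) << lsb);
}

/* Packing with BFI: rd already holds the value for the low halfword. */
uint32_t pack_bfi(uint32_t rd, uint32_t rm)
{
    return model_bfi(rd, rm, 16, 16);
}
```

Both `pack_uxth_orr(0x1111AAAA, 0x2222BBBB)` and `pack_bfi(0x1111AAAA, 0x2222BBBB)` come out to `0xBBBBAAAA`.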

Maybe there’s some dirty secret about the PKHBT and BFI instructions that the compiler knows but I don’t.
