80386 Barrel Shifter

(nand2mario.github.io)

79 points | by jamesbowman 3 days ago

2 comments

rep_lodsb 21 hours ago
Implementing rotate through carry like that was a really bad decision IMO. It's almost never used to rotate by more than one bit left or right at a time, and that case could be done much more efficiently than with the constant-time code, which is only faster when the count is > 6.
Is the full microcode available anywhere?
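For reference, RCL on a 32-bit operand rotates the 33-bit quantity formed by CF and the operand, with the count masked to 5 bits (as on the 80386). The Python sketch below contrasts a one-bit-per-step loop with a single 33-bit rotate, which is the trade-off being discussed. The function names are hypothetical, only the value and CF are modeled (OF and the other flags are ignored), and it says nothing about how the actual microcode is written.

```python
MASK32 = 0xFFFFFFFF
MASK33 = (1 << 33) - 1


def rcl32_loop(value, carry, count):
    """RCL r/m32 modeled one bit per iteration (work grows with count)."""
    count &= 0x1F  # count masked to 5 bits
    for _ in range(count):
        new_carry = (value >> 31) & 1            # bit 31 moves into CF
        value = ((value << 1) | carry) & MASK32  # old CF enters at bit 0
        carry = new_carry
    return value, carry


def rcl32_rotate(value, carry, count):
    """RCL r/m32 modeled as one rotate of the 33-bit word CF:value."""
    count &= 0x1F  # 0..31, so no further reduction mod 33 is needed
    word = (carry << 32) | (value & MASK32)
    word = ((word << count) | (word >> (33 - count))) & MASK33
    return word & MASK32, word >> 32


# Both models agree; a 1-bit rotate moves bit 31 into CF and CF into bit 0.
print(rcl32_loop(0x80000001, 1, 1), rcl32_rotate(0x80000001, 1, 1))  # (3, 1) (3, 1)
```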
- ajenner 21 hours ago
  I haven't published it yet as there are still some rough edges to clear up, but if you email me (andrew@reenigne.org) I'll send you the current work-in-progress (the same one that nand2mario is working from).
- kjs3 21 hours ago
  Since the shifter is also used for bit tests, the 'most things are a 1-bit shift' assumption might not hold. Perhaps they did the analysis and it made sense.
  - rep_lodsb 20 hours ago
    There are separate opcodes for shift/rotate by 1, by CL, or by an immediate operand. Those are decoded to separate microcode entry points, so they could have at least optimized the "RCL/RCR x,1" case.
    And the microcode for bit test has to be different anyway.
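On the encoding side, the one-bit, CL and immediate forms are separate opcodes, with the ModR/M reg field selecting the operation. A small sketch of the documented group-2 decoding follows (`decode_group2` is a hypothetical helper); it reflects only the instruction encoding, not how the 80386 microcode entry points are actually laid out.

```python
# ModR/M reg field -> operation for the shift/rotate group (opcodes C0/C1/D0-D3).
# reg = 6 is not a documented encoding, so it is left as None here.
GROUP2_OPS = ("ROL", "ROR", "RCL", "RCR", "SHL", "SHR", None, "SAR")

# Opcode -> where the shift count comes from.
COUNT_SOURCE = {
    0xC0: "imm8", 0xC1: "imm8",  # by immediate (80186 and later)
    0xD0: "1",    0xD1: "1",     # by one
    0xD2: "CL",   0xD3: "CL",    # by CL
}


def decode_group2(opcode, modrm):
    """Return (operation, count source, operand size) for a group-2 instruction."""
    if opcode not in COUNT_SOURCE:
        raise ValueError(f"not a group-2 shift/rotate opcode: {opcode:#04x}")
    op = GROUP2_OPS[(modrm >> 3) & 7]
    if op is None:
        raise ValueError("reg field 6 is undocumented")
    size = "r/m8" if opcode % 2 == 0 else "r/m16/32"  # even opcodes are byte forms
    return op, COUNT_SOURCE[opcode], size


print(decode_group2(0xD1, 0xD0))  # RCL EAX,1 in 32-bit code -> ('RCL', '1', 'r/m16/32')
```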
cmovq 18 hours ago
> For memory operands, there's an additional twist: the bit index is a signed offset that can address bits outside the nominal operand. A bit index of 35 on a dword accesses bit 3 of the next dword in memory.
I wonder what the use case is for testing a bit outside the given memory address.
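The quoted rule can be made concrete: with a register bit offset and a 32-bit memory operand, the offset is split into a signed dword index (rounded toward minus infinity) and a bit position 0..31. A minimal Python sketch of that address arithmetic; `bt_mem32_location` is a hypothetical name, and immediate bit offsets, which are handled differently, are not modeled.

```python
def bt_mem32_location(base_addr, bit_offset):
    """(address of the dword tested, bit within it) for BT m32, r32."""
    if not -(2**31) <= bit_offset < 2**31:
        raise ValueError("bit offset must fit in a signed 32-bit register")
    dword_index = bit_offset // 32  # floor division, so -1 maps to dword -1
    bit = bit_offset % 32           # Python's % result is always 0..31 here
    return base_addr + 4 * dword_index, bit


print(bt_mem32_location(0x1000, 35))  # (4100, 3): bit 3 of the dword at 0x1004
```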
- rep_lodsb 18 hours ago
  So you can have bit arrays of any length in memory, rather than just 32 bits in a register.
  - cmovq 18 hours ago
    That makes sense. LLVM could probably do better here by using the memory operand version:
    https://godbolt.org/z/jeqbaPsMz
    - ack_complete 9 hours ago
      Don't think the memory operand version would work here. If I understand the x86 architecture manual's description correctly, the 32-bit operand form interprets the bit offset as signed. A 64-bit operand could work around that, but would then run into over-read issues from fetching 64 bits of data.
    - jxors 14 hours ago
      The memory operand version tends to be as slow or slower than the manual implementation, so LLVM is right to avoid it.
- juancn 18 hours ago
  It was probably easier to just implement it that way, given that the barrel shifter is 64 bits wide.