Very interesting. Your effort should also be useful to someone learning how USB works hands-on, as it clears away all but the essentials.
I've noticed that calling assembly routines frustrates the C optimizer, because it must assume the worst about which registers are preserved across the call. Here is an attempt I made at inlining the CRC routine and telling the compiler which registers it trashes (though it may do it wrong, as I still find the asm specification syntax confusing):
static inline void usbCrc16Append( volatile unsigned char* data, unsigned char len )
{
/* Numeric local labels (1:, 2:) avoid duplicate-symbol errors if this gets inlined more than once. */
asm volatile (
"\n ldi r20, 0xFF"
"\n ldi r21, 0xFF"
"\n rjmp 2f"
"\n1:"
"\n ld r18, Z+"
"\n eor r18, r20 ; r18 is now 'x' in table()"
"\n mov r19, r18 ; compute parity of 'x'"
"\n swap r18"
"\n eor r18, r19"
"\n mov r20, r18"
"\n lsr r18"
"\n lsr r18"
"\n eor r18, r20"
"\n inc r18"
"\n andi r18, 2 ; r18 is now parity(x) << 1"
"\n cp r1, r18 ; c = (r18 != 0), then put in high bit"
"\n ror r19 ; so that after xoring, shifting, and xoring, it gives"
"\n ror r18 ; the desired 0xC0 with r21"
"\n mov r20, r18"
"\n eor r20, r21"
"\n mov r21, r19"
"\n lsr r19"
"\n ror r18"
"\n eor r21, r19"
"\n eor r20, r18"
"\n2:"
"\n subi %1, 1"
"\n brsh 1b"
"\n com r20"
"\n com r21"
"\n st Z+, r20"
"\n st Z, r21"
"\n"
/* "+z": data is read and modified in Z (r31:r30). "+d": len must be in r16..r31, since subi only accepts upper registers. */
: "+z" (data), "+d" (len)
:
: "memory", "r18", "r19", "r20", "r21" );
}
Also, that's for the optimized CRC routine, so you'll want to convert the slower, shorter one.
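When converting either routine, a plain C reference can help check that the inline version still produces the same bytes. Below is a minimal sketch, assuming the standard USB CRC16 parameters (reflected polynomial 0xA001, initial value 0xFFFF, result complemented, low byte appended first). The names usb_crc16_ref and usb_crc16_append_ref are made up for this example and are not part of the driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise reference CRC16 for USB data packets (hypothetical helper name).
 * Slow, but easy to compare against the assembly output. */
static uint16_t usb_crc16_ref(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFF;                 /* initial value */
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];                    /* fold next byte into the low byte */
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1)
                crc = (uint16_t)((crc >> 1) ^ 0xA001);
            else
                crc >>= 1;
        }
    }
    return (uint16_t)~crc;                 /* final complement */
}

/* Append the CRC after the payload, low byte first, like the asm routine.
 * The caller must make sure data[] has room for two extra bytes. */
static void usb_crc16_append_ref(uint8_t *data, size_t len)
{
    uint16_t crc = usb_crc16_ref(data, len);
    data[len]     = (uint8_t)(crc & 0xFF);
    data[len + 1] = (uint8_t)(crc >> 8);
}
```

Running both versions over the same buffers on a host build (or in a simulator) and comparing the two appended bytes should catch a wrong constraint or a clobbered register fairly quickly.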