Test it with a much larger transfer size, megabytes or more.
The PC host controller delays the transfer-completion interrupt until the next micro-frame SOF. As a result, when your PC application repeats transfers synchronously, each new transfer starts no earlier than the next micro-frame. That is, the transfer speed is proportional to the transfer size until it saturates the bus bandwidth.
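To put rough numbers on this, here is a small model in C. It is only a sketch under assumptions that are mine, not measured: a high-speed bus with 8000 micro-frames per second, the theoretical ceiling of 13 bulk packets of 512 bytes per micro-frame, and every synchronous transfer occupying a whole number of micro-frames. The function names are made up for the illustration. A real host controller gets less than this ceiling.

```c
#include <stdio.h>

/* Assumptions for this model (not measured values):
 * - high-speed bus, 8000 micro-frames per second (125 us each)
 * - theoretical ceiling of 13 bulk packets x 512 bytes per micro-frame */
#define UFRAMES_PER_SEC  8000UL
#define BYTES_PER_UFRAME (13UL * 512UL)

/* Estimated bytes per second when transfers of transfer_bytes are repeated
 * synchronously. Each transfer is charged a whole number of micro-frames,
 * because its completion is only reported at a micro-frame boundary. */
double sync_bulk_throughput(unsigned long transfer_bytes)
{
    unsigned long uframes;

    if (transfer_bytes == 0)
        return 0.0;

    /* Round up to whole micro-frames. */
    uframes = (transfer_bytes + BYTES_PER_UFRAME - 1) / BYTES_PER_UFRAME;
    return (double)transfer_bytes * (double)UFRAMES_PER_SEC / (double)uframes;
}

/* Print the estimate for a few transfer sizes. */
void print_throughput_table(void)
{
    static const unsigned long sizes[] = { 512, 4096, 65536, 1048576 };
    size_t i;

    for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++)
        printf("%8lu bytes/transfer -> %.0f bytes/s\n",
               sizes[i], sync_bulk_throughput(sizes[i]));
}
```

Calling print_throughput_table() from main shows the estimate growing with the transfer size while one transfer fits in a single micro-frame, then flattening near the bus ceiling for large transfers.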
whereas winusb seems to have an ~11ms between every second packet
Sounds like you are seeing the thread-switch quantum rather than the transfer speed.
Is it a synchronous (non-OVERLAPPED) call?
For WinUSB, apply the RAW_IO policy to the bulk IN pipe, and issue a couple of OVERLAPPED WinUsb_ReadPipe calls in advance. These calls are then queued directly on the host controller, which results in a seamless transfer sequence.
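A minimal sketch of the RAW_IO plus queued OVERLAPPED reads approach, in C. Assumptions: the device file handle was opened with FILE_FLAG_OVERLAPPED and WinUsb_Initialize has already succeeded; stream_bulk_in, NUM_REQUESTS and BUF_SIZE are names and values I picked for illustration. With RAW_IO the buffer length has to be a multiple of the endpoint's max packet size, so adjust BUF_SIZE to your device. Link with winusb.lib.

```c
#include <windows.h>
#include <winusb.h>

#define NUM_REQUESTS 4            /* hypothetical queue depth */
#define BUF_SIZE     (64 * 1024)  /* assumed: multiple of max packet size */

/* Reads `rounds` buffers from a bulk IN pipe, keeping up to NUM_REQUESTS
 * reads pending so the host controller always has one queued. */
BOOL stream_bulk_in(WINUSB_INTERFACE_HANDLE usb, UCHAR pipe_id, int rounds)
{
    static UCHAR buf[NUM_REQUESTS][BUF_SIZE];
    OVERLAPPED ov[NUM_REQUESTS];
    UCHAR raw_io = 1;
    BOOL ok = TRUE;
    int i, n, submitted = 0;

    /* Bypass WinUSB's internal buffering for this pipe. */
    if (!WinUsb_SetPipePolicy(usb, pipe_id, RAW_IO, sizeof(raw_io), &raw_io))
        return FALSE;

    ZeroMemory(ov, sizeof(ov));
    for (i = 0; i < NUM_REQUESTS && ok; i++) {
        ov[i].hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);
        if (ov[i].hEvent == NULL)
            ok = FALSE;
    }

    /* Put several reads on the queue before waiting for any of them. */
    for (i = 0; i < NUM_REQUESTS && ok; i++) {
        if (!WinUsb_ReadPipe(usb, pipe_id, buf[i], BUF_SIZE, NULL, &ov[i]) &&
            GetLastError() != ERROR_IO_PENDING)
            ok = FALSE;
        else
            submitted++;
    }

    for (n = 0; n < rounds && ok; n++) {
        DWORD got = 0;
        i = n % NUM_REQUESTS;

        /* Wait for the oldest request to complete. */
        if (!WinUsb_GetOverlappedResult(usb, &ov[i], &got, TRUE)) {
            ok = FALSE;
            break;
        }

        /* ... consume buf[i][0 .. got-1] here ... */

        /* Re-submit the same slot right away to keep the queue full. */
        ResetEvent(ov[i].hEvent);
        if (!WinUsb_ReadPipe(usb, pipe_id, buf[i], BUF_SIZE, NULL, &ov[i]) &&
            GetLastError() != ERROR_IO_PENDING)
            ok = FALSE;
    }

    /* Cancel whatever is still pending, then wait for it to finish. */
    WinUsb_AbortPipe(usb, pipe_id);
    for (i = 0; i < NUM_REQUESTS; i++) {
        DWORD ignored = 0;
        if (i < submitted)
            WinUsb_GetOverlappedResult(usb, &ov[i], &ignored, TRUE);
        if (ov[i].hEvent != NULL)
            CloseHandle(ov[i].hEvent);
    }
    return ok;
}
```

The point of the design is that the wait on one request overlaps with the transfer of the others, so the thread-switch delay on the PC side no longer leaves the bus idle between transfers.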
Tsuneo