A software analyzer can only show data at the driver level. A hardware protocol analyzer can show what is actually happening on the bus. If the device is NAKing IN token packets, it has no data ready when the host asks for it, which means the device is what is slowing things down. If you don't have a hardware analyzer, use whatever device-side debugging tools you have to find out whether the device is NAKing.
The URB pairs with identical data show different driver names, one usbhub and one ACPI (device configuration and power management). I don't know the purpose of the URBs with no data (completions, perhaps?), but I doubt that these are the source of the delays.
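If your application is sending data in 64-byte blocks, that will be slower than requesting a single, larger transfer: each request carries its own driver and scheduling overhead, while a larger transfer lets the host controller move several packets back to back.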
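To get a feel for how much the block size can matter, here is a rough back-of-the-envelope model in Python. It is only a sketch: the function name `estimate_transfer_time`, the 1 ms per-request overhead, and the 1,000,000 bytes/s bus rate are all assumptions picked for illustration, not measurements from your system. Substitute the numbers you actually observe in your analyzer trace.

```python
import math


def estimate_transfer_time(total_bytes, request_size,
                           per_request_overhead_s, bus_bytes_per_s):
    """Rough model: every request pays a fixed overhead, plus time on the wire.

    This is a hypothetical helper for illustration, not part of any USB API.
    """
    requests = math.ceil(total_bytes / request_size)
    wire_time = total_bytes / bus_bytes_per_s
    return requests * per_request_overhead_s + wire_time


TOTAL_BYTES = 64 * 1024   # amount of data to move
OVERHEAD_S = 0.001        # ASSUMED: about 1 ms of overhead per request
BUS_RATE = 1_000_000      # ASSUMED: usable bus throughput in bytes/s

# Many small requests versus one large request for the same data.
small_blocks = estimate_transfer_time(TOTAL_BYTES, 64, OVERHEAD_S, BUS_RATE)
one_request = estimate_transfer_time(TOTAL_BYTES, TOTAL_BYTES, OVERHEAD_S, BUS_RATE)

print(f"64-byte requests: {small_blocks:.3f} s")
print(f"single request:   {one_request:.3f} s")
```

With these assumed numbers the per-request overhead dominates the 64-byte case, while the single large request is limited almost entirely by wire time. The real ratio depends on your host controller, driver stack, and device, so treat the output as an illustration of the trend rather than a prediction.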