I think the underlying issue may be in how the microcontroller's USB DMA handles writes to the bus. It looks like I just have to configure the write to be larger than the endpoint size, so that multiple transfers go from the endpoint buffer to the DMA buffer, which will then place the packet on the USB.
So the next problem I have to tackle is simultaneous IN and OUT packets to and from my device. The goal is to have the PC host send samples to the audio device, which go to a DAC chip and drive an output. The device then samples a sensor with an ADC chip and sends that sample back to the PC host.
Is this going to be a game of taking turns between IN and OUT transactions, i.e. time multiplexing? I'm a bit concerned, because currently I can only afford 2 transactions every 1 ms frame, and until I deal with the DMA buffer size I have a limit of 512 bytes per packet. If I could somehow have two high-speed endpoints with bInterval = 3 (one transaction every 4 microframes), but with the IN and OUT periods offset from each other, that would be ideal. Not sure if I can actually do this, though. For example, if I implement a microframe number check and send IN transactions to the host on microframes 0 and 4, and receive OUT transactions from the host on microframes 2 and 6, will both of these pipes be valid?
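To make the microframe check concrete, here is a minimal sketch of the arithmetic in C. It assumes the firmware can read the current microframe number (0 to 7 within each 1 ms frame). All function names are hypothetical and not part of any vendor USB stack. Note that the host controller, not the device, decides in which microframe each transaction is actually issued, so the offsets below describe the schedule I am hoping for rather than something the device can enforce.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; not a vendor API. */

/* Period of a high-speed periodic endpoint in microframes: 2^(bInterval-1).
   bInterval = 3 gives 4 microframes, i.e. 2 transactions per 1 ms frame. */
static inline uint32_t hs_period_uframes(uint8_t bInterval)
{
    assert(bInterval >= 1 && bInterval <= 16);
    return 1u << (bInterval - 1);
}

/* True when microframe `uframe` lands on a slot with the given period
   and offset (both in microframes). */
static inline bool uframe_is_slot(uint8_t uframe, uint32_t period, uint32_t offset)
{
    return (uframe % period) == (offset % period);
}

/* Assumed layout from the question: IN on microframes 0 and 4,
   OUT on microframes 2 and 6, both with bInterval = 3. */
static inline bool is_in_slot(uint8_t uframe)
{
    return uframe_is_slot(uframe, hs_period_uframes(3), 0);
}

static inline bool is_out_slot(uint8_t uframe)
{
    return uframe_is_slot(uframe, hs_period_uframes(3), 2);
}

/* Payload throughput in bytes per second for one pipe, given the number of
   transactions per 1 ms frame and the bytes carried by each packet. */
static inline uint32_t bytes_per_second(uint32_t transactions_per_frame,
                                        uint32_t bytes_per_packet)
{
    return transactions_per_frame * bytes_per_packet * 1000u;
}
```

With the current limits of 2 transactions per frame and 512 bytes per packet, `bytes_per_second(2, 512)` works out to 1,024,000 bytes per second for a pipe, which is the budget any sample rate and sample width would have to fit into.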