I'm not sure if this will help or not, but here goes. Remember that USB is a master/slave architecture. The host is the master, and the device (in this case, yours) is the slave. You must do what the host tells you to do, when the host tells you to do it. Period.
As Jan alluded to earlier, when the host sends you a packet requesting something, you must respond immediately with something, even if it is simply a NAK. The host will only send one packet at a time down the bus. The packet could be an interrupt packet, one of the control packets (control transactions always involve more than one packet), or something else. The host cannot send you a control packet and an interrupt packet at the same time. It is a serial bus, so it only sends one bit at a time.
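To make the "always answer, even with a NAK" rule concrete, here is a minimal sketch in Python. It is only a model of the idea, not real firmware: the `Endpoint` class, `queue_data`, and `on_in_token` are names I made up for illustration, and a real device controller does this in hardware or in an interrupt handler.

```python
from collections import deque


class Endpoint:
    """Hypothetical model of one device IN endpoint (not a real USB stack)."""

    def __init__(self):
        # Data the application has prepared for the host, oldest first.
        self.tx_queue = deque()

    def queue_data(self, payload: bytes) -> None:
        """Application side: make a payload available for the next IN request."""
        self.tx_queue.append(payload)

    def on_in_token(self):
        """Host asked for data: answer right away, with data or with a NAK."""
        if self.tx_queue:
            return ("DATA", self.tx_queue.popleft())
        # Nothing ready yet. Staying silent is not an option, so send a NAK.
        return ("NAK", b"")


ep = Endpoint()
first = ep.on_in_token()        # nothing queued yet, so the device NAKs
ep.queue_data(b"\x01\x02")
second = ep.on_in_token()       # data is ready now, so the device sends it
third = ep.on_in_token()        # queue is empty again, back to NAK
```

The point of the sketch is that `on_in_token` never blocks and never waits for the application: every request gets an answer on the spot, and the host simply asks again later if it got a NAK.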
Also, the exact timing of each packet is not fixed. A packet can be sent to you at any time within a frame (frames are 1 ms long in USB 1.x). You can get it at the beginning, middle, or end of the frame, and you have to respond appropriately and quickly no matter what. The host decides exactly which packets get sent when inside each frame.
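If it helps to picture the timing, here is a toy simulation in Python. It is a sketch under my own assumptions, not how any real host controller schedules traffic: `schedule_frame` is a made-up function, the offsets are random only to illustrate that the device cannot predict them, and the 1 µs resolution is arbitrary.

```python
import random

FRAME_US = 1000  # one USB 1.x frame is 1 ms, expressed here in microseconds


def schedule_frame(frame_no, pending, rng):
    """Hypothetical host scheduler: place each pending transaction at an
    arbitrary, distinct point inside the given frame."""
    # Distinct offsets model the serial bus: only one packet at a time.
    offsets = sorted(rng.sample(range(FRAME_US), len(pending)))
    return [(frame_no * FRAME_US + off, name)
            for off, name in zip(offsets, pending)]


rng = random.Random(0)  # seeded only so that runs are repeatable
timeline = []
for frame in range(3):
    timeline += schedule_frame(frame, ["interrupt IN", "control SETUP"], rng)

for time_us, name in timeline:
    print(f"t={time_us:5d} us  frame {time_us // FRAME_US}: {name}")
```

Each frame contains the same two transactions, but their position inside the frame changes from one frame to the next. That is the situation the device is in: it knows roughly how often it will be asked, but not when within the frame.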