In my implementation, I saw this as a potentially disastrous scenario and took pains to avoid it. When transactions overlap in time on the same endpoint, there is no way to tell with certainty, in every situation, which transaction (1, 2, ..., n) a given packet belongs to, especially once NAKs are possible. That can cause mass confusion on either the host or the device end.
In my host implementation, I queue all control transactions to endpoint 0 and make sure the previous one is done before the next one starts.
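A minimal sketch of that kind of serialization in C, to make the idea concrete. Everything here is hypothetical and invented for illustration (`ctrl_queue`, `ctrl_request`, `CTRL_QUEUE_DEPTH`, and the function names); it is not taken from any real host stack, and it leaves out the actual bus access, error handling, and locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CTRL_QUEUE_DEPTH 8  /* hypothetical depth, pick what fits your host */

/* Hypothetical request record: just enough to identify a transfer. */
typedef struct {
    int id;
} ctrl_request;

/* FIFO of pending endpoint 0 control transfers, at most one in flight. */
typedef struct {
    ctrl_request slots[CTRL_QUEUE_DEPTH];
    size_t head;     /* index of the oldest queued request */
    size_t count;    /* queued requests, including the in-flight one */
    bool in_flight;  /* true while the head request is on the bus */
} ctrl_queue;

void ctrl_queue_init(ctrl_queue *q)
{
    q->head = 0;
    q->count = 0;
    q->in_flight = false;
}

/* Append a request to the tail. Returns false if the queue is full. */
bool ctrl_queue_submit(ctrl_queue *q, ctrl_request req)
{
    if (q->count == CTRL_QUEUE_DEPTH)
        return false;
    q->slots[(q->head + q->count) % CTRL_QUEUE_DEPTH] = req;
    q->count++;
    return true;
}

/* Return the next request to put on the bus, or NULL if one is already
 * in flight or nothing is queued. This is the serialization point. */
const ctrl_request *ctrl_queue_start_next(ctrl_queue *q)
{
    if (q->in_flight || q->count == 0)
        return NULL;
    q->in_flight = true;
    return &q->slots[q->head];
}

/* Call once the in-flight transfer is completely done (or has failed);
 * only then can the next queued request be started. */
void ctrl_queue_complete(ctrl_queue *q)
{
    assert(q->in_flight);
    q->head = (q->head + 1) % CTRL_QUEUE_DEPTH;
    q->count--;
    q->in_flight = false;
}
```

The host would call `ctrl_queue_start_next()` whenever a request is submitted or a transfer completes. Because it returns NULL while a transfer is in flight, two control transfers to endpoint 0 can never overlap, so every packet belongs to exactly one known transaction.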
I have no idea whether any other host implementation does anything like this.