If you are analysing the traffic by capturing it on the bus between the host and the device, just examine the endpoint field (ENDP) of the token packet to see which endpoint is addressed.
If you are inside the device, you will never see the token packet at all unless you have instructed the SIE (serial interface engine) to be ready to accept a transaction on every endpoint that is likely to be used. These are, of course, the endpoints defined in your descriptors.
To make your original statements more accurate:
- A transaction can have between 2 and 4 packets.
- If specified correctly, a SETUP can appear on any endpoint, because any endpoint can be used for control transfers (though this is not common).