So the device can present either of two device descriptors, each with one configuration. Device A's configuration has only a mass-storage interface. Device B's configuration has both mass-storage and CDC interfaces.
It looks like the host enumerates device A, sends something over the mass-storage interface (perhaps a write to the drive) that tells the device to enumerate as device B next time, and then causes the device to re-enumerate. To do the same from your host, you need to capture, decode, and duplicate the communications with device A, or obtain documentation on the switching procedure from the device vendor.
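To make the sequence concrete, here is a small Python model of the behavior described above. It is only a simulation and talks to no hardware. The trigger bytes in `SWITCH_COMMAND`, the class `DualIdentityDevice`, and the function `host_switch_to_b` are all made up for illustration; the real trigger is vendor-specific and has to come from a bus capture or the vendor's documentation. The only real values are the USB class codes (0x08 for mass storage, 0x02 for communications/CDC).

```python
from dataclasses import dataclass

# USB-IF class codes (these are real values).
CLASS_MASS_STORAGE = 0x08
CLASS_CDC = 0x02  # Communications class, used by the CDC control interface


@dataclass(frozen=True)
class Identity:
    """One device descriptor with its single configuration's interfaces."""
    name: str
    interface_classes: tuple


DEVICE_A = Identity("A", (CLASS_MASS_STORAGE,))
DEVICE_B = Identity("B", (CLASS_MASS_STORAGE, CLASS_CDC))

# Hypothetical trigger. The real bytes are vendor-specific and unknown here.
SWITCH_COMMAND = b"SWITCH-TO-B"


class DualIdentityDevice:
    """Simulated device that decides at enumeration which identity to show."""

    def __init__(self):
        self._next_identity = DEVICE_A  # what the next enumeration reports
        self.current = None             # None until the host enumerates

    def enumerate(self):
        """Simulate attach or bus reset: report the pending identity."""
        self.current = self._next_identity
        return self.current

    def mass_storage_write(self, data):
        """Simulate a host write over the mass-storage interface."""
        if self.current is None:
            raise RuntimeError("device has not been enumerated")
        if data == SWITCH_COMMAND:
            # Takes effect only at the next enumeration.
            self._next_identity = DEVICE_B


def host_switch_to_b(device):
    """Host side: enumerate, send the trigger if needed, re-enumerate."""
    identity = device.enumerate()
    if CLASS_CDC in identity.interface_classes:
        return identity  # already device B, nothing to do
    device.mass_storage_write(SWITCH_COMMAND)
    return device.enumerate()


if __name__ == "__main__":
    dev = DualIdentityDevice()
    final = host_switch_to_b(dev)
    print("device", final.name, [hex(c) for c in final.interface_classes])
```

On real hardware the write would be a SCSI command or a file write carried over the mass-storage interface, and the re-enumeration would be a port reset or a disconnect and reconnect initiated by the device.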
Jan