PORTS Forum

Ports and Interfaces => USB => Topic started by: guscrown on February 08, 2014, 02:39:49 pm

Title: Debugging new USB Hub design and suspicious device driver.
Post by: guscrown on February 08, 2014, 02:39:49 pm
EE here. I was tasked with building a USB Hub onto one of our boards, and I chose the TI part TUSB2077A. Everything seems to work properly except when I disconnet one of our devices, but it doesn't happen all the time; I can safely say that it happens about 5% of the disconnections.

My initial assumption was that it had to do with improper layout, ESD, surge or any other electrical problem you can think up. I got all my toys out and started measuring, scoping, cutting traces, adding components, removing components, but no matter how much I did, I would always be able to duplicate the problem with the same frequency of occurrence.

I decided to sniff around the USB traffic and see what I could find, but device drivers are completely out of my expertise and I am unable to interpret the logs. I have been googling around many of the things I see, and they differences in the two logs are more than obvious, but I am still unable to determine if this is a faulty device driver, a faulty Windows usbhub.sys driver, or an error within the TI chip itself. Out of all the possibilities, I believe the fault device driver is the more likely, but before I contact my device provider, I wanted to understand better what is going on.

Can any USB expert out there take a look at these logs and tell me what you can interpret from them? I captured the full USB stack, so there are a lot of Bulk transfer messages, but if you scroll down about 1.8 seconds in you will see the magic happen.

https://www.dropbox.com/s/epdhsf9lcox5ci7/Full%20Stack%20Logs.zip


MORE DETAILS

The device in question is a custom made thermal ticket reader that uses the Thesycon usbio generic USB driver. I did not design this device, nor the device driver. It has been in use by our company for about 10 years, but this would be the first time it is connected to a USB Hub, before it was always connected to the Root Hub in a Host PC running Windows XP. I only designed the USB Hub, using the above mentioned TI Part.

The problem:

When unpluggin the device from the Hub, the hub will reset and re-enumerate itself and therefor the devices attached to it. I have checked the reset line in the hub and it is not asserted, nor is VCC dropped by any level that would cause an electrical malfunction.

I have also checked for ESD, Surge, over current, over voltage and undervoltage problems and have found nothing. I have protection devices on my board, as well as EMI filters for the data pairs. All other devices work normally, and if I do not unplug the device, all my software tools, and our commercial software operates without faults. It is only when you unplug (about 5% of the unplugs) that we get this fault.

I can see from the logs that the "Host" is sending the "RESET_PORT" command/call (or whatever is called) after it tried to SYNC_RESET_PIPE_AND_CLEAR_STALL with unseccesful results. I notice from the logs that this call is not made on the OK log, and simply calls for a device removal. The RESET_PORT call is issues on the HUB, which makes me think that the issues arises on the device driver.

The HUB IC is a standard TI part, and is already in mass production. My schematic follows their guidelines and our board layout was made by a design house in South California well aware of the critical importance of good practice for USB design.

Take a look at the following section of the 2 logs:

good disconnection:

(http://i.imgur.com/TpNLPdZ.png)

and the bad disconnection:

(http://i.imgur.com/VdbCrox.png)

The 10th column from left to right is the Endpoint column. On the good disconnection, the bulk transfer of 64 bytes has an Enndpoint address of 01:00:82, that is the Device's IN endpoint address. But on the bad disconnection, this is sent to FF:FF:8F, and I don't really dont know who or what that is.

My question basically is: Who is messing this up? The TI Hub? The Host? The device driver?

I did not design the device nor the device driver, I did the hardware design of the USB Hub using the TI part, and this prototype is working on a PCB we had manufactured. I would like to add that all other devices work fine, and I can hot-plug them with no issues, on this device is giving us this problem.

The thing about this device is that it has two stages. Stage one it identifies it the host as a bcdUSB 0x0110 device with one end point. One our application is loaded, the application loads the firmware onto the device on-the-fly and the device is re-enumerated, but this time it fails the WHQL by using a bcdUSB of 0x0101, and it now posses two endpoints, one IN and one OUT, in the logs you will probably see the addresses as 0x02 and 0x82, that is the device.

According to TI, they believe that the device driver is not handling the addressing topology of a USB Hub correctly, but I am unfamiliar with device driver design and do not know if this could be the actual problem.

Edit:

The other thing that I noticed is that after that last Bulk Transfer with 64 bytes on the GOOD disconnection, and 1024 bytes on the BAD disconnection that has different end points, there are 6 Internal USB get port status request on the GOOD disconnection, one of them is on the USB hub, but on the BAD disconnection no request is made on the hub whatsoever.
Title: Re: Debugging new USB Hub design and suspicious device driver.
Post by: Jan Axelson on February 08, 2014, 05:22:20 pm
I may have other comments later, but for now, is the device firmware returning bcdUSB 0x0101 on purpose or is it unknown why you are seeing that value:

>Stage one it identifies it the host as a bcdUSB 0x0110 device with one end point. One our application is loaded, the application loads the firmware onto the device on-the-fly and the device is re-enumerated, but this time it fails the WHQL by using a bcdUSB of 0x0101

There is no USB version 1.01; the device should return 0x0110 (or 0x0200).
Title: Re: Debugging new USB Hub design and suspicious device driver.
Post by: guscrown on February 08, 2014, 06:03:47 pm
Jan,

I do not know why the device returns that value. It has been working on two more of our products for about 10 years, and at this time and stage I do not know if that was a design decision or simply an error. It was introduced before I joined the company.

It should be noted that this is the first time the device will be connected to a hub, so we were unaware of this potential issue.
Title: Re: Debugging new USB Hub design and suspicious device driver.
Post by: guscrown on February 08, 2014, 07:29:44 pm
The HUB has a pre-built state machine with all the USB hub stack in it, so no firmware is necessary on it. It is basically a drop-in solution.

I am going to make this very uneducated guess:

On the good disconnection, we see the last bulk transfer using the correct IN endpoint for the reader 0x82:

(http://i.imgur.com/TpNLPdZ.png)

whilst on the bad disconnection, we see this strange C:I:E values FF:FF:8F

(http://i.imgur.com/VdbCrox.png)

I am going to venture and say this actually crashes the USB Hub, hence we don't see a Get Port Status on the Hub, and later on we see the host trying to clear a stall on the hub:

(http://i.imgur.com/tnqxA2m.png)
Title: Re: Debugging new USB Hub design and suspicious device driver.
Post by: Jan Axelson on February 09, 2014, 12:37:04 pm
Is this an embedded hub, and if so is the disconnect a soft (firmware-controlled) disconnect for re-enumerating rather than physically removing the plug?

If not, do you see this problem with other hubs?

A hardware analyzer would show more detail about what is happening on the bus, but I'm guessing you don't have one available to you.
Title: Re: Debugging new USB Hub design and suspicious device driver.
Post by: Jan Axelson on February 09, 2014, 12:47:46 pm
Also see

http://www.lvr.com/forum/index.php?topic=802.msg3170
Title: Re: Debugging new USB Hub design and suspicious device driver.
Post by: guscrown on February 09, 2014, 12:56:25 pm
Jan,

Yes, this is an embedded hub built into one of our boards, we used the TUSB2077A TI part, and their reference schematic. The reset is a soft reset, electrically nothing is happening that would cause the IC to reset, so I am left to assume it is something to do with the OS, or device driver.

You are correct, I do not posses a hardware analyzer unfortunately. I will see if I can convince my manager to purchase the Beagle USB analyzer.
Title: Re: Debugging new USB Hub design and suspicious device driver.
Post by: tsybezoff on February 10, 2014, 07:06:33 am
I guess this problem deal with TI HUB driver. I'm so doubt that Windows Driver doesn't work correctly. I worked with TI usb controllers (Luminary's seria) and always have problem with hot plugged, 'cause their developers have bugs at power functions. In due time I forced TI support to change usb power functions and develop usb hub driver for LM96xxx.
Title: Re: Debugging new USB Hub design and suspicious device driver.
Post by: guscrown on February 10, 2014, 01:54:10 pm
Jan,

I managed to get my hands on a Ellisys USB Analyzer. I am attaching the 2 logs that I took. Maybe this will allow you to look better at what is happening. I am attaching the original ufo files used by their Visual USB Analyzing tool.

0001.ufo = Good disconnection
0002.ufo = bad disconnection

https://www.dropbox.com/s/guakbe2dxemf3hu/Hardware%20Analyzer%20Logs.zip
Title: Re: Debugging new USB Hub design and suspicious device driver.
Post by: guscrown on February 10, 2014, 03:52:10 pm
Things that I have noticed:

On the "good" disconnection, the Incomplete IN transaction has an interface value of "0", whilst on the "bad" disconnection, the Incomplete IN Transaction has no "interface" value. Does this mean anything to you?

On the good disconnection I can see 6 "Incomplete IN Transactions attempts", and this is repeatable with all "good" disconnections. While on the "bad" disconnection I can only see one attempt, and then immediately go into the reboot process.
Title: Re: Debugging new USB Hub design and suspicious device driver.
Post by: Jan Axelson on February 10, 2014, 06:10:43 pm
I have no answers, but here is what I see, which I think agrees with what you've reported:

Device address 2 is the device that attaches to the hub and device address 1 is the hub.

On the good disconnect, at 3.756, the host sends six IN tokens to the device with no response, then quits attempting to access the device and requests the status of port 4 from the hub. The hub reports that no device is present on port 4. The host makes no more attempts to access the device.

On the bad disconnect, the host sends one IN token to the device with no response, then resets the hub's upstream bus segment. A host can do whatever it wants, but a single incomplete transaction is a minor event that normally wouldn't lead to resetting the hub.

When you emulate disconnect, you're switching out the pull-up on D+ (or D- if low speed) from the device? Or is the device high speed? And this has worked fine when attached to a root hub?

Re your comment - IN token packets specify a device address and endpoint number but not an interface number (not needed). I'm not sure where you're seeing an interface number.






Title: Re: Debugging new USB Hub design and suspicious device driver.
Post by: guscrown on February 10, 2014, 06:20:50 pm
Jan,

I was using the interface number provided by Ellisys Visual USB app, it is column 4.

The device is a full speed device. I am not emulating a disconnection, I am physically disconnecting the cable.

This command to reset the Hub, when you say "the host", is that actually the USB driver on the OS? or would that be the device driver for the device running on the OS?
Title: Re: Debugging new USB Hub design and suspicious device driver.
Post by: guscrown on February 10, 2014, 08:46:06 pm
It is incredible how after this much analysis I am still nowhere near understanding what is going on. The USB Protocol is in fact too complex.  ???

Today I tried with some off-the-shelve hubs and they all showed a similar issue: They cannot load the firmware to the device after a reconnection, I do not know if they enumerate incorectly or what, but I do not have the time to debug so many hubs at the same time.

The other thing I noticed, my hub and the off-the-shelve hubs, occasionally will give a BSoD when reconnecting the device, or a little after doing it. The usual culprit is usuhub.sys. I should note that Windows 7 never gave me any BSoD, this only happens in Win XP; which is very odd, since the device driver was designed for Windows XP.

As of now I have several hypothesis:

1.- The device driver. Since the problem only happens when the device enumerates with the secondary firmware, I can assume that the problem resides in the device. I still haven't been able to duplicate the problem if I uninstall the device driver from the OS.

2.- The hub: According to the USB spec, USB hubs should be able to handle any incomplete, or erroneous data packages without crashing. Based on the hardware logs nothing is happening on the bus with that device, just a bunch of IN tokens and NAKs. A disconnection should cause no issue.

I guess tomorrow will be another day of logging and trying to see if I can find something I haven't seen. I will also try to get those other hubs to reload the firmware so I can actually do a disconnection test with results that would be of help determining of all hubs behave the same way.
Title: Re: Debugging new USB Hub design and suspicious device driver.
Post by: Jan Axelson on February 10, 2014, 08:50:51 pm
In the Ellisys captures you linked to, I don't see anything in the interface column in either capture.

I had thought the hub and device were on the same PC board so yes, testing with conventional external hubs is a good thing to try.

Do the two different firmwares use different Product IDs in the device descriptor? Posting the descriptor sets might be helpful.
Title: Re: Debugging new USB Hub design and suspicious device driver.
Post by: guscrown on February 10, 2014, 09:01:29 pm
Jan,

I believe they use the same PID and VID, but they have different descriptors. The first stage has 1 endpoint, I guess it is to wait for the software to load the firmware. The secondary stage has two endpoints, I guess this is because the reader has a built in thermal printer for branding tickets. I will post all descriptors tomorrow morning as soon as I get to the office. For now I can upload the secondary firmware descriptors:

https://www.dropbox.com/s/r7hfi30puuy0ncn/OMR%20Reader%20Enumeration%2014.html

Sorry if I wasn't clear, I guess it is difficult to explain over the internet. The hub I designed and the PC do not reside on the same board. The PC is an off-the-shelve all in one PC that we use for several products. From there I drop 1 USB to the upstream of my hub board, which also provides power to several of our peripherals (24V, 12V, 5V). This board has been operational for about 1 year before the addition of the hub circuit, it is FCC certified and everything. I have checked the power line quality and I haven't seen any real source of noise. Of course there's always the chance of EMI being a potential problem, but in this case I do not see it as a clear reason for the problems I am facing.
Title: Re: Debugging new USB Hub design and suspicious device driver.
Post by: guscrown on February 11, 2014, 10:56:47 am
Jan,

This is the enumeration file with no firmware loaded onto the the Reader:

https://www.dropbox.com/s/jlftvic7xg8g5rw/Enumeration%20No%20Firmware.html

Before launching our application that is the state of the reader. Once our application starts, it loads the firmware only the reader and the descriptors change to the ones I attached on my previous post.

I still have not been able to get a failure when the reader is on Stage 1, but I am trying this morning some cycles to confirm this.
Title: Re: Debugging new USB Hub design and suspicious device driver.
Post by: Jan Axelson on February 11, 2014, 12:04:50 pm
Windows caches descriptors and thus doesn't like it/gets confused if a device re-enumerates with different descriptors but using the same VID and PID.
Title: Re: Debugging new USB Hub design and suspicious device driver.
Post by: guscrown on February 11, 2014, 12:45:34 pm
That makes sense. But makes me wonder why the Root Hub has never crashed, but external hubs do. This device has been in use for about 10 years or so, but always connected to a root hub on the PC.
Title: Re: Debugging new USB Hub design and suspicious device driver.
Post by: guscrown on February 11, 2014, 03:05:51 pm
Alright, this is what I have this morning. I believe I have a much better grip on log taking.

From the following log:

This my very small understanding of what is going on:

USBPDO-12 = The Device (it is using the Thesycon usbio.sys driver)
USBPDO-10 = The Hub (it uses the usbhub.sys driver)
USBPDO-4 = The Root Hub
0000006d = The Hub

(http://i.imgur.com/WL24Y6t.png)

According to TI, the first two Bulk Transfers you see are OK, and they are 64bytes because it is a Full Speed device. Then I can see some Get Port Status on Port 4, which is where the device is located. I do not know what the Raw data on Seq 0005-0006 is "00 01 01 00".

Full log is here:

https://www.dropbox.com/s/zms8bt20dcmnf3x/OK%20Disconnect%20Firmware%20Loaded.html

Now onto the BAD log:

(http://i.imgur.com/0JqivYc.png)

According to TI, the top two lines after START are incorrect, because the Bulk or Interrupt Transfer of 1024 bytes is for High Speed devices, like I said before, according to them 64 bytes is for Full Speed devices. But I have been unable to confirm this statement.

After that the log gets really crazy, and I am unable to follow it.

What I know so far:

1.- The Hardware analyzer didn't show anything stranger other than the HOST doing a reset immediately after ONE incomplete transaction, whereas in the good disconnection it always does SIX.
2.- I still suspect the device driver, but I am unsure if a device driver has the authority to reset the Upstream of a USB Hub, and why would it do that?

Full bad log here:

https://www.dropbox.com/s/pqm73w9dfquv9dt/NG%20Disconnect%20Firmware%20Loaded.html

Jan,

I am sorry if I have been such a bother, you have been of great help. I believe these last logs should more evidence of what is going on. If you can/want, please take a look at them.
Title: Re: Debugging new USB Hub design and suspicious device driver.
Post by: Barry Twycross on February 11, 2014, 06:41:04 pm
According to TI, the top two lines after START are incorrect, because the Bulk or Interrupt Transfer of 1024 bytes is for High Speed devices, like I said before, according to them 64 bytes is for Full Speed devices. But I have been unable to confirm this statement.
You can partly confirm it in the spec. Chapter 5 gives an overview of "packet size constraints" for the various transfer types.

1024 bytes is not a valid packet in the USB 1.x spec, or in the 2.x spec for full speed transactions. The largest packet full speed allows is 1023 for Isoc or Interrupt. Bulk is limited to 64 bytes.
Title: Re: Debugging new USB Hub design and suspicious device driver.
Post by: Jan Axelson on February 11, 2014, 08:27:15 pm
The Raw Data is the port status, see 11.24.2.7 in the USB 2.0 spec.

Do these logs start just after the disconnect? What happened before the disconnect? Was the host requesting 64-byte packets, then switched to 1024 bytes after disconnect? Or?
Title: Re: Debugging new USB Hub design and suspicious device driver.
Post by: guscrown on February 11, 2014, 08:36:22 pm
Jan,

I start logging and after a few seconds disconnect the cable. Based on the usblyzer logs and the hardware logs, nothing was happening before the disconnect. Just a bunch of NAKs. So according to USBlyzer activity starts on the disconnect.
Title: Re: Debugging new USB Hub design and suspicious device driver.
Post by: Jan Axelson on February 12, 2014, 10:22:10 am
Again I have no answers, just questions that may or may not help isolate the problem.

Do you see any problem when you attach and then remove other devices (flash drive, mouse, etc.) from the hub?

When you remove your device, is it using the bootloader firmware or the firmware loaded later? If it's the latter, what host driver does that firmware use?
Title: Re: Debugging new USB Hub design and suspicious device driver.
Post by: guscrown on February 12, 2014, 11:07:52 am
No other device connected to the hub causes this error.

When you remove your device, is it using the bootloader firmware or the firmware loaded later?

The firmware loaded later. Problem only happens with Firmware on the device.

If it's the latter, what host driver does that firmware use?

They both use the usbio.sys from Thesycon. The same driver file.

I was looking at the logs this morning and I noticed something: After the host asks for those 1024 bytes, which according to the USB Spec 2.0 is incorrect, because this is a Full Speed HUB, and a Full Speed device, I see a Get port status on Port 4 of the hub (where the device use to be), but there is no reply. After that the host gets a port status (2), which is a status on the Hub itself. To which the hub replys: "01 01" -> A device is present and not in powered-off state, and then a "02 00" which means "Disabled because of a port error condition".

Could there be a correlation between the incorrect 1024 byte buffer and the disabling of the port?
Title: Re: Debugging new USB Hub design and suspicious device driver.
Post by: Jan Axelson on February 12, 2014, 12:19:10 pm
Yes. However, it's an IN transaction, with no packet size embedded. The host just sends a token packet with the endpoint address and direction, and the device returns the data. On the hardware analyzer log, the last IN transaction is identical except before the device can return NAK, the bus segment resets. So the problem appears to be upstream from the bus segment. Maybe the hub driver chokes on getting an URB requesting a 1024-byte transaction. (A transaction is one packet on the bus. A transfer can consist of multiple transactions.)

So at this point I would suspect the TheSyCon driver. Have you tried contacting them?

 
Title: Re: Debugging new USB Hub design and suspicious device driver.
Post by: guscrown on February 12, 2014, 01:40:05 pm
Yesterday I sent all this data and conclusions we have gathered to them. I just wanted to be certain that there was no issue with the TI part or my board layout before I threw the ball to their side of the court. Hopefully they can figure something out. As soon as I have a response/solution I will share it with the community.
Title: Re: Debugging new USB Hub design and suspicious device driver.
Post by: Jan Axelson on February 12, 2014, 02:00:29 pm
Good luck!
Title: Re: Debugging new USB Hub design and suspicious device driver.
Post by: guscrown on February 12, 2014, 02:21:45 pm
Jan, I have one question for you.

While waiting for the device vendor to answer I decided to look around with the USBlyzer during different operatios of the device and noticed that when firmware is loaded all of the OUT "Bulk or Interrupt Transfer" have a buffer size of 640 bytes. I see others with size 162 bytes. After the firmware has been loaded the "Bulk or Interrupt Transfers" are not 64 bytes long.

But based on the USB Spec 5.8.3 says that for a full speed device the max bulk data payout sizes are 8, 16, 32, or 64. 512 is reserved for high-speed endpoints.

Am I missing something here?
Title: Re: Debugging new USB Hub design and suspicious device driver.
Post by: Jan Axelson on February 12, 2014, 05:00:15 pm
Transactions are limited to the endpoint's maximum packet size. Transfers can use multiple transactions.

I don't know where USBlyzer is getting the data in the "Request Details" field.  An URB can request a transfer, with a lower-level driver dividing the transfer into transactions. Even so, the 1024-byte URB looks wrong to me in part because the "good" log doesn't have it and also because it immediately precedes the Reset!

http://msdn.microsoft.com/en-us/library/windows/hardware/ff538112%28v=vs.85%29.aspx
Title: Re: Debugging new USB Hub design and suspicious device driver.
Post by: guscrown on February 12, 2014, 05:25:47 pm
I see. Based on the enumeration descriptors, wMaxPacketSize is set to 0x40h, that means 64 bytes. Something fishy is going on. Either their driver is doing something completely wrong, or USBlyzer is messing up the sniffing. Too bad I can't see this on the hardware sniffer.

But it points more towards something USBlyzer does, take for instances the printer that I am using, it's wMaxPacketSize is also set to 64 bytes, but I see a ton of 4096 Bulk or Interrupt Transfer tokens.
Title: Re: Debugging new USB Hub design and suspicious device driver.
Post by: guscrown on February 13, 2014, 07:05:33 pm
Jan,

While I wait for the device desginer to reply to my questions I want to continue testing my USB 1.1 Full Speed Hub. Any recommended functionality test that I can perform? Most of the USB-IF stuff I see on usb.org is for highspeed devicex and hubs, which do not apply to me. In this spin of the board I am not planning on sending it to pre-cert testing, I will wait one more prototype as I can improve somewhat on the layout.

But functionality testing outside of just testing my devices work would be ideal. What do you recommend?
Title: Re: Debugging new USB Hub design and suspicious device driver.
Post by: Jan Axelson on February 13, 2014, 08:50:34 pm
There is nothing wrong with a request for a 4096-byte bulk transfer that, for example, an application sends to a printer driver. Further down the stack, the host controller schedules the transfer in multiple transactions of 64 bytes each.
Title: Re: Debugging new USB Hub design and suspicious device driver.
Post by: Jan Axelson on February 15, 2014, 08:22:10 am
The USB-IF's compliance tests are the best tests I know of.

http://www.usb.org/developers/tools/usb20_tools/