Author Topic: Bug in host using HID on USB 3.0 xHCI?

grantb5

  • Member
  • ***
  • Posts: 34
Bug in host using HID on USB 3.0 xHCI?
« on: January 16, 2015, 12:07:31 pm »
I know, crazy right? Well, I've stumbled on a peculiar set of circumstances in which a device I've had in the field for many years causes a new PC (Dell Optiplex 3020, Win7 Pro) to HANG on reboot. Confirmed in-house and in the field, maybe also on a Lenovo, and Tsuneo confirmed it on his ASUS PC as well. (I know, name dropper.) If you have the following conditions you can get it to hang ... in my case the host controller is the Intel eXtensible Host Controller.

  • You have an HID device, and in my testing it can be LS or FS
  • You have a bInterval on EP1 IN that is less than 4ms
  • You've decided that all the data on EP1 IN is important, every transaction.
What I mean by that is, when the host sends the Set Idle (indefinite) you go "thanks for asking, but I'd like you to accept data from me for every IN transaction you issue".  Then when the host reboots it gets stuck with a BLACK screen, prior to "Windows Starting", waiting for a quiet period (NAKing) of more than 1.5 seconds on EP1 IN before it allows the boot to proceed. The whole time the host is asking for and receiving reports on EP1 IN. But to the outside world, black screen of doom.

Try it... set up your HID device so that it never NAKs on EP1 IN (give it a meaningful packet of course), set bInterval to 2ms, plug into your xHCI USB 3.0 port and Start Menu->Restart.
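For anyone trying to reproduce this, a minimal sketch of the EP1 IN endpoint descriptor described above might look like the following. The 8-byte wMaxPacketSize and the names `ep1_in_descriptor` and `ep_interval_ms` are assumptions for illustration; only the endpoint (EP1 IN, interrupt) and the 2ms bInterval come from the post.

```c
#include <assert.h>
#include <stdint.h>

/* Standard 7-byte USB endpoint descriptor for EP1 IN (interrupt).
 * wMaxPacketSize of 8 is an assumption; the thread does not state it. */
static const uint8_t ep1_in_descriptor[7] = {
    0x07,       /* bLength */
    0x05,       /* bDescriptorType: ENDPOINT */
    0x81,       /* bEndpointAddress: endpoint 1, direction IN */
    0x03,       /* bmAttributes: interrupt transfer */
    0x08, 0x00, /* wMaxPacketSize: 8 bytes, little-endian (assumed) */
    0x02        /* bInterval: 2 ms at low/full speed */
};

/* Polling interval in ms that a low/full-speed host reads from the descriptor. */
static uint8_t ep_interval_ms(const uint8_t desc[7])
{
    assert(desc[0] == 7 && desc[1] == 0x05); /* must be an endpoint descriptor */
    return desc[6];
}
```

The other half of the recipe is in the firmware rather than the descriptor: the endpoint has to answer every IN token with data instead of a NAK.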

My device is not a keyboard or mouse and I don't think I'm in violation of the spirit of the spec. The closest analogy I can think of is that my device is something like an RS232 serial cable replacement. And to rub salt in the wound, if I configure my device to actually be CDC, with the control line status reported on EP1 IN at 2ms, there is no problem. And that's because the CDC spec doesn't have Set Idle.

OK, go.

« Last Edit: January 21, 2015, 05:27:45 pm by grantb5 »

grantb5

Re: Bug in host using HID on USB 3.0 xHCI
« Reply #1 on: January 16, 2015, 12:25:18 pm »
I should also add that if you are in fact an HID keyboard with a second IN EP set up to return the consumer control/multimedia keys, then that second IN EP too must be silenced for an extended period after the Set Idle (all reports) command before the host will successfully reboot.

In the test I did, after the device receives 21 0A 00 00 00 00 00 00, the host performs a few INs on EP1 and, seeing the NAKs, seems happy. Then it later issues 21 0A 00 00 01 00 00 00 and proceeds to send INs on EP1 and EP2.

So, a question, since I've never seen a proper example of this in any of the manufacturer App Notes (so far): does the wIndex=0000 in the first command mean ALL reports or just EP1 IN? And in the second one, does wIndex=0001 mean the first or the second?

 

grantb5

Re: Bug in host using HID on USB 3.0 xHCI
« Reply #2 on: January 16, 2015, 05:53:12 pm »
To answer my own question, I also have a USB keyboard design with standard EP1 IN and the consumer control stuff on EP2 IN. In that case the host asks me to Idle EP1 with wIndex LSB of 00 and it also asks me separately to Idle EP2 IN with wIndex LSB of 02. That's my interpretation anyway.

And FWIW, I tried STALLing these commands from the host and that did not help (yes with Boot Protocol "off").

Jan Axelson

  • Administrator
  • Frequent Contributor
  • *****
  • Posts: 3033
    • Lakeview Research
Re: Bug in host using HID on USB 3.0 xHCI
« Reply #3 on: January 16, 2015, 06:07:56 pm »
Interesting.

Does the endpoint return STALL in response to Set_Idle to indicate non-support of the request?

In a Set_Idle request, wIndex identifies the interface the request applies to.
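To make the field layout concrete, here is a rough sketch of decoding the two SETUP packets quoted earlier in the thread. The struct and function names are made up for illustration; the field meanings follow the HID class definition of Set_Idle (wValue high byte is the duration in 4 ms units, wValue low byte is the report ID, wIndex is the interface number).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HID_REQ_SET_IDLE 0x0A /* bRequest value for Set_Idle */

/* Decoded fields of a HID Set_Idle request (hypothetical helper type). */
struct set_idle {
    uint8_t  report_id; /* wValue low byte; 0 = applies to all reports */
    uint8_t  duration;  /* wValue high byte, units of 4 ms; 0 = indefinite */
    uint16_t interface; /* wIndex: bInterfaceNumber, not an endpoint number */
};

/* Parse an 8-byte SETUP packet; returns true only if it is a Set_Idle. */
static bool parse_set_idle(const uint8_t setup[8], struct set_idle *out)
{
    assert(setup != NULL && out != NULL);
    /* 0x21 = host-to-device, class request, recipient is an interface */
    if (setup[0] != 0x21 || setup[1] != HID_REQ_SET_IDLE)
        return false;
    out->report_id = setup[2];
    out->duration  = setup[3];
    out->interface = (uint16_t)(setup[4] | (setup[5] << 8));
    return true;
}
```

Under this reading, 21 0A 00 00 00 00 00 00 is "idle indefinitely, all reports, interface 0" and 21 0A 00 00 01 00 00 00 is the same request addressed to interface 1.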

grantb5

Re: Bug in host using HID on USB 3.0 xHCI
« Reply #4 on: January 18, 2015, 10:56:23 am »

I tried it both ways ... STALLing or not. Didn't seem to make a difference.

Yeah, I was just unclear about what "interface" was referring to. When I configure my device to be a vanilla keyboard or a more generic HID with one IN EP and one report, the host just sends the one Set Idle with wIndex = 0. When I configure it to be a keyboard that supports the multimedia keys and has a second report on a 2nd IN EP, it sends two Set Idles ... one with wIndex = 0 and one with wIndex = 1. Anyway, if I don't NAK for the better part of 2 seconds following that Set Idle (regardless of whether I accept or STALL it), then the PC hangs completely. That doesn't sound like a very graceful failure mode if you ask me.

Jan Axelson

Re: Bug in host using HID on USB 3.0 xHCI
« Reply #5 on: January 18, 2015, 08:13:26 pm »
No, that sounds like a failure on the host's part.

In a USB device's descriptors, every bInterfaceNumber in an interface descriptor defines an interface.

Barry Twycross

  • Frequent Contributor
  • ****
  • Posts: 263
Re: Bug in host using HID on USB 3.0 xHCI
« Reply #6 on: January 19, 2015, 12:49:10 am »
Not sure I can help, but I'll mention a 4ms LS Interrupt endpoint is against the spec. Attempts at doing that have broken workarounds for previous controllers. However if it also fails at FS, that's probably not the problem.

grantb5

Re: Bug in host using HID on USB 3.0 xHCI?
« Reply #7 on: January 19, 2015, 09:17:26 am »
Thanks, I had always thought 10ms was the fastest you could go for LS, but couldn't find it in the docs. It happens at FS and LS anyway. What happened is we ported some FS 2ms bInterval code to LS and noticed that this host still polled it at that rate. It's not in the field like this. For now we've changed the LS to 10ms and the FS to 5ms, though it still hangs now and then (say 1 in 30 instead of every time). Since the host only sends the Set Idle at enumeration time, if I start NAKing following that for at least ~1.6s then it boots. We are testing that now, as these "fixes" may allow us to get past this issue. We have thousands and thousands of these out there.
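A minimal sketch of the "NAK for ~1.6s after Set Idle" idea, assuming the firmware has a free-running millisecond tick. The names `quiet_timer`, `quiet_on_set_idle`, `quiet_must_nak` and the `QUIET_MS` constant are hypothetical, and the 1600 ms value is just the rough figure from the post, not a documented threshold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUIET_MS 1600u /* "~1.6s" from the post; the real threshold is unknown */

struct quiet_timer {
    bool     armed;    /* true while the quiet window is running */
    uint32_t start_ms; /* tick value when Set Idle was received */
};

/* Call from the control-request handler when Set Idle arrives. */
static void quiet_on_set_idle(struct quiet_timer *t, uint32_t now_ms)
{
    assert(t != NULL);
    t->armed = true;
    t->start_ms = now_ms;
}

/* True while the IN endpoint should NAK instead of returning a report.
 * Unsigned subtraction keeps the comparison correct across tick wraparound. */
static bool quiet_must_nak(struct quiet_timer *t, uint32_t now_ms)
{
    assert(t != NULL);
    if (t->armed && (uint32_t)(now_ms - t->start_ms) >= QUIET_MS)
        t->armed = false;
    return t->armed;
}
```

Each Set Idle restarts the window, so a device with two HID interfaces stays quiet until ~1.6s after the last one.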

And BTW, I noticed some similar complaints from game controller users on the web where they had to unplug their controller to get their PC to boot on USB3. Anyway, all input appreciated, as we are able to make changes to our firmware at this time.

GB

Barry Twycross

Re: Bug in host using HID on USB 3.0 xHCI?
« Reply #8 on: January 19, 2015, 06:07:53 pm »
Quote from grantb5: "Thanks, I had always thought 10ms was the fastest you could go for LS, but couldn't find it in the docs."

USB 2.0 spec, section 5.7.4: "Low-speed endpoints are limited to specifying only 10 ms to 255 ms."
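As a rough illustration, a firmware build or descriptor-lint tool could check bInterval against those limits. The function and enum names below are made up; the low-speed range is the one quoted above, and the full-speed range of 1 ms to 255 ms is from the same section of the USB 2.0 spec.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum usb_speed { USB_SPEED_LOW, USB_SPEED_FULL };

/* True if bInterval (in ms) is a legal period for an interrupt endpoint
 * at the given speed. High speed uses a different encoding and is not
 * covered by this sketch. */
static bool interrupt_interval_ok(enum usb_speed speed, uint8_t b_interval_ms)
{
    assert(speed == USB_SPEED_LOW || speed == USB_SPEED_FULL);
    /* Low speed: 10..255 ms. Full speed: 1..255 ms. */
    uint8_t min_ms = (speed == USB_SPEED_LOW) ? 10 : 1;
    return b_interval_ms >= min_ms; /* uint8_t already caps the value at 255 */
}
```

By this check the 2ms bInterval from the first post is fine at FS but out of spec at LS.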

grantb5

Re: Bug in host using HID on USB 3.0 xHCI?
« Reply #9 on: January 21, 2015, 05:26:06 pm »
Well, for now, some additional information before I move on to other projects.

First, I've found more Dells with Intel USB 3.0 xHCIs that exhibit the same behaviour: they hang on reboot (only), and I'm pretty sure it's at the BIOS stage, or at least the transition from BIOS to OS. There is one enumeration, then a whole lot of IN transactions (infinite if you want to wait that long).

  • I tried slowing bInterval down as much as I could get away with and it statistically improved the reboot percentage, but curiously not enough.
  • I tried NAKing for a fixed period of time following HID Set Idle and that helped, but again, with enough boots and enough PCs you'd get an eventual hang.
  • I also tried Stalling the HID Set Idle command and that didn't help at all.

So the workaround I finally settled on is to start out NAKing the IN EPs until either the user starts using the device or my host-side software starts talking to it. These are different flavours of the same device, so fortunately I have these fallbacks. No other operating changes. No further reboot hangs.
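The workaround amounts to a simple gate on the interrupt IN endpoints. The sketch below is only an illustration of that idea, not the actual firmware; every name in it (`in_gate` and its functions) is hypothetical, and how "user activity" or "host software talking" is detected depends on the device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Gate for the interrupt IN endpoints: closed (NAK) after every reset or
 * enumeration, opened by the first sign that someone wants the data. */
struct in_gate {
    bool open;
};

/* Call on bus reset / enumeration: start out NAKing. */
static void in_gate_reset(struct in_gate *g)
{
    assert(g != NULL);
    g->open = false;
}

/* Call when the user first operates the device. */
static void in_gate_user_activity(struct in_gate *g)
{
    assert(g != NULL);
    g->open = true;
}

/* Call when the host-side application first talks to the device. */
static void in_gate_host_app_request(struct in_gate *g)
{
    assert(g != NULL);
    g->open = true;
}

/* Called for each IN token: true = send the report, false = NAK. */
static bool in_gate_should_send(const struct in_gate *g, bool report_pending)
{
    assert(g != NULL);
    return g->open && report_pending;
}
```

Because the gate closes again on every reset, the endpoints stay quiet through the whole BIOS-to-OS handover no matter how long the host takes, which is what a fixed delay could not guarantee.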

Surprisingly, another flavour of the device, CDC, can get away with chatting away constantly on an IN EP, but an HID not so much. If I had to guess, I'd say the host controller is waiting for some quiet time on HID IN endpoints before it allows the OS to reboot. And I'm tempted to say they are wrong in doing so, especially since it's such a nasty failure mode.

Jan Axelson

Re: Bug in host using HID on USB 3.0 xHCI?
« Reply #10 on: January 22, 2015, 11:02:28 am »
Very helpful information, especially your workaround, thanks for posting. And yes of course the host should not refuse to boot when an endpoint is responding in a legal way -or even if it's not!

Barry Twycross

Re: Bug in host using HID on USB 3.0 xHCI?
« Reply #11 on: January 22, 2015, 04:11:48 pm »
Quote from Jan Axelson: "And yes of course the host should not refuse to boot when an endpoint is responding in a legal way -or even if it's not!"
The host should not, but it's quite easily done. There was at least one such bug in the host implementation I worked on which would stall the boot like that. It's very annoying when you find them, and quite a priority to fix.

If the problem is in EFI, it's even easier. EFI is quite an awkward system to program in and you don't spend much time developing it or running it. Debugging it is even more difficult.