Author Topic: Error writing HID device  (Read 41150 times)

Jan Andersson

  • Member
  • ***
  • Posts: 15
Error writing HID device
« on: February 09, 2011, 10:32:32 am »
Hello folks! (Another Jan here!  ;) )

I have a problem similar to the one discussed here: http://www.lvr.com/forum/index.php?topic=96.0

The main difference is that I am unable to reproduce the error myself. In fact the PC application seems rock steady in communicating with the embedded HID device, while a few customers have reported that they are completely unable to make it work. I have tested it on a number of PCs (low-, mid- and high-end) running Win XP and Win 7, and it still works fine.

The PC application performs a number of operations. Those that only read or write a small number of 64-byte packets work, while the one that writes 300 kByte to the device fails the write with error code 31 (ERROR_GEN_FAILURE on Windows), sometimes immediately, sometimes after a couple of seconds.

So what I would very much like is a suggestion on how to proceed when the debugging seems to have to be done "remotely". I have read that if the device fails to ACK/NAK within a reasonable time it could give errors like this, but the "load" on the device should be the same when I use it as when the customers getting the error use it.
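
For remote debugging of this kind, one option is to have the PC application record exactly which packet fails and with which error code, so a customer can send back a log. The following is only a minimal sketch: `write_report` is a hypothetical callback standing in for whatever the application uses to write one 64-byte report, assumed here to return 0 on success or the Windows error code on failure.

```python
from typing import Callable, List, Optional

REPORT_SIZE = 64        # bytes per packet, as described in the post
ERROR_GEN_FAILURE = 31  # Windows: "A device attached to the system is not functioning."


def send_with_log(data: bytes,
                  write_report: Callable[[bytes], int],
                  log: List[str]) -> Optional[int]:
    """Send data in REPORT_SIZE chunks.

    Returns the zero-based index of the failing report, or None if all
    reports were written. write_report is a hypothetical callback.
    """
    total = (len(data) + REPORT_SIZE - 1) // REPORT_SIZE
    for index in range(total):
        chunk = data[index * REPORT_SIZE:(index + 1) * REPORT_SIZE]
        chunk = chunk.ljust(REPORT_SIZE, b"\x00")  # pad the final report
        error = write_report(chunk)                # 0 means success
        if error != 0:
            log.append(f"report {index + 1}/{total} failed, error {error}")
            return index
    log.append(f"all {total} reports sent")
    return None
```

Knowing whether the failure always occurs at the same report or at a random one helps separate a data-dependent problem from a timing problem.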

« Last Edit: February 09, 2011, 12:56:05 pm by Jan Andersson »

Jan Axelson

  • Administrator
  • Frequent Contributor
  • *****
  • Posts: 3033
    • Lakeview Research
Re: Error writing HID device
« Reply #1 on: February 09, 2011, 12:26:36 pm »
Hi Jan,

If it's a low- or full-speed device, you should test with different host controllers (OHCI, UHCI) and with hubs.

The fact that the failure is on long transfers suggests that perhaps the endpoints sometimes fail to NAK when busy with the previous transaction's data. After receiving data, an OUT endpoint should immediately begin to NAK any new data from the host until the endpoint buffer can accept new data. How to manage the endpoints varies with the device hardware.

Jan

Jan Andersson

Re: Error writing HID device
« Reply #2 on: February 09, 2011, 01:08:22 pm »
Thanks for a quick reply!

Quote
Hi Jan,

If it's a low- or full-speed device, you should test with different host controllers (OHCI, UHCI) and with hubs.

Hmm... Good idea! But how do I do that? I guess different PCs have different host controllers?

Quote
The fact that the failure is on long transfers suggests that perhaps the endpoints sometimes fail to NAK when busy with the previous transaction's data. After receiving data, an OUT endpoint should immediately begin to NAK any new data from the host until the endpoint buffer can accept new data. How to manage the endpoints varies with the device hardware.

Jan

Sounds reasonable! But what could cause this to happen on specific PCs only? Today I got a report that two laptops got this error. Is it possible that laptops handle USB differently in general?

mdlayt

  • Member
  • ***
  • Posts: 40
Re: Error writing HID device
« Reply #3 on: February 09, 2011, 06:48:37 pm »
Quote
Is it possible laptops handle USB different in general?

Yes, but the variable is not laptop versus desktop.

What matters is that when some component is different, the system might behave differently. In this case, the most relevant components are the host controller (i.e. the root hub), the driver code, the app code, and any hubs you may have in the system.

That is why you need to look in Device Manager to figure out whether there is a specific set of host controllers (i.e. the USB chips on your motherboard) that do or don't work. E.g., in my Device Manager I see an "Intel(R) 82801G (ICH7 Family) USB2 Enhanced Host Controller."
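
If customers send back the controller names they see in Device Manager, a small helper can sort them by controller type. This is a rough sketch that assumes the names contain the usual "Enhanced", "Universal" or "Open"/"OpenHCD" wording; vendor-specific names may not match and will come back as "unknown".

```python
def classify_host_controller(device_manager_name: str) -> str:
    """Guess the host controller type from a Device Manager name.

    The substring rules are an assumption based on common naming,
    not an official mapping.
    """
    name = device_manager_name.lower()
    if "enhanced" in name:
        return "EHCI"   # USB 2.0 high-speed controller
    if "universal" in name:
        return "UHCI"   # full/low speed
    if "open" in name or "ohci" in name:
        return "OHCI"   # full/low speed
    return "unknown"


print(classify_host_controller(
    "Intel(R) 82801G (ICH7 Family) USB2 Enhanced Host Controller"))  # EHCI
```

Keep in mind that on chipsets of this generation a full-speed device plugged directly into a root port is normally serviced by a companion UHCI or OHCI controller, so the full/low-speed entries in the list matter as much as the Enhanced one.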

Ron Hemphill

  • Member
  • ***
  • Posts: 19
Re: Error writing HID device
« Reply #4 on: February 09, 2011, 10:48:17 pm »
In your 2nd post you implied, but did not really state, whether your device is full speed (FS) or low speed (LS). Assuming it is one or the other, one thing you might try is to ask your customer to put a high-speed (USB 2.0) hub between the device and the PC/laptop. The hub will then handle transactions to/from the host at high speed and translate them to FS or LS for the device. Also, if you're using a hub on your test system, remove it and connect the device directly to a root port (but be aware that some PCs use on-board hubs to distribute USB ports; make sure you're on an actual root port).

Jan Andersson

Re: Error writing HID device
« Reply #5 on: February 10, 2011, 03:07:41 am »
Quote
In your 2nd post you implied, but did not really state, whether your device is full speed (FS) or low speed (LS). Assuming it is one or the other, one thing you might try is to ask your customer to put a high-speed (USB 2.0) hub between the device and the PC/laptop. The hub will then handle transactions to/from the host at high speed and translate them to FS or LS for the device. Also, if you're using a hub on your test system, remove it and connect the device directly to a root port (but be aware that some PCs use on-board hubs to distribute USB ports; make sure you're on an actual root port).

I did not see any question about the type of device, so therefore I did not "answer" ;) It is a full-speed device though, so your idea is definitely worth trying! Thanks!

Jan Andersson

Re: Error writing HID device
« Reply #6 on: February 10, 2011, 03:09:21 am »
Quote
Is it possible laptops handle USB different in general?

Yes, but the variable is not laptop versus desktop.

The variable is, when some component is different, then it might act differently.  In this case, the most relevant components are, the host controller (ie root hub), the driver code, the app code, and any hubs you may have in the system.

That is why you need to look in your device manager to figure out if there is a specific set of host controllers (ie chips on your motherboard for USB) that do or don't work.  Eg in my device manager I see I have an "Intel(R) 82801G (ICH7 Family) USB2 Enhanced Host Controller."
I happen to have that controller too. Will try to get info about which controllers there are at the problematic sites.

Jan Axelson

Re: Error writing HID device
« Reply #7 on: February 10, 2011, 10:16:47 am »
As others have suggested, the host needs to comply with the spec but has some leeway in how it does so. For example, for an interrupt endpoint with a max latency of 10 ms, the host can poll at any interval of 10 ms or less.

Jan

Jan Andersson

Re: Error writing HID device
« Reply #8 on: February 10, 2011, 10:55:26 am »
Quote
As others have suggested, the host needs to comply with the spec but has some leeway in how it does so. For example, for an interrupt endpoint with a max latency of 10 ms, the host can poll at any interval of 10 ms or less.

Jan

OK, I see. Well, that could cause problems if the host happens to be the poll-too-often type. Hmm... It would be great if there were a way of knowing the poll rate, preferably from the PC itself.

Jan Axelson

Re: Error writing HID device
« Reply #9 on: February 10, 2011, 11:17:27 am »
The device doesn't have to know the poll rate. It just needs to respond to what the host does.

Typically interrupt OUT endpoints are set up in hardware to NAK after receiving data until firmware retrieves the data and resets the endpoint to ACK new data.
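
The behavior described above can be illustrated with a toy model. This is only a sketch of the handshake logic with an assumed single-packet buffer; it is not a model of any particular device controller, and the class and method names are made up for illustration.

```python
class OutEndpoint:
    """Toy model of an interrupt OUT endpoint with a one-packet buffer."""

    def __init__(self) -> None:
        self.buffer = None   # one received packet, or None when empty
        self.received = []   # packets the firmware has retrieved so far

    def host_sends(self, packet: bytes) -> str:
        """Return the handshake for an OUT transaction from the host."""
        if self.buffer is not None:
            return "NAK"     # still busy: the host will retry later
        self.buffer = packet
        return "ACK"

    def firmware_reads(self) -> None:
        """Firmware retrieves the packet, which re-arms the endpoint."""
        if self.buffer is not None:
            self.received.append(self.buffer)
            self.buffer = None


ep = OutEndpoint()
print(ep.host_sends(b"first"))   # ACK
print(ep.host_sends(b"second"))  # NAK, firmware has not read "first" yet
ep.firmware_reads()
print(ep.host_sends(b"second"))  # ACK
```

As long as the endpoint NAKs whenever the buffer is occupied, how quickly the host retries does not matter. Problems arise only if there is a window in which the endpoint accepts or mishandles a packet while the previous one is still being processed.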

What device controller chip does the device use? Are you using interrupt endpoints only?

Jan

Jan Andersson

Re: Error writing HID device
« Reply #10 on: February 10, 2011, 01:22:33 pm »
Quote
The device doesn't have to know the poll rate. It just needs to respond to what the host does.

Yes, but I meant that if one knew the poll rate was faster on those hosts, one could assume it's a timing issue with them.

Quote
Typically interrupt OUT endpoints are set up in hardware to NAK after receiving data until firmware retrieves the data and resets the endpoint to ACK new data.

What device controller chip does the device use? Are you using interrupt endpoints only?

Jan

It's an ARM-based microcontroller, the NXP LPC2387, which has built-in USB support. Do you mean endpoint types other than endpoint 0 and interrupt? If so, then no: just endpoint 0, and interrupt for the other endpoints in use.

I rechecked the descriptors and came to think about the "interval" parameter, which exists for interrupt endpoint 1 used by the HID. I have it set to "1". Could it be that this is too fast, so that some hosts "overload the system", so to say?
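
For reference, the "interval" parameter is the last byte (bInterval) of the standard 7-byte endpoint descriptor. The sketch below decodes such a descriptor and estimates how long the 300 kByte write takes at the stated interval. The example bytes are an assumption for illustration (interrupt OUT endpoint 1, 64-byte packets, interval 1), not the actual descriptor from this device, and the estimate assumes one max-size packet per interval and 300 kByte = 300 * 1024 bytes.

```python
import struct
from typing import NamedTuple


class EndpointDescriptor(NamedTuple):
    address: int          # bEndpointAddress (bit 7 set means IN)
    transfer_type: int    # bmAttributes bits 1..0 (3 = interrupt)
    max_packet_size: int  # wMaxPacketSize in bytes
    interval_ms: int      # bInterval, in 1 ms frames at full speed


def parse_endpoint_descriptor(raw: bytes) -> EndpointDescriptor:
    """Decode a standard 7-byte USB endpoint descriptor."""
    length, dtype, address, attributes, max_packet, interval = struct.unpack(
        "<BBBBHB", raw[:7])
    if length != 7 or dtype != 0x05:
        raise ValueError("not a standard endpoint descriptor")
    return EndpointDescriptor(address, attributes & 0x03,
                              max_packet & 0x07FF, interval)


def nominal_transfer_ms(total_bytes: int, ep: EndpointDescriptor) -> int:
    """Time to move total_bytes at one max-size packet per interval."""
    packets = -(-total_bytes // ep.max_packet_size)  # ceiling division
    return packets * ep.interval_ms


# Assumed example descriptor: interrupt OUT endpoint 1, 64 bytes, interval 1.
raw = bytes([0x07, 0x05, 0x01, 0x03, 0x40, 0x00, 0x01])
ep = parse_endpoint_descriptor(raw)
print(nominal_transfer_ms(300 * 1024, ep))  # 4800
```

Under these assumptions the full write needs 4800 packets and about 4.8 seconds, so a failure "after a couple of seconds" lands somewhere in the middle of the transfer.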

I find USB to be very complicated, even though I have bought and read your (Jan's) excellent book  ;)

Jan Axelson

Re: Error writing HID device
« Reply #11 on: February 10, 2011, 02:48:52 pm »
Typically interrupt OUT endpoints are set up in hardware to NAK after receiving data until firmware retrieves the data and resets the endpoint to ACK new data. If that is happening correctly, the polling interval isn't an issue. However, a different interval is easy enough to try if you can get new firmware onto a device that is having problems.

A HID uses control transfers for Output reports if the HID doesn't have an interrupt OUT endpoint and for Feature reports. Some devices have trouble with hosts that schedule multiple stages of a control transfer in a single frame. If the problem you're seeing is with interrupt transfers, this doesn't apply.

300 kBytes is very long for a HID report, though in theory should be OK.

Be sure that every report shorter than the longest one ends in a short packet, which is a zero-length data packet if the report is an exact multiple of the endpoint's max packet size.
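
The short-packet rule can be written out as a small function. This is a sketch of the rule as stated above, with made-up parameter names: it lists the packet sizes for one report and appends a zero-length packet when the report is shorter than the longest one but fills its last packet exactly.

```python
from typing import List


def packet_sizes(report_len: int, max_packet: int, longest_len: int) -> List[int]:
    """Return the sizes of the data packets that carry one report."""
    full, rest = divmod(report_len, max_packet)
    sizes = [max_packet] * full
    if rest:
        sizes.append(rest)   # a short packet ends the transfer
    elif report_len < longest_len:
        sizes.append(0)      # exact multiple: a zero-length packet is needed
    return sizes


print(packet_sizes(100, 64, 256))  # [64, 36]
print(packet_sizes(128, 64, 256))  # [64, 64, 0]
print(packet_sizes(256, 64, 256))  # [64, 64, 64, 64]
```

The longest report needs no terminator because the receiver already knows that no more data can follow.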

Having a system that shows the problem and a hardware protocol analyzer would make it much easier to isolate the problem.

Jan

Jan Andersson

Re: Error writing HID device
« Reply #12 on: February 17, 2011, 09:30:17 am »
You were right. The interval doesn't seem to be an issue. I increased it, and a customer who has the problem tested it and saw no change.

And yes, the problem is with interrupt endpoints. Also, it happens long before the end of the transfer, so the final zero-length packet hasn't even been sent at that point.

And yes again, it would be heaven-sent to at least have access to a PC that has the problem!

Guido Koerber

  • Frequent Contributor
  • ****
  • Posts: 72
Re: Error writing HID device
« Reply #13 on: February 17, 2011, 07:11:54 pm »
Since you have a full speed device there are several basic configurations you can encounter that each have different timing details:
When connected directly to the host computer, your device can be handled by either a UHCI or an OHCI host controller, and the two differ in the timing of the individual packets that make up a transaction.
If a hub is involved, it may be a 1.1 hub or a 2.0 hub. With a 1.1 hub the timing will be mostly identical to what you see when connected directly to the host. With a 2.0 hub, the data transfer between hub and host runs at high speed and the hub translates the packets down to the full-speed data rate of your device. Depending on the chip in the hub, timing can vary significantly.
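
The configurations listed above can be turned into a simple checklist. This is just a sketch: the labels are my own, and it assumes a 2.0 hub is attached to a high-speed (EHCI) port, since on a full-speed-only port it would behave much like a 1.1 hub.

```python
from typing import List, Tuple


def full_speed_test_matrix() -> List[Tuple[str, str]]:
    """List (host controller, connection) pairs worth testing."""
    direct = [(hc, "direct to root port") for hc in ("UHCI", "OHCI")]
    hub_1_1 = [(hc, "via USB 1.1 hub") for hc in ("UHCI", "OHCI")]
    hub_2_0 = [("EHCI", "via USB 2.0 hub")]  # hub translates to full speed
    return direct + hub_1_1 + hub_2_0


for controller, connection in full_speed_test_matrix():
    print(f"{controller:5s} {connection}")
```

Since 2.0 hubs differ from chip to chip, the last row is worth repeating with more than one hub model.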

So you should not take any timing that you see in your test set up for granted. Make sure your protocol stack complies with the USB spec and not with the situation you see on your desk.

Using a stack from a chip vendor is no guarantee that it is correct. I had to rewrite all the Cypress USB code I have used so far because it was always buggy. In some cases it had exactly the kind of nasty behavior you described, typically race conditions where the code did not anticipate the next packet or token arriving at that moment and had put the hardware into the wrong mode.

Jan Andersson

Re: Error writing HID device
« Reply #14 on: February 18, 2011, 04:53:43 am »
Quote
So you should not take any timing that you see in your test set up for granted. Make sure your protocol stack complies with the USB spec and not with the situation you see on your desk.

I am certainly not treating my system as representative of every possible combination there is  ;)

Quote
Using a stack from a chip vendor is no guarantee that it is correct. I had to rewrite all the Cypress USB code I have used so far because it was always buggy. In some cases it had exactly the kind of nasty behavior you described, typically race conditions where the code did not anticipate the next packet or token arriving at that moment and had put the hardware into the wrong mode.

Yes, the code builds upon a chip vendor's stack. However, I feel a bit lost as to how to proceed without access to a faulty system, besides validating against the USB spec as you suggest. How do you suggest I validate against the USB spec?