Author Topic: USB hardware malfunction on mass storage devices  (Read 18782 times)

B-O

  • Member
  • ***
  • Posts: 4
USB hardware malfunction on mass storage devices
« on: December 31, 2012, 11:13:47 am »
When operating early (from the BIOS) with the EHCI host controller (Intel chipset) and USB thumb drives, the data read from the thumb drive is not correct, but neither the device nor the EHCI controller detects any error. Every single value in the CSW status is 100% valid with no errors at all. The EHCI controller status flags no error, and the entire transfer descriptor chain is 100% executed and error free.
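
For reference, the checks a host typically applies to a CSW can be sketched as below. This is a minimal sketch based on the Bulk-Only Transport rules for a valid and meaningful CSW; the function and enum names are made up for illustration and are not taken from anyone's BIOS code. Note that a CSW passing every check only says the device considers the command complete. It says nothing about whether the data that landed in host memory is intact, which is the situation described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CSW_LENGTH    13u
#define CSW_SIGNATURE 0x53425355u   /* "USBS", sent little-endian */

/* Hypothetical result codes for this sketch. */
enum csw_result {
    CSW_OK = 0,          /* valid and meaningful, command passed  */
    CSW_CMD_FAILED,      /* valid and meaningful, bCSWStatus == 1 */
    CSW_PHASE_ERROR,     /* valid, bCSWStatus == 2                */
    CSW_INVALID,         /* wrong length, signature, or tag       */
    CSW_NOT_MEANINGFUL   /* reserved status or impossible residue */
};

/* Read a 32-bit little-endian value from a byte buffer. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Check a received CSW against the CBW that was sent.
 * expected_tag is the dCBWTag of the command, cbw_data_len its
 * dCBWDataTransferLength. */
enum csw_result csw_check(const uint8_t *buf, size_t len,
                          uint32_t expected_tag, uint32_t cbw_data_len)
{
    assert(buf != NULL);

    /* Validity: exactly 13 bytes, correct signature, matching tag. */
    if (len != CSW_LENGTH)           return CSW_INVALID;
    if (le32(buf) != CSW_SIGNATURE)  return CSW_INVALID;
    if (le32(buf + 4) != expected_tag) return CSW_INVALID;

    uint32_t residue = le32(buf + 8);   /* dCSWDataResidue */
    uint8_t  status  = buf[12];         /* bCSWStatus      */

    /* Meaningfulness: status 2 is a phase error; status 0 or 1
     * requires residue <= requested length; anything else is reserved. */
    if (status == 2)            return CSW_PHASE_ERROR;
    if (status > 2)             return CSW_NOT_MEANINGFUL;
    if (residue > cbw_data_len) return CSW_NOT_MEANINGFUL;

    return status == 0 ? CSW_OK : CSW_CMD_FAILED;
}
```

A caller would pass the 13 bytes received in the status phase together with the tag and transfer length of the CBW it sent, and treat anything other than CSW_OK as a reason to run reset recovery or retry.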

If I do a warm reset of the CPU, the same data is read without any errors. If I exercise the BIOS virtual drives from DOS, everything works fine. The problem occurs only when booting.

We specialize in fast-booting BIOS, having developed our own. This is a really bad problem: how can you build a workaround when the errors are not detected by the hardware? In our implementation we have a complete state machine for every phase, catching any error that occurs. But as no error is reported, how can we fix that?
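
One generic way to catch corruption that neither the device nor the controller reports is to read each block twice and compare the copies. This is not something proposed in this thread, only a sketch of the idea; read_sector_fn, sim_disk and the other names are hypothetical, and the simulated disk stands in for the real BIOS transfer path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512u

/* Hypothetical hook: read one sector at lba into dst, return false if
 * the transfer itself reported an error. */
typedef bool (*read_sector_fn)(void *ctx, uint32_t lba, uint8_t *dst);

enum verified_read { VR_OK, VR_IO_ERROR, VR_MISMATCH };

/* Read the same sector twice and accept it only if both copies match.
 * Retries up to max_attempts pairs of reads. */
enum verified_read read_sector_verified(read_sector_fn rd, void *ctx,
                                        uint32_t lba, uint8_t *dst,
                                        unsigned max_attempts)
{
    uint8_t check[SECTOR_SIZE];

    assert(rd != NULL && dst != NULL);
    for (unsigned i = 0; i < max_attempts; i++) {
        if (!rd(ctx, lba, dst) || !rd(ctx, lba, check))
            return VR_IO_ERROR;            /* a reported error */
        if (memcmp(dst, check, SECTOR_SIZE) == 0)
            return VR_OK;                  /* two identical copies */
    }
    return VR_MISMATCH;                    /* silent corruption suspected */
}

/* Simulated disk for demonstration: the first corrupt_first_n reads
 * return one flipped byte without reporting any error. */
struct sim_disk { unsigned reads; unsigned corrupt_first_n; };

bool sim_read(void *ctx, uint32_t lba, uint8_t *dst)
{
    struct sim_disk *d = ctx;

    memset(dst, (int)(lba & 0xFFu), SECTOR_SIZE);
    if (d->reads < d->corrupt_first_n)
        dst[0] ^= 0xFFu;
    d->reads++;
    return true;
}
```

The obvious costs are that every read is doubled, which works against a fast boot, and that two reads corrupted in the same way would still compare equal. A checksum stored with the boot image would be a stronger check where the image format allows it.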

The USB devices are very slow. For example, some of them need more than two seconds to change their D+/D- signal to let the BIOS know that they are connected. What kind of firmware needs that long to signal its presence? It's okay to be slow and not ready for more than 2 seconds, but that is a real problem for a fast-boot BIOS. I mean, if we have passed the USB detection code and the device is not there, how would we know that we need to wait for it?

I think that Windows and Linux, with their workarounds for bad USB devices, encourage USB vendors to keep their badly implemented devices. If they didn't use their 1000+ engineering staff to implement workarounds for faulty devices, the vendors would either go out of the market or fix the firmware. It's a joke at the moment and crazy.

The following should be standardized:

1) Signal the device's presence quickly.
2) If not ready, have a working way to determine how long you need to wait.
3) Mass storage device classes should have the sequence of SCSI commands documented, so a driver will know how to make the device operational.
4) USB device vendors should adhere 100% to the standard, and the ones that do not follow the standard should be banished.

B-O
 

Jan Axelson

  • Administrator
  • Frequent Contributor
  • *****
  • Posts: 3033
    • Lakeview Research
Re: USB hardware malfunction on mass storage devices
« Reply #1 on: December 31, 2012, 04:47:25 pm »
Yes, it's true that the USB specs don't consider the situation where you need to detect a device quickly, and yes, so many flash drives don't meet the specs. Getting faulty data with no error indication is a real problem!

The one improvement I'm aware of is that after many years without one, the USB-IF finally released mass storage compliance tests for those manufacturers who want to create robust devices. But there is no requirement to pass the tests if you don't care about certification.

The only workaround I can think of is to recommend specific known-good drives for customers to use.

B-O

  • Member
  • ***
  • Posts: 4
Re: USB hardware malfunction on mass storage devices
« Reply #2 on: January 01, 2013, 09:28:44 am »
I have isolated the problem to the EHCI controller. I have been running a test in MS-DOS for over 24 hours now with all data delivered and no errors. So the EHCI controller has trouble when booting, missing data despite CRC checksumming. I suspect it has something to do with bus-master cache coherency, so this is really an erratum in Intel chipsets. This is a consequence of years of slow software in BIOS and OS, so Intel is not aware of the problem. I will let them know and hope they can find a workaround.

I have read two of your books. I hope you will write a book on PC host controllers and how to program them. There is no literature on this topic out there, and it would be very welcome.

Keep up the good work!!

Jan Axelson

  • Administrator
  • Frequent Contributor
  • *****
  • Posts: 3033
    • Lakeview Research
Re: USB hardware malfunction on mass storage devices
« Reply #3 on: January 02, 2013, 01:36:34 pm »
Interesting. I've had my own recent USB issues, see:

http://lvr.com/usb_debug.htm

I haven't yet tried booting to Linux, but if the problem remains, I believe that would also point to the chipset.

Unfortunately, the market for books on low-level USB host controller programming is quite limited. The best resource I know of is the Linux code.

Barry Twycross

  • Frequent Contributor
  • ****
  • Posts: 263
Re: USB hardware malfunction on mass storage devices
« Reply #4 on: January 02, 2013, 07:48:42 pm »
Someone once put this most eloquently when they said the conformance of thumb drives to the spec is "coincidental at best". Unfortunately, if it works on one platform, we're encouraged to make it work on our platform as well. If it is shown that something works on Windows, then we're encouraged to match that functionality, and things never get better.

1. By spec, a bus-powered USB device is required to signal its presence on the bus at most 100 ms after attach. This doesn't apply to self-powered devices according to the spec, but the USB-IF will fail compliance for your device if it takes too long to attach (even if self-powered).

2. If the device does not follow the spec in #1, there's no way to know it's there, and so no way to know to wait for it. If it does follow the spec in #1, there are plenty of ways that the device can delay having to actually be operational.

3. I disagree. Mass storage devices should respond to the commands as specified by the command set they claim to support. The problem is that there is a de facto standard for the commands a device will respond to: whatever commands Windows sends to a device. A lot of devices just respond to those commands in that sequence and not to the commands as properly specified.

4. I totally agree. However, it's been like this for at least 15 years, and I see no hope of it getting better any time soon. I will note that absolute 100% compliance is actually quite difficult, but I'd be happy with 99%.
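
Points 1 and 2 amount to a bounded wait: poll the port's connect status until the attach deadline, then move on. Here is a minimal sketch in C, assuming a 100 ms budget measured from the moment VBUS is valid. The only real hardware detail is bit 0 of the EHCI PORTSC register (Current Connect Status); the callback types and the simulated port are hypothetical stand-ins for real register access and a real timer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PORTSC_CCS (1u << 0)   /* EHCI PORTSC bit 0: Current Connect Status */

/* Hypothetical platform hooks: read the port's PORTSC register, and
 * delay for a number of milliseconds. */
typedef uint32_t (*read_portsc_fn)(void *ctx);
typedef void (*delay_ms_fn)(void *ctx, uint32_t ms);

/* Poll for attach until budget_ms has elapsed. Returns true if a device
 * signalled its presence in time. waited_ms (optional) receives the time
 * spent waiting. */
bool wait_for_attach(read_portsc_fn read_portsc, delay_ms_fn delay_ms,
                     void *ctx, uint32_t budget_ms, uint32_t step_ms,
                     uint32_t *waited_ms)
{
    uint32_t elapsed = 0;
    bool attached = false;

    assert(read_portsc != NULL && delay_ms != NULL && step_ms > 0);
    for (;;) {
        if (read_portsc(ctx) & PORTSC_CCS) {
            attached = true;
            break;
        }
        if (elapsed >= budget_ms)
            break;                  /* deadline passed, give up */
        delay_ms(ctx, step_ms);
        elapsed += step_ms;
    }
    if (waited_ms != NULL)
        *waited_ms = elapsed;
    return attached;
}

/* Simulated port for demonstration: the device pulls up its data line
 * attach_at_ms after power is applied. */
struct sim_port { uint32_t now_ms; uint32_t attach_at_ms; };

uint32_t sim_read_portsc(void *ctx)
{
    struct sim_port *p = ctx;
    return p->now_ms >= p->attach_at_ms ? PORTSC_CCS : 0u;
}

void sim_delay_ms(void *ctx, uint32_t ms)
{
    struct sim_port *p = ctx;
    p->now_ms += ms;
}
```

With a 100 ms budget and 10 ms steps, a simulated device that attaches at 40 ms is found after 40 ms, while one that needs 2 seconds is given up on at 100 ms. That is exactly the trade-off in point 2: a host that honors the spec deadline cannot see a device that ignores it unless it keeps polling later in the boot.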

Jan Axelson

  • Administrator
  • Frequent Contributor
  • *****
  • Posts: 3033
    • Lakeview Research
Re: USB hardware malfunction on mass storage devices
« Reply #5 on: January 02, 2013, 10:37:58 pm »
3. Yes, it's the same with USB requests for enumeration, etc. Devices shouldn't have to follow a defined sequence; they just need to respond to requests, events, etc. as they occur.

B-O

  • Member
  • ***
  • Posts: 4
Re: USB hardware malfunction on mass storage devices
« Reply #6 on: January 04, 2013, 03:46:56 am »
Regarding the 10-port problem, I suspect you have an ICH chipset with two EHCI controllers. One configuration for these chipsets is 1 EHCI controller with 8 ports and 4 low/full-speed UHCI companion controllers with 2 ports each. The other configuration is one EHCI controller with 3 UHCI companion controllers and one EHCI controller with 4 ports and 1 UHCI companion controller.

The BIOS chooses the first configuration for compatibility with former chipsets like ICH7 and NM10, thus having the same number of ports available for low, full and high speed.

About my remark 3 above.

I agree with the statements here. A device should be able to handle commands in any sequence. However, as they obviously don't, especially at the SCSI level, it would be nice to know how to get these devices working.

B-O

  • Member
  • ***
  • Posts: 4
Re: USB hardware malfunction on mass storage devices
« Reply #7 on: January 04, 2013, 03:57:28 am »
Oh, I see you have enabled everything in the BIOS. Then the OS cannot manage two EHCI controllers. There is a catch: the 4th UHCI controller is routed to either the first or the second EHCI controller. If Windows was installed with the first configuration and the BIOS setting is then changed, the 4th UHCI controller will suddenly not be there. I think reinstalling Windows could solve the problem.

Jan Axelson

  • Administrator
  • Frequent Contributor
  • *****
  • Posts: 3033
    • Lakeview Research
Re: USB hardware malfunction on mass storage devices
« Reply #8 on: January 04, 2013, 11:25:32 am »
Interesting. I didn't have to change the BIOS; all ports were enabled to start with. I didn't install Windows 8, though; the system came with it, so it's unknown what happened when they put the system together.

The chipset is an Intel C216. It has 2 EHCI controllers, one with an 8-port USB 2.0 "rate-matching hub" and one with a 6-port USB 2.0 "rate matching hub". There are no UHCI controllers, and I don't see anything about multiple configurations. On my system, eight of the hardware ports belong to the first EHCI controller and the remaining two ports belong to the second EHCI controller.

The datasheet says: If for some reason, the particular system platform does not want to support any one of the Device Functions, with the exception of D30:F0, can individually be disabled.

And under EHCI: BIOS performs a number of platform customization steps after the core well has powered up. Contact your Intel Field Representative for additional PCH BIOS information.

Since I can enable both EHCI controllers at the same time from within Device Manager, it appears that the BIOS initialization is disabling the second EHCI controller on bootup even though it's set up to enable all USB ports.

But I don't see anything in the BIOS screens that I can do to fix it.

Thanks for your comments; they got me looking into the datasheet for more clues.