Author Topic: Firmware work around for hardware clock problem.  (Read 14338 times)

kbenham

  • Member
  • ***
  • Posts: 3
Firmware work around for hardware clock problem.
« on: February 21, 2014, 12:53:04 pm »
Hello,
 
 Problem Description:

 My USB CDMA cellular modem stops receiving responses to pings at colder temperatures. Resetting the modem (which reloads all the drivers) fixes the problem for a while, but eventually it happens again.
 I know it has something to do with the crystal because I swapped it out for a 16 MHz one, which ideally has zero error, and the connection loss went away. I am running Angstrom embedded Linux 2.6.3 and I have turned on
 usbmon and enabled debug messages in the modem driver.

  Around the time this problem occurs I get multiple -62 status errors and a -71.
  These errors show up in the usbmon and dmesg logs for the driver.

  I have multiple units that I need to fix over the air. I am trying to avoid changing hardware.


Hardware:

      AT91SAM9G20B-CU (G20 hereafter).
 
      I am using a 400 MHz ARM926-based processor with an extensive range of communication peripherals.
      It embeds a full-speed (FS) USB host, which is what this post is about. The USB clock is derived from the main 18.432 MHz crystal
      through a PLL in the G20, which produces an error of 0.16%, within the 0.25% acceptable range.
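
For reference, the 0.16% figure can be reproduced with a little arithmetic. The sketch below assumes the USB clock is the PLL output divided by 2 with a 48 MHz target. The multiplier/divider pairs (26/5 and 6/1) are illustrative guesses that happen to reproduce the numbers in this post; they are not read from the actual board configuration.

```python
# Assumption: usb_clock = crystal * mul / div / usb_div, with a 48 MHz target.
# The mul/div values below are hypothetical, chosen only to match the post.
USB_TARGET_HZ = 48_000_000
FS_TOLERANCE_PCT = 0.25  # acceptable range quoted in the post


def usb_clock_error_pct(crystal_hz, mul, div, usb_div=2):
    """Return the USB clock error in percent relative to 48 MHz."""
    usb_hz = crystal_hz * mul / div / usb_div
    return (usb_hz - USB_TARGET_HZ) / USB_TARGET_HZ * 100.0


err_18 = usb_clock_error_pct(18_432_000, mul=26, div=5)
err_16 = usb_clock_error_pct(16_000_000, mul=6, div=1)

print(f"18.432 MHz crystal: {err_18:+.2f}%")
print(f"16 MHz crystal:     {err_16:+.2f}%")
print("within tolerance:", abs(err_18) < FS_TOLERANCE_PCT)
```

Under these assumptions the 18.432 MHz crystal gives 47.9232 MHz, which is nominally inside the tolerance but leaves less margin for crystal drift over temperature than the 16 MHz part does.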

Questions:

   What I don't understand is why the reset fixes the USB communications temporarily. Why isn't it dead until the temperature goes back up?

    Is there something less heavy-handed I can do?


Example of dmesg errors:

[  354.410000] at91_ohci at91_ohci: urb c399a9e0 path 2 ep2in 6c160000 cc 6 --> status -71
[  354.410000] sierra ttyUSB0: sierra_indat_callback - endpoint 02.
[  354.410000] sierra ttyUSB0: sierra_indat_callback: nonzero status: -71 on endpoint 02.
[  354.450000] sierra ttyUSB0: sierra_write: write (175 chars)
[  354.450000] sierra ttyUSB0: sierra_outdat_callback - port 0
[  354.860000] sierra ttyUSB0: sierra_indat_callback - endpoint 02.
[  354.870000] sierra ttyUSB0: sierra_write: write (56 chars)

[  669.270000] sierra ttyUSB0: sierra_write: write (174 chars)
[  669.270000] sierra ttyUSB0: sierra_outdat_callback - port 0
[  669.710000] sierra ttyUSB0: sierra_indat_callback - endpoint 02.
[  669.720000] sierra ttyUSB0: sierra_write: write (56 chars)
[  669.720000] sierra ttyUSB0: sierra_outdat_callback - port 0
[  675.380000] at91_ohci at91_ohci: urb c399aa60 path 2 ep2in 5c160000 cc 5 --> status -62
[  675.380000] sierra ttyUSB0: sierra_indat_callback - endpoint 02.
[  675.380000] sierra ttyUSB0: sierra_indat_callback: nonzero status: -62 on endpoint 02.
[  684.230000] sierra ttyUSB0: sierra_write: write (175 chars)
[  684.230000] sierra ttyUSB0: sierra_outdat_callback - port 0
[  684.660000] sierra ttyUSB0: sierra_indat_callback - endpoint 02.
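
When comparing many units, it can help to pull just the host-controller error lines out of a log like this one. A minimal sketch, assuming the `at91_ohci` line format shown above stays the same; the function name and the dictionary keys are made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as:
# [  354.410000] at91_ohci at91_ohci: urb c399a9e0 path 2 ep2in 6c160000 cc 6 --> status -71
URB_ERR = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+at91_ohci .*?"
    r"ep(?P<ep>\d+)(?P<dir>in|out)\s+\S+\s+"
    r"cc (?P<cc>\d+) --> status (?P<status>-?\d+)"
)


def parse_urb_errors(lines):
    """Return one dict per at91_ohci URB error line; other lines are skipped."""
    events = []
    for line in lines:
        m = URB_ERR.search(line)
        if m:
            events.append({
                "ts": float(m.group("ts")),
                "endpoint": int(m.group("ep")),
                "direction": m.group("dir"),
                "cc": int(m.group("cc")),
                "status": int(m.group("status")),
            })
    return events


sample = [
    "[  354.410000] at91_ohci at91_ohci: urb c399a9e0 path 2 ep2in 6c160000 cc 6 --> status -71",
    "[  354.410000] sierra ttyUSB0: sierra_indat_callback - endpoint 02.",
    "[  675.380000] at91_ohci at91_ohci: urb c399aa60 path 2 ep2in 5c160000 cc 5 --> status -62",
]

events = parse_urb_errors(sample)
counts = Counter(e["status"] for e in events)
for e in events:
    print(e)
print(dict(counts))
```

Feeding it the full dmesg output gives a per-status count and the timestamps at which the errors cluster.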

Barry Twycross

  • Frequent Contributor
  • ****
  • Posts: 263
Re: Firmware work around for hardware clock problem.
« Reply #1 on: February 21, 2014, 02:21:33 pm »
What do the -62 and -71 errors mean?

kbenham

  • Member
  • ***
  • Posts: 3
Re: Firmware work around for hardware clock problem.
« Reply #2 on: February 21, 2014, 02:27:17 pm »
URB status: Timer expired (-ETIME) (-62)


URB status: Protocol error (-EPROTO) (-71)

One of the following errors occurred with this urb:
A bitstuff error happened during the transfer.
No response packet was received in time by the hardware.
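
The dmesg lines also carry an OHCI condition code (`cc 5`, `cc 6`) next to each status. The lookup below is a small sketch for decoding both. The errno values are the Linux ones quoted above. The condition-code names come from my reading of the OHCI specification and should be checked against the kernel source actually in use.

```python
# Linux errno values seen in this thread.
LINUX_ERRNO = {
    62: ("ETIME", "Timer expired"),
    71: ("EPROTO", "Protocol error"),
}

# OHCI condition codes seen in this thread (assumed names, please verify).
OHCI_CC = {
    5: "device not responding",
    6: "PID check failure",
}


def describe(status, cc=None):
    """Return a readable description of a URB status and optional OHCI cc."""
    name, text = LINUX_ERRNO.get(-status, ("?", "unknown status"))
    out = f"status {status}: -{name} ({text})"
    if cc is not None:
        out += f", cc {cc}: {OHCI_CC.get(cc, 'unknown condition code')}"
    return out


line_62 = describe(-62, cc=5)
line_71 = describe(-71, cc=6)
print(line_62)
print(line_71)
```

If that reading is right, the -62 here is reported by the host controller because nothing came back on the bus, rather than by a driver-level timer.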

Barry Twycross

  • Frequent Contributor
  • ****
  • Posts: 263
Re: Firmware work around for hardware clock problem.
« Reply #3 on: March 20, 2014, 04:59:45 pm »
I managed to miss your reply until now.

Is that timeout error a bus timeout, or a protocol timeout?

A bus timeout would be where the device does not send a handshake for a packet within the required (very short) period of time (some small number of bit times). A protocol timeout is where the device does not come up with data within a long period of time set by the driver, usually 5-30 seconds or so.
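
To put rough numbers on the two kinds of timeout: the sketch below assumes a full-speed link at 12 Mbit/s and the 16 to 18 bit-time turnaround window that, as far as I recall, the USB 2.0 specification gives for full-speed devices. The 5 second protocol timeout is simply the low end of the range mentioned above.

```python
FS_BIT_RATE_HZ = 12_000_000   # full-speed USB signalling rate
BUS_TIMEOUT_BITS = (16, 18)   # assumed turnaround window, in bit times
PROTOCOL_TIMEOUT_S = 5.0      # low end of the driver timeout range above

bit_time_ns = 1e9 / FS_BIT_RATE_HZ
bus_timeout_ns = tuple(n * bit_time_ns for n in BUS_TIMEOUT_BITS)
ratio = PROTOCOL_TIMEOUT_S * 1e9 / bus_timeout_ns[1]

print(f"bit time:         {bit_time_ns:.2f} ns")
print(f"bus timeout:      {bus_timeout_ns[0]:.0f} to {bus_timeout_ns[1]:.0f} ns")
print(f"protocol timeout: about {ratio:.1e} times longer")
```

The two differ by more than six orders of magnitude, so a timestamped log or an analyzer trace should make it clear which one is being hit.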

In either case, I'd suspect the device is out to lunch.

For a bus timeout, the device is so dead that the USB cell is no longer acting autonomously. Usually with a hung or crashed CPU the USB cell will still NAK transactions, so there's no bus timeout, but you do get protocol timeouts.

Also, as you mention cold and bitstuff errors, I wonder if the device has gone so far out of spec with its bit timing that it's responding, but the packets it sends are not received correctly by the host. Resetting the device might force it to do some recalibration, which would allow the communication to work again. Looking at this with a hardware analyzer or even a scope could be very illuminating.

I once worked with a device which was sensitive to glitches on hot plug. Once in a very long time, it would see the glitch and auto-calibrate its termination to the high extreme. This was so extreme that the host would see it as an open bus at the first opportunity and disconnect it. A reset caused it to go through the calibration again and it started working. That was a fun thing to debug, which I think involved a scope and an analyzer: the analyzer to see the issue and trigger the scope to capture the problem. Once we saw the problem, which showed up as "hot SOFs", i.e. SOFs which were at too high a level, we could capture it with a scope.

kbenham

  • Member
  • ***
  • Posts: 3
Re: Firmware work around for hardware clock problem.
« Reply #4 on: March 21, 2014, 08:36:16 am »
 ;D
We found the problem: it was a bug in the device driver (sierra.c). If the URB callback got any status error, the URB did not get resubmitted. Once there were enough errors, incoming data stopped, and that caused the problem.
We moved the URB resubmit so it always happens. We still get the errors, but the driver recovers and continues to work even at -40C with the clock frequency errors.
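
The failure mode is easy to see in a toy model. This is not the sierra.c code, just a simulation of the behaviour described above: a fixed pool of read URBs, a callback that either drops or resubmits a URB after an error, and a counter of delivered buffers. The pool size of 4 and all names are assumptions for illustration.

```python
N_IN_URBS = 4  # assumed size of the read URB pool


def run(statuses, resubmit_on_error):
    """Simulate read URB completions.

    Returns (urbs_still_queued, buffers_delivered).
    """
    in_flight = N_IN_URBS
    delivered = 0
    for status in statuses:
        if in_flight == 0:
            break  # nothing is queued any more: incoming data has stopped
        in_flight -= 1  # one URB completes and reaches the callback
        if status == 0:
            delivered += 1
            in_flight += 1  # success path resubmits the URB
        elif resubmit_on_error:
            in_flight += 1  # fixed callback: resubmit after an error too
        # buggy callback: the URB is silently dropped on error
    return in_flight, delivered


statuses = [0, -71, 0, -62, -62, 0, -62, 0, 0]

buggy = run(statuses, resubmit_on_error=False)
fixed = run(statuses, resubmit_on_error=True)
print("buggy:", buggy)
print("fixed:", fixed)
```

In the buggy variant every error permanently shrinks the pool, so the link dies after as many errors as there are URBs, and reloading the driver only refills the pool. That would explain why a reset helped for a while even though the temperature had not changed.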

The sierra.c we are using is old, one of the gotchas of trying not to introduce new bugs by freezing the Linux kernel.

THANKS FOR THE RESPONSES!!!!