Topic: Real USB through put, PICs & Full Speed.

st2000

  • Member
  • ***
  • Posts: 44
Real USB through put, PICs & Full Speed.
« on: November 09, 2010, 02:36:48 pm »
Hi...

Maybe someone here knows something about (Microchip) PIC processors and Microchip software.

I have a PIC 24F USB Host project where I am transferring data from a USB drive to an SDCard.  I was hoping to move 100MBytes in several minutes.  In some cases, it can take over 2 hours!

Let's stick to the USB side of the problem.  My best USB to SDCard transfer times are currently about 70KBytes/Second.  If I only execute the USB reads, I can get as high as 125KBytes/Second.  I believe the Microchip software used in this project is setting up for bulk transfers at full speed.  But full speed (12 Mbit/s) should nominally mean a transfer rate of up to 1.5MBytes/Second.  In fact, I can get a Windows XP PC to write to these USB drives at a rate above 4MBytes/Second.

So what happened to my USB drive reading speed?  It should be at least 10 times faster than it is.  I doubt the PIC is failing to keep up; I am running it at 32MHz during the transfer.
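For scale, the rates quoted above can be turned into transfer times. This is only a minimal sketch: it assumes decimal units (1 KByte = 1000 bytes, 100 MBytes = 100,000,000 bytes), and the helper names transfer_seconds() and required_rate() are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Seconds needed to move total_bytes at rate_bytes_per_s, rounded up. */
static uint32_t transfer_seconds(uint64_t total_bytes, uint32_t rate_bytes_per_s)
{
    assert(rate_bytes_per_s > 0);
    return (uint32_t)((total_bytes + rate_bytes_per_s - 1u) / rate_bytes_per_s);
}

/* Average rate (bytes/s, rounded down) implied by moving total_bytes in seconds. */
static uint32_t required_rate(uint64_t total_bytes, uint32_t seconds)
{
    assert(seconds > 0);
    return (uint32_t)(total_bytes / seconds);
}
```

Under those assumptions, 100 MBytes takes about 1429 seconds (roughly 24 minutes) at 70 KBytes/Second, 800 seconds at 125 KBytes/Second, and about 67 seconds at the nominal 1.5 MBytes/Second. A 2 hour transfer works out to an average of only about 14 KBytes/Second.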

-thanks



Jan Axelson

  • Administrator
  • Frequent Contributor
  • *****
  • Posts: 3033
    • Lakeview Research
Re: Real USB through put, PICs & Full Speed.
« Reply #1 on: November 10, 2010, 09:25:50 am »
Things that can slow the transfer rate include:

The device NAKs transaction attempts.

The host is performing many small transfers rather than a smaller number of large transfers. For example, if you're using ReadFile, use a large buffer to read large quantities of data at once.

You're using interrupt transfers, which have limited bandwidth.

You're using bulk or control transfers and the bus is busy.

Jan

st2000

  • Member
  • ***
  • Posts: 44
Re: Real USB through put, PICs & Full Speed.
« Reply #2 on: November 10, 2010, 10:02:06 am »

Thanks Jan for responding...

Quote
Things that can slow the transfer rate include:

The device NAKs transaction attempts.

Really?  Hmm, there's probably a place in the Microchip code where I can break and see if this is happening.

Quote
The host is performing many small transfers rather than a smaller number of large transfers. For example, if you're using ReadFile, use a large buffer to read large quantities of data at once.

I have been reading 512 bytes at a time using Microchip's FAT software.  I think Microsoft FAT16 will set the sector size of a 2G drive to 2K.  I've heard that reading in units of a given drive's sector size is best.  But the PIC is starved for RAM and 512 is all I can afford right now.  Still, I hear what you are saying.  Is there a simple way to verify the code is performing bulk transfers?

Quote
You're using interrupt transfers, which have limited bandwidth.

Well, I single stepped through the Microchip USB driver software and it appeared to be setting up for bulk transfers.

Quote
You're using bulk or control transfers and the bus is busy.

Well, the only two devices on the USB are the USB thumb drive and the PIC 24F Host.  So I can't imagine the bus being busy (i.e. unavailable).

-thanks


Jan Axelson

  • Administrator
  • Frequent Contributor
  • *****
  • Posts: 3033
    • Lakeview Research
Re: Real USB through put, PICs & Full Speed.
« Reply #3 on: November 10, 2010, 10:13:17 am »
I missed that you were using an embedded host.

Drives always use bulk transfers.

It's possible that the Microchip host firmware isn't as efficient as it could be. Someone on the Microchip forums might have a suggestion.

Jan

st2000

  • Member
  • ***
  • Posts: 44
Re: Real USB through put, PICs & Full Speed.
« Reply #4 on: November 10, 2010, 10:29:39 am »

Yes, I confess I am burning this candle at both ends (posting there as well).  I think we (the Microchip forum community) are walking around in a dark room bumping into each other regarding this USB speed thing.  The best lead I have found is a 10,000-foot explanation of rewriting the driver, which I have read twice, and I am still not clear on why it works better for him/her.

I'll try a different approach and directly contact some Microchip engineers and managers around the country whose names I have.

It is often the case that people are sold on ideas based on putting together pieces that work, but not well.  I find myself with such a beast.  It is so slow it might as well not work at all in the eyes of our marketer.

-thanks


Bret

  • Frequent Contributor
  • ****
  • Posts: 68
Re: Real USB through put, PICs & Full Speed.
« Reply #5 on: November 10, 2010, 12:19:35 pm »
If you're only transferring 512 bytes (1 sector) at a time, your speed is always going to be VERY slow.  It takes a minimum of three bulk transactions to move one block (in your case, one sector) of data.  With the need to wait for interrupts and error-checking and what-not at each stage, it can end up being really slow.  If your controller indeed only operates at full-speed instead of high-speed, it's that much worse.

Assuming you can't increase the interrupt rate or change to a high-speed architecture, I think the only practical way you can increase speed is to increase the amount of data transferred in each block.  Is there any way to "borrow" memory from somewhere else in the system?
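The three stages mentioned above are the command (CBW), data, and status (CSW) transports of the mass-storage Bulk-Only Transport protocol. As a rough illustration, here is a sketch of the 31-byte CBW layout and the byte count for one command. The helper names build_cbw(), put_le32() and bot_bytes_on_bus() are made up for this example and are not part of the Microchip stack.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CBW_SIZE      31u           /* Command Block Wrapper length */
#define CSW_SIZE      13u           /* Command Status Wrapper length */
#define CBW_SIGNATURE 0x43425355ul  /* "USBC" when sent little-endian */

/* Store a 32-bit value in little-endian byte order. */
static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xFFu);
    p[1] = (uint8_t)((v >> 8) & 0xFFu);
    p[2] = (uint8_t)((v >> 16) & 0xFFu);
    p[3] = (uint8_t)((v >> 24) & 0xFFu);
}

/* Fill a CBW for one SCSI command (hypothetical helper). */
static void build_cbw(uint8_t cbw[CBW_SIZE], uint32_t tag, uint32_t data_len,
                      int device_to_host, uint8_t lun,
                      const uint8_t *cdb, uint8_t cdb_len)
{
    assert(cdb_len >= 1u && cdb_len <= 16u);
    memset(cbw, 0, CBW_SIZE);
    put_le32(&cbw[0], CBW_SIGNATURE);        /* dCBWSignature */
    put_le32(&cbw[4], tag);                  /* dCBWTag, echoed in the CSW */
    put_le32(&cbw[8], data_len);             /* dCBWDataTransferLength */
    cbw[12] = device_to_host ? 0x80u : 0x00u;/* bmCBWFlags: bit 7 = data-in */
    cbw[13] = (uint8_t)(lun & 0x0Fu);        /* bCBWLUN */
    cbw[14] = cdb_len;                       /* bCBWCBLength */
    memcpy(&cbw[15], cdb, cdb_len);          /* CBWCB: the SCSI command block */
}

/* Total bytes moved for one command: CBW + data + CSW. */
static uint32_t bot_bytes_on_bus(uint32_t data_len)
{
    return CBW_SIZE + data_len + CSW_SIZE;
}
```

For a 512-byte read this comes to 556 bytes, so the wrappers themselves add under 10%. The cost Bret describes is mostly the waiting between the stages, not the extra bytes.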

Tsuneo

  • Frequent Contributor
  • ****
  • Posts: 145
Re: Real USB through put, PICs & Full Speed.
« Reply #6 on: November 10, 2010, 02:41:47 pm »
Quote
I think we (the Microchip forum community) are walking around in a dark room bumping into each other regarding this USB speed thing.

So dark?
My post on the Microchip USB forum clearly describes the problems in the current version of the Microchip host MSC code, and gives suggestions to improve them.
http://www.microchip.com/forums/fb.ashx?m=524504

Tsuneo

Jan Axelson

  • Administrator
  • Frequent Contributor
  • *****
  • Posts: 3033
    • Lakeview Research
Re: Real USB through put, PICs & Full Speed.
« Reply #7 on: November 10, 2010, 04:45:41 pm »
Thanks for the link and good analysis of the code, Tsuneo!

Jan

Bret

  • Frequent Contributor
  • ****
  • Posts: 68
Re: Real USB through put, PICs & Full Speed.
« Reply #8 on: November 10, 2010, 05:32:04 pm »
It looks like you're suggesting to "post" all three bulk transactions to the schedule right away, and not wait to verify that the first one was actually accepted by the device before posting the second, etc.?  I've actually considered doing that before, but never tried it.  I figured it was probably too dangerous, and would cause race conditions and/or data corruption in at least some devices.  But if it works, so be it.

Tsuneo

  • Frequent Contributor
  • ****
  • Posts: 145
Re: Real USB through put, PICs & Full Speed.
« Reply #9 on: November 10, 2010, 11:28:19 pm »
Quote
and not wait to verify that the first one was actually accepted by the device before posting the second, etc.?

No, I am not suggesting "flying" Data / Status transports.
I suggested an improvement to the bus scheduling.
In the current implementation of the Microchip stack, every transfer starts at the next frame, even if the bus is idle. I suggested making it start immediately when a transfer is registered on the bus schedule list.

You may be trapped in the practice of the decent host controllers (OHCI/UHCI/EHCI). In these HCs, transfers are bound to (micro-)frames to some extent. But the HC on the PIC24F/32MX is a primitive one. It has no hardware support for bus scheduling, so the firmware can handle bus scheduling however we like.

Tsuneo
« Last Edit: November 10, 2010, 11:50:16 pm by Tsuneo »

Bret

  • Frequent Contributor
  • ****
  • Posts: 68
Re: Real USB through put, PICs & Full Speed.
« Reply #10 on: November 11, 2010, 10:33:05 am »
Yes, I'm only used to "decent" host controllers where scheduling is pretty tightly linked to the IRQ's and (micro-) frames.  Without those constraints, your solution seems like a good one.

st2000

  • Member
  • ***
  • Posts: 44
Re: Real USB through put, PICs & Full Speed.
« Reply #11 on: November 11, 2010, 02:50:41 pm »
Well, I'll admit it.  At this point I am not used to any USB hardware.  Not only that, I haven't even been able to make a dent in my current USB drive read speed (which is about 125KBytes/Second).

Size:
So as not to cloud my results, I am only reading the USB drive and tossing the results.  I switched from 512 to 1024 byte reads and changed the MEDIA_SECTOR_SIZE define from 512 to 1024.  I don't know if the Microchip software really supports this cleanly.  I say this because it breaks my code.  Switching MEDIA_SECTOR_SIZE back to 512 gets things working again.  But the throughput is only 125KBytes/Second, about the same as when I was reading 512 bytes at a time.

Calls:
I am not sure what Tsuneo means by "I suggest to make it start immediately when a transfer is registered on the bus schedule list".  To me the call to USB_FindNextToken() checks for any pending block transfers, so it would appear safe to call as many times as possible.  Is that what was meant?

Is there a "high speed goal" that anyone has reached?  I'm running a PIC 24F at 32MHz.  Maybe 125KBytes/Second is just as fast as a Microchip PIC can go.  Maybe it's time to find another embedded solution.


JohnHyde

  • Member
  • ***
  • Posts: 4
Re: Real USB through put, PICs & Full Speed.
« Reply #12 on: November 11, 2010, 03:36:50 pm »
I'm guessing that you haven't used a bus spy (such as an Ellisys Tracker) to see what is actually happening on the bus.
Typically, for a 512byte sector read or write you'll have an MSC command in frame 1, then a delay of 1 or 2 frames, then 2 frames with a total of 8 transfers of 64 bytes, then a 2-3 frame delay (2-6 for write) then a MSC status in 1 frame.

To increase throughput, make each read or write bigger than 512 bytes, such as the allocation size, which may be 2KB or 4KB or larger depending upon your Flash drive size.

The PIC32 runs the same code but faster.  The USB Host implementation is quite simple and almost all of the scheduling is done in software - Microchip provides the source code so you can tune it if required.
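The frame counts above can be turned into a rough throughput estimate. The sketch below is only a model under stated assumptions: full-speed frames are 1 ms, data moves at 256 bytes per frame (8 transfers of 64 bytes over 2 frames, as described above), and the delays before and after the data stage stay the same regardless of transfer size. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Frames used by one MSC command: 1 command frame, a gap, the data
 * frames, a second gap, and 1 status frame.
 * Assumption: 256 bytes of data per frame, taken from the
 * "8 transfers of 64 bytes in 2 frames" description. */
static uint32_t frames_per_command(uint32_t data_bytes,
                                   uint32_t gap_before_data,
                                   uint32_t gap_before_status)
{
    uint32_t data_frames = (data_bytes + 255u) / 256u;
    return 1u + gap_before_data + data_frames + gap_before_status + 1u;
}

/* Bytes per second if each command takes `frames` 1 ms frames. */
static uint32_t bytes_per_second(uint32_t data_bytes, uint32_t frames)
{
    assert(frames > 0);
    return (uint32_t)(((uint64_t)data_bytes * 1000u) / frames);
}
```

With these assumptions a 512-byte read takes 7 to 9 frames, or roughly 57 to 73 KBytes/Second. A 4KB read with the same gaps takes 21 frames, or about 195 KBytes/Second. The real numbers depend on the drive and the host firmware, so a bus analyzer is still the way to confirm them.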

FTDI have recently introduced a Vinculum-II which is a faster, better version of their Vinculum-1 dual host controller.  Their development environment includes an MSC driver (see http://www.usb-by-example.com/FTDI_Book_Examples.htm).  The Vinculum-II implements the dual host controllers in hardware so the throughput will be better but, even so, it is the MSC protocol and 512 byte sector size that is limiting your throughput.

Regards, John

Tsuneo

  • Frequent Contributor
  • ****
  • Posts: 145
Re: Real USB through put, PICs & Full Speed.
« Reply #13 on: November 11, 2010, 08:08:53 pm »
John,

Quote
Typically, for a 512byte sector read or write you'll have an MSC command in frame 1, then a delay of 1 or 2 frames, then 2 frames with a total of 8 transfers of 64 bytes, then a 2-3 frame delay (2-6 for write) then a MSC status in 1 frame.

You are also trapped in the practice of the decent HCs. The PIC HC is a primitive one. It generates a hardware interrupt immediately at the completion of every transaction, without waiting for the next frame. The firmware has to fully control the transaction timing. Therefore, the PIC HC can place all of the CBW/Data/CSW transports in a single frame, if the frame timing allows.

The MSC-BOT spec doesn't say that each transport should be placed in a separate frame. But I know that some USB sticks on the market wrongly assume this transport timing, though most USB sticks are fine on this point. For safety, the firmware may have an option to disable the quick transport timing.
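To see when CBW/Data/CSW can share a frame, one can count transactions against the full-speed bulk limit. This is an idealized sketch: it uses the USB 2.0 figure of at most 19 bulk transactions with 64-byte payloads per 1 ms frame, and it ignores NAKs, drive latency and firmware interrupt latency. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define FS_BULK_MAX_PACKET       64u  /* full-speed bulk max packet size */
#define FS_BULK_MAX_TRANSACTIONS 19u  /* 64-byte bulk transactions per frame
                                         (USB 2.0 spec, Table 5-9) */

/* Transactions for one MSC command: 1 CBW + data packets + 1 CSW. */
static uint32_t bot_transaction_count(uint32_t data_bytes)
{
    uint32_t data_transactions =
        (data_bytes + FS_BULK_MAX_PACKET - 1u) / FS_BULK_MAX_PACKET;
    return 1u + data_transactions + 1u;
}

/* Nonzero if the whole command could, at best, fit in one frame.
 * Conservative: the CBW and CSW are counted as full-size packets. */
static int fits_in_one_frame(uint32_t data_bytes)
{
    return bot_transaction_count(data_bytes) <= FS_BULK_MAX_TRANSACTIONS;
}

/* Upper bound on full-speed bulk payload, in bytes per second. */
static uint32_t fs_bulk_ceiling(void)
{
    return FS_BULK_MAX_TRANSACTIONS * FS_BULK_MAX_PACKET * 1000u;
}
```

A 512-byte command needs 10 transactions and a 1024-byte command needs 18, so both could fit in one frame in the best case. A 2048-byte command needs 34 and must span frames. The payload ceiling works out to 1,216,000 bytes per second, which is below the nominal 1.5 MBytes/Second signalling rate.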

Tsuneo

Tsuneo

  • Frequent Contributor
  • ****
  • Posts: 145
Re: Real USB through put, PICs & Full Speed.
« Reply #14 on: November 11, 2010, 09:53:08 pm »
st2000,

This thread may be drifting into geek chat. We have to come back from the "dark room" :-)

I suggest two options for the Microchip Host MSC stack.

1) Multi-sector read/write
Microchip's FAT system (MDD File System\FSIO.c) always issues single-sector reads/writes, for simplicity. But for a greater transfer size (more than two sectors at a time) and for contiguous sector allocation on the media, a multi-sector read/write in a single READ10/WRITE10 reduces the protocol overhead. For a 512 bytes/sector file system, the sectors within a cluster are allocated contiguously, at least.

Therefore, I recommend that you modify Microchip FSIO to support multi-sector access (hard), or replace it with another FAT system (easier). Fortunately, ChaN's FatFs supports multi-sector access. Marc Coussement has implemented FatFs on top of the Microchip host MSC.

ChaN's FatFs
http://elm-chan.org/fsw/ff/00index_e.html

FatFs on Microchip host MSC by Marc Coussement
http://www.microchip.com/forums/fb.ashx?m=501402

Marc has implemented it fine. But maybe he didn't want to touch Microchip's original code too much. disk_read() / disk_write() (fatfs_usb.c) repeatedly call the single-sector read/write routines of the stack (USBHostMSDSCSISectorRead() / USBHostMSDSCSISectorWrite()). To get better performance, modify USBHostMSDSCSISectorRead() / USBHostMSDSCSISectorWrite() directly for multi-sector read/write, too. It's easy.

- Copy USBHostMSDSCSISectorRead() / USBHostMSDSCSISectorWrite() (usb_host_msd_scsi.c) and rename the copies as new multi-sector read/write functions
- Add a sector count parameter to the copied functions
- Assign the sector count to commandBlock[8] (Number of blocks) for READ10/WRITE10
    commandBlock[8] = count;
- Increase the dataLength parameter of USBHostMSDRead() - for CBW
    errorCode = USBHostMSDRead( deviceAddress, 0, commandBlock, 10, dataBuffer, mediaInformation.sectorSize * count);
- Replace the loop in disk_read()/disk_write() with these new functions.

That's all.

With these mods, you'll get better speed performance on reads/writes of more than two sectors.
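For reference, here is a standalone sketch of how the sector count sits in a 10-byte READ10 command block. It is not the Microchip code: build_read10() and read10_data_length() are made-up helpers, and the layout follows the standard SCSI READ(10) format (opcode 0x28, big-endian LBA in bytes 2-5, big-endian transfer length in bytes 7-8).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SCSI_READ10 0x28u

/* Fill a READ(10) command block for sector_count sectors starting at lba. */
static void build_read10(uint8_t cdb[10], uint32_t lba, uint16_t sector_count)
{
    assert(sector_count > 0);
    memset(cdb, 0, 10);
    cdb[0] = SCSI_READ10;
    cdb[2] = (uint8_t)(lba >> 24);          /* LBA, most significant byte first */
    cdb[3] = (uint8_t)(lba >> 16);
    cdb[4] = (uint8_t)(lba >> 8);
    cdb[5] = (uint8_t)(lba);
    cdb[7] = (uint8_t)(sector_count >> 8);  /* transfer length, high byte */
    cdb[8] = (uint8_t)(sector_count);       /* transfer length, low byte */
}

/* Data-stage length that goes with the command above. */
static uint32_t read10_data_length(uint16_t sector_count, uint16_t sector_size)
{
    return (uint32_t)sector_count * sector_size;
}
```

Setting only commandBlock[8], as in the steps above, covers counts up to 255 as long as byte 7 is left at zero. On a RAM-starved PIC the buffer size will limit the count long before that.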

2) Bus scheduling improvement
For a quick and dirty implementation, add an extra call to _USB_FindNextToken() (usb_host.c). It traverses the schedule list and puts a new transaction on the bus. You may need to disable the USB interrupt around this extra call.

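void start_bulk_transfer_ASAP( void )
{
    // Holds a saved copy of the USB interrupt enable register (U1IE);
    // the register width differs between PIC24F (C30) and PIC32MX.
#if defined( __C30__ )
    WORD        interrupt_mask;
#elif defined( __PIC32MX__ )
    UINT32      interrupt_mask;
#else
    #error "Expected __C30__ or __PIC32MX__ to be defined"
#endif

    interrupt_mask = U1IE;        // guard from USB interrupt
    U1IE = 0;
    // reset the bulk bookkeeping so the bulk list is scanned again
    usbBusInfo.flags.bfBulkTransfersDone        = 0;
    usbBusInfo.lastBulkTransaction              = 0;
    _USB_FindNextToken();         // start the next pending transaction now
    U1IE = interrupt_mask;        // recover USB interrupt
}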

Calls to this function are inserted just after every USBHostRead() and USBHostWrite() call in usb_host_msd.c
- in USBHostMSDTasks() STATE_RUNNING - SUBSTATE_SEND_CBW case
- in USBHostMSDEventHandler() STATE_CBW_WAIT and STATE_TRANSFER_WAIT cases


I believe these mods will improve the speed by 3-4 times, but not by the 10 times you expect.

Tsuneo