Date: Fri, 2 Oct 2009 13:14:00 -0700
From: Pyun YongHyeon <pyunyh@gmail.com>
To: Yohanes Nugroho <yohanes@gmail.com>
Cc: freebsd-net@freebsd.org, freebsd-arm@freebsd.org
Subject: Re: FreeBSD ARM network speed
Message-ID: <20091002201400.GJ1512@michelle.cdnetworks.com>
In-Reply-To: <260bb65e0910012258w7c569505xa8cac5bd8bbd2aaa@mail.gmail.com>
References: <260bb65e0910012258w7c569505xa8cac5bd8bbd2aaa@mail.gmail.com>

On Fri, Oct 02, 2009 at 12:58:38PM +0700, Yohanes Nugroho wrote:
> Hi All,
>

Hi,

[...]

> The specification for the STR9104 SoC is available on Cavium website
> for those who are interested, but it is not very clear, so in
> developing the network driver, I followed the logic used by the Linux
> driver (the initialization sequence, etc). The current code is at
> http://p4db.freebsd.org/fileViewer.cgi?FSPC=//depot/projects/str91xx/src/sys/arm/econa/if_ece.c&REV=4
>
> Here is how the sending part works on STR9104:
>
> - In the initialization part, I allocate a ring, the size of the ring
>   is 256 entries (same as Linux version).

If the ethernet controller does not support 1000baseT (I think it is
Fast Ethernet, because the ICPlus IP101A is a 10/100 PHY), allocating
256 descriptors is a waste of resources, especially on 64MB systems, I
think.

> - When being asked to send a packet, I will do the following thing:
>   - stop the network TX DMA
>   - put the address of each segment of the packet to the ring, and set
>     a flag so that the entry in the ring will be sent by hardware
>   - start the network TX DMA
>
> obviously there is a cleaning up part (freeing mbuf) that should be
> done. The network driver can generate interrupt when a packet has been
> sent (but can't tell which entry was sent). In the Linux version, this
> interrupt is not used, the clean up is done just after starting the TX
> DMA, at the end of the sending function, and I do the same in the
> FreeBSD driver. Usually only one entry that needs to be removed, so
> it is quite fast.
>
> Is there something obvious (or not so obvious) that I've missed?
>

I briefly looked over the driver code and I can see missing
bus_dmamap_sync(9) calls in several places, as well as incorrect use of
bus_dma(9). This may also affect performance, because without
bus_dmamap_sync(9) the CPU's view of the descriptor may be stale, so
checking the OWN bit would not give a correct result.

Another cause of poor performance might be m_devget(9). I don't know
whether the controller really needs this type of copying (sorry, I have
no time to read the data sheet), but m_devget(9) is a really slow and
time-consuming operation because it has to copy the entire frame to a
new mbuf. If you had to use m_devget(9) to align buffers on an
ETHER_ALIGN boundary, I guess you can pass the alignment restriction to
bus_dma(9). Of course, this requires that the controller be able to
receive frames on an even address boundary, or have no Rx buffer
alignment limitation.

I believe you should not stop DMA before sending another frame, as you
did in the Rx handler. Basically, you should keep the controller as busy
as you can to get maximum performance, and you should reclaim
transmitted buffers as soon as you notice they have been sent. Stopping
DMA may take time, since the controller may have to drain active DMA
cycles. If the controller does not generate a Tx completion interrupt
after sending a frame, which is not likely, you may have to implement a
kind of polling in a separate thread, or use polling(9).

Good luck!
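
The advice above about queuing frames and reclaiming transmitted buffers
without stopping DMA can be modeled in plain userland C. This is only a
rough sketch under stated assumptions: the descriptor layout,
DESC_OWN_HW, TX_RING_SIZE, and every function name are hypothetical and
are not taken from if_ece.c or the STR9104 data sheet, and hw_complete()
merely stands in for the controller. The comments mark where a real
driver would presumably call bus_dmamap_sync(9).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TX_RING_SIZE 64          /* hypothetical size, smaller than 256 */
#define DESC_OWN_HW  0x80000000u /* hypothetical OWN bit: set while hardware owns the entry */

struct tx_desc {
	uint32_t addr;           /* bus address of the buffer (opaque in this model) */
	uint32_t ctrl;           /* length in the low 16 bits, plus DESC_OWN_HW */
};

struct tx_ring {
	struct tx_desc desc[TX_RING_SIZE];
	void *buf[TX_RING_SIZE]; /* stands in for the mbuf pointer of each entry */
	unsigned prod;           /* next entry the driver will fill */
	unsigned cons;           /* oldest entry not yet reclaimed */
	unsigned hw;             /* model only: next entry the "hardware" will send */
	unsigned queued;         /* number of entries between cons and prod */
};

/* Queue one buffer while DMA keeps running; returns false if the ring is full. */
static bool
tx_enqueue(struct tx_ring *r, void *buf, uint32_t addr, uint32_t len)
{
	struct tx_desc *d;

	if (r->queued == TX_RING_SIZE)
		return false;
	d = &r->desc[r->prod];
	d->addr = addr;
	r->buf[r->prod] = buf;
	/* Real driver: bus_dmamap_sync(9) with BUS_DMASYNC_PREWRITE goes around here. */
	d->ctrl = (len & 0xffffu) | DESC_OWN_HW; /* hand the entry to the hardware last */
	r->prod = (r->prod + 1) % TX_RING_SIZE;
	r->queued++;
	return true;
}

/* Free every entry the hardware has released; returns how many were reclaimed. */
static unsigned
tx_reclaim(struct tx_ring *r)
{
	unsigned freed = 0;

	assert(r->queued <= TX_RING_SIZE);
	/* Real driver: bus_dmamap_sync(9) with BUS_DMASYNC_POSTREAD |
	 * BUS_DMASYNC_POSTWRITE before the OWN bit is examined. */
	while (r->queued > 0) {
		if ((r->desc[r->cons].ctrl & DESC_OWN_HW) != 0)
			break;               /* still owned by hardware, stop here */
		r->buf[r->cons] = NULL;      /* real driver: unload the map, m_freem() */
		r->cons = (r->cons + 1) % TX_RING_SIZE;
		r->queued--;
		freed++;
	}
	return freed;
}

/* Model of the controller finishing up to n pending entries (clears OWN). */
static void
hw_complete(struct tx_ring *r, unsigned n)
{
	while (n-- > 0 && (r->desc[r->hw].ctrl & DESC_OWN_HW) != 0) {
		r->desc[r->hw].ctrl &= ~DESC_OWN_HW;
		r->hw = (r->hw + 1) % TX_RING_SIZE;
	}
}
```

In this model tx_reclaim() could be called from a Tx completion
interrupt handler, at the start of the send path, or from a polling
routine; in all three cases the producer side never has to stop the
transmitter, because ownership of each entry is decided by the OWN bit
alone.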