From owner-svn-src-head@FreeBSD.ORG Wed Jan 21 00:50:41 2015
Message-ID: <54BEF7CF.9030505@FreeBSD.org>
Date: Tue, 20 Jan 2015 19:50:23 -0500
From: Pedro Giffuni
To: Navdeep Parhar, Luigi Rizzo, src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: Re: svn commit: r276485 - in head/sys: conf dev/cxgbe modules/cxgbe/if_cxgbe
In-Reply-To: <54BEE305.6020905@FreeBSD.org>

On 01/20/15 18:21, Navdeep Parhar wrote:
> The problem reported by Luigi has been fixed in r277225 already.
>
> Regards,
> Navdeep

But the fix is rather ugly, isn't it?

I would personally prefer to just kill the older gcc, but in the meantime updating it so that it behaves like the newer gcc/clang would be better, IMHO.

Pedro.

> On 01/20/15 15:10, Pedro Giffuni wrote:
>> Hi;
>>
>> I got this patch from the OpenBSD-tech list [1].
>> Perhaps this fixes the gcc issue?
>>
>> Apparently it's required for mesa too.
>>
>> Pedro.
>>
>> [1] http://article.gmane.org/gmane.os.openbsd.tech/40604
>>
>> On 01/06/15 15:33, Navdeep Parhar wrote:
>>> On Tue, Jan 06, 2015 at 07:58:34PM +0100, Luigi Rizzo wrote:
>>>> On Thu, Jan 1, 2015 at 12:19 AM, Navdeep Parhar wrote:
>>>>
>>>>   Author: np
>>>>   Date: Wed Dec 31 23:19:16 2014
>>>>   New Revision: 276485
>>>>   URL: https://svnweb.freebsd.org/changeset/base/276485
>>>>
>>>>   Log:
>>>>     cxgbe(4): major tx rework.
>>>>
>>>> FYI, this commit has some unnamed unions (e.g. in t4_mp_ring.c)
>>>> which prevent the kernel from compiling with our stock gcc and its
>>>> standard kernel build flags (specifically -std=...).
>>>>
>>>> Adding the following to the kernel config
>>>>
>>>>   makeoptions COPTFLAGS="-fms-extensions"
>>>>
>>>> seems to do the job.
>>>>
>>>> I know it is unavoidable that we'll end up with gcc not working,
>>>> but maybe we can still avoid unnamed unions.
>>>
>>> There are two unresolved issues with mp_ring, and I had to make the
>>> driver amd64-only while I consider my options.
>>>
>>> - platforms where gcc is the default (and our version has problems
>>>   with unnamed unions). This is simple to fix but reduces the
>>>   readability of the code. But sure, if building head with gcc is
>>>   popular then that trumps readability. I wonder if adding
>>>   -fms-extensions just to the driver's build flags would be an
>>>   acceptable compromise.
>>>
>>> - platforms without the acq/rel versions of 64b cmpset. I think it
>>>   would be simple to add acq/rel variants to i386/pc98 and others
>>>   that already have 64b cmpset. The driver will be permanently
>>>   unplugged from whatever remains (only 32-bit powerpc, I think).
>>>
>>> I'll try to sort all this out within the next couple of weeks.
>>>
>>> Regards,
>>> Navdeep
>>>
>>>> cheers
>>>> luigi
>>>>
>>>> a) Front load as much work as possible in if_transmit, before any
>>>>    driver lock or software queue has to get involved.
>>>>
>>>> b) Replace buf_ring with a brand new mp_ring (multiproducer ring).
>>>>    This is specifically for the tx multiqueue model where one of
>>>>    the if_transmit producer threads becomes the consumer and other
>>>>    producers carry on as usual. mp_ring is implemented as
>>>>    standalone code and it should be possible to use it in any
>>>>    driver with tx multiqueue. It also has:
>>>>    - the ability to enqueue/dequeue multiple items. This might
>>>>      become significant if packet batching is ever implemented.
>>>>    - an abdication mechanism to allow a thread to give up writing
>>>>      tx descriptors and have another if_transmit thread take over.
>>>>      A thread that's writing tx descriptors can end up doing so for
>>>>      an unbounded time period if a) there are other if_transmit
>>>>      threads continuously feeding the software queue, and b) the
>>>>      chip keeps up with whatever the thread is throwing at it.
>>>>    - accurate statistics about interesting events even when the
>>>>      stats come at the expense of additional branches/conditional
>>>>      code.
>>>>
>>>>    The NIC txq lock is uncontested on the fast path at this point.
>>>>    I've left it there for synchronization with the control events
>>>>    (interface up/down, modload/unload).
>>>>
>>>> c) Add support for "type 1" coalescing work requests in the normal
>>>>    NIC tx path. This work request is optimized for frames with a
>>>>    single item in the DMA gather list. These are very common when
>>>>    forwarding packets. Note that netmap tx in cxgbe already uses
>>>>    these "type 1" work requests.
>>>>
>>>> d) Do not request automatic cidx updates every 32 descriptors.
>>>>    Instead, request updates via bits in individual work requests
>>>>    (still every 32 descriptors approximately). Also, request an
>>>>    automatic final update when the queue idles after activity.
This means NIC tx >>>> reclaim is still >>>> performed lazily but it will catch up quickly as soon as the >>>> queue >>>> idles. This seems to be the best middle ground and I'll >>>> probably do >>>> something similar for netmap tx as well. >>>> >>>> e) Implement a faster tx path for WRQs (used by TOE tx and >>>> control >>>> queues, _not_ by the normal NIC tx). Allow work requests to >>>> be written >>>> directly to the hardware descriptor ring if room is >>>> available. I will >>>> convert t4_tom and iw_cxgbe modules to this faster style >>>> gradually. >>>> >>>> MFC after: 2 months >>>> >>>> Added: >>>> head/sys/dev/cxgbe/t4_mp_ring.c (contents, props changed) >>>> head/sys/dev/cxgbe/t4_mp_ring.h (contents, props changed) >>>> Modified: >>>> head/sys/conf/files >>>> head/sys/dev/cxgbe/adapter.h >>>> head/sys/dev/cxgbe/t4_l2t.c >>>> head/sys/dev/cxgbe/t4_main.c >>>> head/sys/dev/cxgbe/t4_sge.c >>>> head/sys/modules/cxgbe/if_cxgbe/Makefile >>>> >>>> Modified: head/sys/conf/files >>>> >>>> =========================================================================== >>>> >>>> >>>> === >>>> --- head/sys/conf/files Wed Dec 31 22:52:43 2014 (r276484) >>>> +++ head/sys/conf/files Wed Dec 31 23:19:16 2014 (r276485) >>>> @@ -1142,6 +1142,8 @@ dev/cxgb/sys/uipc_mvec.c optional cxgb p >>>> compile-with "${NORMAL_C} -I$S/dev/cxgb" >>>> dev/cxgb/cxgb_t3fw.c optional cxgb cxgb_t3fw \ >>>> compile-with "${NORMAL_C} -I$S/dev/cxgb" >>>> +dev/cxgbe/t4_mp_ring.c optional cxgbe pci \ >>>> + compile-with "${NORMAL_C} -I$S/dev/cxgbe" >>>> dev/cxgbe/t4_main.c optional cxgbe pci \ >>>> compile-with "${NORMAL_C} -I$S/dev/cxgbe" >>>> dev/cxgbe/t4_netmap.c optional cxgbe pci \ >>>> >>>> Modified: head/sys/dev/cxgbe/adapter.h >>>> >>>> =========================================================================== >>>> >>>> >>>> === >>>> --- head/sys/dev/cxgbe/adapter.h Wed Dec 31 22:52:43 2014 >>>> (r276484) >>>> +++ head/sys/dev/cxgbe/adapter.h Wed Dec 31 23:19:16 2014 >>>> (r276485) >>>> @@ 
-152,7 +152,8 @@ enum { >>>> CL_METADATA_SIZE = CACHE_LINE_SIZE, >>>> >>>> SGE_MAX_WR_NDESC = SGE_MAX_WR_LEN / EQ_ESIZE, /* max WR >>>> size in >>>> desc */ >>>> - TX_SGL_SEGS = 36, >>>> + TX_SGL_SEGS = 39, >>>> + TX_SGL_SEGS_TSO = 38, >>>> TX_WR_FLITS = SGE_MAX_WR_LEN / 8 >>>> }; >>>> >>>> @@ -273,6 +274,7 @@ struct port_info { >>>> struct timeval last_refreshed; >>>> struct port_stats stats; >>>> u_int tnl_cong_drops; >>>> + u_int tx_parse_error; >>>> >>>> eventhandler_tag vlan_c; >>>> >>>> @@ -308,23 +310,9 @@ struct tx_desc { >>>> __be64 flit[8]; >>>> }; >>>> >>>> -struct tx_map { >>>> - struct mbuf *m; >>>> - bus_dmamap_t map; >>>> -}; >>>> - >>>> -/* DMA maps used for tx */ >>>> -struct tx_maps { >>>> - struct tx_map *maps; >>>> - uint32_t map_total; /* # of DMA maps */ >>>> - uint32_t map_pidx; /* next map to be used */ >>>> - uint32_t map_cidx; /* reclaimed up to this index */ >>>> - uint32_t map_avail; /* # of available maps */ >>>> -}; >>>> - >>>> struct tx_sdesc { >>>> + struct mbuf *m; /* m_nextpkt linked chain of >>>> frames */ >>>> uint8_t desc_used; /* # of hardware descriptors >>>> used by the WR >>>> */ >>>> - uint8_t credits; /* NIC txq: # of frames sent out >>>> in the WR >>>> */ >>>> }; >>>> >>>> >>>> @@ -378,16 +366,12 @@ struct sge_iq { >>>> enum { >>>> EQ_CTRL = 1, >>>> EQ_ETH = 2, >>>> -#ifdef TCP_OFFLOAD >>>> EQ_OFLD = 3, >>>> -#endif >>>> >>>> /* eq flags */ >>>> - EQ_TYPEMASK = 7, /* 3 lsbits hold the >>>> type */ >>>> - EQ_ALLOCATED = (1 << 3), /* firmware resources >>>> allocated */ >>>> - EQ_DOOMED = (1 << 4), /* about to be >>>> destroyed */ >>>> - EQ_CRFLUSHED = (1 << 5), /* expecting an update >>>> from SGE */ >>>> - EQ_STALLED = (1 << 6), /* out of hw descriptors >>>> or dmamaps >>>> */ >>>> + EQ_TYPEMASK = 0x3, /* 2 lsbits hold the >>>> type (see >>>> above) */ >>>> + EQ_ALLOCATED = (1 << 2), /* firmware resources >>>> allocated */ >>>> + EQ_ENABLED = (1 << 3), /* open for business */ >>>> }; >>>> >>>> /* Listed in order of 
preference. Update t4_sysctls too if you >>>> change >>>> these */ >>>> @@ -402,32 +386,25 @@ enum {DOORBELL_UDB, DOORBELL_WCWR, DOORB >>>> struct sge_eq { >>>> unsigned int flags; /* MUST be first */ >>>> unsigned int cntxt_id; /* SGE context id for the eq */ >>>> - bus_dma_tag_t desc_tag; >>>> - bus_dmamap_t desc_map; >>>> - char lockname[16]; >>>> struct mtx eq_lock; >>>> >>>> struct tx_desc *desc; /* KVA of descriptor ring */ >>>> - bus_addr_t ba; /* bus address of descriptor >>>> ring */ >>>> - struct sge_qstat *spg; /* status page, for >>>> convenience */ >>>> uint16_t doorbells; >>>> volatile uint32_t *udb; /* KVA of doorbell (lies within >>>> BAR2) */ >>>> u_int udb_qid; /* relative qid within the >>>> doorbell page */ >>>> - uint16_t cap; /* max # of desc, for >>>> convenience */ >>>> - uint16_t avail; /* available descriptors, for >>>> convenience * >>>> / >>>> - uint16_t qsize; /* size (# of entries) of the >>>> queue */ >>>> + uint16_t sidx; /* index of the entry with the >>>> status page >>>> */ >>>> uint16_t cidx; /* consumer idx (desc idx) */ >>>> uint16_t pidx; /* producer idx (desc idx) */ >>>> - uint16_t pending; /* # of descriptors used since >>>> last >>>> doorbell */ >>>> + uint16_t equeqidx; /* EQUEQ last requested at this >>>> pidx */ >>>> + uint16_t dbidx; /* pidx of the most recent >>>> doorbell */ >>>> uint16_t iqid; /* iq that gets egr_update for >>>> the eq */ >>>> uint8_t tx_chan; /* tx channel used by the eq */ >>>> - struct task tx_task; >>>> - struct callout tx_callout; >>>> + volatile u_int equiq; /* EQUIQ outstanding */ >>>> >>>> - /* stats */ >>>> - >>>> - uint32_t egr_update; /* # of SGE_EGR_UPDATE >>>> notifications for eq >>>> */ >>>> - uint32_t unstalled; /* recovered from stall */ >>>> + bus_dma_tag_t desc_tag; >>>> + bus_dmamap_t desc_map; >>>> + bus_addr_t ba; /* bus address of descriptor >>>> ring */ >>>> + char lockname[16]; >>>> }; >>>> >>>> struct sw_zone_info { >>>> @@ -499,18 +476,19 @@ struct sge_fl { >>>> struct 
cluster_layout cll_alt; /* alternate refill >>>> zone, layout */ >>>> }; >>>> >>>> +struct mp_ring; >>>> + >>>> /* txq: SGE egress queue + what's needed for Ethernet NIC */ >>>> struct sge_txq { >>>> struct sge_eq eq; /* MUST be first */ >>>> >>>> struct ifnet *ifp; /* the interface this txq >>>> belongs to */ >>>> - bus_dma_tag_t tx_tag; /* tag for transmit buffers */ >>>> - struct buf_ring *br; /* tx buffer ring */ >>>> + struct mp_ring *r; /* tx software ring */ >>>> struct tx_sdesc *sdesc; /* KVA of software descriptor >>>> ring */ >>>> - struct mbuf *m; /* held up due to temporary >>>> resource >>>> shortage */ >>>> - >>>> - struct tx_maps txmaps; >>>> + struct sglist *gl; >>>> + __be32 cpl_ctrl0; /* for convenience */ >>>> >>>> + struct task tx_reclaim_task; >>>> /* stats for common events first */ >>>> >>>> uint64_t txcsum; /* # of times hardware assisted >>>> with >>>> checksum */ >>>> @@ -519,13 +497,12 @@ struct sge_txq { >>>> uint64_t imm_wrs; /* # of work requests with >>>> immediate data * >>>> / >>>> uint64_t sgl_wrs; /* # of work requests with >>>> direct SGL */ >>>> uint64_t txpkt_wrs; /* # of txpkt work requests (not >>>> coalesced) >>>> */ >>>> - uint64_t txpkts_wrs; /* # of coalesced tx work >>>> requests */ >>>> - uint64_t txpkts_pkts; /* # of frames in coalesced tx >>>> work >>>> requests */ >>>> + uint64_t txpkts0_wrs; /* # of type0 coalesced tx work >>>> requests */ >>>> + uint64_t txpkts1_wrs; /* # of type1 coalesced tx work >>>> requests */ >>>> + uint64_t txpkts0_pkts; /* # of frames in type0 >>>> coalesced tx WRs */ >>>> + uint64_t txpkts1_pkts; /* # of frames in type1 >>>> coalesced tx WRs */ >>>> >>>> /* stats for not-that-common events */ >>>> - >>>> - uint32_t no_dmamap; /* no DMA map to load the mbuf */ >>>> - uint32_t no_desc; /* out of hardware descriptors */ >>>> } __aligned(CACHE_LINE_SIZE); >>>> >>>> /* rxq: SGE ingress queue + SGE free list + miscellaneous >>>> items */ >>>> @@ -574,7 +551,13 @@ struct wrqe { >>>> 
STAILQ_ENTRY(wrqe) link; >>>> struct sge_wrq *wrq; >>>> int wr_len; >>>> - uint64_t wr[] __aligned(16); >>>> + char wr[] __aligned(16); >>>> +}; >>>> + >>>> +struct wrq_cookie { >>>> + TAILQ_ENTRY(wrq_cookie) link; >>>> + int ndesc; >>>> + int pidx; >>>> }; >>>> >>>> /* >>>> @@ -585,17 +568,32 @@ struct sge_wrq { >>>> struct sge_eq eq; /* MUST be first */ >>>> >>>> struct adapter *adapter; >>>> + struct task wrq_tx_task; >>>> + >>>> + /* Tx desc reserved but WR not "committed" yet. */ >>>> + TAILQ_HEAD(wrq_incomplete_wrs , wrq_cookie) >>>> incomplete_wrs; >>>> >>>> - /* List of WRs held up due to lack of tx descriptors */ >>>> + /* List of WRs ready to go out as soon as descriptors are >>>> available. */ >>>> STAILQ_HEAD(, wrqe) wr_list; >>>> + u_int nwr_pending; >>>> + u_int ndesc_needed; >>>> >>>> /* stats for common events first */ >>>> >>>> - uint64_t tx_wrs; /* # of tx work requests */ >>>> + uint64_t tx_wrs_direct; /* # of WRs written directly to >>>> desc ring. >>>> */ >>>> + uint64_t tx_wrs_ss; /* # of WRs copied from scratch >>>> space. */ >>>> + uint64_t tx_wrs_copied; /* # of WRs queued and copied to >>>> desc ring. >>>> */ >>>> >>>> /* stats for not-that-common events */ >>>> >>>> - uint32_t no_desc; /* out of hardware descriptors */ >>>> + /* >>>> + * Scratch space for work requests that wrap around >>>> after reaching >>>> the >>>> + * status page, and some infomation about the last WR >>>> that used it. 
>>>> + */ >>>> + uint16_t ss_pidx; >>>> + uint16_t ss_len; >>>> + uint8_t ss[SGE_MAX_WR_LEN]; >>>> + >>>> } __aligned(CACHE_LINE_SIZE); >>>> >>>> >>>> @@ -744,7 +742,7 @@ struct adapter { >>>> struct sge sge; >>>> int lro_timeout; >>>> >>>> - struct taskqueue *tq[NCHAN]; /* taskqueues that flush >>>> data out * >>>> / >>>> + struct taskqueue *tq[NCHAN]; /* General purpose >>>> taskqueues */ >>>> struct port_info *port[MAX_NPORTS]; >>>> uint8_t chan_map[NCHAN]; >>>> >>>> @@ -978,12 +976,11 @@ static inline int >>>> tx_resume_threshold(struct sge_eq *eq) >>>> { >>>> >>>> - return (eq->qsize / 4); >>>> + /* not quite the same as qsize / 4, but this will do. */ >>>> + return (eq->sidx / 4); >>>> } >>>> >>>> /* t4_main.c */ >>>> -void t4_tx_task(void *, int); >>>> -void t4_tx_callout(void *); >>>> int t4_os_find_pci_capability(struct adapter *, int); >>>> int t4_os_pci_save_state(struct adapter *); >>>> int t4_os_pci_restore_state(struct adapter *); >>>> @@ -1024,16 +1021,15 @@ int t4_setup_adapter_queues(struct adapt >>>> int t4_teardown_adapter_queues(struct adapter *); >>>> int t4_setup_port_queues(struct port_info *); >>>> int t4_teardown_port_queues(struct port_info *); >>>> -int t4_alloc_tx_maps(struct tx_maps *, bus_dma_tag_t, int, int); >>>> -void t4_free_tx_maps(struct tx_maps *, bus_dma_tag_t); >>>> void t4_intr_all(void *); >>>> void t4_intr(void *); >>>> void t4_intr_err(void *); >>>> void t4_intr_evt(void *); >>>> void t4_wrq_tx_locked(struct adapter *, struct sge_wrq *, >>>> struct wrqe *); >>>> -int t4_eth_tx(struct ifnet *, struct sge_txq *, struct mbuf *); >>>> void t4_update_fl_bufsize(struct ifnet *); >>>> -int can_resume_tx(struct sge_eq *); >>>> +int parse_pkt(struct mbuf **); >>>> +void *start_wrq_wr(struct sge_wrq *, int, struct wrq_cookie *); >>>> +void commit_wrq_wr(struct sge_wrq *, void *, struct >>>> wrq_cookie *); >>>> >>>> /* t4_tracer.c */ >>>> struct t4_tracer; >>>> >>>> Modified: head/sys/dev/cxgbe/t4_l2t.c >>>> >>>> 
=========================================================================== >>>> >>>> >>>> === >>>> --- head/sys/dev/cxgbe/t4_l2t.c Wed Dec 31 22:52:43 2014 >>>> (r276484) >>>> +++ head/sys/dev/cxgbe/t4_l2t.c Wed Dec 31 23:19:16 2014 >>>> (r276485) >>>> @@ -113,16 +113,15 @@ found: >>>> int >>>> t4_write_l2e(struct adapter *sc, struct l2t_entry *e, int sync) >>>> { >>>> - struct wrqe *wr; >>>> + struct wrq_cookie cookie; >>>> struct cpl_l2t_write_req *req; >>>> int idx = e->idx + sc->vres.l2t.start; >>>> >>>> mtx_assert(&e->lock, MA_OWNED); >>>> >>>> - wr = alloc_wrqe(sizeof(*req), &sc->sge.mgmtq); >>>> - if (wr == NULL) >>>> + req = start_wrq_wr(&sc->sge.mgmtq, howmany(sizeof(*req), >>>> 16), & >>>> cookie); >>>> + if (req == NULL) >>>> return (ENOMEM); >>>> - req = wrtod(wr); >>>> >>>> INIT_TP_WR(req, 0); >>>> OPCODE_TID(req) = htonl(MK_OPCODE_TID(CPL_L2T_WRITE_REQ, >>>> idx | >>>> @@ -132,7 +131,7 @@ t4_write_l2e(struct adapter *sc, struct >>>> req->vlan = htons(e->vlan); >>>> memcpy(req->dst_mac, e->dmac, sizeof(req->dst_mac)); >>>> >>>> - t4_wrq_tx(sc, wr); >>>> + commit_wrq_wr(&sc->sge.mgmtq, req, &cookie); >>>> >>>> if (sync && e->state != L2T_STATE_SWITCHING) >>>> e->state = L2T_STATE_SYNC_WRITE; >>>> >>>> Modified: head/sys/dev/cxgbe/t4_main.c >>>> >>>> =========================================================================== >>>> >>>> >>>> === >>>> --- head/sys/dev/cxgbe/t4_main.c Wed Dec 31 22:52:43 2014 >>>> (r276484) >>>> +++ head/sys/dev/cxgbe/t4_main.c Wed Dec 31 23:19:16 2014 >>>> (r276485) >>>> @@ -66,6 +66,7 @@ __FBSDID("$FreeBSD$"); >>>> #include "common/t4_regs_values.h" >>>> #include "t4_ioctl.h" >>>> #include "t4_l2t.h" >>>> +#include "t4_mp_ring.h" >>>> >>>> /* T4 bus driver interface */ >>>> static int t4_probe(device_t); >>>> @@ -378,7 +379,8 @@ static void build_medialist(struct port_ >>>> static int cxgbe_init_synchronized(struct port_info *); >>>> static int cxgbe_uninit_synchronized(struct port_info *); >>>> static int 
setup_intr_handlers(struct adapter *); >>>> -static void quiesce_eq(struct adapter *, struct sge_eq *); >>>> +static void quiesce_txq(struct adapter *, struct sge_txq *); >>>> +static void quiesce_wrq(struct adapter *, struct sge_wrq *); >>>> static void quiesce_iq(struct adapter *, struct sge_iq *); >>>> static void quiesce_fl(struct adapter *, struct sge_fl *); >>>> static int t4_alloc_irq(struct adapter *, struct irq *, int rid, >>>> @@ -434,7 +436,6 @@ static int sysctl_tx_rate(SYSCTL_HANDLER >>>> static int sysctl_ulprx_la(SYSCTL_HANDLER_ARGS); >>>> static int sysctl_wcwr_stats(SYSCTL_HANDLER_ARGS); >>>> #endif >>>> -static inline void txq_start(struct ifnet *, struct sge_txq *); >>>> static uint32_t fconf_to_mode(uint32_t); >>>> static uint32_t mode_to_fconf(uint32_t); >>>> static uint32_t fspec_to_fconf(struct t4_filter_specification >>>> *); >>>> @@ -1429,67 +1430,36 @@ cxgbe_transmit(struct ifnet *ifp, struct >>>> { >>>> struct port_info *pi = ifp->if_softc; >>>> struct adapter *sc = pi->adapter; >>>> - struct sge_txq *txq = &sc->sge.txq[pi->first_txq]; >>>> - struct buf_ring *br; >>>> + struct sge_txq *txq; >>>> + void *items[1]; >>>> int rc; >>>> >>>> M_ASSERTPKTHDR(m); >>>> + MPASS(m->m_nextpkt == NULL); /* not quite ready for >>>> this yet */ >>>> >>>> if (__predict_false(pi->link_cfg.link_ok == 0)) { >>>> m_freem(m); >>>> return (ENETDOWN); >>>> } >>>> >>>> - /* check if flowid is set */ >>>> - if (M_HASHTYPE_GET(m) != M_HASHTYPE_NONE) >>>> - txq += ((m->m_pkthdr.flowid % (pi->ntxq - pi-> >>>> rsrv_noflowq)) >>>> - + pi->rsrv_noflowq); >>>> - br = txq->br; >>>> - >>>> - if (TXQ_TRYLOCK(txq) == 0) { >>>> - struct sge_eq *eq = &txq->eq; >>>> - >>>> - /* >>>> - * It is possible that t4_eth_tx finishes up and >>>> releases >>>> the >>>> - * lock between the TRYLOCK above and the >>>> drbr_enqueue >>>> here. We >>>> - * need to make sure that this mbuf doesn't just >>>> sit there >>>> in >>>> - * the drbr. 
>>>> - */ >>>> - >>>> - rc = drbr_enqueue(ifp, br, m); >>>> - if (rc == 0 && callout_pending(&eq->tx_callout) >>>> == 0 && >>>> - !(eq->flags & EQ_DOOMED)) >>>> - callout_reset(&eq->tx_callout, 1, >>>> t4_tx_callout, >>>> eq); >>>> + rc = parse_pkt(&m); >>>> + if (__predict_false(rc != 0)) { >>>> + MPASS(m == NULL); /* was >>>> freed >>>> already */ >>>> + atomic_add_int(&pi->tx_parse_error, 1); /* rare, >>>> atomic is >>>> ok */ >>>> return (rc); >>>> } >>>> >>>> - /* >>>> - * txq->m is the mbuf that is held up due to a temporary >>>> shortage >>>> of >>>> - * resources and it should be put on the wire first. >>>> Then what's >>>> in >>>> - * drbr and finally the mbuf that was just passed in >>>> to us. >>>> - * >>>> - * Return code should indicate the fate of the mbuf that >>>> was passed >>>> in >>>> - * this time. >>>> - */ >>>> - >>>> - TXQ_LOCK_ASSERT_OWNED(txq); >>>> - if (drbr_needs_enqueue(ifp, br) || txq->m) { >>>> - >>>> - /* Queued for transmission. */ >>>> - >>>> - rc = drbr_enqueue(ifp, br, m); >>>> - m = txq->m ? txq->m : drbr_dequeue(ifp, br); >>>> - (void) t4_eth_tx(ifp, txq, m); >>>> - TXQ_UNLOCK(txq); >>>> - return (rc); >>>> - } >>>> + /* Select a txq. */ >>>> + txq = &sc->sge.txq[pi->first_txq]; >>>> + if (M_HASHTYPE_GET(m) != M_HASHTYPE_NONE) >>>> + txq += ((m->m_pkthdr.flowid % (pi->ntxq - pi-> >>>> rsrv_noflowq)) + >>>> + pi->rsrv_noflowq); >>>> >>>> - /* Direct transmission. */ >>>> - rc = t4_eth_tx(ifp, txq, m); >>>> - if (rc != 0 && txq->m) >>>> - rc = 0; /* held, will be transmitted soon >>>> (hopefully) */ >>>> + items[0] = m; >>>> + rc = mp_ring_enqueue(txq->r, items, 1, 4096); >>>> + if (__predict_false(rc != 0)) >>>> + m_freem(m); >>>> >>>> - TXQ_UNLOCK(txq); >>>> return (rc); >>>> } >>>> >>>> @@ -1499,17 +1469,17 @@ cxgbe_qflush(struct ifnet *ifp) >>>> struct port_info *pi = ifp->if_softc; >>>> struct sge_txq *txq; >>>> int i; >>>> - struct mbuf *m; >>>> >>>> /* queues do not exist if !PORT_INIT_DONE. 
*/ >>>> if (pi->flags & PORT_INIT_DONE) { >>>> for_each_txq(pi, i, txq) { >>>> TXQ_LOCK(txq); >>>> - m_freem(txq->m); >>>> - txq->m = NULL; >>>> - while ((m = >>>> buf_ring_dequeue_sc(txq->br)) != NULL) >>>> - m_freem(m); >>>> + txq->eq.flags &= ~EQ_ENABLED; >>>> TXQ_UNLOCK(txq); >>>> + while (!mp_ring_is_idle(txq->r)) { >>>> + mp_ring_check_drainage(txq->r, 0); >>>> + pause("qflush", 1); >>>> + } >>>> } >>>> } >>>> if_qflush(ifp); >>>> @@ -1564,7 +1534,7 @@ cxgbe_get_counter(struct ifnet *ifp, ift >>>> struct sge_txq *txq; >>>> >>>> for_each_txq(pi, i, txq) >>>> - drops += txq->br->br_drops; >>>> + drops += >>>> counter_u64_fetch(txq->r->drops); >>>> } >>>> >>>> return (drops); >>>> @@ -3236,7 +3206,8 @@ cxgbe_init_synchronized(struct port_info >>>> { >>>> struct adapter *sc = pi->adapter; >>>> struct ifnet *ifp = pi->ifp; >>>> - int rc = 0; >>>> + int rc = 0, i; >>>> + struct sge_txq *txq; >>>> >>>> ASSERT_SYNCHRONIZED_OP(sc); >>>> >>>> @@ -3265,6 +3236,17 @@ cxgbe_init_synchronized(struct port_info >>>> } >>>> >>>> /* >>>> + * Can't fail from this point onwards. Review >>>> cxgbe_uninit_synchronized >>>> + * if this changes. >>>> + */ >>>> + >>>> + for_each_txq(pi, i, txq) { >>>> + TXQ_LOCK(txq); >>>> + txq->eq.flags |= EQ_ENABLED; >>>> + TXQ_UNLOCK(txq); >>>> + } >>>> + >>>> + /* >>>> * The first iq of the first port to come up is used for >>>> tracing. 
>>>> */ >>>> if (sc->traceq < 0) { >>>> @@ -3297,7 +3279,8 @@ cxgbe_uninit_synchronized(struct port_in >>>> { >>>> struct adapter *sc = pi->adapter; >>>> struct ifnet *ifp = pi->ifp; >>>> - int rc; >>>> + int rc, i; >>>> + struct sge_txq *txq; >>>> >>>> ASSERT_SYNCHRONIZED_OP(sc); >>>> >>>> @@ -3314,6 +3297,12 @@ cxgbe_uninit_synchronized(struct port_in >>>> return (rc); >>>> } >>>> >>>> + for_each_txq(pi, i, txq) { >>>> + TXQ_LOCK(txq); >>>> + txq->eq.flags &= ~EQ_ENABLED; >>>> + TXQ_UNLOCK(txq); >>>> + } >>>> + >>>> clrbit(&sc->open_device_map, pi->port_id); >>>> PORT_LOCK(pi); >>>> ifp->if_drv_flags &= ~IFF_DRV_RUNNING; >>>> @@ -3543,15 +3532,17 @@ port_full_uninit(struct port_info *pi) >>>> >>>> if (pi->flags & PORT_INIT_DONE) { >>>> >>>> - /* Need to quiesce queues. XXX: ctrl queues? */ >>>> + /* Need to quiesce queues. */ >>>> + >>>> + quiesce_wrq(sc, &sc->sge.ctrlq[pi->port_id]); >>>> >>>> for_each_txq(pi, i, txq) { >>>> - quiesce_eq(sc, &txq->eq); >>>> + quiesce_txq(sc, txq); >>>> } >>>> >>>> #ifdef TCP_OFFLOAD >>>> for_each_ofld_txq(pi, i, ofld_txq) { >>>> - quiesce_eq(sc, &ofld_txq->eq); >>>> + quiesce_wrq(sc, ofld_txq); >>>> } >>>> #endif >>>> >>>> @@ -3576,23 +3567,39 @@ port_full_uninit(struct port_info *pi) >>>> } >>>> >>>> static void >>>> -quiesce_eq(struct adapter *sc, struct sge_eq *eq) >>>> +quiesce_txq(struct adapter *sc, struct sge_txq *txq) >>>> { >>>> - EQ_LOCK(eq); >>>> - eq->flags |= EQ_DOOMED; >>>> + struct sge_eq *eq = &txq->eq; >>>> + struct sge_qstat *spg = (void *)&eq->desc[eq->sidx]; >>>> >>>> - /* >>>> - * Wait for the response to a credit flush if one's >>>> - * pending. 
>>>> - */ >>>> - while (eq->flags & EQ_CRFLUSHED) >>>> - mtx_sleep(eq, &eq->eq_lock, 0, "crflush", 0); >>>> - EQ_UNLOCK(eq); >>>> + (void) sc; /* unused */ >>>> >>>> - callout_drain(&eq->tx_callout); /* XXX: iffy */ >>>> - pause("callout", 10); /* Still iffy */ >>>> +#ifdef INVARIANTS >>>> + TXQ_LOCK(txq); >>>> + MPASS((eq->flags & EQ_ENABLED) == 0); >>>> + TXQ_UNLOCK(txq); >>>> +#endif >>>> >>>> - taskqueue_drain(sc->tq[eq->tx_chan], &eq->tx_task); >>>> + /* Wait for the mp_ring to empty. */ >>>> + while (!mp_ring_is_idle(txq->r)) { >>>> + mp_ring_check_drainage(txq->r, 0); >>>> + pause("rquiesce", 1); >>>> + } >>>> + >>>> + /* Then wait for the hardware to finish. */ >>>> + while (spg->cidx != htobe16(eq->pidx)) >>>> + pause("equiesce", 1); >>>> + >>>> + /* Finally, wait for the driver to reclaim all >>>> descriptors. */ >>>> + while (eq->cidx != eq->pidx) >>>> + pause("dquiesce", 1); >>>> +} >>>> + >>>> +static void >>>> +quiesce_wrq(struct adapter *sc, struct sge_wrq *wrq) >>>> +{ >>>> + >>>> + /* XXXTX */ >>>> } >>>> >>>> static void >>>> @@ -4892,6 +4899,9 @@ cxgbe_sysctls(struct port_info *pi) >>>> oid = SYSCTL_ADD_NODE(ctx, children, OID_AUTO, "stats", >>>> CTLFLAG_RD, >>>> NULL, "port statistics"); >>>> children = SYSCTL_CHILDREN(oid); >>>> + SYSCTL_ADD_UINT(ctx, children, OID_AUTO, >>>> "tx_parse_error", >>>> CTLFLAG_RD, >>>> + &pi->tx_parse_error, 0, >>>> + "# of tx packets with invalid length or # of >>>> segments"); >>>> >>>> #define SYSCTL_ADD_T4_REG64(pi, name, desc, reg) \ >>>> SYSCTL_ADD_OID(ctx, children, OID_AUTO, name, \ >>>> @@ -6947,74 +6957,6 @@ sysctl_wcwr_stats(SYSCTL_HANDLER_ARGS) >>>> } >>>> #endif >>>> >>>> -static inline void >>>> -txq_start(struct ifnet *ifp, struct sge_txq *txq) >>>> -{ >>>> - struct buf_ring *br; >>>> - struct mbuf *m; >>>> - >>>> - TXQ_LOCK_ASSERT_OWNED(txq); >>>> - >>>> - br = txq->br; >>>> - m = txq->m ? 
>>>> txq->m : drbr_dequeue(ifp, br);
>>>> -	if (m)
>>>> -		t4_eth_tx(ifp, txq, m);
>>>> -}
>>>> -
>>>> -void
>>>> -t4_tx_callout(void *arg)
>>>> -{
>>>> -	struct sge_eq *eq = arg;
>>>> -	struct adapter *sc;
>>>> -
>>>> -	if (EQ_TRYLOCK(eq) == 0)
>>>> -		goto reschedule;
>>>> -
>>>> -	if (eq->flags & EQ_STALLED && !can_resume_tx(eq)) {
>>>> -		EQ_UNLOCK(eq);
>>>> -reschedule:
>>>> -		if (__predict_true(!(eq->flags && EQ_DOOMED)))
>>>> -			callout_schedule(&eq->tx_callout, 1);
>>>> -		return;
>>>> -	}
>>>> -
>>>> -	EQ_LOCK_ASSERT_OWNED(eq);
>>>> -
>>>> -	if (__predict_true((eq->flags & EQ_DOOMED) == 0)) {
>>>> -
>>>> -		if ((eq->flags & EQ_TYPEMASK) == EQ_ETH) {
>>>> -			struct sge_txq *txq = arg;
>>>> -			struct port_info *pi = txq->ifp->if_softc;
>>>> -
>>>> -			sc = pi->adapter;
>>>> -		} else {
>>>> -			struct sge_wrq *wrq = arg;
>>>> -
>>>> -			sc = wrq->adapter;
>>>> -		}
>>>> -
>>>> -		taskqueue_enqueue(sc->tq[eq->tx_chan], &eq->tx_task);
>>>> -	}
>>>> -
>>>> -	EQ_UNLOCK(eq);
>>>> -}
>>>> -
>>>> -void
>>>> -t4_tx_task(void *arg, int count)
>>>> -{
>>>> -	struct sge_eq *eq = arg;
>>>> -
>>>> -	EQ_LOCK(eq);
>>>> -	if ((eq->flags & EQ_TYPEMASK) == EQ_ETH) {
>>>> -		struct sge_txq *txq = arg;
>>>> -		txq_start(txq->ifp, txq);
>>>> -	} else {
>>>> -		struct sge_wrq *wrq = arg;
>>>> -		t4_wrq_tx_locked(wrq->adapter, wrq, NULL);
>>>> -	}
>>>> -	EQ_UNLOCK(eq);
>>>> -}
>>>> -
>>>>  static uint32_t
>>>>  fconf_to_mode(uint32_t fconf)
>>>>  {
>>>> @@ -7452,9 +7394,9 @@ static int
>>>>  set_filter_wr(struct adapter *sc, int fidx)
>>>>  {
>>>>  	struct filter_entry *f = &sc->tids.ftid_tab[fidx];
>>>> -	struct wrqe *wr;
>>>>  	struct fw_filter_wr *fwr;
>>>>  	unsigned int ftid;
>>>> +	struct wrq_cookie cookie;
>>>>
>>>>  	ASSERT_SYNCHRONIZED_OP(sc);
>>>>
>>>> @@ -7473,12 +7415,10 @@ set_filter_wr(struct adapter *sc, int fi
>>>>
>>>>  	ftid = sc->tids.ftid_base + fidx;
>>>>
>>>> -	wr = alloc_wrqe(sizeof(*fwr), &sc->sge.mgmtq);
>>>> -	if (wr == NULL)
>>>> +	fwr = start_wrq_wr(&sc->sge.mgmtq, howmany(sizeof(*fwr), 16), &cookie);
>>>> +	if (fwr == NULL)
>>>>  		return (ENOMEM);
>>>> -
>>>> -	fwr = wrtod(wr);
>>>> -	bzero(fwr, sizeof (*fwr));
>>>> +	bzero(fwr, sizeof(*fwr));
>>>>
>>>>  	fwr->op_pkd = htobe32(V_FW_WR_OP(FW_FILTER_WR));
>>>>  	fwr->len16_pkd = htobe32(FW_LEN16(*fwr));
>>>> @@ -7547,7 +7487,7 @@ set_filter_wr(struct adapter *sc, int fi
>>>>  	f->pending = 1;
>>>>  	sc->tids.ftids_in_use++;
>>>>
>>>> -	t4_wrq_tx(sc, wr);
>>>> +	commit_wrq_wr(&sc->sge.mgmtq, fwr, &cookie);
>>>>  	return (0);
>>>>  }
>>>>
>>>> @@ -7555,22 +7495,21 @@ static int
>>>>  del_filter_wr(struct adapter *sc, int fidx)
>>>>  {
>>>>  	struct filter_entry *f = &sc->tids.ftid_tab[fidx];
>>>> -	struct wrqe *wr;
>>>>  	struct fw_filter_wr *fwr;
>>>>  	unsigned int ftid;
>>>> +	struct wrq_cookie cookie;
>>>>
>>>>  	ftid = sc->tids.ftid_base + fidx;
>>>>
>>>> -	wr = alloc_wrqe(sizeof(*fwr), &sc->sge.mgmtq);
>>>> -	if (wr == NULL)
>>>> +	fwr = start_wrq_wr(&sc->sge.mgmtq, howmany(sizeof(*fwr), 16), &cookie);
>>>> +	if (fwr == NULL)
>>>>  		return (ENOMEM);
>>>> -	fwr = wrtod(wr);
>>>>  	bzero(fwr, sizeof (*fwr));
>>>>
>>>>  	t4_mk_filtdelwr(ftid, fwr, sc->sge.fwq.abs_id);
>>>>
>>>>  	f->pending = 1;
>>>> -	t4_wrq_tx(sc, wr);
>>>> +	commit_wrq_wr(&sc->sge.mgmtq, fwr, &cookie);
>>>>  	return (0);
>>>>  }
>>>>
>>>> @@ -8170,6 +8109,7 @@ t4_ioctl(struct cdev *dev, unsigned long
>>>>
>>>>  		/* MAC stats */
>>>>  		t4_clr_port_stats(sc, pi->tx_chan);
>>>> +		pi->tx_parse_error = 0;
>>>>
>>>>  		if (pi->flags & PORT_INIT_DONE) {
>>>>  			struct sge_rxq *rxq;
>>>> @@ -8192,24 +8132,24 @@ t4_ioctl(struct cdev *dev, unsigned long
>>>>  				txq->imm_wrs = 0;
>>>>  				txq->sgl_wrs = 0;
>>>>  				txq->txpkt_wrs = 0;
>>>> -				txq->txpkts_wrs = 0;
>>>> -				txq->txpkts_pkts = 0;
>>>> -				txq->br->br_drops = 0;
>>>> -				txq->no_dmamap = 0;
>>>> -				txq->no_desc = 0;
>>>> +				txq->txpkts0_wrs = 0;
>>>> +				txq->txpkts1_wrs = 0;
>>>> +				txq->txpkts0_pkts = 0;
>>>> +				txq->txpkts1_pkts = 0;
>>>> +				mp_ring_reset_stats(txq->r);
>>>>  			}
>>>>
>>>>  #ifdef TCP_OFFLOAD
>>>>  			/* nothing to clear for each ofld_rxq */
>>>>
>>>>  			for_each_ofld_txq(pi, i, wrq) {
>>>> -				wrq->tx_wrs = 0;
>>>> -				wrq->no_desc = 0;
>>>> +				wrq->tx_wrs_direct = 0;
>>>> +				wrq->tx_wrs_copied = 0;
>>>>  			}
>>>>  #endif
>>>>  			wrq = &sc->sge.ctrlq[pi->port_id];
>>>> -			wrq->tx_wrs = 0;
>>>> -			wrq->no_desc = 0;
>>>> +			wrq->tx_wrs_direct = 0;
>>>> +			wrq->tx_wrs_copied = 0;
>>>>  		}
>>>>  		break;
>>>>  	}
>>>>
>>>> Added: head/sys/dev/cxgbe/t4_mp_ring.c
>>>> ==============================================================================
>>>> --- /dev/null	00:00:00 1970	(empty, because file is newly added)
>>>> +++ head/sys/dev/cxgbe/t4_mp_ring.c	Wed Dec 31 23:19:16 2014	(r276485)
>>>> @@ -0,0 +1,364 @@
>>>> +/*-
>>>> + * Copyright (c) 2014 Chelsio Communications, Inc.
>>>> + * All rights reserved.
>>>> + * Written by: Navdeep Parhar
>>>> + *
>>>> + * Redistribution and use in source and binary forms, with or without
>>>> + * modification, are permitted provided that the following conditions
>>>> + * are met:
>>>> + * 1. Redistributions of source code must retain the above copyright
>>>> + *    notice, this list of conditions and the following disclaimer.
>>>> + * 2. Redistributions in binary form must reproduce the above copyright
>>>> + *    notice, this list of conditions and the following disclaimer in the
>>>> + *    documentation and/or other materials provided with the distribution.
>>>> + *
>>>> + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
>>>> + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
>>>> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
>>>> + * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
>>>> + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
>>>> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
>>>> + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
>>>> + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
>>>> + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
>>>> + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
>>>> + * SUCH DAMAGE.
>>>> + */
>>>> +
>>>> +#include
>>>> +__FBSDID("$FreeBSD$");
>>>> +
>>>> +#include
>>>> +#include
>>>> +#include
>>>> +#include
>>>> +#include
>>>> +#include
>>>> +#include
>>>> +
>>>> +#include "t4_mp_ring.h"
>>>> +
>>>> +union ring_state {
>>>> +	struct {
>>>> +		uint16_t pidx_head;
>>>> +		uint16_t pidx_tail;
>>>> +		uint16_t cidx;
>>>> +		uint16_t flags;
>>>> +	};
>>>> +	uint64_t state;
>>>> +};
>>>> +
>>>> +enum {
>>>> +	IDLE = 0,	/* consumer ran to completion, nothing more to do. */
>>>> +	BUSY,		/* consumer is running already, or will be shortly. */
>>>> +	STALLED,	/* consumer stopped due to lack of resources. */
>>>> +	ABDICATED,	/* consumer stopped even though there was work to be
>>>> +			   done because it wants another thread to take over. */
>>>> +};
>>>> +
>>>> +static inline uint16_t
>>>> +space_available(struct mp_ring *r, union ring_state s)
>>>> +{
>>>> +	uint16_t x = r->size - 1;
>>>> +
>>>> +	if (s.cidx == s.pidx_head)
>>>> +		return (x);
>>>> +	else if (s.cidx > s.pidx_head)
>>>> +		return (s.cidx - s.pidx_head - 1);
>>>> +	else
>>>> +		return (x - s.pidx_head + s.cidx);
>>>> +}
>>>> +
>>>> +static inline uint16_t
>>>> +increment_idx(struct mp_ring *r, uint16_t idx, uint16_t n)
>>>> +{
>>>> +	int x = r->size - idx;
>>>> +
>>>> +	MPASS(x > 0);
>>>> +	return (x > n ? idx + n : n - x);
>>>> +}
>>>> +
>>>> +/* Consumer is about to update the ring's state to s */
>>>> +static inline uint16_t
>>>> +state_to_flags(union ring_state s, int abdicate)
>>>> +{
>>>> +
>>>> +	if (s.cidx == s.pidx_tail)
>>>> +		return (IDLE);
>>>> +	else if (abdicate && s.pidx_tail != s.pidx_head)
>>>> +		return (ABDICATED);
>>>> +
>>>> +	return (BUSY);
>>>> +}
>>>> +
>>>> +/*
>>>> + * Caller passes in a state, with a guarantee that there is work to do and that
>>>> + * all items up to the pidx_tail in the state are visible.
>>>> + */
>>>> +static void
>>>> +drain_ring(struct mp_ring *r, union ring_state os, uint16_t prev, int budget)
>>>> +{
>>>> +	union ring_state ns;
>>>> +	int n, pending, total;
>>>> +	uint16_t cidx = os.cidx;
>>>> +	uint16_t pidx = os.pidx_tail;
>>>> +
>>>> +	MPASS(os.flags == BUSY);
>>>> +	MPASS(cidx != pidx);
>>>> +
>>>> +	if (prev == IDLE)
>>>> +		counter_u64_add(r->starts, 1);
>>>> +	pending = 0;
>>>> +	total = 0;
>>>> +
>>>> +	while (cidx != pidx) {
>>>> +
>>>> +		/* Items from cidx to pidx are available for consumption. */
>>>> +		n = r->drain(r, cidx, pidx);
>>>> +		if (n == 0) {
>>>> +			critical_enter();
>>>> +			do {
>>>> +				os.state = ns.state = r->state;
>>>> +				ns.cidx = cidx;
>>>> +				ns.flags = STALLED;
>>>> +			} while (atomic_cmpset_64(&r->state, os.state,
>>>> +			    ns.state) == 0);
>>>> +			critical_exit();
>>>> +			if (prev != STALLED)
>>>> +				counter_u64_add(r->stalls, 1);
>>>> +			else if (total > 0) {
>>>> +				counter_u64_add(r->restarts, 1);
>>>> +				counter_u64_add(r->stalls, 1);
>>>> +			}
>>>> +			break;
>>>> +		}
>>>> +		cidx = increment_idx(r, cidx, n);
>>>> +		pending += n;
>>>> +		total += n;
>>>> +
>>>> +		/*
>>>> +		 * We update the cidx only if we've caught up with the pidx, the
>>>> +		 * real cidx is getting too far ahead of the one visible to
>>>> +		 * everyone else, or we have exceeded our budget.
>>>> +		 */
>>>> +		if (cidx != pidx && pending < 64 && total < budget)
>>>> +			continue;
>>>> +		critical_enter();
>>>> +		do {
>>>> +			os.state = ns.state = r->state;
>>>> +			ns.cidx = cidx;
>>>> +			ns.flags = state_to_flags(ns, total >= budget);
>>>> +		} while (atomic_cmpset_acq_64(&r->state, os.state,
>>>> +		    ns.state) == 0);
>>>> +		critical_exit();
>>>> +
>>>> +		if (ns.flags == ABDICATED)
>>>> +			counter_u64_add(r->abdications, 1);
>>>> +		if (ns.flags != BUSY) {
>>>> +			/* Wrong loop exit if we're going to stall. */
>>>> +			MPASS(ns.flags != STALLED);
>>>> +			if (prev == STALLED) {
>>>> +				MPASS(total > 0);
>>>> +				counter_u64_add(r->restarts, 1);
>>>> +			}
>>>> +			break;
>>>> +		}
>>>> +
>>>> +		/*
>>>> +		 * The acquire style atomic above guarantees visibility of items
>>>> +		 * associated with any pidx change that we notice here.
>>>> +		 */
>>>> +		pidx = ns.pidx_tail;
>>>> +		pending = 0;
>>>> +	}
>>>> +}
>>>> +
>>>> +int
>>>> +mp_ring_alloc(struct mp_ring **pr, int size, void *cookie, ring_drain_t drain,
>>>> +    ring_can_drain_t can_drain, struct malloc_type *mt, int flags)
>>>> +{
>>>> +	struct mp_ring *r;
>>>> +
>>>> +	/* All idx are 16b so size can be 65536 at most */
>>>> +	if (pr == NULL || size < 2 || size > 65536 || drain == NULL ||
>>>> +	    can_drain == NULL)
>>>> +		return (EINVAL);
>>>> +	*pr = NULL;
>>>> +	flags &= M_NOWAIT | M_WAITOK;
>>>> +	MPASS(flags != 0);
>>>> +
>>>> +	r = malloc(__offsetof(struct mp_ring, items[size]), mt, flags | M_ZERO);
>>>> +	if (r == NULL)
>>>> +		return (ENOMEM);
>>>> +	r->size = size;
>>>> +	r->cookie = cookie;
>>>> +	r->mt = mt;
>>>> +	r->drain = drain;
>>>> +	r->can_drain = can_drain;
>>>> +	r->enqueues = counter_u64_alloc(flags);
>>>> +	r->drops = counter_u64_alloc(flags);
>>>> +	r->starts = counter_u64_alloc(flags);
>>>> +	r->stalls = counter_u64_alloc(flags);
>>>> +	r->restarts = counter_u64_alloc(flags);
>>>> +	r->abdications = counter_u64_alloc(flags);
>>>> +	if (r->enqueues == NULL || r->drops == NULL || r->starts == NULL ||
>>>> +	    r->stalls == NULL || r->restarts == NULL ||
>>>> +	    r->abdications == NULL) {
>>>> +		mp_ring_free(r);
>>>> +		return (ENOMEM);
>>>> +	}
>>>> +
>>>> +	*pr = r;
>>>> +	return (0);
>>>> +}
>>>> +
>>>> +void
>>>> +mp_ring_free(struct mp_ring *r)
>>>> +{
>>>> +
>>>> +	if (r == NULL)
>>>> +		return;
>>>> +
>>>> +	if (r->enqueues != NULL)
>>>> +		counter_u64_free(r->enqueues);
>>>> +	if (r->drops != NULL)
>>>> +		counter_u64_free(r->drops);
>>>> +	if (r->starts != NULL)
>>>> +		counter_u64_free(r->starts);
>>>> +	if (r->stalls != NULL)
>>>> +		counter_u64_free(r->stalls);
>>>> +	if (r->restarts != NULL)
>>>> +		counter_u64_free(r->restarts);
>>>> +	if (r->abdications != NULL)
>>>> +		counter_u64_free(r->abdications);
>>>> +
>>>> +	free(r, r->mt);
>>>> +}
>>>> +
>>>> +/*
>>>> + * Enqueue n items and maybe drain the ring for some time.
>>>> + *
>>>> + * Returns an errno.
>>>> + */
>>>> +int
>>>> +mp_ring_enqueue(struct mp_ring *r, void **items, int n, int budget)
>>>> +{
>>>> +	union ring_state os, ns;
>>>> +	uint16_t pidx_start, pidx_stop;
>>>> +	int i;
>>>> +
>>>> +	MPASS(items != NULL);
>>>> +	MPASS(n > 0);
>>>> +
>>>>
>>>> *** DIFF OUTPUT TRUNCATED AT 1000 LINES ***
>>>>
>>>> --
>>>> -----------------------------------------+-------------------------------
>>>> Prof. Luigi RIZZO, rizzo@iet.unipi.it  . Dip. di Ing. dell'Informazione
>>>> http://www.iet.unipi.it/~luigi/        . Universita` di Pisa
>>>> TEL      +39-050-2211611               . via Diotisalvi 2
>>>> Mobile   +39-338-6809875               . 56122 PISA (Italy)
>>>> -----------------------------------------+-------------------------------
>>>>
>>
>
>