From owner-freebsd-net@FreeBSD.ORG Fri Mar 11 15:44:53 2005
Date: Fri, 11 Mar 2005 07:44:48 -0800
From: Luigi Rizzo
To: Gleb Smirnoff
Cc: dima <_pppp@mail.ru>, pjd@freebsd.org, John Baldwin, rwatson@freebsd.org, net@freebsd.org
Subject: Re: Giant-free polling [PATCH]
Message-ID: <20050311074448.A70328@xorpc.icir.org>
References: <200503011642.48813.jhb@FreeBSD.org> <20050301162949.A64187@xorpc.icir.org> <20050311110234.GA87255@cell.sick.ru>
In-Reply-To: <20050311110234.GA87255@cell.sick.ru>; from glebius@freebsd.org on Fri, Mar 11, 2005 at 02:02:34PM +0300

While I am happy to see interest in having this supported, and while I
understand the excitement of firing up the editor and writing code, I don't
think this approach -- hacking up some patch that lets multiple poll*
instances run without panicking the box, or discussing the performance of
lists vs. arrays vs. IFF_* flags -- will lead to anything.

Since I have asked this already, consider this msg a rant, for archival
purposes, calling for a sane approach to the problem.

Polling in the UP case is beneficial because of the features listed e.g. in

  http://docs.freebsd.org/cgi/getmsg.cgi?fetch=128950+0+/usr/local/www/db/text/2005/freebsd-net/20050306.freebsd-net

For the SMP case there are the basic issues to handle (proper locking,
preventing deadlocks), but also performance issues, e.g. avoiding that
multiple polling instances just fight for the lock while doing nothing
useful; see e.g. the case described in

  http://docs.freebsd.org/cgi/getmsg.cgi?fetch=71980+0+/usr/local/www/db/text/2005/freebsd-net/20050306.freebsd-net

Now, if you want to approach the issue seriously, you need to:

1) _design_ a locking strategy for SMP polling -- design meaning: provide a
   description in English, even a sketchy one, which will eventually end up
   in a manpage such as polling(4), or in the source file as a comment.
   A design is certainly not C code, or patches without comments from which
   one is supposed to figure out the overall picture.

2) evaluate how the design compares, in terms of features (e.g. those listed
   in the first reference above), to the UP case. I am not saying one should
   implement all of them, but at least it must be clear how the two things
   differ, if they do.
3) evaluate how the design addresses (in terms of performance) the
   contention problems, of which the second reference above is probably just
   one example, but at least a very obvious one;

4) implement it -- and if you want it to be reviewed, please post the full
   code, not just the differences -- the code will be far too different for
   a diff to be human-readable;

5) optimise it -- e.g. deal with arrays vs. lists, implicit vs. explicit
   flags, etc.

It seems to me that this whole discussion is proceeding from step 4 while
ignoring the first 3 steps. Please, step back from your keyboards and move
to the whiteboard, if nothing else to prove to me that you have already
addressed all the issues.

	cheers
	luigi

On Fri, Mar 11, 2005 at 02:02:34PM +0300, Gleb Smirnoff wrote:
> Colleagues,
>
> On Tue, Mar 01, 2005 at 04:29:49PM -0800, Luigi Rizzo wrote:
> L> [cc-ing net@freebsd.org to get more opinions]
>
> It is good that you have CC'ed net@, otherwise we would continue
> working in parallel without collaboration.
>
> Attached is a patch made in collaboration by ru, pjd and me.
>
> We use a separate mutex for polling, and we drop it before calling the
> poll handler to avoid LORs and contention on this mutex.
>
> We have definite evidence that idle_poll and ISR poll can now run
> in parallel: if we remove the PRF_RUNNING flag check, we get panics,
> because the poll handlers of many drivers (e.g. if_em) are not reentrant.
>
> I think this patch is a step forward in the direction of parallel polling
> on SMP, since we now have two polling instances (ISR + idle) working
> in parallel.
>
> --
> Totus tuus, Glebius.
> GLEBIUS-RIPN GLEB-RIPE
>
> Index: kern_poll.c
> ===================================================================
> RCS file: /home/ncvs/src/sys/kern/kern_poll.c,v
> retrieving revision 1.19
> diff -u -r1.19 kern_poll.c
> --- kern_poll.c    25 Feb 2005 22:07:50 -0000    1.19
> +++ kern_poll.c    11 Mar 2005 10:49:47 -0000
> @@ -33,6 +33,7 @@
>  #include 
>  #include         /* needed by net/if.h */
>  #include 
> +#include 
>  
>  #include         /* for IFF_* flags */
>  #include         /* for NETISR_POLL */
> @@ -144,7 +145,7 @@
>  SYSCTL_INT(_kern_polling, OID_AUTO, residual_burst, CTLFLAG_RW,
>      &residual_burst, 0, "# of residual cycles in burst");
>  
> -static u_int32_t poll_handlers;    /* next free entry in pr[]. */
> +static u_int32_t poll_handlers;
>  SYSCTL_UINT(_kern_polling, OID_AUTO, handlers, CTLFLAG_RD,
>      &poll_handlers, 0, "Number of registered poll handlers");
>  
> @@ -169,20 +170,33 @@
>      &idlepoll_sleeping, 0, "idlepoll is sleeping");
>  
>  
> -#define POLL_LIST_LEN  128
>  struct pollrec {
>          poll_handler_t  *handler;
>          struct ifnet    *ifp;
> +#define PRF_RUNNING     0x1
> +#define PRF_PINNED      0x2
> +        uint32_t        flags;
> +        LIST_ENTRY(pollrec) next;
>  };
>  
> -static struct pollrec pr[POLL_LIST_LEN];
> +#define PR_VALID(pr)    ((pr)->handler != NULL && \
> +    !((pr)->flags & PRF_RUNNING) && \
> +    ((pr)->ifp->if_flags & (IFF_UP|IFF_RUNNING)) == \
> +    (IFF_UP|IFF_RUNNING))
> +
> +static LIST_HEAD(, pollrec) poll_list = LIST_HEAD_INITIALIZER(&poll_list);
> +
> +static struct mtx poll_mtx;
>  
>  static void
>  init_device_poll(void)
>  {
>  
> -        netisr_register(NETISR_POLL, (netisr_t *)netisr_poll, NULL, 0);
> -        netisr_register(NETISR_POLLMORE, (netisr_t *)netisr_pollmore, NULL, 0);
> +        mtx_init(&poll_mtx, "polling", NULL, MTX_DEF);
> +        netisr_register(NETISR_POLL, (netisr_t *)netisr_poll, NULL,
> +            NETISR_MPSAFE);
> +        netisr_register(NETISR_POLLMORE, (netisr_t *)netisr_pollmore, NULL,
> +            NETISR_MPSAFE);
>  }
>  SYSINIT(device_poll, SI_SUB_CLOCKS, SI_ORDER_MIDDLE, init_device_poll, NULL)
>  
> @@ -244,17 +258,26 @@
>  void
>  ether_poll(int count)
>  {
> -        int i;
> -
> -        mtx_lock(&Giant);
> +        struct pollrec *pr, *pr1;
>  
>          if (count > poll_each_burst)
>                  count = poll_each_burst;
> -        for (i = 0 ; i < poll_handlers ; i++)
> -                if (pr[i].handler && (IFF_UP|IFF_RUNNING) ==
> -                    (pr[i].ifp->if_flags & (IFF_UP|IFF_RUNNING)) )
> -                        pr[i].handler(pr[i].ifp, 0, count); /* quick check */
> -        mtx_unlock(&Giant);
> +
> +        mtx_lock(&poll_mtx);
> +        LIST_FOREACH_SAFE(pr, &poll_list, next, pr1) {
> +                if (PR_VALID(pr)) {
> +                        pr->flags |= PRF_RUNNING;
> +                        if (pr1 != NULL)
> +                                pr1->flags |= PRF_PINNED;
> +                        mtx_unlock(&poll_mtx);
> +                        pr->handler(pr->ifp, 0, count);
> +                        mtx_lock(&poll_mtx);
> +                        if (pr1 != NULL)
> +                                pr1->flags &= ~PRF_PINNED;
> +                        pr->flags &= ~PRF_RUNNING;
> +                }
> +        }
> +        mtx_unlock(&poll_mtx);
>  }
>  
>  /*
> @@ -320,17 +343,17 @@
>  
>  /*
>   * netisr_poll is scheduled by schednetisr when appropriate, typically once
> - * per tick. It is called at splnet() so first thing to do is to upgrade to
> - * splimp(), and call all registered handlers.
> + * per tick.
>   */
>  static void
>  netisr_poll(void)
>  {
> +        struct pollrec *pr, *pr1;
>          static int reg_frac_count;
> -        int i, cycles;
> +        int cycles;
>          enum poll_cmd arg = POLL_ONLY;
> -        mtx_lock(&Giant);
>  
> +        mtx_lock(&poll_mtx);
>          phase = 3;
>          if (residual_burst == 0) { /* first call in this tick */
>                  microuptime(&poll_start_t);
> @@ -372,25 +395,36 @@
>          residual_burst -= cycles;
>  
>          if (polling) {
> -                for (i = 0 ; i < poll_handlers ; i++)
> -                        if (pr[i].handler && (IFF_UP|IFF_RUNNING) ==
> -                            (pr[i].ifp->if_flags & (IFF_UP|IFF_RUNNING)) )
> -                                pr[i].handler(pr[i].ifp, arg, cycles);
> +                LIST_FOREACH_SAFE(pr, &poll_list, next, pr1) {
> +                        if (PR_VALID(pr)) {
> +                                pr->flags |= PRF_RUNNING;
> +                                mtx_unlock(&poll_mtx);
> +                                pr->handler(pr->ifp, arg, cycles);
> +                                mtx_lock(&poll_mtx);
> +                                pr->flags &= ~PRF_RUNNING;
> +                        } else if (pr->handler == NULL &&
> +                            !(pr->flags & PRF_PINNED)) {
> +                                LIST_REMOVE(pr, next);
> +                                free(pr, M_TEMP);
> +                        }
> +                }
>          } else { /* unregister */
> -                for (i = 0 ; i < poll_handlers ; i++) {
> -                        if (pr[i].handler &&
> -                            pr[i].ifp->if_flags & IFF_RUNNING) {
> -                                pr[i].ifp->if_flags &= ~IFF_POLLING;
> -                                pr[i].handler(pr[i].ifp, POLL_DEREGISTER, 1);
> +                LIST_FOREACH_SAFE(pr, &poll_list, next, pr1) {
> +                        if (pr->handler != NULL &&
> +                            pr->ifp->if_flags & IFF_RUNNING) {
> +                                pr->ifp->if_flags &= ~IFF_POLLING;
> +                                mtx_unlock(&poll_mtx);
> +                                pr->handler(pr->ifp, POLL_DEREGISTER, 1);
> +                                mtx_lock(&poll_mtx);
>                          }
> -                        pr[i].handler=NULL;
> +                        pr->handler = NULL;
>                  }
>                  residual_burst = 0;
>                  poll_handlers = 0;
>          }
>          /* on -stable, schednetisr(NETISR_POLLMORE); */
>          phase = 4;
> -        mtx_unlock(&Giant);
> +        mtx_unlock(&poll_mtx);
>  }
>  
>  /*
> @@ -399,12 +433,12 @@
>   * A device is not supposed to register itself multiple times.
>   *
>   * This is called from within the *_intr() functions, so we do not need
> - * further locking.
> + * further ifnet locking.
>   */
>  int
>  ether_poll_register(poll_handler_t *h, struct ifnet *ifp)
>  {
> -        int s;
> +        struct pollrec *pr;
>  
>          if (polling == 0) /* polling disabled, cannot register */
>                  return 0;
> @@ -415,30 +449,27 @@
>          if (ifp->if_flags & IFF_POLLING) /* already polling */
>                  return 0;
>  
> -        s = splhigh();
> -        if (poll_handlers >= POLL_LIST_LEN) {
> -                /*
> -                 * List full, cannot register more entries.
> -                 * This should never happen; if it does, it is probably a
> -                 * broken driver trying to register multiple times. Checking
> -                 * this at runtime is expensive, and won't solve the problem
> -                 * anyways, so just report a few times and then give up.
> -                 */
> -                static int verbose = 10 ;
> -                splx(s);
> -                if (verbose >0) {
> -                        printf("poll handlers list full, "
> -                            "maybe a broken driver ?\n");
> -                        verbose--;
> +        mtx_lock(&poll_mtx);
> +        LIST_FOREACH(pr, &poll_list, next)
> +                if (pr->ifp == ifp && pr->handler != NULL) {
> +                        mtx_unlock(&poll_mtx);
> +                        log(LOG_DEBUG, "ether_poll_register: %s: handler"
> +                            " already registered\n", ifp->if_xname);
> +                        return (0);
>                  }
> -                return 0; /* no polling for you */
> +        pr = malloc(sizeof(*pr), M_TEMP, M_NOWAIT);
> +        if (pr == NULL) {
> +                mtx_unlock(&poll_mtx);
> +                log(LOG_DEBUG, "ether_poll_register: malloc() failed\n");
> +                return (0);
>          }
>  
> -        pr[poll_handlers].handler = h;
> -        pr[poll_handlers].ifp = ifp;
> +        pr->handler = h;
> +        pr->ifp = ifp;
> +        LIST_INSERT_HEAD(&poll_list, pr, next);
>          poll_handlers++;
>          ifp->if_flags |= IFF_POLLING;
> -        splx(s);
> +        mtx_unlock(&poll_mtx);
>          if (idlepoll_sleeping)
>                  wakeup(&idlepoll_sleeping);
>          return 1; /* polling enabled in next call */
> @@ -453,29 +484,24 @@
>  int
>  ether_poll_deregister(struct ifnet *ifp)
>  {
> -        int i;
> +        struct pollrec *pr;
>  
> -        mtx_lock(&Giant);
>          if ( !ifp || !(ifp->if_flags & IFF_POLLING) ) {
> -                mtx_unlock(&Giant);
>                  return 0;
>          }
> -        for (i = 0 ; i < poll_handlers ; i++)
> -                if (pr[i].ifp == ifp) /* found it */
> -                        break;
> -        ifp->if_flags &= ~IFF_POLLING; /* found or not... */
> -        if (i == poll_handlers) {
> -                mtx_unlock(&Giant);
> -                printf("ether_poll_deregister: ifp not found!!!\n");
> -                return 0;
> -        }
> -        poll_handlers--;
> -        if (i < poll_handlers) { /* Last entry replaces this one. */
> -                pr[i].handler = pr[poll_handlers].handler;
> -                pr[i].ifp = pr[poll_handlers].ifp;
> -        }
> -        mtx_unlock(&Giant);
> -        return 1;
> +        mtx_lock(&poll_mtx);
> +        LIST_FOREACH(pr, &poll_list, next)
> +                if (pr->ifp == ifp && pr->handler != NULL) {
> +                        pr->handler = NULL;
> +                        poll_handlers--;
> +                        mtx_unlock(&poll_mtx);
> +                        ifp->if_flags &= ~IFF_POLLING;
> +                        return (1);
> +                }
> +        mtx_unlock(&poll_mtx);
> +        log(LOG_DEBUG, "ether_poll_deregister: %s: not found!!!\n",
> +            ifp->if_xname);
> +        return (0);
>  }
>  
>  static void
> @@ -495,10 +521,7 @@
>          for (;;) {
>                  if (poll_in_idle_loop && poll_handlers > 0) {
>                          idlepoll_sleeping = 0;
> -                        mtx_lock(&Giant);
>                          ether_poll(poll_each_burst);
> -                        mtx_unlock(&Giant);
> -                        mtx_assert(&Giant, MA_NOTOWNED);
>                          mtx_lock_spin(&sched_lock);
>                          mi_switch(SW_VOL, NULL);
>                          mtx_unlock_spin(&sched_lock);
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
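
[The pattern the patch applies in ether_poll() and netisr_poll() -- set a
per-record PRF_RUNNING flag, drop the poll mutex around the driver callback,
then reacquire the mutex and clear the flag -- can be sketched in isolation.
The fragment below is only a minimal userland illustration: it uses pthreads
instead of mtx(9), the names poll_pass/dummy_handler/pollrec fields are made
up for the example, and the PRF_PINNED/record-removal handling that the real
patch needs is deliberately omitted (records here are never freed). It shows
why two concurrent pollers never enter the same non-reentrant handler while
the list mutex is not held across the callback.]

    #include <pthread.h>
    #include <stdio.h>
    #include <sys/queue.h>

    #define PRF_RUNNING 0x1   /* record is currently inside its handler */

    struct pollrec {
            void            (*handler)(void *arg, int count);
            void            *arg;
            unsigned        flags;
            LIST_ENTRY(pollrec) next;
    };

    static LIST_HEAD(, pollrec) poll_list = LIST_HEAD_INITIALIZER(poll_list);
    static pthread_mutex_t poll_mtx = PTHREAD_MUTEX_INITIALIZER;

    /*
     * One polling pass; several threads (think "idle poll" and "ISR poll")
     * may run this concurrently.  The list mutex is never held across the
     * callback, and PRF_RUNNING keeps a second pass out of a busy record.
     * Records are never removed in this sketch, so plain LIST_FOREACH is
     * safe even though the mutex is dropped mid-iteration.
     */
    static void
    poll_pass(int count)
    {
            struct pollrec *pr;

            pthread_mutex_lock(&poll_mtx);
            LIST_FOREACH(pr, &poll_list, next) {
                    if (pr->handler == NULL || (pr->flags & PRF_RUNNING))
                            continue;       /* dead or already being polled */
                    pr->flags |= PRF_RUNNING;
                    pthread_mutex_unlock(&poll_mtx);
                    pr->handler(pr->arg, count);    /* may block, take other locks */
                    pthread_mutex_lock(&poll_mtx);
                    pr->flags &= ~PRF_RUNNING;
            }
            pthread_mutex_unlock(&poll_mtx);
    }

    static void
    dummy_handler(void *arg, int count)
    {
            printf("polled %s, burst %d\n", (const char *)arg, count);
    }

    int
    main(void)
    {
            static struct pollrec pr0 = { dummy_handler, "em0", 0 };

            LIST_INSERT_HEAD(&poll_list, &pr0, next);
            poll_pass(5);
            return (0);
    }

[In the kernel patch the same dance happens under poll_mtx, and the extra
PRF_PINNED flag is there so that the next-element pointer saved by
LIST_FOREACH_SAFE is not freed by the other poller while the mutex is
dropped.]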