Date: Thu, 11 Jul 2013 18:52:29 +0400
From: Gleb Smirnoff <glebius@FreeBSD.org>
To: Luigi Rizzo <rizzo@iet.unipi.it>
Cc: freebsd-net@freebsd.org, Andre Oppermann <andre@freebsd.org>, Andriy Gapon <avg@freebsd.org>
Subject: Re: Listen queue overflow: N already in queue awaiting acceptance
Message-ID: <20130711145229.GB8839@glebius.int.ru>
In-Reply-To: <CA%2BhQ2%2Bg%2BpbhJjZOv5x0MAHNy0X8VfKTymqt1BtcHYvkD=sGUdA@mail.gmail.com>
References: <51DE591E.7040405@FreeBSD.org> <51DE5C8C.3090404@freebsd.org>
    <20130711133504.GB67810@FreeBSD.org> <51DEC10B.3080409@freebsd.org>
    <CA%2BhQ2%2Bg%2BpbhJjZOv5x0MAHNy0X8VfKTymqt1BtcHYvkD=sGUdA@mail.gmail.com>
On Thu, Jul 11, 2013 at 04:49:25PM +0200, Luigi Rizzo wrote:
L> >> IMO, this should be a single counter accessible via sysctl, with no
L> >> printf(). Those who need details on whether this is a micro-burst or
L> >> a persistent condition can run monitoring software that draws plots.
L> >
L> >
L> > The single counter wouldn't tell you anything because it misses which
L> > socket/accept queue is affected by the overflow. The inpcb pointer
L> > can be cross-referenced with netstat -a.
L> >
L> > Andriy for example would never have found out about this problem other
L> > than receiving vague user complaints about aborted connection attempts.
L> > Maybe after spending many hours searching for the cause he may have
L> > inferred from endless scrolling in Wireshark that something wasn't
L> > right and blamed syncache first. Only later would it emerge that he's
L> > either receiving too many connections or his application is too slow
L> > dealing with incoming connections.
L> >
L> > If you can recommend a suitable and general sysadmin-friendly monitoring
L> > software that will point out this problem I'm all ears.
L>
L> the problem with these non-throttled messages is that they often
L> cause thrashing -- you become slightly slow, messages start being
L> generated and your system becomes a lot slower, making it hard
L> to recover.
L>
L> What i usually do is throttle (in the kernel) and count the number of
L> messages suppressed. Something like this (in a macro):
L>
L>     static int ctr, last_tick;
L>     if (ticks - last_tick > suppression_delay) {
L>         printf("got this error ... (%d times)\n", ... , ctr);
L>         ctr = 0;
L>         last_tick = ticks;
L>     } else {
L>         ctr++;
L>     }
L>
L> the errors may not be exactly the same, the counter is race-prone
L> (you can make it atomic if you really feel like) but the whole point is
L> to get the idea that something is very wrong, not the exact count
L> or pointer

btw, there is a ready function for that: ppsratecheck(), already utilized
for suppressing some error messages.

-- 
Totus tuus, Glebius.
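[Editor's sketch] The throttle-and-count idea quoted above can be modelled
outside the kernel roughly as follows. This is only an illustration under
stated assumptions: struct log_throttle, throttle_check() and
report_overflow() are hypothetical names, the tick value is passed in by the
caller instead of being read from the kernel's global `ticks`, and the code
is not the ppsratecheck() implementation. In the FreeBSD kernel,
ppsratecheck() takes a struct timeval pointer, a counter pointer and a
per-second limit, and would normally be preferred over a hand-rolled macro.

```c
#include <stdio.h>

/* Hypothetical per-message throttle state (a userland model, not kernel code). */
struct log_throttle {
    long last_tick;   /* tick at which a message was last emitted */
    int  suppressed;  /* events swallowed since that message */
};

/*
 * Decide whether an event at tick `now` may be reported.
 * Returns the number of events suppressed since the last report (>= 0)
 * when the caller should print, or -1 when the event must be suppressed.
 * Like the macro in the mail, this is not thread-safe.
 */
static int
throttle_check(struct log_throttle *t, long now, long delay)
{
    if (now - t->last_tick > delay) {
        int n = t->suppressed;

        t->suppressed = 0;
        t->last_tick = now;
        return (n);
    }
    t->suppressed++;
    return (-1);
}

/* Example caller: print at most one line per `delay` ticks. */
static void
report_overflow(struct log_throttle *t, long now, long delay)
{
    int n = throttle_check(t, now, delay);

    if (n >= 0)
        printf("listen queue overflow (%d similar events suppressed)\n", n);
}
```

A caller would keep one struct log_throttle per message type (or per socket)
and call report_overflow() on every overflow; only the first event in each
window prints, and the next printed line carries the count of events that
were dropped in between.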
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20130711145229.GB8839>