FreeBSD Mail Archives

Date:      Thu, 11 Jul 2013 18:52:29 +0400
From:      Gleb Smirnoff <glebius@FreeBSD.org>
To:        Luigi Rizzo <rizzo@iet.unipi.it>
Cc:        freebsd-net@freebsd.org, Andre Oppermann <andre@freebsd.org>, Andriy Gapon <avg@freebsd.org>
Subject:   Re: Listen queue overflow: N already in queue awaiting acceptance
Message-ID:  <20130711145229.GB8839@glebius.int.ru>
In-Reply-To: <CA%2BhQ2%2Bg%2BpbhJjZOv5x0MAHNy0X8VfKTymqt1BtcHYvkD=sGUdA@mail.gmail.com>
References:  <51DE591E.7040405@FreeBSD.org> <51DE5C8C.3090404@freebsd.org> <20130711133504.GB67810@FreeBSD.org> <51DEC10B.3080409@freebsd.org> <CA%2BhQ2%2Bg%2BpbhJjZOv5x0MAHNy0X8VfKTymqt1BtcHYvkD=sGUdA@mail.gmail.com>

On Thu, Jul 11, 2013 at 04:49:25PM +0200, Luigi Rizzo wrote:
L> >> IMO, this should be a single counter accessible via sysctl, with no
L> >> printf(). Those, who need details on whether this is micro-burst or
L> >> persistent condition, can run monitoring software that draws plots.
L> >
L> >
L> > The single counter wouldn't tell you anything because it misses which
L> > socket/accept queue is affected by the overflow.  The inpcb pointer
L> > can be cross-referenced with netstat -a.
L> >
L> > Andriy, for example, would never have found out about this problem other
L> > than by receiving vague user complaints about aborted connection attempts.
L> > Maybe after spending many hours searching for the cause he might have
L> > inferred from endless scrolling in Wireshark that something wasn't
L> > right, and blamed the syncache first.  Only later would it emerge that he
L> > is either receiving too many connections or his application is too slow
L> > in dealing with incoming connections.
L> >
L> > If you can recommend a suitable and general sysadmin friendly monitoring
L> > software that will point out this problem I'm all ears.
L> 
L> the problem with these non-throttled messages is that they often
L> cause thrashing -- you become slightly slow, messages start being
L> generated and your system becomes a lot slower, making it hard
L> to recover.
L> 
L> What I usually do is throttle (in the kernel) and count the number of
L> messages suppressed. Something like this (in a macro):
L> 
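L> static int ctr, last_tick;  /* suppressed count, tick of last print */
L> if (ticks - last_tick > suppression_delay) {
L>     printf("got this error ... (%d times)\n", ... , ctr);
L>     ctr = 0;
L>     last_tick = ticks;      /* must be the global `ticks`, not `tick` */
L> } else {
L>     ctr++;                  /* suppressed: just count it */
L> }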
L> 
L> the suppressed errors may not all be exactly the same, and the counter is
L> race-prone (you can make it atomic if you really feel like it), but the
L> whole point is to get the idea that something is very wrong, not the
L> exact count or pointer

btw, there is a ready-made function for that: ppsratecheck(), already
used for suppressing some error messages.

-- 
Totus tuus, Glebius.

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20130711145229.GB8839>
