Date:      Thu, 11 Jul 2013 17:06:10 +0200
From:      Luigi Rizzo <rizzo@iet.unipi.it>
To:        Gleb Smirnoff <glebius@freebsd.org>
Cc:        freebsd-net@freebsd.org, Andre Oppermann <andre@freebsd.org>, Andriy Gapon <avg@freebsd.org>
Subject:   Re: Listen queue overflow: N already in queue awaiting acceptance
Message-ID:  <CA%2BhQ2%2Bj%2B-i63cyz-4b1widY-8Ht=puFwj%2BxH=ghMzptHe9k0Wg@mail.gmail.com>
In-Reply-To: <20130711145229.GB8839@glebius.int.ru>
References:  <51DE591E.7040405@FreeBSD.org> <51DE5C8C.3090404@freebsd.org> <20130711133504.GB67810@FreeBSD.org> <51DEC10B.3080409@freebsd.org> <CA%2BhQ2%2Bg%2BpbhJjZOv5x0MAHNy0X8VfKTymqt1BtcHYvkD=sGUdA@mail.gmail.com> <20130711145229.GB8839@glebius.int.ru>

On Thu, Jul 11, 2013 at 4:52 PM, Gleb Smirnoff <glebius@freebsd.org> wrote:
> On Thu, Jul 11, 2013 at 04:49:25PM +0200, Luigi Rizzo wrote:
> L> >> IMO, this should be a single counter accessible via sysctl, with no
> L> >> printf(). Those, who need details on whether this is micro-burst or
> L> >> persistent condition, can run monitoring software that draws plots.
> L> >
> L> >
> L> > The single counter wouldn't tell you anything because it misses which
> L> > socket/accept queue is affected by the overflow.  The inpcb pointer
> L> > can be cross-referenced with netstat -a.
> L> >
> L> > Andriy for example would never have found out about this problem other
> L> > than receiving vague user complaints about aborted connection attempts.
> L> > Maybe after spending many hours searching for the cause he may have
> L> > inferred from endless scrolling in Wireshark that something wasn't
> L> > right and blamed syncache first.  Only later would it emerge that he's
> L> > either receiving too many connections or his application is too slow
> L> > dealing with incoming connections.
> L> >
> L> > If you can recommend a suitable and general sysadmin friendly monitoring
> L> > software that will point out this problem I'm all ears.
> L>
> L> the problem with these non-throttled messages is that they often
> L> cause thrashing -- you become slightly slow, messages start being
> L> generated and your system becomes a lot slower, making it hard
> L> to recover.
> L>
> L> What i usually do is throttle (in the kernel) and count the number of
> L> messages suppressed. Something like this (in a macro):
> L>
> L> static int ctr, last_tick;
> L> if (ticks - last_tick > suppression_delay) {
> L>     printf("got this error ... (%d times)\n", ... , ctr);
> L>     ctr = 0;
> L>     last_tick = ticks;
> L> } else {
> L>     ctr++;
> L> }
> L>
> L> the errors may not be exactly the same, the counter is race-prone
> L> (you can make it atomic if you really feel like it) but the whole point
> L> is to get the idea that something is very wrong, not the exact count
> L> or pointer
>
> btw, there is a ready function for that: ppsratecheck(), already utilized
> for suppressing some error messages.

yes, i think i saw it before. To me, the convenience of the macro is that
it can also wrap the declaration of the static variables and the printf.
I basically have macros like this (see sys/dev/netmap/netmap_kern.h)

    RD(max_pps, "printf format ", arguments....) // rate-limited printf

    ND(same arguments as above) // compiles to no-op

so i can quickly add the messages or disable them by simply changing
the macro name.
FWIW the macro in netmap_kern.h does not have the counter of suppressed
messages (I just thought about it, but i should probably add it as a feature)
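
To make the idea concrete, here is a minimal userspace sketch of a
rate-limited printf that also reports how many messages were suppressed.
It is not the code in netmap_kern.h: the names rl_state, rl_printf,
RL_PRINTF and NL_PRINTF are made up for illustration, it uses time()
instead of the kernel's ticks, and like the snippet quoted above it is
not protected against races.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <time.h>

/* Per-call-site state (hypothetical name, not from netmap_kern.h). */
struct rl_state {
	time_t	window;		/* second in which the current window started */
	int	printed;	/* messages printed in the current window */
	int	suppressed;	/* messages dropped since the last one printed */
};

/*
 * Print at most max_pps messages per one-second window.
 * Returns 1 if the message was printed, 0 if it was suppressed.
 * The current time is a parameter so the logic can be exercised
 * without waiting for the clock.
 */
static int
rl_printf(struct rl_state *st, time_t now, int max_pps, const char *fmt, ...)
{
	va_list ap;

	assert(max_pps > 0);
	if (now != st->window) {	/* a new window starts */
		st->window = now;
		st->printed = 0;
	}
	if (st->printed >= max_pps) {	/* over budget: only count it */
		st->suppressed++;
		return (0);
	}
	st->printed++;
	if (st->suppressed > 0) {	/* report what was dropped */
		printf("(%d messages suppressed) ", st->suppressed);
		st->suppressed = 0;
	}
	va_start(ap, fmt);
	vprintf(fmt, ap);
	va_end(ap);
	return (1);
}

/* RD-style wrapper: hides the static state at the call site. */
#define RL_PRINTF(max_pps, ...) do {					\
	static struct rl_state rl_st_;					\
	rl_printf(&rl_st_, time(NULL), (max_pps), __VA_ARGS__);	\
} while (0)

/* ND-style wrapper: same arguments, compiles to a no-op. */
#define NL_PRINTF(max_pps, ...) do { } while (0)
```

A call site would then look like RL_PRINTF(5, "queue overflow on %p\n", ptr),
and renaming it to NL_PRINTF silences it without touching the arguments.
The suppressed count is printed as a prefix of the next message that gets
through, so a long quiet period after a burst delays the report; that
seemed an acceptable trade-off for a sketch.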

cheers
luigi

> --
> Totus tuus, Glebius.



-- 
-----------------------------------------+-------------------------------
 Prof. Luigi RIZZO, rizzo@iet.unipi.it  . Dip. di Ing. dell'Informazione
 http://www.iet.unipi.it/~luigi/        . Universita` di Pisa
 TEL      +39-050-2211611               . via Diotisalvi 2
 Mobile   +39-338-6809875               . 56122 PISA (Italy)
-----------------------------------------+-------------------------------


