Date:      Wed, 21 Nov 2012 00:04:49 -0800
From:      Luigi Rizzo <rizzo@iet.unipi.it>
To:        Andre Oppermann <andre@freebsd.org>
Cc:        Barney Cordoba <barney_cordoba@yahoo.com>, Adrian Chadd <adrian@freebsd.org>, khatfield@socllc.net, "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>, Jim Thompson <jim@netgate.com>, Alfred Perlstein <bright@mu.org>
Subject:   Re: FreeBSD boxes as a 'router'...
Message-ID:  <CA+hQ2+iBLi+3_GAVBfiTMmiZPnqGRomXYABawPct4n7-Nrq2Ew@mail.gmail.com>
In-Reply-To: <50AC8393.3060001@freebsd.org>
References:  <1353448328.76219.YahooMailClassic@web121602.mail.ne1.yahoo.com> <E1F4816E-676C-4630-9FA1-817F737D007D@netgate.com> <50AC08EC.8070107@mu.org> <832757660.33924.1353460119408@238ae4dab3b4454b88aea4d9f7c372c1.nuevasync.com> <CAJ-Vmok8Ybdi+Y8ZguMTKC7+F5=OxVDog27i4UgY-s3MCZkGcQ@mail.gmail.com> <250266404.35502.1353464214924@238ae4dab3b4454b88aea4d9f7c372c1.nuevasync.com> <50AC8393.3060001@freebsd.org>

On Tue, Nov 20, 2012 at 11:32 PM, Andre Oppermann <andre@freebsd.org> wrote:

> On 21.11.2012 03:16, khatfield@socllc.net wrote:
>
>> I may be misstating.
>>
>> Specifically, under high burst floods, whether routed or dropped by pf,
>> we would see the system go unresponsive to user-level applications,
>> SSH for example.
>>
>> The system would still function, but it was inaccessible. To clarify,
>> this happened with any number of floods or attacks against any ports;
>> the behavior remained the same. (These were not the SSH ports being
>> hit.)
>>
>
> I'm working on a hybrid interrupt/polling scheme with live-lock
> prevention in my svn branch.  It works by disabling interrupts in
> interrupt context and then having an ithread loop over the RX DMA
> queue until it catches up with the hardware and is done.  Only then
> are interrupts re-enabled.  On a busy system it may never go back to
> interrupt mode.  To prevent live-lock the ithread gives up the CPU
> after a normal quantum to let other threads/processes run as well.
> After that it immediately gets re-scheduled again with a sufficiently
> high priority not to get starved out by userspace.
>
>
Very cool; this sounds similar to NAPI.
The only adjustment I'd suggest to your scheme, if possible, is to add
some control (as existed in the old polling architecture) to make sure
that userspace is not starved by the ithreads and other kernel tasks
(otherwise you can still get livelock, as happens with NAPI).
I am afraid simple priorities are not enough: you either need some
kind of fair scheduler, or a hard limit on the CPU fraction used by
the kernel tasks while userspace processes are runnable.

cheers
luigi

> With multiple RX queues and MSI-X interrupts as many ithreads as available
> cores can be run and none of them will live-lock.  I'm also looking at
> using the CoDel algorithm for totally maxed out systems to prevent long
> FIFO packet drop chains in the NIC.  Think of it as RED queue management
> but for the input queue.  That way we can use distributed single packet
> loss as a signalling mechanism for the senders to slow down.  For a
> misbehaving sender blasting away this obviously doesn't help much.  It
> improves the chance of good packets making it through though.
>
> While live-lock prevention is good, you still won't be able to log in
> via ssh through an overloaded interface.  Any other interface, however,
> will work without packet loss.
>
> So far I've fully converted fxp(4) to this new scheme because it is one
> of the simpler drivers with sufficient documentation.  And 100Mbit is
> easy to saturate.
>
> The bge(4) driver is mostly converted but not tested due to lack of
> hardware, which should arrive later this week though.
>
> The em(4) driver, and with it the similar igb(4) and ixgbe(4)
> families, is in the works as well.  Again, hardware is on the way for
> testing.
>
> When this work has stabilized I'm looking for testers to put it through
> the paces.  If you're interested and have a suitable test bed then drop
> me an email to get notified.
>
> --
> Andre
>
>
>> Now, we did a lot of sysctl resource tuning, which corrected this for
>> some floods, but a high rate would still trigger the behavior. Other
>> times the system would simply drop all traffic (as if a buffer had
>> filled or a connection limit was reached), but it was neither case.
>>
>> The attacks were also well within bandwidth capabilities for the pipe and
>> network gear.
>>
>> All of these issues stopped upon adding polling or the overall threshold
>> was increased
>> tremendously with polling.
>>
>> Yet polling has some downsides, due not necessarily to FreeBSD but to
>> application issues. Haproxy is one example: with polling we had
>> handshakes fail and connections terminated prematurely. Those issues
>> were not present with polling disabled.
>>
>> So that is my reasoning for saying that it was perfect for some things
>> and not for others.
>>
>> In the end, we spent years tinkering and it was always satisfactory but
>> never perfect. Eventually we grew to the point of replacing the edge
>> with MX80s, leaving BSD to load balancing and the like, which finally
>> resolved all issues for us.
>>
>> Albeit, we were a DDoS mitigation company running high PPS and lots of
>> bursting. BSD was beautiful until we ended up needing 10Gbps+ on the
>> edge and it was time to go Juniper.
>>
>> I still say BSD took us from nothing to a $30M company. So despite some
>> things requiring tinkering, I think it is still worth the effort to put
>> in the testing to find what is best for your gear and environment.
>>
>> I got off-track, but we did find one other thing: ipfw did seem to
>> reduce load on the interrupts (likely because we couldn't do nearly as
>> much scrubbing with it as with pf). At any rate, less filtering may
>> also fix the issue for the OP.
>>
>> As for forwarding: we found that doing forwarding via a simple pf rule
>> and a GRE tunnel to an app server, or by using a tool like haproxy on
>> the router itself, resolved a large majority of our original stability
>> issues (versus pure fw-based packet forwarding).
>>
>> *I also agree because, as I mentioned in a previous email, our overall
>> PPS (to me) seemed to decrease from FreeBSD 7 to 9. No idea why, but
>> polling seemed to have less effect than it did on 7.4.
>>
>> Not to say this wasn't due to error on our part or some issue with the
>> Juniper switches, but we seemed to run into more issues with newer
>> releases when it came to performance with Intel 1Gbps NICs. This later
>> caused us to move more app servers to Linux, because we never could get
>> to the bottom of some of those things. We do intend to revisit BSD with
>> our new CDN company to see if we can re-standardize on it for
>> high-volume traffic servers.
>>
>> Best, Kevin
>>
>>
>>
>> On Nov 20, 2012, at 7:19 PM, "Adrian Chadd" <adrian@freebsd.org> wrote:
>>
Ok, so since people are talking about it, and I've been knee-deep in
>>> at least the older Intel GigE interrupt moderation: at maximum pps,
>>> how exactly is interrupt moderation giving you a livelock scenario?
>>>
The biggest benefit I found when doing some forwarding work a few
>>> years ago came from writing a little daemon that watched the
>>> interrupt and packet-drop rates per interface, and then tuned the
>>> interrupt moderation parameters to suit, so that at the highest pps
>>> rates I wasn't swamped with interrupts.
>>>
I think polling here is hiding some poor choices in driver design and
>>> network stack design.
>>>
>>>
>>>
>>> adrian
>>>
>> _______________________________________________
>> freebsd-net@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-net
>> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
>>
>>
>>



-- 
-----------------------------------------+-------------------------------
 Prof. Luigi RIZZO, rizzo@iet.unipi.it  . Dip. di Ing. dell'Informazione
 http://www.iet.unipi.it/~luigi/        . Universita` di Pisa
 TEL      +39-050-2211611               . via Diotisalvi 2
 Mobile   +39-338-6809875               . 56122 PISA (Italy)
-----------------------------------------+-------------------------------


