Date:      Fri, 29 Apr 2016 12:34:44 -0300
From:      Denis Pearson <dennix.pearson@gmail.com>
To:        Ze Claudio Pastore <zclaudio@bsd.com.br>
Cc:        Alan Somers <asomers@freebsd.org>,  "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Subject:   Re: Best option to process packet ACL
Message-ID:  <CAFC3-mTBV1Vk4PbxWgv9VX4O82Cwsj_UqUGYWWVmS%2BwxEfrGAw@mail.gmail.com>
In-Reply-To: <CAEGk6G6uy0n8VEY1qtH8x%2B%2Bh7523YYyWLwNwrMq4O36s33o0-g@mail.gmail.com>
References:  <CAEGk6G4aMU_qxDMb3tBqyLNmUNqd3%2BRjDRZ29wMx7pK_w=kkJg@mail.gmail.com> <CAOtMX2h8tRtGeTLageLWiiXAi-Ap4Q8jqWFD2uiCtF1uCzSmOA@mail.gmail.com> <CAEGk6G6uy0n8VEY1qtH8x%2B%2Bh7523YYyWLwNwrMq4O36s33o0-g@mail.gmail.com>

On Thu, Apr 28, 2016 at 2:20 PM, Ze Claudio Pastore <zclaudio@bsd.com.br>
wrote:

> Because actually, this is not a packet firewall.
>
> When I mentioned pf/ipfw, it was only to refer to ideas on how best to
> match each ACL criterion.
>
> But my userland application is a proxy; the ACL will handle L7 requests
> within the packets. I will filter based on the mentioned criteria, but it
> will be processed at a different moment, unrelated to the packet in the
> kernel. It's also DPDK enabled, so it's mostly skipping the whole kernel.
>

AFAIK, you are on your own. PF, IPFW and iptables/Netfilter are all, if I
am not mistaken, essentially single-threaded. The exception is that each
packet is usually processed by the kernel in interrupt context, which can
be multithreaded, but the efficiency depends on how the kernel balances
interrupts on SMP systems, which in turn depends on each interface vendor
and driver (adaptive interrupts, for example). Each vendor and NIC does
this differently, and firewalling *might* benefit indirectly from
multithreaded drivers, but the firewall code itself is single-threaded.

If I recall correctly, pf does a couple of things multithreaded on FreeBSD,
like state processing, purging, loading, flushing and something pfsync
related, while ipfw does table radix processing (of some sort) and dummynet
enqueuing. Everything else depends on how interrupts are distributed on the
system.

So a general algorithm for your ACL would first look at the information
which is least likely to match. You are looking for a FALSE here: the
sooner you get the first FALSE, the better, since you can skip that rule
and try to match the next one. More common rules should be tested first,
since you mentioned first match wins. I would first evaluate src-ip
(32 bits/128 bits, or a CIDR to test), then src-port, then dst-ip and
dst-port, and finally packet len and proto. Criteria unset on a rule must
be skipped (never tested, since they would always return TRUE). Remember to
add a counter to help the user see the more common rules and move them to
the top of the list.
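
A minimal sketch of that evaluation order in C, under several assumptions
that are not from this thread: the struct layout, the F_* flag names, IPv4
only, rule addresses stored already masked, and the length criterion being
treated as an upper bound. It only illustrates early exit on the first
FALSE, skipping unset criteria, first match wins, and the per-rule counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flags: which criteria a rule actually sets. */
enum { F_SRC_IP = 1, F_SRC_PORT = 2, F_DST_IP = 4,
       F_DST_PORT = 8, F_LEN = 16, F_PROTO = 32 };

/* Hypothetical per-request key, host byte order, IPv4 only. */
struct acl_key {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint16_t ip_len;
    uint8_t  proto;
};

struct acl_rule {
    unsigned set;               /* bitmask of F_* criteria to test */
    uint32_t src_ip, src_mask;  /* CIDR as pre-masked address + mask */
    uint32_t dst_ip, dst_mask;
    uint16_t src_port, dst_port;
    uint16_t max_len;           /* assumption: length is an upper bound */
    uint8_t  proto;
    bool     allow;             /* verdict when the rule matches */
    uint64_t hits;              /* match counter, to help reorder rules */
};

/* Returns false at the first failing criterion; unset ones are skipped. */
bool rule_matches(const struct acl_rule *r, const struct acl_key *k)
{
    if ((r->set & F_SRC_IP) && (k->src_ip & r->src_mask) != r->src_ip)
        return false;
    if ((r->set & F_SRC_PORT) && k->src_port != r->src_port)
        return false;
    if ((r->set & F_DST_IP) && (k->dst_ip & r->dst_mask) != r->dst_ip)
        return false;
    if ((r->set & F_DST_PORT) && k->dst_port != r->dst_port)
        return false;
    if ((r->set & F_LEN) && k->ip_len > r->max_len)
        return false;
    if ((r->set & F_PROTO) && k->proto != r->proto)
        return false;
    return true;
}

/* First match wins: index of the matching rule, or -1 if none matched. */
int acl_eval(struct acl_rule *rules, size_t n, const struct acl_key *k)
{
    assert(rules != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (rule_matches(&rules[i], k)) {
            rules[i].hits++;
            return (int)i;
        }
    }
    return -1;
}
```

The hits counter is not thread safe as written; with several workers it
would need to be per-thread or atomic.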

For threads, I would have each packet evaluated in a different thread, with
all criteria tested in the same thread, and limit the number of threads.
Set a default based on hw.ncpu and let the user tune it, as is done in
nginx/apache (threaded workers)/suricata, for example. You said it's not a
packet but a request, so I am assuming testing the criteria takes long
enough that it is not worth doing serially. But you have a tradeoff to
measure first: how long does it take to create a thread and join it, in
nanoseconds? I can't tell. And how long will it take you to test all your
criteria in series? Because you will be spawning up to N threads, which
might be (NCPU*N)=X, and therefore it will cost X ns on the parent thread
to spawn and get responses from the children. The tradeoff must be
favorable compared to serial testing.

Some people also prefer to split RX and TX into different threads; this is
how ACLs are processed on vRouter, for example, and I believe this is how
Luigi Rizzo did it in tlem.c, if I remember correctly. I don't know if you
have different criteria for IN vs OUT ACL requests; if you do, splitting
along that line would be something else to investigate.

I think one thread per test criterion is difficult to manage and possibly
costly (several instructions to manage it). One thread per batch of rules
looks better, while my suggestion is to have one thread per packet testing
all criteria on all rules. I am sure you will have to investigate the
tradeoff.

Finally, can't you simply have multiple (but fewer) threads of your
application as a whole, leveraging all CPUs and balancing the load among
them? So if you have 4 CPUs you will have 4 main worker threads (and maybe
one extra control thread to balance and handle the load distribution,
signals, configuration, etc.), and you would implement the ACL evaluation
criteria in series instead of multithreaded.
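
Picking the number of worker threads could look roughly like this. It is a
sketch only: worker_count and its parameters are hypothetical names, and it
uses sysconf(_SC_NPROCESSORS_ONLN) as a portable stand-in for reading the
hw.ncpu sysctl.

```c
#include <assert.h>
#include <unistd.h>

/* Number of worker threads: the user's setting if positive, otherwise one
   per online CPU (1 if the CPU count is unavailable), capped at `max`. */
unsigned worker_count(long configured, unsigned max)
{
    assert(max >= 1);
    long n = configured;
    if (n <= 0)
        n = sysconf(_SC_NPROCESSORS_ONLN);
    if (n <= 0)
        n = 1;
    return (unsigned long)n > max ? max : (unsigned)n;
}
```

Each of those workers would then run the whole request path, ACL included,
serially.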

Because I am really afraid of the cost of spawning/joining/managing such a
huge number of threads. It doesn't matter if it's 200Kpps, 10Kpps or
1Kpps, it's still a LOT of pps to handle in different threads IMHO (since
it's potentially (packets*(test_criteria*num_rules))/second).
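
To put numbers on that expression using the figures from the original post
(200K requests per second, 6 criteria, at most 128 rules), here is a small
worked calculation. The function names are illustrative only, and it models
the worst case where every request walks every rule and every criterion.

```c
#include <assert.h>
#include <stdint.h>

/* Worst case: every request is tested against every criterion of every
   rule, i.e. packets * (test_criteria * num_rules) per second. */
uint64_t worst_case_tests_per_sec(uint64_t pps, unsigned criteria,
                                  unsigned rules)
{
    return pps * criteria * rules;
}

/* Time available per request on a single thread, in nanoseconds. */
uint64_t budget_ns_per_request(uint64_t pps)
{
    assert(pps > 0);
    return 1000000000ull / pps;
}
```

With those inputs the worst case is 200000 * 6 * 128 = 153,600,000
criterion tests per second, and the single-thread budget is 5000 ns per
request. First match wins and early exit on FALSE should keep the real
figure well below that worst case.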

And maybe you will need to use mutex locks, semaphores or events, since you
will limit the number of threads, and therefore at some point your parent
thread will say "hey, hold on, let's wait for some slots to free" for
packets arriving on a queue before ACL evaluation.
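
The "wait for some slots to free" part can be sketched with a counting
limiter built on a POSIX mutex and condition variable. The struct and
function names below are invented for illustration; a POSIX semaphore would
be an equally reasonable choice.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical limiter: at most `max` requests in evaluation at once. */
struct slots {
    pthread_mutex_t mu;
    pthread_cond_t  freed;
    unsigned        in_use;
    unsigned        max;
};

void slots_init(struct slots *s, unsigned max)
{
    assert(max > 0);
    pthread_mutex_init(&s->mu, NULL);
    pthread_cond_init(&s->freed, NULL);
    s->in_use = 0;
    s->max = max;
}

/* Non-blocking: returns false when every slot is busy. */
bool slots_try_acquire(struct slots *s)
{
    pthread_mutex_lock(&s->mu);
    bool ok = s->in_use < s->max;
    if (ok)
        s->in_use++;
    pthread_mutex_unlock(&s->mu);
    return ok;
}

/* Blocking: the parent thread sleeps here until a slot is released. */
void slots_acquire(struct slots *s)
{
    pthread_mutex_lock(&s->mu);
    while (s->in_use == s->max)
        pthread_cond_wait(&s->freed, &s->mu);
    s->in_use++;
    pthread_mutex_unlock(&s->mu);
}

/* Called by a worker when it finishes evaluating a request. */
void slots_release(struct slots *s)
{
    pthread_mutex_lock(&s->mu);
    assert(s->in_use > 0);
    s->in_use--;
    pthread_cond_signal(&s->freed);
    pthread_mutex_unlock(&s->mu);
}
```

The parent would call slots_acquire before handing a queued request to a
worker, and the worker would call slots_release when it is done.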


>
>
>
> 2016-04-28 11:50 GMT-03:00 Alan Somers <asomers@freebsd.org>:
>
> > On Wed, Apr 27, 2016 at 1:21 PM, Zé Claudio Pastore
> > <zclaudio@bsd.com.br> wrote:
> >
> >> Hello everyone,
> >>
> >> I would like to hear your suggestions regarding the best approach to
> >> process IP packets for filtering in such a way that I can avoid
> >> lowering my pps rate.
> >>
> >> Today I have a simple application that proxies HTTP. It's dual
> >> threaded on a 4 core system with low CPU power. The current
> >> application uses two threads, one for control and one for data flow
> >> processing.
> >>
> >> I need to implement a simple set of stateless filters. I will process
> >> only:
> >>
> >> - src-ip
> >> - dst-ip
> >> - src-port
> >> - dst-port
> >> - iplen
> >> - proto (tcp/udp/other)
> >>
> >> My current rate of requests per second is high, around 200K. I have no
> >> idea how best to leverage the idle CPUs to implement this ACL
> >> filtering without impacting the pps rate I have today.
> >>
> >> I have implemented it serially today (not threaded) and I get a 40%
> >> performance loss. I will handle at most 128 filter rules; this
> >> decision is already made. This is going to be first match wins.
> >>
> >> My current plans are to test:
> >>
> >> 1) Create 6 threads, one to test each aspect of the ACL (src-ip,
> >> dst-ip, etc). As soon as the first thread returns false to the parent
> >> thread, I stop processing that rule and go to the next, and tell all
> >> other threads to die/exit since they don't matter anymore.
> >>
> >> 2) Create one thread to process a batch of rules, say, 8 rules per
> >> thread per request. I don't know if I would limit the total number of
> >> threads and lock requests while threads are busy.
> >>
> >> 3) Someone suggested "do as pf/ipfw do" but I have no idea how it's
> >> done, how multithreaded it is and what is done on each thread.
> >>
> >> 4) Other suggestions?
> >>
> >> This is going to run on FreeBSD 11. I use libevent2 in the current
> >> application so far.
> >>
> >> Thanks.
> >>
> >>
> > Is there some reason why you can't simply use pf or ipfw?  ipfw can do
> > everything you described.  pf can do most of it, but I'm not sure if pf
> > can filter on iplen.  If I were you, I wouldn't attempt to write my own
> > userland firewall until I was absolutely sure that neither pf nor ipfw
> > would work.  If that's the case, then I would try using diverter
> > sockets.  With a diverter socket, pf or ipfw does most of the work, but
> > when it encounters a packet it can't process it pushes it up to a
> > userland helper.  The userland helper processes the packet and then
> > tells pf or ipfw what to do with it.  In realistic applications, pf or
> > ipfw also creates a temporary rule based on the userland helper's
> > decision.  Applying the temporary rule in the future is far faster than
> > invoking the userland helper.  After a certain amount of time, the
> > temporary rule will expire again.
> >
> >
> > Here's an example in action:
> > http://daemonforums.org/showthread.php?t=8846
> >
> > -Alan
> >


