Date:      Mon, 17 Feb 2003 22:24:39 -0700 (MST)
From:      Alex Rousskov <rousskov@measurement-factory.com>
To:        Terry Lambert <tlambert2@mindspring.com>
Cc:        Pawel Jakub Dawidek <nick@garage.freebsd.pl>, Scott Long <scott_long@btc.adaptec.com>, Sam Leffler <sam@errno.com>, Brad Knowles <brad.knowles@skynet.be>, freebsd-current@freebsd.org
Subject:   Re: Polygraph Considered Evil 8^) (was: Re: 5-STABLE Roadmap)
Message-ID:  <Pine.BSF.4.53.0302171357490.61629@measurement-factory.com>
In-Reply-To: <3E511E7A.8225ABA9@mindspring.com>
References:  <20030216184257.GZ10767@garage.freebsd.pl> <3E4FFDD3.9050802@btc.adaptec.com> <20030216214322.GB10767@garage.freebsd.pl> <Pine.BSF.4.53.0302162130370.46493@measurement-factory.com> <3E511E7A.8225ABA9@mindspring.com>

On Mon, 17 Feb 2003, Terry Lambert wrote:

> First, I just have a slight editorial comment, about cheating on
> Polygraph.

Terry,

	This is not the place to start a long discussion about our
Polygraph testing methodology, but I have to say, with all due
respect, that many of your statements are either misleading or based
on misinformation about Web Polygraph and the way standard tests are
executed. I have to respond because I both love and understand cache
benchmarking. I apologize to the majority of the audience for what may
be considered an out-of-scope thread.

> One issue I have with Polygraph is that it intentionally works for a
> very long time to get worst case performance out of caches;
> basically, it cache-busts on purpose.  Then the test runs.

This is plain wrong. I assume that you are referring to PolyMix
workloads, which have a cache-filling phase and measurement phases.
The filling phase does not bust the cache. Its primary purpose is to
bring the cache's storage to a steady state (hopefully). If you have
tested many caches, including Squid, then you know that cache
performance "on an empty stomach" often differs from sustained
performance by 50%. Since we must start from scratch, we must pump
enough data through to approach steady state.

You might have been misinformed that all the fill objects are used
during the measurement phases; this is not true. Polygraph keeps the
size of the working set constant.  That size is usually much smaller
than the amount of traffic during the fill phase. Again, the fill
phase is there to reach a steady state after you start with an empty
disk.
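
To illustrate the idea, here is a toy sketch (plain Python written
for this message; it is not Polygraph code, and all the numbers are
made up): the fill traffic dwarfs the working set, yet the hits
measured afterwards are genuine because the bounded working set keeps
being revisited.

import random
from collections import deque

CAPACITY    = 5000    # objects the toy cache can hold
WORKING_SET = 2000    # constant-size set of revisited objects
random.seed(1)

cache, order = set(), deque()

def request(obj):
    """True on hit; FIFO eviction on miss (toy policy, not a real cache)."""
    if obj in cache:
        return True
    if len(cache) >= CAPACITY:
        cache.discard(order.popleft())
    cache.add(obj)
    order.append(obj)
    return False

# Fill phase: lots of one-time objects, far more traffic than the
# working set itself, pushing the cache toward a full, steady state.
for i in range(50000):
    if random.random() < 0.55:
        request(("ws", random.randrange(WORKING_SET)))
    else:
        request(("fill", i))

# Measurement phase: the same bounded working set, so hits are real.
hits = sum(request(("ws", random.randrange(WORKING_SET)))
           if random.random() < 0.55 else request(("new", -i - 1))
           for i in range(20000))
print("measured hit ratio: ~%.0f%%" % (100.0 * hits / 20000))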

> This seems to be an editorial comment on end-to-end guarantees, much
> more than it seems a valid measurement of actual cache performance.

Not sure what end-to-end guarantees you are referring to here.

> If you change squid to force a random page preplacement, then you
> end up with a bounded worst case which is a better number than you
> would be able to get with your best (in terms of the real-world
> performance) algorithm (e.g. LRU or whatever), because you make it
> arbitrarily hard to characterize what that would be.

Random page replacement should not yield better performance.
Polygraph simulates hot subsets (a.k.a. flash crowds), which you
cannot take advantage of if you replace randomly. Random replacement
also loses part of the benefit of the temporal locality that
Polygraph simulates (e.g., the same HTML containers embed the same
images).
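
To see why, here is another toy sketch (mine, in Python; the hot
subset size, cache size, and 60% hot fraction are all made-up
illustration values, not the PolyMix model): LRU keeps the hot
objects resident, while random replacement keeps evicting them and
pays for it with extra misses.

import random
random.seed(2)

def run(policy, requests=50000, capacity=1000):
    cache = {}                 # object -> last-use time (for LRU)
    hits = clock = 0
    for _ in range(requests):
        clock += 1
        # 60% of requests go to a small hot subset, the rest to a
        # huge population of cold objects.
        if random.random() < 0.6:
            obj = ("hot", random.randrange(200))
        else:
            obj = ("cold", random.randrange(100000))
        if obj in cache:
            hits += 1
        elif len(cache) >= capacity:              # full: evict a victim
            if policy == "lru":
                victim = min(cache, key=cache.get)
            else:                                 # random replacement
                victim = random.choice(list(cache))
            del cache[victim]
        cache[obj] = clock                        # insert or refresh recency
    return 100.0 * hits / requests

print("LRU    hit ratio: %.0f%%" % run("lru"))
print("random hit ratio: %.0f%%" % run("random"))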

> NetApp has a tunable in their cache product which might as well be
> labelled "get a good Polygraph score"; all it does is turn on random
> page replacement, so that the Polygraph code is unable to
> characterize "what would constitute worst case performance on this
> cache?", and then intentionally exercise that code path, which is
> what it would do, otherwise (i.e. pick a working set slightly larger
> than the cache size so everythings a miss, etc.).

I am unaware of any tunable of that kind. Moreover, I suspect it
simply would not work (see above). Are you rich? If not, you may want
to sell a proof of the above to a NetApp competitor. I, myself, would
be very interested to hear it as well. Keep in mind that NetApp and
most other vendors use Polygraph for day-to-day regression tests, so
they are interested in making the tests realistic.

Also, offered Polygraph traffic does not depend on cache performance.
Polygraph code does not "characterize" anything at run time, at least
not during PolyMix tests.

> Basically, most of the case numbers are 99.xx% miss rates.  With
> this modification, that number drops down to closer to 80%.

Actually, the measured miss ratio is usually about 50% (a hit ratio
of 50+%), which is quite realistic. The offered hit ratio is about
55%. The byte hit ratio is lower. I am not sure where you got the 99%
or 80% numbers. See the cache-off results for the true values.
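
The gap between document and byte hit ratios is easy to see with
made-up numbers (the sizes and counts below are purely illustrative):
hits tend to be small, popular objects, so the bytes saved lag behind
the requests saved.

# (mean object size in bytes, number of responses)
hits   = [(4000, 55)]
misses = [(4000, 25), (40000, 20)]   # some misses are large objects

hit_count   = sum(n for _, n in hits)
total_count = hit_count + sum(n for _, n in misses)
hit_bytes   = sum(size * n for size, n in hits)
total_bytes = hit_bytes + sum(size * n for size, n in misses)

print("document hit ratio: %.0f%%" % (100.0 * hit_count / total_count))
print("byte hit ratio:     %.0f%%" % (100.0 * hit_bytes / total_bytes))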

> That's kind of evil; but at least it's a level playing field, and
> we can make a FreeBSD-specific patch for SQUID to get better numbers
> for FreeBSD.  8-) 8-).

I would not encourage you to cheat, even if there is a way. I would
recommend that you suggest ways to improve the benchmark instead.
Chances are, Polygraph can already do what you want.

> > >       options         MAXFILES=16384
> > >       options         NMBCLUSTERS=32678
>
> These I understand, though I think they are on the low end.

We have never run out of the related resources with these settings
during a valid test. Keep in mind that we have to keep the number of
concurrent open HTTP connections below 5-8K to get robust
performance, given PolyMix burstiness and other factors.
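
For a rough feel of the headroom, here is my own back-of-the-envelope
arithmetic (not from the PolyMix documentation; it assumes 2 KB mbuf
clusters and one descriptor per connection):

NMBCLUSTERS  = 32678    # as quoted above
MAXFILES     = 16384
CLUSTER_SIZE = 2048     # bytes per mbuf cluster on FreeBSD
CONNECTIONS  = 8000     # upper end of the range we consider safe

print("cluster pool size:       %d MB" % (NMBCLUSTERS * CLUSTER_SIZE / 2**20))
print("clusters per connection: %.1f" % (float(NMBCLUSTERS) / CONNECTIONS))
print("descriptor headroom:     %d" % (MAXFILES - CONNECTIONS))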

> > >       options         HZ=1000
>
> This one, I don't understand at all.  The web page says it's for faster
> dummynet processing.  But maybe this is an artifact of using NETISR.

This setting is a must-have if you use dummynet. We did not invent
it; it was suggested by the dummynet author himself, and it did solve
the performance problems we experienced with the standard setting of
HZ. I do not know what NETISR controls, so if you know of a better
dummynet tuning approach, please let us know!
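
My understanding is that dummynet releases queued packets on clock
ticks, so emulated delays and bandwidths are quantized to 1/HZ
seconds. The arithmetic below (the 40 ms target delay is just an
illustrative WAN-like figure) shows why HZ=100 is too coarse:

for hz in (100, 1000):
    tick_ms = 1000.0 / hz      # dummynet scheduling granularity
    print("HZ=%4d: tick = %2.0f ms, so a 40 ms emulated delay can be "
          "off by up to %2.0f ms" % (hz, tick_ms, tick_ms))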

> > >       kern.ipc.somaxconn=1024
> This one, either: it's really very small.

I do not think we overflow the queue during valid tests. If it gets
1000 requests long, the device under test is already in deep trouble
and will fail the test.

> > >       net.inet.ip.portrange.last=40000
>
> This one is OK, but small.  It only effects outbound connections; got
> to wonder why it isn't 65536, though.

This is actually there for "dummy user" safety. A correctly
configured Polygraph does not use ephemeral ports. There is no reason
to raise it to 65536, because Polygraph (under a PolyMix workload)
should not use that many sockets.

> > >       net.inet.tcp.delayed_ack=0
>
> This seems designed to get a good connection rate.

I am not sure this is needed. It may be there for historical reasons.

> > >       net.inet.tcp.msl=3000
>
> And this seems designed to get a bad one.  You are aware that, by
> default, NT systems cheat on the MSL, right?  For gigabit, this is
> a larger number than you want, I think.

MSL must be small, or the kernel will choke on TIME_WAIT connections.
Please note that these are settings for Polygraph clients and
servers, and _not_ for the device under test. During official tests,
we use a program (available in the Polygraph distro) to verify that
_proxies_ use an MSL of at least 59 seconds.
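
Rough arithmetic (assuming the msl sysctl is in milliseconds, that
TIME_WAIT lasts 2*MSL, and a hypothetical client-side rate of 1000
connections per second):

CONN_PER_SEC = 1000
for msl_ms in (30000, 3000):            # default vs. the setting above
    time_wait_sec = 2.0 * msl_ms / 1000.0
    backlog = CONN_PER_SEC * time_wait_sec
    print("MSL=%5d ms -> roughly %6.0f sockets in TIME_WAIT"
          % (msl_ms, backlog))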

I cannot comment on Gigabit-related optimizations because we have not
played with Gigabit cards much. I would not be surprised if some
settings had to be different.

> I haven't looked at the client code, but you are aware that adding
> IP aliases doesn't really do anything, unless you managed your port
> space your self, manually, with a couple of clever tricks?  In other
> words, you are going to be limited to your total number of outbound
> connections as your ports space (e.g. ~40K), because the port
> autoallocation takes place in the same space as the INADDR_ANY
> space?  I guess this doesn't matter, if your maxopenfiles is only
> 16K, since that's going to end up bounding you well before you run
> out of ports...

Polygraph does manage its port space, but I am not sure what you mean
by "doesn't really do anything". We do not want the aliases to do
anything other than participate in routing and give the proxy the
impression that there are thousands of source IPs and hundreds of
server IPs. That seems to work as desired, but I may be missing your
point. The number of connections that is safe during a test is
usually below 10K, though.
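
In standard socket terms (this is just an illustration, not
Polygraph's actual code, and the addresses are made up), managing the
port space means binding an explicit (alias IP, port) pair before
connecting, instead of letting the kernel pick from the shared
ephemeral range:

import socket

PROXY = ("192.168.1.1", 3128)            # hypothetical device under test

def connect_from(src_ip, src_port):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind((src_ip, src_port))   # explicit source address and port
    s.connect(PROXY)
    return s

# Each simulated user gets its own alias address, so the port limit
# applies per source IP rather than per host, e.g.:
#   conn = connect_from("10.0.1.17", 20001)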

> IMO, Polygraph is probably not something you want to include in a
> standard suite, if the intent is to get numbers that are good for
> FreeBSD PR (Sorry, Alex, but it's true: you have to do significant
> and clever and sometimes obtuse and counterintuitive things in order
> to get good Polygraph numbers for comparison).
>
> I don't think that anything you do in this regard is going to be
> able to give you iMimic or NetApp level numbers, which are created
> by professional benchmark-wranglers, so any comparison values you
> get will likely be poor, compared to commercial offerings.

Not sure how to respond to that -- I can discuss specific limitations
of the benchmark (there are many) or vendor cheats (there are some
known ones), but I cannot defend Polygraph against general "all
vendors cheat" accusations.

If you want numbers that are good for FreeBSD PR you can simply use
iMimic numbers, for example. They are good, and they use FreeBSD.

IMO, Polygraph is the best proxy benchmark available as far as realism
is concerned, and we are very open to any specific suggestions on how
to make it better. If you can think of better workloads or better test
rules, please share your ideas with my team.


Thank you,

Alex.

-- 
                            | HTTP performance - Web Polygraph benchmark
www.measurement-factory.com | HTTP compliance+ - Co-Advisor test suite
                            | all of the above - PolyBox appliance
