FreeBSD Mail Archives

Date:      Thu, 8 Feb 2007 03:17:03 +0000
From:      MQ <antinvidia@gmail.com>
To:        "Bruce Evans" <bde@zeta.org.au>
Cc:        Oleg Bulyzhin <oleg@freebsd.org>, net@freebsd.org
Subject:   Re: [antinvidia@gmail.com: some questions about bge(4)]
Message-ID:  <be0088ce0702071917o55337a22i8ceafc8b37557cfe@mail.gmail.com>
In-Reply-To: <20070206234100.S31484@besplex.bde.org>
References:  <20061206085401.GH32700@cell.sick.ru> <20061212224351.GE91560@lath.rinet.ru> <be0088ce0612131655j5829ca7cg3066b8855904c2e7@mail.gmail.com> <20061214092248.GA21394@lath.rinet.ru> <be0088ce0702051952w927d0bs4a20b7e34be5801e@mail.gmail.com> <20070206234100.S31484@besplex.bde.org>

2007/2/6, Bruce Evans <bde@zeta.org.au>:
>
> On Tue, 6 Feb 2007, MQ wrote:
>
> > 2006/12/14, Oleg Bulyzhin <oleg@freebsd.org>:
> >>
> >> On Thu, Dec 14, 2006 at 12:55:51AM +0000, MQ wrote:
> >> > 2006/12/12, Oleg Bulyzhin <oleg@freebsd.org>:
> >> > >
> >> > >On Wed, Dec 06, 2006 at 11:54:01AM +0300, Gleb Smirnoff wrote:
> >> > >>   Forwarding to net@ list and to Oleg, who has made polling
> >> > >> support for bge(4).
> >> > >>
> >> > >> ----- Forwarded message from MQ < antinvidia@gmail.com> -----
> >> > >>
> >> > >> From: MQ <antinvidia@gmail.com>
> >> > >> To: glebius@freebsd.org , davidch@broadcom.com
> >> > >> Subject: some questions about bge(4)
> >> > >> Date: Sat, 2 Dec 2006 09:32:27 +0000
> >> > >> Delivered-To: glebius@freebsd.org
> >> > >>
> >> > >> Hi David and Gleb,
> >> > >>    I'm using several chips whose driver is bge(4), and I have some
> >> > >> questions about the driver. Would you please answer them for me?
> >> > >>    My confusion is related to some code in /sys/dev/mii/brgphy.c.
> >> > >> bge(4) uses a callout to drive the watchdog, and brgphy_service()
> >> > >> is called once per second. It calls brgphy_mii_phy_auto() every 5
> >> > >> seconds to autonegotiate the media. Normally, brgphy_service()
> >> > >> takes about 0.5ms, and about 5ms when autonegotiation is performed.
> >> > >
> >> > >brgphy_mii_phy_auto() is called only if there is no link.
>
> I haven't seen more than 50-100 uS for an average brgphy_service(), but
> even 500 uS shouldn't be a problem except near the maximum theoretical
> packet rate of ~1500 kpps, since the device has buffering for 512-1024
> packets (512 @ 1500 kpps = 341 uS = not quite 500 uS).  Half of the
> available rx buffering for non-jumbo packets is not being used, so the
> worst case is actually 170 uS of buffering @ 1500 kpps.
>
> However, known bugs cause brgphy_service() to often lose a packet.
>
> >> > >>    I haven't done a stress test on it, so I don't know if this
> >> > >> delay will cause packets to be dropped. But I've enabled device
> >> > >> polling with bge(4) on FreeBSD 6.1-RELEASE. If HZ is set to a high
> >> > >> value (e.g. 4000), this delay causes kern.polling.lost_polls to
> >> > >> increase by one or two every second. And about every five seconds,
> >> > >> the lost polls increase by at least 16. So I think this behavior
> >> > >> has some impact on systems that enable device polling.
>
> Lost polls shouldn't be much more of a problem.  Polls are only lost
> if the system can't actually poll at 1/HZ.  Increasing HZ won't make
> the worst case any better.  If polls are lost at HZ = 1000 then the worst
> case extra delay is >= 1mS and the 512-1024 packets of buffering starts
> becoming a problem.  It is already a problem at the maximum packet rate,
> since at least 1500 packets of buffering would be needed for polling
> at 1000 Hz to have any chance of keeping up.  The practical limits for
> polling at 1000 Hz with bge are now close to 256 kpps for rx (since
> half the buffering is not configured) and 512 kpps for tx.
>
> >> > >> Could we get something to make bge(4) a bit more friendly to
> >> > >> device polling? I don't know if autonegotiation really needs to
> >> > >> be called so frequently when we are connected to a good network
> >> > >> environment. Can I modify the interval between two autonegotiations
> > ...
> >> > >If you have lost poll it does not guarantee packet loss.
>
> It should never result in packet loss, due to buffering being adequate.
>
> >> > >Packets can be retrieved by the next poll or even by the idle_poll
> >> > >thread. bge_tick() does a couple of PCI register reads (it polls
> >> > >the phy status and updates some statistic counters), which is why
> >> > >it takes some time.
>
> I don't believe in polling, but occasionally test it to check that
> interrupt handling doesn't lose to it :-).  I mostly use HZ = 100 and
> get up to 640 kpps (tx) where polling at 1000 Hz is limited to 512
> kpps.  Polling in idle can work better if the system is actually idle.
> Interrupt handling still loses to it for latency -- I have a ping
> latency of 50-60 uS with interrupt handling and 40-50 uS with polling
> in idle.  However, something (perhaps excessive PCI reads to check the
> link on every poll) limits packet rates for polling -- large
> values of HZ work as expected, and polling in idle should work better
> provided the system is actually idle, but in practice polling in idle
> with low HZ doesn't work as well for throughput as not polling in
> idle with a large HZ.  (I guess this is because the PCI reads take
> several cycles per poll and each poll delivers an average of <= 1 packet.
> For a ping latency of 40 uS, the few extra uS are not dominant, but
> at 100's of kpps they become dominant.)
>
> >> > By the way, bge_tick() takes about 0.5ms to finish its work, which
> >> > results in a lost poll every second when HZ is high. A lower HZ will
> >> > limit performance under heavy traffic, and may result in packet loss
> >> > in that situation. A higher HZ makes it unclear whether we have
> >> > actually encountered packet loss. It's really hard to choose between
> >> > these two situations.
> >>
> >> IMO, high HZ would not give a performance gain if you have idle polling
> >> on (sysctl kern.polling.idle_poll=1).
> >> So it's better to have HZ=1000 and idle polling than HZ=10000 with idle
> >> polling disabled.
>
> A higher HZ can work better than idle polling.  If the system is rarely
> idle, then idle polling is useless.  At any reasonable value of HZ,
> latency is very bad unless idle polling is used and the system is often
> idle.  Unreasonably large values of HZ (10000 - 100000 are probably
> possible) can be used to reduce the latency.
>
> Bruce
>
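
The buffering figures in Bruce's reply can be reproduced with simple arithmetic. The sketch below is only an illustration: the function names are hypothetical, and it assumes a constant packet rate and that every poll empties the whole receive ring, which real hardware and the real driver may not do.

```python
import math


def fill_time_us(buffered_packets: int, rate_kpps: float) -> float:
    """Microseconds for `buffered_packets` ring slots to fill at `rate_kpps`."""
    # 1 kpps is one packet per millisecond, so packets / kpps gives ms.
    return buffered_packets / rate_kpps * 1000.0


def max_rate_kpps(buffered_packets: int, poll_hz: int) -> float:
    """Highest sustained rate one poll per tick can keep up with
    (assumption: each poll drains the entire ring)."""
    return buffered_packets * poll_hz / 1000.0


def buffers_needed(rate_kpps: float, poll_hz: int) -> int:
    """Ring slots needed to absorb one full tick of traffic."""
    return math.ceil(rate_kpps * 1000.0 / poll_hz)
```

With the numbers from the thread, 512 slots at 1500 kpps fill in about 341 uS and 256 slots in about 170 uS, polling at 1000 Hz tops out near 256 kpps (256 slots) or 512 kpps (512 slots), and 1500 kpps at 1000 Hz would need 1500 slots.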

I don't know whether you have really used bge(4) with device polling (or
maybe the chips you used differ a lot from mine), but I can't agree with
some of your points. The 0.5ms delay in the brgphy code causes abnormal
lost_polls, which makes it harder to identify the real state and load of
the machine. Moreover, this delay causes packets to accumulate in the
packet buffer. Device polling uses the poll burst to set how many packets
can be fetched in one poll, and the delay keeps this mechanism from working
properly at high HZ.
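
To make the accumulation argument concrete, here is a rough sketch of how many packets pile up during a stall and how many polls a given burst size would need to clear them. The function names and the burst value used in the example are assumptions for illustration, not values read from the driver or from kern.polling, and the model ignores packets that arrive while the backlog is being drained.

```python
import math


def backlog_packets(stall_us: float, rate_kpps: float) -> float:
    """Packets that arrive while polling is stalled for `stall_us`."""
    # kpps is packets per millisecond; stall_us / 1000 is the stall in ms.
    return stall_us * rate_kpps / 1000.0


def overflows(backlog: float, ring_slots: int) -> bool:
    """True if the backlog no longer fits in the receive ring."""
    return backlog > ring_slots


def polls_to_drain(backlog: float, burst: int) -> int:
    """Polls needed to clear the backlog at `burst` packets per poll
    (assumption: no new packets arrive in the meantime)."""
    return math.ceil(backlog / burst)
```

For a 0.5ms stall, 256 kpps leaves a backlog of 128 packets, which a hypothetical burst of 50 would clear in 3 polls, while 1500 kpps leaves 750 packets, more than a 512-slot ring can hold.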

The polling code does NOT check the link very often. I don't know what you
mean by 'PCI reads'; maybe you were referring to the bge card registers?

Finally, you mentioned the higher HZ problem, which is why I initially
posted to this mailing list. An HZ above 2000 causes lost_polls to increase
regularly every second. As I said before, these abnormal lost_polls prevent
me from seeing the actual status of the machine. If you really set HZ to
10000, you will get at least 5 lost_polls per second.
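
The lost_polls numbers reported in this thread are consistent with a simple model: a stall of a given length covers some whole number of clock ticks, and each covered tick is a poll that cannot run on time. The sketch below uses a hypothetical helper name and is only that rough model, not the kernel's actual lost_polls accounting.

```python
def ticks_stalled(stall_us: int, hz: int) -> int:
    """Whole clock ticks that elapse during a stall of `stall_us`
    microseconds when the clock runs at `hz` ticks per second."""
    # One tick lasts 1_000_000 / hz microseconds; integer math avoids
    # floating-point rounding at exact tick boundaries.
    return stall_us * hz // 1_000_000
```

With the 0.5ms stall, this gives 0 ticks at HZ=1000, 1 at HZ=2000, 2 at HZ=4000, and 5 at HZ=10000. With the 5ms autonegotiation stall at HZ=4000 it gives 20, which fits the observed jump of at least 16.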

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?be0088ce0702071917o55337a22i8ceafc8b37557cfe>
