Date:      Sat, 2 Apr 2016 11:39:10 +0200
From:      "O. Hartmann" <ohartman@zedat.fu-berlin.de>
To:        Cy Schubert <Cy.Schubert@komquats.com>
Cc:        Michael Butler <imb@protected-networks.net>, "K. Macy" <kmacy@freebsd.org>, FreeBSD CURRENT <freebsd-current@freebsd.org>
Subject:   Re: CURRENT slow and shaky network stability
Message-ID:  <20160402113910.14de7eaf.ohartman@zedat.fu-berlin.de>
In-Reply-To: <20160402105503.7ede5be1.ohartman@zedat.fu-berlin.de>
References:  <56F6C6B0.6010103@protected-networks.net> <201604020807.u3287tgc034452@slippy.cwsent.com> <20160402105503.7ede5be1.ohartman@zedat.fu-berlin.de>


On Sat, 2 Apr 2016 10:55:03 +0200
"O. Hartmann" <ohartman@zedat.fu-berlin.de> wrote:

> On Sat, 02 Apr 2016 01:07:55 -0700
> Cy Schubert <Cy.Schubert@komquats.com> wrote:
>
> > In message <56F6C6B0.6010103@protected-networks.net>, Michael Butler writes:
> > > -current is not great for interactive use at all. The strategy of
> > > pre-emptively dropping idle processes to swap is hurting .. big time.
> >
> > FreeBSD doesn't "preemptively" or arbitrarily push pages out to disk. LRU
> > doesn't do this.
> >
> > >
> > > Compare inactive memory to swap in this example ..
> > >=20
> > > 110 processes: 1 running, 108 sleeping, 1 zombie
> > > CPU:  1.2% user,  0.0% nice,  4.3% system,  0.0% interrupt, 94.5% idle
> > > Mem: 474M Active, 1609M Inact, 764M Wired, 281M Buf, 119M Free
> > > Swap: 4096M Total, 917M Used, 3178M Free, 22% Inuse
> >
> > To analyze this you need to capture vmstat output. You'll see the free pool
> > dip below a threshold and pages go out to disk in response. If you have
> > daemons with small working sets, pages that are not part of the working
> > sets for daemons or applications will eventually be paged out. This is not
> > a bad thing. In your example above, the 281 MB of UFS buffers are more
> > active than the 917 MB paged out. If it's paged out and never used again,
> > then it doesn't hurt. However the 281 MB of buffers saves you I/O. The
> > inactive pages are part of your free pool that were active at one time but
> > now are not. They may be reclaimed and if they are, you've just saved more
> > I/O.
> >
> > Top is a poor tool to analyze memory use. Vmstat is the better tool to help
> > understand memory use. Inactive memory isn't a bad thing per se. Monitor
> > page outs, scan rate and page reclaims.
> >
>
> I give up! I tried to check via ssh/vmstat what is going on. These are the last
> lines before the broken pipe:
>
> [...]
> procs   memory    page                       disks   faults            cpu
>  r b  w  avm  fre   flt re pi po     fr   sr ad0 ad1  in     sy     cs us sy id
> 22 0 22 5.8G 1.0G 46319  0  0  0  55721 1297   0   4 219  23907   5400 95  5  0
> 22 0 22 5.4G 1.3G 51733  0  0  0  72436 1162   0   0 108  40869   3459 93  7  0
> 15 0 22  12G 1.2G 54400  0 27  0  52188 1160   0  42 148  52192   4366 91  9  0
> 14 0 22  12G 1.0G 44954  0 37  0  37550 1179   0  39 141  86209   4368 88 12  0
> 26 0 22  12G 1.1G 60258  0 81  0  69459 1119   0  27 123 779569 704359 87 13  0
> 29 3 22  13G 774M 50576  0 68  0  32204 1304   0   2 102 507337 484861 93  7  0
> 27 0 22  13G 937M 47477  0 48  0  59458 1264   3   2 112  68131  44407 95  5  0
> 36 0 22  13G 829M 83164  0  2  0  82575 1225   1   0 126  99366  38060 89 11  0
> 35 0 22 6.2G 1.1G 98803  0 13  0 121375 1217   2   8 112  99371   4999 85 15  0
> 34 0 22  13G 723M 54436  0 20  0  36952 1276   0  17 153  29142   4431 95  5  0
> Fssh_packet_write_wait: Connection to 192.168.0.1 port 22: Broken pipe
>
>
> This makes this crap system completely unusable. The server in question (FreeBSD
> 11.0-CURRENT #20 r297503: Sat Apr  2 09:02:41 CEST 2016 amd64) was running a
> poudriere bulk job. I cannot even determine which terminal goes down first:
> another one, idle for much longer than the one showing the "vmstat 5" output,
> is still alive!
>
> I consider this a serious bug, and what has happened since this "fancy" update
> is no benefit. :-(
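
For anyone who wants to pull the counters Cy mentions (page outs, scan rate,
page reclaims) out of a "vmstat 5" capture like the one quoted above, here is
a minimal sketch in Python. It is only an illustration under stated
assumptions: the helper names (parse_vmstat_row, paging_summary) are made up
for this example, the column order is taken from the header in the quoted
output, the counters are assumed to be printed as plain integers, and the
three sample rows are rows from the quoted output with the wrapped line ends
rejoined.

```python
# Column order as printed in the "vmstat 5" header quoted above.
# The header shows "sy" twice; the CPU one is renamed "cpu_sy" here (an assumption
# made only so that the dictionary keys stay unique).
COLUMNS = [
    "r", "b", "w", "avm", "fre", "flt", "re", "pi", "po", "fr", "sr",
    "ad0", "ad1", "in", "sy", "cs", "us", "cpu_sy", "id",
]


def parse_vmstat_row(line):
    """Split one whitespace-separated vmstat data row into a dict of strings."""
    fields = line.split()
    if len(fields) != len(COLUMNS):
        raise ValueError("expected %d fields, got %d" % (len(COLUMNS), len(fields)))
    return dict(zip(COLUMNS, fields))


def paging_summary(rows):
    """Report the peak page-outs (po), scan rate (sr), reclaims (re) and
    context switches (cs) seen across the given rows."""
    parsed = [parse_vmstat_row(row) for row in rows]
    return {
        "max_po": max(int(p["po"]) for p in parsed),
        "max_sr": max(int(p["sr"]) for p in parsed),
        "max_re": max(int(p["re"]) for p in parsed),
        "max_cs": max(int(p["cs"]) for p in parsed),
    }


# Three rows copied from the quoted capture (wrapped line ends rejoined).
SAMPLE = [
    "22 0 22 5.8G 1.0G 46319 0 0 0 55721 1297 0 4 219 23907 5400 95 5 0",
    "26 0 22 12G 1.1G 60258 0 81 0 69459 1119 0 27 123 779569 704359 87 13 0",
    "34 0 22 13G 723M 54436 0 20 0 36952 1276 0 17 153 29142 4431 95 5 0",
]

summary = paging_summary(SAMPLE)
print(summary)
```

Read this way, the po column stays at 0 in every quoted row while sr sits
between roughly 1100 and 1300, so in those intervals pages were being scanned
but none were written out to swap.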

By the way, this might be of interest and offer a hint.

One of my boxes acts as server and gateway. It uses NAT and IPFW. When it is under
high load, as it was today, forwarding traffic from the ISP into the client network
is sometimes extremely slow. I do not consider this the reason for the collapsing ssh
sessions, since those also break when there is no load, but in the overall view of
the problem it could be a hint, I hope.



