From owner-freebsd-current@freebsd.org  Sat Apr  2 21:19:36 2016
Date: Sat, 2 Apr 2016 23:19:55 +0200
From: "O. Hartmann" <ohartman@zedat.fu-berlin.de>
To: Cy Schubert
Cc: Michael Butler, "K. Macy", FreeBSD CURRENT
Subject: Re: CURRENT slow and shaky network stability
Message-ID: <20160402231955.41b05526.ohartman@zedat.fu-berlin.de>
In-Reply-To: <20160402113910.14de7eaf.ohartman@zedat.fu-berlin.de>
References: <56F6C6B0.6010103@protected-networks.net>
	<201604020807.u3287tgc034452@slippy.cwsent.com>
	<20160402105503.7ede5be1.ohartman@zedat.fu-berlin.de>
	<20160402113910.14de7eaf.ohartman@zedat.fu-berlin.de>

On Sat, 2 Apr 2016 11:39:10 +0200, "O. Hartmann" wrote:

> On Sat, 2 Apr 2016 10:55:03 +0200, "O. Hartmann" wrote:
>
> > On Sat, 02 Apr 2016 01:07:55 -0700, Cy Schubert wrote:
> >
> > > In message <56F6C6B0.6010103@protected-networks.net>, Michael Butler writes:
> > > > -current is not great for interactive use at all. The strategy of
> > > > pre-emptively dropping idle processes to swap is hurting .. big time.
> > >
> > > FreeBSD doesn't "preemptively" or arbitrarily push pages out to disk. LRU
> > > doesn't do this.
> > >
> > > > Compare inactive memory to swap in this example ..
> > > >
> > > > 110 processes: 1 running, 108 sleeping, 1 zombie
> > > > CPU:  1.2% user,  0.0% nice,  4.3% system,  0.0% interrupt, 94.5% idle
> > > > Mem: 474M Active, 1609M Inact, 764M Wired, 281M Buf, 119M Free
> > > > Swap: 4096M Total, 917M Used, 3178M Free, 22% Inuse
> > >
> > > To analyze this you need to capture vmstat output. You'll see the free pool
> > > dip below a threshold and pages go out to disk in response.
> > > If you have daemons with small working sets, pages that are not part of the
> > > working sets for daemons or applications will eventually be paged out. This
> > > is not a bad thing. In your example above, the 281 MB of UFS buffers are more
> > > active than the 917 MB paged out. If it's paged out and never used again,
> > > then it doesn't hurt. However the 281 MB of buffers saves you I/O. The
> > > inactive pages are part of your free pool that were active at one time but
> > > now are not. They may be reclaimed, and if they are, you've just saved more
> > > I/O.
> > >
> > > Top is a poor tool to analyze memory use. Vmstat is the better tool to help
> > > understand memory use. Inactive memory isn't a bad thing per se. Monitor
> > > page outs, scan rate and page reclaims.
> >
> > I give up! I tried to check via ssh/vmstat what is going on. These are the
> > last lines before the broken pipe:
> >
> > [...]
> > procs      memory       page                        disks      faults         cpu
> >  r b  w   avm   fre    flt  re  pi  po     fr    sr ad0 ad1   in     sy     cs us sy id
> > 22 0 22  5.8G  1.0G  46319   0   0   0  55721  1297   0   4  219  23907   5400 95  5  0
> > 22 0 22  5.4G  1.3G  51733   0   0   0  72436  1162   0   0  108  40869   3459 93  7  0
> > 15 0 22   12G  1.2G  54400   0  27   0  52188  1160   0  42  148  52192   4366 91  9  0
> > 14 0 22   12G  1.0G  44954   0  37   0  37550  1179   0  39  141  86209   4368 88 12  0
> > 26 0 22   12G  1.1G  60258   0  81   0  69459  1119   0  27  123 779569 704359 87 13  0
> > 29 3 22   13G  774M  50576   0  68   0  32204  1304   0   2  102 507337 484861 93  7  0
> > 27 0 22   13G  937M  47477   0  48   0  59458  1264   3   2  112  68131  44407 95  5  0
> > 36 0 22   13G  829M  83164   0   2   0  82575  1225   1   0  126  99366  38060 89 11  0
> > 35 0 22  6.2G  1.1G  98803   0  13   0 121375  1217   2   8  112  99371   4999 85 15  0
> > 34 0 22   13G  723M  54436   0  20   0  36952  1276   0  17  153  29142   4431 95  5  0
> > Fssh_packet_write_wait: Connection to 192.168.0.1 port 22: Broken pipe
> >
> > This makes this crap system completely unusable. The server in question
> > (FreeBSD 11.0-CURRENT #20 r297503: Sat Apr  2 09:02:41 CEST 2016 amd64) was
> > running a poudriere bulk job. I cannot even determine which terminal goes
> > down first - another one, idle for much longer than the one showing the
> > "vmstat 5" output, is still alive!
> >
> > I consider this a serious bug, and nothing that has happened since this
> > "fancy" update is a benefit. :-(
>
> By the way - it might be of interest and some kind of hint.
>
> One of my boxes acts as server and gateway. It uses NAT and IPFW; when it is
> under high load, as it was today, passing the network flow from the ISP to the
> clients on the internal network is sometimes extremely slow. I do not consider
> this the reason for the collapsing ssh sessions, since the incident also
> happens under no load, but in the overall view of the problem it could be a
> hint - I hope.

I just checked on one box that "broke the pipe" very quickly after I started
poudriere, although it had been doing fine for a couple of hours before the pipe
broke. It seems to be load dependent when the ssh session gets wrecked. More
importantly: after the long-haul poudriere run I rebooted the box, tried again,
and got the mentioned broken pipe within a couple of minutes of poudriere
starting. Then I left the box alone for several hours, logged in again and
checked the swap. Although there had been no load or other pressure for hours,
31% of swap was still in use (the box has 16 GB of RAM and is propelled by a
XEON E3-1245 V2).
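
To judge whether that stale swap actually hurts, I will snapshot the swap pager
counters before and after touching the box again - if the swap-in counter barely
moves, the pages were simply never needed again, as Cy described. A minimal
sketch; the counter names are the vm.stats sysctls as I know them, everything
else is my own choice:

    #!/bin/sh
    # Record cumulative swap pager activity and current swap usage; run once,
    # use the box for a while, run again and compare the numbers.
    date
    sysctl vm.stats.vm.v_swappgsin vm.stats.vm.v_swappgsout
    swapinfo -k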
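
For the next bulk run I will also try to log the paging statistics Cy asked for
to a file, so the data survives the dead ssh session. Just a plain sh sketch;
the log location and the 5 second interval are arbitrary choices of mine:

    #!/bin/sh
    # Append periodic swap/paging snapshots to a local file instead of an ssh
    # terminal, so the data survives a broken pipe. The cumulative counters
    # from vmstat -s (page outs, reclaims, pagedaemon activity) can then be
    # compared between snapshots after the run.
    LOG=/var/tmp/paging.log      # arbitrary location

    while true; do
            { date; swapinfo -k; vmstat -s; } >> "$LOG"
            sleep 5
    done

Started from the console or under nohup(1) this keeps collecting even when every
ssh session is gone.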
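
And to keep at least one session alive long enough to watch it happen, I will
add the usual OpenSSH client keepalives on the machine I ssh in from - obviously
only a workaround for the dropped sessions, not a fix for whatever CURRENT is
doing underneath. The 30 second interval and the count are just guesses:

    #!/bin/sh
    # Append client keepalive options for the box that shows up in the
    # "Broken pipe" message above; the values are arbitrary.
    cat >> ~/.ssh/config <<'EOF'
    Host 192.168.0.1
            ServerAliveInterval 30
            ServerAliveCountMax 4
    EOF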