From: Kevin Oberman
To: "O. Hartmann"
Cc: Cy Schubert, Michael Butler, "K. Macy", FreeBSD CURRENT
Date: Sat, 2 Apr 2016 15:28:23 -0700
Subject: Re: CURRENT slow and shaky network stability

On Sat, Apr 2, 2016 at 2:19 PM, O. Hartmann wrote:

> On Sat, 2 Apr 2016 11:39:10 +0200, "O. Hartmann" wrote:
>
> > On Sat, 2 Apr 2016 10:55:03 +0200, "O. Hartmann" wrote:
Hartmann" schrieb: > > > > > Am Sat, 02 Apr 2016 01:07:55 -0700 > > > Cy Schubert schrieb: > > > > > > > In message <56F6C6B0.6010103@protected-networks.net>, Michael > Butler writes: > > > > > -current is not great for interactive use at all. The strategy of > > > > > pre-emptively dropping idle processes to swap is hurting .. big > time. > > > > > > > > FreeBSD doesn't "preemptively" or arbitrarily push pages out to > disk. LRU > > > > doesn't do this. > > > > > > > > > > > > > > Compare inactive memory to swap in this example .. > > > > > > > > > > 110 processes: 1 running, 108 sleeping, 1 zombie > > > > > CPU: 1.2% user, 0.0% nice, 4.3% system, 0.0% interrupt, 94.5% > idle > > > > > Mem: 474M Active, 1609M Inact, 764M Wired, 281M Buf, 119M Free > > > > > Swap: 4096M Total, 917M Used, 3178M Free, 22% Inuse > > > > > > > > To analyze this you need to capture vmstat output. You'll see the > free pool > > > > dip below a threshold and pages go out to disk in response. If you > have > > > > daemons with small working sets, pages that are not part of the > working > > > > sets for daemons or applications will eventually be paged out. This > is not > > > > a bad thing. In your example above, the 281 MB of UFS buffers are > more > > > > active than the 917 MB paged out. If it's paged out and never used > again, > > > > then it doesn't hurt. However the 281 MB of buffers saves you I/O. > The > > > > inactive pages are part of your free pool that were active at one > time but > > > > now are not. They may be reclaimed and if they are, you've just > saved more > > > > I/O. > > > > > > > > Top is a poor tool to analyze memory use. Vmstat is the better tool > to help > > > > understand memory use. Inactive memory isn't a bad thing per se. > Monitor > > > > page outs, scan rate and page reclaims. > > > > > > > > > > > > > > I give up! Tried to check via ssh/vmstat what is going on. Last lines > before broken > > > pipe: > > > > > > [...] > > > procs memory page disks faults > cpu > > > r b w avm fre flt re pi po fr sr ad0 ad1 in sy cs > us sy id > > > 22 0 22 5.8G 1.0G 46319 0 0 0 55721 1297 0 4 219 23907 > 5400 95 5 0 > > > 22 0 22 5.4G 1.3G 51733 0 0 0 72436 1162 0 0 108 40869 > 3459 93 7 0 > > > 15 0 22 12G 1.2G 54400 0 27 0 52188 1160 0 42 148 52192 > 4366 91 9 0 > > > 14 0 22 12G 1.0G 44954 0 37 0 37550 1179 0 39 141 86209 > 4368 88 12 0 > > > 26 0 22 12G 1.1G 60258 0 81 0 69459 1119 0 27 123 779569 > 704359 87 13 0 > > > 29 3 22 13G 774M 50576 0 68 0 32204 1304 0 2 102 507337 > 484861 93 7 0 > > > 27 0 22 13G 937M 47477 0 48 0 59458 1264 3 2 112 68131 > 44407 95 5 0 > > > 36 0 22 13G 829M 83164 0 2 0 82575 1225 1 0 126 99366 > 38060 89 11 0 > > > 35 0 22 6.2G 1.1G 98803 0 13 0 121375 1217 2 8 112 99371 > 4999 85 15 0 > > > 34 0 22 13G 723M 54436 0 20 0 36952 1276 0 17 153 29142 > 4431 95 5 0 > > > Fssh_packet_write_wait: Connection to 192.168.0.1 port 22: Broken pipe > > > > > > > > > This makes this crap system completely unusable. The server (FreeBSD > 11.0-CURRENT #20 > > > r297503: Sat Apr 2 09:02:41 CEST 2016 amd64) in question did > poudriere bulk job. I > > > can not even determine what terminal goes down first - another one, > much more time > > > idle than the one shwoing the "vmstat 5" output, is still alive! > > > > > > i consider this a serious bug and it is no benefit what happened since > this "fancy" > > > update. :-( > > > > By the way - it might be of interest and some hint. > > > > One of my boxes is acting as server and gateway. 
> > By the way - this might be of interest and some hint.
> >
> > One of my boxes acts as server and gateway and utilises NAT and IPFW. When
> > it is under high load, as it was today, passing traffic from the ISP through
> > to the clients on the network is sometimes extremely slow. I do not consider
> > this the reason for the collapsing ssh sessions, since the incident also
> > happens under no load, but in the overall view of the problem it could be a
> > hint - I hope.
>
> I just checked on one box that "broke pipe" very quickly after I started
> poudriere, while it had done well for a couple of hours before the pipe broke.
> It seems to be load dependent when the ssh session gets wrecked, but more
> importantly: after the long-haul poudriere run I rebooted the box and tried
> again, and the pipe broke a couple of minutes after poudriere started. Then I
> left the box alone for several hours, logged in again and checked the swap.
> Although there had been no load or other pressure for hours, 31% of swap was
> still in use (the box has 16 GB of RAM and is propelled by a XEON E3-1245 V2).

Unless something has changed, just as things are not preemptively swapped out,
they are also not preemptively swapped back in. AFAIK, once a process is
swapped out, it will remain swapped out until/unless it becomes active. At that
time it is swapped in, and this can entail a significant delay. If my laptop is
locked and something (usually Chromium) starts eating all of the memory and
processes start swapping out, it can take >5 seconds to get the unlock window
to display.
--
Kevin Oberman, part-time kid-herd and retired network engineer
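The lingering swap reported above can be confirmed directly from the console; the "Used" figure stays put until the swapped-out pages are touched again, which is consistent with nothing being preemptively swapped back in. A rough sketch with stock FreeBSD tools (exact output formats may differ between releases):

  # Swap usage per device and in total; compare against the 31% seen above.
  swapinfo -h

  # The same figures via pstat.
  pstat -s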