Date:      Sun, 11 Mar 2018 00:47:10 +0100
From:      "O. Hartmann" <ohartmann@walstatt.org>
To:        Roman Bogorodskiy <novel@FreeBSD.org>
Cc:        "Danilo G. Baio" <dbaio@FreeBSD.org>, "Rodney W. Grimes" <freebsd-rwg@pdx.rh.CN85.dnsmgr.net>, Trond Endrestøl <Trond.Endrestol@fagskolen.gjovik.no>, FreeBSD current <freebsd-current@freebsd.org>, Kurt Jaeger <lists@opsec.eu>
Subject:   Re: Strange ARC/Swap/CPU on yesterday's -CURRENT
Message-ID:  <20180311004737.3441dbf9@thor.intern.walstatt.dynvpn.de>
In-Reply-To: <20180307103911.GA72239@kloomba>
References:  <20180306173455.oacyqlbib4sbafqd@ler-imac.lerctr.org> <201803061816.w26IGaW5050053@pdx.rh.CN85.dnsmgr.net> <20180306193645.vv3ogqrhauivf2tr@ler-imac.lerctr.org> <20180306221554.uyshbzbboai62rdf@dx240.localdomain> <20180307103911.GA72239@kloomba>


On Wed, 7 Mar 2018 14:39:13 +0400,
Roman Bogorodskiy <novel@FreeBSD.org> wrote:

>   Danilo G. Baio wrote:
>
> > On Tue, Mar 06, 2018 at 01:36:45PM -0600, Larry Rosenman wrote:
> > > On Tue, Mar 06, 2018 at 10:16:36AM -0800, Rodney W. Grimes wrote:
> > > > > On Tue, Mar 06, 2018 at 08:40:10AM -0800, Rodney W. Grimes wrote:
> > > > > > > On Mon, 5 Mar 2018 14:39 -0600, Larry Rosenman wrote:
> > > > > > >
> > > > > > > > Upgraded to:
> > > > > > > >
> > > > > > > > FreeBSD borg.lerctr.org 12.0-CURRENT FreeBSD 12.0-CURRENT #11 r330385:
> > > > > > > > Sun Mar  4 12:48:52 CST 2018
> > > > > > > > root@borg.lerctr.org:/usr/obj/usr/src/amd64.amd64/sys/VT-LER  amd64
> > > > > > > > +1200060 1200060
> > > > > > > >
> > > > > > > > Yesterday, and I'm seeing really strange slowness, ARC use, and SWAP use
> > > > > > > > and swapping.
> > > > > > > >
> > > > > > > > See http://www.lerctr.org/~ler/FreeBSD/Swapuse.png
> > > > > > >
> > > > > > > I see these symptoms on stable/11. One of my servers has 32 GiB of
> > > > > > > RAM. After a reboot all is well. ARC starts to fill up, and I still
> > > > > > > have more than half of the memory available for user processes.
> > > > > > >
> > > > > > > After running the periodic jobs at night, the amount of wired memory
> > > > > > > goes sky high. /etc/periodic/weekly/310.locate is a particularly nasty
> > > > > > > one.
> > > > > >
> > > > > > I would like to find out if this is the same person I have
> > > > > > reporting this problem from another source, or if this is
> > > > > > a confirmation of a bug I was helping someone else with.
> > > > > >
> > > > > > Have you been in contact with Michael Dexter about this
> > > > > > issue, or any other forum/mailing list/etc?
> > > > > Just IRC/Slack, with no response.
> > > > > >
> > > > > > If not then we have at least 2 reports of this unbound
> > > > > > wired memory growth, if so hopefully someone here can
> > > > > > take you further in the debug than we have been able
> > > > > > to get.
> > > > > What can I provide?  The system is still in this state as the full backup is
> > > > > slow.
> > > >
> > > > One place to look is to see if this is the recently fixed:
> > > > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=222288
> > > > g_bio leak.
> > > >
> > > > vmstat -z | egrep 'ITEM|g_bio|UMA'
> > > >
> > > > would be a good first look
> > > >
> > > borg.lerctr.org /home/ler $ vmstat -z | egrep 'ITEM|g_bio|UMA'
> > > ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP
> > > UMA Kegs:               280,      0,     346,       5,     560,   0,   0
> > > UMA Zones:             1928,      0,     363,       1,     577,   0,   0
> > > UMA Slabs:              112,      0,25384098,  977762,102033225,   0,   0
> > > UMA Hash:               256,      0,      59,      16,     105,   0,   0
> > > g_bio:                  384,      0,      33,    1627,542482056,   0,   0
> > > borg.lerctr.org /home/ler $
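
As an aside: a single snapshot only tells so much; to see whether the g_bio
and UMA counters keep growing over time, a minimal watch-loop sketch (interval
and log file are arbitrary) would be:

    # sample g_bio and the UMA zones every five minutes; a USED column for
    # g_bio that only ever grows would point at the leak fixed in bug 222288
    while true; do
        date
        vmstat -z | egrep 'ITEM|g_bio|UMA'
        sleep 300
    done >> /var/tmp/gbio-watch.log
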
> > > > > > > Limiting the ARC to, say, 16 GiB, has no effect on the high amount of
> > > > > > > wired memory. After a few more days, the kernel consumes virtually all
> > > > > > > memory, forcing processes in and out of the swap device.
> > > > > >
> > > > > > Our experience as well.
> > > > > >
> > > > > > ...
> > > > > >
> > > > > > Thanks,
> > > > > > Rod Grimes
> > > > > > rgrimes@freebsd.org
> > > > > Larry Rosenman                     http://www.lerctr.org/~ler
> > > >
> > > > --
> > > > Rod Grimes                                                 rgrimes@freebsd.org
> > >
> > > --
> > > Larry Rosenman                     http://www.lerctr.org/~ler
> > > Phone: +1 214-642-9640                 E-Mail: ler@lerctr.org
> > > US Mail: 5708 Sabbia Drive, Round Rock, TX 78665-2106
> >
> >
> > Hi.
> >
> > I noticed this behavior as well and changed vfs.zfs.arc_max to a smaller size.
> >
> > For me it started when I upgraded to 1200058; on this box I'm only using
> > poudriere for building tests.
>
> I've noticed that as well.
>
> I have 16G of RAM and two disks; the first one is UFS with the system
> installation, and the second one is ZFS which I use to store media and
> data files and for poudriere.
>
> I don't recall the exact date, but it started fairly recently. The system would
> swap like crazy to the point where I cannot even ssh to it, and can hardly
> log in through a tty: it might take 10-15 minutes to see a command typed in
> the shell.
>
> I've updated loader.conf to have the following:
>
> vfs.zfs.arc_max="4G"
> vfs.zfs.prefetch_disable="1"
>
> It fixed the problem, but introduced a new one. When I'm building stuff
> with poudriere with ccache enabled, it takes hours to build even small
> projects like curl or gnutls.
>
> For example, current build:
>
> [10i386-default] [2018-03-07_07h44m45s] [parallel_build:] Queued: 3  Built: 1  Failed:
> 0  Skipped: 0  Ignored: 0  Tobuild: 2   Time: 06:48:35 [02]: security/gnutls
> | gnutls-3.5.18             build           (06:47:51)
>
> Almost 7 hours already and still going!
>
> gstat output looks like this:
>
> dT: 1.002s  w: 1.000s
>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>     0      0      0      0    0.0      0      0    0.0    0.0  da0
>     0      1      0      0    0.0      1    128    0.7    0.1  ada0
>     1    106    106    439   64.6      0      0    0.0   98.8  ada1
>     0      1      0      0    0.0      1    128    0.7    0.1  ada0s1
>     0      0      0      0    0.0      0      0    0.0    0.0  ada0s1a
>     0      0      0      0    0.0      0      0    0.0    0.0  ada0s1b
>     0      1      0      0    0.0      1    128    0.7    0.1  ada0s1d
>
> ada0 here is the UFS drive, and ada1 is ZFS.
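
For what it's worth: on recent stable/11 and -CURRENT the ARC cap can, as far
as I know, also be changed at runtime, so different values can be tried without
a reboot. A sketch, the value is just an example:

    # cap the ARC at 4 GiB (value given in bytes); the ARC shrinks only gradually
    sysctl vfs.zfs.arc_max=4294967296
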
>
> > Regards.
> > --
> > Danilo G. Baio (dbaio)
>
>
>
> Roman Bogorodskiy


This is from an APU (no ZFS, UFS on a small mSATA device); the APU (PC Engines)
works as a firewall, router, and PBX:

last pid:  9665;  load averages:  0.13,  0.13,  0.11    up 3+06:53:55  00:26:26
19 processes:  1 running, 18 sleeping
CPU:  0.3% user,  0.0% nice,  0.2% system,  0.0% interrupt, 99.5% idle
Mem: 27M Active, 6200K Inact, 83M Laundry, 185M Wired, 128K Buf, 675M Free
Swap: 7808M Total, 2856K Used, 7805M Free
[...]
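
To see which UMA zones are actually holding the wired memory at such a moment,
something like the following helps (a rough sketch; the awk field numbers assume
the usual "name: size, limit, used, ..." layout of vmstat -z):

    # approximate bytes held per zone (item size * items in use), largest first
    vmstat -z | awk -F'[:,] *' 'NR > 1 { print $2 * $4, $1 }' | sort -rn | head
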

The APU is running CURRENT (FreeBSD 12.0-CURRENT #42 r330608: Wed Mar  7 16:55:59 CET
2018 amd64). Usually the APU never(!) uses swap; for a couple of days now it has been
swapping like hell and I have to reboot it fairly often.

Another box, 16 GB RAM, ZFS, poudriere, the packaging box, is right now unresponsive:
after hours of building packages, I tried to copy the repository from one location on
the same ZFS volume to another - usually this task takes a couple of minutes for ~2200
ports. Now it has taken 2 1/2 hours and the box got stuck; Ctrl-T on the console
delivers:
load: 0.00  cmd: make 91199 [pfault] 7239.56r 0.03u 0.04s 0% 740k

No response from the box anymore.
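
If it gets stuck again while the console still reacts, the kernel stack of the
hanging process might be useful for whoever debugs this; something along the
lines of (the PID being the make process from the Ctrl-T line above):

    procstat -kk 91199                          # kernel stack of the stuck process
    sysctl vm.stats.vm | egrep 'wire|free'      # wired vs. free pages at that moment
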


The problem of heavy swapping and slow performance isn't something from just the past
few days; it has been present for at least one and a half weeks now, possibly longer.
Since I build ports fairly often, the time taken on that specific box has increased
from 2 to 3 days for all ~2200 ports. The system has 16 GB of RAM and an Ivy Bridge
4-core Xeon at 3.4 GHz, if this information matters. The box is consuming swap really
fast.
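
For whoever wants numbers: ARC size versus wired memory can be logged across a
build run with a minimal sketch like this (sysctl names as on my -CURRENT box,
log path arbitrary):

    # one line every five minutes: timestamp, ARC size in bytes, wired pages
    while sleep 300; do
        printf '%s %s %s\n' "$(date +%F-%T)" \
            "$(sysctl -n kstat.zfs.misc.arcstats.size)" \
            "$(sysctl -n vm.stats.vm.v_wire_count)"
    done >> /var/tmp/arc-vs-wired.log
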

Today is the first time the machine has become unresponsive (no ssh, no console login
so far); it needs a cold start. The OS is CURRENT as well.

Regards,

O. Hartmann


--
O. Hartmann

I object to the use or transmission of my data for advertising purposes or for
market or opinion research (§ 28 (4) BDSG).



