Date:      Thu, 27 Apr 2017 22:26:20 -0700
From:      Mark Millard <markmi@dsl-only.net>
To:        freebsd-arm <freebsd-arm@freebsd.org>, freebsd-hackers@freebsd.org, FreeBSD Current <freebsd-current@freebsd.org>
Subject:   Re: FYI: example "panic: ARM64TODO: reclaim_pv_chunk" on a Pine64+ 2GB with head -r317015 [mdconfig -d not effectively releasing memory?]
Message-ID:  <302D1255-4D34-4C1B-8F3A-9180A6AF8768@dsl-only.net>
In-Reply-To: <51050A07-B951-45C0-82CE-73BB342F012E@dsl-only.net>
References:  <4BC7E6BC-4BF9-4F5E-9851-E022AC9A3082@dsl-only.net> <51050A07-B951-45C0-82CE-73BB342F012E@dsl-only.net>

[As this text does not really follow from the earlier text
I'd sent, I'm top posting a hypothesis about where so much
active memory came from that available memory got low enough
to trigger a reclaim_pv_chunk attempt.]

My hypothesis is that the "mdconfig -d"s from vm_copy_base
(from /usr/src/release/tools/vmimage.subr) did not actually
release the memory resources involved in the vnode-backed
mdconfig use:

vm_copy_base() {
        # Creates a new UFS root filesystem and copies the contents of the
        # current root filesystem into it.  This produces a "clean" disk
        # image without any remnants of files which were created temporarily
        # during image-creation and have since been deleted (e.g., downloaded
        # package archives).

        mkdir -p ${DESTDIR}/old
        mdold=$(mdconfig -f ${VMBASE})
        mount /dev/${mdold} ${DESTDIR}/old

        truncate -s ${VMSIZE} ${VMBASE}.tmp
        mkdir -p ${DESTDIR}/new
        mdnew=$(mdconfig -f ${VMBASE}.tmp)
        newfs -L rootfs /dev/${mdnew}
        mount /dev/${mdnew} ${DESTDIR}/new

        tar -cf- -C ${DESTDIR}/old . | tar -xUf- -C ${DESTDIR}/new

        umount_loop /dev/${mdold}
        rmdir ${DESTDIR}/old
        mdconfig -d -u ${mdold}

        umount_loop /dev/${mdnew}
        rmdir ${DESTDIR}/new
        tunefs -n enable /dev/${mdnew}
        mdconfig -d -u ${mdnew}
        mv ${VMBASE}.tmp ${VMBASE}
}

Without such prior mdconfig activity the "cp -p" and the
following "xz -T 0" do not produce the large Mem:Active
figure in top: "xz -T 0" shows more like 781M Mem:Active,
with an xz:SIZE of 791M and xz:RES < 800M (varying), and
zero xz:SWAP. xz also keeps all the cores busy, so well
over 300% in top throughout (4 cores) instead of < 100%.
In this context both the cp and the xz finish just fine.

In other words: there is no low-memory problem without
first having the vnode-backed mdconfig use.
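
If someone wants to poke at this hypothesis in isolation, here is a
minimal sketch of the kind of test I have in mind (file names, mount
point, and sizes are illustrative choices of my own, not anything from
vmimage.subr; the counters are the stock vm.stats.vm.* sysctls):

#!/bin/sh
# Sketch: does detaching a vnode-backed md give the Active pages back?
pgsz=$(sysctl -n hw.pagesize)
active_mb() {
        echo $(( $(sysctl -n vm.stats.vm.v_active_count) * pgsz / 1048576 ))
}

echo "Active before: $(active_mb) MiB"

truncate -s 1g /tmp/md_leak_test.img        # hypothetical test file
md=$(mdconfig -f /tmp/md_leak_test.img)     # vnode backed, like ${VMBASE}
newfs /dev/${md} > /dev/null
mkdir -p /mnt/md_leak_test
mount /dev/${md} /mnt/md_leak_test
dd if=/dev/zero of=/mnt/md_leak_test/fill bs=1m count=512

echo "Active while in use: $(active_mb) MiB"

umount /mnt/md_leak_test
mdconfig -d -u ${md}

echo "Active after mdconfig -d: $(active_mb) MiB"

rm /tmp/md_leak_test.img
rmdir /mnt/md_leak_test

If the last figure stays near the "while in use" figure instead of
dropping back toward the first, that would point at the vnode-backed
md path. Checking "mdconfig -l" after vm_copy_base would also confirm
whether the md units really were detached.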

From the prior top report for the failure,
partially repeated here:

. . .
Mem: 1618M Active, 17M Inact, 315M Wired, 204M Buf, 15M Free
Swap: 6144M Total, 34M Used, 6110M Free, 348K Out

  PID USERNAME    THR PRI NICE   SIZE    RES   SWAP STATE   C   TIME     CPU COMMAND
48988 root          4  31    0   651M 27048K     0K RUN     0   0:03  87.60% xz -T 0 /usr/obj/DESTDIRs/vmimage-aarch64/vmimages/FreeBSD-12.0-CURRENT-
. . .

The combination 1618M Mem:Active but 34M Swap:Used
and 651M xz:SIZE but 27M xz:RES and 0K xz:SWAP just
seems very odd, like it should not happen. The 17M
Mem:Inact is odd for the context as well. (Mem:Wired,
Mem:Buf, and Mem:Free look normal.)
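
If this shows up again it may be worth cross-checking top's figures
against the raw VM page counters at the same moment, for example
(stock vm.stats.vm.* sysctls, converted to MiB via hw.pagesize):

# pgsz=$(sysctl -n hw.pagesize)
# echo "$(( $(sysctl -n vm.stats.vm.v_active_count) * pgsz / 1048576 )) MiB active"
# echo "$(( $(sysctl -n vm.stats.vm.v_inactive_count) * pgsz / 1048576 )) MiB inactive"
# echo "$(( $(sysctl -n vm.stats.vm.v_wire_count) * pgsz / 1048576 )) MiB wired"
# echo "$(( $(sysctl -n vm.stats.vm.v_free_count) * pgsz / 1048576 )) MiB free"

If those agree with top then the oddity is in how the pages ended up
Active, not in top's accounting.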

An alternate hypothesis would be that the memory "leak"
is from mkimg not having its memory use cleaned up. The
mkimg step happens after vm_copy_base but before the cp/xz
sequence and is what produces vm.raw.
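
A simple way to separate the two hypotheses would be to sample the
same Active counter around each stage (a sketch only; where exactly
to hook this into the release scripts is left open):

# sysctl -n vm.stats.vm.v_active_count    # before vm_copy_base
  (vm_copy_base runs)
# sysctl -n vm.stats.vm.v_active_count    # after vm_copy_base, before mkimg
  (mkimg produces vm.raw)
# sysctl -n vm.stats.vm.v_active_count    # after mkimg, before the cp/xz

Whichever stage leaves the count far above its starting point is the
one holding on to the pages.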

For reference of what worked just fine after the
post-panic reboot, using the already existing
vm.raw (sparse) file as a starting place:

# cp -p vm.raw /usr/obj/DESTDIRs/vmimage-aarch64/vmimages/FreeBSD-12.0-CURRENT-arm64-aarch64.raw
# xz -T 0 /usr/obj/DESTDIRs/vmimage-aarch64/vmimages/FreeBSD-12.0-CURRENT-arm64-aarch64.raw

# ls -lTt *raw*
-rw-r--r--  1 root  wheel  34360566272 Apr 27 18:40:24 2017 vm.raw
-rw-r--r--  1 root  wheel  34359746560 Apr 27 18:37:29 2017 vm.raw.nested_bsd
-rw-r--r--  1 root  wheel  27917287424 Apr 27 18:34:45 2017 raw.img

# du -sm *raw*
1762	raw.img
1583	vm.raw
1583	vm.raw.nested_bsd

(Before the .xz replaces the .raw:)

# ls -lT /usr/obj/DESTDIRs/vmimage-aarch64/vmimages/
total 33820032
-rw-r--r--  1 root  wheel  34360566272 Apr 27 18:40:24 2017 FreeBSD-12.0-CURRENT-arm64-aarch64.raw

# du -sm /usr/obj/DESTDIRs/vmimage-aarch64/vmimages/*
32777	/usr/obj/DESTDIRs/vmimage-aarch64/vmimages/FreeBSD-12.0-CURRENT-arm64-aarch64.raw

(After xz:)

# ls -lT /usr/obj/DESTDIRs/vmimage-aarch64/vmimages/
total 258208
-rw-r--r--  1 root  wheel  264275808 Apr 27 18:40:24 2017 FreeBSD-12.0-CURRENT-arm64-aarch64.raw.xz

# du -sm /usr/obj/DESTDIRs/vmimage-aarch64/vmimages/*
253	/usr/obj/DESTDIRs/vmimage-aarch64/vmimages/FreeBSD-12.0-CURRENT-arm64-aarch64.raw.xz

(Mem:Active returned to 10M when xz finished.)


Prior reports from capturing text:

On 2017-Apr-27, at 7:31 PM, Mark Millard <markmi at dsl-only.net> wrote:

> [Another example panic. Again no dump. But I have what
> a top -PCwaopid froze at this time.]
>
> On 2017-Apr-27, at 4:22 PM, Mark Millard <markmi at dsl-only.net> wrote:
>
>> Unfortunately for this FYI the attempt to produce a dump
>> failed and so all the information that I have is what I
>> first captured from the console output: a backtrace.
>>
>> The context was head -r317015 on a Pine64+ 2GB. At the
>> time I was experimenting with trying to build a vm.raw
>> from my own build of FreeBSD. The (root) file system
>> is on a USB SSD off of a powered USB hub.
>>
>> panic: ARM64TODO: reclaim_pv_chunk
>> cpuid = 1
>> time = 1493332968
>> KDB: stack backtrace:
>> db_trace_self() at db_trace_self_wrapper+0x28
>>        pc = 0xffff000000605cc0  lr = 0xffff0000000869cc
>>        sp = 0xffff000083ba4f00  fp = 0xffff000083ba5110
>>
>> db_trace_self_wrapper() at vpanic+0x164
>>        pc = 0xffff0000000869cc  lr = 0xffff00000031d464
>>        sp = 0xffff000083ba5120  fp = 0xffff000083ba5190
>>
>> vpanic() at panic+0x4c
>>        pc = 0xffff00000031d464  lr = 0xffff00000031d2fc
>>        sp = 0xffff000083ba51a0  fp = 0xffff000083ba5220
>>
>> panic() at reclaim_pv_chunk+0x10
>>        pc = 0xffff00000031d2fc  lr = 0xffff00000061a234
>>        sp = 0xffff000083ba5230  fp = 0xffff000083ba5230
>>
>> reclaim_pv_chunk() at get_pv_entry+0x240
>>        pc = 0xffff00000061a234  lr = 0xffff000000616184
>>        sp = 0xffff000083ba5240  fp = 0xffff000083ba5260
>>
>> get_pv_entry() at pmap_enter+0x694
>>        pc = 0xffff000000616184  lr = 0xffff0000006156a0
>>        sp = 0xffff000083ba5270  fp = 0xffff000083ba5300
>>
>> pmap_enter() at vm_fault_hold+0x28c
>>        pc = 0xffff0000006156a0  lr = 0xffff0000005b9740
>>        sp = 0xffff000083ba5310  fp = 0xffff000083ba5460
>>
>> vm_fault_hold() at vm_fault+0x70
>>        pc = 0xffff0000005b9740  lr = 0xffff0000005b9464
>>        sp = 0xffff000083ba5470  fp = 0xffff000083ba54a0
>>
>> vm_fault() at data_abort+0xe0
>>        pc = 0xffff0000005b9464  lr = 0xffff00000061ad94
>>        sp = 0xffff000083ba54b0  fp = 0xffff000083ba5560
>>
>> data_abort() at handle_el1h_sync+0x70
>>        pc = 0xffff00000061ad94  lr = 0xffff000000607870
>>        sp = 0xffff000083ba5570  fp = 0xffff000083ba5680
>>
>> handle_el1h_sync() at kern_select+0x9fc
>>        pc = 0xffff000000607870  lr = 0xffff00000037db3c
>>        sp = 0xffff000083ba5690  fp = 0xffff000083ba58f0
>>
>> kern_select() at sys_select+0x5c
>>        pc = 0xffff00000037db3c  lr = 0xffff00000037dc58
>>        sp = 0xffff000083ba5900  fp = 0xffff000083ba5930
>>
>> sys_select() at do_el0_sync+0xa48
>>        pc = 0xffff00000037dc58  lr = 0xffff00000061b91c
>>        sp = 0xffff000083ba5940  fp = 0xffff000083ba5a70
>>
>> do_el0_sync() at handle_el0_sync+0x6c
>>        pc = 0xffff00000061b91c  lr = 0xffff0000006079e8
>>        sp = 0xffff000083ba5a80  fp = 0xffff000083ba5b90
>>
>> handle_el0_sync() at 0x4948c
>>        pc = 0xffff0000006079e8  lr = 0x000000000004948c
>>        sp = 0xffff000083ba5ba0  fp = 0x0000ffffffffd960
>
>
> This time I got to record from top:
> (swap is on a swap partition)
> (pid 48988's SIZE vs. RES and SWAP might be interesting)
> (as might the Active figure)
>
> last pid: 48988;  load averages:  0.64,  0.44,  0.38      up 0+04:21:01  19:19:50
> 32 processes:  2 running, 30 sleeping
> CPU 0: 13.2% user,  0.0% nice, 23.2% system,  0.3% interrupt, 63.3% idle
> CPU 1:  4.6% user,  0.0% nice, 23.9% system,  0.0% interrupt, 71.5% idle
> CPU 2:  2.1% user,  0.0% nice, 23.2% system,  0.0% interrupt, 74.8% idle
> CPU 3:  3.3% user,  0.0% nice, 23.8% system,  0.0% interrupt, 72.8% idle
> Mem: 1618M Active, 17M Inact, 315M Wired, 204M Buf, 15M Free
> Swap: 6144M Total, 34M Used, 6110M Free, 348K Out
>
>  PID USERNAME    THR PRI NICE   SIZE    RES   SWAP STATE   C   TIME     CPU COMMAND
> 48988 root          4  31    0   651M 27048K     0K RUN     0   0:03  87.60% xz -T 0 /usr/obj/DESTDIRs/vmimage-aarch64/vmimages/FreeBSD-12.0-CURRENT-arm64-aarch64.raw
> 11983 root          1  22    0  5068K     0K     0K wait    3   0:00   0.00% make vm-image vm-install DESTDIR=/usr/obj/DESTDIRs/vmimage-aarch64 (<make>)
> 11981 root          1  42    0  7320K     0K  1516K wait    1   0:00   0.00% sh /root/sys_build_scripts.aarch64-host/make_noscript_aarch64_nodebug_clang_bootstrap-aarch64-host.sh vm-image vm-install
> 11980 root          1  20    0  6656K  1548K     0K select  0   0:02   0.00% [script]
> 11977 root          1  30    0  7320K     0K  1516K wait    3   0:00   0.00% /bin/sh /root/sys_build_scripts.aarch64-host/make_aarch64_nodebug_clang_bootstrap-aarch64-host.sh vm-image vm-install DEST
> 2694 root          1  20    0  8804K  2072K     0K CPU2    2   0:07   0.17% top -PCwaopid
>  827 root          1  20    0  7320K     0K   360K wait    0   0:00   0.00% su (<sh>)
>  826 markmi        1  22    0 10372K     0K  1532K wait    3   0:00   0.00% su (<su>)
>  820 markmi        1  24    0  7320K     0K  1516K wait    1   0:00   0.00% -sh (<sh>)
>  819 markmi        1  20    0 18416K  1152K     0K select  1   0:21   0.00% sshd: markmi@pts/1 (sshd)
>  816 root          1  20    0 18416K  3276K     0K select  0   0:00   0.00% sshd: markmi [priv] (sshd)
>  765 root          1  20    0  7320K     0K   224K wait    2   0:00   0.00% su (<sh>)
>  764 markmi        1  23    0 10372K     0K  1532K wait    0   0:00   0.00% su (<su>)
>  758 markmi        1  31    0  7320K     0K  1516K wait    1   0:00   0.00% -sh (<sh>)
>  757 markmi        1  20    0 18416K   228K   904K select  3   0:01   0.01% sshd: markmi@pts/0 (sshd)
>  754 root          1  25    0 18416K  3276K     0K select  1   0:00   0.00% sshd: markmi [priv] (sshd)
>  746 root          1  27    0  7320K  1532K     0K ttyin   0   0:00   0.00% -sh (sh)
>  745 root          1  20    0 10372K     0K  1532K wait    1   0:00   0.00% login [pam] (<login>)
>  700 root          1  20    0  6948K     0K   168K nanslp  1   0:00   0.00% /usr/sbin/cron -s (<cron>)
>  696 smmsp         1  20    0 10460K     0K   184K pause   0   0:00   0.00% sendmail: Queue runner@00:30:00 for /var/spool/clientmqueue (<sendmail>)
>  693 root          1  20    0 10460K  1392K     0K select  1   0:00   0.03% sendmail: accepting connections (sendmail)
>  690 root          1  20    0 15800K   968K     0K select  2   0:00   0.00% /usr/sbin/sshd
>  661 root          1  20    0  6656K   344K     0K select  2   0:01   0.00% /usr/sbin/powerd
>  658 root          2  20    0 12788K 12672K     0K select  0   0:02   0.01% /usr/sbin/ntpd -g -c /etc/ntp.conf -p /var/run/ntpd.pid -f /var/db/ntpd.drift
>  620 root         32  52    0  6384K  1100K     0K rpcsvc  1   0:00   0.00% nfsd: server (nfsd)
>  619 root          1  52    0  6384K   704K     0K select  1   0:00   0.00% nfsd: master (nfsd)
>  617 root          1  20    0  6684K   688K     0K select  1   0:00   0.00% /usr/sbin/mountd -r
>  478 root          1  20    0  6676K   596K     0K select  3   0:00   0.00% /usr/sbin/rpcbind
>  469 root          1  20    0  6680K   572K     0K select  2   0:00   0.00% /usr/sbin/syslogd -s
>  396 root          1  20    0  9580K    32K     0K select  0   0:00   0.00% /sbin/devd
>  308 _dhcp         1  20    0  6800K   532K     0K select  2   0:00   0.00% dhclient: awg0 (dhclient)
>  307 root          1  52    0  6800K   424K     0K select  2   0:00   0.00% dhclient: awg0 [priv] (dhclient)
>
> And here is the backtrace:
>
> timeout stopping cpus
> panic: ARM64TODO: reclaim_pv_chunk
> cpuid = 0
> time = 1493345992
> KDB: stack backtrace:
> db_trace_self() at db_trace_self_wrapper+0x28
>         pc = 0xffff000000605cc0  lr = 0xffff0000000869cc
>         sp = 0xffff000083d301d0  fp = 0xffff000083d303e0
>
> db_trace_self_wrapper() at vpanic+0x164
>         pc = 0xffff0000000869cc  lr = 0xffff00000031d464
>         sp = 0xffff000083d303f0  fp = 0xffff000083d30460
>
> vpanic() at panic+0x4c
>         pc = 0xffff00000031d464  lr = 0xffff00000031d2fc
>         sp = 0xffff000083d30470  fp = 0xffff000083d304f0
>
> panic() at reclaim_pv_chunk+0x10
>         pc = 0xffff00000031d2fc  lr = 0xffff00000061a234
>         sp = 0xffff000083d30500  fp = 0xffff000083d30500
>
> reclaim_pv_chunk() at get_pv_entry+0x240
>         pc = 0xffff00000061a234  lr = 0xffff000000616184
>         sp = 0xffff000083d30510  fp = 0xffff000083d30530
>
> get_pv_entry() at pmap_enter+0x694
>         pc = 0xffff000000616184  lr = 0xffff0000006156a0
>         sp = 0xffff000083d30540  fp = 0xffff000083d305d0
>
> pmap_enter() at vm_fault_hold+0x28c
>         pc = 0xffff0000006156a0  lr = 0xffff0000005b9740
>         sp = 0xffff000083d305e0  fp = 0xffff000083d30730
>
> vm_fault_hold() at vm_fault+0x70
>         pc = 0xffff0000005b9740  lr = 0xffff0000005b9464
>         sp = 0xffff000083d30740  fp = 0xffff000083d30770
>
> vm_fault() at data_abort+0xe0
>         pc = 0xffff0000005b9464  lr = 0xffff00000061ad94
>         sp = 0xffff000083d30780  fp = 0xffff000083d30830
>
> data_abort() at handle_el0_sync+0x6c
>         pc = 0xffff00000061ad94  lr = 0xffff0000006079e8
>         sp = 0xffff000083d30840  fp = 0xffff000083d30950
>
> handle_el0_sync() at 0x400a3de4
>         pc = 0xffff0000006079e8  lr = 0x00000000400a3de4
>         sp = 0xffff000083d30960  fp = 0x0000ffffbfdfcd30
>
> KDB: enter: panic
> [ thread pid 48988 tid 100230 ]
> Stopped at      kdb_enter+0x44: undefined       d4200000
>=20

===
Mark Millard
markmi at dsl-only.net



