From owner-freebsd-hackers@freebsd.org Sun Sep 10 10:45:13 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C772BE0953C for ; Sun, 10 Sep 2017 10:45:13 +0000 (UTC) (envelope-from j.deboynepollard-newsgroups@ntlworld.com) Received: from know-smtprelay-omc-4.server.virginmedia.net (know-smtprelay-omc-4.server.virginmedia.net [80.0.253.68]) (using TLSv1 with cipher DHE-RSA-AES128-SHA (128/128 bits)) (Client CN "Bizanga Labs SMTP Client Certificate", Issuer "Bizanga Labs CA" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 110CE6E45D for ; Sun, 10 Sep 2017 10:45:12 +0000 (UTC) (envelope-from j.deboynepollard-newsgroups@ntlworld.com) Received: from [192.168.1.5] ([86.10.211.13]) by know-smtprelay-4-imp with bizsmtp id 7mjv1w0080HtmFq01mjv2T; Sun, 10 Sep 2017 11:43:55 +0100 X-Originating-IP: [86.10.211.13] X-Authenticated-User: J.deBoynePollard-newsgroups@NTLWorld.COM X-Spam: 0 X-Authority: v=2.1 cv=E/ww3vpl c=1 sm=1 tr=0 a=SB7hr1IvJSWWr45F2gQiKw==:117 a=SB7hr1IvJSWWr45F2gQiKw==:17 a=L9H7d07YOLsA:10 a=9cW_t1CCXrUA:10 a=s5jvgZ67dGcA:10 a=x7bEGLp0ZPQA:10 a=r77TgQKjGQsHNAKrUKIA:9 a=2rVjqWD_AAAA:8 a=6I5d2MoRAAAA:8 a=itly7gIdAAAA:8 a=Lvekd-s6X64mJFDZ9nUA:9 a=QEXdDO2ut3YA:10 a=ZUGwP7LCt9cA:10 a=7YrUDqsB9R4A:10 a=FSu5OgGmP5kA:10 a=-FEs8UIgK8oA:10 a=NWVoK91CQyQA:10 a=n6cf16u8BA8RtRk39fsA:9 a=f08GDC4oldWNsAXz:21 a=_W_S_7VecoQA:10 a=ULaUcM2Ibn9MdPUUwucP:22 a=IjZwj45LgO3ly-622nXo:22 a=1RpNR2E4bTkVPcsa2RFZ:22 Subject: nosh version 1.35 To: Debian users , FreeBSD Hackers , Supervision References: <54430B41.3010301@NTLWorld.com> <76c00c13-4cc9-ed9c-f48f-81a3f050b80b@NTLWorld.com> <0d6afc48-3465-3509-ff46-494da45022bc@NTLWorld.com> <731531599.156033.1491767527334.JavaMail.open-xchange@oxbe4.tb.ukmail.iss.as9143.net> <592685009.2293134.1499287287329.JavaMail.open-xchange@oxbe2.tb.ukmail.iss.as9143.net> From: Jonathan de 
Boyne Pollard Message-ID: <43de321f-e66c-5353-09db-58b9921354b4@NTLWorld.COM> Date: Sun, 10 Sep 2017 11:43:49 +0100 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:45.0) Gecko/20100101 Thunderbird/45.1.1 MIME-Version: 1.0 In-Reply-To: <592685009.2293134.1499287287329.JavaMail.open-xchange@oxbe2.tb.ukmail.iss.as9143.net> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-Content-Filtered-By: Mailman/MimeDel 2.1.23 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 10 Sep 2017 10:45:13 -0000

The nosh package is now up to version 1.35.

* http://jdebp.eu./Softwares/nosh/
* https://www.freebsd.org/news/status/report-2015-07-2015-09.html#The-nosh-Project
* http://jdebp.info./Softwares/nosh/

Networking

As I mentioned a week or so ago, the external configuration import subsystem now converts a Debian-style /etc/network/interfaces configuration file, via rc.conf settings, into the native networking subsystem.

There is also a whole new /Networking/ chapter in the /nosh Guide/, which explains this and several other things, including how Plug and Play integration interoperates with the networking services and what the native networking subsystem encompasses, to the level of what service does what and to what purpose.

Work on the Plug and Play integration is on-going, and I hope to have yet more for this, and indeed for other parts of the networking subsystem, in version 1.36.

Packages

There are some Debian packages that declare that they need the logrotate package, even though they do not when run under nosh service management. For their benefit there is now a nosh-logrotate-shims Debian package that is simply a dummy package that satisfies this need without setting up a spurious and unnecessary logrotate system.
Service bundles

There are a few more service bundles, including ones for sysstat and elasticsearch. The existing service bundles for things such as unbound, clamav, and freshclam have been augmented and fixed in response to user feedback. A bug that incorrectly resulted in the ldconfig service being disabled has also been fixed.

The dbus services, both the system-wide one and the per-user one(s), have been renamed to dbus-daemon, because a dbus-broker service bundle now exists. That bundle is a placeholder in case the dbus-broker people ever fix dbus-broker so that it works; it does not provide a working system right now. It is currently not possible to substitute dbus-broker for dbus-daemon on non-systemd systems, because dbus-broker is very tightly tied in to systemd's idiosyncratic D-Bus control interface. It /only/ speaks the systemd-specific protocol, and knows no other way of stopping and starting services, not even the service command. (In contrast, dbus-daemon can still be configured to demand-start services using simple service management commands.)
From owner-freebsd-hackers@freebsd.org Sun Sep 10 22:25:29 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 6AD78E06A13 for ; Sun, 10 Sep 2017 22:25:29 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: from asp.reflexion.net (outbound-mail-210-10.reflexion.net [208.70.210.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 29DA3636A4 for ; Sun, 10 Sep 2017 22:25:28 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: (qmail 10676 invoked from network); 10 Sep 2017 22:30:52 -0000 Received: from unknown (HELO mail-cs-02.app.dca.reflexion.local) (10.81.19.2) by 0 (rfx-qmail) with SMTP; 10 Sep 2017 22:30:52 -0000 Received: by mail-cs-02.app.dca.reflexion.local (Reflexion email security v8.40.2) with SMTP; Sun, 10 Sep 2017 18:25:27 -0400 (EDT) Received: (qmail 2931 invoked from network); 10 Sep 2017 22:25:27 -0000 Received: from unknown (HELO iron2.pdx.net) (69.64.224.71) by 0 (rfx-qmail) with (AES256-SHA encrypted) SMTP; 10 Sep 2017 22:25:27 -0000 Received: from [192.168.1.109] (c-67-170-167-181.hsd1.or.comcast.net [67.170.167.181]) by iron2.pdx.net (Postfix) with ESMTPSA id 77CE8EC8AAE; Sun, 10 Sep 2017 15:25:26 -0700 (PDT) From: Mark Millard Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\)) Subject: head -r323246 Pine64+ 2GB boot time context: acquiring blockable sleep lock with spinlock or critical section held for data_abort calling pmap_fault calling __mtx_lock_flags Date: Sun, 10 Sep 2017 15:25:25 -0700 References: <8419C238-702D-4BF7-89DB-EC649CD405A5@dsl-only.net> To: FreeBSD Toolchain , freebsd-arm , freebsd-hackers In-Reply-To: <8419C238-702D-4BF7-89DB-EC649CD405A5@dsl-only.net> Message-Id: 
<9DB26517-E4E0-4B2A-9855-9F7381AD4C66@dsl-only.net> X-Mailer: Apple Mail (2.3273) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 10 Sep 2017 22:25:29 -0000

[I got a boot-time panic with a debug kernel that reported an "acquiring blockable sleep lock with spinlock or critical section held (sleep mutex)". This was for data_abort calling pmap_fault calling __mtx_lock_flags. I first include prior non-debug kernel reports in case they are related.]

On 2017-Sep-10, at 1:34 AM, Mark Millard wrote:

> . . .
>
> Booting with the non-debug kernel appears to hang for
> a bit and then gets to a db> prompt and a bt showed
> (for example):
> (The console output for the register dump seems
> to always be incomplete and there is a wait to
> end up at the db> prompt. Note the data_abort
> closest to the fork_exit.)
>
> . . .
> Release APs > APs not started > CPU 0: ARM Cortex-A53 r0p4 affinity: 0 > Instruction Set Attributes 0 =3D > Instruction Set Attributes 1 =3D <0> > Processor Features 0 =3D > Processor Features 1 =3D <0> > Memory Model Features 0 =3D <4k Granule,64k = Granule,MixedEndian,S/NS Mem,16bit ASID,1TB PA> > Memory Model Features 1 =3D <> > Debug Features 0 =3D <2 CTX Breakpoints,4 Watchpoints,6 = Breakpoints,PMUv3,Debug v8> > Debug Features 1 =3D <0> > Auxiliary Features 0 =3D <0> > Auxiliary Features 1 =3D <0> > CPU 1: (null) (null) r0p0 affinity: 0 > CPU 2: (null) (null) r0p0 affinity: 0 > CPU 3: (null) (null) r0p0 affinity: 0 > x0: ffff000000a1c000 > x1: fffffd000103a[ thread pid 0 tid 100057 ] > Stopped at thread_lock_flags_+0x298: ldr w4, [x3, #156] > db> bt > Tracing pid 0 tid 100057 td 0xfffffd000103a000 > db_trace_self() at db_stack_trace+0xec > pc =3D 0xffff000000613688 lr =3D 0xffff000000084db4 > sp =3D 0xffff0000698f4260 fp =3D 0xffff0000698f4290 >=20 > db_stack_trace() at db_command+0x224 > pc =3D 0xffff000000084db4 lr =3D 0xffff000000084a3c > sp =3D 0xffff0000698f42a0 fp =3D 0xffff0000698f4380 >=20 > db_command() at db_command_loop+0x60 > pc =3D 0xffff000000084a3c lr =3D 0xffff0000000847fc > sp =3D 0xffff0000698f4390 fp =3D 0xffff0000698f43b0 >=20 > db_command_loop() at db_trap+0xf4 > pc =3D 0xffff0000000847fc lr =3D 0xffff000000087964 > sp =3D 0xffff0000698f43c0 fp =3D 0xffff0000698f45e0 >=20 > db_trap() at kdb_trap+0x180 > pc =3D 0xffff000000087964 lr =3D 0xffff0000003693e0 > sp =3D 0xffff0000698f45f0 fp =3D 0xffff0000698f4650 >=20 > kdb_trap() at do_el1h_sync+0x90 > pc =3D 0xffff0000003693e0 lr =3D 0xffff00000062956c > sp =3D 0xffff0000698f4660 fp =3D 0xffff0000698f4690 >=20 > do_el1h_sync() at handle_el1h_sync+0x74 > pc =3D 0xffff00000062956c lr =3D 0xffff000000615074 > sp =3D 0xffff0000698f46a0 fp =3D 0xffff0000698f47b0 >=20 > handle_el1h_sync() at kdb_enter+0x38 > pc =3D 0xffff000000615074 lr =3D 0xffff000000368ac8 > sp =3D 0xffff0000698f47c0 fp =3D 
0xffff0000698f4850 >=20 > kdb_enter() at vpanic+0x180 > pc =3D 0xffff000000368ac8 lr =3D 0xffff000000326dd8 > sp =3D 0xffff0000698f4860 fp =3D 0xffff0000698f48d0 >=20 > vpanic() at panic+0x48 > pc =3D 0xffff000000326dd8 lr =3D 0xffff000000326c54 > sp =3D 0xffff0000698f48e0 fp =3D 0xffff0000698f4960 >=20 > panic() at data_abort+0x21c > pc =3D 0xffff000000326c54 lr =3D 0xffff0000006298e8 > sp =3D 0xffff0000698f4970 fp =3D 0xffff0000698f4a20 >=20 > data_abort() at do_el1h_sync+0xfc > pc =3D 0xffff0000006298e8 lr =3D 0xffff0000006295d8 > sp =3D 0xffff0000698f4a30 fp =3D 0xffff0000698f4a60 >=20 > do_el1h_sync() at handle_el1h_sync+0x74 > pc =3D 0xffff0000006295d8 lr =3D 0xffff000000615074 > sp =3D 0xffff0000698f4a70 fp =3D 0xffff0000698f4b80 >=20 > handle_el1h_sync() at thread_lock_flags_+0x1a8 > pc =3D 0xffff000000615074 lr =3D 0xffff000000309060 > sp =3D 0xffff0000698f4b90 fp =3D 0xffff0000698f4c80 >=20 > thread_lock_flags_() at statclock_cnt+0x11c > pc =3D 0xffff000000309060 lr =3D 0xffff0000002c5b90 > sp =3D 0xffff0000698f4c90 fp =3D 0xffff0000698f4cb0 >=20 > statclock_cnt() at handleevents+0x108 > pc =3D 0xffff0000002c5b90 lr =3D 0xffff00000064ad84 > sp =3D 0xffff0000698f4cc0 fp =3D 0xffff0000698f4d00 >=20 > handleevents() at timercb+0xe0 > pc =3D 0xffff00000064ad84 lr =3D 0xffff00000064b51c > sp =3D 0xffff0000698f4d10 fp =3D 0xffff0000698f4d80 >=20 > timercb() at arm_tmr_intr+0x58 > pc =3D 0xffff00000064b51c lr =3D 0xffff000000600e5c > sp =3D 0xffff0000698f4d90 fp =3D 0xffff0000698f4d90 >=20 > arm_tmr_intr() at intr_event_handle+0x64 > pc =3D 0xffff000000600e5c lr =3D 0xffff0000002edd50 > sp =3D 0xffff0000698f4da0 fp =3D 0xffff0000698f4dd0 >=20 > intr_event_handle() at intr_isrc_dispatch+0x30 > pc =3D 0xffff0000002edd50 lr =3D 0xffff00000064d8ec > sp =3D 0xffff0000698f4de0 fp =3D 0xffff0000698f4df0 >=20 > intr_isrc_dispatch() at arm_gic_intr+0xf0 > pc =3D 0xffff00000064d8ec lr =3D 0xffff000000601848 > sp =3D 0xffff0000698f4e00 fp =3D 0xffff0000698f4e50 >=20 > 
arm_gic_intr() at intr_irq_handler+0x60 > pc =3D 0xffff000000601848 lr =3D 0xffff00000064d6e0 > sp =3D 0xffff0000698f4e60 fp =3D 0xffff0000698f4e80 >=20 > intr_irq_handler() at handle_el1h_irq+0x70 > pc =3D 0xffff00000064d6e0 lr =3D 0xffff000000615130 > sp =3D 0xffff0000698f4e90 fp =3D 0xffff0000698f4fa0 >=20 > handle_el1h_irq() at ns8250_putc+0x2c > pc =3D 0xffff000000615130 lr =3D 0xffff00000019a570 > sp =3D 0xffff0000698f4fb0 fp =3D 0xffff0000698f5050 >=20 > ns8250_putc() at ns8250_putc+0x2c > pc =3D 0xffff00000019a570 lr =3D 0xffff00000019a570 > sp =3D 0xffff0000698f5060 fp =3D 0xffff0000698f5080 >=20 > ns8250_putc() at uart_cnputc+0x94 > pc =3D 0xffff00000019a570 lr =3D 0xffff0000001a0860 > sp =3D 0xffff0000698f5090 fp =3D 0xffff0000698f50c0 >=20 > uart_cnputc() at cnputc+0x90 > pc =3D 0xffff0000001a0860 lr =3D 0xffff0000002cb3a8 > sp =3D 0xffff0000698f50d0 fp =3D 0xffff0000698f5120 >=20 > cnputc() at cnputs+0xb4 > pc =3D 0xffff0000002cb3a8 lr =3D 0xffff0000002cb7c8 > sp =3D 0xffff0000698f5130 fp =3D 0xffff0000698f5150 >=20 > cnputs() at putchar+0x158 > pc =3D 0xffff0000002cb7c8 lr =3D 0xffff00000036f04c > sp =3D 0xffff0000698f5160 fp =3D 0xffff0000698f51e0 >=20 > putchar() at kvprintf+0xda8 > pc =3D 0xffff00000036f04c lr =3D 0xffff00000036ec70 > sp =3D 0xffff0000698f51f0 fp =3D 0xffff0000698f5300 >=20 > kvprintf() at vprintf+0x7c > pc =3D 0xffff00000036ec70 lr =3D 0xffff00000036f838 > sp =3D 0xffff0000698f5310 fp =3D 0xffff0000698f5420 >=20 > vprintf() at printf+0x48 > pc =3D 0xffff00000036f838 lr =3D 0xffff00000036f7ac > sp =3D 0xffff0000698f5430 fp =3D 0xffff0000698f54b0 >=20 > printf() at print_registers+0x4c > pc =3D 0xffff00000036f7ac lr =3D 0xffff00000062966c > sp =3D 0xffff0000698f54c0 fp =3D 0xffff0000698f54f0 >=20 > print_registers() at data_abort+0x1f0 > pc =3D 0xffff00000062966c lr =3D 0xffff0000006298bc > sp =3D 0xffff0000698f5500 fp =3D 0xffff0000698f55b0 >=20 > data_abort() at do_el1h_sync+0xfc > pc =3D 0xffff0000006298bc lr =3D 
0xffff0000006295d8 > sp =3D 0xffff0000698f55c0 fp =3D 0xffff0000698f55f0 >=20 > do_el1h_sync() at handle_el1h_sync+0x74 > pc =3D 0xffff0000006295d8 lr =3D 0xffff000000615074 > sp =3D 0xffff0000698f5600 fp =3D 0xffff0000698f5710 >=20 > handle_el1h_sync() at sched_switch+0x54c > pc =3D 0xffff000000615074 lr =3D 0xffff000000351dd4 > sp =3D 0xffff0000698f5720 fp =3D 0xffff0000698f5800 >=20 > sched_switch() at mi_switch+0x118 > pc =3D 0xffff000000351dd4 lr =3D 0xffff000000330c14 > sp =3D 0xffff0000698f5810 fp =3D 0xffff0000698f5830 >=20 > mi_switch() at taskqgroup_binder+0x74 > pc =3D 0xffff000000330c14 lr =3D 0xffff000000367864 > sp =3D 0xffff0000698f5840 fp =3D 0xffff0000698f5860 >=20 > taskqgroup_binder() at gtaskqueue_run_locked+0x160 > pc =3D 0xffff000000367864 lr =3D 0xffff000000367710 > sp =3D 0xffff0000698f5870 fp =3D 0xffff0000698f58e0 >=20 > gtaskqueue_run_locked() at gtaskqueue_thread_loop+0xcc > pc =3D 0xffff000000367710 lr =3D 0xffff0000003672c8 > sp =3D 0xffff0000698f58f0 fp =3D 0xffff0000698f5910 >=20 > gtaskqueue_thread_loop() at fork_exit+0x94 > pc =3D 0xffff0000003672c8 lr =3D 0xffff0000002eab20 > sp =3D 0xffff0000698f5920 fp =3D 0xffff0000698f5950 >=20 > fork_exit() at fork_trampoline+0x10 > pc =3D 0xffff0000002eab20 lr =3D 0xffff00000062934c > sp =3D 0xffff0000698f5960 fp =3D 0x0000000000000000 >=20 >=20 > Booting with a debug kernel worked fine. (This matches up > with past reports about "recent" pine64+ handling.) >=20 > But trying to have the root file system on a USB SSD > drive failed to see the USB drive at all. (This matches > up with past reports about "recent" pine64+ handling.) 
>=20 >=20 > =46rom a separate non-debug kernel boot attempt: > (remember the "thread_lock_flags_+0x298: ldr w4, [x3, #156]" > but also note x8 in addition to x3) >=20 > db> show reg > spsr 0x96000004000003c5 > x0 0xffff00000069b000 $d.2+0x1ac > x1 0x2 > x2 0xffff00000069ba48 $d.5+0x1d > x3 0xdeadc0d8 <<<<<<<<< Note the "0xdeadc0d8" > x4 0x3 > x5 0xffff000000610cf0 generic_bs_barrier > x6 0 > x7 0x40 $d.14 > x8 0xdeadc0de <<<<<<<<< Note the "0xdeadc0de" > x9 0 > x10 0x975c860b > x11 0x975c860b > x12 0x51eb850 > x13 0x4 > x14 0x66 $d.9+0x26 > x15 0xffff0000007004ce hex2ascii_data > x16 0 > x17 0 > x18 0xffff00006990ec10 > x19 0xfffffd000103a000 > x20 0xffff000000bcee70 blocked_lock+0x18 > x21 0xffff00000080e5a8 sdt_lockstat___spin__release > x22 0x3938700 > x23 0xfffffd000103a000 > x24 0xffff000000bcee58 blocked_lock > x25 0x4 > x26 0x98967f > x27 0xffff0000009ef000 next_to_notify > x28 0xffff000000bb9000 proc0+0x4f8 > x29 0xffff00006990ec80 > lr 0xffff000000309064 thread_lock_flags_+0x1ac > elr 0xffff000000309154 thread_lock_flags_+0x29c > sp 0xffff00006990ec10 > thread_lock_flags_+0x298: ldr w4, [x3, #156] > db> bt > Tracing pid 0 tid 100057 td 0xfffffd000103a000 > db_trace_self() at db_stack_trace+0xec > pc =3D 0xffff000000613688 lr =3D 0xffff000000084db4 > sp =3D 0xffff00006990e260 fp =3D 0xffff00006990e290 >=20 > db_stack_trace() at db_command+0x224 > pc =3D 0xffff000000084db4 lr =3D 0xffff000000084a3c > sp =3D 0xffff00006990e2a0 fp =3D 0xffff00006990e380 >=20 > db_command() at db_command_loop+0x60 > pc =3D 0xffff000000084a3c lr =3D 0xffff0000000847fc > sp =3D 0xffff00006990e390 fp =3D 0xffff00006990e3b0 >=20 > db_command_loop() at db_trap+0xf4 > pc =3D 0xffff0000000847fc lr =3D 0xffff000000087964 > sp =3D 0xffff00006990e3c0 fp =3D 0xffff00006990e5e0 >=20 > db_trap() at kdb_trap+0x180 > pc =3D 0xffff000000087964 lr =3D 0xffff0000003693e0 > sp =3D 0xffff00006990e5f0 fp =3D 0xffff00006990e650 >=20 > kdb_trap() at do_el1h_sync+0x90 > pc =3D 0xffff0000003693e0 lr 
=3D 0xffff00000062956c > sp =3D 0xffff00006990e660 fp =3D 0xffff00006990e690 >=20 > do_el1h_sync() at handle_el1h_sync+0x74 > pc =3D 0xffff00000062956c lr =3D 0xffff000000615074 > sp =3D 0xffff00006990e6a0 fp =3D 0xffff00006990e7b0 >=20 > handle_el1h_sync() at kdb_enter+0x38 > pc =3D 0xffff000000615074 lr =3D 0xffff000000368ac8 > sp =3D 0xffff00006990e7c0 fp =3D 0xffff00006990e850 >=20 > kdb_enter() at vpanic+0x180 > pc =3D 0xffff000000368ac8 lr =3D 0xffff000000326dd8 > sp =3D 0xffff00006990e860 fp =3D 0xffff00006990e8d0 >=20 > vpanic() at panic+0x48 > pc =3D 0xffff000000326dd8 lr =3D 0xffff000000326c54 > sp =3D 0xffff00006990e8e0 fp =3D 0xffff00006990e960 >=20 > panic() at data_abort+0x21c > pc =3D 0xffff000000326c54 lr =3D 0xffff0000006298e8 > sp =3D 0xffff00006990e970 fp =3D 0xffff00006990ea20 >=20 > data_abort() at do_el1h_sync+0xfc > pc =3D 0xffff0000006298e8 lr =3D 0xffff0000006295d8 > sp =3D 0xffff00006990ea30 fp =3D 0xffff00006990ea60 >=20 > do_el1h_sync() at handle_el1h_sync+0x74 > pc =3D 0xffff0000006295d8 lr =3D 0xffff000000615074 > sp =3D 0xffff00006990ea70 fp =3D 0xffff00006990eb80 >=20 > handle_el1h_sync() at thread_lock_flags_+0x1a8 > pc =3D 0xffff000000615074 lr =3D 0xffff000000309060 > sp =3D 0xffff00006990eb90 fp =3D 0xffff00006990ec80 >=20 > thread_lock_flags_() at statclock_cnt+0x11c > pc =3D 0xffff000000309060 lr =3D 0xffff0000002c5b90 > sp =3D 0xffff00006990ec90 fp =3D 0xffff00006990ecb0 >=20 > statclock_cnt() at handleevents+0x108 > pc =3D 0xffff0000002c5b90 lr =3D 0xffff00000064ad84 > sp =3D 0xffff00006990ecc0 fp =3D 0xffff00006990ed00 >=20 > handleevents() at timercb+0xe0 > pc =3D 0xffff00000064ad84 lr =3D 0xffff00000064b51c > sp =3D 0xffff00006990ed10 fp =3D 0xffff00006990ed80 >=20 > timercb() at arm_tmr_intr+0x58 > pc =3D 0xffff00000064b51c lr =3D 0xffff000000600e5c > sp =3D 0xffff00006990ed90 fp =3D 0xffff00006990ed90 >=20 > arm_tmr_intr() at intr_event_handle+0x64 > pc =3D 0xffff000000600e5c lr =3D 0xffff0000002edd50 > sp =3D 
0xffff00006990eda0 fp =3D 0xffff00006990edd0 >=20 > intr_event_handle() at intr_isrc_dispatch+0x30 > pc =3D 0xffff0000002edd50 lr =3D 0xffff00000064d8ec > sp =3D 0xffff00006990ede0 fp =3D 0xffff00006990edf0 >=20 > intr_isrc_dispatch() at arm_gic_intr+0xf0 > pc =3D 0xffff00000064d8ec lr =3D 0xffff000000601848 > sp =3D 0xffff00006990ee00 fp =3D 0xffff00006990ee50 >=20 > arm_gic_intr() at intr_irq_handler+0x60 > pc =3D 0xffff000000601848 lr =3D 0xffff00000064d6e0 > sp =3D 0xffff00006990ee60 fp =3D 0xffff00006990ee80 >=20 > intr_irq_handler() at handle_el1h_irq+0x70 > pc =3D 0xffff00000064d6e0 lr =3D 0xffff000000615130 > sp =3D 0xffff00006990ee90 fp =3D 0xffff00006990efa0 >=20 > handle_el1h_irq() at ns8250_putc+0x2c > pc =3D 0xffff000000615130 lr =3D 0xffff00000019a570 > sp =3D 0xffff00006990efb0 fp =3D 0xffff00006990f050 >=20 > ns8250_putc() at ns8250_putc+0x2c > pc =3D 0xffff00000019a570 lr =3D 0xffff00000019a570 > sp =3D 0xffff00006990f060 fp =3D 0xffff00006990f080 >=20 > ns8250_putc() at uart_cnputc+0x94 > pc =3D 0xffff00000019a570 lr =3D 0xffff0000001a0860 > sp =3D 0xffff00006990f090 fp =3D 0xffff00006990f0c0 >=20 > uart_cnputc() at cnputc+0x90 > pc =3D 0xffff0000001a0860 lr =3D 0xffff0000002cb3a8 > sp =3D 0xffff00006990f0d0 fp =3D 0xffff00006990f120 >=20 > cnputc() at cnputs+0xb4 > pc =3D 0xffff0000002cb3a8 lr =3D 0xffff0000002cb7c8 > sp =3D 0xffff00006990f130 fp =3D 0xffff00006990f150 >=20 > cnputs() at putchar+0x158 > pc =3D 0xffff0000002cb7c8 lr =3D 0xffff00000036f04c > sp =3D 0xffff00006990f160 fp =3D 0xffff00006990f1e0 >=20 > putchar() at kvprintf+0xda8 > pc =3D 0xffff00000036f04c lr =3D 0xffff00000036ec70 > sp =3D 0xffff00006990f1f0 fp =3D 0xffff00006990f300 >=20 > kvprintf() at vprintf+0x7c > pc =3D 0xffff00000036ec70 lr =3D 0xffff00000036f838 > sp =3D 0xffff00006990f310 fp =3D 0xffff00006990f420 >=20 > vprintf() at printf+0x48 > pc =3D 0xffff00000036f838 lr =3D 0xffff00000036f7ac > sp =3D 0xffff00006990f430 fp =3D 0xffff00006990f4b0 >=20 > printf() at 
print_registers+0x4c > pc =3D 0xffff00000036f7ac lr =3D 0xffff00000062966c > sp =3D 0xffff00006990f4c0 fp =3D 0xffff00006990f4f0 >=20 > print_registers() at data_abort+0x1f0 > pc =3D 0xffff00000062966c lr =3D 0xffff0000006298bc > sp =3D 0xffff00006990f500 fp =3D 0xffff00006990f5b0 >=20 > data_abort() at do_el1h_sync+0xfc > pc =3D 0xffff0000006298bc lr =3D 0xffff0000006295d8 > sp =3D 0xffff00006990f5c0 fp =3D 0xffff00006990f5f0 >=20 > do_el1h_sync() at handle_el1h_sync+0x74 > pc =3D 0xffff0000006295d8 lr =3D 0xffff000000615074 > sp =3D 0xffff00006990f600 fp =3D 0xffff00006990f710 >=20 > handle_el1h_sync() at sched_switch+0x54c > pc =3D 0xffff000000615074 lr =3D 0xffff000000351dd4 > sp =3D 0xffff00006990f720 fp =3D 0xffff00006990f800 >=20 > sched_switch() at mi_switch+0x118 > pc =3D 0xffff000000351dd4 lr =3D 0xffff000000330c14 > sp =3D 0xffff00006990f810 fp =3D 0xffff00006990f830 >=20 > mi_switch() at taskqgroup_binder+0x74 > pc =3D 0xffff000000330c14 lr =3D 0xffff000000367864 > sp =3D 0xffff00006990f840 fp =3D 0xffff00006990f860 >=20 > taskqgroup_binder() at gtaskqueue_run_locked+0x160 > pc =3D 0xffff000000367864 lr =3D 0xffff000000367710 > sp =3D 0xffff00006990f870 fp =3D 0xffff00006990f8e0 >=20 > gtaskqueue_run_locked() at gtaskqueue_thread_loop+0xcc > pc =3D 0xffff000000367710 lr =3D 0xffff0000003672c8 > sp =3D 0xffff00006990f8f0 fp =3D 0xffff00006990f910 >=20 > gtaskqueue_thread_loop() at fork_exit+0x94 > pc =3D 0xffff0000003672c8 lr =3D 0xffff0000002eab20 > sp =3D 0xffff00006990f920 fp =3D 0xffff00006990f950 >=20 > fork_exit() at fork_trampoline+0x10 > pc =3D 0xffff0000002eab20 lr =3D 0xffff00000062934c > sp =3D 0xffff00006990f960 fp =3D 0x0000000000000000 [Another issue was modern boot1.efi (as bootaa64.efi) not working and so I'm using an old one (2016-Nov-7) that I found that allows getting this far.] A boot attempt with a older boot1.efi that works and a debug kernel got: . . . 
Release APs
APs not started
CPU  0: ARM Cortex-A53 r0p4 affinity:  0
 Instruction Set Attributes 0 =
 Instruction Set Attributes 1 = <0>
         Processor Features 0 =
         Processor Features 1 = <0>
      Memory Model Features 0 = <4k Granule,64k Granule,MixedEndian,S/NS Mem,16bit ASID,1TB PA>
      Memory Model Features 1 = <>
             Debug Features 0 = <2 CTX Breakpoints,4 Watchpoints,6 Breakpoints,PMUv3,Debug v8>
             Debug Features 1 = <0>
         Auxiliary Features 0 = <0>
         Auxiliary Features 1 = <0>
CPU  1: (null) (null) r0p0 affinity:  0
CPU  2: (null) (null) r0p0 affinity:  0
CPU  3: (null) (null) r0p0 affinity:  0
panic: acquiring blockable sleep lock with spinlock or critical section held (sleep mutex) pmap @ /usr/src/sys/arm64/arm64/pmap.c:4710
cpuid = 0
time = 13
KDB: stack backtrace:
db_trace_self() at db_trace_self_wrapper+0x28
         pc = 0xffff0000005efc78  lr = 0xffff000000088094
         sp = 0xffff000069850080  fp = 0xffff000069850290

db_trace_self_wrapper() at vpanic+0x164
         pc = 0xffff000000088094  lr = 0xffff00000031764c
         sp = 0xffff0000698502a0  fp = 0xffff000069850310

vpanic() at kassert_panic+0x15c
         pc = 0xffff00000031764c  lr = 0xffff0000003174e4
         sp = 0xffff000069850320  fp = 0xffff0000698503e0

kassert_panic() at witness_checkorder+0x160
         pc = 0xffff0000003174e4  lr = 0xffff000000374990
         sp = 0xffff0000698503f0  fp = 0xffff000069850470

witness_checkorder() at __mtx_lock_flags+0xa8
         pc = 0xffff000000374990  lr = 0xffff0000002f8b7c
         sp = 0xffff000069850480  fp = 0xffff0000698504b0

__mtx_lock_flags() at pmap_fault+0x40
         pc = 0xffff0000002f8b7c  lr = 0xffff000000606994
         sp = 0xffff0000698504c0  fp = 0xffff0000698504e0

pmap_fault() at data_abort+0xb8
         pc = 0xffff000000606994  lr = 0xffff000000608a9c
         sp = 0xffff0000698504f0  fp = 0xffff0000698505a0

data_abort() at do_el1h_sync+0xfc
         pc = 0xffff000000608a9c  lr = 0xffff0000006088f0
         sp = 0xffff0000698505b0  fp = 0xffff0000698505e0

do_el1h_sync() at handle_el1h_sync+0x74
         pc = 0xffff0000006088f0  lr = 0xffff0000005f1874
         sp = 0xffff0000698505f0  fp = 0xffff000069850700

handle_el1h_sync() at sched_switch+0x2a8
         pc = 0xffff0000005f1874  lr = 0xffff00000033f0c8
         sp = 0xffff000069850710  fp = 0xffff0000698507f0

sched_switch() at mi_switch+0x1b8
         pc = 0xffff00000033f0c8  lr = 0xffff00000032161c
         sp = 0xffff000069850800  fp = 0xffff000069850820

mi_switch() at taskqgroup_binder+0x7c
         pc = 0xffff00000032161c  lr = 0xffff00000035510c
         sp = 0xffff000069850830  fp = 0xffff000069850860

taskqgroup_binder() at gtaskqueue_run_locked+0x104
         pc = 0xffff00000035510c  lr = 0xffff000000354f74
         sp = 0xffff000069850870  fp = 0xffff0000698508e0

gtaskqueue_run_locked() at gtaskqueue_thread_loop+0x9c
         pc = 0xffff000000354f74  lr = 0xffff000000354d10
         sp = 0xffff0000698508f0  fp = 0xffff000069850910

gtaskqueue_thread_loop() at fork_exit+0x7c
         pc = 0xffff000000354d10  lr = 0xffff0000002dbd3c
         sp = 0xffff000069850920  fp = 0xffff000069850950

fork_exit() at fork_trampoline+0x10
         pc = 0xffff0000002dbd3c  lr = 0xffff000000608664
         sp = 0xffff000069850960  fp = 0x0000000000000000

KDB: enter: panic
[ thread pid 0 tid 100058 ]
Stopped at      sched_switch+0x2b8:     ldrb    w9, [x8, #894]
db>

Unfortunately it was not taking console input so that is all I got.
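The panic string above is the kind of complaint a WITNESS-style lock check makes when a sleep mutex (here the pmap lock) is requested while the thread is inside a critical section or holds a spin lock, which is presumably the state sched_switch() was in when the data abort happened. The following is only a userland toy model of that rule, not kernel code: `toy_thread`, `toy_critical_enter()`, `toy_critical_exit()` and `toy_lock_allowed()` are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock classes for the toy model. */
enum lock_kind { LOCK_SPIN, LOCK_SLEEP };

/* Hypothetical per-thread state: nesting depth of critical sections.
 * In this model, holding a spin lock counts as one level of nesting. */
struct toy_thread {
    int critnest;
};

void toy_critical_enter(struct toy_thread *td)
{
    td->critnest++;
}

void toy_critical_exit(struct toy_thread *td)
{
    assert(td->critnest > 0);
    td->critnest--;
}

/* Returns true if acquiring a lock of the given kind is acceptable.
 * A blockable (sleep) lock must not be taken inside a critical section,
 * because the thread is not allowed to block there. */
bool toy_lock_allowed(const struct toy_thread *td, enum lock_kind kind)
{
    if (kind == LOCK_SLEEP && td->critnest != 0)
        return false;
    return true;
}

/* Walks through the situation in the backtrace: a fault handler wants a
 * sleep mutex while the interrupted code is in a critical section. */
void toy_self_check(void)
{
    struct toy_thread td = { 0 };

    assert(toy_lock_allowed(&td, LOCK_SLEEP));   /* normal context: fine */
    toy_critical_enter(&td);                     /* e.g. during a context switch */
    assert(!toy_lock_allowed(&td, LOCK_SLEEP));  /* the reported violation */
    assert(toy_lock_allowed(&td, LOCK_SPIN));    /* spin locks remain acceptable */
    toy_critical_exit(&td);
    assert(toy_lock_allowed(&td, LOCK_SLEEP));
}
```

Calling `toy_self_check()` from a `main()` exercises the model. It only illustrates why the check fires; it does not explain why sched_switch() took a data abort in the first place.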
=3D=3D=3D Mark Millard markmi at dsl-only.net From owner-freebsd-hackers@freebsd.org Mon Sep 11 04:04:53 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2EFA9E17E85 for ; Mon, 11 Sep 2017 04:04:53 +0000 (UTC) (envelope-from Zhixin.Wan@watchguard.com) Received: from XCS01CO.watchguard.com (mx1.watchguard.com [206.191.171.101]) by mx1.freebsd.org (Postfix) with ESMTP id EA5AE6E71D for ; Mon, 11 Sep 2017 04:04:52 +0000 (UTC) (envelope-from Zhixin.Wan@watchguard.com) From: Zhixin Wan To: "freebsd-hackers@freebsd.org" Subject: OOM-killer can't work on FreeBSD 11.0 Thread-Topic: OOM-killer can't work on FreeBSD 11.0 Thread-Index: AdMqp76RyQybtFOlSuGM22kMXJHWTA== Date: Mon, 11 Sep 2017 03:56:45 +0000 Message-ID: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-ms-publictraffictype: Email x-microsoft-exchange-diagnostics: 1; DM5SPR00MB253; 6:TbSfV3acElMMDzYFXUbNiCvKE+yv8o/7pkq9eZ4/WXEXs1DS5x7pzUhsGQB0Qm3AlRryPF1B/2jJcfthFhDnD2enAQvVaehruEOkDnctDIAKMC3DHK57Ci2XZ5wqYDmlDwSOQ72ZvbulqTT3d2iHS8N+f+FUDXrIFe0YVMk9v7Bv0gGhrbZ3oPF3AakIdBxcxvqDBMfWKbcYrESw45fRsh/iEyGbhxgbFJQSvcNpjaqSHZNj+GUepd31rETlmy0r3xKuyUPV+tKOLwyei6WvUqb6isTkkGAJfOSN/pWxtYbg3JHeHrIEzVFqchV/0Bv0mnUEXF4I13JMbOHEfBAMHA==; 5:lRNub5FY6jVZXAJMBBdt57IgrZF/mBQ/QJrSn6gFbpDE+E0SOmFIIYWMxVXJt+mjOFF+XdB8AlRQH9B7th1K3hhUwuRVQFrNmdG2bkaaPoYVTPzZKJc9BOn2UtBB3zPNVlqbxwezDhN6EtsXvkENug==; 24:hujWcj8c54U3+Vjky1hhzXyVfn5Bi8QvPToSjrfiK8UqWgKfv+8f/4tL/utvQxXgdsAZwopdltSqO+G09it7WV9/yAgwJhWz5KNDugl87+o=; 7:UXgItVsdMngvf2RMqcy0S3rtdT8IeeJaoZcU66/3FIjD2pXEr68wqB3U33tA4w5CPW6TOWqXQPgMzbnRknx9LRqCunHTChawmzqov2qqQXH7EajlKekjRPl/dSMjE03gAUYBUOp5j7d4RR0PUgBHDu21PS9BD0AWzIq+DavLTnA0bxmoGm0uU+k4zXl33TOpot8qR+mYKlpfPYmnhs2d/pdqDXdDrf/FjU0lA2yD5qg= x-ms-exchange-antispam-srfa-diagnostics: SSOS; x-ms-office365-filtering-correlation-id: 
4ef69695-f1ac-49db-752c-08d4f8c91f0d x-microsoft-antispam: UriScan:; BCL:0; PCL:0; RULEID:(300000500095)(300135000095)(300000501095)(300135300095)(22001)(300000502095)(300135100095)(2017030254152)(300000503095)(300135400095)(2017052603199)(201703131423075)(201703031133081)(201702281549075)(300000504095)(300135200095)(300000505095)(300135600095)(300000506095)(300135500095); SRVR:DM5SPR00MB253; x-ms-traffictypediagnostic: DM5SPR00MB253: x-exchange-antispam-report-test: UriScan:(190756311086443)(21748063052155)(56005881305849); x-microsoft-antispam-prvs: x-exchange-antispam-report-cfa-test: BCL:0; PCL:0; RULEID:(100000700101)(100105000095)(100000701101)(100105300095)(100000702101)(100105100095)(6040450)(2401047)(5005006)(8121501046)(93006095)(93001095)(3002001)(10201501046)(100000703101)(100105400095)(6041248)(20161123560025)(20161123555025)(20161123558100)(20161123564025)(20161123562025)(201703131423075)(201702281528075)(201703061421075)(201703061406153)(6072148)(201708071742011)(100000704101)(100105200095)(100000705101)(100105500095); SRVR:DM5SPR00MB253; BCL:0; PCL:0; RULEID:(100000800101)(100110000095)(100000801101)(100110300095)(100000802101)(100110100095)(100000803101)(100110400095)(100000804101)(100110200095)(100000805101)(100110500095); SRVR:DM5SPR00MB253; x-forefront-prvs: 04270EF89C x-forefront-antispam-report: SFV:NSPM; SFS:(10009020)(39830400002)(189002)(199003)(790700001)(7696004)(5660300001)(2501003)(3660700001)(81156014)(66066001)(81166006)(8676002)(966005)(14454004)(3280700002)(7736002)(77096006)(8936002)(2906002)(54356999)(5640700003)(606006)(102836003)(50986999)(86362001)(99286003)(54896002)(236005)(6916009)(33656002)(6306002)(53936002)(68736007)(97736004)(5630700001)(110136004)(101416001)(72206003)(3846002)(55016002)(9686003)(478600001)(6116002)(6506006)(106356001)(105586002)(6436002)(2351001)(2900100001)(74316002)(25786009)(9326002)(189998001); DIR:OUT; SFP:1101; SCL:1; SRVR:DM5SPR00MB253; H:DM5PR10MB1754.namprd10.prod.outlook.com; FPR:; SPF:None; 
PTR:InfoNoRecords; MX:1; A:1; LANG:en; spamdiagnosticoutput: 1:99 spamdiagnosticmetadata: NSPM MIME-Version: 1.0 X-MS-Exchange-CrossTenant-originalarrivaltime: 11 Sep 2017 03:56:45.7298 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: 2563c132-88f5-466f-bbb2-e83153b3c808 X-MS-Exchange-Transport-CrossTenantHeadersStamped: DM5SPR00MB253 X-OriginatorOrg: watchguard.com Received-SPF: none Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.23 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 11 Sep 2017 04:04:53 -0000

Hi,

I have a mail system running FreeBSD 9.3 on VMware ESXi. It is assigned a small amount of memory (1G or 2G) and a reasonably sized swap disk (2 x memory size). The mail system ran for several years without any freeze, even with a lot of mail traffic passing through it.

Recently I upgraded this mail system from FreeBSD 9.3 to FreeBSD 11.0, and after running for a few days the mail system froze. I could not get any response from the console and could not log in to the mail system over SSH either; only ping to the system still got a response.
I looked into the message log and found a lot of messages:

swap_pager_getswapspace(3): failed
swap_pager_getswapspace(3): failed
swap_pager_getswapspace(4): failed
swap_pager_getswapspace(3): failed
swap_pager_getswapspace(3): failed
swap_pager_getswapspace(4): failed
swap_pager_getswapspace(3): failed
swap_pager_getswapspace(3): failed
swap_pager_getswapspace(3): failed
swap_pager_getswapspace(3): failed
swap_pager_getswapspace(3): failed
swap_pager_getswapspace(3): failed
swap_pager_getswapspace(3): failed
swap_pager_getswapspace(5): failed
swap_pager_getswapspace(3): failed
swap_pager_getswapspace(3): failed
swap_pager_getswapspace(4): failed
swap_pager: out of swap space
swap_pager_getswapspace(1): failed
swap_pager_getswapspace(16): failed
swap_pager_getswapspace(12): failed
swap_pager_getswapspace(9): failed
swap_pager_getswapspace(16): failed
...

It seems that running out of swap causes the system to freeze.

To investigate, I restored the mail system to a previous backup snapshot running FreeBSD 9.3,
put mail traffic pressure on it, and observed the memory and swap space usage with a simple shell script:

#!/bin/sh
while [ 1 ]; do
    vmstat
    pstat -s
    sleep 60
done

From the console, I saw the memory and swap space usage increase quickly. Once the swap space was used up,
out-of-swap messages were shown in the message log:

swap_pager_getswapspace(4): failed
swap_pager_getswapspace(3): failed
swap_pager_getswapspace(3): failed
swap_pager_getswapspace(3): failed
swap_pager_getswapspace(4): failed
swap_pager_getswapspace(6): failed
swap_pager_getswapspace(2): failed
swap_pager_getswapspace(2): failed
swap_pager_getswapspace(2): failed
swap_pager_getswapspace(5): failed
swap_pager_getswapspace(8): failed
swap_pager_getswapspace(2): failed
swap_pager_getswapspace(4): failed
Sep 6 08:30:58 mail-system kernel: pid 92324 (bm_scanner), uid 5500, was killed: out of swap space

As on FreeBSD 11.0, there are still a lot of "swap_pager_getswapspace failed" messages, but FreeBSD 9.3 kills a process to free memory.
This lets the mail system go on running, while on FreeBSD 11.0 it can't. Observing the system memory and swap space usage continuously,
the OOM-killer works accurately: once swap space usage reaches 100%, the OOM-killer is called to kill a process and free memory.

Digging into the source code of FreeBSD 9.3, file vm_pageout.c, function vm_pageout_scan():
        /*
         * If we are critically low on one of RAM or swap and low on
         * the other, kill the largest process.  However, we avoid
         * doing this on the first pass in order to give ourselves a
         * chance to flush out dirty vnode-backed pages and to allow
         * active pages to be moved to the inactive queue and reclaimed.
         */
        if (pass != 0 &&
            ((swap_pager_avail < 64 && vm_page_count_min()) ||
             (swap_pager_full && vm_paging_target() > 0)))
                vm_pageout_oom(VM_OOM_MEM);

The corresponding source code in FreeBSD 11.0, file vm_pageout.c, function vm_pageout_scan():
        /*
         * If the inactive queue scan fails repeatedly to meet its
         * target, kill the largest process.
         */
        vm_pageout_mightbe_oom(vmd, page_shortage, starting_page_shortage);

The OOM-killer function vm_pageout_oom() is wrapped by the function vm_pageout_mightbe_oom().

To find out which commit changed this behavior, I searched the FreeBSD SVN repository and found a clue.
https://svnweb.freebsd.org/base?view=revision&revision=290920
In SVN commit r290920, a new sysctl node called vm.pageout_oom_seq was added to control the sensitivity of the OOM-killer.
The default value of pageout_oom_seq is 12; the commit log said:
    The number of passes to trigger OOM was selected empirically and
    tested both on small (32M-64M i386 VM) and large (32G amd64)
    configurations.

However, in my case, even with vm.pageout_oom_seq at its default of 12, it didn't work as expected.
I suspect it's a bug, but I'm not sure since I can't fully understand this code.
I just want the OOM-killer on FreeBSD 11.0 to behave as it does on FreeBSD 9.3.

Does anyone know how to solve it?

Thanks!

From owner-freebsd-hackers@freebsd.org Mon Sep 11 04:37:16 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id D0A5BE19AA5 for ; Mon, 11 Sep 2017 04:37:16 +0000 (UTC) (envelope-from auryn@zirakzigil.org) Received: from mx1.etoilesoft.fr (mx1.etoilesoft.fr [52.57.51.18]) by mx1.freebsd.org (Postfix) with ESMTP id 98CFF6F583 for ; Mon, 11 Sep 2017 04:37:16 +0000 (UTC) (envelope-from auryn@zirakzigil.org) Received: from mx1.etoilesoft.fr (localhost [127.0.0.1]) by mx1.etoilesoft.fr (Postfix) with ESMTP id C96799C945 for ; Mon, 11 Sep 2017 04:38:08 +0000 (UTC) Received: from [172.25.93.65] (localhost [127.0.0.1]) (Authenticated sender: auryn@zirakzigil.org) by mx1.etoilesoft.fr (Postfix) with ESMTPA id 6FC629C944 for ; Mon, 11 Sep 2017 04:38:08 +0000 (UTC) Subject: Re: OOM-killer can't work on FreeBSD 11.0 To: freebsd-hackers@freebsd.org References: From: Giulio Ferro Message-ID: Date: Mon, 11 Sep 2017 06:37:08 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1 MIME-Version: 1.0 In-Reply-To: 
Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Content-Language: fr X-Virus-Scanned: ClamAV using ClamSMTP X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 11 Sep 2017 04:37:16 -0000

On 11/09/2017 05:56, Zhixin Wan via freebsd-hackers wrote:
> Hi,
>
> I have a mail system running FreeBSD 9.3 which is put on VMWare ESXi, it's assigned a low memory (1G or 2G) and a reasonable swap disk size (2 x Memory size).
> The mail system was running for several years, and didn't see any freeze even a lot of mail traffic through it.
>
> Recently I upgraded this mail system from FreeBSD 9.3 to FreeBSD 11.0, and after running a few days, the mail system got freeze. I can't get any response from the console,
> and can't login to the mail system with SSH either, except ping to the system got response. I look into the message log and found a lot of messages:

I don't know if it's relevant to your problem, but I experienced exactly
the same problem on a FreeBSD 11 EC2 instance on AWS. If I set up swap,
the system would freeze after some time (from a few hours to a few days).

I solved it by increasing the allocated RAM and removing the swap...

Cheers Giulio From owner-freebsd-hackers@freebsd.org Mon Sep 11 04:42:04 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 1F98DE1A047 for ; Mon, 11 Sep 2017 04:42:04 +0000 (UTC) (envelope-from auryn@zirakzigil.org) Received: from mx1.etoilesoft.fr (mx1.etoilesoft.fr [52.57.51.18]) by mx1.freebsd.org (Postfix) with ESMTP id D3EC96FB17 for ; Mon, 11 Sep 2017 04:42:03 +0000 (UTC) (envelope-from auryn@zirakzigil.org) Received: from mx1.etoilesoft.fr (localhost [127.0.0.1]) by mx1.etoilesoft.fr (Postfix) with ESMTP id CFB149C945 for ; Mon, 11 Sep 2017 04:43:00 +0000 (UTC) Received: from [172.25.93.65] (localhost [127.0.0.1]) (Authenticated sender: auryn@zirakzigil.org) by mx1.etoilesoft.fr (Postfix) with ESMTPA id 8A6BE9C944 for ; Mon, 11 Sep 2017 04:43:00 +0000 (UTC) Subject: Re: devd in jail To: freebsd-hackers@freebsd.org References: <20170810225439.Horde.1s8Qi_dlNtxgEigsNKbdrer@webmail.leidinger.net> <4a1a99a5-35ea-19c9-7ac8-77875ac6f71f@zirakzigil.org> <20170905151537.Horde.10cHNOX1OVri7mGaUcDeX1l@webmail.leidinger.net> <7ca865ee-b613-2f0c-daf0-d828884b5e74@zirakzigil.org> <1C181EF2-B8B1-4F42-BF80-ABEA0593DD43@dsl-only.net> <20170906122556.Horde.5OdDwtii7HXPNArY77YUyBi@webmail.leidinger.net> <20170906221947.Horde.RITHvdc1wVE9v0-3nBavR0Z@webmail.leidinger.net> <20170909150335.Horde.wBLIPwBuhV3lyQlBxKud39f@webmail.leidinger.net> From: Giulio Ferro Message-ID: <27e72cfb-54cf-4af8-b569-85fff089c45f@zirakzigil.org> Date: Mon, 11 Sep 2017 06:42:01 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1 MIME-Version: 1.0 In-Reply-To: <20170909150335.Horde.wBLIPwBuhV3lyQlBxKud39f@webmail.leidinger.net> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Content-Language: fr X-Virus-Scanned: ClamAV using ClamSMTP X-BeenThere: 
freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 11 Sep 2017 04:42:04 -0000 On 09/09/2017 15:03, Alexander Leidinger wrote: > Please run this: > strings /boot/kernel/kernel| grep allow.kmem > > If it doesn't print out "allow.kmem_access", then your kernel doesn't > contain the patch. > > Bye, > Alexander. > # strings /boot/kernel/kernel | grep allow.kmem allow.kmem_access So it seems the kernel is ok... Maybe I can set this value at boot in /boot/loader.conf? From owner-freebsd-hackers@freebsd.org Mon Sep 11 08:08:45 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 97855E21FED for ; Mon, 11 Sep 2017 08:08:45 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 17E5575961 for ; Mon, 11 Sep 2017 08:08:44 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id v8B88aM3050997 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Mon, 11 Sep 2017 11:08:36 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua v8B88aM3050997 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id v8B88at7050996; Mon, 11 Sep 2017 11:08:36 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Mon, 11 Sep 2017 11:08:36 +0300 From: Konstantin Belousov To: Zhixin Wan Cc: "freebsd-hackers@freebsd.org" Subject: 
Re: OOM-killer can't work on FreeBSD 11.0 Message-ID: <20170911080836.GB6477@kib.kiev.ua> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.9.0 (2017-09-02) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 11 Sep 2017 08:08:45 -0000 On Mon, Sep 11, 2017 at 03:56:45AM +0000, Zhixin Wan via freebsd-hackers wrote: > Hi, > > I have a mail system running FreeBSD 9.3 which is put on VMWare ESXi, it's assigned a low memory (1G or 2G) and a reasonable swap disk size (2 x Memory size). > The mail system was running for several years, and didn't see any freeze even a lot of mail traffic through it. > > Recently I upgraded this mail system from FreeBSD 9.3 to FreeBSD 11.0, and after running a few days, the mail system got freeze. I can't get any response from the console, > and can't login to the mail system with SSH either, except ping to the system got response. 
> I look into the message log and found a lot of messages:
>
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(4): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(4): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(5): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(4): failed
> swap_pager: out of swap space
> swap_pager_getswapspace(1): failed
> swap_pager_getswapspace(16): failed
> swap_pager_getswapspace(12): failed
> swap_pager_getswapspace(9): failed
> swap_pager_getswapspace(16): failed
> ...
>
> It seems that the out of swap cause the system freeze.
>
> To figure out this problem, restore the mail system to previous backup snapshot which is running on FreeBSD 9.3.
> Put mail traffic pressure on the mail system, and observe the memory and swap space usage with a simple shell:
>
> #!/bin/sh
> while [ 1 ]; do
>     vmstat
>     pstat -s
>     sleep 60
> done
>
> From the console, I saw the memory and swap space usage increased quickly.
> Once the swap space was eat out,
> out of swap messages will be shown in message log:
>
> swap_pager_getswapspace(4): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(4): failed
> swap_pager_getswapspace(6): failed
> swap_pager_getswapspace(2): failed
> swap_pager_getswapspace(2): failed
> swap_pager_getswapspace(2): failed
> swap_pager_getswapspace(5): failed
> swap_pager_getswapspace(8): failed
> swap_pager_getswapspace(2): failed
> swap_pager_getswapspace(4): failed
> Sep 6 08:30:58 mail-system kernel: pid 92324 (bm_scanner), uid 5500, was killed: out of swap space
>
> Compared to FreeBSD 11.0, there are still a lot of "swap_pager_getswapspace failed" messages, except FreeBSD 9.3 will kill a process to free memory.
> This behavior cause the mail system can go on running, but FreeBSD 11.0 can't. Observe the system memory and swap space usage continuously,
> the OOM-killer works accurately: once the swap space usage is 100%, the OOM-killer will be called to kill a process to free memory.

No, this is not the right behaviour. Filling up the swap space must not
cause the OOM to trigger (in the default setup of swap overcommit turned off).

>
> Dig into the source code of FreeBSD 9.3, file vm_pageout.c, function vm_pageout_scan():
>         /*
>          * If we are critically low on one of RAM or swap and low on
>          * the other, kill the largest process.  However, we avoid
>          * doing this on the first pass in order to give ourselves a
>          * chance to flush out dirty vnode-backed pages and to allow
>          * active pages to be moved to the inactive queue and reclaimed.
>          */
>         if (pass != 0 &&
>             ((swap_pager_avail < 64 && vm_page_count_min()) ||
>              (swap_pager_full && vm_paging_target() > 0)))
>                 vm_pageout_oom(VM_OOM_MEM);
>
> the corresponding source code in FreeBSD 11.0, file vm_pageout.c, function vm_pageout_scan():
>         /*
>          * If the inactive queue scan fails repeatedly to meet its
>          * target, kill the largest process.
>          */
>         vm_pageout_mightbe_oom(vmd, page_shortage, starting_page_shortage);
>
> The OOM-killer function vm_pageout_oom() is wrapped with function vm_pageout_mightbe_oom().
>
> To know from which commit this behavior was changed, I search the FreeBSD SVN page and find a clue.
> https://svnweb.freebsd.org/base?view=revision&revision=290920
> In SVN commit r290920, a new sysctl node called vm.pageout_oom_seq was added to control the sensitivity of OOM-killer.
> The default value of pageout_oom_seq is 12, the commit log said:
>     The number of passes to trigger OOM was selected empirically and
>     tested both on small (32M-64M i386 VM) and large (32G amd64)
>     configurations.
>
> However, in my case, even vm.pageout_oom_seq is 12 by default, it didn't work as expected.

So lower the sysctl. The lower the value, the more sensitive the OOM killer
is to a lack of pagedaemon progress.

> I doubt it's a bug, but I'm not pretty sure since I can't fully understand these codes.
> I just want OOM-killer behaving on FreeBSD 11.0 like FreeBSD 9.3 does.

The FreeBSD 9 OOM behavior was buggy; it caused serious issues on small
machines and on swap-less setups. The new OOM trigger might require some
manual tuning for a specific combination of workload and machine config.

> Is there anyone know how to solve it?
>
> Thanks!
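
A minimal sketch of the tuning suggested above, assuming a FreeBSD 11 system where vm.pageout_oom_seq can be changed at run time. The value 6 is only a hypothetical starting point for illustration, not a figure taken from this thread; a suitable value depends on the workload and has to be found by testing.

```shell
# Show the current value (12 by default, per r290920).
sysctl vm.pageout_oom_seq

# Lower it so the OOM killer triggers after fewer failed pagedaemon passes.
# 6 is a hypothetical value chosen for illustration; run as root.
sysctl vm.pageout_oom_seq=6

# To keep the setting across reboots, the same assignment can go in
# /etc/sysctl.conf:
#   vm.pageout_oom_seq=6
```

Re-running the vmstat/pstat loop from the original report under the same mail load would show whether a process now gets killed before the machine stops responding.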
> > > _______________________________________________ > freebsd-hackers@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org" From owner-freebsd-hackers@freebsd.org Mon Sep 11 09:21:38 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id CCCBBE0003F for ; Mon, 11 Sep 2017 09:21:38 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: from asp.reflexion.net (outbound-mail-210-10.reflexion.net [208.70.210.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 9069E7D483 for ; Mon, 11 Sep 2017 09:21:38 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: (qmail 12934 invoked from network); 11 Sep 2017 09:21:37 -0000 Received: from unknown (HELO mail-cs-02.app.dca.reflexion.local) (10.81.19.2) by 0 (rfx-qmail) with SMTP; 11 Sep 2017 09:21:37 -0000 Received: by mail-cs-02.app.dca.reflexion.local (Reflexion email security v8.40.2) with SMTP; Mon, 11 Sep 2017 05:21:37 -0400 (EDT) Received: (qmail 11245 invoked from network); 11 Sep 2017 09:21:36 -0000 Received: from unknown (HELO iron2.pdx.net) (69.64.224.71) by 0 (rfx-qmail) with (AES256-SHA encrypted) SMTP; 11 Sep 2017 09:21:36 -0000 Received: from [192.168.1.109] (c-67-170-167-181.hsd1.or.comcast.net [67.170.167.181]) by iron2.pdx.net (Postfix) with ESMTPSA id 163F0EC86EF; Mon, 11 Sep 2017 02:21:36 -0700 (PDT) From: Mark Millard Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\)) Subject: Re: head -r323246 Pine64+ 2GB boot time context: acquiring blockable sleep lock with spinlock or critical section held for data_abort calling pmap_fault calling __mtx_lock_flags Date: 
Mon, 11 Sep 2017 02:21:35 -0700 References: <8419C238-702D-4BF7-89DB-EC649CD405A5@dsl-only.net> <9DB26517-E4E0-4B2A-9855-9F7381AD4C66@dsl-only.net> To: FreeBSD Toolchain , freebsd-arm , freebsd-hackers In-Reply-To: <9DB26517-E4E0-4B2A-9855-9F7381AD4C66@dsl-only.net> Message-Id: X-Mailer: Apple Mail (2.3273) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 11 Sep 2017 09:21:38 -0000

[I got another blockable sleep lock panic during the
Pine64+ 2GB boot, this time with ddb> input working.
I show both the older example and the new one.]

On 2017-Sep-10, at 3:25 PM, Mark Millard wrote:

> [I got a boot-time panic with a debug kernel that
> reported a "acquiring blockable sleep lock with
> spinlock or critical section held (sleep mutex)".
> This was for data_abort calling pmap_fault calling
> __mtx_lock_flags . I first include prior non-debug
> kernel reports in case they are related.]
>
> . . .
>
> . . .
> Release APs
> APs not started
> CPU 0: ARM Cortex-A53 r0p4 affinity: 0
>  Instruction Set Attributes 0 =
>  Instruction Set Attributes 1 = <0>
>  Processor Features 0 =
>  Processor Features 1 = <0>
>  Memory Model Features 0 = <4k Granule,64k Granule,MixedEndian,S/NS Mem,16bit ASID,1TB PA>
>  Memory Model Features 1 = <>
>  Debug Features 0 = <2 CTX Breakpoints,4 Watchpoints,6 Breakpoints,PMUv3,Debug v8>
>  Debug Features 1 = <0>
>  Auxiliary Features 0 = <0>
>  Auxiliary Features 1 = <0>
> CPU 1: (null) (null) r0p0 affinity: 0
> CPU 2: (null) (null) r0p0 affinity: 0
> CPU 3: (null) (null) r0p0 affinity: 0
> panic: acquiring blockable sleep lock with spinlock or critical section held (sleep mutex) pmap @ /usr/src/sys/arm64/arm64/pmap.c:4710
> cpuid = 0
> time = 13
> KDB: stack backtrace:
> db_trace_self() at db_trace_self_wrapper+0x28
>          pc = 0xffff0000005efc78  lr = 0xffff000000088094
>          sp = 0xffff000069850080  fp = 0xffff000069850290
>
> db_trace_self_wrapper() at vpanic+0x164
>          pc = 0xffff000000088094  lr = 0xffff00000031764c
>          sp = 0xffff0000698502a0  fp = 0xffff000069850310
>
> vpanic() at kassert_panic+0x15c
>          pc = 0xffff00000031764c  lr = 0xffff0000003174e4
>          sp = 0xffff000069850320  fp = 0xffff0000698503e0
>
> kassert_panic() at witness_checkorder+0x160
>          pc = 0xffff0000003174e4  lr = 0xffff000000374990
>          sp = 0xffff0000698503f0  fp = 0xffff000069850470
>
> witness_checkorder() at __mtx_lock_flags+0xa8
>          pc = 0xffff000000374990  lr = 0xffff0000002f8b7c
>          sp = 0xffff000069850480  fp = 0xffff0000698504b0
>
> __mtx_lock_flags() at pmap_fault+0x40
>          pc = 0xffff0000002f8b7c  lr = 0xffff000000606994
>          sp = 0xffff0000698504c0  fp = 0xffff0000698504e0
>
> pmap_fault() at data_abort+0xb8
>          pc = 0xffff000000606994  lr = 0xffff000000608a9c
>          sp = 0xffff0000698504f0  fp = 0xffff0000698505a0
>
> data_abort() at do_el1h_sync+0xfc
>          pc = 0xffff000000608a9c  lr = 0xffff0000006088f0
>          sp = 0xffff0000698505b0  fp = 0xffff0000698505e0
>
> do_el1h_sync() at handle_el1h_sync+0x74
>          pc = 0xffff0000006088f0  lr = 0xffff0000005f1874
>          sp = 0xffff0000698505f0  fp = 0xffff000069850700
>
> handle_el1h_sync() at sched_switch+0x2a8
>          pc = 0xffff0000005f1874  lr = 0xffff00000033f0c8
>          sp = 0xffff000069850710  fp = 0xffff0000698507f0
>
> sched_switch() at mi_switch+0x1b8
>          pc = 0xffff00000033f0c8  lr = 0xffff00000032161c
>          sp = 0xffff000069850800  fp = 0xffff000069850820
>
> mi_switch() at taskqgroup_binder+0x7c
>          pc = 0xffff00000032161c  lr = 0xffff00000035510c
>          sp = 0xffff000069850830  fp = 0xffff000069850860
>
> taskqgroup_binder() at gtaskqueue_run_locked+0x104
>          pc = 0xffff00000035510c  lr = 0xffff000000354f74
>          sp = 0xffff000069850870  fp = 0xffff0000698508e0
>
> gtaskqueue_run_locked() at gtaskqueue_thread_loop+0x9c
>          pc = 0xffff000000354f74  lr = 0xffff000000354d10
>          sp = 0xffff0000698508f0  fp = 0xffff000069850910
>
> gtaskqueue_thread_loop() at fork_exit+0x7c
>          pc = 0xffff000000354d10  lr = 0xffff0000002dbd3c
>          sp = 0xffff000069850920  fp = 0xffff000069850950
>
> fork_exit() at fork_trampoline+0x10
>          pc = 0xffff0000002dbd3c  lr = 0xffff000000608664
>          sp = 0xffff000069850960  fp = 0x0000000000000000
>
> KDB: enter: panic
> [ thread pid 0 tid 100058 ]
> Stopped at      sched_switch+0x2b8:     ldrb    w9, [x8, #894]
> db>
>
> Unfortunately it was not taking console input so that is
> all I got.

From the new example:

CPU 1: (null) (null) r0p0 affinity: 0
CPU 2: (null) (null) r0p0 affinity: 0
CPU 3: (null) (null) r0p0 affinity: 0
panic: acquiring blockable sleep lock with spinlock or critical section held (sleep mutex) pmap @ /usr/src/sys/arm64/arm64/pmap.c:4710
cpuid = 0
time = 13
KDB: stack backtrace:
db_trace_self() at db_trace_self_wrapper+0x28
         pc = 0xffff0000005efc78  lr = 0xffff000000088094
         sp = 0xffff000069850080  fp = 0xffff000069850290

db_trace_self_wrapper() at vpanic+0x164
         pc = 0xffff000000088094  lr = 0xffff00000031764c
         sp = 0xffff0000698502a0  fp = 0xffff000069850310

vpanic() at kassert_panic+0x15c
         pc = 0xffff00000031764c  lr = 0xffff0000003174e4
         sp = 0xffff000069850320  fp = 0xffff0000698503e0

kassert_panic() at witness_checkorder+0x160
         pc = 0xffff0000003174e4  lr = 0xffff000000374990
         sp = 0xffff0000698503f0  fp = 0xffff000069850470

witness_checkorder() at __mtx_lock_flags+0xa8
         pc = 0xffff000000374990  lr = 0xffff0000002f8b7c
         sp = 0xffff000069850480  fp = 0xffff0000698504b0

__mtx_lock_flags() at pmap_fault+0x40
         pc = 0xffff0000002f8b7c  lr = 0xffff000000606994
         sp = 0xffff0000698504c0  fp = 0xffff0000698504e0

pmap_fault() at data_abort+0xb8
         pc = 0xffff000000606994  lr = 0xffff000000608a9c
         sp = 0xffff0000698504f0  fp = 0xffff0000698505a0

data_abort() at do_el1h_sync+0xfc
         pc = 0xffff000000608a9c  lr = 0xffff0000006088f0
         sp = 0xffff0000698505b0  fp = 0xffff0000698505e0

do_el1h_sync() at handle_el1h_sync+0x74
         pc = 0xffff0000006088f0  lr = 0xffff0000005f1874
         sp = 0xffff0000698505f0  fp = 0xffff000069850700

handle_el1h_sync() at sched_switch+0x2a8
         pc = 0xffff0000005f1874  lr = 0xffff00000033f0c8
         sp = 0xffff000069850710  fp = 0xffff0000698507f0

sched_switch() at mi_switch+0x1b8
         pc = 0xffff00000033f0c8  lr = 0xffff00000032161c
         sp = 0xffff000069850800  fp = 0xffff000069850820

mi_switch() at taskqgroup_binder+0x7c
         pc = 0xffff00000032161c  lr = 0xffff00000035510c
         sp = 0xffff000069850830  fp = 0xffff000069850860

taskqgroup_binder() at gtaskqueue_run_locked+0x104
         pc = 0xffff00000035510c  lr = 0xffff000000354f74
         sp = 0xffff000069850870  fp = 0xffff0000698508e0

gtaskqueue_run_locked() at gtaskqueue_thread_loop+0x9c
         pc = 0xffff000000354f74  lr = 0xffff000000354d10
         sp = 0xffff0000698508f0  fp = 0xffff000069850910

gtaskqueue_thread_loop() at fork_exit+0x7c
         pc = 0xffff000000354d10  lr = 0xffff0000002dbd3c
         sp = 0xffff000069850920  fp = 0xffff000069850950

fork_exit() at fork_trampoline+0x10
         pc = 0xffff0000002dbd3c  lr = 0xffff000000608664
         sp = 0xffff000069850960  fp = 0x0000000000000000

KDB: enter: panic
[ thread pid 0 tid 100058 ]
Stopped at      sched_switch+0x2b8:     ldrb    w9, [x8, #894]

It turns out that x8 is reported as holding the value zero:

db> show regs
No such command; use "help" to list available commands
db> show reg
spsr        0x9600000440000085
x0          0xffff000000ac1000  __pcpu+0x200
x1          0x4
x2          0xffff00000068a5cb  $d.4+0x15c
x3          0x218  $d.9+0x1d8
x4          0
x5          0x11
x6          0xffff000000a45f20
x7          0x40  $d.14
x8          0
x9          0x5
x10         0xffff0000009a7d88  tdq_cpu+0xe08
x11         0x18
x12         0x1ddc88
x13         0x7ff707d0
x14         0
x15         0x7ff6e010
x16         0x2af8  $d.1+0x122e
x17         0x27c0  $d.1+0xef6
x18         0xffff000069850790
x19         0xfffffd0001415a80
x20         0xffff0000009a7c80  tdq_cpu+0xd00
x21         0xffff0000009a6f80  tdq_cpu
x22         0xffff0000009a7d1d  tdq_cpu+0xd9d
x23         0x1
x24         0
x25         0xffff0000009a6f80  tdq_cpu
x26         0xffff000000c85000  dpcpu+0x158
x27         0xffff000000c85000  dpcpu+0x158
x28         0
x29         0xffff0000698507f0
lr          0xffff00000033f0cc  sched_switch+0x2ac
elr         0xffff00000033f0dc  sched_switch+0x2bc
sp          0xffff000069850790
sched_switch+0x2b8:     ldrb    w9, [x8, #894]
db> show lockchain
thread 100058 (pid 0, softirq_1) is on a run queue
db> show locks
db> show lock
db> show locktree
db> show sleepqueue
db> show sleepq
ddb> show sleepchain
thread 100058 (pid 0, softirq_1) is on a run queue
db> show alllocks
Process 0 (kernel) thread 0xffff000000acd500 (100000)
exclusive sleep mutex Giant (Giant) r = 0 (0xffff000000c5d860) locked @ /usr/src/sys/kern/kern_module.c:116
db> show allchains
chain 1:
 thread 100049 (pid 18, vmdaemon) sleeping on 0xffff000000aa811c "psleep"
chain 2:
 thread 100054 (pid 17, laundry: dom0) sleeping on 0xffff000000aa80c4 "launds"
chain 3:
 thread 100055 (pid 17, uma) sleeping on 0xffff000000aa7b68 "umarcl"
chain 4:
 thread 100047 (pid 16, mmcsd0: mmc/sd card) sleeping on 0xfffffd0000638800 "mmcsd disk jobqueue"
chain 5:
 thread 100046 (pid 15, soaiod4) sleeping on 0xffff000000a9dbe4 "-"
chain 6:
 thread 100045 (pid 9, soaiod3) sleeping on 0xffff000000a9dbe4 "-"
chain 7:
 thread 100044 (pid 8, soaiod2) sleeping on 0xffff000000a9dbe4 "-"
chain 8:
 thread 100043 (pid 7, soaiod1) sleeping on 0xffff000000a9dbe4 "-"
chain 9:
 thread 100036 (pid 5, sctp_iterator) sleeping on 0xffff000000c7bf20 "waiting_for_work"
chain 10:
 thread 100028 (pid 14, usbus0) sleeping on 0xffff000040925358 "-"
chain 11:
 thread 100029 (pid 14, usbus0) sleeping on 0xffff0000409253b0 "-"
chain 12:
 thread 100030 (pid 14, usbus0) sleeping on 0xffff000040925408 "-"
chain 13:
 thread 100031 (pid 14, usbus0) sleeping on 0xffff000040925460 "-"
chain 14:
 thread 100032 (pid 14, usbus0) sleeping on 0xffff0000409254b8 "-"
chain 15:
 thread 100025 (pid 4, doneq0) sleeping on 0xffff000000878280 "-"
chain 16:
 thread 100042 (pid 4, scanner) sleeping on 0xffff0000008780c8 "-"
chain 17:
 thread 100024 (pid 3, crypto returns) sleeping on 0xffff000000aa6008 "crypto_ret_wait"
chain 18:
 thread 100023 (pid 2, crypto) sleeping on 0xffff000000aa5ec0 "crypto_wait"
chain 19:
 thread 100019 (pid 13, g_event) sleeping on 0xffff000000c6a450 "-"
chain 20:
 thread 100020 (pid 13, g_up) sleeping on 0xffff000000c6a460 "-"
chain 21:
 thread 100021 (pid 13, g_down) sleeping on 0xffff000000c6a458 "-"
chain 22:
 thread 100014 (pid 12, swi4: clock (0)) blocked on lock 0xffff000000c5d860 (sleep mutex) "Giant"
 thread 100000 (pid 0, swapper) is on a run queue
chain 23:
 thread 100002 (pid 1, kernel) blocked on lock 0xffff000000c5d860 (sleep mutex) "Giant"
 thread 100000 (pid 0, swapper) is on a run queue
chain 24:
 thread 100001 (pid 10, audit) sleeping on 0xffff000000c808e0 "audit_worker_cv"
chain 25:
 thread 100009 (pid 0, thread taskq) sleeping on 0xfffffd00005f2b00 "-"
chain 26:
 thread 100010 (pid 0, aiod_kick taskq) sleeping on 0xfffffd00005f2a00 "-"
chain 27:
 thread 100012 (pid 0, kqueue_ctx taskq) sleeping on 0xfffffd00005f2700 "-"
chain 28:
 thread 100022 (pid 0, firmware taskq) sleeping on 0xfffffd00005f2000 "-"
chain 29:
 thread 100037 (pid 0, acpi_task_0) sleeping on 0xfffffd00005f1400 "-"
chain 30:
 thread 100038 (pid 0, acpi_task_1) sleeping on 0xfffffd00005f1400 "-"
chain 31:
 thread 100039 (pid 0, acpi_task_2) sleeping on 0xfffffd00005f1400 "-"
chain 32:
 thread 100041 (pid 0, CAM taskq) sleeping on 0xfffffd00005f1e00 "-"
chain 33:
 thread 100056 (pid 0, if_config_tqg_0) sleeping on 0xfffffd00005f1300 "-"
chain 34:
 thread 100057 (pid 0, softirq_0) sleeping on 0xfffffd00005f1200 "-"
chain 35:
 thread 100059 (pid 0, softirq_2) sleeping on 0xfffffd00005f1000 "-"
chain 36:
 thread 100060 (pid 0, softirq_3) sleeping on 0xfffffd00005f0e00 "-"

The code for:

panic: acquiring blockable sleep lock with spinlock or critical section held (sleep mutex) pmap @ /usr/src/sys/arm64/arm64/pmap.c:4710

is the PMAP_LOCK in:

int
pmap_fault(pmap_t pmap, uint64_t esr, uint64_t far)
{
#ifdef SMP
        uint64_t par;
#endif

        switch (ESR_ELx_EXCEPTION(esr)) {
        case EXCP_DATA_ABORT_L:
        case EXCP_DATA_ABORT:
                break;
        default:
                return (KERN_FAILURE);
        }

#ifdef SMP
        PMAP_LOCK(pmap);
        switch (esr & ISS_DATA_DFSC_MASK) {
        case ISS_DATA_DFSC_TF_L0:
        case ISS_DATA_DFSC_TF_L1:
        case ISS_DATA_DFSC_TF_L2:
        case ISS_DATA_DFSC_TF_L3:
                /* Ask the MMU to check the address */
                if (pmap == kernel_pmap)
                        par = arm64_address_translate_s1e1r(far);
                else
                        par = arm64_address_translate_s1e0r(far);

                /*
                 * If the translation was successful the address was invalid
                 * due to a break-before-make sequence. We can unlock and
                 * return success to the trap handler.
                 */
                if (PAR_SUCCESS(par)) {
                        PMAP_UNLOCK(pmap);
                        return (KERN_SUCCESS);
                }
                break;
        default:
                break;
        }
        PMAP_UNLOCK(pmap);
#endif

        return (KERN_FAILURE);
}

===
Mark Millard
markmi at dsl-only.net

From owner-freebsd-hackers@freebsd.org Mon Sep 11 14:13:27 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 5F70EE0DF7E for ; Mon, 11 Sep 2017 14:13:27 +0000 (UTC) (envelope-from Alexander@leidinger.net) Received: from mailgate.Leidinger.net (mailgate.leidinger.net [IPv6:2a00:1828:2000:375::1:5]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 0DA172E66 for ; Mon, 11 Sep 2017 14:13:27 +0000 (UTC) (envelope-from Alexander@leidinger.net) Date: Mon, 11 Sep 2017 16:12:53 +0200 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=leidinger.net; s=outgoing-alex; t=1505139204; bh=zswE8X7OgZ2isCn4apHQMRVx3L5MHkFWHpmVbFh6NJw=; h=Date:From:To:Cc:Subject:References:In-Reply-To; b=eiiRlJtwscFEAxekzEEdxtweLSI7ruuLwMh+eal2F1wLsVrJu8nOwOifDvH9nDjWL lBlHAKWqWQ2S0MERMA26bggA9hxG0/G/x3DvjmL5+bVJk1YmRdYpVbAamyCXUi19NB rUBuzU0VJxcYN38FWOMRd3UvBAOn1qYKKP4SPYBKYPF/aP286b/32sKQqG1fMz+4iJ YJxvx+vz5Dvfd3UgTuzh17YqGGaL2Ichjcgoq05NdgsrU8MLSOec0zS1zP3ygpYSKV FusYI6U14zawOMaEW5HrF8zC9SzYR9SkICLsV0DnFCmW/7V/vpHbRiAcBs6ILOG6mj p7CpN2nHiMARg== Message-ID: <20170911161253.Horde.vawLu00EtbbHOVeJRXjp7N0@webmail.leidinger.net> From: Alexander Leidinger To: Giulio Ferro Cc: freebsd-hackers@freebsd.org Subject: Re: devd in jail References: <20170810225439.Horde.1s8Qi_dlNtxgEigsNKbdrer@webmail.leidinger.net> <4a1a99a5-35ea-19c9-7ac8-77875ac6f71f@zirakzigil.org> <20170905151537.Horde.10cHNOX1OVri7mGaUcDeX1l@webmail.leidinger.net> <7ca865ee-b613-2f0c-daf0-d828884b5e74@zirakzigil.org>
<1C181EF2-B8B1-4F42-BF80-ABEA0593DD43@dsl-only.net> <20170906122556.Horde.5OdDwtii7HXPNArY77YUyBi@webmail.leidinger.net> <20170906221947.Horde.RITHvdc1wVE9v0-3nBavR0Z@webmail.leidinger.net> <20170909150335.Horde.wBLIPwBuhV3lyQlBxKud39f@webmail.leidinger.net> <27e72cfb-54cf-4af8-b569-85fff089c45f@zirakzigil.org> In-Reply-To: <27e72cfb-54cf-4af8-b569-85fff089c45f@zirakzigil.org> User-Agent: Horde Application Framework 5 Content-Type: multipart/signed; boundary="=_gn425jJPn-rBq0VmMCJoYYq"; protocol="application/pgp-signature"; micalg=pgp-sha1 MIME-Version: 1.0 X-Mailman-Approved-At: Mon, 11 Sep 2017 16:02:29 +0000 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 11 Sep 2017 14:13:27 -0000 This message is in MIME format and has been PGP signed. --=_gn425jJPn-rBq0VmMCJoYYq Content-Type: text/plain; charset=utf-8; format=flowed; DelSp=Yes Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

Quoting Giulio Ferro (from Mon, 11 Sep 2017 06:42:01 +0200):

> On 09/09/2017 15:03, Alexander Leidinger wrote:
>> Please run this:
>> strings /boot/kernel/kernel| grep allow.kmem
>>
>> If it doesn't print out "allow.kmem_access", then your kernel doesn't contain the patch.
>>
>> Bye,
>> Alexander.
>>
>
> # strings /boot/kernel/kernel | grep allow.kmem
> allow.kmem_access
>
> So it seems the kernel is ok...
>
>
> Maybe I can set this value at boot in /boot/loader.conf?

No, this is not a loader.conf setting.

Can you try to use "old style" jail config = settings in rc.conf instead of using jail.conf on a test system? It may be that you need to move the jail.conf away temporarily; I haven't checked which takes precedence when both (rc.conf and jail.conf) settings are there.

Bye,
Alexander.
--=20 http://www.Leidinger.net=20Alexander@Leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.org netchild@FreeBSD.org : PGP 0x8F31830F9F2772BF --=_gn425jJPn-rBq0VmMCJoYYq Content-Type: application/pgp-signature Content-Description: Digitale PGP-Signatur Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJZtpnlAAoJEKrxQhqFIICERUsQAKYM4sp3/FpasVH4ARW5VsxE upikzSNmKAo8e2C40Y19JvH/pbQ8TaqXDnnQ/UcGGAeHlX9l624N6sbplbRLyvAC +pqo5VIyIDLN/XV/dey3cez8XUFxSO2SgfCj5UjQ0B4OKEg3ZIZWx19QXYtu524F ZcquaaPDqJPGtiQJ5Rn8pPCjgjVrJHEmUSc7pRiTREHv8ipeCtN8cfeprgq0Fh+l MrZVSfzvaso+KtLdzF4ODlvewh4ww+nLElJ79axrUM1EIDTgrjBTdOhO5hxMLAqS x/hqfrmwE//3J4oCKgL4YV0deHEfFySghns+n059neFrdJyJt7sY3VULha6+fZ/D K7VGtCdsmz7afxc2WtzsLnENUPxdUc8z/CrFO9bwSlaF39QgRDN2hksmy4uyB6+S dTEtq5pj2AIcUVrNDgKcpJWmVtGsJaIYqNZK4O0sHp3uu5uPHgkQYyAeqFVIJ9Li R2eyr/7GVp3ZVLgAqVfY7KQr2dhOltiN2ws+h8lT6uCcKKDCUOlgV6xnOzNQ/zlo uz931CuAl2WCog+Oy+JF5bnYcucRayUC/uktqjuXE5hj8VTjNXhqbrjXS11K2lYv ZIv7IT1sgS1X6IWFYQlRZYihM/XSDbAOrODYPcdd42Vmm/nEI54fAtdAY4hVxhCF 82lrPkRcWUqxQTTjfcIb =8Scq -----END PGP SIGNATURE----- --=_gn425jJPn-rBq0VmMCJoYYq-- From owner-freebsd-hackers@freebsd.org Mon Sep 11 18:56:58 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8854BE1D5C3 for ; Mon, 11 Sep 2017 18:56:58 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: from asp.reflexion.net (outbound-mail-210-66.reflexion.net [208.70.210.66]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4B03C6DD2B for ; Mon, 11 Sep 2017 18:56:58 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: (qmail 30028 invoked from network); 11 Sep 2017 18:56:57 -0000 Received: from unknown (HELO mail-cs-01.app.dca.reflexion.local) (10.81.19.1) by 0 (rfx-qmail) with 
SMTP; 11 Sep 2017 18:56:57 -0000 Received: by mail-cs-01.app.dca.reflexion.local (Reflexion email security v8.40.2) with SMTP; Mon, 11 Sep 2017 14:56:57 -0400 (EDT) Received: (qmail 19537 invoked from network); 11 Sep 2017 18:56:57 -0000 Received: from unknown (HELO iron2.pdx.net) (69.64.224.71) by 0 (rfx-qmail) with (AES256-SHA encrypted) SMTP; 11 Sep 2017 18:56:57 -0000 Received: from [192.168.1.109] (c-67-170-167-181.hsd1.or.comcast.net [67.170.167.181]) by iron2.pdx.net (Postfix) with ESMTPSA id 4786CEC94BF; Mon, 11 Sep 2017 11:56:56 -0700 (PDT) From: Mark Millard Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\)) Subject: FYI: I have submitted an intermittent -r323246 debug-kernel panic problem for the Pine64+ 2GB context (so A64): bugzilla 222234 Message-Id: <99F9944A-E03C-45F5-8F6C-C3DD1E76D328@dsl-only.net> Date: Mon, 11 Sep 2017 11:56:55 -0700 To: Emmanuel Vadot , freebsd-arm , FreeBSD Current , freebsd-hackers X-Mailer: Apple Mail (2.3273) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 11 Sep 2017 18:56:58 -0000 [Note: I've jumped from way back around -r308??? to -r323246 finally. The -r323246 is an example but I've no clue what range of revisions of head also show the issue. But before this jump I'd never seen such a boot-panic.] The content of the description is: Based on a head -r323246 debug kernel build: Occasionally when I boot the Pine64+ 2GB I get: panic: acquiring blockable sleep lock with spinlock or critical section = held (sleep mutex) pmap @ /usr/src/sys/arm64/arm64/pmap.c:4710 This is the PMAP_LOCK in: int pmap_fault(pmap_t pmap, uint64_t esr, uint64_t far) . . . 
It reports:

[ thread pid 0 tid 100058 ]
Stopped at sched_switch+0x2b8: ldrb w9, [x8, #894]

It turns out that x8 is reported as holding the value zero:

db> show reg
. . .
x8 0
. . .

The back trace is:

. . . (a little text giving a clue about where in the boot sequence) . . .
CPU 1: (null) (null) r0p0 affinity: 0
CPU 2: (null) (null) r0p0 affinity: 0
CPU 3: (null) (null) r0p0 affinity: 0
panic: acquiring blockable sleep lock with spinlock or critical section held (sleep mutex) pmap @ /usr/src/sys/arm64/arm64/pmap.c:4710
cpuid = 0
time = 13
KDB: stack backtrace:
db_trace_self() at db_trace_self_wrapper+0x28
         pc = 0xffff0000005efc78  lr = 0xffff000000088094
         sp = 0xffff000069850080  fp = 0xffff000069850290
db_trace_self_wrapper() at vpanic+0x164
         pc = 0xffff000000088094  lr = 0xffff00000031764c
         sp = 0xffff0000698502a0  fp = 0xffff000069850310
vpanic() at kassert_panic+0x15c
         pc = 0xffff00000031764c  lr = 0xffff0000003174e4
         sp = 0xffff000069850320  fp = 0xffff0000698503e0
kassert_panic() at witness_checkorder+0x160
         pc = 0xffff0000003174e4  lr = 0xffff000000374990
         sp = 0xffff0000698503f0  fp = 0xffff000069850470
witness_checkorder() at __mtx_lock_flags+0xa8
         pc = 0xffff000000374990  lr = 0xffff0000002f8b7c
         sp = 0xffff000069850480  fp = 0xffff0000698504b0
__mtx_lock_flags() at pmap_fault+0x40
         pc = 0xffff0000002f8b7c  lr = 0xffff000000606994
         sp = 0xffff0000698504c0  fp = 0xffff0000698504e0
pmap_fault() at data_abort+0xb8
         pc = 0xffff000000606994  lr = 0xffff000000608a9c
         sp = 0xffff0000698504f0  fp = 0xffff0000698505a0
data_abort() at do_el1h_sync+0xfc
         pc = 0xffff000000608a9c  lr = 0xffff0000006088f0
         sp = 0xffff0000698505b0  fp = 0xffff0000698505e0
do_el1h_sync() at handle_el1h_sync+0x74
         pc = 0xffff0000006088f0  lr = 0xffff0000005f1874
         sp = 0xffff0000698505f0  fp = 0xffff000069850700
handle_el1h_sync() at sched_switch+0x2a8
         pc = 0xffff0000005f1874  lr = 0xffff00000033f0c8
         sp = 0xffff000069850710  fp =
0xffff0000698507f0
sched_switch() at mi_switch+0x1b8
         pc = 0xffff00000033f0c8  lr = 0xffff00000032161c
         sp = 0xffff000069850800  fp = 0xffff000069850820
mi_switch() at taskqgroup_binder+0x7c
         pc = 0xffff00000032161c  lr = 0xffff00000035510c
         sp = 0xffff000069850830  fp = 0xffff000069850860
taskqgroup_binder() at gtaskqueue_run_locked+0x104
         pc = 0xffff00000035510c  lr = 0xffff000000354f74
         sp = 0xffff000069850870  fp = 0xffff0000698508e0
gtaskqueue_run_locked() at gtaskqueue_thread_loop+0x9c
         pc = 0xffff000000354f74  lr = 0xffff000000354d10
         sp = 0xffff0000698508f0  fp = 0xffff000069850910
gtaskqueue_thread_loop() at fork_exit+0x7c
         pc = 0xffff000000354d10  lr = 0xffff0000002dbd3c
         sp = 0xffff000069850920  fp = 0xffff000069850950
fork_exit() at fork_trampoline+0x10
         pc = 0xffff0000002dbd3c  lr = 0xffff000000608664
         sp = 0xffff000069850960  fp = 0x0000000000000000

See:

https://lists.freebsd.org/pipermail/freebsd-toolchain/2017-September/003300.html

for more details.

In the console output this seems to be about the same place that the non-debug kernel (typically?) fails.
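
The assertion in the panic message can be illustrated with a small userland model. This is only a sketch under stated assumptions: the names model_thread, model_spin_enter, model_spin_exit and model_lock_allowed are hypothetical and are not the kernel's WITNESS code. It assumes, as the backtrace suggests, that the data abort was taken inside sched_switch() while a spin lock was held, so that the PMAP_LOCK sleep mutex in pmap_fault() is requested from a context that must not block.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for per-thread kernel state. */
struct model_thread {
	int critnest;	/* depth of nested critical sections */
	int spinlocks;	/* number of spin locks currently held */
};

enum model_lock_class { MODEL_LOCK_SPIN, MODEL_LOCK_SLEEP };

/* Assumption: taking a spin lock also enters a critical section. */
static void
model_spin_enter(struct model_thread *td)
{
	td->spinlocks++;
	td->critnest++;
}

static void
model_spin_exit(struct model_thread *td)
{
	assert(td->spinlocks > 0 && td->critnest > 0);
	td->spinlocks--;
	td->critnest--;
}

/*
 * Returns true if acquiring a lock of the given class is permitted in
 * the current context.  A blockable (sleep) lock may put the thread to
 * sleep, so the model refuses it while a spin lock or a critical
 * section is held.  Spin locks are always permitted in this model.
 */
static bool
model_lock_allowed(const struct model_thread *td, enum model_lock_class lc)
{
	assert(td->critnest >= 0 && td->spinlocks >= 0);
	if (lc == MODEL_LOCK_SPIN)
		return (true);
	return (td->critnest == 0 && td->spinlocks == 0);
}
```

In this model, a thread that has called model_spin_enter() is refused a sleep lock until the matching model_spin_exit(), which is the kind of condition the debug kernel reports as a panic.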
=3D=3D=3D Mark Millard markmi@dsl-only.net From owner-freebsd-hackers@freebsd.org Mon Sep 11 20:55:16 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9B4A5E23D6B for ; Mon, 11 Sep 2017 20:55:16 +0000 (UTC) (envelope-from yaneurabeya@gmail.com) Received: from mail-pg0-x233.google.com (mail-pg0-x233.google.com [IPv6:2607:f8b0:400e:c05::233]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 6908C72765; Mon, 11 Sep 2017 20:55:16 +0000 (UTC) (envelope-from yaneurabeya@gmail.com) Received: by mail-pg0-x233.google.com with SMTP id i130so10102316pgc.3; Mon, 11 Sep 2017 13:55:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:subject:date:message-id:cc:to:mime-version; bh=LtqEyjW71+DMy2K8voa6w4prkUrT4tk0QHBePilQ4us=; b=GXyS5jGmYEzRx9jk/IgFACcI4TLbLPw23U8ExifUbxCBmArSfEffy5wRdV1Or38eya vvDAfDXbBYWImZL9ZNsBZdMgBWVzj5uLBYgaAnziRKgnjJ2Q+0SubwaBTaHMdrormdiK rmZcejEDSxcxiFPHmDxt2TbZ3pEymmHcBhvqoueBkh0KufKo16oaK+ciYyeJ4Z2eZPTJ 1Y2Mi6Pt4Rh7x9wiT68cvCmKiqCr5CYw2O3lT1JNfLMgh641teL2yGg4hC1KnR9/IamU iI+YQZkFdkRSYY4wromWg9VkHamY+RpKTzlaYozKj3IA/6yaJf6gMh+q03VLRl9JiJaB cVWA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:subject:date:message-id:cc:to:mime-version; bh=LtqEyjW71+DMy2K8voa6w4prkUrT4tk0QHBePilQ4us=; b=n98nKGbMXKbNijvVQKF4sfRTpAQxRKw4u2mg/hVja+vyaUpjnSZJKgq004PdGpSRGE S8GfeaDTK+vz0np9ZG91A257yl3U6sH5lMSQ5sfhL27gbT7KExwcZ02oNXl46wny0f0J HSB09psIMGODYQWRGdBvHX2WDWFU/721R5zoIdviNW6P1AZmHlfdgGCsVuPy1nF/YSqA wcbU74AvFEIJX/xor9rU7NF6LoTIrQlG4r2D3jJoGGlTYHQEK/wcT9CTcYldwDEwTG0r SIGkTfJ41qwqgOhBB6aZRJMLD25I2bxwJmBzbWfuVovxcOcPNkrDQmK/ctoCKnSVh0Yn kJLg== 
X-Gm-Message-State: AHPjjUhuC2FJE0KdrsytUMHad4bFLt+aNv++c/ojoAM9YyZ1fv9vY40G zBtLWDWswxNRqVlW1ls= X-Google-Smtp-Source: ADKCNb4ng2uRFV06XrSXYT/4I68gb9RK1wVnukDxFO+rutBOKb7CA/3Wbh+vTcPX7XbmuwSnN2VbgQ== X-Received: by 10.99.37.66 with SMTP id l63mr13094173pgl.348.1505163315667; Mon, 11 Sep 2017 13:55:15 -0700 (PDT) Received: from ?IPv6:2620:10d:c096:122:3532:ef25:91ca:bfe0? ([2620:10d:c090:380::2:4405]) by smtp.gmail.com with ESMTPSA id v71sm17981638pfa.45.2017.09.11.13.55.14 (version=TLS1 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Mon, 11 Sep 2017 13:55:14 -0700 (PDT) From: "Ngie Cooper (yaneurabeya)" X-Pgp-Agent: GPGMail Content-Type: multipart/signed; boundary="Apple-Mail=_C0E698D5-14FF-4B4F-938B-4040A30ECD1A"; protocol="application/pgp-signature"; micalg=pgp-sha512 Subject: Bypassing libcasper (for fun and profit)...? Date: Mon, 11 Sep 2017 13:55:13 -0700 Message-Id: Cc: FreeBSD Hackers To: Mariusz Zaborski Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\)) X-Mailer: Apple Mail (2.3124) X-Mailman-Approved-At: Mon, 11 Sep 2017 21:00:26 +0000 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 11 Sep 2017 20:55:16 -0000 --Apple-Mail=_C0E698D5-14FF-4B4F-938B-4040A30ECD1A Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=utf-8 Hi Mariusz, I=E2=80=99m trying to get runtime coverage to work on = ^/projects/runtime-coverage . I noticed that libcasper is unfortunately = getting in the way of writing .gcda files via install(1) (libgcov and = libprofile_rt hijack binaries and libraries to write out profiling = information at runtime, which doesn=E2=80=99t seem to jive with = libcasper). I assume install(1) won=E2=80=99t be the only program to = have this issue, so a generic solution should probably be implemented. 
I was wondering if there=E2=80=99s a way that you would like to = design a formalized hook to bypass this, was wondering if you think = conditionally compiling out the functions (as stubs) was the proper way = to do it (security vs flexibility), have the the library implement weak = symbols and override with another library using strong symbols as a = bypass, or use a combination of LD_PRELOAD and hooks to disable = libcasper/turn the calls into warnings (instead of fatal errors). Thanks! -Ngie --Apple-Mail=_C0E698D5-14FF-4B4F-938B-4040A30ECD1A Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename=signature.asc Content-Type: application/pgp-signature; name=signature.asc Content-Description: Message signed with OpenPGP using GPGMail -----BEGIN PGP SIGNATURE----- Comment: GPGTools - https://gpgtools.org iQIcBAEBCgAGBQJZtvgxAAoJEPWDqSZpMIYV9YEP/iY3RvHkhJckFLICHTNPcuON 2wR2oeptw7IeqKbJI2TZEb/nIVgfQQ4oU9Rhum88L6SC2bWRzSVgYZAyyH5Vrc0T 40s6M5MXS4jiByH22S1kldbQPCjzbhH4kpyAzLT2su0tgGITItQyUANMC76AI/aP KafsNJ0rFmxR8LyCZBE8EicoaZsFuXKCfFiICYbBZbT5Nskdbc8oeAjnQdfwYLcL cKlF7OKMNAExoMF0zgPHe1d2yZg03NDKI5i11zEEi/Kxq8zTiXBNSQZIsC6UwPnZ dklia0AwZuYZdry9HbTwm7ztvrCnK1GNPoc4PKwHYqV92nMd+OXfSC6Y7VwjnKqt jkSEtY8yFykvviRVOHzEtURQeckWQTkp0Xy4k74N0lDefiGA7yBAsFgqzjjja0Mn hxaxkuuSHXoWXLf7f/gF0CEoJuvzuV2ttV3kV+mbWj2F1iAAlU4iDEx0KOgr1ce9 L1ZQaa4EG39hYb3RfVcdeSB7uwm1SlLGNoIximtYCFcwgDbI9b5cn1bbFI9DbwDj 3JMLK1m7ZCjBZeTyYaKXNCpQKQJJDmSZOltK26K2VyunSQcsKmY5spuIvBjjP3i7 WYBfROjGu6BcsXfYe8zNjT2Du77DhrqbFRg5FqH6v6PNoHcglFO69PfHRh7V+A2c XNz5QBzuVqJojBZgEMvf =ytg7 -----END PGP SIGNATURE----- --Apple-Mail=_C0E698D5-14FF-4B4F-938B-4040A30ECD1A-- From owner-freebsd-hackers@freebsd.org Tue Sep 12 01:07:33 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 5A77EE0C49F for ; Tue, 12 Sep 2017 01:07:33 +0000 (UTC) (envelope-from 
Zhixin.Wan@watchguard.com) Received: from XCS01CO.watchguard.com (mx1.watchguard.com [206.191.171.101]) by mx1.freebsd.org (Postfix) with ESMTP id 1CB157EF8C for ; Tue, 12 Sep 2017 01:07:32 +0000 (UTC) (envelope-from Zhixin.Wan@watchguard.com) From: Zhixin Wan To: Konstantin Belousov CC: "freebsd-hackers@freebsd.org" Subject: RE: OOM-killer can't work on FreeBSD 11.0 Thread-Topic: OOM-killer can't work on FreeBSD 11.0 Thread-Index: AdMqp76RyQybtFOlSuGM22kMXJHWTAALWxUAACOL8BA= Date: Tue, 12 Sep 2017 01:07:29 +0000 Message-ID: References: <20170911080836.GB6477@kib.kiev.ua> In-Reply-To: <20170911080836.GB6477@kib.kiev.ua> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-ms-publictraffictype: Email x-microsoft-exchange-diagnostics: 1; DM5PR10MB1483; 6:x6hRD8xzReCWXJ4lBgTVvGo7tEn4Fk9ph3CvYLYQUhdd+l/pOURGcCfglZZz/czx3rEv0thQtzx5IjPsvI3Ocy/l3cAOuKiYsnlAMVga3eo2NE0CtiNz2YPlLlOn92L/XuvWqMM7dxBvSvre0/CxrbpwIwOFq3+2qbk8QFT4f5QpAxmgKJwl9eRUMokvqZxFzVIVGLR9qXmdt1op4gZ/xFxjlbRocPm/bz0pJRcWi8Vj65gNRHUMgWJ3ywWqmsR1Hxv31DZR2YNbt1YmDE1jU5FFZ2n+VYw7Sbrk843ZAzu5UpWguBicb0HcGjaYLCdid9Ve/wOTXdZ+Qcm2nxCbmA==; 5:IP7Owxm/OMka00X1bY1Z9aCtdezbFKafPRvMwta+7OfpkUfl0n6txNh8VszqjaYW3CHxuBBC+Bhsl8F2vSvrY4moAjf2QvWi17OidA1EdUq5WCzLHvoWgWQTzjE/zP/eWQs25ZUMNqQZbXok+QnCR8KiZGgrY/Mg/M7CMEyfB1Y=; 24:9iMGN3BiIKsAin6REhKIVDzi5CWlwWQWhvup9vd4AmSCt6AaPCRBBGHpSI5M5ZnLgZxK5PGjI/rAal3/uPQGebFqHimNKc2yLpkOEvW9PQY=; 7:E+2sTZFr4B+GTJVuIocKV4CMk1tPmFBB2JHtnLcUlhPSl6srccWRVs7ProLT4HSH0wxxxTkfOPVwn/ypeUrYPEtkD0rd7xIyX8ES1H/AOHxyGb5agY0g+oV4KsI6wEddGlihqqAc5Emi4Cvdg6JZG4WrdO0iT9WE1Q1StVo0MFuXuhVuOiWQJRZpNe3nTkqlXo8gJWAzyYCKBVVSDfamz00T5z6Z0H4/Pf3QA55NSlM= x-ms-exchange-antispam-srfa-diagnostics: SSOS; x-ms-office365-filtering-correlation-id: 1a0f347e-147d-47e8-0b4d-08d4f97aa3b2 x-microsoft-antispam: UriScan:; BCL:0; PCL:0; 
RULEID:(300000500095)(300135000095)(300000501095)(300135300095)(22001)(300000502095)(300135100095)(2017030254152)(300000503095)(300135400095)(2017052603199)(201703131423075)(201703031133081)(201702281549075)(300000504095)(300135200095)(300000505095)(300135600095)(300000506095)(300135500095); SRVR:DM5PR10MB1483; x-ms-traffictypediagnostic: DM5PR10MB1483: x-exchange-antispam-report-test: UriScan:(190756311086443)(75325880899374)(56005881305849); x-microsoft-antispam-prvs: x-exchange-antispam-report-cfa-test: BCL:0; PCL:0; RULEID:(100000700101)(100105000095)(100000701101)(100105300095)(100000702101)(100105100095)(6040450)(2401047)(8121501046)(5005006)(93006095)(93001095)(3002001)(10201501046)(100000703101)(100105400095)(6041248)(20161123558100)(20161123564025)(20161123555025)(20161123560025)(20161123562025)(201703131423075)(201702281528075)(201703061421075)(201703061406153)(6072148)(201708071742011)(100000704101)(100105200095)(100000705101)(100105500095); SRVR:DM5PR10MB1483; BCL:0; PCL:0; RULEID:(100000800101)(100110000095)(100000801101)(100110300095)(100000802101)(100110100095)(100000803101)(100110400095)(100000804101)(100110200095)(100000805101)(100110500095); SRVR:DM5PR10MB1483; x-forefront-prvs: 042857DBB5 x-forefront-antispam-report: SFV:NSPM; SFS:(10009020)(6009001)(39830400002)(189002)(24454002)(13464003)(199003)(9686003)(5660300001)(6306002)(8676002)(4326008)(68736007)(305945005)(7736002)(86362001)(3660700001)(53546010)(99286003)(7696004)(55016002)(3280700002)(39060400002)(14454004)(966005)(478600001)(6246003)(53936002)(81166006)(81156014)(189998001)(6506006)(102836003)(50986999)(6436002)(74316002)(6116002)(3846002)(101416001)(66066001)(2900100001)(1411001)(33656002)(54356999)(76176999)(97736004)(110136004)(77096006)(105586002)(8936002)(72206003)(6916009)(2950100002)(2906002)(106356001)(229853002)(25786009)(316002); DIR:OUT; SFP:1101; SCL:1; SRVR:DM5PR10MB1483; H:DM5PR10MB1754.namprd10.prod.outlook.com; FPR:; SPF:None; PTR:InfoNoRecords; MX:1; A:1; LANG:en; 
spamdiagnosticoutput: 1:99 spamdiagnosticmetadata: NSPM Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-MS-Exchange-CrossTenant-originalarrivaltime: 12 Sep 2017 01:07:29.1732 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: 2563c132-88f5-466f-bbb2-e83153b3c808 X-MS-Exchange-Transport-CrossTenantHeadersStamped: DM5PR10MB1483 X-OriginatorOrg: watchguard.com Received-SPF: none X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 12 Sep 2017 01:07:33 -0000 Thanks!=20 I will try to tune this sysctl, let's see what to happen. -----Original Message----- From: Konstantin Belousov [mailto:kostikbel@gmail.com]=20 Sent: Monday, September 11, 2017 16:09 To: Zhixin Wan Cc: freebsd-hackers@freebsd.org Subject: Re: OOM-killer can't work on FreeBSD 11.0 On Mon, Sep 11, 2017 at 03:56:45AM +0000, Zhixin Wan via freebsd-hackers wr= ote: > Hi, >=20 > I have a mail system running FreeBSD 9.3 which is put on VMWare ESXi, it'= s assigned a low memory (1G or 2G) and a reasonable swap disk size (2 x Mem= ory size). > The mail system was running for several years, and didn't see any freeze = even a lot of mail traffic through it. >=20 > Recently I upgraded this mail system from FreeBSD 9.3 to FreeBSD 11.0,=20 > and after running a few days, the mail system got freeze. I can't get any= response from the console, and can't login to the mail system with SSH eit= her, except ping to the system got response. 
I looked into the message log and found a lot of messages:
>
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(4): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(4): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(5): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(4): failed
> swap_pager: out of swap space
> swap_pager_getswapspace(1): failed
> swap_pager_getswapspace(16): failed
> swap_pager_getswapspace(12): failed
> swap_pager_getswapspace(9): failed
> swap_pager_getswapspace(16): failed
> ...
>
> It seems that running out of swap causes the system to freeze.
>
> To figure out this problem, I restored the mail system to a previous backup snapshot running FreeBSD 9.3.
> I put mail traffic pressure on the mail system and observed the memory and swap space usage with a simple shell script:
>
> #!/bin/sh
> while [ 1 ]; do
>     vmstat
>     pstat -s
>     sleep 60
> done
>
> From the console, I saw the memory and swap space usage increase quickly.
Once the swap space was used up, out-of-swap messages appeared in the message log:
>
> swap_pager_getswapspace(4): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(3): failed
> swap_pager_getswapspace(4): failed
> swap_pager_getswapspace(6): failed
> swap_pager_getswapspace(2): failed
> swap_pager_getswapspace(2): failed
> swap_pager_getswapspace(2): failed
> swap_pager_getswapspace(5): failed
> swap_pager_getswapspace(8): failed
> swap_pager_getswapspace(2): failed
> swap_pager_getswapspace(4): failed
> Sep 6 08:30:58 mail-system kernel: pid 92324 (bm_scanner), uid 5500, was killed: out of swap space
>
> As on FreeBSD 11.0, there are still a lot of "swap_pager_getswapspace failed" messages, but FreeBSD 9.3 kills a process to free memory.
> This behavior lets the mail system keep running on FreeBSD 9.3, but not on FreeBSD 11.0. Observing the system memory and swap space usage continuously, the OOM-killer works accurately: once the swap space usage is 100%, the OOM-killer is called to kill a process to free memory.

No, this is not the right behaviour. Filling up the swap space must not cause the OOM to trigger (in the default setup of swap overcommit turned off).

>
> Digging into the source code of FreeBSD 9.3, file vm_pageout.c, function vm_pageout_scan():
>         /*
>          * If we are critically low on one of RAM or swap and low on
>          * the other, kill the largest process.  However, we avoid
>          * doing this on the first pass in order to give ourselves a
>          * chance to flush out dirty vnode-backed pages and to allow
>          * active pages to be moved to the inactive queue and reclaimed.
>          */
>         if (pass != 0 &&
>             ((swap_pager_avail < 64 && vm_page_count_min()) ||
>              (swap_pager_full && vm_paging_target() > 0)))
>                 vm_pageout_oom(VM_OOM_MEM);
>
> The corresponding source code in FreeBSD 11.0, file vm_pageout.c, function vm_pageout_scan():
>         /*
>          * If the inactive queue scan fails repeatedly to meet its
>          * target, kill the largest process.
>          */
>         vm_pageout_mightbe_oom(vmd, page_shortage,
>             starting_page_shortage);
>
> The OOM-killer function vm_pageout_oom() is wrapped by the function vm_pageout_mightbe_oom().
>
> To find out in which commit this behavior changed, I searched the FreeBSD SVN page and found a clue.
> https://svnweb.freebsd.org/base?view=revision&revision=290920
> In SVN commit r290920, a new sysctl node called vm.pageout_oom_seq was added to control the sensitivity of the OOM-killer.
> The default value of pageout_oom_seq is 12; the commit log said:
>     The number of passes to trigger OOM was selected empirically and
>     tested both on small (32M-64M i386 VM) and large (32G amd64)
>     configurations.
>
> However, in my case, even with vm.pageout_oom_seq at its default of 12, it didn't work as expected.

So lower the sysctl. The lower the value, the more sensitive OOM is to the lack of pagedaemon progress.

> I suspect it's a bug, but I'm not sure, since I don't fully understand this code.
> I just want the OOM-killer to behave on FreeBSD 11.0 like it does on FreeBSD 9.3.

FreeBSD 9 OOM behavior was buggy; it caused serious issues on small machines and on swap-less setups. The new OOM trigger might require some manual tuning for a specific combination of workload and machine config.

> Does anyone know how to solve it?
>
> Thanks!
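
The sensitivity that vm.pageout_oom_seq controls can be pictured with a small userland model. This is a simplified sketch, not the kernel's vm_pageout_mightbe_oom(): the names model_domain and model_mightbe_oom are hypothetical, and it assumes the trigger counts consecutive pagedaemon passes in which the inactive queue scan made no progress on the page shortage, firing once the count reaches the configured limit. Resetting the counter after firing is a simplification made here for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-domain state: consecutive passes without progress. */
struct model_domain {
	int oom_seq;
};

/*
 * Model of one pagedaemon pass.  starting_shortage is the page shortage
 * before the inactive queue scan and shortage is what remains after it.
 * oom_seq_limit plays the role of the vm.pageout_oom_seq sysctl.
 * Returns true when the model would invoke the OOM killer.
 */
static bool
model_mightbe_oom(struct model_domain *d, int oom_seq_limit, int shortage,
    int starting_shortage)
{
	assert(oom_seq_limit > 0);
	if (starting_shortage <= 0 || starting_shortage != shortage)
		d->oom_seq = 0;		/* no shortage, or the scan made progress */
	else
		d->oom_seq++;		/* another pass with no progress */
	if (d->oom_seq < oom_seq_limit)
		return (false);
	d->oom_seq = 0;			/* simplification: start counting afresh */
	return (true);
}
```

With a limit of 12 the model fires on the twelfth consecutive stuck pass; with a limit of 3 it fires on the third, which is the sense in which a lower value makes the OOM trigger more sensitive.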
>=20 >=20 > _______________________________________________ > freebsd-hackers@freebsd.org mailing list=20 > https://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org= " From owner-freebsd-hackers@freebsd.org Tue Sep 12 07:19:10 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 47681E1E11D for ; Tue, 12 Sep 2017 07:19:10 +0000 (UTC) (envelope-from ilovefd@topaz.plala.or.jp) Received: from msa02y.plala.or.jp (msa02.plala.or.jp [58.93.240.2]) by mx1.freebsd.org (Postfix) with ESMTP id C31B16556D for ; Tue, 12 Sep 2017 07:19:08 +0000 (UTC) (envelope-from ilovefd@topaz.plala.or.jp) Received: from msc01.plala.or.jp ([172.23.12.31]) by msa02y.plala.or.jp with ESMTP id <20170912071906.OJAL6311.msa02y.plala.or.jp@msc01.plala.or.jp> for ; Tue, 12 Sep 2017 16:19:06 +0900 Received: from [192.168.22.65] (really [153.167.4.99]) by msc01.plala.or.jp with ESMTP id <20170912071906.GOEQ5677.msc01.plala.or.jp@[192.168.22.65]> for ; Tue, 12 Sep 2017 16:19:06 +0900 From: ilovefd Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\)) Subject: FBSD11.1: Portmaster -d x11/slim x11-themes/slim-themes doesn't work. Message-Id: <7B438374-761B-44B2-B8EC-E74366AEFD12@topaz.plala.or.jp> Date: Tue, 12 Sep 2017 16:19:06 +0900 To: freebsd-hackers@freebsd.org X-Mailer: Apple Mail (2.3273) X-VirusScan: Outbound; msa02m; Tue, 12 Sep 2017 16:19:07 +0900 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 12 Sep 2017 07:19:10 -0000 Hi I am trying to setup MATE GUI on FreeBSD11.1 and have a trouble. M/B Is N3050N-D3H. 
CPU is Celeron. portmaster -d x11/slim x11-themes/slim-themes Doesn=E2=80=99t work. It=E2=80=99s saying " could not fetch nasa-2.13.01.tar.xz=E2=80=9D. Does my trouble come from CPU? From owner-freebsd-hackers@freebsd.org Tue Sep 12 07:51:03 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 528EDE1FAA8 for ; Tue, 12 Sep 2017 07:51:03 +0000 (UTC) (envelope-from auryn@zirakzigil.org) Received: from mx1.etoilesoft.fr (mx1.etoilesoft.fr [52.57.51.18]) by mx1.freebsd.org (Postfix) with ESMTP id 14019665DF for ; Tue, 12 Sep 2017 07:51:02 +0000 (UTC) (envelope-from auryn@zirakzigil.org) Received: from mx1.etoilesoft.fr (localhost [127.0.0.1]) by mx1.etoilesoft.fr (Postfix) with ESMTP id C2B979C945; Tue, 12 Sep 2017 07:51:55 +0000 (UTC) Received: from [192.168.43.15] (localhost [127.0.0.1]) (Authenticated sender: auryn@zirakzigil.org) by mx1.etoilesoft.fr (Postfix) with ESMTPA id 96EEC9C944; Tue, 12 Sep 2017 07:51:55 +0000 (UTC) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (1.0) Subject: Re: devd in jail From: Giulio Ferro X-Mailer: iPhone Mail (14G60) In-Reply-To: <20170911161253.Horde.vawLu00EtbbHOVeJRXjp7N0@webmail.leidinger.net> Date: Tue, 12 Sep 2017 09:50:54 +0200 Cc: freebsd-hackers@freebsd.org Content-Transfer-Encoding: quoted-printable Message-Id: <3236AD55-0D14-49A5-B5B9-3147A216D8A5@zirakzigil.org> References: <20170810225439.Horde.1s8Qi_dlNtxgEigsNKbdrer@webmail.leidinger.net> <4a1a99a5-35ea-19c9-7ac8-77875ac6f71f@zirakzigil.org> <20170905151537.Horde.10cHNOX1OVri7mGaUcDeX1l@webmail.leidinger.net> <7ca865ee-b613-2f0c-daf0-d828884b5e74@zirakzigil.org> <1C181EF2-B8B1-4F42-BF80-ABEA0593DD43@dsl-only.net> <20170906122556.Horde.5OdDwtii7HXPNArY77YUyBi@webmail.leidinger.net> <20170906221947.Horde.RITHvdc1wVE9v0-3nBavR0Z@webmail.leidinger.net> 
<20170909150335.Horde.wBLIPwBuhV3lyQlBxKud39f@webmail.leidinger.net> <27e72cfb-54cf-4af8-b569-85fff089c45f@zirakzigil.org> <20170911161253.Horde.vawLu00EtbbHOVeJRXjp7N0@webmail.leidinger.net> To: Alexander Leidinger X-Virus-Scanned: ClamAV using ClamSMTP X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 12 Sep 2017 07:51:03 -0000

Hi Alexander,

I don't know how to set your parameter in old style jails in rc.conf.

In there, new parameters are mapped to old style jail_xxx entries, but there are none for your parameter unless I'm mistaken.

Can you please tell me exactly what I should put in rc.conf? I've already moved the jail.conf file.

Thanks.
Giulio

> On 11 Sep 2017, at 16:12, Alexander Leidinger wrote:
>
> Quoting Giulio Ferro (from Mon, 11 Sep 2017 06:42:01 +0200):
>
>>> On 09/09/2017 15:03, Alexander Leidinger wrote:
>>> Please run this:
>>> strings /boot/kernel/kernel| grep allow.kmem
>>>
>>> If it doesn't print out "allow.kmem_access", then your kernel doesn't contain the patch.
>>>
>>> Bye,
>>> Alexander.
>>>
>>
>> # strings /boot/kernel/kernel | grep allow.kmem
>> allow.kmem_access
>>
>> So it seems the kernel is ok...
>>
>>
>> Maybe I can set this value at boot in /boot/loader.conf?
>
> No, this is not a loader.conf setting.
>
> Can you try to use "old style" jail config = settings in rc.conf instead of using jail.conf on a test system? It may be that you need to move the jail.conf away temporarily; I haven't checked which takes precedence when both (rc.conf and jail.conf) settings are there.
>
> Bye,
> Alexander.
>=20 > --=20 > http://www.Leidinger.net Alexander@Leidinger.net: PGP 0x8F31830F9F2772BF > http://www.FreeBSD.org netchild@FreeBSD.org : PGP 0x8F31830F9F2772BF From owner-freebsd-hackers@freebsd.org Wed Sep 13 15:55:51 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 6F9B0E026C2 for ; Wed, 13 Sep 2017 15:55:51 +0000 (UTC) (envelope-from mwlucas@mail.michaelwlucas.com) Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3]) by mx1.freebsd.org (Postfix) with ESMTP id 587916ADCC for ; Wed, 13 Sep 2017 15:55:51 +0000 (UTC) (envelope-from mwlucas@mail.michaelwlucas.com) Received: by mailman.ysv.freebsd.org (Postfix) id 57CF4E026C1; Wed, 13 Sep 2017 15:55:51 +0000 (UTC) Delivered-To: hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 5736AE026BF for ; Wed, 13 Sep 2017 15:55:51 +0000 (UTC) (envelope-from mwlucas@mail.michaelwlucas.com) Received: from mail.michaelwlucas.com (mail.michaelwlucas.com [104.236.197.233]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 2079A6ADCB for ; Wed, 13 Sep 2017 15:55:50 +0000 (UTC) (envelope-from mwlucas@mail.michaelwlucas.com) Received: from mail.michaelwlucas.com (localhost [127.0.0.1]) by mail.michaelwlucas.com (8.15.2/8.15.2) with ESMTP id v8DFtgiO025935 for ; Wed, 13 Sep 2017 11:55:43 -0400 (EDT) (envelope-from mwlucas@mail.michaelwlucas.com) Received: (from mwlucas@localhost) by mail.michaelwlucas.com (8.15.2/8.15.2/Submit) id v8DFtgOb025934 for hackers@freebsd.org; Wed, 13 Sep 2017 11:55:42 -0400 (EDT) (envelope-from mwlucas) Date: Wed, 13 Sep 2017 11:55:42 -0400 From: "Michael W. 
Lucas" To: hackers@freebsd.org Subject: required kernel rebuilds Message-ID: <20170913155542.GA25871@mail.michaelwlucas.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.8.3 (2017-05-23) X-Spam-Status: No, score=0.0 required=5.0 tests=UNPARSEABLE_RELAY autolearn=ham autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on mail.michaelwlucas.com X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.6.2 (mail.michaelwlucas.com [127.0.0.1]); Wed, 13 Sep 2017 11:55:44 -0400 (EDT) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 13 Sep 2017 15:55:51 -0000 Hi, Book research question. Way back in the Dark Ages of the 1990s, it wasn't uncommon to rebuild a kernel to fix a recurring panic. You'd have to tune MAXUSERS, or PMAP_SHPGPERPROC. AFAIK, dang near everything is tunable in either loader.conf or sysctl.conf. I really want to say that kernel rebuilds for these kinds of limits aren't needed any more. Does anyone have a counter-example, though? Is anything possibly crash-inducing non-tunable without a kernel rebuild? Thanks, ==ml -- Michael W. 
Lucas https://mwl.io/ nonfiction: https://www.michaelwlucas.com/ fiction: https://www.michaelwarrenlucas.com/ From owner-freebsd-hackers@freebsd.org Wed Sep 13 16:07:36 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id EA391E03279 for ; Wed, 13 Sep 2017 16:07:36 +0000 (UTC) (envelope-from asomers@gmail.com) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id C72F06B813 for ; Wed, 13 Sep 2017 16:07:36 +0000 (UTC) (envelope-from asomers@gmail.com) Received: by mailman.ysv.freebsd.org (Postfix) id C31A5E03278; Wed, 13 Sep 2017 16:07:36 +0000 (UTC) Delivered-To: hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C257AE03277 for ; Wed, 13 Sep 2017 16:07:36 +0000 (UTC) (envelope-from asomers@gmail.com) Received: from mail-lf0-x22b.google.com (mail-lf0-x22b.google.com [IPv6:2a00:1450:4010:c07::22b]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 473316B812 for ; Wed, 13 Sep 2017 16:07:36 +0000 (UTC) (envelope-from asomers@gmail.com) Received: by mail-lf0-x22b.google.com with SMTP id l196so1895044lfl.1 for ; Wed, 13 Sep 2017 09:07:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc; bh=6gv/QkK4PMg+Lew1BRf18M+unKNfByaApZSM0kwUBg0=; b=bHAyglC2xlnUUBS69KYKN2KUTjvfT+2sw6tB5NKkt/jlvZ6uWoti8Mua57JohxMQBS lIQLZm7t1kLlD6ijyz7t3gAMeyM/raAccEYwMGBspZ2WZo5hcwXyAhMDQyv7T+wrGxn2 0Pi4J0nncxjINwALU/68hVlzi7N2tA6FhY2zOCyGs6ZQOYGxv+NUGGbaWx68tUZgI5Fa 
UGSfl2spwiCvWjUutlkl6hZXPd90qzQgonbnqKAlc0ZVx9a56zeaY7STVzoZsdVSi60O Qj3VxMip6AydiVeIGIfbb+K/3zEW60AOAMJnaV76ymPndVd7g1MxejYCpVqJBiB3JNZ/ k0FA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:sender:in-reply-to:references:from :date:message-id:subject:to:cc; bh=6gv/QkK4PMg+Lew1BRf18M+unKNfByaApZSM0kwUBg0=; b=pmbsJTwSJLbaui17uSjq2higyaOcT+1NEI30+Klx/IRKsQG0OTPod6B7+ssHmtunRM 6ulS74r3zMNbY44kIe7DHmNQLPHNJHDaBe88YYRhP1o/8CE80gNHov+sE+glELO211Tj yFPgxB9A04u163hYjdPPw7hDzjQjdae2zUnjxOpZp+O9yEHZRmzZZtSXcI7FF5q/k4/v EPm8Do5UCFrX4doC399ozaHKIcg/cdfR5UF3g6RLFkuIeKGx7WJxVpc7tOmh+j655EWa eQ3Gmw3yJEHj85a1KgXwdHfCLdyKMZKLOHMHyDx/Z0zadjEQY0ZWuiQQv/Nvnf0QB2im tyJw== X-Gm-Message-State: AHPjjUhApVWTGFgkThh9KIJ4SJ8V8VK4Mf/H63X05IFxlNU614KJFeRH zZDtndpS99GVmuafxX5X6LvP8tiZeuP0NWiEzso= X-Google-Smtp-Source: AOwi7QBJgfUYza7G8yMLUm0Pb8bpAXRtLWQkcgylpRQiSpp1jq55VBm92vlc3xByOL5Ue2uHjvVuHGyJnNg9Fji/kPU= X-Received: by 10.25.217.203 with SMTP id s72mr5748568lfi.68.1505318854191; Wed, 13 Sep 2017 09:07:34 -0700 (PDT) MIME-Version: 1.0 Sender: asomers@gmail.com Received: by 10.179.26.6 with HTTP; Wed, 13 Sep 2017 09:07:33 -0700 (PDT) In-Reply-To: <20170913155542.GA25871@mail.michaelwlucas.com> References: <20170913155542.GA25871@mail.michaelwlucas.com> From: Alan Somers Date: Wed, 13 Sep 2017 10:07:33 -0600 X-Google-Sender-Auth: 1gUYETpApJGr2N7vQ6EMU77a8z8 Message-ID: Subject: Re: required kernel rebuilds To: "Michael W. Lucas" Cc: "freebsd-hackers@freebsd.org" Content-Type: text/plain; charset="UTF-8" X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 13 Sep 2017 16:07:37 -0000 On Wed, Sep 13, 2017 at 9:55 AM, Michael W. Lucas wrote: > Hi, > > Book research question. 
> > Way back in the Dark Ages of the 1990s, it wasn't uncommon to rebuild > a kernel to fix a recurring panic. You'd have to tune MAXUSERS, or > PMAP_SHPGPERPROC. > > AFAIK, dang near everything is tunable in either loader.conf or > sysctl.conf. I really want to say that kernel rebuilds for these kinds > of limits aren't needed any more. > > Does anyone have a counter-example, though? Is anything possibly > crash-inducing non-tunable without a kernel rebuild? > > Thanks, > ==ml Enabling VIMAGE still requires a kernel rebuild, because of the possibility of crashes. I've seen such crashes on 10.2 or 10.3. I don't know if they're fixed in 11.1, but I haven't seen any yet. Changing MAXPHYS also requires a kernel rebuild. AFAIK MAXPHYS isn't related to any kernel crashes, but it does have performance implications. -Alan From owner-freebsd-hackers@freebsd.org Wed Sep 13 16:30:53 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C31DFE04517 for ; Wed, 13 Sep 2017 16:30:53 +0000 (UTC) (envelope-from brooks@spindle.one-eyed-alien.net) Received: from spindle.one-eyed-alien.net (spindle.one-eyed-alien.net [199.48.129.229]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id A2CB96CB78; Wed, 13 Sep 2017 16:30:53 +0000 (UTC) (envelope-from brooks@spindle.one-eyed-alien.net) Received: by spindle.one-eyed-alien.net (Postfix, from userid 3001) id 462385A9F12; Wed, 13 Sep 2017 16:30:46 +0000 (UTC) Date: Wed, 13 Sep 2017 16:30:46 +0000 From: Brooks Davis To: "Ngie Cooper (yaneurabeya)" Cc: Mariusz Zaborski , FreeBSD Hackers Subject: Re: Bypassing libcasper (for fun and profit)...? 
Message-ID: <20170913163046.GB89845@spindle.one-eyed-alien.net> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.8.3 (2017-05-23) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 13 Sep 2017 16:30:53 -0000

On Mon, Sep 11, 2017 at 01:55:13PM -0700, Ngie Cooper (yaneurabeya) wrote:
> Hi Mariusz,
> I'm trying to get runtime coverage to work on ^/projects/runtime-coverage . I noticed that libcasper is unfortunately getting in the way of writing .gcda files via install(1) (libgcov and libprofile_rt hijack binaries and libraries to write out profiling information at runtime, which doesn't seem to jive with libcasper). I assume install(1) won't be the only program to have this issue, so a generic solution should probably be implemented.
> I was wondering if there's a way that you would like to design a formalized hook to bypass this, was wondering if you think conditionally compiling out the functions (as stubs) was the proper way to do it (security vs flexibility), have the library implement weak symbols and override with another library using strong symbols as a bypass, or use a combination of LD_PRELOAD and hooks to disable libcasper/turn the calls into warnings (instead of fatal errors).

I attempted to suggest solutions and then realized that I don't
understand the problem. How does install interact with libgcov and
libprofile_rt?
-- Brooks From owner-freebsd-hackers@freebsd.org Wed Sep 13 17:24:28 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2525AE06D72 for ; Wed, 13 Sep 2017 17:24:28 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id F27476F435 for ; Wed, 13 Sep 2017 17:24:27 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: by mailman.ysv.freebsd.org (Postfix) id F1B61E06D6F; Wed, 13 Sep 2017 17:24:27 +0000 (UTC) Delivered-To: hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id F1484E06D6D for ; Wed, 13 Sep 2017 17:24:27 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: from mail-io0-x229.google.com (mail-io0-x229.google.com [IPv6:2607:f8b0:4001:c06::229]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id B9B636F434 for ; Wed, 13 Sep 2017 17:24:27 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: by mail-io0-x229.google.com with SMTP id n69so4899768ioi.5 for ; Wed, 13 Sep 2017 10:24:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bsdimp-com.20150623.gappssmtp.com; s=20150623; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc; bh=619AVfm25gEAAsH7nF28tSbhzeJN7xLy5R1xdEFGkwU=; b=ZSbLndx/SJGlOmjj21RDcreY10UfDPYC4n1UzCrkukSYLXNRT071LTNzh9nB7G4s2F I1WnsEUil3znmxPLHMFlZNiM0KAPFL37/ELXmWwiYjpiE6Ba89ztKBdQKOr/RN6nYc7j tuOSI8WycoLvZ37QPnliOyG+wGIeRawzrB+zlFrirPmjfgA+7s5cnXQl4+ryvfRvepJZ I+y9+IgjoBOi2sfSEhYcjo5etOU83yzMpn9T/bdBjEBEdZxc2vWj2LBj10nzXrVUhZ/X 
WCBaixgj7CjnT4Vn9Nrlvg5v/s1UGD+jpqDxO4YOruKAYoxt6h0ebff8VUmm6dYP33AC OItQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:sender:in-reply-to:references:from :date:message-id:subject:to:cc; bh=619AVfm25gEAAsH7nF28tSbhzeJN7xLy5R1xdEFGkwU=; b=FRKCwbLfq8K9XWZYqGrIGvoncW4Optt0CKPW1yJJYadtz8uckmeLgvguGHgy2zrQrn 9yKG/xMC2UnqZL25nryoCc54ijdAy19b1FyX7waP8zEK6WEDDb2yZS/esYYcdI48UMg8 L6BMlxTSyz2c8to8qB7XLGxxlJsvb8jkokCRlVB1C26PQ/THLp0dyd2TKPijXwExDFkm QnlEjz5LtvtwNT4d9esTL+Pf6LFB9HmQ9amiZ3bMfaz7vdIorDdjtpthEqpF+k133SCJ /9sZujg//bmrlTiTkc4nAI/AjhKV0Z0YFQNl8Knyj7h5LJ1Z3ypMedjHg2ljFUbD3i+X FiTQ== X-Gm-Message-State: AHPjjUheGdL50AVrDtJtTHDB3TXkRJGrDJeongmBoGqdhC4fpGaRFwzb EPN7RdLUV+t1d+I91UjPuc1xqm20w6iL5cfs9BsFQQ== X-Google-Smtp-Source: AOwi7QCgpbUrZPlgWqFT7C5MNDFkwl5LErY6NHP/ro1U0CfW5HVsC9a5ntgFBeQmVAdQWjY7mMCjaw9XhqIkWfkxIx4= X-Received: by 10.107.135.147 with SMTP id r19mr25478480ioi.26.1505323467166; Wed, 13 Sep 2017 10:24:27 -0700 (PDT) MIME-Version: 1.0 Sender: wlosh@bsdimp.com Received: by 10.79.10.71 with HTTP; Wed, 13 Sep 2017 10:24:26 -0700 (PDT) X-Originating-IP: [2603:300b:6:5100:60a7:d7e6:aef8:c203] In-Reply-To: References: <20170913155542.GA25871@mail.michaelwlucas.com> From: Warner Losh Date: Wed, 13 Sep 2017 11:24:26 -0600 X-Google-Sender-Auth: S9TsR6D1tkhA-hEJBv-X5bVNcMI Message-ID: Subject: Re: required kernel rebuilds To: Alan Somers Cc: "Michael W. Lucas" , "freebsd-hackers@freebsd.org" Content-Type: text/plain; charset="UTF-8" X-Content-Filtered-By: Mailman/MimeDel 2.1.23 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 13 Sep 2017 17:24:28 -0000 On Wed, Sep 13, 2017 at 10:07 AM, Alan Somers wrote: > On Wed, Sep 13, 2017 at 9:55 AM, Michael W. Lucas > wrote: > > Hi, > > > > Book research question. 
> > > > Way back in the Dark Ages of the 1990s, it wasn't uncommon to rebuild > > a kernel to fix a recurring panic. You'd have to tune MAXUSERS, or > > PMAP_SHPGPERPROC. > > > > AFAIK, dang near everything is tunable in either loader.conf or > > sysctl.conf. I really want to say that kernel rebuilds for these kinds > > of limits aren't needed any more. > > > > Does anyone have a counter-example, though? Is anything possibly > > crash-inducing non-tunable without a kernel rebuild? > > > > Thanks, > > ==ml > > Enabling VIMAGE still requires a kernel rebuild, because of the > possibility of crashes. I've seen such crashes on 10.2 or 10.3. I > don't know if they're fixed in 11.1, but I haven't seen any yet. > Changing MAXPHYS also requires a kernel rebuild. AFAIK MAXPHYS isn't > related to any kernel crashes, but it does have performance > implications. > MAXPHYS isn't currently dynamic. We have to set it at Netflix because we want to do super large I/Os. We'd just bump it in base, but it requires extra memory for every single I/O (since there's arrays in struct buf / bio based on this size for pages related to the I/O). I've had one kernel panic that was fixed by changing MAXPHYS, but that was due to a stupid driver bug... 
Warner From owner-freebsd-hackers@freebsd.org Wed Sep 13 23:26:33 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id EBCC6E152AA for ; Wed, 13 Sep 2017 23:26:33 +0000 (UTC) (envelope-from eugen@grosbein.net) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id CF93E7F3B4 for ; Wed, 13 Sep 2017 23:26:33 +0000 (UTC) (envelope-from eugen@grosbein.net) Received: by mailman.ysv.freebsd.org (Postfix) id CEDDDE152A7; Wed, 13 Sep 2017 23:26:33 +0000 (UTC) Delivered-To: hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id CE676E152A5 for ; Wed, 13 Sep 2017 23:26:33 +0000 (UTC) (envelope-from eugen@grosbein.net) Received: from hz.grosbein.net (hz.grosbein.net [78.47.246.247]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "hz.grosbein.net", Issuer "hz.grosbein.net" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 60EB47F3B3 for ; Wed, 13 Sep 2017 23:26:32 +0000 (UTC) (envelope-from eugen@grosbein.net) Received: from eg.sd.rdtc.ru (root@eg.sd.rdtc.ru [62.231.161.221] (may be forged)) by hz.grosbein.net (8.15.2/8.15.2) with ESMTPS id v8DNQJBY079675 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 14 Sep 2017 01:26:20 +0200 (CEST) (envelope-from eugen@grosbein.net) X-Envelope-From: eugen@grosbein.net X-Envelope-To: mwlucas@michaelwlucas.com Received: from [10.58.0.4] ([10.58.0.4]) by eg.sd.rdtc.ru (8.15.2/8.15.2) with ESMTPS id v8DNQAkn004978 (version=TLSv1.2 cipher=DHE-RSA-AES128-SHA bits=128 verify=NOT); Thu, 14 Sep 2017 06:26:10 +0700 (+07) (envelope-from eugen@grosbein.net) Subject: Re: required kernel rebuilds To: "Michael W. 
Lucas" , hackers@freebsd.org References: <20170913155542.GA25871@mail.michaelwlucas.com> From: Eugene Grosbein Message-ID: <59B9BE8D.5050807@grosbein.net> Date: Thu, 14 Sep 2017 06:26:05 +0700 User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.7.2 MIME-Version: 1.0 In-Reply-To: <20170913155542.GA25871@mail.michaelwlucas.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=2.2 required=5.0 tests=BAYES_00, LOCAL_FROM, RDNS_NONE autolearn=no autolearn_force=no version=3.4.1 X-Spam-Report: * -2.3 BAYES_00 BODY: Bayes spam probability is 0 to 1% * [score: 0.0000] * 2.6 LOCAL_FROM From my domains * 1.9 RDNS_NONE Delivered to internal network by a host with no rDNS X-Spam-Level: ** X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on hz.grosbein.net X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 13 Sep 2017 23:26:34 -0000

13.09.2017 22:55, Michael W. Lucas wrote:

> Hi,
>
> Book research question.
>
> Way back in the Dark Ages of the 1990s, it wasn't uncommon to rebuild
> a kernel to fix a recurring panic. You'd have to tune MAXUSERS, or
> PMAP_SHPGPERPROC.
>
> AFAIK, dang near everything is tunable in either loader.conf or
> sysctl.conf. I really want to say that kernel rebuilds for these kinds
> of limits aren't needed any more.
>
> Does anyone have a counter-example, though? Is anything possibly
> crash-inducing non-tunable without a kernel rebuild?

A kernel stack overflow produces a "double fault" panic. While the i386 and
amd64 platforms have the loader tunable kern.kstack_pages to raise this limit
without rebuilding the kernel, other platforms still lack such a tunable
and offer only the kernel config file setting "options KSTACK_PAGES".
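For illustration only, the two ways of raising the limit might look like the fragment below. The value 6 is an arbitrary example, not a recommendation, and the two settings belong in different files as noted in the comments.

```
# /boot/loader.conf (i386 and amd64): loader tunable, no rebuild needed
kern.kstack_pages="6"

# Kernel configuration file (other platforms): needs a kernel rebuild
options 	KSTACK_PAGES=6
```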
See also https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219476 for details. From owner-freebsd-hackers@freebsd.org Thu Sep 14 00:47:51 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 7CC60E18877 for ; Thu, 14 Sep 2017 00:47:51 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: from asp.reflexion.net (outbound-mail-210-66.reflexion.net [208.70.210.66]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 3F9B781AF4 for ; Thu, 14 Sep 2017 00:47:50 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: (qmail 3235 invoked from network); 14 Sep 2017 00:53:11 -0000 Received: from unknown (HELO mail-cs-01.app.dca.reflexion.local) (10.81.19.1) by 0 (rfx-qmail) with SMTP; 14 Sep 2017 00:53:11 -0000 Received: by mail-cs-01.app.dca.reflexion.local (Reflexion email security v8.40.2) with SMTP; Wed, 13 Sep 2017 20:47:43 -0400 (EDT) Received: (qmail 13704 invoked from network); 14 Sep 2017 00:47:43 -0000 Received: from unknown (HELO iron2.pdx.net) (69.64.224.71) by 0 (rfx-qmail) with (AES256-SHA encrypted) SMTP; 14 Sep 2017 00:47:43 -0000 Received: from [192.168.1.109] (c-67-170-167-181.hsd1.or.comcast.net [67.170.167.181]) by iron2.pdx.net (Postfix) with ESMTPSA id 8E8B7EC7C39; Wed, 13 Sep 2017 17:47:42 -0700 (PDT) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\)) Subject: Re: FYI: Pine64+ 2GB (so A64) booting and non-debug vs. 
debug kernel: nondebug+INVARIANTS+INVARIANT_SUPPORT sufficient to boot From: Mark Millard In-Reply-To: Date: Wed, 13 Sep 2017 17:47:41 -0700 Cc: freebsd-hackers Content-Transfer-Encoding: quoted-printable Message-Id: <6D63486A-E933-4CC2-9A24-0688BE01A0DA@dsl-only.net> References: <1C18FF04-6772-4E9C-88C5-B8D5478C5809@dsl-only.net> To: Emmanuel Vadot , freebsd-arm X-Mailer: Apple Mail (2.3273) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 14 Sep 2017 00:47:51 -0000

[This time a debug kernel (including witness) and verbose
booting. Also showing what has spin locks active (none) and
what has critical sections mentioned on the back traces
(critical_exit).]

On 2017-Sep-12, at 11:16 PM, Mark Millard wrote:

> [Back to nooptions for INVARIANTS and INVARIANT_SUPPORT
> but now verbose booting. taskqgroup_adjust_softirq(0)...
> is the one to not get a "done." before failure.]
>
> On 2017-Sep-12, at 7:19 PM, Mark Millard wrote:
>
>> I took my normal GENERIC-NODBG (that includes GENERIC)
>> and changed INVARIANTS and INVARIANT_SUPPORT to have
>> "options" status instead of "nooptions" status. The
>> result boots (so far no counterexamples). (This is
>> head -r323246 .)
>>
>> So it appears that one or more INVARIANT tests are
>> "fixing" the Pine64+ 2GB boot problem. I've no clue
>> which. But other debug options are not required.
>>
>> FYI. . .
>>
>> # more /usr/src/sys/arm64/conf/GENERIC-NODBG
>> #
>> # GENERIC -- Custom configuration for the arm64/aarch64
>> #
>>
>> include "GENERIC"
>>
>> ident GENERIC-NODBG
>>
>> makeoptions DEBUG=-g # Build kernel with gdb(1) debug symbols
>>
>> options ALT_BREAK_TO_DEBUGGER
>>
>> options KDB # Enable kernel debugger support
>>
>> # For minimum debugger support (stable branch) use:
>> #options KDB_TRACE # Print a stack trace for a panic
>> options DDB # Enable the kernel debugger
>>
>> # Extra stuff:
>> #options VERBOSE_SYSINIT # Enable verbose sysinit messages
>> #options BOOTVERBOSE=1
>> #options BOOTHOWTO=RB_VERBOSE
>> #options KTR
>> #options KTR_MASK=KTR_TRAP
>> ##options KTR_CPUMASK=0xF
>> #options KTR_VERBOSE
>>
>> # Disable any extra checking for. . .
>> nooptions DEADLKRES # Enable the deadlock resolver
>> options INVARIANTS # Enable calls of extra sanity checking
>> options INVARIANT_SUPPORT # Extra sanity checks of internal structures, required by INVARIANTS
>> nooptions WITNESS # Enable checks to detect deadlocks and cycles
>> nooptions WITNESS_SKIPSPIN # Don't run witness on spinlocks for speed
>> nooptions DIAGNOSTIC
>> nooptions MALLOC_DEBUG_MAXZONES # Separate malloc(9) zones
>> nooptions BUF_TRACKING
>> nooptions FULL_BUF_TRACKING
>
> I've changed to have:
>
> options VERBOSE_SYSINIT # Enable verbose sysinit messages
> options BOOTVERBOSE=1
> options BOOTHOWTO=RB_VERBOSE
>
> and:
>
> nooptions INVARIANTS # Enable calls of extra sanity checking
> nooptions INVARIANT_SUPPORT # Extra sanity checks of internal structures, required by INVARIANTS
>
> The tail of the verbose failing boot looks like:
>
> . . .
> vt_upgrade(&vt_consdev)... done.
> subsystem b000000
> nfs_rootconf(0)... done.
> fhanew_init(0)... done.
> subsystem d000000
> proc0_post(0)... done.
> subsystem d800000
> sctp_syscalls_init(0)... done.
> selectinit(0)... done.
> subsystem dffff9c
> linker_preload_finish(0)... done.
> subsystem e000000
> kick_init(0)... done.
> kstack_cache_init(0)... done.
> subsystem e400000
> vm_pageout_init(0)... done.
> $x.1(&page_kp)... done.
> subsystem e800000
> $x.1(&vm_kp)... done.
> subsystem ea00000
> $x.1(&bufspace_kp)... done.
> $x.1(&buf_kp)... done.
> subsystem ec00000
> $x.1(&vnlru_kp)... done.
> $x.1(&up_kp)... done.
> subsystem ee00000
> acpi_acad_ac_only(0)... done.
> nfsiod_setup(0)... done.
> subsystem f000000
> release_aps(0)... Release APs
> APs not started
> done.
> tmr_setup_user_access(0)... done.
> intr_irq_shuffle(0)... done.
> tqg_record_smp_started(0)... done.
> netisr_start(0)... done.
> cpuset_init(0)... done.
> taskqgroup_adjust_if_config_tqg(0)... done.
> identify_cpu_sysinit(0)... CPU 0: ARM Cortex-A53 r0p4 affinity: 0
> Instruction Set Attributes 0 =
> Instruction Set Attributes 1 = <0>
> Processor Features 0 =
> Processor Features 1 = <0>
> Memory Model Features 0 = <4k Granule,64k Granule,MixedEndian,S/NS Mem,16bit ASID,1TB PA>
> Memory Model Features 1 = <>
> Debug Features 0 = <2 CTX Breakpoints,4 Watchpoints,6 Breakpoints,PMUv3,Debug v8>
> Debug Features 1 = <0>
> Auxiliary Features 0 = <0>
> Auxiliary Features 1 = <0>
> CPU 1: (null) (null) r0p0 affinity: 0
> CPU 2: (null) (null) r0p0 affinity: 0
> CPU 3: (null) (null) r0p0 affinity: 0
> done.
> taskqgroup_adjust_softirq(0)... x0: ffff000000a1c080
> x1: fffffd0001031a80
> x2: 3
> [ thread pid 0 tid 100055 ]
> Stopped at thread_lock_flags_+0x298: ldr w4, [x3, #156]
> db>
>
> taskqgroup_adjust_softirq seems to be from:
>
> /usr/src/sys/kern/subr_gtaskqueue.c :
>
> TASKQGROUP_DEFINE(softirq, mp_ncpus, 1);

[The above was a non-debug kernel with verbose messages.]

So a debug kernel with verbose boot messages:

CPU 1: (null) (null) r0p0 affinity: 0
CPU 2: (null) (null) r0p0 affinity: 0
CPU 3: (null) (null) r0p0 affinity: 0
done.
taskqgroup_adjust_softirq(0)...
panic: acquiring blockable sleep lock with spinlock or critical section held (sleep mutex) pmap @ /usr/src/sys/arm64/arm64/pmap.c:4710
cpuid = 0
time = 13

Thus the non-debug kernel boot-failures stop during
"taskqgroup_adjust_softirq(0)..." and that is also what the
debug kernel reports via witness (or invariant testing if witness
is disabled). Witness does catch the problem somewhat earlier
than invariant in the code sequence (when the race happens).
Without invariants (and without witness) the failure seems
to happen reliably.

For this witness context. . .

db> show allpcpu
Current CPU: 0

cpuid = 0
dynamic pcpu = 0x84af00
curthread = 0xfffffd0001415a80: pid 0 tid 100058 "softirq_1"
curpcb = 0xffff000069850cb0
fpcurthread = none
idlethread = 0xfffffd00005de000: tid 100003 "idle: cpu0"
spin locks held:

cpuid = 1
dynamic pcpu = 0x81324f00
curthread = none
curpcb = 0
fpcurthread = none
idlethread = 0xfffffd00005dda80: tid 100004 "idle: cpu1"
spin locks held:

cpuid = 2
dynamic pcpu = 0x81325f00
curthread = none
curpcb = 0
fpcurthread = none
idlethread = 0xfffffd00005dd540: tid 100005 "idle: cpu2"
spin locks held:

cpuid = 3
dynamic pcpu = 0x81326f00
curthread = none
curpcb = 0
fpcurthread = none
idlethread = 0xfffffd00005dd000: tid 100006 "idle: cpu3"
spin locks held:

So no spin locks held.

As for critical sections. . .

db> show allchains
. . . (just ones mentioning "on a run queue"). . .
chain 20:
 thread 100014 (pid 12, swi4: clock (0)) blocked on lock 0xffff000000c5d8e0 (sleep mutex) "Giant"
 thread 100000 (pid 0, swapper) is on a run queue
chain 21:
 thread 100002 (pid 1, kernel) blocked on lock 0xffff000000c5d8e0 (sleep mutex) "Giant"
 thread 100000 (pid 0, swapper) is on a run queue
. . .
db> thread 100000
[ thread pid 0 tid 100000 ]
0
db> bt
Tracing pid 0 tid 100000 td 0xffff000000acd580
sched_switch() at mi_switch+0x1b8
	 pc = 0xffff00000033f494  lr = 0xffff000000321754
	 sp = 0xffff0000000109f0  fp = 0xffff000000010a10

mi_switch() at critical_exit+0x84
	 pc = 0xffff000000321754  lr = 0xffff00000031e72c
	 sp = 0xffff000000010a20  fp = 0xffff000000010a30

critical_exit() at spinlock_exit+0x10
	 pc = 0xffff00000031e72c  lr = 0xffff0000005f83b4
	 sp = 0xffff000000010a40  fp = 0xffff000000010a50

spinlock_exit() at wakeup_one+0x30
	 pc = 0xffff0000005f83b4  lr = 0xffff00000032157c
	 sp = 0xffff000000010a60  fp = 0xffff000000010a70

wakeup_one() at grouptaskqueue_enqueue+0xcc
	 pc = 0xffff00000032157c  lr = 0xffff0000003533ec
	 sp = 0xffff000000010a80  fp = 0xffff000000010aa0

grouptaskqueue_enqueue() at taskqgroup_adjust+0x83c
	 pc = 0xffff0000003533ec  lr = 0xffff00000035483c
	 sp = 0xffff000000010ab0  fp = 0xffff000000010b40

taskqgroup_adjust() at mi_startup+0x254
	 pc = 0xffff00000035483c  lr = 0xffff0000002b5704
	 sp = 0xffff000000010b50  fp = 0xffff000000010bb0

mi_startup() at virtdone+0x54
	 pc = 0xffff0000002b5704  lr = 0xffff000000001084
	 sp = 0xffff000000010bc0  fp = 0x0000000000000000

From:

db> show threads
. . . (just ones mentioning critical_exit). . .
  100027 (0xfffffd000062e000) (stack 0xffff00006a58a000)
  100033 (0xfffffd0000796000) (stack 0xffff00006a5a9000)
  100034 (0xfffffd0000795a80) (stack 0xffff00006a5b6000)
  100003 (0xfffffd00005de000) (stack 0xffff000081baa000) sched_switch() at mi_switch+0x1b8
	 pc = 0xffff00000033f494  lr = 0xffff000000321754
	 sp = 0xffff000081bada20  fp = 0xffff000081bada40

mi_switch() at critical_exit+0x84
	 pc = 0xffff000000321754  lr = 0xffff00000031e72c
	 sp = 0xffff000081bada50  fp = 0xffff000081bada60

critical_exit() at cpu_idle+0x3c
	 pc = 0xffff00000031e72c  lr = 0xffff0000005f8308
	 sp = 0xffff000081bada70  fp = 0xffff000081bada80

cpu_idle() at sched_idletd+0xf4
	 pc = 0xffff0000005f8308  lr = 0xffff000000341b84
	 sp = 0xffff000081bada90  fp = 0xffff000081badb50

sched_idletd() at fork_exit+0x7c
	 pc = 0xffff000000341b84  lr = 0xffff0000002dbe74
	 sp = 0xffff000081badb60  fp = 0xffff000081badb90

fork_exit() at fork_trampoline+0x10
	 pc = 0xffff0000002dbe74  lr = 0xffff000000608664
	 sp = 0xffff000081badba0  fp = 0x0000000000000000
. . .

I did not find any other references to "critical".
Only swapper is listed on the run queue as far as
critical_exit goes. The other critical_exit's
were from cpu_idle.

It appears fairly likely to me that what witness
and invariants sometimes report is the same thing
that the non-debug kernels run into (more) reliably:
non-debug is likely hanging on the lock attempt while
(it appears that) a critical section is still active.

As near as I can tell, some invariant logic makes
the critical section vs. blockable lock conflict
far less likely to happen: some form of race.
Thus the invariant-only and full-debug kernels
usually boot, but not always.

(But I make no claim to be expert in these
areas.)
===
Mark Millard
markmi at dsl-only.net

From owner-freebsd-hackers@freebsd.org Thu Sep 14 03:45:39 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id CDA4DE22ED0 for ; Thu, 14 Sep 2017 03:45:39 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: from asp.reflexion.net (outbound-mail-210-66.reflexion.net [208.70.210.66]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 8C22D381D for ; Thu, 14 Sep 2017 03:45:39 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: (qmail 25475 invoked from network); 14 Sep 2017 03:45:37 -0000 Received: from unknown (HELO mail-cs-02.app.dca.reflexion.local) (10.81.19.2) by 0 (rfx-qmail) with SMTP; 14 Sep 2017 03:45:37 -0000 Received: by mail-cs-02.app.dca.reflexion.local (Reflexion email security v8.40.2) with SMTP; Wed, 13 Sep 2017 23:45:37 -0400 (EDT) Received: (qmail 29784 invoked from network); 14 Sep 2017 03:45:37 -0000 Received: from unknown (HELO iron2.pdx.net) (69.64.224.71) by 0 (rfx-qmail) with (AES256-SHA encrypted) SMTP; 14 Sep 2017 03:45:37 -0000 Received: from [192.168.1.109] (c-67-170-167-181.hsd1.or.comcast.net [67.170.167.181]) by iron2.pdx.net (Postfix) with ESMTPSA id D8CF2EC91CC; Wed, 13 Sep 2017 20:45:36 -0700 (PDT) From: Mark Millard Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\)) Subject: A question on possible A64 (Pine64+ 2GB) aarch64 blocked_lock misuse. . .
Message-Id: <91EBBD4A-DC93-44E0-A3DD-0DCECD5BB93C@dsl-only.net> Date: Wed, 13 Sep 2017 20:45:36 -0700 To: freebsd-arm , freebsd-hackers X-Mailer: Apple Mail (2.3273) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 14 Sep 2017 03:45:39 -0000

I've been trying to gather evidence for why head sometimes
hangs or panics on Pine64+ 2GB's (and other A64's?)
during:

taskqgroup_adjust_softirq(0)...

in the following contexts:

A) non-debug kernel build (no witness, no invariants): hang,
   possibly always (I've never seen a boot get past that point).

B) debug kernel build (witness and invariants): sometimes gets:

panic: acquiring blockable sleep lock with spinlock or critical section held (sleep mutex) pmap @ /usr/src/sys/arm64/arm64/pmap.c:4710

C) debug kernel build (invariants but no witness): sometimes gets
   a kassert failure

Exploring this, it appears that in all cases of explicitly
reported failure there is something like (witness example):

. . .
kassert_panic() at witness_checkorder+0x160
	 pc = 0xffff0000003174e4  lr = 0xffff000000374990
	 sp = 0xffff0000698503f0  fp = 0xffff000069850470

witness_checkorder() at __mtx_lock_flags+0xa8
	 pc = 0xffff000000374990  lr = 0xffff0000002f8b7c
	 sp = 0xffff000069850480  fp = 0xffff0000698504b0

__mtx_lock_flags() at pmap_fault+0x40
	 pc = 0xffff0000002f8b7c  lr = 0xffff000000606994
	 sp = 0xffff0000698504c0  fp = 0xffff0000698504e0

pmap_fault() at data_abort+0xb8
	 pc = 0xffff000000606994  lr = 0xffff000000608a9c
	 sp = 0xffff0000698504f0  fp = 0xffff0000698505a0

data_abort() at do_el1h_sync+0xfc
	 pc = 0xffff000000608a9c  lr = 0xffff0000006088f0
	 sp = 0xffff0000698505b0  fp = 0xffff0000698505e0
. . .
with the thread in question having the status of "blocked lock" (so blocked_lock in use):

db> show thread 100058
Thread 100058 at 0xfffffd0001415a80:
 proc (pid 0): 0xffff000000c5db88
 name: softirq_1
 stack: 0xffff00006984d000-0xffff000069850fff
 flags: 0x4010004  pflags: 0x200000
 state: RUNQ
 priority: 24
 container lock: blocked lock (0xffff000000c73e30)
 last voluntary switch: 245 ms ago

The Question:

Should pmap_fault's lock activity be possible while blocked_lock is in use for the thread's container lock?

FYI: The call chain leading to that status shows:

do_el1h_sync() at handle_el1h_sync+0x74
     pc = 0xffff0000006088f0  lr = 0xffff0000005f1874
     sp = 0xffff0000698505f0  fp = 0xffff000069850700

handle_el1h_sync() at sched_switch+0x2a8
     pc = 0xffff0000005f1874  lr = 0xffff00000033f0c8
     sp = 0xffff000069850710  fp = 0xffff0000698507f0

sched_switch() at mi_switch+0x1b8
     pc = 0xffff00000033f0c8  lr = 0xffff00000032161c
     sp = 0xffff000069850800  fp = 0xffff000069850820

mi_switch() at taskqgroup_binder+0x7c
     pc = 0xffff00000032161c  lr = 0xffff00000035510c
     sp = 0xffff000069850830  fp = 0xffff000069850860

taskqgroup_binder() at gtaskqueue_run_locked+0x104
     pc = 0xffff00000035510c  lr = 0xffff000000354f74
     sp = 0xffff000069850870  fp = 0xffff0000698508e0

gtaskqueue_run_locked() at gtaskqueue_thread_loop+0x9c
     pc = 0xffff000000354f74  lr = 0xffff000000354d10
     sp = 0xffff0000698508f0  fp = 0xffff000069850910

gtaskqueue_thread_loop() at fork_exit+0x7c
     pc = 0xffff000000354d10  lr = 0xffff0000002dbd3c
     sp = 0xffff000069850920  fp = 0xffff000069850950

fork_exit() at fork_trampoline+0x10
     pc = 0xffff0000002dbd3c  lr = 0xffff000000608664
     sp = 0xffff000069850960  fp = 0x0000000000000000

Apparently sched_switch did one of the last 2 cases of:

    if (TD_IS_IDLETHREAD(td)) {
. . .
    } else if (TD_IS_RUNNING(td)) {
        MPASS(td->td_lock == TDQ_LOCKPTR(tdq));
        srqflag = preempted ?
            SRQ_OURSELF|SRQ_YIELDING|SRQ_PREEMPTED :
            SRQ_OURSELF|SRQ_YIELDING;
#ifdef SMP
        if (THREAD_CAN_MIGRATE(td) && !THREAD_CAN_SCHED(td, ts->ts_cpu))
            ts->ts_cpu = sched_pickcpu(td, 0);
#endif
        if (ts->ts_cpu == cpuid)
            tdq_runq_add(tdq, td, srqflag);
        else {
            KASSERT(THREAD_CAN_MIGRATE(td) ||
                (ts->ts_flags & TSF_BOUND) != 0,
                ("Thread %p shouldn't migrate", td));
            mtx = sched_switch_migrate(tdq, td, srqflag);
        }
    } else {
        /* This thread must be going to sleep. */
        TDQ_LOCK(tdq);
        mtx = thread_lock_block(td);
        tdq_load_rem(tdq, td);
    }

where sched_switch_migrate also does thread_lock_block :

static struct mtx *
sched_switch_migrate(struct tdq *tdq, struct thread *td, int flags)
{
    struct tdq *tdn;

    tdn = TDQ_CPU(td_get_sched(td)->ts_cpu);
#ifdef SMP
    tdq_load_rem(tdq, td);
    /*
     * Do the lock dance required to avoid LOR.  We grab an extra
     * spinlock nesting to prevent preemption while we're
     * not holding either run-queue lock.
     */
    spinlock_enter();
    thread_lock_block(td);  /* This releases the lock on tdq. */

    /*
     * Acquire both run-queue locks before placing the thread on the new
     * run-queue to avoid deadlocks created by placing a thread with a
     * blocked lock on the run-queue of a remote processor.  The deadlock
     * occurs when a third processor attempts to lock the two queues in
     * question while the target processor is spinning with its own
     * run-queue lock held while waiting for the blocked lock to clear.
     */
    tdq_lock_pair(tdn, tdq);
    tdq_add(tdn, td, flags);
    tdq_notify(tdn, td);
    TDQ_UNLOCK(tdn);
    spinlock_exit();
#endif
    return (TDQ_LOCKPTR(tdn));
}

(I have not checked for inlining so I allow for it above.)

There have been past discussions such as:

https://lists.freebsd.org/pipermail/freebsd-arm/2016-January/013120.html

that have notes like (from before a fix to an inappropriate indirection for blocked_lock that was later changed):

> > cpu_switch() already does what you describe though in a slightly different
> > way.
> > The thread_lock() of a thread being switched out is set to blocked_lock.
> > cpu_switch() on the new CPU will always spin until cpu_switch updates
> > thread_lock of the old thread to point to the proper runq lock after saving
> > its state in the pcb.  arm64 does this here:
> >
> >         /*
> >          * Release the old thread. This doesn't need to be a store-release
> >          * as the above dsb instruction will provide release semantics.
> >          */
> >         str     x2, [x0, #TD_LOCK]
> > #if defined(SCHED_ULE) && defined(SMP)
> >         /* Read the value in blocked_lock */
> >         ldr     x0, =_C_LABEL(blocked_lock)
> >         ldr     x2, [x0]
> > 1:
> >         ldar    x3, [x1, #TD_LOCK]
> >         cmp     x3, x2
> >         b.eq    1b
> > #endif
> >
> > Note the thread_lock_block() call just above the block you noted from
> > sched_switch_migrate() to see where td_lock is set to &blocked_lock.
> >
> > If the comment about 'dsb' above is wrong that might explain why you see
> > stale state in the PCB after seeing the new value of td_lock.
> >
> > --
> > John Baldwin

Unfortunately I have no hint as to what causes the race condition that debug kernel builds (invariants and possibly witness) hit, which leads to the variable behavior.
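For readers following the blocked_lock discussion, the handoff that the quoted text describes can be modelled in userland with C11 atomics. This is only a sketch under stated assumptions: model_mtx, model_thread, model_blocked_lock and the three model_* functions are hypothetical names invented for illustration, not kernel APIs, and the model leaves out the run-queue locking that the real thread_lock_block() also performs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's struct mtx and struct thread. */
struct model_mtx { int unused; };

struct model_thread {
    _Atomic(struct model_mtx *) td_lock;    /* the "container lock" pointer */
};

/* Sentinel playing the role of blocked_lock; it is never really acquired. */
static struct model_mtx model_blocked_lock;

/* Park td_lock on the sentinel and hand back the lock it pointed at before. */
static struct model_mtx *
model_thread_lock_block(struct model_thread *td)
{
    assert(td != NULL);
    return atomic_exchange_explicit(&td->td_lock, &model_blocked_lock,
        memory_order_acq_rel);
}

/* Publish the new run-queue lock (the "str x2, [x0, #TD_LOCK]" step). */
static void
model_thread_lock_unblock(struct model_thread *td, struct model_mtx *newlock)
{
    assert(td != NULL && newlock != &model_blocked_lock);
    atomic_store_explicit(&td->td_lock, newlock, memory_order_release);
}

/* One pass of the "ldar / cmp / b.eq 1b" loop: true while still blocked. */
static bool
model_thread_is_blocked(struct model_thread *td)
{
    assert(td != NULL);
    return atomic_load_explicit(&td->td_lock, memory_order_acquire) ==
        &model_blocked_lock;
}
```

In this model a waiter would spin on model_thread_is_blocked() until another CPU calls model_thread_lock_unblock(). The question raised in the message is what is allowed to happen on the blocking CPU while the pointer still names the sentinel.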
=3D=3D=3D Mark Millard markmi at dsl-only.net From owner-freebsd-hackers@freebsd.org Thu Sep 14 07:23:41 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 59261E04EB4 for ; Thu, 14 Sep 2017 07:23:41 +0000 (UTC) (envelope-from auryn@zirakzigil.org) Received: from mx1.etoilesoft.fr (mx1.etoilesoft.fr [52.57.51.18]) by mx1.freebsd.org (Postfix) with ESMTP id 177236822B for ; Thu, 14 Sep 2017 07:23:40 +0000 (UTC) (envelope-from auryn@zirakzigil.org) Received: from mx1.etoilesoft.fr (localhost [127.0.0.1]) by mx1.etoilesoft.fr (Postfix) with ESMTP id 1F2499D16F; Thu, 14 Sep 2017 07:24:37 +0000 (UTC) Received: from [192.168.43.15] (localhost [127.0.0.1]) (Authenticated sender: auryn@zirakzigil.org) by mx1.etoilesoft.fr (Postfix) with ESMTPA id 507779C946; Thu, 14 Sep 2017 07:24:35 +0000 (UTC) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (1.0) Subject: Re: devd in jail From: Giulio Ferro X-Mailer: iPhone Mail (14G60) In-Reply-To: <3236AD55-0D14-49A5-B5B9-3147A216D8A5@zirakzigil.org> Date: Thu, 14 Sep 2017 09:23:31 +0200 Cc: freebsd-hackers@freebsd.org Content-Transfer-Encoding: quoted-printable Message-Id: References: <20170810225439.Horde.1s8Qi_dlNtxgEigsNKbdrer@webmail.leidinger.net> <4a1a99a5-35ea-19c9-7ac8-77875ac6f71f@zirakzigil.org> <20170905151537.Horde.10cHNOX1OVri7mGaUcDeX1l@webmail.leidinger.net> <7ca865ee-b613-2f0c-daf0-d828884b5e74@zirakzigil.org> <1C181EF2-B8B1-4F42-BF80-ABEA0593DD43@dsl-only.net> <20170906122556.Horde.5OdDwtii7HXPNArY77YUyBi@webmail.leidinger.net> <20170906221947.Horde.RITHvdc1wVE9v0-3nBavR0Z@webmail.leidinger.net> <20170909150335.Horde.wBLIPwBuhV3lyQlBxKud39f@webmail.leidinger.net> <27e72cfb-54cf-4af8-b569-85fff089c45f@zirakzigil.org> <20170911161253.Horde.vawLu00EtbbHOVeJRXjp7N0@webmail.leidinger.net> <3236AD55-0D14-49A5-B5B9-3147A216D8A5@zirakzigil.org> To: Alexander 
Leidinger X-Virus-Scanned: ClamAV using ClamSMTP X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 14 Sep 2017 07:23:41 -0000

Hi Alexander and all,

Do you have any idea why putting the new patch's parameter in jail.conf fails and how it can be solved? I can run tests on this machine as you wish.

Thanks
Giulio

> On 12 Sep 2017, at 09:50, Giulio Ferro wrote:
>
> Hi Alexander,
>
> I don't know how to set your parameter in old style jails in rc.conf
>
> In there, new parameters are mapped to old style jail_xxx entries, but there are none for your parameter unless I'm mistaken.
>
> Can you please tell me exactly what I should put in rc.conf? I've already moved the jail.conf file
>
> Thanks.
>
> Giulio
>
>> On 11 Sep 2017, at 16:12, Alexander Leidinger wrote:
>>
>> Quoting Giulio Ferro (from Mon, 11 Sep 2017 06:42:01 +0200):
>>
>>>> On 09/09/2017 15:03, Alexander Leidinger wrote:
>>>> Please run this:
>>>> strings /boot/kernel/kernel| grep allow.kmem
>>>>
>>>> If it doesn't print out "allow.kmem_access", then your kernel doesn't contain the patch.
>>>>
>>>> Bye,
>>>> Alexander.
>>>>
>>>
>>> # strings /boot/kernel/kernel | grep allow.kmem
>>> allow.kmem_access
>>>
>>> So it seems the kernel is ok...
>>>
>>>
>>> Maybe I can set this value at boot in /boot/loader.conf?
>>
>> No, this is not a loader.conf setting.
>>
>> Can you try to use "old style" jail config = settings in rc.conf instead of using jail.conf on a test system? It may be that you need to move away the jail.conf temporarily, I haven't checked what takes precedence when both (rc.conf and jail.conf) settings are there.
>>
>> Bye,
>> Alexander.
>>=20 >> --=20 >> http://www.Leidinger.net Alexander@Leidinger.net: PGP 0x8F31830F9F2772BF >> http://www.FreeBSD.org netchild@FreeBSD.org : PGP 0x8F31830F9F2772BF >=20 > _______________________________________________ > freebsd-hackers@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"= From owner-freebsd-hackers@freebsd.org Fri Sep 15 04:14:59 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id BE5FBE23523 for ; Fri, 15 Sep 2017 04:14:59 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: from asp.reflexion.net (outbound-mail-210-66.reflexion.net [208.70.210.66]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 789B076C71 for ; Fri, 15 Sep 2017 04:14:59 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: (qmail 20436 invoked from network); 15 Sep 2017 04:14:52 -0000 Received: from unknown (HELO mail-cs-02.app.dca.reflexion.local) (10.81.19.2) by 0 (rfx-qmail) with SMTP; 15 Sep 2017 04:14:52 -0000 Received: by mail-cs-02.app.dca.reflexion.local (Reflexion email security v8.40.2) with SMTP; Fri, 15 Sep 2017 00:14:52 -0400 (EDT) Received: (qmail 19639 invoked from network); 15 Sep 2017 04:14:52 -0000 Received: from unknown (HELO iron2.pdx.net) (69.64.224.71) by 0 (rfx-qmail) with (AES256-SHA encrypted) SMTP; 15 Sep 2017 04:14:52 -0000 Received: from [192.168.1.109] (c-67-170-167-181.hsd1.or.comcast.net [67.170.167.181]) by iron2.pdx.net (Postfix) with ESMTPSA id 90F37EC91CC; Thu, 14 Sep 2017 21:14:51 -0700 (PDT) From: Mark Millard Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\)) Subject: Re: FYI: Pine64+ 2GB 
(so A64) booting and non-debug vs. debug kernel: nondebug+INVARIANTS+INVARIANT_SUPPORT sufficient to boot Date: Thu, 14 Sep 2017 21:14:50 -0700 References: <1C18FF04-6772-4E9C-88C5-B8D5478C5809@dsl-only.net> <6D63486A-E933-4CC2-9A24-0688BE01A0DA@dsl-only.net> To: Emmanuel Vadot , freebsd-arm , freebsd-hackers In-Reply-To: <6D63486A-E933-4CC2-9A24-0688BE01A0DA@dsl-only.net> Message-Id: <8E15A747-3413-4537-9ECA-5EDAD1285351@dsl-only.net> X-Mailer: Apple Mail (2.3273) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 15 Sep 2017 04:14:59 -0000

[I've traced the failure back to the bad pointer value that was in place when it was put to use. I omit the prior details of earlier explorations that got me into this area.]

Summary of the following investigations:

When the witness or invariant failure during:

taskqgroup_adjust_softirq(0)...

happens it traces back to the condition:

pcpu_find(cpu)->pc_curthread == NULL

in the code:

ctd = pcpu_find(cpu)->pc_curthread;

for cpu == 1 in tdq_notify (but inlined). It then attempts to evaluate:

ctd->td_priority

which gets a data_abort, but this happens while blocked_lock is the container lock for the thread.

In the witness included case this leads to:

panic: acquiring blockable sleep lock with spinlock or critical section held (sleep mutex) pmap @ /usr/src/sys/arm64/arm64/pmap.c:4710
cpuid = 0
time = 13

but that is a later consequence of the earlier problem.

I'm not sure why pcpu_find(cpu)->pc_curthread is still NULL, but since the behavior is intermittent for debug kernel builds it suggests a race for an update that was initiated but not always finished in time.

(I've had occasions of hours of reboots to try to get a failure but mostly only needing a few.
I've not run into a long sequence of failures to boot for a debug kernel but have had some back-to-back ones.)

Supporting detail that led to the above:

int
pmap_fault(pmap_t pmap, uint64_t esr, uint64_t far)

The far (fault address register) argument to pmap_fault is the 3rd one (x2 below):

ffff000000606954  stp  x22, x21, [sp, #-48]!
ffff000000606958  stp  x20, x19, [sp, #16]
ffff00000060695c  stp  x29, x30, [sp, #32]
ffff000000606960  add  x29, sp, #0x20
ffff000000606964  mov  x20, x2
ffff000000606968  mov  x22, x1
ffff00000060696c  mov  x21, x0

For the failing call sequence far ends up stored on the stack via the x20 save-to-stack in:

ffff0000002f8c0c <__mtx_lock_flags>       stp  x24, x23, [sp, #-64]!
ffff0000002f8c10 <__mtx_lock_flags+0x4>   stp  x22, x21, [sp, #16]
ffff0000002f8c14 <__mtx_lock_flags+0x8>   stp  x20, x19, [sp, #32]
ffff0000002f8c18 <__mtx_lock_flags+0xc>   stp  x29, x30, [sp, #48]
ffff0000002f8c1c <__mtx_lock_flags+0x10>  add  x29, sp, #0x30

Stack segment with a little context:

0xffff000069850470:  ffff0000698504b0  ffff0000002f8b80
0xffff000069850480:  ffff000000c6a528  0
0xffff000069850490:  96000004          ffff000000c6a658
0xffff0000698504a0:  37e               ffff000000c6a670
0xffff0000698504b0:  ffff0000698504e0  ffff000000606998

So it appears:

pmap_fault`esr == 0x96000004
pmap_fault`pmap == 0xffff000000c6a658 (vmspace0+0x130)
pmap_fault`far == 0x37e

I'll note that 0x37e = 894 so it matches up with x8 == 0x0 for the likes of:

elr  0xffff00000033f0dc  sched_switch+0x2bc
. . .
ssched_switch+0x2b8:  ldrb  w9, [x8, #894]

matching:

db> show reg
. . .
x8  0
. . .
So apparently sched_switch tried to access 0x37e

db> x/gx 0xffff000000606998
pmap_fault+0x44:  f100111f927e0ec8

Part of the back trace is (for the example debug kernel):

kassert_panic() at witness_checkorder+0x160
     pc = 0xffff0000003174e4  lr = 0xffff000000374990
     sp = 0xffff0000698503f0  fp = 0xffff000069850470

witness_checkorder() at __mtx_lock_flags+0xa8
     pc = 0xffff000000374990  lr = 0xffff0000002f8b7c
     sp = 0xffff000069850480  fp = 0xffff0000698504b0

__mtx_lock_flags() at pmap_fault+0x40
     pc = 0xffff0000002f8b7c  lr = 0xffff000000606994
     sp = 0xffff0000698504c0  fp = 0xffff0000698504e0

pmap_fault() at data_abort+0xb8
     pc = 0xffff000000606994  lr = 0xffff000000608a9c
     sp = 0xffff0000698504f0  fp = 0xffff0000698505a0

data_abort() at do_el1h_sync+0xfc
     pc = 0xffff000000608a9c  lr = 0xffff0000006088f0
     sp = 0xffff0000698505b0  fp = 0xffff0000698505e0

do_el1h_sync() at handle_el1h_sync+0x74
     pc = 0xffff0000006088f0  lr = 0xffff0000005f1874
     sp = 0xffff0000698505f0  fp = 0xffff000069850700

handle_el1h_sync() at sched_switch+0x2a8
     pc = 0xffff0000005f1874  lr = 0xffff00000033f0c8
     sp = 0xffff000069850710  fp = 0xffff0000698507f0

sched_switch() at mi_switch+0x1b8
     pc = 0xffff00000033f0c8  lr = 0xffff00000032161c
     sp = 0xffff000069850800  fp = 0xffff000069850820

mi_switch() at taskqgroup_binder+0x7c
     pc = 0xffff00000032161c  lr = 0xffff00000035510c
     sp = 0xffff000069850830  fp = 0xffff000069850860

taskqgroup_binder() at gtaskqueue_run_locked+0x104
     pc = 0xffff00000035510c  lr = 0xffff000000354f74
     sp = 0xffff000069850870  fp = 0xffff0000698508e0

gtaskqueue_run_locked() at gtaskqueue_thread_loop+0x9c
     pc = 0xffff000000354f74  lr = 0xffff000000354d10
     sp = 0xffff0000698508f0  fp = 0xffff000069850910

gtaskqueue_thread_loop() at fork_exit+0x7c
     pc = 0xffff000000354d10  lr = 0xffff0000002dbd3c
     sp = 0xffff000069850920  fp = 0xffff000069850950

fork_exit() at fork_trampoline+0x10
     pc = 0xffff0000002dbd3c  lr =
0xffff000000608664  sp = 0xffff000069850960  fp = 0x0000000000000000

Note:

ffff00000033f0bc  ldr   w0, [x19, #1292]
ffff00000033f0c0  ldrb  w27, [x19, #894]
ffff00000033f0c4  str   w0, [sp, #12]
ffff00000033f0c8  bl    ffff000000359708
ffff00000033f0cc  ldr   x8, [x0]
ffff00000033f0d0  mov   w11, w27
ffff00000033f0d4  adrp  x27, ffff000000c85000
ffff00000033f0d8  ldrb  w9, [x8, #894]
ffff00000033f0dc  cmp   w11, w9
ffff00000033f0e0  b.cs  ffff00000033f150  // b.hs, b.nlast

This is code for the later part of what is shown below:

static void
tdq_notify(struct tdq *tdq, struct thread *td)
{
    struct thread *ctd;
    int pri;
    int cpu;

    if (tdq->tdq_ipipending)
        return;
    cpu = td_get_sched(td)->ts_cpu;
    pri = td->td_priority;
    ctd = pcpu_find(cpu)->pc_curthread;
    if (!sched_shouldpreempt(pri, ctd->td_priority, 1))
        return;
. . .
}

(Where: sched_shouldpreempt has been inlined and some of it is interlaced.)

The failing [x8, #894] is the attempt to access: ctd->td_priority

In other words: ctd == NULL resulted from the pcpu_find (i.e., x8 == 0 ). As for how it got to be zero:

db> show reg
spsr  0x9600000440000085
x0    0xffff000000ac1000  __pcpu+0x200
. . .

db> x/gx cpuid_to_pcpu,32
cpuid_to_pcpu:       ffff000000ac0e00  ffff000000ac1000
cpuid_to_pcpu+0x10:  ffff000000ac1200  ffff000000ac1400
cpuid_to_pcpu+0x20:  0                 0
. . .

(So cpu == 1 .)

db> x/gx 0xffff000000ac1000,8
__pcpu+0x200:  0          fffffd00005dda80
__pcpu+0x210:  0          0
__pcpu+0x220:  0          0
__pcpu+0x230:  100000000  ffff000000ac1200

Thus it seems that at the time for cpu==1 :

pcpu_find(cpu)->pc_curthread == NULL

(at __pcpu+0x200).
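The register arithmetic in this message can be checked mechanically. The following is a minimal sketch with hypothetical helper names (esr_exception_class, esr_dfsc and base_from_fault are not kernel functions); it assumes the usual ESR_ELx layout, with the exception class in bits [31:26] and the data fault status code in bits [5:0].

```c
#include <assert.h>
#include <stdint.h>

/* Exception class: ESR_ELx bits [31:26]. */
static unsigned
esr_exception_class(uint64_t esr)
{
    return (unsigned)((esr >> 26) & 0x3f);
}

/* Data fault status code: ESR_ELx bits [5:0], meaningful for data aborts. */
static unsigned
esr_dfsc(uint64_t esr)
{
    return (unsigned)(esr & 0x3f);
}

/*
 * For "ldrb w9, [x8, #imm]" the faulting address is x8 + imm,
 * so the base register value is far - imm.
 */
static uint64_t
base_from_fault(uint64_t far, uint64_t imm)
{
    assert(far >= imm);
    return far - imm;
}
```

For esr == 0x96000004 the class works out to 0x25 (a data abort taken without a change of exception level) and the status code to 0x04 (a level 0 translation fault), and base_from_fault(0x37e, 894) gives 0, which agrees with x8 == 0 and so with ctd == NULL.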
=3D=3D=3D Mark Millard markmi at dsl-only.net From owner-freebsd-hackers@freebsd.org Sat Sep 16 22:17:35 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3D912E0C35F for ; Sat, 16 Sep 2017 22:17:35 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: from asp.reflexion.net (outbound-mail-210-66.reflexion.net [208.70.210.66]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id F3A2165947 for ; Sat, 16 Sep 2017 22:17:34 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: (qmail 15349 invoked from network); 16 Sep 2017 22:17:27 -0000 Received: from unknown (HELO mail-cs-01.app.dca.reflexion.local) (10.81.19.1) by 0 (rfx-qmail) with SMTP; 16 Sep 2017 22:17:27 -0000 Received: by mail-cs-01.app.dca.reflexion.local (Reflexion email security v8.40.3) with SMTP; Sat, 16 Sep 2017 18:17:27 -0400 (EDT) Received: (qmail 6091 invoked from network); 16 Sep 2017 22:17:27 -0000 Received: from unknown (HELO iron2.pdx.net) (69.64.224.71) by 0 (rfx-qmail) with (AES256-SHA encrypted) SMTP; 16 Sep 2017 22:17:27 -0000 Received: from [192.168.1.109] (c-67-170-167-181.hsd1.or.comcast.net [67.170.167.181]) by iron2.pdx.net (Postfix) with ESMTPSA id 78A6DEC770C; Sat, 16 Sep 2017 15:17:26 -0700 (PDT) From: Mark Millard Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\)) Subject: Re: FYI: Pine64+ 2GB (so A64) booting and non-debug vs. debug kernel: "APs not started" for failure cases only, possible missing atomic_load_acq_int's? 
Date: Sat, 16 Sep 2017 15:17:25 -0700 References: <1C18FF04-6772-4E9C-88C5-B8D5478C5809@dsl-only.net> <6D63486A-E933-4CC2-9A24-0688BE01A0DA@dsl-only.net> <8E15A747-3413-4537-9ECA-5EDAD1285351@dsl-only.net> To: Emmanuel Vadot , freebsd-arm , freebsd-hackers In-Reply-To: <8E15A747-3413-4537-9ECA-5EDAD1285351@dsl-only.net> Message-Id: <256CF612-1D52-4BCC-981B-E476F6EEC9AB@dsl-only.net> X-Mailer: Apple Mail (2.3273) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 16 Sep 2017 22:17:35 -0000

A new finding:

When verbose boot messages are enabled there is an earlier contrast between when booting works overall vs. when it later fails:

When it works:

subsystem f000000
   release_aps(0)... Release APs
done.

When it fails:

subsystem f000000
   release_aps(0)... Release APs
APs not started
done.

And it well explains why ->pc_curthread ends up NULL for secondaries (in particular cpu == 1): init_secondary had never executed the assignments shown below:

    while (!aps_ready)
        __asm __volatile("wfe");

    /* Initialize curthread */
    KASSERT(PCPU_GET(idlethread) != NULL, ("no idle thread"));
    pcpup->pc_curthread = pcpup->pc_idlethread;
    pcpup->pc_curpcb = pcpup->pc_idlethread->td_pcb;

The subsystem messages are from:

static void
release_aps(void *dummy __unused)
{
    int i;

    /* Only release CPUs if they exist */
    if (mp_ncpus == 1)
        return;

    intr_pic_ipi_setup(IPI_AST, "ast", ipi_ast, NULL);
    intr_pic_ipi_setup(IPI_PREEMPT, "preempt", ipi_preempt, NULL);
    intr_pic_ipi_setup(IPI_RENDEZVOUS, "rendezvous", ipi_rendezvous, NULL);
    intr_pic_ipi_setup(IPI_STOP, "stop", ipi_stop, NULL);
    intr_pic_ipi_setup(IPI_STOP_HARD, "stop hard", ipi_stop, NULL);
    intr_pic_ipi_setup(IPI_HARDCLOCK, "hardclock", ipi_hardclock, NULL);

    atomic_store_rel_int(&aps_ready, 1);
    /* Wake up the other CPUs */
    __asm
__volatile("sev");

    printf("Release APs\n");

    for (i = 0; i < 2000; i++) {
        if (smp_started)
            return;
        DELAY(1000);
    }

    printf("APs not started\n");
}
SYSINIT(start_aps, SI_SUB_SMP, SI_ORDER_FIRST, release_aps, NULL);

init_secondary has an example or two of not using atomic_load_acq_int when atomic_store_rel_int is in use. One is:

    while (!aps_ready)
        __asm __volatile("wfe");

    /* Initialize curthread */
    KASSERT(PCPU_GET(idlethread) != NULL, ("no idle thread"));
    pcpup->pc_curthread = pcpup->pc_idlethread;
    pcpup->pc_curpcb = pcpup->pc_idlethread->td_pcb;

where aps_ready was declared via:

/* Set to 1 once we're ready to let the APs out of the pen. */
volatile int aps_ready = 0;

where release_aps has the use of atomic_store_rel_int:

    atomic_store_rel_int(&aps_ready, 1);
    /* Wake up the other CPUs */
    __asm __volatile("sev");

There is also in init_secondary:

    atomic_add_rel_32(&smp_cpus, 1);

    if (smp_cpus == mp_ncpus) {
        /* enable IPI's, tlb shootdown, freezes etc */
        atomic_store_rel_int(&smp_started, 1);
    }

where smp_cpus is accessed without being explicitly atomic. mp_ncpus seems to have no atomic use at all.

Where:

/usr/src/sys/sys/smp.h:extern int smp_cpus;
/usr/src/sys/kern/subr_smp.c:int smp_cpus = 1;  /* how many cpu's running */

So no "volatile", unlike the earlier example.
/usr/src/sys/kern/kern_umtx.c:  if (smp_cpus > 1) {
/usr/src/sys/kern/subr_smp.c:SYSCTL_INT(_kern_smp, OID_AUTO, cpus, CTLFLAG_RD|CTLFLAG_CAPRD, &smp_cpus, 0,

/usr/src/sys/sys/smp.h:extern int mp_ncpus;
/usr/src/sys/kern/subr_smp.c:int mp_ncpus;

smp_started is not explicitly accessed as atomic in release_aps, but init_secondary updates it to 1 via:

    mtx_lock_spin(&ap_boot_mtx);

    atomic_add_rel_32(&smp_cpus, 1);

    if (smp_cpus == mp_ncpus) {
        /* enable IPI's, tlb shootdown, freezes etc */
        atomic_store_rel_int(&smp_started, 1);
    }

    mtx_unlock_spin(&ap_boot_mtx);

where:

/usr/src/sys/sys/smp.h:extern volatile int smp_started;
/usr/src/sys/kern/subr_smp.c:volatile int smp_started;

("volatile" again for this context.)

I'll also note that for the sparc64 architecture there is some code like:

    if (__predict_false(atomic_load_acq_int(&smp_started) == 0))

that is explicitly matched to the atomic_store_rel_int in its mp_machdep.c .

I do not have enough aarch64 background knowledge to know if it is provable that atomic_load_acq_int is not needed in some of these cases.

But getting "APs not started" at least sometimes suggests an intermittent failure of the code as it is.

Another difference is lack of explicit initialization of smp_started but explicit initialization of aps_ready and smp_cpus .

I have no clue if the boot sequence is supposed to handle "APs not started" by reverting to not being a symmetric multiprocessing boot or in some other specific way, instead of trying to avoid use of what was not initialized by:

    pcpup->pc_curthread = pcpup->pc_idlethread;
    pcpup->pc_curpcb = pcpup->pc_idlethread->td_pcb;

in init_secondary.
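To make the release/acquire pairing being asked about concrete, here is a userland sketch using C11 atomics in place of atomic_store_rel_int and atomic_load_acq_int. Everything in it is an assumption made for illustration: struct smp_model and the model_* functions are hypothetical, wfe/sev and ap_boot_mtx are not modelled, and it says nothing about whether the kernel code actually needs such a change.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userland stand-ins for aps_ready, smp_cpus, smp_started. */
struct smp_model {
    atomic_int aps_ready;
    atomic_int smp_cpus;
    atomic_int smp_started;
    int mp_ncpus;           /* set once before release, read-only after */
};

static void
smp_model_init(struct smp_model *m, int ncpus)
{
    assert(ncpus >= 1);
    atomic_init(&m->aps_ready, 0);
    atomic_init(&m->smp_cpus, 1);   /* the boot CPU counts itself */
    atomic_init(&m->smp_started, 0);
    m->mp_ncpus = ncpus;
}

/* BSP side: the analogue of atomic_store_rel_int(&aps_ready, 1). */
static void
model_release_aps(struct smp_model *m)
{
    atomic_store_explicit(&m->aps_ready, 1, memory_order_release);
}

/*
 * AP side: poll with an acquire load rather than a plain volatile read.
 * The poll is bounded so the model cannot hang.
 * Returns 0 once released and counted, -1 if never released.
 */
static int
model_init_secondary(struct smp_model *m, int max_polls)
{
    int i, cpus;

    for (i = 0; i < max_polls; i++) {
        if (atomic_load_explicit(&m->aps_ready, memory_order_acquire))
            break;
    }
    if (i == max_polls)
        return -1;

    /* Use the value returned by the atomic add, not a second plain read. */
    cpus = atomic_fetch_add_explicit(&m->smp_cpus, 1,
        memory_order_acq_rel) + 1;
    if (cpus == m->mp_ncpus)
        atomic_store_explicit(&m->smp_started, 1, memory_order_release);
    return 0;
}

/* BSP side: acquire load pairing with the release store of smp_started. */
static int
model_smp_started(struct smp_model *m)
{
    return atomic_load_explicit(&m->smp_started, memory_order_acquire);
}
```

With mp_ncpus set to 4, three successful model_init_secondary() calls after model_release_aps() bring smp_cpus to 4 and set smp_started. A call made before the release returns -1 and leaves smp_cpus at 1, which is the state the failing boots appear to be stuck in.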
=3D=3D=3D Mark Millard markmi at dsl-only.net From owner-freebsd-hackers@freebsd.org Sat Sep 16 23:08:09 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0A8AAE0EE37 for ; Sat, 16 Sep 2017 23:08:09 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: from asp.reflexion.net (outbound-mail-210-66.reflexion.net [208.70.210.66]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id C0E2F66ED5 for ; Sat, 16 Sep 2017 23:08:08 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: (qmail 23660 invoked from network); 16 Sep 2017 23:08:07 -0000 Received: from unknown (HELO mail-cs-01.app.dca.reflexion.local) (10.81.19.1) by 0 (rfx-qmail) with SMTP; 16 Sep 2017 23:08:07 -0000 Received: by mail-cs-01.app.dca.reflexion.local (Reflexion email security v8.40.3) with SMTP; Sat, 16 Sep 2017 19:08:07 -0400 (EDT) Received: (qmail 3174 invoked from network); 16 Sep 2017 23:08:07 -0000 Received: from unknown (HELO iron2.pdx.net) (69.64.224.71) by 0 (rfx-qmail) with (AES256-SHA encrypted) SMTP; 16 Sep 2017 23:08:07 -0000 Received: from [192.168.1.109] (c-67-170-167-181.hsd1.or.comcast.net [67.170.167.181]) by iron2.pdx.net (Postfix) with ESMTPSA id AA922EC770C; Sat, 16 Sep 2017 16:08:06 -0700 (PDT) From: Mark Millard Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\)) Subject: Re: FYI: Pine64+ 2GB (so A64) booting and non-debug vs. debug kernel: "APs not started" for failure cases only, possible missing atomic_load_acq_int's? 
Date: Sat, 16 Sep 2017 16:08:06 -0700 References: <1C18FF04-6772-4E9C-88C5-B8D5478C5809@dsl-only.net> <6D63486A-E933-4CC2-9A24-0688BE01A0DA@dsl-only.net> <8E15A747-3413-4537-9ECA-5EDAD1285351@dsl-only.net> <256CF612-1D52-4BCC-981B-E476F6EEC9AB@dsl-only.net> To: Emmanuel Vadot , freebsd-arm , freebsd-hackers In-Reply-To: <256CF612-1D52-4BCC-981B-E476F6EEC9AB@dsl-only.net> Message-Id: <2FC8A531-5E8F-4765-A1F3-A8D6A6AA0C14@dsl-only.net> X-Mailer: Apple Mail (2.3273) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 16 Sep 2017 23:08:09 -0000

[Adding a couple of numbers that help interpret what I found as far as what specifically did not work as expected.]

On 2017-Sep-16, at 3:17 PM, Mark Millard wrote:

> A new finding:
>
> When verbose boot messages are enabled
> there is an earlier contrast between when
> booting works overall vs. when it later
> fails:
>
> When it works:
>
> subsystem f000000
>    release_aps(0)... Release APs
> done.
>
> When it fails:
>
> subsystem f000000
>    release_aps(0)... Release APs
> APs not started
> done.
>
> But getting "APs not started" at least sometimes
> suggests an intermittent failure of the code as
> it is.
>
>
> Another difference is lack of explicit initialization
> of smp_started but explicit initialization of aps_ready
> and smp_cpus .
>
>
>
> I have no clue if the boot sequence is supposed to
> handle "APs not started" by reverting to not being
> a symmetric multiprocessing boot or some other
> specific way instead of trying to avoid use of
> what was not initialized by:
>
>     pcpup->pc_curthread = pcpup->pc_idlethread;
>     pcpup->pc_curpcb = pcpup->pc_idlethread->td_pcb;
>
> in init_secondary.

Example mp_ncpus and smp_cpus figures from a failed Pine64+ 2GB boot:

db> print/x *smp_cpus
1
db> print/x *mp_ncpus
138800000004

But that should be a 4 byte width. Showing some context for reference:

db> x/bx mp_ncpus-4,4
rebooting:   0   0   0   0
db> x/bx mp_ncpus,4
mp_ncpus:    4   0   0   0
db> x/bx mp_ncpus+4,4
scsi_delay:  88  13  0   0

For completeness:

db> x/bx smp_cpus-4,4
sysctl___kern_smp_disabled+0x5c:  0  0  0  0
db> x/bx smp_cpus,4
smp_cpus:      1  0  0  0
db> x/bx smp_cpus+4,4
smp_cpus+0x4:  0  0  0  0

So smp_cpus was not incremented in memory.

This goes along with no occurrences of the:

    pcpup->pc_curthread = pcpup->pc_idlethread;
    pcpup->pc_curpcb = pcpup->pc_idlethread->td_pcb;

updates happening in init_secondary:

    /* Spin until the BSP releases the APs */
    while (!aps_ready)
        __asm __volatile("wfe");

    /* Initialize curthread */
    KASSERT(PCPU_GET(idlethread) != NULL, ("no idle thread"));
    pcpup->pc_curthread = pcpup->pc_idlethread;
    pcpup->pc_curpcb = pcpup->pc_idlethread->td_pcb;
. . .
    mtx_lock_spin(&ap_boot_mtx);

    atomic_add_rel_32(&smp_cpus, 1);

    if (smp_cpus == mp_ncpus) {
        /* enable IPI's, tlb shootdown, freezes etc */
        atomic_store_rel_int(&smp_started, 1);
    }

    mtx_unlock_spin(&ap_boot_mtx);

Which seems to imply that the aps_ready update:

    atomic_store_rel_int(&aps_ready, 1);
    /* Wake up the other CPUs */
    __asm __volatile("sev");

in release_aps was not seen in the:

    while (!aps_ready)
        __asm __volatile("wfe");

in init_secondary.

My guess is that "while (!aps_ready)" needs to be explicit about its atomic status. aps_ready is already volatile but apparently that is not enough for this context to be reliable.

The other potential needs for explicit atomics are for later execution but may be required overall as well.

===
Mark Millard
markmi at dsl-only.net
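As a cross-check of the ddb figures quoted in this message, the 8-byte value printed for *mp_ncpus is what results from reading the 4 bytes of mp_ncpus together with the 4 bytes that follow it, in little-endian order. The sketch below uses a hypothetical helper, le_load, and a byte array copied from the x/bx output above; neither is part of ddb or the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assemble the first `width` bytes (1..8) as a little-endian unsigned value. */
static uint64_t
le_load(const unsigned char *bytes, size_t width)
{
    uint64_t v = 0;
    size_t i;

    assert(width >= 1 && width <= 8);
    for (i = 0; i < width; i++)
        v |= (uint64_t)bytes[i] << (8 * i);
    return v;
}

/* Bytes as shown by "x/bx mp_ncpus,4" followed by "x/bx mp_ncpus+4,4". */
static const unsigned char mp_ncpus_dump[8] = {
    0x04, 0x00, 0x00, 0x00,     /* mp_ncpus */
    0x88, 0x13, 0x00, 0x00,     /* scsi_delay */
};
```

Read at the proper 4-byte width, le_load(mp_ncpus_dump, 4) is 4, so mp_ncpus itself looks sane. Read as 8 bytes it is 0x138800000004, where the upper half 0x1388 (decimal 5000) is just the adjacent scsi_delay variable.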