From owner-freebsd-hackers@freebsd.org Sun Dec 4 02:59:12 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 354CBC663AB for ; Sun, 4 Dec 2016 02:59:12 +0000 (UTC) (envelope-from eric@metricspace.net) Received: from mail.metricspace.net (207-172-209-83.c3-0.arl-ubr1.sbo-arl.ma.static.cable.rcn.com [207.172.209.83]) by mx1.freebsd.org (Postfix) with ESMTP id 06ACA1EAF for ; Sun, 4 Dec 2016 02:59:11 +0000 (UTC) (envelope-from eric@metricspace.net) Received: from [IPv6:2001:470:1f11:617:3210:b3ff:fe77:ca3f] (unknown [IPv6:2001:470:1f11:617:3210:b3ff:fe77:ca3f]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client did not present a certificate) (Authenticated sender: eric) by mail.metricspace.net (Postfix) with ESMTPSA id 8BA941071; Sun, 4 Dec 2016 02:59:05 +0000 (UTC) Subject: Re: CFT EFI Boot Refactoring To: Ben Woods References: <675cb468-f599-a31b-a82c-c0f892136cfc@metricspace.net> From: Eric McCorkle Cc: freebsd-hackers@FreeBSD.org Message-ID: Date: Sat, 3 Dec 2016 21:59:00 -0500 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0 MIME-Version: 1.0 In-Reply-To: Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="arQd7PRswDidkEC78EiEHoTv6NEWfSimg" X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 04 Dec 2016 02:59:12 -0000 This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --arQd7PRswDidkEC78EiEHoTv6NEWfSimg Content-Type: multipart/mixed; boundary="Pi0td10lcdGV92dIcHJvWDKh4mWPcrDfu"; protected-headers="v1" From: Eric McCorkle To: Ben Woods Cc: freebsd-hackers@FreeBSD.org Message-ID: Subject: Re: CFT EFI Boot 
Refactoring References: <675cb468-f599-a31b-a82c-c0f892136cfc@metricspace.net> In-Reply-To: --Pi0td10lcdGV92dIcHJvWDKh4mWPcrDfu Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable

(Re-adding -hackers, to document the diagnosis)

You're freezing at the same point as where tsoome saw it freeze.

Basically, what's happening at that point in the code is it's probing
all device handles, trying to attach filesystem drivers. Hanging here
means that it's crashing somewhere in that process.

The boot loader is going to hit every single partition and try all the
filesystem drivers until one works. It will bail on the ESP, since that
will have an EFI_SIMPLE_FILE_SYSTEM_PROTOCOL driver courtesy of the EFI
firmware.

Here's the diagnostic information I have so far:

* My setup is a positive example (works): two disks, disk 1 has ESP and
  freebsd-zfs, disk 2 has a partition that's held both an msdosfs and a
  UFS (it was originally a swap partition that I've been creating
  various filesystems on to test the boot process), a zfs intent log,
  and a zfs l2arc cache. All ZFS stuff is part of the same pool, which
  holds the root fs.

* Your setup is a negative example (fails).

* tsoome's setup is also a negative example: "The VM has 4 disks,
  illumos mirror zfs on first 2, third has fbsd, 4th (for this test) has
  EFI system partition, and zfs and ufs partitions, zfs partition has
  bootfs with /boot directory."

You don't have a UFS filesystem anywhere, so we can rule that out. It
might be tsoome's bug, and it might just be that the bug is sporadic,
which would explain why I'm not seeing it on my setup with a dosfs.

The only other obvious commonality between you and tsoome that doesn't
overlap my setup is multiple ZFS datasets, or ZFS data vdevs (mirrors,
stripes, etc) spread across multiple disks (my setup only has a log and
a cache on the ssd).

Here's what I'll do.
I'll create an "extra_logging" branch off of
efize_new in my github repo, wherein I'll add a bunch of extra logging
into the detection process. It ought to be enough to print out device
paths and filesystem drivers just before it tries them.

On 12/03/2016 21:06, Ben Woods wrote:
> Hi Eric,
>
> That is correct. Each drive has an EFI partition, which uses msdosfs. My
> FreeBSD drive does not have a UFS partition, as I solely use ZFS on root.
>
> One thing to note is that in the process of testing your efize_new, I
> did not touch my freebsd-boot partition, and I am not even sure if it is
> being used by the loader. My /boot lives in my ZFS zroot partition, and
> freebsd-boot is not mounted in any way. Could this have an impact on the
> testing?
>
> Regards,
> Ben
>
> --
> From: Benjamin Woods
> woodsb02@gmail.com
>
> On 4 December 2016 at 06:06, Eric McCorkle
> wrote:
>
>     So, you have no UFS partitions, but you do have an MSDOSFS partition on
>     each drive?
>
>     If so, that lends support to tsoome's suspicion that it may be a bug
>     he's working on.
>=20 --Pi0td10lcdGV92dIcHJvWDKh4mWPcrDfu-- --arQd7PRswDidkEC78EiEHoTv6NEWfSimg Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- iF4EARYIAAYFAlhDhnQACgkQVsKIQKqABI3z/wEA4e19ke22SBFsjW9/MfQiPqjK 603q38nV9BwYw3jcHNwA/0yIXrqQvxflkqbWa38VOzHMcL+Gsu8p+Zo/IID2bcsB =GjBu -----END PGP SIGNATURE----- --arQd7PRswDidkEC78EiEHoTv6NEWfSimg-- From owner-freebsd-hackers@freebsd.org Sun Dec 4 18:32:34 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8DD69C66646 for ; Sun, 4 Dec 2016 18:32:34 +0000 (UTC) (envelope-from embaudarm@gmail.com) Received: from mail-qk0-x230.google.com (mail-qk0-x230.google.com [IPv6:2607:f8b0:400d:c09::230]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 508E110D1 for ; Sun, 4 Dec 2016 18:32:34 +0000 (UTC) (envelope-from embaudarm@gmail.com) Received: by mail-qk0-x230.google.com with SMTP id n204so326659743qke.2 for ; Sun, 04 Dec 2016 10:32:34 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:from:date:message-id:subject:to; bh=XL/Rce53lYqLgNsuEiZSzP3a4UQ6QK/nZCHtlmJkWRY=; b=t5ho+T9Cp9LSiq21Eu5+70PH/ICjh1pgaEIe4aKq2/xiddh9NUKHZjhDBKXZrmfN2p sdsHC4XYFuU5I5IP850RHf6nu5vqR6rukZhCH0x4kSdqOr74orn6oNbKUwGAH12LecHO EMuxOZJNoee9OUzQvywwgFB6dqnBy789dU20g2EPUoGfX8KCporFha/vf3CR4AD0IXWT 2kHu1ErQMT9H5SlFnAL0u/eIjAmiA4d0ORxyuhlXBmG+DgIE9vt7dyDxvZ3Vzk3mhqrx s+Cn2G12Vau+kUR5KWzOU7PrdOZ/X+cGP0v3ed7VH8K2n/B4YJfecf58yPOXmF4+McqA q3Gw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; 
h=x-gm-message-state:mime-version:from:date:message-id:subject:to; bh=XL/Rce53lYqLgNsuEiZSzP3a4UQ6QK/nZCHtlmJkWRY=; b=cz3t3OxZ5RosYCrZGlcM5rZd3ZuDdKCyCg3YHbtqrGQDPDw/ajSmPIaQm88X78l0uw e41T8Fp3aOrrwnIJdL0yxLArhKXlkg7E6cv5aDq0Z9+2lilFX08kFt7A0PFnaEdFeyO8 gFnk0faOc7TuRU9SYiw7mYaLO4ny3ueb3mlSJwJuLInj3euYzGQ3A1Yz4BVagy6lE1yo 5TChOJTjd5Hn3Rx7CspOWhmwrVMxUtE0jNvFsQNbWKsrNfbyJb3JtwuXWJLT7L78KmP+ 1aO15565GidSMraa2uTjgzhQipt7YNslvuffjFsIWgyDN8d00qa9rrGwZzUwpyekl4sj zmJg== X-Gm-Message-State: AKaTC01mUleoJUwcQDa3hZoLjbBVrnQEBbGcIEgshaeJ8daE9B/91ggxFsLyNSC6HfRQbhHg1RX6p6MGlx6fMw== X-Received: by 10.55.154.205 with SMTP id c196mr44264469qke.25.1480876353163; Sun, 04 Dec 2016 10:32:33 -0800 (PST) MIME-Version: 1.0 Received: by 10.237.54.225 with HTTP; Sun, 4 Dec 2016 10:32:32 -0800 (PST) From: Lee D Date: Sun, 4 Dec 2016 13:32:32 -0500 Message-ID: Subject: Please help me understand "Translation Fault" in custom device drivers, and how to debug To: freebsd-hackers@freebsd.org Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.23 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 04 Dec 2016 18:32:34 -0000 Hello, I need help understanding what a translation fault is, and how to debug it. I have googled like crazy but can't seem to find any detailed information. I am working on an embedded system using an ARM processor, and consequently am writing a bunch of device device drivers for my custom hardware. I am having a problem with occasional crashes when kldload'ing my modules in a boot script. I get various errors, including "Translation Fault" (L1 or L2), "Alignment Fault", "vm_fault", and "undefined instruction in kernel". My code works 95% of the time though. I never see any crashes while running, so I don't think this is a flaky hardware problem. 
Any suggestions on what kernel debugger commands to enter to gather
information would also be helpful. Here are the commands I am currently
recording the output of when I get a crash:

db> bt
db> ps
db> show intr
db> show proc 618
db> show allpcpu
db> show allrman
db> show intrcnt
db> show proc
db> show procvm

For a single concrete example, here is a backtrace of a device driver that
failed with a translation fault on kldload. This BT is unique in that it
actually seems to contain useful information. Most of the backtraces just
show some abort/exception related calls and then say "Unable to unwind
into user space" (paraphrased), leaving me no info about where my crash
happened.

FreeBSD 10.3

Thanks,

Lee

db> bt
Tracing pid 622 tid 100079 td 0xc2d68000
db_trace_self() at db_trace_self
    pc = 0xc057a1e4 lr = 0xc0137c68 (db_stack_trace+0x108)
    sp = 0xde966670 fp = 0xde966688
    r10 = 0xc074b240
db_stack_trace() at db_stack_trace+0x108
    pc = 0xc0137c68 lr = 0xc013760c (db_command+0x294)
    sp = 0xde966690 fp = 0xde966730
    r4 = 0x00000000 r5 = 0x00000000 r6 = 0x00000000
db_command() at db_command+0x294
    pc = 0xc013760c lr = 0xc0137364 (db_command_loop+0x78)
    sp = 0xde966738 fp = 0xde966748
    r4 = 0xc05c7ed4 r5 = 0xc05dd87c r6 = 0xc074b22c r7 = 0xde966978 r8 = 0x00000001 r9 = 0xc0673520 r10 = 0xc0740f44
db_command_loop() at db_command_loop+0x78
    pc = 0xc0137364 lr = 0xc0139e6c (db_trap+0x108)
    sp = 0xde966750 fp = 0xde966870
    r4 = 0x00000000 r5 = 0xc074b238 r6 = 0xc0740f70
db_trap() at db_trap+0x108
    pc = 0xc0139e6c lr = 0xc02ec8f8 (kdb_trap+0x188)
    sp = 0xde966878 fp = 0xde966898
    r4 = 0x00000000 r5 = 0x00000017 r6 = 0xc0740f70 r7 = 0xde966978
kdb_trap() at kdb_trap+0x188
    pc = 0xc02ec8f8 lr = 0xc05919ec (abort_fatal+0x1d4)
    sp = 0xde9668a0 fp = 0xde9668b8
    r4 = 0xde966978 r5 = 0x00000013 r6 = 0x00000004 r7 = 0x00000007 r8 = 0x00000017 r9 = 0x00000004 r10 = 0x00000000
abort_fatal() at abort_fatal+0x1d4
    pc = 0xc05919ec lr = 0xc0591818 (abort_fatal)
    sp = 0xde9668c0 fp = 0xde966970
    r4 = 0xde966978
    r5 = 0x00000007 r6 = 0x00000013 r7 = 0x00000017 r8 = 0x00000000
abort_fatal() at abort_fatal
    pc = 0xc0591818 lr = 0xc057bf20 (exception_exit)
    sp = 0xde966978 fp = 0xde966a00
    r4 = 0x00000000 r5 = 0x00000000 r6 = 0x00000000 r7 = 0xc2643440 r8 = 0xffffffec
exception_exit() at exception_exit
    pc = 0xc057bf20 lr = 0xc02866c0 (free+0xc0)
    sp = 0xde9669c8 fp = 0xde966a00
    r0 = 0x00000000 r1 = 0x00000001 r2 = 0xffffffec r3 = 0x00000000 r4 = 0xc26b2900 r5 = 0xc0740d50 r6 = 0x00000000 r7 = 0x00000000 r8 = 0x00000000 r9 = 0xc2643440 r10 = 0xffffffec r12 = 0x00000002
device_probe_child() at device_probe_child+0x298
    pc = 0xc02e1110 lr = 0xc02e1d00 (device_probe+0x40)
    sp = 0xde966a08 fp = 0xde966a18
    r4 = 0xc26b2900 r5 = 0xffffffff r6 = 0x00000000 r7 = 0xc26b2d00 r8 = 0xc06869f8 r9 = 0xc0692ec0 r10 = 0x00000000
device_probe() at device_probe+0x40
    pc = 0xc02e1d00 lr = 0xc02e389c (bus_generic_driver_added+0x88)
    sp = 0xde966a20 fp = 0xde966a28
    r4 = 0xc26b2900 r5 = 0xc2e2ff14 r6 = 0x00000000
bus_generic_driver_added() at bus_generic_driver_added+0x88
    pc = 0xc02e389c lr = 0xc02e02a0 (devclass_driver_added+0x80)
    sp = 0xde966a30 fp = 0xde966a48
    r4 = 0xc2e2ff14 r5 = 0xc2643440
devclass_driver_added() at devclass_driver_added+0x80
    pc = 0xc02e02a0 lr = 0xc02e0208 (devclass_add_driver+0x12c)
    sp = 0xde966a50 fp = 0xde966a70
    r4 = 0xc2e2ff14 r5 = 0xc2e2ff90 r6 = 0x7fffffff r7 = 0xc274d520 r8 = 0xc2643440
devclass_add_driver() at devclass_add_driver+0x12c
    pc = 0xc02e0208 lr = 0xc02e5224 (driver_module_handler+0x1ec)
    sp = 0xde966a78 fp = 0xde966a98
    r4 = 0xc2e2fefc r5 = 0xc0692340 r6 = 0xc2c7fd00 r7 = 0x00000000 r8 = 0xc074cbac r9 = 0xc2c7fd00 r10 = 0xc2643440
driver_module_handler() at driver_module_handler+0x1ec
    pc = 0xc02e5224 lr = 0xc0289a8c (module_register_init+0x1fc)
    sp = 0xde966aa0 fp = 0xde966ad0
    r4 = 0xc074cb80 r5 = 0xc0692340 r6 = 0xc2c7fd00 r7 = 0xc2e27970 r8 = 0xc074cbac r9 = 0xc0730ea8 r10 = 0xc2e2fec0
module_register_init() at module_register_init+0x1fc
    pc = 0xc0289a8c lr = 0xc027b430
    (linker_load_module+0xc78)
    sp = 0xde966ad8 fp = 0xde966d38
    r4 = 0xc074cbac r5 = 0xc0692340 r6 = 0xc072f9e0 r7 = 0xc2e27d7c r8 = 0xc2c7fd00 r9 = 0xc274d8c0 r10 = 0xc072f9b0
linker_load_module() at linker_load_module+0xc78
    pc = 0xc027b430 lr = 0xc027d398 (kern_kldload+0x128)
    sp = 0xde966d40 fp = 0xde966d70
    r4 = 0xde966d78 r5 = 0x00000000 r6 = 0xc26d5800 r7 = 0x00000001 r8 = 0xc072f9b0 r9 = 0xc072f9e0 r10 = 0x00000000
kern_kldload() at kern_kldload+0x128
    pc = 0xc027d398 lr = 0xc027d508 (sys_kldload+0x64)
    sp = 0xde966d78 fp = 0xde966d88
    r4 = 0xc2d68000 r5 = 0xc26d5800 r6 = 0x00000000 r7 = 0x00000000 r8 = 0xde966df0 r9 = 0xc2daa670
sys_kldload() at sys_kldload+0x64
    pc = 0xc027d508 lr = 0xc05908fc (swi_handler+0x5e8)
    sp = 0xde966d90 fp = 0xde966e48
    r4 = 0xc2d68000 r5 = 0xde966e50 r6 = 0xbffffe58
swi_handler() at swi_handler+0x5e8
    pc = 0xc05908fc lr = 0xc057beb0 (swi_exit)
    sp = 0xde966e50 fp = 0xbffffe18
    r4 = 0xbfffff42 r5 = 0x00000000 r6 = 0xbffffe58 r7 = 0x00000130 r8 = 0x00000000 r9 = 0xbffff9dc r10 = 0x00000000
swi_exit() at swi_exit
    pc = 0xc057beb0 lr = 0xc057beb0 (swi_exit)
    sp = 0xde966e50 fp = 0xbffffe18
Unable to unwind further

From owner-freebsd-hackers@freebsd.org Sun Dec 4 19:13:44 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 89CD5C671C9 for ; Sun, 4 Dec 2016 19:13:44 +0000 (UTC) (envelope-from gonzo@id.bluezbox.com) Received: from id.bluezbox.com (id.bluezbox.com [45.55.20.155]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 5B10D10D for ; Sun, 4 Dec 2016 19:13:44 +0000 (UTC) (envelope-from gonzo@id.bluezbox.com) Received: from [136.179.10.143] (helo=[10.140.230.85]) by id.bluezbox.com with esmtpsa (TLSv1.2:ECDHE-RSA-AES256-GCM-SHA384:256) (Exim 4.87 (FreeBSD)) (envelope-from ) id 1cDcED-0009fM-Ua; 
Sun, 04 Dec 2016 11:13:38 -0800 Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 10.1 \(3251\)) Subject: Re: Please help me understand "Translation Fault" in custom device drivers, and how to debug From: Oleksandr Tymoshenko In-Reply-To: Date: Sun, 4 Dec 2016 11:13:06 -0800 Cc: freebsd-hackers@freebsd.org Content-Transfer-Encoding: quoted-printable Message-Id: <85666618-B6A5-4577-86B9-914DEDE84ACD@bluezbox.com> References: To: Lee D X-Mailer: Apple Mail (2.3251) Sender: gonzo@id.bluezbox.com X-Spam-Level: -- X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 04 Dec 2016 19:13:44 -0000

> On Dec 4, 2016, at 10:32 AM, Lee D wrote:
>
> Hello,
>
> I need help understanding what a translation fault is, and how to debug
> it. I have googled like crazy but can't seem to find any detailed
> information.
>
> I am working on an embedded system using an ARM processor, and consequently
> am writing a bunch of device drivers for my custom hardware.
>
> I am having a problem with occasional crashes when kldload'ing my modules
> in a boot script. I get various errors, including "Translation Fault" (L1
> or L2), "Alignment Fault", "vm_fault", and "undefined instruction in
> kernel". My code works 95% of the time though.
>
> I never see any crashes while running, so I don't think this is a flaky
> hardware problem.
>
> Any suggestions on what kernel debugger commands to enter to gather
> information would also be helpful. Here are the commands I am currently
> recording the output of when I get a crash:
>
> db> bt
> db> ps
> db> show intr
> db> show proc 618
> db> show allpcpu
> db> show allrman
> db> show intrcnt
> db> show proc
> db> show procvm
>
> For a single concrete example, here is a backtrace of a device driver that
> failed with a translation fault on kldload. This BT is unique in that it
> actually seems to contain useful information. Most of the backtraces just
> show some abort/exception related calls and then say "Unable to unwind
> into user space" (paraphrased), leaving me no info about where my crash
> happened.
>
> FreeBSD 10.3

Hi Lee,

Random crashes during kldload sounds like missing or incomplete icache
sync to me. You can take a look at icache-related fixes in HEAD’s sys/arm
and try to backport them to 10.3.

From owner-freebsd-hackers@freebsd.org Sun Dec 4 19:53:23 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id CA4AAC67C2F for ; Sun, 4 Dec 2016 19:53:23 +0000 (UTC) (envelope-from embaudarm@gmail.com) Received: from mail-qk0-x22a.google.com (mail-qk0-x22a.google.com [IPv6:2607:f8b0:400d:c09::22a]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 852BC16EB for ; Sun, 4 Dec 2016 19:53:23 +0000 (UTC) (envelope-from embaudarm@gmail.com) Received: by mail-qk0-x22a.google.com with SMTP id x190so327715129qkb.0 for ; Sun, 04 Dec 2016 11:53:23 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to; 
bh=80nEEhl/bMk9Xp6uUDfSXn9YyJaOgkD0vkFVRV/FVFo=; b=XttLWLFdSoKLZifrZk48aZSkYflG69RF1CuBDkeSGD++z1iH76QqQO9CLJsdew95/C CENEHAdKnJ+8ZvTETYyqMrc4C26tS6sf2AgdigTWhFNNlC9Kxvn84FjkzQoBshTWq3Sb yRXlVHdC1oS087jTxzaqvbQifzJtLS3sKVME8PCHsW4LuniiIcxIg3cWDYwxz3chjYPF /FtYldCucwzudxRBb0M3rK9OwYtgFZTos5ihKg1DZuE/BY17ZOIRJkzW8y6j2FZVYlj3 n/a++D5el9yu6uah+nHbcw6W4Pun8EbeJt/nqAsjENCvKU6GxLr2ioI8SOWwuRwVGjoX J/JA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to; bh=80nEEhl/bMk9Xp6uUDfSXn9YyJaOgkD0vkFVRV/FVFo=; b=ieUtAxQkUYyiBqSEbDu+OfvFRcsa+3IwVnJEzZLYs+5dRAh4FTrtMq7FYHVuE1w/Kg Ct4GY/f8Gel8VQQJ/c4XAGQ5tHvui1yDxoCo0+z7Ge9smNR4jBI9RFLjH6rlu+rZHMnH q9ZNxpv2beCp0MB6NxliGWVRBpkYqzSv3jSMxM8HOtJsZj8ERJ3FQy9Pz9Jtr3bJesKO 4X5aeyJ/bg7P9a7et7gSJY2hGaYrpwBaisn934q6kDZgw0SiPDEnF5wMn9A6eo2hcfrn UZjkPNQuhpJ36WlRnktB/NAeoicLbJuyZ4SqOqNFDj3gBIzqAV/8Opyq28hBQvVj1sco ireg== X-Gm-Message-State: AKaTC01VDwI3ouJNUja7RTKzRo9Q0kIPy5fSYlyCbIv3G3toNZRPW3LJPzHnlxJZEowGCiki+qMnobH4m1yp8w== X-Received: by 10.55.163.134 with SMTP id m128mr53074034qke.180.1480881202534; Sun, 04 Dec 2016 11:53:22 -0800 (PST) MIME-Version: 1.0 Received: by 10.237.54.225 with HTTP; Sun, 4 Dec 2016 11:52:42 -0800 (PST) In-Reply-To: <85666618-B6A5-4577-86B9-914DEDE84ACD@bluezbox.com> References: <85666618-B6A5-4577-86B9-914DEDE84ACD@bluezbox.com> From: Lee D Date: Sun, 4 Dec 2016 14:52:42 -0500 Message-ID: Subject: Re: Please help me understand "Translation Fault" in custom device drivers, and how to debug To: freebsd-hackers@freebsd.org Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.23 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 04 Dec 
2016 19:53:23 -0000

On Sun, Dec 4, 2016 at 2:13 PM, Oleksandr Tymoshenko wrote:

> > On Dec 4, 2016, at 10:32 AM, Lee D wrote:
> >
> > Hello,
> >
> > I need help understanding what a translation fault is, and how to debug
> > it. I have googled like crazy but can't seem to find any detailed
> > information.
> >
> > I am working on an embedded system using an ARM processor, and consequently
> > am writing a bunch of device drivers for my custom hardware.
> >
> > I am having a problem with occasional crashes when kldload'ing my modules
> > in a boot script. I get various errors, including "Translation Fault" (L1
> > or L2), "Alignment Fault", "vm_fault", and "undefined instruction in
> > kernel". My code works 95% of the time though.
> >
> > I never see any crashes while running, so I don't think this is a flaky
> > hardware problem.
> >
> > Any suggestions on what kernel debugger commands to enter to gather
> > information would also be helpful. Here are the commands I am currently
> > recording the output of when I get a crash:
> >
> > db> bt
> > db> ps
> > db> show intr
> > db> show proc 618
> > db> show allpcpu
> > db> show allrman
> > db> show intrcnt
> > db> show proc
> > db> show procvm
> >
> > For a single concrete example, here is a backtrace of a device driver that
> > failed with a translation fault on kldload. This BT is unique in that it
> > actually seems to contain useful information. Most of the backtraces just
> > show some abort/exception related calls and then say "Unable to unwind
> > into user space" (paraphrased), leaving me no info about where my crash
> > happened.
> >
> > FreeBSD 10.3
>
> Hi Lee,
>
> Random crashes during kldload sounds like missing or incomplete icache
> sync to me. You can take a look at icache-related fixes in HEAD’s sys/arm
> and try to backport them to 10.3.

Oleksandr,

Thanks, I will take a look. Maybe moving to 11.0 is the best thing to do.
But I'm only seeing crashes in a couple of my modules, not all of them and
not anything I didn't write. And (seemingly) only when they are started
from a script in /etc/rc.d/ at boot time.

Clearly I've messed something up.

Lee

From owner-freebsd-hackers@freebsd.org Mon Dec 5 14:31:31 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 7D04CC67F7B for ; Mon, 5 Dec 2016 14:31:31 +0000 (UTC) (envelope-from eric@metricspace.net) Received: from mail.metricspace.net (mail.metricspace.net [IPv6:2001:470:1f11:617::107]) by mx1.freebsd.org (Postfix) with ESMTP id 4CDEB1C4C for ; Mon, 5 Dec 2016 14:31:31 +0000 (UTC) (envelope-from eric@metricspace.net) Received: from [IPv6:2001:470:1f11:617:3210:b3ff:fe77:ca3f] (unknown [IPv6:2001:470:1f11:617:3210:b3ff:fe77:ca3f]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client did not present a certificate) (Authenticated sender: eric) by mail.metricspace.net (Postfix) with ESMTPSA id 8492A1355; Mon, 5 Dec 2016 14:31:30 +0000 (UTC) Subject: Re: CFT EFI Boot Refactoring To: Ben Woods References: <675cb468-f599-a31b-a82c-c0f892136cfc@metricspace.net> Cc: freebsd-hackers@freebsd.org From: Eric McCorkle Message-ID: <6a6e96bf-3380-a28b-622f-5d6777b4afd6@metricspace.net> Date: Mon, 5 Dec 2016 09:31:25 -0500 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0 MIME-Version: 1.0 In-Reply-To: Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="P3NHmWQv9U0nrpJKO1AeXWXqU83OhhxdP" X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 05 Dec 2016 14:31:31 -0000 This is an OpenPGP/MIME signed message (RFC 4880 and 
3156) --P3NHmWQv9U0nrpJKO1AeXWXqU83OhhxdP Content-Type: multipart/mixed; boundary="Q2Bp1fUFFeUdNbmsOTOMuiTa0bOqUQ8bi"; protected-headers="v1" From: Eric McCorkle To: Ben Woods Cc: freebsd-hackers@freebsd.org Message-ID: <6a6e96bf-3380-a28b-622f-5d6777b4afd6@metricspace.net> Subject: Re: CFT EFI Boot Refactoring References: <675cb468-f599-a31b-a82c-c0f892136cfc@metricspace.net> In-Reply-To: --Q2Bp1fUFFeUdNbmsOTOMuiTa0bOqUQ8bi Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable

On 12/05/2016 08:43, Ben Woods wrote:
> Ok, I have tested with the extra_logging branch, and can report that the
> text on the screen at the time of the hang was:
>
>>> FreeBSD EFI boot block
> Loader path: /boot/loader.efi
>
> Initializing modules: FS BackendProbing all handles for ZFS
> Done
> Binding SIMPLE_FILESYSTEM_PROTOCOL to 0xb85bff18
> Binding SIMPLE_FILESYSTEM_PROTOCOL to 0xb85bfc18
> Binding SIMPLE_FILESYSTEM_PROTOCOL to 0xb85bf998
> Probing all filesystems
> Probing dosfs
> _
>

That would strongly suggest that tsoome's bug is the actual cause.

>
> Following that, I tried a combination of the old (unmodified)
> BOOTX64.EFI with the new (modified) files in /boot/.
> This produced the following output on the screen before it also hung:
>
>>> FreeBSD EFI boot block
> Loader path: /boot/loader.efi
>
> Initializing modules: ZFS UFS
> Probing 10 block devices............* done
> ZFS found the following pools: zroot
> UFS found no paritions
> Consoles: EFI console
> Probing all handles for ZFS
> Done
> Binding SIMPLE_FILESYSTEM_PROTOCOL to 0xb85c9718
> Binding SIMPLE_FILESYSTEM_PROTOCOL to 0xb85c9418
> Binding SIMPLE_FILESYSTEM_PROTOCOL to 0xb85c9198
> Probing all filesystems
> Probing dosfs
> _
>

That's to be expected. The new loader.efi and boot1.efi both use the
same framework for drivers now.

>
> Hope this is helpful for debugging this issue.
> Note that once again I
> did NOT run any command to update my freebsd-boot partition (such as
> gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 2 ada1). Is this
> required?

That stuff is only for BIOS boot. EFI doesn't use it. On a pure EFI
system, you don't even need a freebsd-boot partition.

This actually makes a lot of sense. The new EFI code tries to bind
filesystem interfaces to everything, whereas the old code just probed
and went with the first thing it found. So it never would have
attempted to use the dosfs driver, so the bug that was lurking there
wouldn't have been turned up.

Your freebsd-boot partition is a partition, but it doesn't have an fs.
So it's going to try everything and fail, so it would definitely turn up
any existing bugs in filesystem drivers.

This also explains why I couldn't reproduce it: my UFS partition got
detected as UFS, and the msdosfs partition got detected by the UEFI
firmware, so it would have already had an EFI_SIMPLE_FILE_SYSTEM_PROTOCOL
interface by the time that the boot loader ran.

This raises another question, though: since the UEFI spec guarantees
that a driver exists which will bind an EFI_SIMPLE_FILE_SYSTEM_PROTOCOL
to any FAT32 filesystem, do we really need our own dosfs driver in the
EFI boot loader? I would suspect not.
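
The probing behaviour described in this message can be illustrated with a
small model. This is only a sketch under stated assumptions, not the boot
loader's code: the names Handle, DRIVERS and probe_all, the driver order,
and the three example partitions are all hypothetical, and each "probe" is
reduced to a string comparison. It shows why a partition with no filesystem
(such as freebsd-boot) ends up exercising every driver, including dosfs,
while a handle the firmware has already claimed is never probed.

```python
# Toy model of the probing behaviour described in this message.
# Every name here (Handle, probe_all, DRIVERS, ...) is hypothetical and is
# not taken from the FreeBSD boot loader sources.
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Handle:
    name: str
    contents: str                    # what is really on the partition
    firmware_fs: bool = False        # firmware already exposes a filesystem here
    bound_by: Optional[str] = None   # loader driver that claimed the handle


Driver = Tuple[str, Callable[[Handle], bool]]

# Each probe only compares a string; a real driver would read on-disk
# structures, which is where a latent bug could hang the loader.
DRIVERS: List[Driver] = [
    ("zfs", lambda h: h.contents == "zfs"),
    ("ufs", lambda h: h.contents == "ufs"),
    ("dosfs", lambda h: h.contents == "fat"),
]


def probe_all(handles: List[Handle], drivers: List[Driver], log: List[str]) -> None:
    """Try every driver on every handle the firmware has not already claimed."""
    for h in handles:
        if h.firmware_fs:
            log.append(f"skip {h.name}: firmware filesystem already bound")
            continue
        for drv_name, probe in drivers:
            log.append(f"probing {drv_name} on {h.name}")
            if probe(h):
                h.bound_by = drv_name
                break  # first driver that accepts the handle wins


# Assumed example layout: an ESP, an empty freebsd-boot partition, a ZFS pool.
handles = [
    Handle("esp", "fat", firmware_fs=True),
    Handle("freebsd-boot", "none"),
    Handle("zroot", "zfs"),
]
log: List[str] = []
probe_all(handles, DRIVERS, log)
print("\n".join(log))
```

In this model the only handle that reaches the dosfs probe is the one with
no recognisable filesystem, which matches the reasoning above about why the
hang shows up on some setups and not others.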
--Q2Bp1fUFFeUdNbmsOTOMuiTa0bOqUQ8bi-- --P3NHmWQv9U0nrpJKO1AeXWXqU83OhhxdP Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- iF4EARYIAAYFAlhFekIACgkQVsKIQKqABI2dDQEA/y7sWuj+D2g4PsuMg8DeME/6 d0auZW1hVGEjrCaQmt4BAM32lHMWmp3mjtAb9m+1jlX5/WHC+YkduSa7FAka/VUD =ar2n -----END PGP SIGNATURE----- --P3NHmWQv9U0nrpJKO1AeXWXqU83OhhxdP-- From owner-freebsd-hackers@freebsd.org Mon Dec 5 13:43:05 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3F622C66F52 for ; Mon, 5 Dec 2016 13:43:05 +0000 (UTC) (envelope-from woodsb02@gmail.com) Received: from mail-io0-x22b.google.com (mail-io0-x22b.google.com [IPv6:2607:f8b0:4001:c06::22b]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 05F551E43 for ; Mon, 5 Dec 2016 13:43:05 +0000 (UTC) (envelope-from woodsb02@gmail.com) Received: by mail-io0-x22b.google.com with SMTP id j65so596039680iof.0 for ; Mon, 05 Dec 2016 05:43:05 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=yMNJwq65JsVow6xXWZvAtjKzeMfYxsooZPXLq8pTjRE=; b=mAcKHYoqC6UIBLqd0WqwOAQJNKI1aTgr+zrRYN+6jjurwB/uMO2YC6oyQIBB9sxEfQ i+51G9m/6PxTQnQ3LuaV7zl3YS0VCYJCdzyn9hHI52AAHgJ6sdidIgPbfvPHIJQCBkH7 RJce68gZ+og8BVexdpeS7Z+alaQJSfTqE249R2v08yNVmPEhMBbNZSobntWCGqGKCF/Q 5o8+otWn6lTTQuryYZKocLpDWtib4BU4D7cyO2rr9y9BvPPwTpShRWMAnP5rlTasLJ8n 5IU47M3PyQpCvjYVMTyau+wGLEFZ4F+cULh8ZMJoRBcciFIiMK7kqjp5TzsOk0gLpURo JX0g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; 
h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=yMNJwq65JsVow6xXWZvAtjKzeMfYxsooZPXLq8pTjRE=; b=O0t4cHX8Hn/onVZzby9VM1SJSc67ETdnRODo9kYwIsJTlxSVHM0PlbVjAqfDw7XFwJ yx5joeE0bSQ8Q2S6dlYOuPVRnCnBxJgXCu7xopQW4oe6OtwFF6lu9oUFTRZFptCWwBkk zCPM3WO9k8I7dHYqqZtiN7wKysbq124qyjXinRA/1MYfdqXAO4V0K14uycfQvxh7743r JdkQ0CH7UKhowtdiiXTuUcci3H6bOmPl/6zAiEVmDh5q/igMnFh+VSQlURdOmHOG8LJr KchNbLyyLYVRjkNR62yvF3fukwD71g7vTaV73TrXymQV091e6x9nsSsI8rAQzuAxMcI0 IjJA== X-Gm-Message-State: AKaTC00YB5No89ae9oav3alyVyfJkeysWmKNB/1f36yeVJgPejYa9Gy2r1kwdhtgPIQ5kIzO0wfSNIUwVBxxCQ== X-Received: by 10.107.57.131 with SMTP id g125mr18457410ioa.108.1480945384327; Mon, 05 Dec 2016 05:43:04 -0800 (PST) MIME-Version: 1.0 Received: by 10.79.136.197 with HTTP; Mon, 5 Dec 2016 05:43:03 -0800 (PST) In-Reply-To: References: <675cb468-f599-a31b-a82c-c0f892136cfc@metricspace.net> From: Ben Woods Date: Mon, 5 Dec 2016 21:43:03 +0800 Message-ID: Subject: Re: CFT EFI Boot Refactoring To: Eric McCorkle Cc: freebsd-hackers@freebsd.org X-Mailman-Approved-At: Mon, 05 Dec 2016 14:40:40 +0000 Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.23 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 05 Dec 2016 13:43:05 -0000 On 4 December 2016 at 10:59, Eric McCorkle wrote: > You don't have a UFS filesystem anywhere, so we can rule that out. It > might be tsoome's bug, and it might just be that the bug is sporadic, > which would explain why I'm not seeing it on my setup with a dosfs. > > The only other obvious commonality between you and tsoome that doesn't > overlap my setup is multiple ZFS datasets, or ZFS data vdevs (mirrors, > stripes, etc) spread across multiple disks (my setup only has a log and > a cache on the ssd). > > > Here's what I'll do. 
I'll create an "extra_logging" branch off of > efize_new in my github repo, wherein I'll add a bunch of extra logging > into the detection process. It ought to be enough to print out device > paths and filesystem drivers just before it tries them. Ok, I have tested with the extra_logging branch, and can report that the text on the screen at the time of the hang was: >> FreeBSD EFI boot block Loader path: /boot/loader.efi Initializing modules: FS BackendProbing all handles for ZFS Done Binding SIMPLE_FILESYSTEM_PROTOCOL to 0xb85bff18 Binding SIMPLE_FILESYSTEM_PROTOCOL to 0xb85bfc18 Binding SIMPLE_FILESYSTEM_PROTOCOL to 0xb85bf998 Probing all filesystems Probing dosfs _ Following that, I tried a combination of the old (unmodified) BOOTX64.EFI with the new (modified) files in /boot/. This produced the following output on the screen before it also hung: >> FreeBSD EFI boot block Loader path: /boot/loader.efi Initializing modules: ZFS UFS Probing 10 block devices............* done ZFS found the following pools: zroot UFS found no paritions Consoles: EFI console Probing all handles for ZFS Done Binding SIMPLE_FILESYSTEM_PROTOCOL to 0xb85c9718 Binding SIMPLE_FILESYSTEM_PROTOCOL to 0xb85c9418 Binding SIMPLE_FILESYSTEM_PROTOCOL to 0xb85c9198 Probing all filesystems Probing dosfs _ Hope this is helpful for debugging this issue. Note that once again I did NOT run any command to update my freebsd-boot partition (such as gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 2 ada1). Is this required? 
Regards, Ben -- From: Benjamin Woods woodsb02@gmail.com From owner-freebsd-hackers@freebsd.org Mon Dec 5 22:50:39 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 782CFC68078 for ; Mon, 5 Dec 2016 22:50:39 +0000 (UTC) (envelope-from j.deboynepollard-newsgroups@ntlworld.com) Received: from know-smtprelay-omc-6.server.virginmedia.net (know-smtprelay-omc-6.server.virginmedia.net [80.0.253.70]) by mx1.freebsd.org (Postfix) with ESMTP id EDE99ED for ; Mon, 5 Dec 2016 22:50:38 +0000 (UTC) (envelope-from j.deboynepollard-newsgroups@ntlworld.com) Received: from [192.168.1.100] ([86.10.211.13]) by know-smtprelay-6-imp with bizsmtp id GNpT1u00B0HtmFq01NpUuT; Mon, 05 Dec 2016 22:49:28 +0000 X-Originating-IP: [86.10.211.13] X-Spam: 0 X-Authority: v=2.1 cv=H94muLsi c=1 sm=1 tr=0 a=SB7hr1IvJSWWr45F2gQiKw==:117 a=SB7hr1IvJSWWr45F2gQiKw==:17 a=L9H7d07YOLsA:10 a=9cW_t1CCXrUA:10 a=s5jvgZ67dGcA:10 a=IkcTkHD0fZMA:10 a=2rVjqWD_AAAA:8 a=itly7gIdAAAA:8 a=xNUUt9kjeGnzvt2K_IkA:9 a=QEXdDO2ut3YA:10 a=-FEs8UIgK8oA:10 a=NWVoK91CQyQA:10 a=ULaUcM2Ibn9MdPUUwucP:22 a=1RpNR2E4bTkVPcsa2RFZ:22 To: FreeBSD Hackers , "supervision@list.skarnet.org" , Debian users From: Jonathan de Boyne Pollard Subject: djbwares version 4 Message-ID: Date: Mon, 5 Dec 2016 22:49:20 +0000 User-Agent: Mozilla/5.0 (Windows NT 6.0; rv:45.0) Gecko/20100101 Thunderbird/45.1.1 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 05 Dec 2016 22:50:39 -0000 In celebration of the forthcoming leap second, djbwares is now at version 4. 
* http://jdebp.eu./Softwares/djbwares/ * http://jdebp.info./Softwares/djbwares/ I've added in the rest of M. Bernstein's public domain libtai library, parts of which were already included by some of the tools. This has added the easter, nowutc, and yearcal commands, which are packaged up alongside libtai.a, the libtai C language headers, and the libtai manual pages in a new libtai package. More importantly, it has added the leapsecs command, and the /usr/local/etc/leapsecs.dat file is now generated from leapsecs.txt rather than included as a binary in the source as it was before. The sharp-eyed will also note that support for /usr/local/etc/leapsecs.dat (as an alternative to /etc/leapsecs.dat for systems that like non-operating system files in /usr/local/etc) has also been added. The leapsecs.txt is the Bernstein 2015-06-30 version (which is still the latest published by M. Bernstein) patched with the forthcoming leap second. The libtai package does not include /usr/local/etc/leapsecs.dat . Rather, that is packaged in a separate leapsecs package, to allow updated versions to be substituted with ease when they come along, as well as to permit installing only that without the rest of libtai. 
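
As a rough illustration of the two locations mentioned above, the following sketch looks for leapsecs.dat in /usr/local/etc and then in /etc, and decodes the table. It assumes the libtai on-disk layout of packed 8-byte big-endian TAI64 labels; the search order and the function names are assumptions made for this example, not taken from the djbwares source.

```python
import struct
from pathlib import Path

# The two locations named in the announcement; the search order is an assumption.
CANDIDATES = ("/usr/local/etc/leapsecs.dat", "/etc/leapsecs.dat")


def find_leapsecs(candidates=CANDIDATES):
    """Return the first candidate that is a regular file, or None."""
    for name in candidates:
        path = Path(name)
        if path.is_file():
            return path
    return None


def decode_leapsecs(data):
    """Decode packed 8-byte big-endian TAI64 labels into a list of ints."""
    if len(data) % 8 != 0:
        raise ValueError("table length is not a multiple of 8 bytes")
    return [label for (label,) in struct.iter_unpack(">Q", data)]
```

A caller would pass the bytes of the file returned by find_leapsecs() to decode_leapsecs(); shipping the table as its own package only changes which file is found, not how it is read.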
From owner-freebsd-hackers@freebsd.org Mon Dec 5 23:43:13 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 911BBC693A7 for ; Mon, 5 Dec 2016 23:43:13 +0000 (UTC) (envelope-from torek@torek.net) Received: from elf.torek.net (mail.torek.net [96.90.199.121]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 7E4191249 for ; Mon, 5 Dec 2016 23:43:12 +0000 (UTC) (envelope-from torek@torek.net) Received: from elf.torek.net (localhost [127.0.0.1]) by elf.torek.net (8.14.9/8.14.9) with ESMTP id uB5Nh5Dm078198 for ; Mon, 5 Dec 2016 15:43:06 -0800 (PST) (envelope-from torek@torek.net) Message-Id: <201612052343.uB5Nh5Dm078198@elf.torek.net> From: Chris Torek To: freebsd-hackers@freebsd.org Subject: kernel ioctl aggregator script, might be generally useful MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-ID: <78196.1480981385.1@elf.torek.net> Content-Transfer-Encoding: quoted-printable Date: Mon, 05 Dec 2016 15:43:05 -0800 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (elf.torek.net [127.0.0.1]); Mon, 05 Dec 2016 15:43:06 -0800 (PST) X-Mailman-Approved-At: Tue, 06 Dec 2016 00:16:21 +0000 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 05 Dec 2016 23:43:13 -0000 I wrote a little Python script to let me view existing kernel _IO*(group, number, type) ioctls. 
Here's a sample of the kind of data it prints:

group 'F':
    0: _IOR('F', 0, struct fbtype)  sys/fbio.h:106
    2: _IOR('F', 2, struct fbinfo)  sys/fbio.h:178
    3: _IOW('F', 3, struct fbcmap)  sys/fbio.h:191
    4: _IOW('F', 4, struct fbcmap)  sys/fbio.h:192
    5: _IOW('F', 5, struct fbsattr)  sys/fbio.h:216
    6: _IOR('F', 6, struct fbgattr)  sys/fbio.h:217
[snip]
group 'I':
    0: _IOWR('I', 0, struct iodev_pio_req)  dev/io/iodev.h:42
    1: _IOR('I', 1, struct iscsi_daemon_request)  dev/iscsi/iscsi_ioctl.h:125
       _IOR('I', 1, struct autofs_daemon_request)  fs/autofs/autofs_ioctl.h:112
       _IOWR('I', 1, struct iodev_efivar_req)  ia64/include/iodev.h:53
    2: _IOW('I', 2, struct iscsi_daemon_handoff)  dev/iscsi/iscsi_ioctl.h:126
       _IOW('I', 2, struct autofs_daemon_done_101)  fs/autofs/autofs_ioctl.h:113
    3: _IOW('I', 3, struct iscsi_daemon_fail)  dev/iscsi/iscsi_ioctl.h:127
       _IOW('I', 3, struct autofs_daemon_done)  fs/autofs/autofs_ioctl.h:114
    4: _IOWR('I', 4, struct iscsi_daemon_connect)  dev/iscsi/iscsi_ioctl.h:179
[snip]

which lets us see that there are overlapping uses of some of the
group-'I' ioctls (overlapping uses are OK as long as they don't
really collide, of course, this just lets you check for potential
issues).

(Also it does not know about #ifdef so it finds things like these:

group r:
    60: _IOW(r, 60, struct ipfobj)  contrib/ipfilter/netinet/ip_fil.h:88

which is inside the "#else" part of a __STDC__ ifdef.)
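
The general idea can be sketched in a few lines. This is not sysioctl.py itself; it is a minimal regex-based illustration, and the function names are made up for the example. It only handles literal ioctl numbers and simple type arguments, and, like the script described above, it knows nothing about #ifdef.

```python
import re
from collections import defaultdict

# Matches _IO, _IOR, _IOW and _IOWR uses such as _IOR('F', 0, struct fbtype).
# Simplified: the number must be a decimal literal and the type argument
# must not contain parentheses.
IOCTL_RE = re.compile(
    r"\b(_IO(?:WR|R|W)?)\s*\(\s*('[^']+'|\w+)\s*,\s*(\d+)\s*(?:,\s*([^()]+?)\s*)?\)"
)


def scan(text, filename):
    """Yield (group, number, macro text, 'file:line') for each ioctl found."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        for m in IOCTL_RE.finditer(line):
            macro, group, number, argtype = m.groups()
            args = [group, number] + ([argtype] if argtype else [])
            yield group, int(number), f"{macro}({', '.join(args)})", f"{filename}:{lineno}"


def aggregate(sources):
    """Build {group: {number: [(macro text, location), ...]}} from (text, filename) pairs."""
    table = defaultdict(lambda: defaultdict(list))
    for text, filename in sources:
        for group, number, macro, where in scan(text, filename):
            table[group][number].append((macro, where))
    return table


def collisions(table):
    """Return the sorted (group, number) pairs that are defined more than once."""
    return sorted(
        (group, number)
        for group, numbers in table.items()
        for number, uses in numbers.items()
        if len(uses) > 1
    )
```

Feeding aggregate() the contents of each header under sys/ and printing the table grouped by group and number would give output of the same general shape as the sample; collisions() lists the slots worth a second look.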
Anyway, if anyone wants it for whatever purpose, I put it up for public access at https://github.com/chris3torek/scripts (https://raw.githubusercontent.com/chris3torek/scripts/master/sysioctl.py) Chris From owner-freebsd-hackers@freebsd.org Tue Dec 6 02:29:33 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 10B8EC686DF for ; Tue, 6 Dec 2016 02:29:33 +0000 (UTC) (envelope-from vangyzen@FreeBSD.org) Received: from smtp.vangyzen.net (hotblack.vangyzen.net [IPv6:2607:fc50:1000:7400:216:3eff:fe72:314f]) by mx1.freebsd.org (Postfix) with ESMTP id F20FE1ADF for ; Tue, 6 Dec 2016 02:29:32 +0000 (UTC) (envelope-from vangyzen@FreeBSD.org) Received: from ford.home.vangyzen.net (unknown [76.164.15.242]) by smtp.vangyzen.net (Postfix) with ESMTPSA id 6D1F35648E for ; Mon, 5 Dec 2016 20:29:32 -0600 (CST) To: FreeBSD Hackers From: Eric van Gyzen Subject: kinfo_proc::ki_tdname truncation Message-ID: <3379ac69-16e0-4071-8e69-4fb05b011ba7@FreeBSD.org> Date: Mon, 5 Dec 2016 20:29:29 -0600 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 06 Dec 2016 02:29:33 -0000 I just noticed that kinfo_proc::ki_tdname is three characters shorter than thread::td_name. Would anyone object if I steal 4 bytes from ki_sparestrings to add a field for these three extra characters (and fix all consumers accordingly)? Yes, I care enough about those three characters to do this. 
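
To make the arithmetic concrete, here is a small sketch. It assumes the sizes in the FreeBSD headers of that era (MAXCOMLEN = 19 for td_name, TDNAMLEN = 16 for ki_tdname, each array one byte longer for the terminating NUL); the helper names and the idea of a separate overflow field are illustrative only.

```python
# Sizes assumed from sys/param.h and sys/user.h of that era.
MAXCOMLEN = 19   # thread::td_name is char[MAXCOMLEN + 1]
TDNAMLEN = 16    # kinfo_proc::ki_tdname is char[TDNAMLEN + 1]
EXTRA_BYTES = 4  # bytes the proposal would take from ki_sparestrings


def split_name(td_name):
    """Split a thread name into (part that fits ki_tdname, overflow part)."""
    name = td_name[:MAXCOMLEN]
    return name[:TDNAMLEN], name[TDNAMLEN:]


def join_name(tdname, extra):
    """What an updated consumer would do: concatenate the two fields."""
    return tdname + extra


lost = MAXCOMLEN - TDNAMLEN     # characters truncated without the extra field
assert lost == 3
assert lost + 1 <= EXTRA_BYTES  # three characters plus a NUL fit in 4 bytes
```

Names of 16 characters or fewer leave the overflow part empty, so consumers that ignore the extra field would see exactly what they see today.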
Thanks, Eric From owner-freebsd-hackers@freebsd.org Tue Dec 6 09:52:34 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9950DC69070 for ; Tue, 6 Dec 2016 09:52:34 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 317E41560; Tue, 6 Dec 2016 09:52:34 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id uB69qTLl075490 (version=TLSv1 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Tue, 6 Dec 2016 11:52:29 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua uB69qTLl075490 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id uB69qTIN075489; Tue, 6 Dec 2016 11:52:29 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Tue, 6 Dec 2016 11:52:29 +0200 From: Konstantin Belousov To: Eric van Gyzen Cc: FreeBSD Hackers Subject: Re: kinfo_proc::ki_tdname truncation Message-ID: <20161206095229.GL54029@kib.kiev.ua> References: <3379ac69-16e0-4071-8e69-4fb05b011ba7@FreeBSD.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <3379ac69-16e0-4071-8e69-4fb05b011ba7@FreeBSD.org> User-Agent: Mutt/1.7.1 (2016-10-04) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: 
list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 06 Dec 2016 09:52:34 -0000 On Mon, Dec 05, 2016 at 08:29:29PM -0600, Eric van Gyzen wrote: > I just noticed that kinfo_proc::ki_tdname is three characters shorter than > thread::td_name. Would anyone object if I steal 4 bytes from ki_sparestrings to > add a field for these three extra characters (and fix all consumers > accordingly)? Yes, I care enough about those three characters to do this. > You also should update kinfo_proc32 and freebsd32_kinfo_proc_out() then. From owner-freebsd-hackers@freebsd.org Tue Dec 6 12:31:20 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 95933C6860E for ; Tue, 6 Dec 2016 12:31:20 +0000 (UTC) (envelope-from killing@multiplay.co.uk) Received: from mail-wm0-x234.google.com (mail-wm0-x234.google.com [IPv6:2a00:1450:400c:c09::234]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 346D01CC6 for ; Tue, 6 Dec 2016 12:31:19 +0000 (UTC) (envelope-from killing@multiplay.co.uk) Received: by mail-wm0-x234.google.com with SMTP id t79so123737647wmt.0 for ; Tue, 06 Dec 2016 04:31:19 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=multiplay-co-uk.20150623.gappssmtp.com; s=20150623; h=to:from:subject:message-id:date:user-agent:mime-version; bh=ZDn9acPamkluGcxMfiezGJ8mbYwT3+8y73ujnoDfoSo=; b=HTvhInjQgMdJlIxT1HuvMeEqtiK9SAysK8hgx5gw5QJxcApGDgmyAMkhaFDYbYaDF3 Rbz70TZv/0i6sX1GJylzS63hLxZTxhstxw9RONOXcQGuqAHImoH5hmEDdMdHCIg6B7n2 K2veRiuyCA2OHaU6+ElBzQeDsUgQVQF503YDhMfhH6y/EyyYT1zWkh2gDjFvq8IwdDyL GK3Ug6fU32zZfxt2DU1CpZqOc//sE2pq79EJS+bCbYkVGDACnxyJzBrlcXEfSg/4Zavl 
2Fm+wkX8HOl+t8NYwXIlFmXmS0vuNg6Or8CgMBYGD+luu/nhMboarBdAIGa4ZiRldcjC NCnA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:to:from:subject:message-id:date:user-agent :mime-version; bh=ZDn9acPamkluGcxMfiezGJ8mbYwT3+8y73ujnoDfoSo=; b=k+TJUReEZ8D4ac72548eOdzgBLKf9vOLKS2O9JOHZkVGFsIwKPUJ2wBIdTiIABh9EX w5mq95Ukw2AZr3JDjRE5jWjyaSRTUZmtLs8VMTdw4Qxd2zeTFZEIxfNkZ9epQxvTvagW sUUS49Zw155soUxgWpJV9KFgo6sfjn9jdRmT7j9RPT7xrgrTr/7WT2eISWXYob0QCZbp N5FAuCGVoCvFt5FRc6MOEbOapbm/RMx520fxQtpcTJOQYtxmbZs8YpEzGQ9e8S/npcRi ruM4bntffgFVF1OICe+zvWnel59KZPGNyRYXbmjGl3xplhGdDVemuKW9cq/Rrn9OlVIH 25YA== X-Gm-Message-State: AKaTC00RwPU3wzfiWW04Q97waytLbeYF2z/gMOPslKBZAU3X8bCrin/3BQM6qT9Kidli/9wK X-Received: by 10.28.73.136 with SMTP id w130mr2382269wma.82.1481027477714; Tue, 06 Dec 2016 04:31:17 -0800 (PST) Received: from [10.10.1.58] ([185.97.61.26]) by smtp.gmail.com with ESMTPSA id b3sm25534448wjy.40.2016.12.06.04.31.16 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 06 Dec 2016 04:31:16 -0800 (PST) To: "freebsd-hackers@freebsd.org" From: Steven Hartland Subject: Help needed to identify golang fork / memory corruption issue on FreeBSD Message-ID: <27e1a828-5cd9-0755-50ca-d7143e7df117@multiplay.co.uk> Date: Tue, 6 Dec 2016 12:31:47 +0000 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.5.1 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-Content-Filtered-By: Mailman/MimeDel 2.1.23 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 06 Dec 2016 12:31:20 -0000 Hi guys I'm trying to help identify / fix an issue with golang where by fork results in memory corruption. 
Details of the issue can be found here:
https://github.com/golang/go/issues/15658

In summary, when a fork is done in golang it has a chance of causing memory corruption in the parent, resulting in a process crash once detected.

It's believed that this only affects FreeBSD.

This has similarities to other reported issues, such as this one which impacted perl during 10.x:
https://rt.perl.org/Public/Bug/Display.html?id=122199

And more recently the issue with nginx on 11.x:
https://lists.freebsd.org/pipermail/freebsd-stable/2016-September/085540.html

It's possible, some believe likely, that this is a kernel bug around fork / vm that golang stresses, but I've not been able to confirm.

I can reproduce the issue at will; it takes between 5 minutes and 1 hour using 16 threads, and it definitely seems like an interaction between fork and other memory operations.

I've tried reproducing the issue in C but also no joy (captured in the bug).

For reference I'm currently testing on 11.0-RELEASE-p3 + kib's PCID fix (#306350).

Any advice / help would be most appreciated.
Regards Steve From owner-freebsd-hackers@freebsd.org Tue Dec 6 11:35:50 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0B96BC6960F for ; Tue, 6 Dec 2016 11:35:50 +0000 (UTC) (envelope-from bugs@gnu.support) Received: from stw1.rcdrun.com (stw1.rcdrun.com [217.170.207.13]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id C5DF4172C for ; Tue, 6 Dec 2016 11:35:48 +0000 (UTC) (envelope-from bugs@gnu.support) Received: from protected.rcdrun.com (localhost [::1]) (AUTH: PLAIN securesender, TLS: TLSv1/SSLv3,256bits,AES256-SHA) by stw1.rcdrun.com with ESMTPSA; Tue, 06 Dec 2016 04:31:14 -0700 id 0000000000082657.000000005846A182.00005C22 Received: from localhost (localhost [127.0.0.1]) (uid 1001) by protected.rcdrun.com with local; Tue, 06 Dec 2016 14:29:10 +0300 id 00000000000E0223.000000005846A106.00007687 Date: Tue, 6 Dec 2016 14:29:10 +0300 From: Jean Louis To: Jonathan de Boyne Pollard Cc: FreeBSD Hackers , "supervision@list.skarnet.org" , Debian users Subject: Re: djbwares version 4 Message-ID: <20161206112910.GC28995@protected.rcdrun.com> References: Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Content-Disposition: inline In-Reply-To: X-Mailman-Approved-At: Tue, 06 Dec 2016 12:47:44 +0000 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 06 Dec 2016 11:35:50 -0000 On Mon, Dec 05, 2016 at 10:49:20PM +0000, Jonathan de Boyne Pollard wrote: > In celebration of the forthcoming leap second, djbwares is now at version 4. 
> > * http://jdebp.eu./Softwares/djbwares/ > * http://jdebp.info./Softwares/djbwares/ http://jdebp.info./Softwares/djbwares is not working: "access denied" and I instinctively tried that one first, as to avoid .eu (even it makes no sense). From owner-freebsd-hackers@freebsd.org Tue Dec 6 12:59:30 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 042E1C6954F for ; Tue, 6 Dec 2016 12:59:30 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 7694117BE for ; Tue, 6 Dec 2016 12:59:29 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id uB6CxJ79085815 (version=TLSv1 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Tue, 6 Dec 2016 14:59:20 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua uB6CxJ79085815 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id uB6CxJH9085814; Tue, 6 Dec 2016 14:59:19 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Tue, 6 Dec 2016 14:59:19 +0200 From: Konstantin Belousov To: Steven Hartland Cc: "freebsd-hackers@freebsd.org" Subject: Re: Help needed to identify golang fork / memory corruption issue on FreeBSD Message-ID: <20161206125919.GQ54029@kib.kiev.ua> References: <27e1a828-5cd9-0755-50ca-d7143e7df117@multiplay.co.uk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <27e1a828-5cd9-0755-50ca-d7143e7df117@multiplay.co.uk> User-Agent: Mutt/1.7.1 (2016-10-04) X-Spam-Status: No, 
score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 06 Dec 2016 12:59:30 -0000 On Tue, Dec 06, 2016 at 12:31:47PM +0000, Steven Hartland wrote: > Hi guys I'm trying to help identify / fix an issue with golang where by > fork results in memory corruption. > > Details of the issue can be found here: > https://github.com/golang/go/issues/15658 > > In summary when a fork is done in golang is has a chance of causing > memory corruption in the parent resulting in a process crash once detected. > > Its believed that this only effects FreeBSD. > > This has similarities to other reported issues such as this one which > impacted perl during 10.x: > https://rt.perl.org/Public/Bug/Display.html?id=122199 I cannot judge about any similarilities when all the description provided is 'memory corruption'. BTW, the perl issue described, where child segfaults after the fork, is more likely to be caused by the set of problems referenced in the FreeBSD-EN-16:17.vm. > > And more recently the issue with nginx on 11.x: > https://lists.freebsd.org/pipermail/freebsd-stable/2016-September/085540.html Which does not affect anything unless aio is used on Sandy/Ivy. > > Its possible, some believe likely, that this is a kernel bug around fork > / vm that golang stresses, but I've not been able to confirm. > > I can reproduce the issue at will, takes between 5mins and 1hour using > 16 threads, and it definitely seems like an interaction between fork and > other memory operations. Which arch is the kernel and the process which demonstrates the behaviour ? I mean i386/amd64. 
> > I've tried reproducing the issue in C but also no joy (captured in the bug). > > For reference I'm currently testing on 11.0-RELEASE-p3 + kibs PCID fix > (#306350). Switch to HEAD kernel, for start. Show the memory map of the failed process. Are you able to take ktrace of the process while still producing the bug ? Where is the memory corruption happen ? Is it in go runtime structures, or in the application data ? Can somebody knowledgable of either the go runtime or the app, try to identify the initial corrupted userspace data ? From owner-freebsd-hackers@freebsd.org Tue Dec 6 13:18:28 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4577AC69CC0 for ; Tue, 6 Dec 2016 13:18:28 +0000 (UTC) (envelope-from j.deboynepollard-newsgroups@ntlworld.com) Received: from know-smtprelay-omc-6.server.virginmedia.net (know-smtprelay-omc-6.server.virginmedia.net [80.0.253.70]) by mx1.freebsd.org (Postfix) with ESMTP id B96E21C8 for ; Tue, 6 Dec 2016 13:18:27 +0000 (UTC) (envelope-from j.deboynepollard-newsgroups@ntlworld.com) Received: from [192.168.1.100] ([86.10.211.13]) by know-smtprelay-6-imp with bizsmtp id GdJQ1u0060HtmFq01dJQQZ; Tue, 06 Dec 2016 13:18:24 +0000 X-Originating-IP: [86.10.211.13] X-Spam: 0 X-Authority: v=2.1 cv=H94muLsi c=1 sm=1 tr=0 a=SB7hr1IvJSWWr45F2gQiKw==:117 a=SB7hr1IvJSWWr45F2gQiKw==:17 a=L9H7d07YOLsA:10 a=9cW_t1CCXrUA:10 a=s5jvgZ67dGcA:10 a=IkcTkHD0fZMA:10 a=2rVjqWD_AAAA:8 a=itly7gIdAAAA:8 a=agkgnfeDeLt8UnmRe1oA:9 a=QEXdDO2ut3YA:10 a=-FEs8UIgK8oA:10 a=NWVoK91CQyQA:10 a=ULaUcM2Ibn9MdPUUwucP:22 a=1RpNR2E4bTkVPcsa2RFZ:22 Subject: Re: djbwares version 4 References: <20161206112910.GC28995@protected.rcdrun.com> To: FreeBSD Hackers , "supervision@list.skarnet.org" , Debian users From: Jonathan de Boyne Pollard Message-ID: Date: Tue, 6 Dec 2016 13:18:14 +0000 User-Agent: Mozilla/5.0 (Windows NT 6.0; 
rv:45.0) Gecko/20100101 Thunderbird/45.1.1 MIME-Version: 1.0 In-Reply-To: <20161206112910.GC28995@protected.rcdrun.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 06 Dec 2016 13:18:28 -0000 Jonathan de Boyne Pollard: > In celebration of the forthcoming leap second, djbwares is now at > version 4. > > * http://jdebp.eu./Softwares/djbwares/ > * http://jdebp.info./Softwares/djbwares/ > Jean Louis: > http://jdebp.info./Softwares/djbwares > > is not working: "access denied" and I instinctively tried that one > first, as to avoid .eu (even it makes no sense). > You should have just tried the URL that I gave to you, without your changing it to something different. Ironically, Bernstein publicfile is part of the package at hand, and this is the documented behaviour of publicfile, in its original Bernstein manual: > A request for http://v/f refers to the file named ./v/f inside the root directory hierarchy, if f does not end with a slash. > httpd will refuse to read a file if the file [...] is anything other than a regular file: a directory, socket, device, etc. publicfile isn't going to let you read the WWW server's directories directly with URL tricks. You attempt that in vain. (-: For *not* trying to trick the WWW server, and simply reading the blurb and the download instructions, just use the actual URL that I gave. 
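
The rule quoted above can be sketched roughly as follows. This is a simplified illustration, not publicfile's code: the function names are invented, the index.html handling for a trailing slash is an assumption based on the publicfile manual rather than on the text quoted here, and the host and path sanitisation that the real httpd performs is left out.

```python
import os
import stat


def resolve(root, host, path):
    """Map a request for http://host/path to a file name under root."""
    # Assumption: a path ending in a slash selects index.html in that directory.
    if path == "" or path.endswith("/"):
        path += "index.html"
    # No sanitisation of host or path is done here; a real server must do it.
    return os.path.join(root, host, path.lstrip("/"))


def can_serve(filename):
    """Apply the quoted rule: only regular files may be read."""
    try:
        mode = os.stat(filename).st_mode
    except OSError:
        return False
    return stat.S_ISREG(mode)
```

Under these assumptions, a request for Softwares/djbwares without the trailing slash resolves to the directory itself, which can_serve() rejects, while Softwares/djbwares/ resolves to the index.html inside it.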
From owner-freebsd-hackers@freebsd.org Tue Dec 6 13:53:25 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 65754C63E7C for ; Tue, 6 Dec 2016 13:53:25 +0000 (UTC) (envelope-from killing@multiplay.co.uk) Received: from mail-wj0-x22e.google.com (mail-wj0-x22e.google.com [IPv6:2a00:1450:400c:c01::22e]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 1A97D19F3 for ; Tue, 6 Dec 2016 13:53:25 +0000 (UTC) (envelope-from killing@multiplay.co.uk) Received: by mail-wj0-x22e.google.com with SMTP id tg4so66534560wjb.1 for ; Tue, 06 Dec 2016 05:53:24 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=multiplay-co-uk.20150623.gappssmtp.com; s=20150623; h=subject:to:references:cc:from:message-id:date:user-agent :mime-version:in-reply-to; bh=G5hYTCZ6Vwa/0Di0hyooLcb0SlD5Dt2MANOmCxa0HBs=; b=fimg4T9e8smnUn8yqJhUlWTzje3aPrBmx/SASrF/YbT8UfV5hQ/sbXeTR0HU5/wFZs cnqxYAu+usuKXocnCzFSOq1JB+W0P3HOwQu1kcoNYdBmbYKKqe94oB3Q6TDFCTnFXynd UBnKQ7XeL76gQZMMinH0jxO/FO3DjmYHfqi1ZmCbksu30BREKNcOoa2K3k3acZCer8bu JB5F5WGbGWdN8sXlWMeiNqec59akpJdSMuWbrOjsWFwwFLnWRY0z8BOMhTVwqS5NShrz TsqDryGPsNOjLxjW0dXY5bUGlMkeWkJSolJh0KrTk/OU3r/Bk8XmKhebaESdnZXcMVQ2 ZqLQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:subject:to:references:cc:from:message-id:date :user-agent:mime-version:in-reply-to; bh=G5hYTCZ6Vwa/0Di0hyooLcb0SlD5Dt2MANOmCxa0HBs=; b=NgWQE8MGaJ/4GatsceArm7M7z56yz2qYHzw5TG7VtapeDQGT5f8nPFALf69mdkzkwJ qMCbvQvOzoI3y666bizCSOO3AsUsJKlK4MtY/yXps5bfpEGoUhtdpGBpsGJCkRq5s/5N UO9oyEaqcQJ7aG0YWonPZ+/M8CGTeLltbPAA5vh/wPpzA79ZFENlXdkdQEaQ2jcPgjaN zP7JpD3bk4yH3Qs7GgGHk6dN4zGjeloH83Y2d5eHWemDTGENoS7Gp1qMYuUnDj304N2L 
0rwtTMR9IFpO9/SuMS9qtQLeNwa4F/OyKNHcZgVjaBuCVqma3PCQZLJWrHc9f0T9EVZv hdqQ== X-Gm-Message-State: AKaTC00eCKn64Hv9x15LUlsMYRANpjJXGDT0mqIoJsTKHzHnel4HP2XLjRZPA/+6jDonk2+C X-Received: by 10.194.58.52 with SMTP id n20mr54257896wjq.110.1481032402931; Tue, 06 Dec 2016 05:53:22 -0800 (PST) Received: from [10.10.1.58] ([185.97.61.26]) by smtp.gmail.com with ESMTPSA id q7sm25901355wjh.9.2016.12.06.05.53.21 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 06 Dec 2016 05:53:22 -0800 (PST) Subject: Re: Help needed to identify golang fork / memory corruption issue on FreeBSD To: Konstantin Belousov References: <27e1a828-5cd9-0755-50ca-d7143e7df117@multiplay.co.uk> <20161206125919.GQ54029@kib.kiev.ua> Cc: "freebsd-hackers@freebsd.org" From: Steven Hartland Message-ID: <8b502580-4d2d-1e1f-9e05-61d46d5ac3b1@multiplay.co.uk> Date: Tue, 6 Dec 2016 13:53:52 +0000 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.5.1 MIME-Version: 1.0 In-Reply-To: <20161206125919.GQ54029@kib.kiev.ua> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-Content-Filtered-By: Mailman/MimeDel 2.1.23 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 06 Dec 2016 13:53:25 -0000 On 06/12/2016 12:59, Konstantin Belousov wrote: > On Tue, Dec 06, 2016 at 12:31:47PM +0000, Steven Hartland wrote: >> Hi guys I'm trying to help identify / fix an issue with golang where by >> fork results in memory corruption. >> >> Details of the issue can be found here: >> https://github.com/golang/go/issues/15658 >> >> In summary when a fork is done in golang is has a chance of causing >> memory corruption in the parent resulting in a process crash once detected. >> >> Its believed that this only effects FreeBSD. 
>>
>> This has similarities to other reported issues such as this one which
>> impacted perl during 10.x:
>> https://rt.perl.org/Public/Bug/Display.html?id=122199
> I cannot judge about any similarilities when all the description provided
> is 'memory corruption'. BTW, the perl issue described, where child segfaults
> after the fork, is more likely to be caused by the set of problems referenced
> in the FreeBSD-EN-16:17.vm.
>
>> And more recently the issue with nginx on 11.x:
>> https://lists.freebsd.org/pipermail/freebsd-stable/2016-September/085540.html
> Which does not affect anything unless aio is used on Sandy/Ivy.
>
>> Its possible, some believe likely, that this is a kernel bug around fork
>> / vm that golang stresses, but I've not been able to confirm.
>>
>> I can reproduce the issue at will, takes between 5mins and 1hour using
>> 16 threads, and it definitely seems like an interaction between fork and
>> other memory operations.
> Which arch is the kernel and the process which demonstrates the behaviour ?
> I mean i386/amd64.

amd64

>
>> I've tried reproducing the issue in C but also no joy (captured in the bug).
>>
>> For reference I'm currently testing on 11.0-RELEASE-p3 + kibs PCID fix
>> (#306350).
> Switch to HEAD kernel, for start.
> Show the memory map of the failed process.
> Are you able to take ktrace of the process while still producing the bug ?

Whenever I've tried ktrace the issue doesn't present itself. I can try and run it for an extended period to see if it does eventually, but I did run it for a few hours without any joy.

I'm currently testing with an 11.0-RELEASE debug kernel (witness, invariants etc.) to see if that would detect anything; however, so far it's taking longer than usual to reproduce, so it may simply not occur with a debug kernel.

> Where is the memory corruption happen ? Is it in go runtime structures,
> or in the application data ?

It's usually detected by the runtime GC, which panics with a number of errors, e.g.
fatal error: all goroutines are asleep - deadlock!

fatal error: workbuf is empty

runtime: nelems=256 nfree=233 nalloc=23 previous allocCount=18 nfreed=65531
fatal error: sweep increased allocation count

runtime: failed MSpanList_Remove 0x800698500 0x800b46d40 0x53adb0 0x53ada0
fatal error: MSpanList_Remove

As the test is very basic, it's unlikely that we'd see an issue in the
application data.

> Can somebody knowledgable of either the go runtime or the app,
> try to identify the initial corrupted userspace data ?
The golang developers have looked but were unable to reproduce on the
freebsd-amd64-gce101 gomote running FreeBSD 10.1. This could be a factor
of the VM; it's unclear.

The app is a tiny test binary which I'm currently running with GOGC=2:

package main

import (
    "fmt"
    "os/exec"
    "runtime"
    "time"
)

var (
    gcPeriod     = time.Second * 10
    forkRoutines = 16
)

func run(done chan struct{}) {
    cmd := exec.Command("/usr/bin/true")
    cmd.Start()
    cmd.Wait()

    done <- struct{}{}
}

func main() {
    fmt.Printf("Starting %v forking goroutines...\n", forkRoutines)
    fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))

    done := make(chan struct{}, forkRoutines*2)

    for i := 0; i < forkRoutines; i++ {
        go run(done)
    }

    for {
        start := time.Now()
        active := forkRoutines
    forking:
        for range done {
            if time.Since(start) > gcPeriod {
                active--
                if active == 0 {
                    break forking
                }
            } else {
                go run(done)
            }
        }

        runtime.GC()

        for i := 0; i < forkRoutines; i++ {
            go run(done)
        }
    }
}

From owner-freebsd-hackers@freebsd.org Tue Dec 6 14:35:39 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id E6160C6A07B for ; Tue, 6 Dec 2016 14:35:39 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id
8CEBB1975 for ; Tue, 6 Dec 2016 14:35:39 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id uB6EZWOk016094 (version=TLSv1 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Tue, 6 Dec 2016 16:35:33 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua uB6EZWOk016094 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id uB6EZW2K016093; Tue, 6 Dec 2016 16:35:32 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Tue, 6 Dec 2016 16:35:32 +0200 From: Konstantin Belousov To: Steven Hartland Cc: "freebsd-hackers@freebsd.org" Subject: Re: Help needed to identify golang fork / memory corruption issue on FreeBSD Message-ID: <20161206143532.GR54029@kib.kiev.ua> References: <27e1a828-5cd9-0755-50ca-d7143e7df117@multiplay.co.uk> <20161206125919.GQ54029@kib.kiev.ua> <8b502580-4d2d-1e1f-9e05-61d46d5ac3b1@multiplay.co.uk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <8b502580-4d2d-1e1f-9e05-61d46d5ac3b1@multiplay.co.uk> User-Agent: Mutt/1.7.1 (2016-10-04) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 06 Dec 2016 14:35:40 -0000 On Tue, Dec 06, 2016 at 01:53:52PM +0000, Steven Hartland wrote: > On 06/12/2016 12:59, Konstantin Belousov wrote: > > On Tue, Dec 06, 2016 at 12:31:47PM +0000, Steven Hartland wrote: > >> Hi guys I'm trying to help identify / fix an issue 
with golang where by > >> fork results in memory corruption. > >> > >> Details of the issue can be found here: > >> https://github.com/golang/go/issues/15658 > >> > >> In summary when a fork is done in golang is has a chance of causing > >> memory corruption in the parent resulting in a process crash once detected. > >> > >> Its believed that this only effects FreeBSD. > >> > >> This has similarities to other reported issues such as this one which > >> impacted perl during 10.x: > >> https://rt.perl.org/Public/Bug/Display.html?id=122199 > > I cannot judge about any similarilities when all the description provided > > is 'memory corruption'. BTW, the perl issue described, where child segfaults > > after the fork, is more likely to be caused by the set of problems referenced > > in the FreeBSD-EN-16:17.vm. > > > >> And more recently the issue with nginx on 11.x: > >> https://lists.freebsd.org/pipermail/freebsd-stable/2016-September/085540.html > > Which does not affect anything unless aio is used on Sandy/Ivy. > > > >> Its possible, some believe likely, that this is a kernel bug around fork > >> / vm that golang stresses, but I've not been able to confirm. > >> > >> I can reproduce the issue at will, takes between 5mins and 1hour using > >> 16 threads, and it definitely seems like an interaction between fork and > >> other memory operations. > > Which arch is the kernel and the process which demonstrates the behaviour ? > > I mean i386/amd64. > amd64 How large is the machine, how many cores, what is the physical memory size ? > > > >> I've tried reproducing the issue in C but also no joy (captured in the bug). > >> > >> For reference I'm currently testing on 11.0-RELEASE-p3 + kibs PCID fix > >> (#306350). > > Switch to HEAD kernel, for start. > > Show the memory map of the failed process. > > Are you able to take ktrace of the process while still producing the bug ? > When ever I've tried ktrace the issue doesn't present itself. 
> > I can try and run it for an extended period to see if it does eventually > but I did run it for a few hours without any joy. > > I'm currently testing with a 11.0-RELEASE debug kernel, witness, > invariants etc to see if that would detect anything; however so far its > taking longer than usual to reproduce so it may simply not occur with a > debug kernel. > > > Where is the memory corruption happen ? Is it in go runtime structures, > > or in the application data ? > Its usually detected by the runtime GC which panics with a number of > errors e.g. > fatal error: all goroutines are asleep - deadlock! > > fatal error: workbuf is empty > > runtime: nelems=256 nfree=233 nalloc=23 previous allocCount=18 nfreed=65531 > fatal error: sweep increased allocation count > > runtime: failed MSpanList_Remove 0x800698500 0x800b46d40 0x53adb0 0x53ada0 > fatal error: MSpanList_Remove > > As the test is very basic its unlikely to see an issue in the > application data. > > > Can somebody knowledgable of either the go runtime or the app, > > try to identify the initial corrupted userspace data ? > The golang developers have looked but where unable to reproduce on > freebsd-amd64-gce101 gomote running FreeBSD 10.1. This could be a factor > of the VM its unclear. This is not what I asked. I am asking is it possible to make an educated guess at what initial corruption could be to cause the outcome. Like, if this variable suddently becomes zero, we get the errors. Does go runtime use FreeBSD libc and threading library ? 
> > The app is tiny test binary which I'm current running with GOGC=2: > package main > > import ( > "fmt" > "os/exec" > "runtime" > "time" > ) > > var ( > gcPeriod = time.Second * 10 > forkRoutines = 16 > ) > > func run(done chan struct{}) { > cmd := exec.Command("/usr/bin/true") > cmd.Start() > cmd.Wait() > > done <- struct{}{} > } > > func main() { > fmt.Printf("Starting %v forking goroutines...\n", forkRoutines) > fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0)) > > done := make(chan struct{}, forkRoutines*2) > > for i := 0; i < forkRoutines; i++ { > go run(done) > } > > for { > start := time.Now() > active := forkRoutines > forking: > for range done { > if time.Since(start) > gcPeriod { > active-- > if active == 0 { > break forking > } > } else { > go run(done) > } > } > > runtime.GC() > > for i := 0; i < forkRoutines; i++ { > go run(done) > } > } > } From owner-freebsd-hackers@freebsd.org Tue Dec 6 17:07:25 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id E023DC6AE7A for ; Tue, 6 Dec 2016 17:07:25 +0000 (UTC) (envelope-from killing@multiplay.co.uk) Received: from mail-wm0-x22e.google.com (mail-wm0-x22e.google.com [IPv6:2a00:1450:400c:c09::22e]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 6BCBECC3 for ; Tue, 6 Dec 2016 17:07:25 +0000 (UTC) (envelope-from killing@multiplay.co.uk) Received: by mail-wm0-x22e.google.com with SMTP id u144so28435617wmu.1 for ; Tue, 06 Dec 2016 09:07:25 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=multiplay-co-uk.20150623.gappssmtp.com; s=20150623; h=subject:to:references:cc:from:message-id:date:user-agent :mime-version:in-reply-to; bh=JHw0k5DG0E8/IwhiO5fom7MKN8ofm9ZERGHNo8uSTHk=; 
b=fROPdA5Vnnu/FyufC/OmvY9NgbQbmckbSTOYI5XHresSrRaLe6kEWGw/K+Dhhg54xD Ji9XL6lGBsMEtXRB+EwpiJKUkU+tJyy3XNQSlQ0dGgyWPkU0CvXD+H3t2CWjSoNnAy7q L6eP48yDZU+7iJOpCq0gsW7SPHV9i7ryJnrubtd64ETh5u8H/1aECqww7j+wqrPoaC8D nCAhiuVu5eu7VXgE9deBFPmQxxs8o/2MLS+ehYS4htJsSQvXoMs1slY7DTSWK7uh+QDM s5nyadYNCvu36ep57xwooqRKl4QJp1gQq5ll9vQsLlDF+NPEcUMToGzheOwLaLFbwSgw hEuQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:subject:to:references:cc:from:message-id:date :user-agent:mime-version:in-reply-to; bh=JHw0k5DG0E8/IwhiO5fom7MKN8ofm9ZERGHNo8uSTHk=; b=Zt8pePfO/8M9qZgpbvgjHc2QrV2X5QQP1aqKSR4gUaJT/KhiYjD8cMmDjv3p9HIHF3 pcnfFWZKQllgEtyfgzNWthsx16bjdqXrWcNDuvT6xkuOaJBGVuo8dy6D1AJldO4JePNq SNPSuezL5iF5oeTSAFMLZ20utL1lCLuRkJn1sx5gyT33rsppjdz8vRzumAzov5V7qC7t iZBC90W6wjdjFmrpoUFyg8I8WD6Mh3UupFtlbnLqqWHIJJ+wkYMiSQuJA8T8BoaryZAR vn8sKFyog8C8ReiXhlr9ggJfROSASk+lDmL//ALTZuz8iLL1qZsNR4jjWYOclbzKIBPK J6eA== X-Gm-Message-State: AKaTC00yxB3CWIHCajVyHqVzQreVmGJgqt+fmrFgt1HVDGycBvFqynw89TsOwF5HoOdYu+nG X-Received: by 10.28.107.77 with SMTP id g74mr3465740wmc.109.1481044043105; Tue, 06 Dec 2016 09:07:23 -0800 (PST) Received: from [10.10.1.58] ([185.97.61.26]) by smtp.gmail.com with ESMTPSA id w7sm4937325wmd.24.2016.12.06.09.07.22 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 06 Dec 2016 09:07:22 -0800 (PST) Subject: Re: Help needed to identify golang fork / memory corruption issue on FreeBSD To: Konstantin Belousov References: <27e1a828-5cd9-0755-50ca-d7143e7df117@multiplay.co.uk> <20161206125919.GQ54029@kib.kiev.ua> <8b502580-4d2d-1e1f-9e05-61d46d5ac3b1@multiplay.co.uk> <20161206143532.GR54029@kib.kiev.ua> Cc: "freebsd-hackers@freebsd.org" From: Steven Hartland Message-ID: Date: Tue, 6 Dec 2016 17:07:53 +0000 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.5.1 MIME-Version: 1.0 In-Reply-To: <20161206143532.GR54029@kib.kiev.ua> Content-Type: text/plain; charset=windows-1252; 
format=flowed Content-Transfer-Encoding: 7bit X-Content-Filtered-By: Mailman/MimeDel 2.1.23 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 06 Dec 2016 17:07:26 -0000 On 06/12/2016 14:35, Konstantin Belousov wrote: > On Tue, Dec 06, 2016 at 01:53:52PM +0000, Steven Hartland wrote: >> On 06/12/2016 12:59, Konstantin Belousov wrote: >>> On Tue, Dec 06, 2016 at 12:31:47PM +0000, Steven Hartland wrote: >>>> Hi guys I'm trying to help identify / fix an issue with golang where by >>>> fork results in memory corruption. >>>> >>>> Details of the issue can be found here: >>>> https://github.com/golang/go/issues/15658 >>>> >>>> In summary when a fork is done in golang is has a chance of causing >>>> memory corruption in the parent resulting in a process crash once detected. >>>> >>>> Its believed that this only effects FreeBSD. >>>> >>>> This has similarities to other reported issues such as this one which >>>> impacted perl during 10.x: >>>> https://rt.perl.org/Public/Bug/Display.html?id=122199 >>> I cannot judge about any similarilities when all the description provided >>> is 'memory corruption'. BTW, the perl issue described, where child segfaults >>> after the fork, is more likely to be caused by the set of problems referenced >>> in the FreeBSD-EN-16:17.vm. >>> >>>> And more recently the issue with nginx on 11.x: >>>> https://lists.freebsd.org/pipermail/freebsd-stable/2016-September/085540.html >>> Which does not affect anything unless aio is used on Sandy/Ivy. >>> >>>> Its possible, some believe likely, that this is a kernel bug around fork >>>> / vm that golang stresses, but I've not been able to confirm. 
>>>> >>>> I can reproduce the issue at will, takes between 5mins and 1hour using >>>> 16 threads, and it definitely seems like an interaction between fork and >>>> other memory operations. >>> Which arch is the kernel and the process which demonstrates the behaviour ? >>> I mean i386/amd64. >> amd64 > How large is the machine, how many cores, what is the physical memory size ? 24 cores 32GB RAM. CPU: Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz (2500.06-MHz K8-class CPU) Origin="GenuineIntel" Id=0x206d7 Family=0x6 Model=0x2d Stepping=7 Features=0xbfebfbff Features2=0x1fbee3ff AMD Features=0x2c100800 AMD Features2=0x1 XSAVE Features=0x1 VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID TSC: P-state invariant, performance statistics real memory = 34359738368 (32768 MB) avail memory = 33209896960 (31671 MB) Event timer "LAPIC" quality 600 ACPI APIC Table: FreeBSD/SMP: Multiprocessor System Detected: 24 CPUs FreeBSD/SMP: 2 package(s) x 6 core(s) x 2 hardware threads The HEAD box I'm just updating to run the test on has the same CPU but 128GB of RAM. >>>> I've tried reproducing the issue in C but also no joy (captured in the bug). >>>> >>>> For reference I'm currently testing on 11.0-RELEASE-p3 + kibs PCID fix >>>> (#306350). >>> Switch to HEAD kernel, for start. >>> Show the memory map of the failed process. >>> Are you able to take ktrace of the process while still producing the bug ? >> When ever I've tried ktrace the issue doesn't present itself. >> >> I can try and run it for an extended period to see if it does eventually >> but I did run it for a few hours without any joy. >> >> I'm currently testing with a 11.0-RELEASE debug kernel, witness, >> invariants etc to see if that would detect anything; however so far its >> taking longer than usual to reproduce so it may simply not occur with a >> debug kernel. >> >>> Where is the memory corruption happen ? Is it in go runtime structures, >>> or in the application data ? 
>> Its usually detected by the runtime GC which panics with a number of
>> errors e.g.
>> fatal error: all goroutines are asleep - deadlock!
>>
>> fatal error: workbuf is empty
>>
>> runtime: nelems=256 nfree=233 nalloc=23 previous allocCount=18 nfreed=65531
>> fatal error: sweep increased allocation count
>>
>> runtime: failed MSpanList_Remove 0x800698500 0x800b46d40 0x53adb0 0x53ada0
>> fatal error: MSpanList_Remove
>>
>> As the test is very basic its unlikely to see an issue in the
>> application data.
>>
>>> Can somebody knowledgable of either the go runtime or the app,
>>> try to identify the initial corrupted userspace data ?
>> The golang developers have looked but where unable to reproduce on
>> freebsd-amd64-gce101 gomote running FreeBSD 10.1. This could be a factor
>> of the VM its unclear.
> This is not what I asked. I am asking is it possible to make an educated
> guess at what initial corruption could be to cause the outcome. Like,
> if this variable suddently becomes zero, we get the errors.
I'll have a look through the crash dumps I have to see if anything points
to null / zeroed memory.

> Does go runtime use FreeBSD libc and threading library ?
No, it doesn't; each built binary is totally standalone and uses asm for
core system calls.

The runtime directly creates kernel threads with thr_new, which it then
manages internally.

One possibly important difference between golang and C apps is that it uses
goroutines, which are lightweight, so-called green threads that are mapped
onto a set of kernel threads.
A pretty good write up of this can be found here: http://blog.nindalf.com/how-goroutines-work/ Regards Steve From owner-freebsd-hackers@freebsd.org Tue Dec 6 20:34:37 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 43697C6AEB6 for ; Tue, 6 Dec 2016 20:34:37 +0000 (UTC) (envelope-from killing@multiplay.co.uk) Received: from mail-wm0-x233.google.com (mail-wm0-x233.google.com [IPv6:2a00:1450:400c:c09::233]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id CD2E014E0 for ; Tue, 6 Dec 2016 20:34:36 +0000 (UTC) (envelope-from killing@multiplay.co.uk) Received: by mail-wm0-x233.google.com with SMTP id g23so142506459wme.1 for ; Tue, 06 Dec 2016 12:34:36 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=multiplay-co-uk.20150623.gappssmtp.com; s=20150623; h=subject:to:references:cc:from:message-id:date:user-agent :mime-version:in-reply-to; bh=0x7fLjz8ywCh1e0KW/tl8dBZPAela5+/CgTWK8qX954=; b=wQRQH/ODOnLIGnL4C5It8k/qSz4beN/wLUqL7adlskRSY+rHGo2zT44Jm0IxLy6T8e vjlZGAODRk0q5AqS2dAdmCWpHd5vT6CPz1dQYX5wrW4zdaVZwKZx1x6r4WEgXX6wPisB IIEetSVpw+ZxLKEdgHQwqFWy7gnkxl5gzVhJrywbjaLaH9XCblyUCHtXlu9fu5WQ241C THoktEDIrVlawFkRfI0UP3Y1CrIa/wZGkdx0hfuAyeCecaxL59tkX498emUsBTCRW15e Wsn61azYxGsYCFbcnRkNxnFw0+UjZb3XUoH2HiA7R76b0XjDqNxh9Od4Nd8ZtXSVOfJa fdtQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:subject:to:references:cc:from:message-id:date :user-agent:mime-version:in-reply-to; bh=0x7fLjz8ywCh1e0KW/tl8dBZPAela5+/CgTWK8qX954=; b=kFFTd1tnb1idGcSX6z3U3kj5JK8l+JDBrYg06sPrrKODQIJAM6qhowHgVO3piG5B4H 1Y70qOc6/CylRnOB8oAvPZRzlibm6FteycaWbpD7LK3rPvNKE0wn3fYnx/P2FQPLscej 
FPBt1MCMAMKms66hYmzug1SG8HYnhofUWi6vvO52aYMByGeu0L3YDfVvvTSCn60iK0ZA DDJrCzAal98a5wWgdXj9tbXTugTgarEhWrC8MyXv7KCY1a2x3XNLO0o+Nd2EGOXeKhIM EtEnr3EEzO10LUBfdIqkCgDoB8xx9dyHTRAzm+5WUdSdZUE+Ix5pst4VtiT5TeCmxlsz /iKw== X-Gm-Message-State: AKaTC03pJ+ASUCA3UcKQbPAJB7IT2oYxU4dqJKdZNOQjsA7zzPjkikzOUsm3v6yg8gvOLnUQ X-Received: by 10.28.185.203 with SMTP id j194mr320012wmf.73.1481056474766; Tue, 06 Dec 2016 12:34:34 -0800 (PST) Received: from [10.10.1.58] ([185.97.61.26]) by smtp.gmail.com with ESMTPSA id b15sm5854952wma.5.2016.12.06.12.34.33 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 06 Dec 2016 12:34:33 -0800 (PST) Subject: Re: Help needed to identify golang fork / memory corruption issue on FreeBSD To: Konstantin Belousov References: <27e1a828-5cd9-0755-50ca-d7143e7df117@multiplay.co.uk> <20161206125919.GQ54029@kib.kiev.ua> <8b502580-4d2d-1e1f-9e05-61d46d5ac3b1@multiplay.co.uk> <20161206143532.GR54029@kib.kiev.ua> Cc: "freebsd-hackers@freebsd.org" From: Steven Hartland Message-ID: <9b40c93a-871f-bb32-668c-39bc3e31e385@multiplay.co.uk> Date: Tue, 6 Dec 2016 20:35:04 +0000 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.5.1 MIME-Version: 1.0 In-Reply-To: <20161206143532.GR54029@kib.kiev.ua> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-Content-Filtered-By: Mailman/MimeDel 2.1.23 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 06 Dec 2016 20:34:37 -0000 On 06/12/2016 14:35, Konstantin Belousov wrote: > On Tue, Dec 06, 2016 at 01:53:52PM +0000, Steven Hartland wrote: >> On 06/12/2016 12:59, Konstantin Belousov wrote: >>> On Tue, Dec 06, 2016 at 12:31:47PM +0000, Steven Hartland wrote: >>>> Hi guys I'm trying to help identify / fix an issue with golang where by >>>> fork 
results in memory corruption. >>>> >>>> Details of the issue can be found here: >>>> https://github.com/golang/go/issues/15658 >>>> >>>> In summary when a fork is done in golang is has a chance of causing >>>> memory corruption in the parent resulting in a process crash once detected. >>>> >>>> Its believed that this only effects FreeBSD. >>>> >>>> This has similarities to other reported issues such as this one which >>>> impacted perl during 10.x: >>>> https://rt.perl.org/Public/Bug/Display.html?id=122199 >>> I cannot judge about any similarilities when all the description provided >>> is 'memory corruption'. BTW, the perl issue described, where child segfaults >>> after the fork, is more likely to be caused by the set of problems referenced >>> in the FreeBSD-EN-16:17.vm. >>> >>>> And more recently the issue with nginx on 11.x: >>>> https://lists.freebsd.org/pipermail/freebsd-stable/2016-September/085540.html >>> Which does not affect anything unless aio is used on Sandy/Ivy. >>> >>>> Its possible, some believe likely, that this is a kernel bug around fork >>>> / vm that golang stresses, but I've not been able to confirm. >>>> >>>> I can reproduce the issue at will, takes between 5mins and 1hour using >>>> 16 threads, and it definitely seems like an interaction between fork and >>>> other memory operations. >>> Which arch is the kernel and the process which demonstrates the behaviour ? >>> I mean i386/amd64. >> amd64 > How large is the machine, how many cores, what is the physical memory size ? > >>>> I've tried reproducing the issue in C but also no joy (captured in the bug). >>>> >>>> For reference I'm currently testing on 11.0-RELEASE-p3 + kibs PCID fix >>>> (#306350). >>> Switch to HEAD kernel, for start. >>> Show the memory map of the failed process. No sign of zeroed memory that I can tell. 
This error was caused by hitting the following validation in gc:

func (list *mSpanList) remove(span *mspan) {
    if span.prev == nil || span.list != list {
        println("runtime: failed MSpanList_Remove", span,
            span.prev, span.list, list)
        throw("MSpanList_Remove")
    }

runtime: failed MSpanList_Remove 0x80052e580 0x80052e300 0x53e9c0 0x53e9b0
fatal error: MSpanList_Remove

(gdb) print list
$4 = (runtime.mSpanList *) 0x53e9b0
(gdb) print span.list
$5 = (runtime.mSpanList *) 0x53e9c0
(gdb) print span.prev
$6 = (struct runtime.mspan **) 0x80052e300
(gdb) print *list
$7 = {first = 0x80052e580, last = 0x8008aa180}
(gdb) print *span.list
$8 = {first = 0x8007ea7e0, last = 0x80052e580}

procstat -v test.core.1481054183
  PID          START            END PRT RES PRES REF SHD FLAG TP PATH
 1178       0x400000       0x49b000 r-x 115  223   3   1 CN-- vn /root/test
 1178       0x49b000       0x528000 r--  97  223   3   1 CN-- vn /root/test
 1178       0x528000       0x539000 rw-  10    0   1   0 C--- vn /root/test
 1178       0x539000       0x55a000 rw-  16   16   1   0 C--- df
 1178    0x800528000    0x800a28000 rw- 118  118   1   0 C--- df
 1178    0x800a28000    0x800a68000 rw-   1    1   1   0 CN-- df
 1178    0x800a68000    0x800aa8000 rw-   2    2   1   0 CN-- df
 1178    0x800aa8000    0x800c08000 rw-  50   50   1   0 CN-- df
 1178    0x800c08000    0x800c48000 rw-   2    2   1   0 CN-- df
 1178    0x800c48000    0x800c88000 rw-   1    1   1   0 CN-- df
 1178    0x800c88000    0x800cc8000 rw-   1    1   1   0 CN-- df
 1178   0xc000000000   0xc000001000 rw-   1    1   1   0 CN-- df
 1178   0xc41ffe0000   0xc41ffe8000 rw-   8    8   1   0 CN-- df
 1178   0xc41ffe8000   0xc41fff0000 rw-   8    8   1   0 CN-- df
 1178   0xc41fff0000   0xc41fff8000 rw-   8    8   1   0 C--- df
 1178   0xc41fff8000   0xc420300000 rw- 553  553   1   0 C--- df
 1178   0xc420300000   0xc420400000 rw- 234  234   1   0 C--- df
 1178 0x7ffffffdf000 0x7ffffffff000 rwx   2    2   1   0 C--D df
 1178 0x7ffffffff000 0x800000000000 r-x   1    1  33   0 ---- ph

This is from FreeBSD 12.0-CURRENT #36 r309618M

ktrace on 11.0-RELEASE is still running, 6 hours so far.
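
For readers unfamiliar with this kind of check, the following is a minimal sketch of an intrusive list whose elements carry a back-pointer to their owning list, so that removal can refuse an element that claims a different owner. The types span and spanList and the error errWrongList are hypothetical stand-ins invented for illustration; this is not the Go runtime's code, and it only models the span.list != list half of the validation quoted above. It may help when reading the dump, where span 0x80052e580 appears both as first of *list and as last of *span.list.

```go
package main

import (
	"errors"
	"fmt"
)

// span and spanList are simplified stand-ins for illustration only;
// they are NOT the Go runtime's mspan and mSpanList.
type span struct {
	next, prev *span
	list       *spanList // back-pointer to the owning list, checked on removal
}

type spanList struct {
	first, last *span
}

// errWrongList is a hypothetical error used instead of the runtime's throw().
var errWrongList = errors.New("span is not on this list")

// insert pushes s onto the front of l and records l as its owner.
func (l *spanList) insert(s *span) {
	s.next, s.prev, s.list = l.first, nil, l
	if l.first != nil {
		l.first.prev = s
	} else {
		l.last = s
	}
	l.first = s
}

// remove unlinks s from l, refusing if the back-pointer names another list.
func (l *spanList) remove(s *span) error {
	if s.list != l {
		return errWrongList
	}
	if s.prev != nil {
		s.prev.next = s.next
	} else {
		l.first = s.next
	}
	if s.next != nil {
		s.next.prev = s.prev
	} else {
		l.last = s.prev
	}
	s.next, s.prev, s.list = nil, nil, nil
	return nil
}

func main() {
	lists := make([]spanList, 2) // two neighbouring lists in one array
	s := &span{}
	lists[1].insert(s)
	fmt.Println(lists[0].remove(s)) // owner mismatch is reported
	fmt.Println(lists[1].remove(s)) // the real owner can remove it
}
```

In this model the mismatch can only arise from a caller bug; in the crash above the open question is what made the back-pointer and the list being operated on disagree.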
Regards Steve From owner-freebsd-hackers@freebsd.org Tue Dec 6 23:13:17 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C6AC6C6A150 for ; Tue, 6 Dec 2016 23:13:17 +0000 (UTC) (envelope-from vangyzen@FreeBSD.org) Received: from smtp.vangyzen.net (hotblack.vangyzen.net [199.48.133.146]) by mx1.freebsd.org (Postfix) with ESMTP id B1364182E for ; Tue, 6 Dec 2016 23:13:17 +0000 (UTC) (envelope-from vangyzen@FreeBSD.org) Received: from sweettea.beer.town (unknown [76.164.8.130]) by smtp.vangyzen.net (Postfix) with ESMTPSA id 828475648E for ; Tue, 6 Dec 2016 17:13:11 -0600 (CST) Subject: Re: kinfo_proc::ki_tdname truncation To: FreeBSD Hackers References: <3379ac69-16e0-4071-8e69-4fb05b011ba7@FreeBSD.org> From: Eric van Gyzen Message-ID: <326a2d68-c4f4-e6fc-17c5-4556ecf8c853@FreeBSD.org> Date: Tue, 6 Dec 2016 17:13:10 -0600 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:45.0) Gecko/20100101 Thunderbird/45.5.0 MIME-Version: 1.0 In-Reply-To: <3379ac69-16e0-4071-8e69-4fb05b011ba7@FreeBSD.org> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 06 Dec 2016 23:13:17 -0000 On 12/05/2016 20:29, Eric van Gyzen wrote: > I just noticed that kinfo_proc::ki_tdname is three characters shorter than > thread::td_name. Would anyone object if I steal 4 bytes from ki_sparestrings to > add a field for these three extra characters (and fix all consumers > accordingly)? Yes, I care enough about those three characters to do this. 
If anyone is interested: https://reviews.freebsd.org/D8722 Eric From owner-freebsd-hackers@freebsd.org Wed Dec 7 12:14:56 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3403AC6AD8D for ; Wed, 7 Dec 2016 12:14:56 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id B50763CE for ; Wed, 7 Dec 2016 12:14:55 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id uB7CEn8t046228 (version=TLSv1 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Wed, 7 Dec 2016 14:14:49 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua uB7CEn8t046228 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id uB7CEnIX046227; Wed, 7 Dec 2016 14:14:49 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Wed, 7 Dec 2016 14:14:49 +0200 From: Konstantin Belousov To: Steven Hartland Cc: "freebsd-hackers@freebsd.org" Subject: Re: Help needed to identify golang fork / memory corruption issue on FreeBSD Message-ID: <20161207121449.GV54029@kib.kiev.ua> References: <27e1a828-5cd9-0755-50ca-d7143e7df117@multiplay.co.uk> <20161206125919.GQ54029@kib.kiev.ua> <8b502580-4d2d-1e1f-9e05-61d46d5ac3b1@multiplay.co.uk> <20161206143532.GR54029@kib.kiev.ua> <9b40c93a-871f-bb32-668c-39bc3e31e385@multiplay.co.uk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <9b40c93a-871f-bb32-668c-39bc3e31e385@multiplay.co.uk> User-Agent: Mutt/1.7.1 (2016-10-04) X-Spam-Status: No, 
score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 07 Dec 2016 12:14:56 -0000 On Tue, Dec 06, 2016 at 08:35:04PM +0000, Steven Hartland wrote: > On 06/12/2016 14:35, Konstantin Belousov wrote: > > On Tue, Dec 06, 2016 at 01:53:52PM +0000, Steven Hartland wrote: > >> On 06/12/2016 12:59, Konstantin Belousov wrote: > >>> On Tue, Dec 06, 2016 at 12:31:47PM +0000, Steven Hartland wrote: > >>>> Hi guys I'm trying to help identify / fix an issue with golang where by > >>>> fork results in memory corruption. > >>>> > >>>> Details of the issue can be found here: > >>>> https://github.com/golang/go/issues/15658 > >>>> > >>>> In summary when a fork is done in golang is has a chance of causing > >>>> memory corruption in the parent resulting in a process crash once detected. > >>>> > >>>> Its believed that this only effects FreeBSD. > >>>> > >>>> This has similarities to other reported issues such as this one which > >>>> impacted perl during 10.x: > >>>> https://rt.perl.org/Public/Bug/Display.html?id=122199 > >>> I cannot judge about any similarilities when all the description provided > >>> is 'memory corruption'. BTW, the perl issue described, where child segfaults > >>> after the fork, is more likely to be caused by the set of problems referenced > >>> in the FreeBSD-EN-16:17.vm. > >>> > >>>> And more recently the issue with nginx on 11.x: > >>>> https://lists.freebsd.org/pipermail/freebsd-stable/2016-September/085540.html > >>> Which does not affect anything unless aio is used on Sandy/Ivy. 
> >>>
> >>>> Its possible, some believe likely, that this is a kernel bug around fork
> >>>> / vm that golang stresses, but I've not been able to confirm.
> >>>>
> >>>> I can reproduce the issue at will, takes between 5mins and 1hour using
> >>>> 16 threads, and it definitely seems like an interaction between fork and
> >>>> other memory operations.
> >>> Which arch is the kernel and the process which demonstrates the behaviour ?
> >>> I mean i386/amd64.
> >> amd64
> > How large is the machine, how many cores, what is the physical memory size ?

I was able to reproduce that as well, reliably, on two desktop-size
machines. One is Sandy Bridge, the same core microarchitecture as your
crash box; the other is Haswell. I see the error both with PCID enabled
and disabled on both machines (Haswell does implement INVPCID, so the
original aio/PCID bug never affected this microarchitecture).

I believe this clears the PCID changes from the accusations.

> >
> >>>> I've tried reproducing the issue in C but also no joy (captured in the bug).
> >>>>
> >>>> For reference I'm currently testing on 11.0-RELEASE-p3 + kibs PCID fix
> >>>> (#306350).
> >>> Switch to HEAD kernel, for start.
> >>> Show the memory map of the failed process.
> No sign of zeroed memory that I can tell.
>
> This error was caused by hitting the following validation in gc:
> func (list *mSpanList) remove(span *mspan) {
>     if span.prev == nil || span.list != list {
>         println("runtime: failed MSpanList_Remove", span,
>             span.prev, span.list, list)
>         throw("MSpanList_Remove")
>     }
>
> runtime: failed MSpanList_Remove 0x80052e580 0x80052e300 0x53e9c0 0x53e9b0
> fatal error: MSpanList_Remove
>
> (gdb) print list
> $4 = (runtime.mSpanList *) 0x53e9b0
> (gdb) print span.list
> $5 = (runtime.mSpanList *) 0x53e9c0
The difference, which triggered the exception, is quite curious: list is
0x53e9b0, and span.list == list + 0x10. Moreover, this is not a single-bit
error: the bit pattern is 1011 for 0xb and 1100 for 0xc.
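
The arithmetic behind that observation can be checked with a short sketch. The helper name diffBits is made up for this illustration; the two addresses are the ones printed by gdb. The remark about struct size is an inference, not something stated in the thread: gdb shows the list as {first, last}, which would be 0x10 bytes on amd64 if those two pointers are its only fields.

```go
package main

import (
	"fmt"
	"math/bits"
)

// diffBits is a hypothetical helper: it returns the XOR of two addresses
// and the number of bit positions in which they differ.
func diffBits(a, b uint64) (xor uint64, flipped int) {
	xor = a ^ b
	return xor, bits.OnesCount64(xor)
}

func main() {
	// Values taken from the gdb session quoted in the thread.
	const list, spanList = 0x53e9b0, 0x53e9c0

	xor, flipped := diffBits(list, spanList)
	// A single-bit flip would give flipped == 1; here three bits differ,
	// while the arithmetic distance is exactly 0x10 bytes.
	fmt.Printf("delta=%#x xor=%#x bits flipped=%d\n", spanList-list, xor, flipped)
	// prints: delta=0x10 xor=0x70 bits flipped=3
}
```

A distance of exactly one presumed list-structure size, combined with a multi-bit difference, is what one would expect from a valid pointer to a neighbouring list rather than from random bit damage.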
It is highly unlikely that the cause is a memory corruption due to OS mis-managing pages or TLB. Typically, you get either page or cache line of complete garbage, instead of the almost identical but slightly modified data. > (gdb) print span.prev > $6 = (struct runtime.mspan **) 0x80052e300 > (gdb) print *list > $7 = {first = 0x80052e580, last = 0x8008aa180} > (gdb) print *span.list > $8 = {first = 0x8007ea7e0, last = 0x80052e580} > > procstat -v test.core.1481054183 > PID START END PRT RES PRES REF SHD FLAG > TP PATH > 1178 0x400000 0x49b000 r-x 115 223 3 1 CN-- vn > /root/test > 1178 0x49b000 0x528000 r-- 97 223 3 1 CN-- vn > /root/test > 1178 0x528000 0x539000 rw- 10 0 1 0 C--- vn > /root/test > 1178 0x539000 0x55a000 rw- 16 16 1 0 C--- df > 1178 0x800528000 0x800a28000 rw- 118 118 1 0 C--- df > 1178 0x800a28000 0x800a68000 rw- 1 1 1 0 CN-- df > 1178 0x800a68000 0x800aa8000 rw- 2 2 1 0 CN-- df > 1178 0x800aa8000 0x800c08000 rw- 50 50 1 0 CN-- df > 1178 0x800c08000 0x800c48000 rw- 2 2 1 0 CN-- df > 1178 0x800c48000 0x800c88000 rw- 1 1 1 0 CN-- df > 1178 0x800c88000 0x800cc8000 rw- 1 1 1 0 CN-- df > 1178 0xc000000000 0xc000001000 rw- 1 1 1 0 CN-- df > 1178 0xc41ffe0000 0xc41ffe8000 rw- 8 8 1 0 CN-- df > 1178 0xc41ffe8000 0xc41fff0000 rw- 8 8 1 0 CN-- df > 1178 0xc41fff0000 0xc41fff8000 rw- 8 8 1 0 C--- df > 1178 0xc41fff8000 0xc420300000 rw- 553 553 1 0 C--- df > 1178 0xc420300000 0xc420400000 rw- 234 234 1 0 C--- df > 1178 0x7ffffffdf000 0x7ffffffff000 rwx 2 2 1 0 C--D df > 1178 0x7ffffffff000 0x800000000000 r-x 1 1 33 0 ---- ph > > This is from FreeBSD 12.0-CURRENT #36 r309618M > > ktrace on 11.0-RELEASE is still running 6 hours so far. 
> > Regards > Steve > From owner-freebsd-hackers@freebsd.org Wed Dec 7 14:31:02 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2E68CC6ACA1 for ; Wed, 7 Dec 2016 14:31:02 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id B013B103F for ; Wed, 7 Dec 2016 14:31:01 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id uB7EUpLq079579 (version=TLSv1 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Wed, 7 Dec 2016 16:30:51 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua uB7EUpLq079579 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id uB7EUpQB079577; Wed, 7 Dec 2016 16:30:51 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Wed, 7 Dec 2016 16:30:51 +0200 From: Konstantin Belousov To: Steven Hartland Cc: "freebsd-hackers@freebsd.org" Subject: Re: Help needed to identify golang fork / memory corruption issue on FreeBSD Message-ID: <20161207143051.GX54029@kib.kiev.ua> References: <27e1a828-5cd9-0755-50ca-d7143e7df117@multiplay.co.uk> <20161206125919.GQ54029@kib.kiev.ua> <8b502580-4d2d-1e1f-9e05-61d46d5ac3b1@multiplay.co.uk> <20161206143532.GR54029@kib.kiev.ua> <9b40c93a-871f-bb32-668c-39bc3e31e385@multiplay.co.uk> <20161207121449.GV54029@kib.kiev.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20161207121449.GV54029@kib.kiev.ua> User-Agent: Mutt/1.7.1 (2016-10-04) X-Spam-Status: No, score=-2.0 
required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 07 Dec 2016 14:31:02 -0000 On Wed, Dec 07, 2016 at 02:14:49PM +0200, Konstantin Belousov wrote: > On Tue, Dec 06, 2016 at 08:35:04PM +0000, Steven Hartland wrote: > > On 06/12/2016 14:35, Konstantin Belousov wrote: > > > On Tue, Dec 06, 2016 at 01:53:52PM +0000, Steven Hartland wrote: > > >> On 06/12/2016 12:59, Konstantin Belousov wrote: > > >>> On Tue, Dec 06, 2016 at 12:31:47PM +0000, Steven Hartland wrote: > > >>>> Hi guys I'm trying to help identify / fix an issue with golang where by > > >>>> fork results in memory corruption. > > >>>> > > >>>> Details of the issue can be found here: > > >>>> https://github.com/golang/go/issues/15658 > > >>>> > > >>>> In summary when a fork is done in golang is has a chance of causing > > >>>> memory corruption in the parent resulting in a process crash once detected. > > >>>> > > >>>> Its believed that this only effects FreeBSD. > > >>>> > > >>>> This has similarities to other reported issues such as this one which > > >>>> impacted perl during 10.x: > > >>>> https://rt.perl.org/Public/Bug/Display.html?id=122199 > > >>> I cannot judge about any similarilities when all the description provided > > >>> is 'memory corruption'. BTW, the perl issue described, where child segfaults > > >>> after the fork, is more likely to be caused by the set of problems referenced > > >>> in the FreeBSD-EN-16:17.vm. 
> > >>> > > >>>> And more recently the issue with nginx on 11.x: > > >>>> https://lists.freebsd.org/pipermail/freebsd-stable/2016-September/085540.html > > >>> Which does not affect anything unless aio is used on Sandy/Ivy. > > >>> > > >>>> Its possible, some believe likely, that this is a kernel bug around fork > > >>>> / vm that golang stresses, but I've not been able to confirm. > > >>>> > > >>>> I can reproduce the issue at will, takes between 5mins and 1hour using > > >>>> 16 threads, and it definitely seems like an interaction between fork and > > >>>> other memory operations. > > >>> Which arch is the kernel and the process which demonstrates the behaviour ? > > >>> I mean i386/amd64. > > >> amd64 > > > How large is the machine, how many cores, what is the physical memory size ? > I was able to reproduce that as well, reliably, on two desktop-size > machines. One is SandyBridge, same core microarchitecture as your > crashbox, another is Haswell. I see the error both with PCID enabled > and disabled on both machines (Haswell does implement INVPCID, so the > original aio/PCID bug did never affected this microarchitecture). > > I believe this clears the PCID changes from the accusations. > > > > > > >>>> I've tried reproducing the issue in C but also no joy (captured in the bug). > > >>>> > > >>>> For reference I'm currently testing on 11.0-RELEASE-p3 + kibs PCID fix > > >>>> (#306350). > > >>> Switch to HEAD kernel, for start. > > >>> Show the memory map of the failed process. > > No sign of zeroed memory that I can tell. 
> > > > This error was caused by hitting the following validation in gc: > > func (list *mSpanList) remove(span *mspan) { > > if span.prev == nil || span.list != list { > > println("runtime: failed MSpanList_Remove", span, > > span.prev, span.list, list) > > throw("MSpanList_Remove") > > } > > > > runtime: failed MSpanList_Remove 0x80052e580 0x80052e300 0x53e9c0 0x53e9b0 > > fatal error: MSpanList_Remove > > > > (gdb) print list > > $4 = (runtime.mSpanList *) 0x53e9b0 > > (gdb) print span.list > > $5 = (runtime.mSpanList *) 0x53e9c0 > The difference, which triggered the exception, is quite curious: > list is 0x53e9b0, and span.list == list + 0x10. More, this is not > a single-bit error: bit patter is 1011 for 0xb and 1100 for 0xc. > > It is highly unlikely that the cause is a memory corruption due to > OS mis-managing pages or TLB. Typically, you get either page or cache > line of complete garbage, instead of the almost identical but slightly > modified data. > > > (gdb) print span.prev > > $6 = (struct runtime.mspan **) 0x80052e300 > > (gdb) print *list > > $7 = {first = 0x80052e580, last = 0x8008aa180} > > (gdb) print *span.list > > $8 = {first = 0x8007ea7e0, last = 0x80052e580} > > > > procstat -v test.core.1481054183 > > PID START END PRT RES PRES REF SHD FLAG > > TP PATH > > 1178 0x400000 0x49b000 r-x 115 223 3 1 CN-- vn > > /root/test > > 1178 0x49b000 0x528000 r-- 97 223 3 1 CN-- vn > > /root/test > > 1178 0x528000 0x539000 rw- 10 0 1 0 C--- vn > > /root/test > > 1178 0x539000 0x55a000 rw- 16 16 1 0 C--- df > > 1178 0x800528000 0x800a28000 rw- 118 118 1 0 C--- df > > 1178 0x800a28000 0x800a68000 rw- 1 1 1 0 CN-- df > > 1178 0x800a68000 0x800aa8000 rw- 2 2 1 0 CN-- df > > 1178 0x800aa8000 0x800c08000 rw- 50 50 1 0 CN-- df > > 1178 0x800c08000 0x800c48000 rw- 2 2 1 0 CN-- df > > 1178 0x800c48000 0x800c88000 rw- 1 1 1 0 CN-- df > > 1178 0x800c88000 0x800cc8000 rw- 1 1 1 0 CN-- df > > 1178 0xc000000000 0xc000001000 rw- 1 1 1 0 CN-- df > > 1178 0xc41ffe0000 
0xc41ffe8000 rw- 8 8 1 0 CN-- df
> > 1178 0xc41ffe8000 0xc41fff0000 rw- 8 8 1 0 CN-- df
> > 1178 0xc41fff0000 0xc41fff8000 rw- 8 8 1 0 C--- df
> > 1178 0xc41fff8000 0xc420300000 rw- 553 553 1 0 C--- df
> > 1178 0xc420300000 0xc420400000 rw- 234 234 1 0 C--- df
> > 1178 0x7ffffffdf000 0x7ffffffff000 rwx 2 2 1 0 C--D df
> > 1178 0x7ffffffff000 0x800000000000 r-x 1 1 33 0 ---- ph
> >
> > This is from FreeBSD 12.0-CURRENT #36 r309618M
> >
> > ktrace on 11.0-RELEASE is still running 6 hours so far.

One thing that I noted. In my later attempt to reproduce the issue, I
got the following output:

sandy% GOGC=2 ./1.go /mnt/1
Starting 16 forking goroutines...
GOMAXPROCS: 8
runtime: failed MSpanList_Remove 0x8006a9d60 0x53f2e0 0x53f2f0 0x53f2e0
fatal error: MSpanList_Remove

runtime stack:
runtime.throw(0x4cca4d, 0x10)
	/usr/local/go/src/runtime/panic.go:566 +0x95
runtime.(*mSpanList).remove(0x53f2e0, 0x8006a9d60)
	/usr/local/go/src/runtime/mheap.go:1001 +0x19d
runtime.(*mcentral).cacheSpan(0x53f2d0, 0x44abfb)
	/usr/local/go/src/runtime/mcentral.go:55 +0x3d0
runtime.(*mcache).refill(0x80052a0d0, 0xc400000016, 0xc420200d38)
	/usr/local/go/src/runtime/mcache.go:121 +0xae
runtime.(*mcache).nextFree.func1()
	/usr/local/go/src/runtime/malloc.go:505 +0x33
runtime.systemstack(0xc420018000)
	/usr/local/go/src/runtime/asm_amd64.s:298 +0x79
runtime.mstart()
	/usr/local/go/src/runtime/proc.go:1079

goroutine 8797810 [running]:
runtime.systemstack_switch()
	/usr/local/go/src/runtime/asm_amd64.s:252 fp=0xc420200be0 sp=0xc420200bd8
runtime.(*mcache).nextFree(0x80052a0d0, 0xc420200c16, 0x40ec95, 0xc42011c050, 0x10)
	/usr/local/go/src/runtime/malloc.go:506 +0xb2 fp=0xc420200c38 sp=0xc420200be0
runtime.mallocgc(0x1a0, 0x4aaf60, 0xc420200d01, 0x44ab40)
	/usr/local/go/src/runtime/malloc.go:658 +0x809 fp=0xc420200cd8 sp=0xc420200c38
runtime.makeslice(0x4aaf60, 0x0, 0x19, 0xc4200b8688, 0x8, 0x8)
	/usr/local/go/src/runtime/slice.go:57 +0x7b fp=0xc420200d30 sp=0xc420200cd8
syscall.Environ(0x0, 0x0, 0x0)
	/usr/local/go/src/syscall/env_unix.go:142 +0xd0 fp=0xc420200dc0 sp=0xc420200d30
os.Environ(0x0, 0x0, 0x0)
	/usr/local/go/src/os/env.go:116 +0x22 fp=0xc420200de8 sp=0xc420200dc0
os/exec.(*Cmd).envv(0xc4201422c0, 0xc4200b8680, 0x0, 0x1)
	/usr/local/go/src/os/exec/exec.go:171 +0x38 fp=0xc420200e10 sp=0xc420200de8
os/exec.(*Cmd).Start(0xc4201422c0, 0x6, 0x0)

There are two traces for goroutines, and note that both the first and
the second are in malloc. Is go malloc fine-grained locked ? E.g. I know
that the JVM uses per-thread arenas.

In fact, are there stress-tests for the go mutual exclusion primitives ?

The runtime seems to try to use thr and umtx syscalls directly, which
could be the source of bugs.

From owner-freebsd-hackers@freebsd.org Wed Dec 7 17:51:34 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 29C3FC6BC92 for ; Wed, 7 Dec 2016 17:51:34 +0000 (UTC) (envelope-from kmacybsd@gmail.com) Received: from mail-qk0-x243.google.com (mail-qk0-x243.google.com [IPv6:2607:f8b0:400d:c09::243]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id DCEA0777 for ; Wed, 7 Dec 2016 17:51:33 +0000 (UTC) (envelope-from kmacybsd@gmail.com) Received: by mail-qk0-x243.google.com with SMTP id h201so49284802qke.3 for ; Wed, 07 Dec 2016 09:51:33 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc; bh=+b2j19NnanRCKA0B4sztUmi8DwWE20FR4c8gVOzgpFE=; b=vWhoqlDXrTCWuGLxlNA61d1Tjt+WBXZyh95MJ0TPNKtX+w68v7tg1wmUlDWdBNvHCY de9TspSsmZ7mpvYLwPmePdvJCO8BdW+5ei4Ly7GUtl7+XB/G1Bo+8KJFel+FSQtjqPSB xj0fiN7Rom9I36ybSxmLyeAbLZ/ScLi/IdHCrmEqQI6UXmuOmqROR1iCg+APQ2LrkQR7
Z/RPNJ2EOieoM0agoEI4CfqfZrT0y3zvJgTU63uvVcwy25rpwCH2IIzUjiJf8zDR7QVp kpiIcee+u9NcVTH4vcJrhbnzRd6BX0q4ea6sy2kefvK01tryui8zeZC6Jzp91KVJWtxk SuVg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:sender:in-reply-to:references:from :date:message-id:subject:to:cc; bh=+b2j19NnanRCKA0B4sztUmi8DwWE20FR4c8gVOzgpFE=; b=l1y03X6VPxvewB2tc7nNtGkXmYZFgO22cl3POgGjmn7epANr2CGpU75qVQneC9NlJP S2BYZMosq7fXYm+89sJ20dr5TJ7+AWLi1wg+LnNhEFcY4uHxaA9bW3NoukVXhY9GyNG7 M+fUiOG6zvWF+sqAnRE/+jo+ufbvcvXcDDkXr/OWmzv+Lb4u74NhEJ4iWXNBJNNWGDE9 c0MdpUI6ntg8z3RIXlyJZLOlpKHu07eFKHH3a5UTCUKGGJc24qV6IxSGyGB9nqlSE89R sX05gaapUyaP6a928RX2tv/GpFjt19OIZ4f5TrYAY/PpuYfcyBBgcxvgEU+oLe+bGqAA +EbQ== X-Gm-Message-State: AKaTC03R86l/U0wk/TFPqmUldSUE4+9D9WWNREC7CXEJ3WTp9yh/rsbWhyMx0dlxBniZcv9VI6Nsq6pmhrZFjQ== X-Received: by 10.55.161.21 with SMTP id k21mr61786676qke.149.1481133093085; Wed, 07 Dec 2016 09:51:33 -0800 (PST) MIME-Version: 1.0 Sender: kmacybsd@gmail.com Received: by 10.140.91.33 with HTTP; Wed, 7 Dec 2016 09:51:32 -0800 (PST) In-Reply-To: References: <27e1a828-5cd9-0755-50ca-d7143e7df117@multiplay.co.uk> <20161206125919.GQ54029@kib.kiev.ua> <8b502580-4d2d-1e1f-9e05-61d46d5ac3b1@multiplay.co.uk> <20161206143532.GR54029@kib.kiev.ua> From: "K. Macy" Date: Wed, 7 Dec 2016 09:51:32 -0800 X-Google-Sender-Auth: xerPnhXOpbejXWW3-PcRJjLX4xw Message-ID: Subject: Re: Help needed to identify golang fork / memory corruption issue on FreeBSD To: Steven Hartland Cc: Konstantin Belousov , "freebsd-hackers@freebsd.org" Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 07 Dec 2016 17:51:34 -0000 > No it doesn't, each built binary its totally standalone and uses asm for > core system calls. 
> > The runtime directly creates kernel threads with thr_new, which it then > manages internally. The thr and umtx syscalls are, in practice, an SPI (service private interface). They're not well documented, and using them directly is going to be brittle across releases. In general, using system calls directly is essentially static linking which is even out of vogue on Linux and not even permitted on OSX, Windows, or Solaris. You're creating a program that is only guaranteed to work for the duration of current ABI guarantees. I understand that this is one of the things that people *like* about go, but I just wanted to observe that YMWV. -M From owner-freebsd-hackers@freebsd.org Wed Dec 7 23:49:21 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9D74DC6CF9F for ; Wed, 7 Dec 2016 23:49:21 +0000 (UTC) (envelope-from killing@multiplay.co.uk) Received: from mail-wm0-x22f.google.com (mail-wm0-x22f.google.com [IPv6:2a00:1450:400c:c09::22f]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 3B2A31803 for ; Wed, 7 Dec 2016 23:49:21 +0000 (UTC) (envelope-from killing@multiplay.co.uk) Received: by mail-wm0-x22f.google.com with SMTP id g23so192873797wme.1 for ; Wed, 07 Dec 2016 15:49:20 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=multiplay-co-uk.20150623.gappssmtp.com; s=20150623; h=subject:to:references:cc:from:message-id:date:user-agent :mime-version:in-reply-to; bh=4VCIA2jzF8gpDDLde1aC1VdOK615a+j2ECIiCJtyFIE=; b=M+Ioo7/L40AHBjaoJpjlHZYL5sCHAuaYrk4wqd885pXLdbBVMFXSvrEm/Kvo2EpFRG cWX34XoS2Y4bBpO0zqZU2KuvPG39VcLiqLURHMXcBsfa34HyaSlktG/IN5ymUiv+r2Th woKI3LWwWPDeRXUXlOxfsVm/Br+0180b6FF8OeMSSsRyveLfx84kfXmVpeoO0/p0+Mvx 
FizRiJpPy9YuXKMGqgeCBHaAYrDPHShF4l8hNMDaYom/z4DkcHgPD7aTHrQGNt+FrXm3 C1WLUupAX4/QdqE0sqvlCYDui0MprhYSXBnI9Ib96h21+dcYHKf9b7Vg41VNyu2eIrAe 8GhA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:subject:to:references:cc:from:message-id:date :user-agent:mime-version:in-reply-to; bh=4VCIA2jzF8gpDDLde1aC1VdOK615a+j2ECIiCJtyFIE=; b=mj6xZRyzihTRA8v3P7BjBcepH1dOUT896wUs0EAX9s81bcVv4TV6ldh9DREZpfOLDH cKSszog9EjaOG9TtqQhABuZKdPFbkpJ3Vd7eyis8kyvcCF5OuCifsukBD9Y850iRf0jF hpEn4sW0VvF2HMXCjXjXFbLQxwPuc4OdVac5vnQe+sJPuaoWTdHKt9GpSl4eVhEO94kJ 0RBVCe2I+gHJXIwSUugQ87V6VI9AJJirqAPObF2tji1Ko8W5TIUI2rC6gMQsx6GL5Kbk W542wnxmrq+rIEWHVyGGYLUrNqUNw+NEkTr6+hpKptsto7Vtw14jdLUxgdg116wveJ95 rm3g== X-Gm-Message-State: AKaTC01qLNKPLEX/yfu3JO2BcDV3ZKmEYCGdU2Wico6DCBfGyQmvNlIL0nKB4YluXeejAgbb X-Received: by 10.28.173.131 with SMTP id w125mr4748826wme.0.1481154559114; Wed, 07 Dec 2016 15:49:19 -0800 (PST) Received: from [10.10.1.58] ([185.97.61.26]) by smtp.gmail.com with ESMTPSA id x140sm12156359wme.19.2016.12.07.15.49.18 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 07 Dec 2016 15:49:18 -0800 (PST) Subject: Re: Help needed to identify golang fork / memory corruption issue on FreeBSD To: Konstantin Belousov References: <27e1a828-5cd9-0755-50ca-d7143e7df117@multiplay.co.uk> <20161206125919.GQ54029@kib.kiev.ua> <8b502580-4d2d-1e1f-9e05-61d46d5ac3b1@multiplay.co.uk> <20161206143532.GR54029@kib.kiev.ua> <9b40c93a-871f-bb32-668c-39bc3e31e385@multiplay.co.uk> <20161207121449.GV54029@kib.kiev.ua> <20161207143051.GX54029@kib.kiev.ua> Cc: "freebsd-hackers@freebsd.org" From: Steven Hartland Message-ID: <37d0e944-5104-3db0-2884-be6fa80bc95d@multiplay.co.uk> Date: Wed, 7 Dec 2016 23:49:50 +0000 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.5.1 MIME-Version: 1.0 In-Reply-To: <20161207143051.GX54029@kib.kiev.ua> Content-Type: text/plain; charset=windows-1252; format=flowed 
Content-Transfer-Encoding: 7bit X-Content-Filtered-By: Mailman/MimeDel 2.1.23 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 07 Dec 2016 23:49:21 -0000 On 07/12/2016 14:30, Konstantin Belousov wrote: > On Wed, Dec 07, 2016 at 02:14:49PM +0200, Konstantin Belousov wrote: >> On Tue, Dec 06, 2016 at 08:35:04PM +0000, Steven Hartland wrote: >>> On 06/12/2016 14:35, Konstantin Belousov wrote: >>>> On Tue, Dec 06, 2016 at 01:53:52PM +0000, Steven Hartland wrote: >>>>> On 06/12/2016 12:59, Konstantin Belousov wrote: >>>>>> On Tue, Dec 06, 2016 at 12:31:47PM +0000, Steven Hartland wrote: >>>>>>> Hi guys I'm trying to help identify / fix an issue with golang where by >>>>>>> fork results in memory corruption. >>>>>>> >>>>>>> Details of the issue can be found here: >>>>>>> https://github.com/golang/go/issues/15658 >>>>>>> >>>>>>> In summary when a fork is done in golang is has a chance of causing >>>>>>> memory corruption in the parent resulting in a process crash once detected. >>>>>>> >>>>>>> Its believed that this only effects FreeBSD. >>>>>>> >>>>>>> This has similarities to other reported issues such as this one which >>>>>>> impacted perl during 10.x: >>>>>>> https://rt.perl.org/Public/Bug/Display.html?id=122199 >>>>>> I cannot judge about any similarilities when all the description provided >>>>>> is 'memory corruption'. BTW, the perl issue described, where child segfaults >>>>>> after the fork, is more likely to be caused by the set of problems referenced >>>>>> in the FreeBSD-EN-16:17.vm. >>>>>> >>>>>>> And more recently the issue with nginx on 11.x: >>>>>>> https://lists.freebsd.org/pipermail/freebsd-stable/2016-September/085540.html >>>>>> Which does not affect anything unless aio is used on Sandy/Ivy. 
>>>>>> >>>>>>> Its possible, some believe likely, that this is a kernel bug around fork >>>>>>> / vm that golang stresses, but I've not been able to confirm. >>>>>>> >>>>>>> I can reproduce the issue at will, takes between 5mins and 1hour using >>>>>>> 16 threads, and it definitely seems like an interaction between fork and >>>>>>> other memory operations. >>>>>> Which arch is the kernel and the process which demonstrates the behaviour ? >>>>>> I mean i386/amd64. >>>>> amd64 >>>> How large is the machine, how many cores, what is the physical memory size ? >> I was able to reproduce that as well, reliably, on two desktop-size >> machines. One is SandyBridge, same core microarchitecture as your >> crashbox, another is Haswell. I see the error both with PCID enabled >> and disabled on both machines (Haswell does implement INVPCID, so the >> original aio/PCID bug did never affected this microarchitecture). >> >> I believe this clears the PCID changes from the accusations. >> >>>>>>> I've tried reproducing the issue in C but also no joy (captured in the bug). >>>>>>> >>>>>>> For reference I'm currently testing on 11.0-RELEASE-p3 + kibs PCID fix >>>>>>> (#306350). >>>>>> Switch to HEAD kernel, for start. >>>>>> Show the memory map of the failed process. >>> No sign of zeroed memory that I can tell. >>> >>> This error was caused by hitting the following validation in gc: >>> func (list *mSpanList) remove(span *mspan) { >>> if span.prev == nil || span.list != list { >>> println("runtime: failed MSpanList_Remove", span, >>> span.prev, span.list, list) >>> throw("MSpanList_Remove") >>> } >>> >>> runtime: failed MSpanList_Remove 0x80052e580 0x80052e300 0x53e9c0 0x53e9b0 >>> fatal error: MSpanList_Remove >>> >>> (gdb) print list >>> $4 = (runtime.mSpanList *) 0x53e9b0 >>> (gdb) print span.list >>> $5 = (runtime.mSpanList *) 0x53e9c0 >> The difference, which triggered the exception, is quite curious: >> list is 0x53e9b0, and span.list == list + 0x10. 
More, this is not >> a single-bit error: bit patter is 1011 for 0xb and 1100 for 0xc. >> >> It is highly unlikely that the cause is a memory corruption due to >> OS mis-managing pages or TLB. Typically, you get either page or cache >> line of complete garbage, instead of the almost identical but slightly >> modified data. >> >>> (gdb) print span.prev >>> $6 = (struct runtime.mspan **) 0x80052e300 >>> (gdb) print *list >>> $7 = {first = 0x80052e580, last = 0x8008aa180} >>> (gdb) print *span.list >>> $8 = {first = 0x8007ea7e0, last = 0x80052e580} >>> >>> procstat -v test.core.1481054183 >>> PID START END PRT RES PRES REF SHD FLAG >>> TP PATH >>> 1178 0x400000 0x49b000 r-x 115 223 3 1 CN-- vn >>> /root/test >>> 1178 0x49b000 0x528000 r-- 97 223 3 1 CN-- vn >>> /root/test >>> 1178 0x528000 0x539000 rw- 10 0 1 0 C--- vn >>> /root/test >>> 1178 0x539000 0x55a000 rw- 16 16 1 0 C--- df >>> 1178 0x800528000 0x800a28000 rw- 118 118 1 0 C--- df >>> 1178 0x800a28000 0x800a68000 rw- 1 1 1 0 CN-- df >>> 1178 0x800a68000 0x800aa8000 rw- 2 2 1 0 CN-- df >>> 1178 0x800aa8000 0x800c08000 rw- 50 50 1 0 CN-- df >>> 1178 0x800c08000 0x800c48000 rw- 2 2 1 0 CN-- df >>> 1178 0x800c48000 0x800c88000 rw- 1 1 1 0 CN-- df >>> 1178 0x800c88000 0x800cc8000 rw- 1 1 1 0 CN-- df >>> 1178 0xc000000000 0xc000001000 rw- 1 1 1 0 CN-- df >>> 1178 0xc41ffe0000 0xc41ffe8000 rw- 8 8 1 0 CN-- df >>> 1178 0xc41ffe8000 0xc41fff0000 rw- 8 8 1 0 CN-- df >>> 1178 0xc41fff0000 0xc41fff8000 rw- 8 8 1 0 C--- df >>> 1178 0xc41fff8000 0xc420300000 rw- 553 553 1 0 C--- df >>> 1178 0xc420300000 0xc420400000 rw- 234 234 1 0 C--- df >>> 1178 0x7ffffffdf000 0x7ffffffff000 rwx 2 2 1 0 C--D df >>> 1178 0x7ffffffff000 0x800000000000 r-x 1 1 33 0 ---- ph >>> >>> This is from FreeBSD 12.0-CURRENT #36 r309618M >>> >>> ktrace on 11.0-RELEASE is still running 6 hours so far. > One thing that I noted. 
In my later attempt to reproduce the issue, I > got the following output: > > sandy% GOGC=2 ./1.go /mnt/1 > Starting 16 forking goroutines... > GOMAXPROCS: 8 > runtime: failed MSpanList_Remove 0x8006a9d60 0x53f2e0 0x53f2f0 0x53f2e0 > fatal error: MSpanList_Remove > > runtime stack: > runtime.throw(0x4cca4d, 0x10) > /usr/local/go/src/runtime/panic.go:566 +0x95 > runtime.(*mSpanList).remove(0x53f2e0, 0x8006a9d60) > /usr/local/go/src/runtime/mheap.go:1001 +0x19d > runtime.(*mcentral).cacheSpan(0x53f2d0, 0x44abfb) > /usr/local/go/src/runtime/mcentral.go:55 +0x3d0 > runtime.(*mcache).refill(0x80052a0d0, 0xc400000016, 0xc420200d38) > /usr/local/go/src/runtime/mcache.go:121 +0xae > runtime.(*mcache).nextFree.func1() > /usr/local/go/src/runtime/malloc.go:505 +0x33 > runtime.systemstack(0xc420018000) > /usr/local/go/src/runtime/asm_amd64.s:298 +0x79 > runtime.mstart() > /usr/local/go/src/runtime/proc.go:1079 > > goroutine 8797810 [running]: > runtime.systemstack_switch() > /usr/local/go/src/runtime/asm_amd64.s:252 fp=0xc420200be0 sp=0xc420200bd8 > runtime.(*mcache).nextFree(0x80052a0d0, 0xc420200c16, 0x40ec95, 0xc42011c050, 0x10) > /usr/local/go/src/runtime/malloc.go:506 +0xb2 fp=0xc420200c38 sp=0xc420200be0 > runtime.mallocgc(0x1a0, 0x4aaf60, 0xc420200d01, 0x44ab40) > /usr/local/go/src/runtime/malloc.go:658 +0x809 fp=0xc420200cd8 sp=0xc420200c38 > runtime.makeslice(0x4aaf60, 0x0, 0x19, 0xc4200b8688, 0x8, 0x8) > /usr/local/go/src/runtime/slice.go:57 +0x7b fp=0xc420200d30 sp=0xc420200cd8 > syscall.Environ(0x0, 0x0, 0x0) > /usr/local/go/src/syscall/env_unix.go:142 +0xd0 fp=0xc420200dc0 sp=0xc420200d30 > os.Environ(0x0, 0x0, 0x0) > /usr/local/go/src/os/env.go:116 +0x22 fp=0xc420200de8 sp=0xc420200dc0 > os/exec.(*Cmd).envv(0xc4201422c0, 0xc4200b8680, 0x0, 0x1) > /usr/local/go/src/os/exec/exec.go:171 +0x38 fp=0xc420200e10 sp=0xc420200de8 > os/exec.(*Cmd).Start(0xc4201422c0, 0x6, 0x0) > > There are two traces for goroutines, and note that both first and second > are in malloc. 
Is go malloc fine-grain locked ? E.g. I know that JVM
> uses per-thread arenas.

I'm not particularly familiar with the golang memory allocator, as I
only started digging into this a few days ago, but malloc is pool based.

One article I just found which gives some details about the internals
of the allocator is:
https://blog.altoros.com/golang-internals-part-6-bootstrapping-and-memory-allocator-initialization.html

> In fact, are there stress-tests for the go mutual exclusion primitives ?

go has a full test suite built in, which is run by default when it's
built; components of it also include benchmarks.

You can run tests for the current directory with verbose output using:
go test -v

or for all directories below the current one with:
go test -v ./...

to also run all benchmarks add:
-bench=.

for more information about go test, which has loads of nice options
including profiling:
go test -h

> The runtime seems to try to use thr and umtx syscalls directly, which
> could be the source of bugs.

Yes it could well be.

Looking through the runtime code I spotted that it uses the sigprocmask
syscall directly in src/runtime/os_freebsd.go. This may also have the
potential to cause problems, given that libthr does additional work on
top of the syscall and that sigprocmask(2) explicitly warns against the
use of sigprocmask in threaded applications:
https://github.com/golang/go/blob/master/src/runtime/os_freebsd.go#L135

I've now got a failure with ktrace running. It took over 24 hours, so
the trace is quite large; I've uploaded it alongside the log and the
exact binary build to:
https://blog.multiplay.co.uk/dropzone/freebsd/golang-panic.tar.gz

The crash was the same as before:
runtime: failed MSpanList_Remove 0x80052d860 0x8007abbc0 0x53ea30 0x53ea20
fatal error: MSpanList_Remove

Given the offset is the same again, this doesn't really feel like
random corruption.
Regards Steve From owner-freebsd-hackers@freebsd.org Thu Dec 8 20:24:01 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 1EB32C6DF3A for ; Thu, 8 Dec 2016 20:24:01 +0000 (UTC) (envelope-from j.deboynepollard-newsgroups@ntlworld.com) Received: from know-smtprelay-omc-6.server.virginmedia.net (know-smtprelay-omc-6.server.virginmedia.net [80.0.253.70]) by mx1.freebsd.org (Postfix) with ESMTP id 907BF10EA for ; Thu, 8 Dec 2016 20:23:59 +0000 (UTC) (envelope-from j.deboynepollard-newsgroups@ntlworld.com) Received: from [192.168.1.100] ([86.10.211.13]) by know-smtprelay-6-imp with bizsmtp id HYPt1u0040HtmFq01YPt9j; Thu, 08 Dec 2016 20:23:53 +0000 X-Originating-IP: [86.10.211.13] X-Spam: 0 X-Authority: v=2.1 cv=H94muLsi c=1 sm=1 tr=0 a=SB7hr1IvJSWWr45F2gQiKw==:117 a=SB7hr1IvJSWWr45F2gQiKw==:17 a=L9H7d07YOLsA:10 a=9cW_t1CCXrUA:10 a=s5jvgZ67dGcA:10 a=N659UExz7-8A:10 a=7nziCu_AKo5A3TZolGUA:9 a=pILNOxqGKmIA:10 Subject: NOTE_TRACK, EVFILT_PROC, kqueue, and subreapers To: FreeBSD Hackers , supervision@list.skarnet.org References: <20161102185444.GA911@protected.rcdrun.com> <20161201081829.GG1487@protected.rcdrun.com> <20161201120531.374588b2@mydesk.domain.cxm> <20161201172846.GP3428@protected.rcdrun.com> <20161201124118.46778e2b@mydesk.domain.cxm> <20161201174837.GR3428@protected.rcdrun.com> <20161201125438.15230317@mydesk.domain.cxm> <20161206104020.6b2ebb30@eto-mona.office.smartweb.sk> <20161206102637.1ddd152a@mydesk.domain.cxm> <20161207155638.4b2dd629@eto-mona.office.smartweb.sk> <630ace89-e29b-d0d3-9f15-110d8dc3de08@NTLWorld.com> <20161208132842.5d7940bd@eto-mona.office.smartweb.sk> From: Jonathan de Boyne Pollard Message-ID: Date: Thu, 8 Dec 2016 20:23:29 +0000 User-Agent: Mozilla/5.0 (Windows NT 6.0; rv:45.0) Gecko/20100101 Thunderbird/45.1.1 MIME-Version: 1.0 In-Reply-To: 
<20161208132842.5d7940bd@eto-mona.office.smartweb.sk> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 08 Dec 2016 20:24:01 -0000 Martin "eto" Misuth: > I think that might be the reason why my PID1 s6-svscan on FreeBSD is > accumulating zombies sometimes (seems like it is affected by dead > descendants of ssh and my experiments). [...] > Anyway as you are probably much closer to FreeBSD team than I am, [...] I'm not. You have the same access as I and the rest of the world have. For what it's worth, I've seen similar behaviour with zombies lying around. If we can nail it down you can file a kernel bug report. Have you checked that you aren't getting a NOTE_EXIT? From owner-freebsd-hackers@freebsd.org Fri Dec 9 21:10:37 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0FEE0C6EEB7; Fri, 9 Dec 2016 21:10:37 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "troutmask", Issuer "troutmask" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id EC9F77EC; Fri, 9 Dec 2016 21:10:36 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (localhost [127.0.0.1]) by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id uB9L9ell001159 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Fri, 9 Dec 2016 13:09:40 -0800 (PST) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from 
sgk@localhost) by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id uB9L9eKK001158; Fri, 9 Dec 2016 13:09:40 -0800 (PST) (envelope-from sgk) Date: Fri, 9 Dec 2016 13:09:40 -0800 From: Steve Kargl To: freebsd-current@freebsd.org, freebsd-hackers@freebsd.org Subject: System hangs at boot in xhci0 Message-ID: <20161209210940.GA1144@troutmask.apl.washington.edu> Reply-To: kargl@uw.edu MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.6.1 (2016-04-27) X-Mailman-Approved-At: Fri, 09 Dec 2016 22:36:02 +0000 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 09 Dec 2016 21:10:37 -0000 I updated my system to % svn info /usr/src Path: /usr/src Working Copy Root Path: /usr/src URL: svn://svn.freebsd.org/base/head Relative URL: ^/head Repository Root: svn://svn.freebsd.org/base Repository UUID: ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f Revision: 309748 Built a shiny new kernel, which hangs during boot. There is no panic. Using the dmesg from kernel.old/kernel, the last few lines reported are pci2: on pcib2 xhci0: mem 0xfe900000-0xfe900fff irq 48 at device 0.0 on pci2 xhci0: 32 bytes context size, 64-bit DMA At this point, the system is completely unresponsive and needs to be power cycled. 
-- steve From owner-freebsd-hackers@freebsd.org Fri Dec 9 22:58:49 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id E099AC6F3E9; Fri, 9 Dec 2016 22:58:49 +0000 (UTC) (envelope-from hps@selasky.org) Received: from mail.turbocat.net (mail.turbocat.net [IPv6:2a01:4f8:d16:4514::2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id ADD8B1765; Fri, 9 Dec 2016 22:58:49 +0000 (UTC) (envelope-from hps@selasky.org) Received: from hps2016.home.selasky.org (unknown [62.141.129.119]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.turbocat.net (Postfix) with ESMTPSA id 08E031FE1FC; Fri, 9 Dec 2016 23:58:46 +0100 (CET) Subject: Re: System hangs at boot in xhci0 To: kargl@uw.edu, freebsd-current@freebsd.org, freebsd-hackers@freebsd.org References: <20161209210940.GA1144@troutmask.apl.washington.edu> From: Hans Petter Selasky Message-ID: <5a509678-2a67-b538-ea42-d9e80a6bd3eb@selasky.org> Date: Fri, 9 Dec 2016 23:58:21 +0100 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0 MIME-Version: 1.0 In-Reply-To: <20161209210940.GA1144@troutmask.apl.washington.edu> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 09 Dec 2016 22:58:50 -0000 On 12/09/16 22:09, Steve Kargl wrote: > I updated my system to > > % svn info /usr/src > Path: /usr/src > Working Copy Root Path: /usr/src > URL: svn://svn.freebsd.org/base/head > Relative URL: ^/head > Repository Root: 
svn://svn.freebsd.org/base > Repository UUID: ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f > Revision: 309748 > > Built a shiny new kernel, which hangs during boot. > There is no panic. Using the dmesg from kernel.old/kernel, > the last few reported are > > > pci2: on pcib2 > xhci0: mem 0xfe900000-0xfe900fff irq 48 at device 0.0 on pci2 > xhci0: 32 bytes context size, 64-bit DMA > > At this point, the system is completely unresponsive and > needs to be power cycled. > Hi, What is the next message in the old kernel which is printed? There have been zero changes in the XHCI driver recently. Can you copy /boot/kernel.old to /boot/kernel.works Then add this option to the GENERIC kernel config: options VERBOSE_SYSINIT What are the last few messages in dmesg when you boot with the above flag? --HPS From owner-freebsd-hackers@freebsd.org Fri Dec 9 23:45:31 2016 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id CFC6CC6FF30; Fri, 9 Dec 2016 23:45:31 +0000 (UTC) (envelope-from kargl@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "troutmask", Issuer "troutmask" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id AA8A31644; Fri, 9 Dec 2016 23:45:31 +0000 (UTC) (envelope-from kargl@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (localhost [127.0.0.1]) by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id uB9NjUCH034476 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Fri, 9 Dec 2016 15:45:30 -0800 (PST) (envelope-from kargl@troutmask.apl.washington.edu) Received: (from kargl@localhost) by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id uB9NjU9M034475; Fri, 9 Dec 2016 15:45:30 -0800 (PST) (envelope-from kargl) Date: 
Fri, 9 Dec 2016 15:45:30 -0800 From: "Steven G. Kargl" To: Hans Petter Selasky Cc: kargl@uw.edu, freebsd-current@freebsd.org, freebsd-hackers@freebsd.org Subject: Re: System hangs at boot in xhci0 Message-ID: <20161209234530.GA21666@troutmask.apl.washington.edu> Reply-To: kargl@uw.edu References: <20161209210940.GA1144@troutmask.apl.washington.edu> <5a509678-2a67-b538-ea42-d9e80a6bd3eb@selasky.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <5a509678-2a67-b538-ea42-d9e80a6bd3eb@selasky.org> User-Agent: Mutt/1.6.1 (2016-04-27) X-Mailman-Approved-At: Fri, 09 Dec 2016 23:48:01 +0000 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 09 Dec 2016 23:45:31 -0000 On Fri, Dec 09, 2016 at 11:58:21PM +0100, Hans Petter Selasky wrote: > On 12/09/16 22:09, Steve Kargl wrote: > > I updated my system to > > > > % svn info /usr/src > > Revision: 309748 > > > > Built a shiny new kernel, which hangs during boot. > > There is no panic. Using the dmesg from kernel.old/kernel, > > the last few reported are > > > > > > pci2: on pcib2 > > xhci0: mem 0xfe900000-0xfe900fff > > irq 48 at device 0.0 on pci2 > > xhci0: 32 bytes context size, 64-bit DMA > > > > At this point, the system is completely unresponsive and > > needs to be power cycled. > > > > Hi, > > What is the next message in the old kernel which is printed? There have > been zero changes in the XHCI driver recently. > > Can you copy /boot/kernel.old to /boot/kernel.works > > Then add this option to the GENERIC kernel config: > > options VERBOSE_SYSINIT > > What are the last few messages in dmesg when you boot with the above flag? 
>

With a boot_verbose of the new kernel I get the following output:

xhci0: 32 bytes context size, 64-bit DMA
xhci0: attempting to allocate 1 MSI vectors (4 supported)
msi: routing MSI IRQ 260 to local APIC 16 vector 55
xhci0: using IRQ 260 for MSI
xhci0: MSI enabled
usbus0 on xhci0
xhci0: usbpf: Attached
random: harvesting attach, 8 bytes (4 bits) from usbus0
random: harvesting attach, 8 bytes (4 bits) from xhci0
random: harvesting attach, 8 bytes (4 bits) from pci2
random: harvesting attach, 8 bytes (4 bits) from pcib2

and then the system locks up. With the old kernel (circa Oct 10th sources), the next few lines from dmesg are

pcib3: irq 54 at device 10.0 on pci0
pcib0: allocated type 4 (0xd000-0xdfff) for rid 1c of pcib3
pcib0: allocated type 3 (0xfe800000-0xfe8fffff) for rid 20 of pcib3
pcib3: domain 0
pcib3: secondary bus 3
pcib3: subordinate bus 3
pcib3: I/O decode 0xd000-0xdfff
pcib3: memory decode 0xfe800000-0xfe8fffff
pci3: on pcib3

I think the hang isn't caused by xhci; rather, xhci is a victim of being the last successfully probed device. In the last few weeks there have been a few commits (309588, 309400, and 308953) that touched ACPI. I'm currently reverting these changes to test whether one of them is causing the problem. I did see that one of these revisions specifically mentions the ALASKA AMI BIOS, which I happen to have. However, that commit also mentions a Skylake processor, while I have an AMD FX-8350.

-- 
Steve
http://troutmask.apl.washington.edu/~kargl/
https://www.youtube.com/watch?v=6hwgPfCcpyQ
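
The comparison made above (find the last message the hung kernel printed, then see what the working kernel printed next) can be sketched as a small script. This is only a rough illustration, assuming both boot logs have been saved as plain text; the function name `lines_after_hang` and the variables `HUNG` and `GOOD` are hypothetical, and the sample logs are trimmed excerpts of the output quoted in this thread.

```python
def lines_after_hang(hung_log, good_log, count=3):
    """Return up to `count` lines the good boot printed right after the
    last line that both logs have in common (a heuristic, not a proof)."""
    hung = [l.strip() for l in hung_log.splitlines() if l.strip()]
    good = [l.strip() for l in good_log.splitlines() if l.strip()]
    # Walk the hung log backwards: verbose-only lines such as
    # "random: harvesting ..." may have no counterpart in a
    # non-verbose log from the working kernel.
    for line in reversed(hung):
        if line in good:
            idx = good.index(line)  # first occurrence in the good log
            return good[idx + 1 : idx + 1 + count]
    return []  # no common line found


# Trimmed excerpts of the logs quoted above (not the full dmesg).
HUNG = """\
xhci0: 32 bytes context size, 64-bit DMA
xhci0: attempting to allocate 1 MSI vectors (4 supported)
random: harvesting attach, 8 bytes (4 bits) from pcib2
"""

GOOD = """\
pci2: on pcib2
xhci0: 32 bytes context size, 64-bit DMA
pcib3: irq 54 at device 10.0 on pci0
pcib3: domain 0
"""

if __name__ == "__main__":
    for line in lines_after_hang(HUNG, GOOD, count=2):
        print(line)
```

With these excerpts the script points at pcib3 as the first device the working kernel handled after xhci0, which is consistent with the reading that xhci0 is merely the last device to attach before the hang.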