From: Norbert Koch
Date: Fri, 27 Oct 2017 13:37:29 +0200
Subject: Re: crerating coredump of multithreaded process
Message-ID: <95ad25da-dc53-1c6a-030b-71cf9021a75b@demig.de>
In-Reply-To: <20171027093311.GF2566@kib.kiev.ua>
List-Id: Technical Discussions relating to FreeBSD

Ok, thank you for your explanation.
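The point made in the quoted reply below, that a process is either stopped or runnable (stops are not counted) and that SIGCONT cannot be blocked, can be checked with a small C sketch. It assumes a POSIX system and uses only fork(), kill(), sigprocmask(), nanosleep() and waitpid() with WUNTRACED/WCONTINUED; the helper name `single_sigcont_resumes_double_stop()` is hypothetical. A child is started with SIGCONT already blocked, sent SIGSTOP twice, and then sent a single SIGCONT. Because a pending stop signal is discarded when SIGCONT is generated, and SIGCONT resumes a stopped process even when blocked, the child should be reported as continued and should not stop again.

```c
#define _POSIX_C_SOURCE 200809L  /* expose POSIX APIs under strict -std= modes */
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns true if a child that was sent SIGSTOP
 * twice is resumed by a single SIGCONT, even though the child has
 * SIGCONT blocked, and does not stop again afterwards.
 */
static bool single_sigcont_resumes_double_stop(void)
{
    sigset_t cont, old;
    int status;
    struct timespec pause_100ms = { 0, 100 * 1000 * 1000 };

    /* Block SIGCONT before fork() so the child inherits the blocked mask. */
    sigemptyset(&cont);
    sigaddset(&cont, SIGCONT);
    if (sigprocmask(SIG_BLOCK, &cont, &old) != 0)
        return false;

    pid_t child = fork();
    if (child == 0) {
        for (;;)
            pause();                        /* idle until killed */
    }
    sigprocmask(SIG_SETMASK, &old, NULL);   /* parent: restore its own mask */
    if (child < 0)
        return false;

    /* Two stop requests: the stop is a state, not a counter. */
    kill(child, SIGSTOP);
    kill(child, SIGSTOP);
    bool stopped = waitpid(child, &status, WUNTRACED) == child
                   && WIFSTOPPED(status);

    /* One SIGCONT discards any pending stop signal and resumes the child. */
    kill(child, SIGCONT);
    bool continued = waitpid(child, &status, WCONTINUED) == child
                     && WIFCONTINUED(status);

    /* Give a leftover SIGSTOP (if one were remembered) time to act. */
    nanosleep(&pause_100ms, NULL);
    bool still_running = waitpid(child, &status, WUNTRACED | WNOHANG) == 0;

    kill(child, SIGKILL);
    waitpid(child, &status, 0);             /* reap the child */
    return stopped && continued && still_running;
}
```

This is why a process that stopped itself with SIGSTOP gives no extra "stop" for a later ptrace(PT_ATTACH) to build on: there is only one stopped/runnable state.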

****************************************
* demig Prozessautomatisierung GmbH    *
*                                      *
* Address: Haardtstrasse 40            *
*          D-57076 Siegen              *
* Register court: Siegen HRB 2819      *
* Managing directors: Joachim Herbst,  *
*                     Winfried Held    *
* Phone: +49 271 772020                *
* Fax: +49 271 74704                   *
* E-mail: info@demig.de                *
* http://www.demig.de                  *
****************************************

On 2017-10-27 at 11:33, Konstantin Belousov wrote:
> On Fri, Oct 27, 2017 at 10:44:41AM +0200, Norbert Koch wrote:
>> Hello.
>>
>> When trying to create the coredump of a running
>> process (without killing it) under FreeBSD 10.3
>> I am seeing a somewhat strange behaviour.
> Try this on HEAD or stable/11. There were a lot of changes and bugfixes
> in ptrace(2).
>
> I do not claim that the behaviour you see has changed, but 10.3 has
> diverged too far from the code that developers would be willing to look at.
>> As I want to see the state of all threads, the q&d way
>> of fork() + SIGABRT does not work for me.
>>
>> So, what I do is have a supervisor program wait for SIGUSR1.
>> When my application signals the wish to be coredumped,
>> it sends SIGSTOP to itself immediately after sending SIGUSR1.
>> The supervisor then forks gcore.
>>
>> From what I can see using top, my application immediately starts
>> again as if SIGCONT had been received, while gcore hangs in wait.
> SIGCONT cannot be blocked, otherwise programs could create unkillable
> processes.
>
>> Gcore calls ptrace(PT_ATTACH) followed by waitpid().
>> So I assume that the ptrace call restarts my application
>> and waitpid hangs (why?).
>>
>> If I manually send SIGCONT to my stopped application
>> immediately before exec-ing gcore, the coredump is
>> created, but for obvious reasons it is not as consistent as
>> I want it to be.
>>
>> I should add that in my application most other signals are
>> blocked. Blocking (or not) SIGCONT seems to have no effect.
>>
>> Am I doing something wrong here?
>> If yes, is there a different/better/more elegant way of creating
>> a consistent coredump?
> What is the purpose of sending SIGSTOP to itself? Practically, it is no
> different from the action of ptrace(PT_ATTACH): all threads are parked
> at some safe place in the kernel, or are forcibly moved into kernel
> mode by an IPI if they are executing in userspace on other cores. To get
> to the safe place in the kernel, threads often need to execute some more.
> IPI delivery is also not guaranteed to occur at a deterministic place
> ("at the next instruction boundary"); it happens when the hardware reacts to it.
> As you see, the process is very asynchronous. It cannot guarantee that
> the final snapshot is consistent with an arbitrary thread state at the
> point of the request, but it does represent a valid process state, assuming
> that the threads are executing asynchronously.
>
> Moreover, ptrace(PT_ATTACH) currently operates not only by a mechanism
> similar to SIGSTOP; it really sends SIGSTOP to the debuggee. We do not
> track nested SIGSTOPs: a process is either stopped or runnable. So I am
> not surprised that attaching to a stopped process does not occur until
> the stopped state established earlier passes away: the debugger waits for
> confirmation from all threads that they are parked at a safe place,
> but none arrives because the threads are already stopped. If the threads
> are made runnable, the acks are sent and the attach completes.
>
> I am explaining this to point out that sending SIGSTOP and
> then attaching with ptrace(PT_ATTACH) is just worse than doing
> ptrace(PT_ATTACH) alone. I think you need to have the supervisor either
> directly execute gcore(1) without SIGSTOP, or execute ptrace(PT_ATTACH)
> instead of kill(SIGSTOP) and have the gcore functionality embedded into
> it. The consistency of the generated core is actually the same.

-- 
Dipl.-Ing. Norbert Koch
Development, Process Controllers
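The first option suggested in the reply, a supervisor that runs gcore(1) directly without any SIGSTOP, could look roughly like the C sketch below. It assumes a single-threaded supervisor (sigprocmask() is used instead of pthread_sigmask()) and that gcore(1) is invoked as `gcore <pid>` with its default output file; the function names `block_dump_request()`, `wait_for_dump_request()` and `run_dumper()` are hypothetical. The application only sends SIGUSR1 and keeps running; the supervisor learns the sender's PID from `si_pid` via sigwaitinfo() and leaves stopping the target to gcore's own ptrace(PT_ATTACH).

```c
#define _POSIX_C_SOURCE 200809L  /* expose POSIX APIs under strict -std= modes */
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The signal set used for dump requests (hypothetical protocol: SIGUSR1). */
static void dump_request_set(sigset_t *set)
{
    sigemptyset(set);
    sigaddset(set, SIGUSR1);
}

/*
 * Block SIGUSR1 so a request stays pending until sigwaitinfo() collects it.
 * Call once at start-up, before the application can send the signal;
 * otherwise the default action of SIGUSR1 terminates the supervisor.
 */
static int block_dump_request(void)
{
    sigset_t set;
    dump_request_set(&set);
    return sigprocmask(SIG_BLOCK, &set, NULL);
}

/* Wait for SIGUSR1 and return the PID of the sender, or -1 on error. */
static pid_t wait_for_dump_request(void)
{
    sigset_t set;
    siginfo_t info;
    int sig;

    dump_request_set(&set);
    do
        sig = sigwaitinfo(&set, &info);
    while (sig < 0 && errno == EINTR);
    return sig == SIGUSR1 ? info.si_pid : -1;
}

/*
 * Fork and exec "tool <pid>" (for example tool = "gcore") and wait for it.
 * No SIGSTOP is sent first; the tool's own ptrace(PT_ATTACH) stops the
 * target. Returns the tool's exit status, or -1 on error.
 */
static int run_dumper(const char *tool, pid_t target)
{
    char pidbuf[32];
    int status;
    pid_t child;

    snprintf(pidbuf, sizeof pidbuf, "%ld", (long)target);
    child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        execlp(tool, tool, pidbuf, (char *)NULL);
        _exit(127);                         /* exec failed */
    }
    while (waitpid(child, &status, 0) < 0)
        if (errno != EINTR)
            return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A supervisor main loop would call block_dump_request() once and then repeatedly pass the PID returned by wait_for_dump_request() to run_dumper("gcore", pid). As the reply explains, the snapshot is still taken asynchronously, so the core is no less consistent than with the SIGSTOP variant, but the attach no longer waits on a process that is already stopped.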