Date:      Fri, 27 Oct 2017 13:37:29 +0200
From:      Norbert Koch <nkoch@demig.de>
To:        <kostikbel@gmail.com>
Cc:        <freebsd-hackers@freebsd.org>
Subject:   Re: crerating coredump of multithreaded process
Message-ID:  <95ad25da-dc53-1c6a-030b-71cf9021a75b@demig.de>
In-Reply-To: <20171027093311.GF2566@kib.kiev.ua>
References:  <e455d19c-72ac-3501-8764-415c4d154c74@demig.de> <20171027093311.GF2566@kib.kiev.ua>

Ok, thank you for your explanation.



*****************************************
* demig Prozessautomatisierung GmbH     *
*                                       *
* Anschrift:          Haardtstrasse 40  *
*                       D-57076 Siegen  *
* Registergericht:     Siegen HRB 2819  *
* Geschaeftsfuehrer:   Joachim Herbst,  *
*                        Winfried Held  *
* Telefon:              +49 271 772020  *
* Telefax:              +49 271 74704   *
* E-Mail:                info@demig.de  *
*                  http://www.demig.de  *
*****************************************
On 2017-10-27 at 11:33, Konstantin Belousov wrote:
> On Fri, Oct 27, 2017 at 10:44:41AM +0200, Norbert Koch wrote:
>> Hello.
>>
>> When trying to create the coredump of a running
>> process (without killing it) under FreeBSD 10.3
>> I am seeing a somewhat strange behaviour.
> Try this on HEAD or stable/11.  There were a lot of changes and bugfixes
> in ptrace(2).
>
> I do not claim that the behaviour you see has changed, but 10.3 has
> diverged too far from the code that developers would be willing to look at.
>> As I want to see the state of all threads, the q&d way
>> of fork() + SIGABRT does not work for me.
>>
>> So, what I do is have a supervisor program wait for SIGUSR1.
>> When my application signals the wish to be coredumped
>> it sends SIGSTOP to itself immediately after sending SIGUSR1.
>> The supervisor then forks gcore.
>>
>> From what I can see using top, my application immediately starts
>> again, as if SIGCONT had been received, while gcore hangs in wait.
> SIGCONT cannot be blocked, otherwise programs could create unkillable
> processes.
>
>> Gcore calls ptrace(PT_ATTACH) followed by waitpid().
>> So I assume that the ptrace call restarts my application
>> and waitpid hangs (why?).
>>
>> If I manually send SIGCONT to my stopped application
>> immediately before exec-ing gcore, the coredump is created,
>> but for obvious reasons it is not as consistent as
>> I want it to be.
>>
>> I should add that in my application most other signals are
>> blocked. Blocking (or not) SIGCONT seems to have no effect.
>>
>> Am I doing something wrong here? If yes, is there
>> a different/better/more elegant way of creating a consistent coredump?
> What is the purpose of sending SIGSTOP to itself? Practically, it is no
> different from the action of ptrace(PT_ATTACH): all threads are parked
> at some safe place in the kernel, or are forcibly moved into kernel
> mode by an IPI if they are executing in userspace on other cores. To get
> to the safe place in the kernel, threads often need to execute some more.
> IPI delivery is also not guaranteed to occur at a deterministic place
> ("at the next instruction boundary"); it happens as the hardware reacts
> to it. As you see, the process is very asynchronous. It cannot guarantee
> that the final snapshot is consistent with an arbitrary thread state at
> the point of the request, but it does represent a valid process state,
> assuming that the threads execute asynchronously.
>
> Moreover, ptrace(PT_ATTACH) currently does not merely use a mechanism
> similar to SIGSTOP, it really sends SIGSTOP to the debuggee. We do not
> track nested SIGSTOPs; a process is either stopped or runnable. So I am
> not surprised that attaching to a stopped process does not complete until
> the stopped state established earlier passes: the debugger waits for
> confirmation from all threads that they are parked at a safe place,
> but none arrives because the threads are already stopped. If the threads
> are made runnable, the acks are sent and the attach completes.
>
> I am explaining this to point out that sending SIGSTOP and then
> attaching with ptrace(PT_ATTACH) is just worse than doing
> ptrace(PT_ATTACH) alone.  I think you need to have the supervisor either
> directly execute gcore(1) without SIGSTOP, or execute ptrace(PT_ATTACH)
> instead of kill(SIGSTOP) and have the gcore functionality embedded into
> it.  The consistency of the generated core is actually the same.
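
For the archive, a minimal sketch of the second suggestion above: the
supervisor attaches with ptrace(PT_ATTACH) instead of the application
sending SIGSTOP to itself. The helper names attach_and_wait() and
detach_and_resume() are hypothetical and are not taken from gcore(1).
The Linux branch is an assumption added only so the sketch also compiles
outside FreeBSD. Writing the core itself is left out.

```c
#include <sys/types.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Issue the attach request; the request names differ between systems. */
static int attach_req(pid_t pid)
{
#if defined(__linux__)
    return ptrace(PTRACE_ATTACH, pid, (void *)0, (void *)0) == -1 ? -1 : 0;
#else
    return ptrace(PT_ATTACH, pid, (caddr_t)0, 0) == -1 ? -1 : 0;
#endif
}

/* Issue the detach request without delivering a signal to the target. */
static int detach_req(pid_t pid)
{
#if defined(__linux__)
    return ptrace(PTRACE_DETACH, pid, (void *)0, (void *)0) == -1 ? -1 : 0;
#else
    /* On FreeBSD, addr == (caddr_t)1 resumes execution where it stopped. */
    return ptrace(PT_DETACH, pid, (caddr_t)1, 0) == -1 ? -1 : 0;
#endif
}

/*
 * Hypothetical helper: attach to a process that is still running (no
 * prior SIGSTOP) and block until it reports the stop caused by the
 * attach.  Returns 0 once the target is stopped, -1 on error.
 */
int attach_and_wait(pid_t pid)
{
    int status;

    assert(pid > 0);
    if (attach_req(pid) == -1)
        return -1;
    /* Retry waitpid() if it is interrupted by a signal. */
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return WIFSTOPPED(status) ? 0 : -1;
}

/* Hypothetical helper: let the target continue after the dump is taken. */
int detach_and_resume(pid_t pid)
{
    return detach_req(pid);
}
```

A supervisor would call attach_and_wait(pid) on receipt of SIGUSR1, write
the core while the target is stopped, and then call detach_and_resume(pid).
The target must not be in a SIGSTOP-induced stop when the attach is
issued, for the reason explained in the quoted text.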

-- 
Dipl.-Ing. Norbert Koch
Entwicklung Prozessregler




