Date:      Sat, 7 Nov 2020 11:57:21 -0800
From:      Patrick Mahan <plmahan@gmail.com>
To:        Pete Wright <pete@nomadlogic.org>
Cc:        questions list <freebsd-questions@freebsd.org>
Subject:   Re: Helping understand cause of SIGSEGV
Message-ID:  <CAFDHx1JDyJq%2Bsepz1O186AeijTqyXP6AuQajsETY00j5eAsLXQ@mail.gmail.com>
In-Reply-To: <f51dfaf6-46da-9cd8-ea37-b2733f5ad9bc@nomadlogic.org>
References:  <c2eab4b0-b10b-9db3-1aa3-1f61689e24e8@nomadlogic.org> <CAFDHx1Jg_9k3oWU8X-WdP2CJX8hnBYgMz%2BvxwOs766JZcM3WRQ@mail.gmail.com> <0764e7ef-bd81-a6c5-47c4-7cd539a428f5@nomadlogic.org> <CAFDHx1K2-RWS4=xYtNUKMV3t_J7OKKPUE56f9JY45Q%2B0nH_TFA@mail.gmail.com> <f51dfaf6-46da-9cd8-ea37-b2733f5ad9bc@nomadlogic.org>

On Sat, Nov 7, 2020 at 9:59 AM Pete Wright <pete@nomadlogic.org> wrote:

>
>
> On 11/5/20 9:44 PM, Patrick Mahan wrote:
>
> On Thu, Nov 5, 2020 at 5:01 PM Pete Wright <pete@nomadlogic.org> wrote:
>
>>
>>
>> On 11/5/20 4:01 PM, Patrick Mahan wrote:
>>
>>
>>
>>> | thread #1, name = 'fluent-bit', stop reason = signal SIGABRT
>>>    * frame #0: 0x000000004087100a libc.so.7`__sys_thr_kill at
>>> thr_kill.S:4
>>>      frame #1: 0x00000000407e6c84 libc.so.7`__raise(s=6) at raise.c:52:10
>>>      frame #2: 0x000000004089a5d9 libc.so.7`abort at abort.c:67:8
>>>      frame #3: 0x000000000034a7a8
>>> fluent-bit`flb_signal_handler(signal=11) at fluent-bit.c:418:9
>>>      frame #4: 0x00000000406d1c20
>>> libthr.so.3`handle_signal(actp=0x00007fffdfffc600, sig=11,
>>> info=0x00007fffdfffc9f0, ucp=0x00007fffdfffc680) at thr_sig.c:303:3
>>>      frame #5: 0x00000000406d11ef libthr.so.3`thr_sighandler(sig=11,
>>> info=0x00007fffdfffc9f0, _ucp=0x00007fffdfffc680) at thr_sig.c:246:2
>>>      frame #6: 0x00007fffffffe193
>>>      frame #7: 0x000000000036fe0c fluent-bit`tasks_start [inlined]
>>> output_params_set(th=0x00000000416091c0, data=0x000000004165d980,
>>> bytes=128, tag="random.0", tag_len=8, i_ins=0x0000000040e58000,
>>> out_plugin=0x0000000040e2dfc0, out_context=0x00000000416051e0,
>>> config=0x0000000040e19180) at flb_output.h:429:5
>>>
>>
>> I would look at what is happening here in output_params_set().  Something
>> is accessing out of bounds memory.
>>
>>
>>
>> thanks for your response Patrick i really appreciate it.
>>
>> So here is where output_params_set() is defined - with an interesting
>> comment that i haven't chased down yet:
>>
>> 521     /* Workaround for makecontext() */
>> 522     output_params_set(th,
>> 523                       buf,
>> 524                       size,
>> 525                       tag,
>> 526                       tag_len,
>> 527                       i_ins,
>> 528                       o_ins->p,
>> 529                       o_ins->context,
>> 530                       config);
>> 531     return th;
>> 532 }
>> 533
>>
>> and the frame from the backtrace is this for reference:
>>      frame #8: 0x000000000036fd14 fluent-bit`tasks_start [inlined]
>> flb_output_thread(task=0x00000000416410a0, i_ins=0x0000000040e58000,
>> o_ins=0x0000000040e5b000, config=0x0000000040e19180,
>> buf=0x000000004165d980, size=128, tag="random.0", tag_len=8) at
>> flb_output.h:522
>>
>> and then later on line 429 of flb_output.h it does this:
>> 428     FLB_TLS_SET(flb_libco_params, params);
>> 429     co_switch(th->callee);
>>
>> like i said i'm not really sure how to grok this, but it sounds like one
>> of the params in output_params_set isn't being set correctly.  hopefully
>> the code snippet makes the error more obvious :)
>>
>>
> Okay, I don't know lldb very well.  But according to the GDB to LLDB
> command map <http://lldb.llvm.org/use/map.html>, it uses the same commands
> to move between frames.  So at startup you want to ensure you are in thread
> 1 (thread select 1).  That should place you in the last frame on the stack
> (frame #0).  You just move up the stack using the command 'up' until you
> are in frame #7.
>
> Once there you need to dump the contents of 'th' using the command 'p *th'
> or 'frame variable -T *th'.  I suspect the value of th->callee is
> incorrect.  The next frame on the stack is -
>
>     frame #6: 0x00007fffffffe193
>
> This is different from the rest of the stack addresses.  So I suspect it
> is out of bounds.
>
> Patrick
>
>
>
> that's totally it - thanks Patrick!
>
> frame #7: 0x000000000036fe0c fluent-bit`tasks_start [inlined]
> output_params_set(th=0x00000000416091c0, data=0x000000004165d980,
> bytes=128, tag="random.0", tag_len=8, i_ins=0x0000000040e58000,
> out_plugin=0x0000000040e2dfc0, out_context=0x00000000416051e0,
> config=0x0000000040e19180) at flb_output.h:429:5
>    426       params->th          = th;
>    427
>    428       FLB_TLS_SET(flb_libco_params, params);
> -> 429       co_switch(th->callee);
>    430   }
>    431
>    432   static FLB_INLINE void output_pre_cb_flush(void)
> (lldb) p *th
> (flb_thread) $0 = {
>   caller = 0x00000000406b2950
>   callee = 0x000000004169f640
>   data = 0xa5a5a5a5a5a5a5a5
>   cb_destroy = 0x0000000000000000
> }
> (lldb)
>
> i guess the next question to answer is why is this out of bounds.  i'm
> gonna poke around and see what i can learn today.
>
>
The value of th->callee should be a function, I think.  That is just from a
cursory glance at libco.
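
One more thing that stands out in your 'p *th' output is data =
0xa5a5a5a5a5a5a5a5.  A pointer made of one repeated byte usually means the
memory was never written after it was allocated.  I am assuming here that
0xa5 is the allocator's junk-fill byte on your system; I have not confirmed
that for your setup.  Below is a rough sketch of how such a value could be
flagged.  The struct is only a hypothetical mirror of the four fields lldb
printed (as raw values), not the real flb_thread definition, and the helper
names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the four fields shown by 'p *th', as raw values.
 * This is NOT the real flb_thread definition. */
struct thread_dump {
    uintptr_t caller;
    uintptr_t callee;
    uintptr_t data;
    uintptr_t cb_destroy;
};

/* True when every byte of the value equals 0xa5.
 * Assumption: 0xa5 is the allocator's junk-fill byte. */
bool looks_like_junk(uintptr_t value)
{
    uintptr_t junk;
    memset(&junk, 0xa5, sizeof junk);
    return value == junk;
}

/* Count how many fields of the dump look like untouched junk memory. */
int count_junk_fields(const struct thread_dump *t)
{
    assert(t != NULL);
    return looks_like_junk(t->caller)
         + looks_like_junk(t->callee)
         + looks_like_junk(t->data)
         + looks_like_junk(t->cb_destroy);
}

/* Rebuild the values from the lldb output and count the suspicious ones. */
int check_dump_example(void)
{
    struct thread_dump th = { 0x406b2950u, 0x4169f640u, 0, 0 };
    memset(&th.data, 0xa5, sizeof th.data);   /* data = 0xa5a5... */
    return count_junk_fields(&th);
}
```

With the values from your dump only the data field is flagged; caller and
callee at least look like ordinary addresses.  That does not prove callee is
valid, only that it is not the junk pattern.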

Good luck.

Patrick

