Date: Tue, 5 Jul 2016 14:48:08 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: Maxim Sobolev <sobomax@freebsd.org>
Cc: stable@freebsd.org, hackers@freebsd.org
Subject: Re: A faulty program corrupts some its data preventing correct core generation (Failed to write core file for process postgres (error 14))
Message-ID: <20160705114808.GN38613@kib.kiev.ua>
In-Reply-To: <CAH7qZfu=XveZCAgS0+dzQ_jLs9JiktEV3rER88gwqTiW_Fc9dg@mail.gmail.com>
References: <CAH7qZfu=XveZCAgS0+dzQ_jLs9JiktEV3rER88gwqTiW_Fc9dg@mail.gmail.com>

On Mon, Jul 04, 2016 at 10:26:25PM -0700, Maxim Sobolev wrote:
> Hi all,
>
> We are investigating some random postgresql-9.1.21 server crashes on
> FreeBSD 10.3. We started seeing them after upgrading from postgres 9.1.18,
> on more than one system, so hardware problems (e.g. RAM issues) are very
> unlikely. I suspect that postgres is at fault; however, I am also curious
> how it could be that the kernel is not capable of generating a core file
> when an application does something silly. Is it that some ELF-related data
> structures got corrupted, or something else? Are we protecting the page
> where the ELF header is mapped with the R/O flag? I am looking at possibly
> recreating this by poking around the ELF header(s) to see whether I can
> reliably corrupt them in a similar manner; any pointers or suggestions are
> appreciated.
>
> Jun 27 04:10:18 dal12 kernel: Failed to write core file for process postgres (error 14)
> Jun 27 04:10:18 dal12 kernel: pid 41361 (postgres), uid 70: exited on signal 11
> Jul 1 05:21:46 dal12 kernel: Failed to write core file for process postgres (error 14)
> Jul 1 05:21:46 dal12 kernel: pid 1722 (postgres), uid 70: exited on signal 11
>
> #define EFAULT 14 /* Bad address */
>
> The resulting files are truncated and are not really usable for anything.
> We've seen the same issue
>
> -rw------- 1 pgsql wheel 1310720 Jun 27 04:10 postgres.41361.core
> -rw------- 1 pgsql wheel 1310720 Jul 1 05:21 postgres.1722.core
>
> [ssp-root@dal12 /var/tmp]$ sudo gdb711 postgres postgres.1722.core
> GNU gdb (GDB) 7.11 [GDB v7.11 for FreeBSD]
> Copyright (C) 2016 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-portbld-freebsd10.3".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>.
> Find the GDB manual and other documentation resources online at:
> <http://www.gnu.org/software/gdb/documentation/>.
> For help, type "help".
> Type "apropos word" to search for commands related to "word"...
> Reading symbols from postgres...(no debugging symbols found)...done.
> BFD: Warning: /var/tmp/postgres.1722.core is truncated: expected core file size >= 517120000, found: 1310720.
> [New LWP 100261]
> Core was generated by `postgres'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0 0x0000000800cfba67 in ?? () from /lib/libthr.so.3
> (gdb) where
> #0 0x0000000800cfba67 in ?? () from /lib/libthr.so.3
> Backtrace stopped: Cannot access memory at address 0x7fffffffdd08
> (gdb) q
> https://lists.freebsd.org/pipermail/freebsd-stable/2016-June/084877.html