Date: Fri, 28 Aug 1998 11:08:06 +0300 (EEST)
From: Alexander Litvin
Message-Id: <199808280808.LAA00123@grape.carrier.kiev.ua>
To: Archie Cobbs
Cc: current@FreeBSD.ORG
Subject: Re: encountered possible VM bug ?
X-Newsgroups: grape.freebsd.current
In-Reply-To: <199808272051.NAA27400@bubba.whistle.com>
Organization: Lucky Grape
User-Agent: tin/pre-1.4-980202 (UNIX) (FreeBSD/3.0-CURRENT (i386))

In article <199808272051.NAA27400@bubba.whistle.com> you wrote:
>> GW> No, this is the ``daemons dying'' bug which nobody has fixed yet.
>> GW> When the system runs out of swap, some random selection of processes
>> GW> which are in swap get corrupted. Usually this results in a daemon
>> GW> which dies whenever it fork()s, but sometimes it is manifested as
>> GW> other sorts of corruption. The message you see from realloc is
>> GW> indicative of a corrupted pointer.
>>
>> Actually, I was under the impression that the problem was only with fork().
>> But now I can confirm that processes get corrupted in different ways.
>> E.g., I now have a specially written dummy daemon running, which I
>> was able to corrupt (by intentionally exhausting swap) in such a way that
>> it still forks successfully.
>> The child process then sleeps (just to give me a chance to attach to
>> it with a debugger), allocates memory and accesses it, and during all
>> that it doesn't get SIGSEGV. But then it dies when trying to call
>> syslog(3). It seems that the corruption is in the mmap'ed ld.so
>> or libc.so.3.1.
>>
>> If anybody is interested, I can try to provide more details.

AC> At Whistle, we've seen this bug every so often for a long time.
AC> The common elements seem to be:

AC>   1. memory mapping is in use
AC>   2. a fork() is happening or just happened

AC> But #1 and #2 are not necessarily both related to the same process.

AC> This bug has been around for a *long* time, in both 2.x and 3.x.

I saw bash exit with SIGSEGV. It was not trying to fork a job. It was
swapped out, I just hit , and it exited with signal 11.

Cron sometimes seems to simply stop running cron jobs when it is not
segfaulting: it just doesn't try to fork.

AC> Running out of swap may or may not be related, not sure... I think
AC> we've seen this when swap was not an issue. Perhaps running out of
AC> swap amplifies the problem.

AC> It's really hard to pin down, because the panic seems to come a
AC> while after the initial damage is done. We've seen random processes
AC> crashing every time they try to fork(), kernel panics because of
AC> some process being on two different queues at the same time (e.g.,
AC> sleep and runnable), and other manifestations.

AC> A common manifestation is that a file being written out contains
AC> some random page of memory from some other file; we think the other
AC> file is a currently mmap'd file.

In my case it seems that the process has some of its pages zeroed.
At least here's the symptom (I still have it running and segfaulting,
for investigation ;):

root:~/dummy_daemon:grape:> gdb dummy_daemon 29643
[...]
Attaching to program `/usr/home/archer/dummy_daemon/dummy_daemon', process 29643
Reading symbols from /usr/libexec/ld.so...done.
Reading symbols from /usr/lib/aout/libc.so.3.1...done.
Error accessing memory address 0x0: Bad address.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

What exactly does that line mean? When I attach to a healthy
dummy_daemon, it does not appear; instead I see:

0x20057c21 in nanosleep ()

AC> Julian and Terry can supply more details.

AC> -Archie

AC> ___________________________________________________________________________
AC> Archie Cobbs   *   Whistle Communications, Inc.   *   http://www.whistle.com

---
It's lucky you're going so slowly, because you're going in the wrong direction.
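The test daemon described in the quoted text (fork, let the child sleep so a debugger can be attached, allocate and touch memory, then call syslog(3)) can be approximated with a small C program. This is a minimal sketch under stated assumptions, not the author's actual code, which was not posted: the function name `probe_fork_cycle()` and its parameters are hypothetical. The parent reaps the child and reports whether it exited cleanly or was killed by a signal, which is how the corruption showed up in the report.

```c
#include <stdlib.h>
#include <string.h>
#include <syslog.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical probe modelled on the "dummy daemon" from the thread:
 * fork, let the child sleep (time to attach gdb), allocate and touch
 * memory, then call syslog(3).
 * Returns 0 if the child exited with status 0, the signal number if the
 * child was killed by a signal, or -1 on fork/wait failure or any other
 * exit status (including a malloc failure in the child).
 */
int probe_fork_cycle(unsigned int sleep_secs, size_t bytes)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;                      /* fork() itself failed */

    if (pid == 0) {
        /* Child: give the operator a chance to attach a debugger. */
        if (sleep_secs > 0)
            sleep(sleep_secs);

        /* Allocate and write every byte so the pages are really touched. */
        char *buf = malloc(bytes);
        if (buf == NULL)
            _exit(2);
        memset(buf, 0xA5, bytes);
        free(buf);

        /* In the report, the corrupted child died at this step. */
        openlog("dummy_daemon", LOG_PID, LOG_DAEMON);
        syslog(LOG_INFO, "child %ld survived", (long)getpid());
        closelog();
        _exit(0);
    }

    /* Parent: reap the child and classify how it ended. */
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    if (WIFSIGNALED(status))
        return WTERMSIG(status);        /* e.g. 11 (SIGSEGV) */
    if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
        return 0;
    return -1;
}
```

To mimic a long-running daemon, one could call this in a loop from main() with a sleep long enough to attach gdb, and log any non-zero return; a return of 11 would correspond to the SIGSEGV seen in the thread. Doing the allocation and syslog(3) call in the child, rather than the parent, mirrors the order of events described, so a failure can be pinned to the step after fork().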