From owner-freebsd-stable@FreeBSD.ORG  Fri Nov 18 18:22:49 2005
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
X-Original-To: freebsd-stable@freebsd.org
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 9E85716A41F;
	Fri, 18 Nov 2005 18:22:49 +0000 (GMT)
	(envelope-from johan@stromnet.org)
Received: from pne-smtpout2-sn2.hy.skanova.net
	(pne-smtpout2-sn2.hy.skanova.net [81.228.8.164])
	by mx1.FreeBSD.org (Postfix) with ESMTP id BE7D043D77;
	Fri, 18 Nov 2005 18:22:41 +0000 (GMT)
	(envelope-from johan@stromnet.org)
Received: from elfi2.stromnet.org (81.231.107.13) by
	pne-smtpout2-sn2.hy.skanova.net (7.2.060.1)
	id 437DDD3B00010570; Fri, 18 Nov 2005 19:22:29 +0100
Received: from [10.10.0.6] (vpn1-c1.stromnet.org [10.10.0.6])
	by elfi2.stromnet.org (Postfix) with ESMTP id 316E0CF03F;
	Fri, 18 Nov 2005 19:22:24 +0100 (CET)
In-Reply-To: <a78074950511180943r57fd9d03r64efcc705001bc35@mail.gmail.com>
References: <991F35AA-151B-4AEA-82BD-5F4AEDF28424@stromnet.org>
	<a78074950511180117r6d64db25o4ae37c0c5998e002@mail.gmail.com>
	<74994962-5050-47BD-897B-DE3880B9EBD5@stromnet.org>
	<a78074950511180943r57fd9d03r64efcc705001bc35@mail.gmail.com>
Mime-Version: 1.0 (Apple Message framework v746.2)
Content-Type: text/plain; charset=ISO-8859-1; delsp=yes; format=flowed
Message-Id: <A6F22EE2-B1E6-44B5-B4C2-E77E1A24FEBB@stromnet.org>
Content-Transfer-Encoding: quoted-printable
From: =?ISO-8859-1?Q?Johan_Str=F6m?= <johan@stromnet.org>
Date: Fri, 18 Nov 2005 19:23:25 +0100
To: delphij@delphij.net
X-Mailer: Apple Mail (2.746.2)
Cc: pjd@freebsd.org, freebsd-stable@freebsd.org
Subject: Re: Page fault, GEOM problem??
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 18 Nov 2005 18:22:49 -0000

Hi!

On 18 nov 2005, at 18.43, Xin LI wrote:

> Hi, Johan,
>
> On 11/18/05, Johan Ström <johan@stromnet.org> wrote:
>> On 18 nov 2005, at 10.17, Xin LI wrote:
> [snip]
>> Doesnt look like I got any "usable" dump devices..
>> When booting i get
> [...]
>> Loading configuration files.
>> No suitable dump device was found.
>> Entropy harvesting:
>> interrupts
>> ethernet
>> point_to_point
>> kickstart
>> .
>> swapon: adding /dev/mirror/gm0s1b as swap device
>
> I see, so your both SATA disks are in the same mirror group...
>
>> Then naturally:
>> /etc/rc: WARNING: Dump device does not exist.  Savecore not run.
>>
>> Looked around in the rc-scripts and tried to figure out what it did,
>> the dumpon script
>> tries to autolookup a good dump device but finds none..
>
> Unfortunately, kernel dumps currently does not support every device,
> for some technical reasons (probably to simplify the crash code so
> they do not make more mistakes^Wdamages)
>
>> According to the page you linked to, the dumpon command has to be
>> executed AFTER swapon.. Why is the rc scripts trying to run it before
>> swapon then?
>
> I guess this is because that dumpon now can detect dump device
> automatically, but I'm not quite sure about this.  Will look for the
> reason.  I think either Handbook should be updated, or the code should
> be corrected.
>
> What I am very curious is that why dumpon is "BEFORE" savecore.  Maybe
> I have some misunderstanding...

Sorry, partly my mistake. I think I misunderstood how savecore works
below (when I tried it manually in the last mail).
But the messages above are directly from boot; it seems to try
dumpon before savecore? Relevant boot log from the last boot:


ad0: 2441MB <WDC AC22500L 32.41N35> at ata0-master UDMA33
acd0: CDROM <CD-ROM CDU701-F/1.0q> at ata1-master PIO4
ad6: 286188MB <Maxtor 7L300S0 BANC1G10> at ata3-master SATA150
ad10: 286188MB <Maxtor 7L300S0 BANC1G10> at ata5-master SATA150
GEOM_MIRROR: Device gm0s1 created (id=4118114647).
GEOM_MIRROR: Device gm0s1: provider ad6s1 detected.
GEOM_MIRROR: Device gm0s1: provider ad10s1 detected.
GEOM_MIRROR: Device gm0s1: provider ad10s1 activated.
GEOM_MIRROR: Device gm0s1: provider ad6s1 activated.
GEOM_MIRROR: Device gm0s1: provider mirror/gm0s1 launched.
Trying to mount root from ufs:/dev/mirror/gm0s1a
Loading configuration files.
dumpon:
ioctl(DIOCSKERNELDUMP)
:
Operation not supported
(This DIOCSKERNELDUMP message is probably because I specified dumpdev
in rc.conf, which forced use of gm0s1b instead of letting the scripts
autodetect.)
Entropy harvesting:
interrupts
ethernet
point_to_point
kickstart
.
swapon: adding /dev/mirror/gm0s1b as swap device
Starting file system checks:
/dev/mirror/gm0s1a: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/mirror/gm0s1a: clean, 213811 free (771 frags, 26630 blocks, 0.3% fragmentation)
/dev/mirror/gm0s1e: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/mirror/gm0s1e: clean, 1012917 free (85 frags, 126604 blocks, 0.0% fragmentation)
/dev/mirror/gm0s1f: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/mirror/gm0s1f: clean, 115955787 free (40747 frags, 14489380 blocks, 0.0% fragmentation)
/dev/mirror/gm0s1d: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/mirror/gm0s1d: clean, 1983354 free (4834 frags, 247315 blocks, 0.2% fragmentation)
<ifconfig stuff>
Starting devd.
Mounting NFS file systems:
.
Creating and/or trimming log files:
.
Starting syslogd.
Checking for core dump on /dev/mirror/gm0s1b...
savecore: no dumps found
Starting named.
<rest of boot>

So, it seems it does run savecore after running dumpon and mounting
disks etc. Is that wrong?

>
>> Anyway, tried to do dumpon manually on my swap drive:
>>
>> $ dumpon -v /dev/mirror/gm0s1b
>> dumpon: ioctl(DIOCSKERNELDUMP): Operation not supported
>>
>> Didn't work too good..
>> Also tried savecore manually:
>>
>> $ savecore /var/crash/ /dev/mirror/gm0s1b
>> savecore: no dumps found

(This was my mistake; of course there are no dumps, since none was
written when it crashed.)

>>
>> Didnt work very good either (but probably expected since there was no
>> working dumps..)
>> Google showed me some other thread in this list about gmirror swap
>> dump, just a question (if it was supported) w/o any answers tho. Same
>> error as I got.
>
> It seems that this could not be workaround'ed easily.  If possible, my
> suggestion is that you attach a third disk and create a swap partition
> on it for the crash dump.  If this is not feasible, then adding DDB
> and KDB may give us a chance to catch the panic and you can use
> "trace" command at the ddb> prompt to obtain a simplified backtrace,
> and there is good chance that it would reveal what is happening.
>
> I have cc'ed to Pawel who is very knowledgeable in this area, and
> let's see whether he has some better suggestions :-)

Okay, I just added an old but working 2 GB disk to the system, made a
swap partition on it, ran swapon, and:

root@elfi:~$ dumpon -v /dev/ad0s1b
kernel dumps on /dev/ad0s1b

Great! :) So, let's see when/if it dies next time. Before I took it
down to add the dump disk, it had been running fine for 1d 1h (since the
boot after the crash), though probably not as loaded as on the day it
crashed. I'll try to load it some now and see if it crashes.
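
A minimal sketch of how the working dump device could be kept across
reboots, so that the boot-time dumpon stops failing on the mirror. It
assumes the old disk stays at ad0 with ad0s1b as its swap partition and
that crash dumps should land in /var/crash; dumpdev and dumpdir are the
rc.conf variables read by the dumpon and savecore rc scripts, while the
values shown are assumptions for this particular machine:

```shell
# /etc/rc.conf fragment (sh syntax).
# Assumption: the spare disk is still ad0 and ad0s1b is its swap partition.
dumpdev="/dev/ad0s1b"   # device handed to dumpon(8) at boot instead of gm0s1b
dumpdir="/var/crash"    # directory where savecore(8) stores a recovered dump
```

With this in place the "Checking for core dump on ..." line at boot
should name /dev/ad0s1b rather than the mirror provider.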

Thanks

Johan

>
> Cheers,
> --
> Xin LI <delphij@delphij.net> http://www.delphij.net