Date:      Mon, 27 Jun 2011 12:19:47 -0700
From:      mdf@FreeBSD.org
To:        Andriy Gapon <avg@freebsd.org>
Cc:        freebsd-current@freebsd.org, freebsd-stable@freebsd.org
Subject:   Re: kern.sync_on_panic
Message-ID:  <BANLkTi=UcmXqZLqmU3E4HqByHX1QewHuQQ@mail.gmail.com>
In-Reply-To: <4E08568E.4060309@FreeBSD.org>
References:  <4E05F582.2010500@FreeBSD.org> <6C42CE07-9298-444A-8094-9C60384CA4F1@bsdimp.com> <4E08568E.4060309@FreeBSD.org>

On Mon, Jun 27, 2011 at 3:08 AM, Andriy Gapon <avg@freebsd.org> wrote:
> on 26/06/2011 08:51 Warner Losh said the following:
>>
>> On Jun 25, 2011, at 8:49 AM, Andriy Gapon wrote:
>>> Does anybody actually use kern.sync_on_panic tunable/sysctl? If yes, then
>>> in what circumstances do you need it? That is, why any other alternative
>>> doesn't work for you? Like: 1. remounting filesystems R/O before panic if
>>> you knowingly provoke it for testing 2. using netboot for your test system
>>> 3. using su+j, gjournal or a different filesystem altogether 4. using fsck
>>> after reboot
>>>
>>> It seems to me that syncing filesystems in panic context is an adventure.
>>> And it may become even more of an adventure if we introduce code that
>>> completely stops scheduler in and after panic.
>>
>> I've used it in the past when I was developing a device driver that was in
>> the late stages of maturing.  Since all the panics in the system were when
>> the driver dereferenced NULL in that driver, sync was safe because all the
>> data structures were sane except the aforementioned driver.
>>
>> (1) It was a production system, and everything that could be was already
>> mounted r/o.  However, some small, but very critical, amount of data was
>> still r/w and it was very important to not lose this data.  Production here
>> likely should be in quotes, because it was in the late stages of
>> testing/validation.  The problem was without this sometimes the saved state
>> of the GPS receiver and other hardware would wind up being zero, which meant
>> that we'd have to do a cold start which cost us a few hours of time.  At the
>> time I was doing this, we saw zero files a couple times a day without this
>> turned on. (2) netbooting wasn't an option since we were qualifying a
>> non-netbooting system. (3) these weren't available at the time, but the goal
>> was to prevent data loss, not to necessarily have to avoid fsck on boot. (4)
>> Data loss without it.
>>
>> Now, I'll be the first to admit this has been a few years, and I haven't done
>> a fresh evaluation to see if things are still safe.  I'll also be the first
>> to admit that this was a useful debugging setting late in development, and
>> not in production.  I'm also the first to admit this isn't what I'd call a
>> very wide-spread case.  But it did come in very handy when chasing a few bugs
>> to be able to do 10 panic/reboot cycles an hour rather than 2 a day.
>
> A fine enough use-case for me.  I guess the problem ultimately boiled down to
> peculiarities of UFS behavior, but still...
> However, please be aware that sync_on_panic might get broken when/if we start
> stopping scheduler in panic.

The entirety of the sync code should be a subroutine in vfs_bio.c so
the 'buf' variable is static to the file.  At that point it would be
reasonable to explicitly call it at the beginning of panic(9) for the
sync-on-panic case, either before IPIing the other CPUs, or at least
before entering the critical section that prevents the scheduler from
running.
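
To make the ordering concrete, here is a rough userland model of that
layout.  It is only a sketch and not kernel code: every model_* name, the
NBUF constant and the buf_dirty array are made up for illustration and do
not correspond to real symbols in vfs_bio.c or kern_shutdown.c.  The
file-static array stands in for 'buf', and model_stop_scheduler() stands in
for whatever step (IPI or critical section) ends up stopping the other CPUs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Userland model only; none of these names are real FreeBSD kernel symbols. */

#define NBUF 4

/* Stand-in for 'buf': static, so only code in this file can touch it. */
static bool buf_dirty[NBUF];

static bool sync_on_panic = true;  /* models the kern.sync_on_panic knob */
static bool scheduler_stopped;     /* models "other CPUs / scheduler stopped" */
static int  flushed_at_panic;      /* buffers flushed by the panic-time sync */

/* Mark one model buffer as holding unwritten data. */
void model_mark_dirty(int i)
{
    assert(i >= 0 && i < NBUF);
    buf_dirty[i] = true;
}

/*
 * The sync subroutine.  It is the only place that walks buf_dirty, which is
 * why the array can stay static.  Returns the number of buffers flushed.
 */
int model_sync_buffers(void)
{
    int n = 0;

    for (int i = 0; i < NBUF; i++) {
        if (buf_dirty[i]) {
            buf_dirty[i] = false;  /* pretend the write-out happened */
            n++;
        }
    }
    return n;
}

/* Hypothetical stand-in for the step that stops scheduling. */
void model_stop_scheduler(void)
{
    scheduler_stopped = true;
}

/* Ordering from the message: sync first, stop the scheduler afterwards. */
void model_panic(const char *msg)
{
    if (sync_on_panic) {
        assert(!scheduler_stopped);  /* sync must run while scheduling works */
        flushed_at_panic = model_sync_buffers();
    }
    model_stop_scheduler();
    fprintf(stderr, "panic: %s\n", msg);
}

/* Small accessors so callers can inspect the model's state. */
int model_flushed_at_panic(void) { return flushed_at_panic; }
bool model_scheduler_stopped(void) { return scheduler_stopped; }
```

In this model, marking a couple of buffers dirty and then calling
model_panic() flushes them before scheduler_stopped is ever set, which is
the property the sync-on-panic case would need to keep.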

Cheers,
matthew


