Date:      Tue, 8 Dec 2009 11:53:07 +0100 (CET)
From:      Konrad Heuer <kheuer2@gwdg.de>
To:        cronfy <cronfy@sprinthost.ru>
Cc:        freebsd-questions@freebsd.org
Subject:   Re: FreeBSD is too filesystem errors sensitive
Message-ID:  <20091208114509.B67127@gwdu60.gwdg.de>
In-Reply-To: <4B1E2D40.9060900@sprinthost.ru>
References:  <4B1DF953.4050504@sprinthost.ru> <hfl7v5$f9j$1@ger.gmane.org> <4B1E2D40.9060900@sprinthost.ru>



On Tue, 8 Dec 2009, cronfy wrote:

>
>>> Please forgive me for probably a very stupid question. But why is FreeBSD 
>>> so sensitive to filesystem errors that it ends up with panics like 
>>> 'freeing free block' or 'ffs_valloc: dup alloc'? I just can't get it. 
>>> Failed to allocate vnode? Go allocate another one! Freeing free block? 
>>> Leave it free then! I understand these situations should never happen, but 
>>> why the hell is it required to panic and kill everything that would be 
>>> working happily even if something very disastrous happened to the /backup 
>>> partition, for example?
>> Probably because UFS is not designed to be a backup file system but a 
>> working one :)
>> 
>> All those errors indicate file system corruption. To protect other data 
>> from getting corrupted (e.g. by invalid pointers or calculations), the 
>> kernel panics.
>
> To protect us against terrorists, our government does strange things too ;-)
>
> After a panic, data *is* getting corrupted anyway: MySQL tables that were 
> open are broken, soft updates are left unsynced, etc.
> The server has to reboot and run fsck, and time is wasted while this 
> happens. Why should all this happen because of a single vnode failure? Why 
> not just write a message to /var/log/messages, return "oh, I failed to save 
> a file" to the process that initiated the operation, and carry on? Are the 
> consequences of an attempt to "free an already free block" *so* dangerous 
> that it is necessary to give up on EVERYTHING? Let's say it was not the 
> /backup partition; OK, it was /var/tmp/some-php-session or even the 
> /var/cron/tabs/someuser file that failed. So what? Even /boot/kernel/kernel 
> corruption is not critical if you are not going to reboot right now (or if 
> you have /boot/kernel.old :)
>
> Is there a way to say "Dear kernel, don't panic, I'm holding your hand, keep 
> working please-please-please"? If so, can it really lead to complete 
> filesystem corruption, or is it not that serious?

AFAIK you can't do this, and you shouldn't even if it were possible. The 
file system errors you mention above should not happen under any normal 
circumstances. They may happen after a crash caused by something else, but 
fsck should repair them. The kernel cannot continue with such errors 
because the file system metadata as a whole cannot be trusted anymore 
until it has been repaired.

I have been using FreeBSD with UFS for more than 15 years now, partly on 
heavily loaded and I/O-bound systems. I have never had any serious file 
system problems as long as the disks or the storage area network (SAN) 
did not fail.

In the worst case, after a SAN crash, I had to run fsck three times (one 
run immediately after the other) in single-user mode on large partitions 
until all errors were repaired.

Best regards

Konrad Heuer
GWDG, Am Fassberg, 37077 Goettingen, Germany, kheuer2@gwdg.de