Date:      Wed, 25 Mar 2015 19:24:54 +1000
From:      Da Rock <freebsd-fs@herveybayaustralia.com.au>
To:        Benjamin Kaduk <kaduk@MIT.EDU>
Cc:        freebsd-fs@freebsd.org, mckusick@freebsd.org
Subject:   Re: Delete a directory, crash the system
Message-ID:  <55127EE6.2010506@herveybayaustralia.com.au>
In-Reply-To: <alpine.GSO.1.10.1503250018030.22210@multics.mit.edu>
References:  <CAHAXwYDPMrdY-TP-5T1_6M_ot4gY09jo2_Wi_REOmE=%2Bu%2B_QuQ@mail.gmail.com> <CAGwOe2byRc4LVsyxvTJgxNGCbhvOEaeDXjmFJ7DoXThPQe1bcQ@mail.gmail.com> <CAHAXwYCj9AV8ZcDffNNGx-ivL=h_TK9zLQRTPknArX25HSfEag@mail.gmail.com> <CAGwOe2YCDRqHudovDB_Kz9WHppvB8v2L%2B0gkDnWgG88bgZTKSA@mail.gmail.com> <CAHAXwYCnRDQqgRcvaEE1BmSJYYOidoQzzUoHX_QWdyJzYO3kKw@mail.gmail.com> <551007DD.5020109@herveybayaustralia.com.au> <alpine.GSO.1.10.1503231049050.22210@multics.mit.edu> <5510B995.8060307@herveybayaustralia.com.au> <alpine.GSO.1.10.1503241014270.22210@multics.mit.edu> <5511D807.3040606@herveybayaustralia.com.au> <alpine.GSO.1.10.1503250018030.22210@multics.mit.edu>

On 03/25/15 14:25, Benjamin Kaduk wrote:
> On Tue, 24 Mar 2015, Da Rock wrote:
>
>> On 03/25/15 00:16, Benjamin Kaduk wrote:
>>> On Mon, 23 Mar 2015, Da Rock wrote:
>>>
>>>> Unfortunately, fsck isn't helping - foreground or otherwise. All it shows on
>>>> every single fs is inode 4 recovery which doesn't sound quite right. And
>>> Have you posted the exact output in a previous message (could you send a
>>> link)?
>> Not precisely, but the message is just a flash and there is no copying of it.
>> Anyway, inode 4 is the .sujournal file as expected; this means there is an
>> issue with the softupdates. Could this be narrowing it down (the OP to this
>> was also in this age of enlightenment, SU came in with 8.x didn't it?)?
> Ah, SU+J could be quite relevant.  Soft-update journalling was enabled by
> default for a period of time, but I believe it was disabled because there
> were some scenarios where it was destabilizing.  CC-ing Kirk to improve on
> my lousy memory.
Hmmm... not sure about that. This was set by a fresh install at the time
and I haven't fiddled with it - I have set trim though (I think). To
verify, I just checked my fresh 10.1 install and it has the same settings,
so I don't think SU+J is disabled by default yet...
>
> Do you remember what version was used to install the system in question
> (i.e., create the filesystem in question)?
Version of what exactly? Do you mean the OS or the utilities for
filesystem ops? The filesystem was originally set up at install (I start
with a clean system when I install FreeBSD - exceptions happen of
course, but that's the rule. Makes it easier... they are just
workstations after all) so I can't remember, or work out, exactly what
utils were used. The install used bsdinstall as per the FBSD10 disk.
> Please show the output of
> 'tunefs -p <filesystem>'
root:

tunefs: POSIX.1e ACLs: (-a)                                disabled
tunefs: NFSv4 ACLs: (-N)                                   disabled
tunefs: MAC multilabel: (-l)                               disabled
tunefs: soft updates: (-n)                                 enabled
tunefs: soft update journaling: (-j)                       enabled
tunefs: gjournal: (-J)                                     disabled
tunefs: trim: (-t)                                         enabled
tunefs: maximum blocks per file in a cylinder group: (-e)  4096
tunefs: average file size: (-f)                            16384
tunefs: average number of files in a directory: (-s)       64
tunefs: minimum percentage of free space: (-m)             8%
tunefs: space to hold for metadata blocks: (-k)            5240
tunefs: optimization preference: (-o)                      time
tunefs: volume label: (-L)

All the others are about the same - variations mainly in space variables 
due to size.
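
If it helps anyone compare these across several filesystems, here is a rough sketch (Python) that reads output shaped like the above and reports whether SU+J is on. The function names and the regex are my own, it assumes the "tunefs: name: (-flag) value" layout shown here, and as far as I know tunefs -p writes to stderr, so capture with 2>&1. Untested against other releases.

```python
import re

# Sample lines copied from the 'tunefs -p' output above (root filesystem).
SAMPLE = """\
tunefs: soft updates: (-n)                                 enabled
tunefs: soft update journaling: (-j)                       enabled
tunefs: gjournal: (-J)                                     disabled
tunefs: trim: (-t)                                         enabled
tunefs: volume label: (-L)
"""

# Assumed layout: "tunefs: <name>: (-<flag>) <value>".
# The value may be empty, as with the volume label line.
LINE_RE = re.compile(r"^tunefs: (?P<name>.+?): \(-(?P<flag>\w)\)\s*(?P<value>.*)$")


def parse_tunefs(text):
    """Return {flag: (name, value)} for every line matching the layout."""
    settings = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            settings[m.group("flag")] = (m.group("name"), m.group("value").strip())
    return settings


def uses_suj(settings):
    """True when both soft updates (-n) and SU journaling (-j) are enabled."""
    return all(settings.get(f, ("", ""))[1] == "enabled" for f in ("n", "j"))


if __name__ == "__main__":
    parsed = parse_tunefs(SAMPLE)
    print("SU+J in use:", uses_suj(parsed))
```

Note that -j and -J are different flags (SU journaling vs gjournal), so the lookup is case-sensitive on purpose.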
>
>>>> again, it is only showing during updates to ports being built. I'm
>>> Er, what is only showing up?  The panics?
>>> Surely you are not only running fsck while building ports...
>> Yes, the panics.
>>
>> Sorry, I thought that was obvious seeing as the alternative is impossible :)
>>>> investigating further, but it may be just a corrupt file in pkg system.
>>>>
>>>> Incidentally, I'm not suggesting an absolute fix for the issue as such, but a
>>>> better means of handling it rather than crashing the system. The posts on this
>>> Understood.  But, there will always be some types of error which are truly
>>> unrecoverable, and there is no real option other than to panic.  (Which is
>>> not to say that your situation is necessarily one of them.)
>> That I get, and given this may be an issue with SU it may well be warranted.
>> What can we do to narrow this down, as obviously one cannot be sitting
>> watching exactly what happens for the hours required while building ports.
>> You're bound to look away for just a second and miss it even if you did try! :D
>>>> If I discover anything more I'll keep everyone posted :)
>> So I did some fiddling with fsck, fsdb, find and stat; and got nowhere. I ran
>> fsck again and it gave me not much again. It did hint at some files in the
>> ports tree, so I cleaned up the ports tree to fresh install point, ran fsck
>> again and rebooted. So far so good, but I'm keeping my fingers crossed still.
> It is probably important to note that 'fsck -F' and saying 'no' to "USE
> JOURNAL?" is the most relevant fsck invocation.
Ok. I only use fsck in single user mode, as it's only really of use to me
there and something is usually broken if I'm using it :) so -F is
usually implied there. Saying no to "USE JOURNAL?" - good to know, I'll
do that next time it happens.
>
>> This doesn't help the panics - they're still a pita when they happen. It does
>> help me resolve the issue this time though. But initiating this error in
>> testing is damn near impossible. What can we document here as a way to gather
>> data to determine how to resolve this issue? Given my luck with this, it's
>> bound to happen again at some point :)
> I think actual diagnosis is beyond my expertise/time commitment at the
> moment.  I suspect that using tunefs to disable softupdate journalling
> will be a workaround, if that is what you are really interested in.
Don't know. Might be SU+J or maybe a pkgng fault in managing ports.
Might just wing it - might be helpful to the project after all :) (could
irk some of my users though :P)
>
> I'll let Kirk decide if he wants to debug more, but the answer may well be
> "no" if you're not running the latest ufs from -current.
>
> -Ben



