Date:      Sat, 28 Apr 2012 12:51:10 -0400
From:      Alejandro Imass <aimass@yabarana.com>
To:        Robert Bonomi <bonomi@mail.r-bonomi.com>
Cc:        wojtek@wojtek.tensor.gdynia.pl, freebsd-questions@freebsd.org
Subject:   Re: UFS Crash and directories now missing
Message-ID:  <CAHieY7S-o-iFG0Z9SW08puMagDnHQnznLkWYJOR_6LQdHr70dw@mail.gmail.com>
In-Reply-To: <CAHieY7Sip7LePPnt7S6Yqt=nuAoytG%2B5EqfH4t5kVnqFFZtRkg@mail.gmail.com>
References:  <CAHieY7ToprF89C7yoeWkX8Pqom-=PY9tk2raNuNGHsbnhukXmg@mail.gmail.com> <201204281539.q3SFdtir061045@mail.r-bonomi.com> <CAHieY7Sip7LePPnt7S6Yqt=nuAoytG%2B5EqfH4t5kVnqFFZtRkg@mail.gmail.com>

On Sat, Apr 28, 2012 at 12:36 PM, Alejandro Imass <aimass@yabarana.com> wrote:
> On Sat, Apr 28, 2012 at 11:39 AM, Robert Bonomi
> <bonomi@mail.r-bonomi.com> wrote:
>>
>>  Alejandro Imass <aimass@yabarana.com> wrote:
>>> On Sat, Apr 28, 2012 at 3:22 AM, Wojciech Puchar
>>> <wojtek@wojtek.tensor.gdynia.pl> wrote:
>>> >> I somewhat agree, but it wasn't a person. I am the only administrator,
>>> >> the only one with root access. The jails were effectively moved to the
>>> >> /usr/local/etc/apache22 of the single jail that survived at the top level.
>>> >> I'm thinking some interaction between mount, EzJail, the journal and the way
>>> >> MySQL created a great deal of head contention, so something must have
>>> >> gotten corrupted at the directory level like you state, but the
>>> >> strange part is no _data_ corruption as such, because I was able to
>>> >> physically archive the jails, move them to the correct directory and
>>> >
>>> >
>>> > No matter what you do, FreeBSD DOES NOT randomly move directories. If you are
>>> > sure you didn't move it yourself, then it must be a machine hardware problem,
>>> > though even that is unlikely.
>>>
>>> After a little more research, ___it is NOT unlikely at all___ that
>>> under heavy stress and a hard reboot, UFS could have somehow corrupted
>>> the directory structure whilst keeping the data intact.
>>
>> This is technically accurate, *BUT* the specifics of the quote "corruption"
>> unquote in the case under discussion make it *EXTREMELY* unlikely that this
>> is what happened.
>>
>> 99.99+++% of all UFS filesystem 'corruption' issues are the result of a
>> system crash _between_ the time cached 'meta-data' is updated in memory
>> and the time that data is flushed to disk (a deferred write).
>>
>> The second most common (and vanishingly rare) failure mode is a power failure
>> _while_ a disk sector is being written -- resulting in 'garbage data'
>> being written to disk.
>>
>> The next possibility is 'cosmic rays'. If running on 'cheap' hardware (i.e.,
>> without 'ECC' memory), this can cause a *SINGLE-BIT* error in data being
>> output.
>>
>> The fact that the 'corrupted' filesystem passed fsck -without- any reported
>> errors shows that everything in the filesystem meta-data was consistent.
>>
>> Given *that*, there are precisely *TWO* ways that the 'results' that have
>> been reported could have happened.
>>
>>  1) "Something" did a mv (i.e., a rename(2) call) of the various jail
>>     directories 'from' their original location to the 'apache' directory.
>>     This involves simply *copying* the directory entry from the jail's
>>     'parent directory' to the apache directory, and then marking the entry
>>     in the original parent as 'unused'. Only the directory where the jail
>>     'used to live' and the directory 'where it was found' are touched.
>>     This occurred _through_ the system 'mv' function, so all the normal
>>     'housekeeping' was done properly.
>>
>>  2) It was -not- done through rename(2) -- but that requires a whole
>>     *series* of "corruptions" of the filesystem, _ALL_ of which had to
>>     occur in 'exactly' the right way. They are:
>
> [...]
>
>> I think it is safe to conclude that the probabilities -greatly- favor
>> alternative #1.
>>
>
> OK. So after your comments and further research, I concur with you on
> the mv, but if it wasn't a human, then this might be exposing a serious
> security flaw in the jail system or the way EzJail implements it. The
> whole point of using jails is to protect things like this from
> happening. Given that the only jail that survived was the front-end
> Apache Web server/reverse proxy, then it is also safe to suspect the
> apache (or other) process running on it was able to perform a mv of
> the rest of the jails to its own /usr/local/etc/apache22 directory.
>
> Is there no possibility that, after the system crash, the journal
> recovery process and/or fsck could have moved these directories?
>
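To make item 1 above concrete, here is a minimal sketch (Python, stdlib only, run inside a throwaway temporary directory) of what a same-filesystem move through rename(2) does: the moved directory keeps its inode number and the files inside it are untouched; only which parent directory holds its name changes. The directory names `jails`, `apache22` and `www` and the helper `demo_directory_rename` are hypothetical stand-ins, not the real layout on this machine, and the sketch only illustrates the mechanism; it says nothing about what actually moved the jails here.

```python
from __future__ import annotations

import os
import tempfile


def demo_directory_rename() -> tuple[int, int, str]:
    """Hypothetical demo: move a directory within one filesystem via
    os.rename() (which calls rename(2)) and report its inode number
    before and after the move, plus the content of a file inside it."""
    with tempfile.TemporaryDirectory() as root:
        # Illustrative layout only; not the poster's real paths.
        jails = os.path.join(root, "jails")
        apache = os.path.join(root, "apache22")
        os.makedirs(os.path.join(jails, "www"))
        os.makedirs(apache)
        with open(os.path.join(jails, "www", "index.html"), "w") as f:
            f.write("hello")

        before = os.stat(os.path.join(jails, "www")).st_ino
        # rename(2): the name "www" is added to apache22 and removed from
        # jails; the directory's inode and the files inside are not copied.
        os.rename(os.path.join(jails, "www"), os.path.join(apache, "www"))
        after = os.stat(os.path.join(apache, "www")).st_ino

        with open(os.path.join(apache, "www", "index.html")) as f:
            content = f.read()
        # The old name is gone from the original parent directory.
        assert not os.path.exists(os.path.join(jails, "www"))
    return before, after, content


if __name__ == "__main__":
    before, after, content = demo_directory_rename()
    print(before == after, content)
```

Run as a script, it should report that the inode number is unchanged and that the file content survived the move, which is exactly the "directories moved, data intact" pattern described in this thread.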

Also note that even the EzJail basejail was moved, so it could be
a security hole in the way nullfs is used or in nullfs itself. But the
curious thing is that the basejail is supposed to be mounted read-only,
so how did it get moved into the http-proxy jail?

That is why I suspect it could have been something in the boot process,
like the journal recovery, fsck or something else with that kind of
privilege, running while the EzJail filesystems were unmounted.

-- 
Alejandro


