Date:      Tue, 17 Nov 2015 18:57:32 +0100
From:      Julien Cigar <jcigar@ulb.ac.be>
To:        Gerhard Schmidt <schmidt@ze.tum.de>
Cc:        Adam Vande More <amvandemore@gmail.com>, FreeBSD Questions <freebsd-questions@freebsd.org>
Subject:   Re: Random Lockup with FreeBSD 10.2 on SuperMicro Boards
Message-ID:  <20151117175731.GY2604@mordor.lan>
In-Reply-To: <564B5D83.5000909@ze.tum.de>
References:  <56498205.3060806@ze.tum.de> <20151116094334.GS2604@mordor.lan> <5649A761.7040303@ze.tum.de> <20151116111609.a9757a4a.freebsd@edvax.de> <5649AEC3.5090104@ze.tum.de> <20151116164507.GA87691@neutralgood.org> <CA%2BtpaK3065Tw_NC=VXa0Pq3ZD_mXUcHhvoVrSOw8feHr7i5gaw@mail.gmail.com> <564B5D83.5000909@ze.tum.de>

On Tue, Nov 17, 2015 at 06:01:55PM +0100, Gerhard Schmidt wrote:
> On 17.11.2015 17:23, Adam Vande More wrote:
> > On Mon, Nov 16, 2015 at 10:45 AM, <kpneal@pobox.com> wrote:
> >
> >     When in doubt use 'fsck -f' to force a check despite the filesystem
> >     being marked clean.
> >
> > Yes, but a full fsck should be run on a regular basis regardless of
> > suspicion.
> >
> >     Personally, I got bit by SU (plain) a long time ago and I've never
> >     really trusted it since. I strongly advise you to 'fsck -f' on your
> >     /var just to rule out _any_ corruption there.
> >
> > A lower level fs error isn't going to be detected by a background
> > fsck (only does preening) or SUJ fsck (trusts the journal).  Such errors
> > can occur on *any* journaled fs.  Periodically doing a full fsck on fs's
> > is actually something Linux does better.
> >
> > https://lists.freebsd.org/pipermail/freebsd-current/2013-July/042951.html
> >
> > Many think SU or SUJ obviate the need for a periodic full fsck.  They do
> > not.  SU and SUJ devs have repeated this since their respective
> > inceptions.  [1] Hardware still lies, bitrot still occurs, do a full
> > fsck.  Vague reports of "I don't trust this" aren't helpful.  If you
> > know of a bug, please report it so it can be addressed.
> >
> > [1]
> > https://lists.freebsd.org/pipermail/freebsd-arch/2010-January/009872.html --
> > Well, initially it claims to "eliminate fsck after an unclean shutdown",
> > but it later details that an fsck using the journal isn't a full fsck.
>
> Let's get back to the topic. There is no corruption. And even if there
> were, that would be a software bug and would have to be fixed. This is
> not biology, where something happens spontaneously. This is computer
> science. If there is something wrong there are only three explanations.
> The user did something wrong: not likely here. There is a hardware
> error: on three different servers after roughly the same amount of time,
> not very likely either. So it's cause number three: a bug in the software.
>
> As I said, I have 76 servers running FreeBSD (various versions from 8.4
> to 10.2). Only 3 of them are 10.2 (5 since yesterday), and of the three
> that have been running 10.2 for longer than a month, 100% had this
> problem at least once. Out of the 73 other servers, 0% had this problem;
> 45 of them are the exact same hardware, and all of them have been
> running considerably longer than one month.
>
> And as for the fscks: the last time I had to do an fsck on any partition,
> apart from hardware failures, was about two and a half years ago when our
> UPS died and killed the power. And apart from some logfiles, even then
> there was no corruption. I have filesystems on production servers that
> have gone 8 years without an fsck. I have never had problems with UFS SU
> and UFS SU-J.
>
> Sorry guys there is no problem with UFS on FreeBSD.
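
As a rough check on the counts quoted above: if three lockups were spread at random over all 76 machines, the chance that they land exactly on the three long-running 10.2 machines is 1 in C(76, 3). This is only a sketch. The equal-likelihood assumption is mine, not something stated in the thread, and the helper name `choose` is made up for illustration. The result says nothing about which part of 10.2 is responsible.

```shell
#!/bin/sh
# choose N K: print the binomial coefficient C(N, K) using integer arithmetic.
# (Hypothetical helper, written only for this back-of-the-envelope check.)
choose() {
    n=$1; k=$2; result=1; i=1
    while [ "$i" -le "$k" ]; do
        # After each step, result equals C(n-k+i, i), so the division is exact.
        result=$(( result * (n - k + i) / i ))
        i=$(( i + 1 ))
    done
    echo "$result"
}

total=76      # all servers mentioned in the quoted message
affected=3    # servers that locked up, all on 10.2 for more than a month

# Number of distinct ways to choose which 3 of the 76 machines lock up.
ways=$(choose "$total" "$affected")
echo "1 in $ways"
```

So under that assumption, pure coincidence is a very unlikely explanation, which fits the conclusion that something specific to the 10.2 machines is involved.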

Couldn't you disable SU+J on just one of them? It would be worth trying
at least. I never had any problem with SU, but I'm sorry to say that
SU+J almost never worked for me (see PR 203588 for the latest problem
I had).

I'm repeating myself, but I had random lockups on some HP ProLiant servers
here too (without any corruption) with SU+J. Since I disabled journaling,
the lockups have "automagically" disappeared.
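
For reference, a minimal sketch of what disabling the journal on one filesystem could look like. It only prints the steps and does not run them. The device `/dev/da0p4` and the mount point `/var` are placeholders I picked for illustration, and the function name `suj_disable_plan` is made up. tunefs needs the filesystem unmounted or mounted read-only, so for something like /var this most likely means single-user mode.

```shell
#!/bin/sh
# Print (do not execute) the steps for turning off SU+J on one filesystem.
# Both arguments are placeholders; nothing in this script touches a disk.
suj_disable_plan() {
    dev=$1; mnt=$2
    echo "umount $mnt"               # tunefs wants the fs unmounted or read-only
    echo "tunefs -j disable $dev"    # turn off soft updates journaling, keep SU
    echo "fsck -f -y $dev"           # full check, even if marked clean
    echo "mount $mnt"                # remount using the fstab entry
    echo "rm $mnt/.sujournal"        # optional: reclaim the journal file's space
    echo "tunefs -p $dev"            # print the settings to confirm the change
}

suj_disable_plan /dev/da0p4 /var
```

Plain soft updates stay enabled this way, so the only change being tested is the journal itself.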

>
> I agree that if there is an unclean shutdown you might want to do a
> complete fsck. But in the case discussed here the unclean shutdown was a
> result of the lockup, not the other way round.
>
> Regards
>   Estartu
>
> -- 
> -------------------------------------------------
> Gerhard Schmidt       | E-Mail and JabberID:
> TU-München            | schmidt@ze.tum.de
> WWW & Online Services | PGP-Publickey on Request

-- 
Julien Cigar
Belgian Biodiversity Platform (http://www.biodiversity.be)
PGP fingerprint: EEF9 F697 4B68 D275 7B11  6A25 B2BB 3710 A204 23C0
No trees were killed in the creation of this message.
However, many electrons were terribly inconvenienced.
