Date:      Wed, 27 Oct 2004 00:36:01 +0800
From:      Xin LI <delphij@frontfree.net>
To:        Pavel Merdine <freebsd-fs@merdin.com>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: panic again
Message-ID:  <20041026163601.GA2172@frontfree.net>
In-Reply-To: <1038372273.20041026140339@merdin.com>
References:  <1038372273.20041026140339@merdin.com>

Hello, Pavel,

First off, I'm sorry to hear about your trouble.

On Tue, Oct 26, 2004 at 02:03:39PM +0400, Pavel Merdine wrote:
>   I'd like to start discussions about panic and fs over again.
>
>   Since  yesterday  one  of  our  servers is periodically being hit by
>   various panics. It seems that it relates to only one partition.
>   I  thought maybe it's a faulty HDD and copied the content to another
>   HDD. I done it using dd. After one hour the server had panic again:
> Oct 26 05:11:54 images8 /kernel: mode = 040700, inum = 2382, fs = /mountpount
> Oct 26 05:11:54 images8 /kernel: panic: ffs_valloc: dup alloc
> Oct 26 05:11:54 images8 /kernel:
> Oct 26 05:11:54 images8 /kernel: syncing disks... 49 12
> Oct 26 05:11:54 images8 /kernel: done

IMHO it is wrong to run from a dd'ed disk: dd copies the damaged file
system structures along with the data.  A better approach for your
situation might be:
	- dd(1) the bad disk to a disk of the same size.
	- run fsck -fy on the copy.
	- mount the copy read-only.
	- use tar(1) or dump(8) to copy the data off the copy, then
	  restore it to another disk.
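
The steps above can be sketched as a small /bin/sh script.  This is only
a dry-run sketch: the device names (ad1s1e, ad2s1e, ad3s1e) and the mount
points are placeholders I made up, not taken from your report, and
conv=noerror,sync is my assumption for getting past unreadable sectors.
Nothing is executed unless RUN=1 is set.

```shell
#!/bin/sh
# Hypothetical names; substitute your own devices and mount points.
BAD=${BAD:-/dev/ad1s1e}          # failing partition (assumed)
COPY=${COPY:-/dev/ad2s1e}        # same-size scratch partition (assumed)
NEW=${NEW:-/dev/ad3s1e}          # freshly newfs'ed destination (assumed)
COPY_MNT=${COPY_MNT:-/mnt/copy}  # mount point for the copy (assumed)
NEW_MNT=${NEW_MNT:-/mnt/new}     # mount point for the new disk (assumed)

# Print a command; run it only when RUN=1.
step() {
    echo "+ $*"
    if [ "${RUN:-0}" = 1 ]; then "$@"; fi
}

# Each step runs only if the previous one succeeded.
recover() {
    step dd if="$BAD" of="$COPY" bs=64k conv=noerror,sync &&
    step fsck -f -y "$COPY" &&
    step mount -r "$COPY" "$COPY_MNT" &&
    step mount "$NEW" "$NEW_MNT" &&
    step sh -c "cd $COPY_MNT && tar -cf - . | (cd $NEW_MNT && tar -xpf -)"
}

recover
```

Run it once as-is to review the printed plan, then again with RUN=1 when
the names are right.  dump(8) piped into restore(8) would do in place of
the tar pipe.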

>   Dispite  the  line  "syncing  disks", all disks were not clean after
>   reboot:
>
> Oct 26 05:11:56 images8 /kernel: WARNING: R/W mount of /mountp denied.
> Filesystem is not clean - run fsck
>
>   I  do not mount with -f because softupdates didn't show good results
>   and made system panic more and more times.
>=20
>   My  question  is:  Is  there  any  future  in  FFS,  it's panics and
>   non-working softupdates?

Frankly, the ONLY way to guarantee data integrity is more backups.
How can you expect your operating system to run correctly when it is
running on defective hardware?

>   I dont see any reliability in such system.
>   But what I remember is high reliability of MS NTFS. I didn't see any
>   disk  checks  after any failure and I didn't experience a file loss.
>   And  I didn't see it's popular "blue screen" with an error caused by
>   filesystem code.

We don't find it productive to argue about other systems: we don't
sell FreeBSD, and our interest is in improving *our* system and making
it even better.  Nothing but a backup can guarantee that you survive a
hardware failure.

The purpose of panic() in our code is to stop the operating system
before it can do more damage.  We feel that protecting your data is more
important than "pretending to work" while silently damaging it.

>   I  think  that FreeBSD has no future without a reliable FS and clean
>   code for it.

There are many ongoing efforts focused on the storage and file system
code, but we still need more manpower to work on them.

>   BTW:   We  user  FreeBSD  4.10,  I talk about IDE drives. I tried to
>   switch IDE write cache off, but ffs does not work better.

Turning the IDE write cache off gives the system a better chance of
surviving a crash or power failure.  You do not necessarily need to do
so when you have a UPS, and it does not guarantee that you survive a
hardware failure.
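
For reference, a minimal sketch of turning the write cache off, assuming
the hw.ata.wc loader tunable described in ata(4) is what your 4.x kernel
uses.  The helper name disable_ata_wc is made up for illustration; it
only edits the file you point it at, and the setting takes effect at the
next boot.

```shell
# Idempotently set hw.ata.wc="0" in a loader.conf-style file.
# Defaults to /boot/loader.conf when no file is given.
disable_ata_wc() {
    conf=${1:-/boot/loader.conf}
    if grep -q '^hw\.ata\.wc=' "$conf" 2>/dev/null; then
        # A setting already exists: rewrite it in place.
        sed 's/^hw\.ata\.wc=.*/hw.ata.wc="0"/' "$conf" > "$conf.tmp" &&
        mv "$conf.tmp" "$conf"
    else
        # No setting yet: append one.
        echo 'hw.ata.wc="0"' >> "$conf"
    fi
}
```

After a reboot, "sysctl hw.ata.wc" should report 0.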

>   Sorry  if  I wrote too long letter. Our company is just tired of the
>   problems related to all of this.

User experience is important, but again, we need more details and more
manpower.

>   And,  by  the  way, FFS code still have a divide by integer error in
>   dirpref().  I  tried  to  report  it two times, I saw it reported in
>   lists, but nobody cares :( . No future.

To get your problem tracked down, you need to provide more information.
A good start is to set a dump device (e.g. your swap partition, if it
is bigger than your RAM), obtain a backtrace after the next panic, and
send it to the list or file it with send-pr(1).
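
As a rough sketch of the above: the helper below (dump_fits is a made-up
name) only checks the "swap bigger than RAM" condition.  The commented
lines show how I would expect it to be wired up on a 4.x system;
/dev/ad0s1b and kernel.debug are assumed names, so adjust them to your
setup.

```shell
# Succeed if a swap device of $2 kilobytes is bigger than $1 bytes of RAM.
dump_fits() {
    ram_kb=$(( $1 / 1024 ))
    [ "$2" -gt "$ram_kb" ]
}

# Possible wiring on a live system (left commented out on purpose):
#   ram=$(sysctl -n hw.physmem)                   # RAM in bytes
#   swap=$(swapinfo -k | awk 'NR==2 {print $2}')  # first swap device, in KB
#   dump_fits "$ram" "$swap" && dumpon /dev/ad0s1b
# To make it permanent, set dumpdev="/dev/ad0s1b" in /etc/rc.conf.
# After the next panic and reboot, savecore(8) should leave vmcore.N in
# /var/crash; then:
#   gdb -k kernel.debug /var/crash/vmcore.0       # type "bt" at the prompt
```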

I have some patches against the FFS code; however, they concern very
large file systems, so I am not sure they will fix your problem.

Instructions on obtaining a backtrace are available at:
	http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html#KERNELDEBUG-OBTAIN

Cheers,
-- 
Xin LI <delphij frontfree net>	http://www.delphij.net/
See complete headers for GPG key and other information.




