Date:      Thu, 03 Dec 2015 15:07:48 -0800
From:      Kirk McKusick <mckusick@mckusick.com>
To:        Mateusz Guzik <mjguzik@gmail.com>
Cc:        Rick Macklem <rmacklem@uoguelph.ca>, FreeBSD Current <freebsd-current@freebsd.org>
Subject:   Re: panic "ffs_checkblk: bad block" on recent -head kernels
Message-ID:  <201512032307.tB3N7mMl001027@chez.mckusick.com>
In-Reply-To: <20151203224752.GA19134@dft-labs.eu>

> Date: Thu, 3 Dec 2015 23:47:52 +0100
> From: Mateusz Guzik <mjguzik@gmail.com>
> To: Rick Macklem <rmacklem@uoguelph.ca>
> Cc: FreeBSD Current <freebsd-current@freebsd.org>
> Subject: Re: panic "ffs_checkblk: bad block" on recent -head kernels
>
> On Thu, Dec 03, 2015 at 05:08:27PM -0500, Rick Macklem wrote:
>> Hi,
>>
>> I get a fairly reproducible panic when doing a full kernel build
>> on a 256Mbyte single core i386 when running recent kernels from -head.
>>
>> The panic is "ffs_checkblk: bad block ..". I don't actually have the
>> block # (although I think it's just 0xfffffffffffffff, given the backtrace),
>> because it runs off the screen. (I looked up the message via the debugger
>> from the first arg. to panic.)
>>
>> Here's the backtrace without all the numbers:
>> panic(c14f4b55, ffffffff, ffffffff, 0, 64,...)
>> ffs_checkblk(ffffffff, 8000, fffffff9c, ffffffff, c4a02454,...)
>> ffs_reallocblks
>> VOP_REALLOCBLKS_APV
>> cluster_write
>> ffs_write
>> VOP_WRITE_APV
>> vn_write
>> vn_io_fault_doio
>> vn_io_fault1
>> vn_io_fault
>> dofilewrite
>> kern_writev
>> sys_write
>> syscall
>>
>> It doesn't happen on a kernel dated Sep. 30, but does happen on a Nov. 30 one.
>> (I was away from home, so I didn't upgrade kernels for 2 months.)
>>
>> I am slowly doing a binary search for the first kernel rev. where it occurs,
>> but since each build takes hours, it's going to take a while;-).
>>
>> At this point, it doesn't appear to happen on r289278 (just before jeff@'s buffer
>> cache patch).
>> With kernels between r289279-->r290480, I get into the "R" state that
>> was fixed by r290481 before I get a crash.
>> I tried reverting r289405 and r290047 from a recent kernel and the crashes still
>> occurred, so it doesn't appear to be these commits.
>>
>> I am currently testing r290481 to see if the crash occurs for this rev.
>>
>> If anyone has some insight into which commit might cause this,
>> please let me know.
>
> Well, did it crash with r291460 or later?
>
> If so, try the kernel just before that and if that helps, try:
>
> diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c
> index ff37de8..0ad6ef7 100644
> --- a/sys/kern/vfs_subr.c
> +++ b/sys/kern/vfs_subr.c
> @@ -2783,6 +2783,7 @@ _vdrop(struct vnode *vp, bool locked)
>         vp->v_op = NULL;
>  #endif
>         bzero(&vp->v_un, sizeof(vp->v_un));
> +       vp->v_lasta = vp->v_clen = vp->v_cstart = vp->v_lastw = 0;
>         vp->v_iflag = 0;
>         vp->v_vflag = 0;
>         bo->bo_flag = 0;
>
> -- 
> Mateusz Guzik <mjguzik gmail.com>

I concur with trying this suggestion. Starting with r291460, these
fields were no longer zeroed when allocating the vnode, so you may
have some residual values in there that are causing trouble.
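
To make the failure mode concrete, here is a minimal userland sketch of
the pattern, not the kernel code itself. It assumes a recycled object
that is handed back without being zeroed; struct toy_vnode, toy_alloc(),
toy_release() and stale_lasta_after_reuse() are hypothetical names made
up for the illustration, and only the four field names come from the
patch above.

```c
#include <string.h>

/* Hypothetical stand-in for a vnode; only the clustering hints are modeled. */
struct toy_vnode {
	long	v_cstart;	/* start block of the current write cluster */
	long	v_lasta;	/* last block allocated */
	long	v_lastw;	/* last block written */
	int	v_clen;		/* length of the current cluster */
};

/* A single cached object, reused by every "allocation" in this sketch. */
static struct toy_vnode slot;

/* Allocation returns the cached object as-is, without zeroing it. */
static struct toy_vnode *
toy_alloc(void)
{
	return (&slot);
}

/* Release optionally clears the hints, mirroring the one-line patch. */
static void
toy_release(struct toy_vnode *vp, int reset_on_release)
{
	if (reset_on_release)
		vp->v_lasta = vp->v_clen = vp->v_cstart = vp->v_lastw = 0;
}

/*
 * Use the object once, release it, then allocate it again.
 * Returns the v_lasta value that the second user inherits.
 */
long
stale_lasta_after_reuse(int reset_on_release)
{
	struct toy_vnode *vp;

	memset(&slot, 0, sizeof(slot));	/* start from a clean slate */

	vp = toy_alloc();		/* first user sets cluster hints */
	vp->v_cstart = 1200;
	vp->v_lasta = 1234;
	vp->v_lastw = 1233;
	vp->v_clen = 8;
	toy_release(vp, reset_on_release);

	vp = toy_alloc();		/* second user gets the same memory */
	return (vp->v_lasta);
}
```

In this sketch stale_lasta_after_reuse(0) returns 1234, the leftover
value from the first user, while stale_lasta_after_reuse(1) returns 0.
Whether stale hints like this are what actually feeds the bad block
number to ffs_checkblk is still only the hypothesis being tested.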

	Kirk McKusick


