Date:      Fri, 30 Jun 2006 14:08:12 +0200
From:      Stanislaw Halik <sthalik@tehran.lain.pl>
To:        freebsd-stable@freebsd.org
Subject:   Re: trap 12: supervisor write, page not present on 6.1-STABLE Tue May 16 2006
Message-ID:  <20060630120812.GA2380@tehran.lain.pl>
In-Reply-To: <20060628101405.I50845@fledge.watson.org>
References:  <20060627045310.GA6324@tehran.lain.pl> <20060627140946.J273@fledge.watson.org> <20060627134134.GA23337@tehran.lain.pl> <20060628101405.I50845@fledge.watson.org>

On Wed, Jun 28, 2006, Robert Watson wrote:

>>>> 6.1-STABLE crashed on me. I'm providing a backtrace. Could any of you,
>>>> experienced people, suggest me if it's a hardware problem or is it an
>>>> error inside the OS?
>>> This is a known bug in the TCP code; a large set of outstanding changes
>>> is present in 7.x that will fix the problem when merged.  However, I
>>> recently had push-back on merging the larger batch of changes, so am
>>> looking at merging a workaround that will also correct the problem
>>> without the larger set of architectural changes.  I hope to have a chance
>>> to look at that in detail this weekend.

>> I'm glad to know that it isn't either unknown or hardware-related. Thank
>> you for your prompt reply!

> Per my earlier e-mail, I had hoped to merge a larger set of changes from
> HEAD that resolve the underlying problem here (that inpcb's can be detached
> from a socket while the socket is still in use), but right now I'm
> deferring merging those changes as they are somewhat risky (as they are
> large).  Instead, I've produced a candidate work-around patch, now attached
> to kern/97095.  This does not fix the underlying problem, but seeks to
> narrow the window for the race to be exercised by avoiding caching a
> volatile pointer across user memory copying, which under load can result in
> blocking I/O.  I would be quite interested in knowing if this resolves the
> problem in practice -- if so, it's a definite short-term merge candidate to
> reduce the symptoms of this problem until the proper fix can be merged.

Unfortunately, with the patch applied it still crashes in the same code path:

(kgdb) up 7
#7  0xc058e947 in ip_ctloutput (so=0x0, sopt=0xd67f2c80) at
/usr/src/sys/netinet/ip_output.c:1216
1216                                    inp->inp_ip_tos = optval;
(kgdb) l /usr/src/sys/netinet/ip_output.c:1216
1211                                    break;
1212
1213                            inp = sotoinpcb(so);
1214                            switch (sopt->sopt_name) {
1215                            case IP_TOS:
1216                                    inp->inp_ip_tos = optval;
1217                                    break;
1218
1219                            case IP_TTL:
1220                                    inp->inp_ip_ttl = optval;
(kgdb) p inp
$1 = (struct inpcb *) 0x0
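
To illustrate what the backtrace shows, here is a minimal userland sketch
in C. It is not the patch attached to kern/97095 and not the kernel code.
The struct layouts are simplified stand-ins, and set_tos(), copy_fn,
copy_ok() and copy_then_detach() are hypothetical names made up for this
example. It assumes the pcb pointer is read only after the (possibly
blocking) copy from user memory, and that a NULL result is reported as an
error instead of being dereferenced. The choice of ECONNRESET is also an
assumption. In the real kernel a NULL check alone would not close the
race without proper locking; it would only turn this particular panic
into an error return.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct inpcb {
	unsigned char inp_ip_tos;
};

struct socket {
	struct inpcb *so_pcb;	/* NULL once the pcb has been detached */
};

/*
 * Stand-in for copying the option value from user memory. In the
 * kernel this step may block, which gives another thread time to
 * detach the pcb from the socket.
 */
typedef int (*copy_fn)(struct socket *so, int *optval);

/* Copy succeeds and the pcb stays attached. */
int
copy_ok(struct socket *so, int *optval)
{
	(void)so;
	*optval = 0x10;
	return (0);
}

/* Copy succeeds, but the pcb is detached while we were "blocked". */
int
copy_then_detach(struct socket *so, int *optval)
{
	*optval = 0x10;
	so->so_pcb = NULL;
	return (0);
}

/*
 * Set the TOS byte. The pcb pointer is fetched only after the copy,
 * so it is not cached across the blocking step, and it is checked
 * before use.
 */
int
set_tos(struct socket *so, copy_fn copy)
{
	struct inpcb *inp;
	int error, optval;

	assert(so != NULL);

	error = copy(so, &optval);
	if (error != 0)
		return (error);

	inp = so->so_pcb;		/* fetch after the copy */
	if (inp == NULL)
		return (ECONNRESET);	/* assumed error code */

	inp->inp_ip_tos = (unsigned char)optval;
	return (0);
}
```

Calling set_tos() with copy_ok() on a socket that still has its pcb
stores the value and returns 0. Calling it with copy_then_detach()
returns ECONNRESET instead of writing through a NULL pointer, which is
the write that line 1216 performs in the trace above.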

I'll be happy to test any other patches when they're available.

