From owner-freebsd-current  Sat Oct 19  2: 5:18 2002
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 953A537B401; Sat, 19 Oct 2002 02:05:16 -0700 (PDT)
Received: from haystack.lclark.edu (haystack.lclark.edu [149.175.1.2])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 8EBD843E88; Sat, 19 Oct 2002 02:05:06 -0700 (PDT)
	(envelope-from eta@lclark.edu)
Received: from copeland-30-191.lclark.edu (anholt@copeland-30-191.lclark.edu [149.175.30.191])
	by haystack.lclark.edu (8.9.3/8.9.3) with ESMTP id BAA13336;
	Sat, 19 Oct 2002 01:57:03 -0700 (PDT)
Subject: Re: X problems & 5.0... -RELEASE?
From: Eric Anholt <eta@lclark.edu>
To: Eric Anholt <eta@lclark.edu>
Cc: Kris Kennaway <kris@obsecurity.org>,
	Wesley Morgan <morganw@chemikals.org>, current@FreeBSD.ORG,
	Maxim Sobolev <sobomax@FreeBSD.ORG>
In-Reply-To: <1034575226.3020.75.camel@anholt.dyndns.org>
References: <20021013231430.F92271-100000@volatile.chemikals.org> 
	<20021014041422.GA31437@xor.obsecurity.org> 
	<1034575226.3020.75.camel@anholt.dyndns.org>
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
X-Mailer: Ximian Evolution 1.0.8 
Date: 19 Oct 2002 01:57:02 -0700
Message-Id: <1035017831.882.25.camel@anholt.dyndns.org>
Mime-Version: 1.0
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-current.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-current>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-current>
X-Loop: FreeBSD.ORG

On Sun, 2002-10-13 at 23:00, Eric Anholt wrote:
> On Sun, 2002-10-13 at 21:14, Kris Kennaway wrote:
> > On Sun, Oct 13, 2002 at 11:28:51PM -0400, Wesley Morgan wrote:
> > 
> > > I know there is some work being done on the recent signal changes to fix
> > > some things, but are we sure this is the problem? I would hate to see
> > > release schedules pushed back because these problems are lost in the
> > > noise, and I can't see a release being made that has a known unstable X.
> > 
> > I thought this was believed to be a bug in X that was exposed by
> > kernel changes.
> > 
> > Kris

> Could anyone who is having stability issues with X please email me
> privately if they are using either -current before September or 
> -stable?  If not, without some sort of hints of where an issue really
> is, I'm going to chalk this up to kernel bugs.

Just to let people know what's going on with this on my end: I've got my
laptop up to a fresh kernel, world, and X as of 10/17 or so.  I've got a
reproducible X server crash with XFree86 + glxgears alone (DRI
disabled).  I'm working on getting backtraces to see if anything useful
can be produced.  However, gdb521 crashes if I start XFree86 from it
(running gdb on that gdb produced only silliness -- gdbs exiting
semi-cleanly or senseless backtraces).  gdb521 can apparently attach to
a running XFree86 fine, but then it doesn't get the module info.

On my -stable box, gdb521 appears to start XFree86 fine, but on -stable
(and -current, iirc) hitting ^C in gdb does nothing, and I have to kill
either gdb or XFree86 because they go unresponsive.

If I can get gdb52 to be useful, I'll add a patch to XFree86-4-Server
(and dri-devel maybe?) to compile debuggable X Server/modules and
install them properly.

From the reports:

Both -stable and -current users get the "self-healing" hang, where the
mouse pointer still moves but the X server responds to nothing else, and
some minutes later it resumes and responds to those actions.  One person
said they'd had this since at least -current in July.

Folks with kernels newer than a couple of weeks ago get X crashes all
the time.  An updated world wasn't necessary to get it (in my case),
updating world+kernel didn't help (others), and updating world+kernel+X
didn't help either (my case, too).

Note that just about any X crash will result in a signal 6 (SIGABRT)
reported by the kernel, because one X crash causes another while the
server tries to recover (resetting to the console, etc.), and when that
second crash gets caught, it aborts.  To see what started the mess, look
at the console output from startx if you used startx.  You're looking
for the first "Fatal error" -- anything later comes from trying to
recover from the crash that got caught.  If you use xdm, it's in your
/var/log/xdm-errors iirc.
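
As a rough illustration of the "look for the first Fatal error" advice,
here is a minimal Python sketch.  The function name first_fatal_error
and the sample log lines are made up for illustration; only the "Fatal
error" string comes from the text above, and real log output may be
formatted differently.

```python
def first_fatal_error(lines, marker="Fatal error"):
    """Return (line_number, text) for the first line containing marker.

    Returns None when no line contains the marker.  Line numbers are
    1-based so they match what an editor or pager would show.
    """
    for number, line in enumerate(lines, start=1):
        if marker in line:
            return number, line.rstrip("\n")
    return None


# Hypothetical log excerpt, invented for this example only.
sample_log = [
    "(II) some startup message\n",
    "Fatal error: first failure (hypothetical text)\n",
    "Fatal error: second failure during recovery (hypothetical text)\n",
]

# Only the first match matters; later ones come from the recovery attempt.
print(first_fatal_error(sample_log))
```

For a real log, an open file object can be passed in place of the list,
since iterating over a file yields its lines.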

If you blame this on type1/bezier, make sure you actually have an error
message about bezier or something else in your log before the abort. 
All of the type1 module's aborts have a reason printed before the abort.
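
The same check can be sketched in Python: collect the lines that mention
a keyword such as "bezier" and that appear before the abort.  This is
only a sketch under assumptions: the function name reasons_before_abort,
the "ABORT" marker and the sample lines are all hypothetical, because
the exact text the server prints at the abort is not given here.

```python
def reasons_before_abort(lines, keyword, abort_marker):
    """Return lines mentioning keyword that occur before abort_marker.

    The keyword match is case-insensitive.  Scanning stops at the first
    line containing abort_marker; if that marker never appears, every
    line is scanned.
    """
    found = []
    for line in lines:
        if abort_marker in line:
            break
        if keyword.lower() in line.lower():
            found.append(line.rstrip("\n"))
    return found


# Hypothetical log excerpt; the wording and the "ABORT" marker are made up.
sample_type1_log = [
    "type1: bezier problem (hypothetical text)\n",
    "ABORT (hypothetical marker)\n",
    "bezier mentioned after the abort\n",
]

# An empty result would mean there is no bezier message before the abort.
print(reasons_before_abort(sample_type1_log, "bezier", "ABORT"))
```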

-- 
Eric Anholt <eta@lclark.edu>
http://people.freebsd.org/~anholt/dri/

