From: Mark Millard
Subject: Re: A different 32-bit powerpc head -r317820 panic on old PowerMac G5: dual backtraces from "timeout stopping cpus" (dump failed though): any comments?
Date: Sun, 11 Jun 2017 16:02:53 -0700
To: Justin Hibbits, Nathan Whitehorn, FreeBSD PowerPC ML, freebsd-hackers@freebsd.org
In-Reply-To: <1F1E52BD-375E-47CC-BF06-ECB1092121B4@dsl-only.net>
Message-Id: <29CCA1EC-242D-42E7-97E9-6F2F67178DF3@dsl-only.net>

On 2017-Jun-6, at 11:09 AM, Mark Millard wrote:

> . . .
> FYI: I'm currently doing an approximate
> binary search for localizing part of the panic problem.

This effort failed. More after the reminder of the technique
as it was when I started to try this.

> This is based on the classic panics that are instead
> from jumping to a non-code area. . .
>
> At a given point in my other experiments I was
> getting:
>
> srr0=0x90a0f0 etext+0xb8fc
>
> Adding (unused) code somewhat before that etext
> (so increasing etext) got:
>
> srr0=0x90a0f0 etext+0xb8a8
> (The additional code was larger than I now use.)
>
> But instead adding some code earlier (by around
> 0x100000 in this example) got:
>
> srr0=0x90a110 etext+0xb8fc
>
> So comparing to the starting conditions in
> each case:
>
> The bad-address accessed in one case stayed
> constant but the etext offset decreased: in essence
> the only thing that happened is etext increased
> (matching the offset decrease).
>
> In the other case the etext offset stayed constant
> but the bad-address and etext increased by the
> same amount.
>
> . . .
>
> Currently I'm adding code by adding:
>
> void HACKISH_EXTRA_CODE(void) {}
>
> to one .c file from /usr/src/sys/. . . based which
> file gets to within a ballpark of a more accurate
> binary search position.
> (Large binary search
> jumps currently: I'm not being picky about where
> in the .c the addition is made yet.)

The search failed because both the behavior and the failure
modes changed depending on where HACKISH_EXTRA_CODE was added
(the code was tried over a very wide span of addresses).
Overall I was unable to find a criterion for choosing between
larger and smaller addresses that would steer the search toward
a boundary with two specific, distinct behaviors, one on each
side of it.

Also, adding code that panics instead of accessing or changing
inappropriate memory, for failure modes seen in some of the
earlier failures, again changed the observed behavior: memory
was no longer accessed or corrupted in the same way. So for the
binary search I had to revert that extra problem-detection code.
The behavior is very memory-layout dependent.

At this point I'm not hopeful of providing any better evidence
than I have in my various prior list messages. I doubt anyone can
pick anything out based on just those from the last several weeks.
At most, if a code problem is later identified, the reports could
be checked against it: "could this now-identified code problem
have contributed to those reports?" (Even that use seems unlikely.)

===
Mark Millard
markmi at dsl-only.net
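
The comparison in the quoted text (did the bad address move, did etext move, or both?) can be checked with plain arithmetic on the reported srr0 and etext offset values. Below is a minimal sketch in C, assuming only that etext equals srr0 minus the reported offset. The names `observation`, `etext_of`, and `classify_shift` are hypothetical helpers made up for this illustration and are not part of the FreeBSD source.

```c
#include <assert.h>
#include <stdint.h>

/* One reported panic: the faulting address (srr0) and the offset
 * from the etext symbol that was printed next to it. */
struct observation {
    uint32_t srr0;
    uint32_t etext_offset;
};

/* What changed between a baseline report and a later report. */
enum shift_kind {
    SHIFT_NONE,        /* neither srr0 nor etext moved */
    SHIFT_ETEXT_ONLY,  /* srr0 constant, only etext moved */
    SHIFT_BOTH,        /* srr0 and etext moved by the same amount */
    SHIFT_OTHER        /* any other combination */
};

/* Recover the etext address implied by "srr0 = etext + offset". */
uint32_t etext_of(struct observation o)
{
    assert(o.srr0 >= o.etext_offset);
    return o.srr0 - o.etext_offset;
}

/* Compare two reports and label which addresses moved. */
enum shift_kind classify_shift(struct observation base,
                               struct observation after)
{
    uint32_t d_srr0 = after.srr0 - base.srr0;
    uint32_t d_etext = etext_of(after) - etext_of(base);

    if (d_srr0 == 0 && d_etext == 0)
        return SHIFT_NONE;
    if (d_srr0 == 0)
        return SHIFT_ETEXT_ONLY;
    if (d_srr0 == d_etext)
        return SHIFT_BOTH;
    return SHIFT_OTHER;
}
```

With the quoted values, the baseline {0x90a0f0, 0xb8fc} compared against {0x90a0f0, 0xb8a8} classifies as SHIFT_ETEXT_ONLY (etext grew by 0x54), and compared against {0x90a110, 0xb8fc} it classifies as SHIFT_BOTH (both grew by 0x20). The sketch only labels which addresses moved; it says nothing about why.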