From owner-freebsd-alpha@FreeBSD.ORG Mon Aug 21 19:56:31 2006
Return-Path:
X-Original-To: freebsd-alpha@FreeBSD.org
Delivered-To: freebsd-alpha@FreeBSD.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
    by hub.freebsd.org (Postfix) with ESMTP id A7B5516A4FE
    for ; Mon, 21 Aug 2006 19:56:31 +0000 (UTC)
    (envelope-from owner-bugmaster@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21])
    by mx1.FreeBSD.org (Postfix) with ESMTP id A0FA743DE0
    for ; Mon, 21 Aug 2006 19:55:34 +0000 (GMT)
    (envelope-from owner-bugmaster@FreeBSD.org)
Received: from freefall.freebsd.org (linimon@localhost [127.0.0.1])
    by freefall.freebsd.org (8.13.4/8.13.4) with ESMTP id k7LJt8ou062398
    for ; Mon, 21 Aug 2006 19:55:08 GMT
    (envelope-from owner-bugmaster@FreeBSD.org)
Received: (from linimon@localhost)
    by freefall.freebsd.org (8.13.4/8.13.4/Submit) id k7LJt7d9062394
    for freebsd-alpha@FreeBSD.org; Mon, 21 Aug 2006 19:55:07 GMT
    (envelope-from owner-bugmaster@FreeBSD.org)
Date: Mon, 21 Aug 2006 19:55:07 GMT
Message-Id: <200608211955.k7LJt7d9062394@freefall.freebsd.org>
X-Authentication-Warning: freefall.freebsd.org: linimon set sender to
    owner-bugmaster@FreeBSD.org using -f
From: FreeBSD bugmaster
To: freebsd-alpha@FreeBSD.org
Cc:
Subject: Current problem reports assigned to you
X-BeenThere: freebsd-alpha@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Porting FreeBSD to the Alpha
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 21 Aug 2006 19:56:31 -0000

Current FreeBSD problem reports

Critical problems

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o alpha/75317  alpha      [ata] [busdma] ATA DMA broken on PCalpha

1 problem total.

Serious problems

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o alpha/47952  alpha      DEFPA causes machine check with V5.0-release
o alpha/59116  alpha      [ntfs] mount_ntfs of a Windows 2000-formatted fs cause
o alpha/61940  alpha      Can't disklabel new disk from FreeBSD/alpha 5.2-RELEAS
o alpha/61973  alpha      Machine Check on boot-up of AlphaServer 2100A RM
s alpha/67626  alpha      X crashes an alpha machine, resulting reboot
o alpha/85346  alpha      PREEMPTION causes unstability in Alpha4000 SMP kernel

6 problems total.

Non-critical problems

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o alpha/25284  alpha      PC164 won't reboot with graphics console
o alpha/38031  alpha      osf1.ko not loaded during boot-time of linux-emu enabl
o alpha/48676  alpha      Changing the baud rate of serial consoles for Alpha sy
o alpha/50868  alpha      fd0 floppy device is not mapped into /dev (XP1000) Fre
o alpha/66478  alpha      unexpected machine check: panic for 4.9, 4.10, 5.2 or
o alpha/67903  alpha      hw.chipset.memory: 1099511627776 - thats way to much :

6 problems total.

From owner-freebsd-alpha@FreeBSD.ORG Tue Aug 22 17:01:48 2006
Return-Path:
X-Original-To: freebsd-alpha@freebsd.org
Delivered-To: freebsd-alpha@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
    by hub.freebsd.org (Postfix) with ESMTP id BCFCB16A4E0
    for ; Tue, 22 Aug 2006 17:01:48 +0000 (UTC)
    (envelope-from jhb@freebsd.org)
Received: from server.baldwin.cx (66-23-211-162.clients.speedfactory.net [66.23.211.162])
    by mx1.FreeBSD.org (Postfix) with ESMTP id 530FC43D81
    for ; Tue, 22 Aug 2006 17:01:24 +0000 (GMT)
    (envelope-from jhb@freebsd.org)
Received: from localhost.corp.yahoo.com (john@localhost [127.0.0.1])
    (authenticated bits=0)
    by server.baldwin.cx (8.13.6/8.13.6) with ESMTP id k7MH1KlX047670;
    Tue, 22 Aug 2006 13:01:20 -0400 (EDT)
    (envelope-from jhb@freebsd.org)
From: John Baldwin
To: freebsd-alpha@freebsd.org
Date: Tue, 22 Aug 2006 10:35:21 -0400
User-Agent: KMail/1.9.1
References: <877j19oe9i.wl%rand@meridian-enviro.com>
In-Reply-To: <877j19oe9i.wl%rand@meridian-enviro.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200608221035.22244.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH authentication, not delayed by
    milter-greylist-2.0.2 (server.baldwin.cx [127.0.0.1]);
    Tue, 22 Aug 2006 13:01:21 -0400 (EDT)
X-Virus-Scanned: ClamAV 0.88.3/1708/Tue Aug 22 08:43:00 2006 on server.baldwin.cx
X-Virus-Status: Clean
X-Spam-Status: No, score=-3.2 required=4.2 tests=ALL_TRUSTED,AWL,BAYES_00,
    PERCENT_RANDOM autolearn=ham version=3.1.3
X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on server.baldwin.cx
Cc: bryanh@meridian-enviro.com, pedersen@meridian-enviro.com
Subject: Re: Problems with UP2000+
X-BeenThere: freebsd-alpha@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Porting FreeBSD to the Alpha
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 22 Aug 2006 17:01:48 -0000

On Tuesday 15 August 2006 17:55, Douglas K. Rand wrote:
> We've got a Microway UP2000+ system that's been working just fine for
> the last year. That is, until it seems to have developed some hardware
> related problems. It started with:
>
> dc0: watchdog timeout
> dc0: watchdog timeout
> dc0: watchdog timeout
> dc0: watchdog timeout
> dc0: watchdog timeout
> dc0: watchdog timeout
> dc0: watchdog timeout
> ahc0: Timedout SCBs already complete. Interrupts may not be functioning.
> ahc0: Timedout SCBs already complete. Interrupts may not be functioning.
> dc0: watchdog timeout
> dc0: watchdog timeout
>
> Interestingly the system doesn't crash or completely hang. It stops
> for a bit, considers the answer to the ultimate question (it isn't
> fast enough to think about the actual question) and then works for a
> few minutes. Rinse and repeat.
>
> And then a few hours later it started having SCSI problems:
>
> ahc0: Recovery Initiated
> >>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
> ahc0: Dumping Card State while idle, at SEQADDR 0x18
> Card was paused
> ACCUM = 0x68, SINDEX = 0x48, DINDEX = 0xe4, ARG_2 = 0x1a
> HCNT = 0x0 SCBPTR = 0x68
> SCSISIGI[0xa6]:(REQI|BSYI|MSGI|CDI) ERROR[0x0] SCSIBUSL[0x0]
> LASTPHASE[0x1]:(P_BUSFREE) SCSISEQ[0x1a]:(ENAUTOATNP|ENAUTOATNO|ENRSELI)
> SBLKCTL[0xa]:(SELWIDE|SELBUSB) SCSIRATE[0x0] SEQCTL[0x10]:(FASTMODE)
> SEQ_FLAGS[0xc0]:(NO_CDB_SENT|NOT_IDENTIFIED) SSTAT0[0x0]
> SSTAT1[0x13]:(REQINIT|PHASECHG|PHASEMIS) SSTAT2[0x0]
> SSTAT3[0x0] SIMODE0[0x8]:(ENSWRAP) SIMODE1[0xa4]:(ENSCSIPERR|ENSCSIRST|ENSELTIMO)
> SXFRCTL0[0x80]:(DFON) DFCNTRL[0x0] DFSTATUS[0x89]:(FIFOEMP|HDONE|PRELOAD_AVAIL)
> STACK: 0x0 0x154 0x16a 0x17
> SCB count = 192
> Kernel NEXTQSCB = 107
> Card NEXTQSCB = 107
> QINFIFO entries:
> Waiting Queue entries: 104:104
> Disconnected Queue entries:
> QOUTFIFO entries:
> Sequencer Free SCB List:
> Sequencer SCB Info:
>
> Well, first thing we tried was to replace the NIC. Got a fxp from the
> shelf and tried that. It took 5 hours for it to have problems:
>
> ahc0: Timedout SCBs already complete. Interrupts may not be functioning.
> ahc0: Timedout SCBs already complete. Interrupts may not be functioning.
> fxp0: device timeout
> fxp0: device timeout
>
> I had heard that the onboard SCSI sometimes go bad on these
> motherboards, so I grabbed an Adaptec 2940UW from the shelf and tried
> that. (Lucky for me the BIOS was "new" enough to be able to boot from
> the 2940UW.) That lasted about 57 hours, but still ended up with the
> same problem:
>
> fxp0: device timeout
> ahc1: Timedout SCBs already complete. Interrupts may not be functioning.
> ahc1: Timedout SCBs already complete. Interrupts may not be functioning.
> fxp0: device timeout
> ahc1: Timedout SCBs already complete. Interrupts may not be functioning.
> ahc1: Timedout SCBs already complete. Interrupts may not be functioning.
> ahc1:A:1: no active SCB for reconnecting target - issuing BUS DEVICE RESET
> SAVED_SCSIID == 0x17, SAVED_LUN == 0x0, ARG_1 == 0x17 ACCUM = 0x0
> SEQ_FLAGS == 0xc0, SCBPTR == 0x6, BTT == 0xff, SINDEX == 0x31
> SCSIID == 0x17, SCB_SCSIID == 0x17, SCB_LUN == 0x0, SCB_TAG == 0xff, SCB_CONTROL == 0x0
> SCSIBUSL == 0x17, SCSISIGI == 0xe6
> SXFRCTL0 == 0x88
> SEQCTL == 0x10
>
> We are now in the process of trying different PCI slots for things, so
> far with out any luck. And trying the system with one of the three
> power supplies turned off.

It sounds like interrupts have stopped working. A couple of questions for you:

1) Does it still happen if you disable SMP (set kern.smp.disabled=1 in the
   loader to test)?

2) Does it still happen if you remove PREEMPTION from your kernel config?
   (Can't recall if that was removed in 6.x on Alpha before or after 6.1)

-- 
John Baldwin

From owner-freebsd-alpha@FreeBSD.ORG Tue Aug 22 17:22:42 2006
Return-Path:
X-Original-To: freebsd-alpha@freebsd.org
Delivered-To: freebsd-alpha@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
    by hub.freebsd.org (Postfix) with ESMTP id 9368216A4E7;
    Tue, 22 Aug 2006 17:22:42 +0000 (UTC)
    (envelope-from rand@meridian-enviro.com)
Received: from newman.meridian-enviro.com (newman.meridian-enviro.com [207.109.235.166])
    by mx1.FreeBSD.org (Postfix) with ESMTP id B258843D5F;
    Tue, 22 Aug 2006 17:22:41 +0000 (GMT)
    (envelope-from rand@meridian-enviro.com)
Received: from delta.meridian-enviro.com (delta.meridian-enviro.com [10.10.10.43])
    by newman.meridian-enviro.com (8.13.1/8.13.1) with ESMTP id k7MHM21B020308;
    Tue, 22 Aug 2006 12:22:02 -0500 (CDT)
    (envelope-from rand@meridian-enviro.com)
Date: Tue, 22 Aug 2006 12:22:02 -0500
Message-ID: <87oducit39.wl%rand@meridian-enviro.com>
From: "Douglas K. Rand"
To: John Baldwin
In-Reply-To: <200608221035.22244.jhb@freebsd.org>
References: <877j19oe9i.wl%rand@meridian-enviro.com>
    <200608221035.22244.jhb@freebsd.org>
User-Agent: Wanderlust/2.14.0 (Africa) SEMI/1.14.6 (Maruoka)
    FLIM/1.14.8 (=?ISO-8859-4?Q?Shij=F2?=) APEL/10.6 Emacs/21.3
    (i386--freebsd) MULE/5.0 (SAKAKI)
MIME-Version: 1.0 (generated by SEMI 1.14.6 - "Maruoka")
Content-Type: text/plain; charset=US-ASCII
X-Virus-Scanned: ClamAV 0.88/1708/Tue Aug 22 07:43:00 2006 on newman.meridian-enviro.com
X-Virus-Status: Clean
Cc: bryanh@meridian-enviro.com, pedersen@meridian-enviro.com,
    freebsd-alpha@freebsd.org
Subject: Re: Problems with UP2000+
X-BeenThere: freebsd-alpha@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Porting FreeBSD to the Alpha
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 22 Aug 2006 17:22:42 -0000

Doug> We've got a Microway UP2000+ system that's been working just fine for
Doug> the last year. That is, until it seems to have developed some hardware
Doug> related problems.

John> It sounds like interrupts have stopped working.

Yup. And now it has simply stopped working altogether. I think the
motherboard is busted. The system won't even boot now, not even into
SRM. Both the video and serial console (which I normally use
exclusively) are completely quiet when power is applied. All the fans
turn, and the correct LEDs light. But that's it. I've tried swapping
RAM, with only one (and then the other) CPU, and no PCI cards, and
with different power supply configurations (N+1 config with 3 hot-swap
power supplies) all with no luck. Currently the system is dead and
we've moved its responsibilities someplace else.

John> A couple of questions for you:

John> 1) Does it still happen if you disable SMP (set
John> kern.smp.disabled=1 in the loader to test)?

Sorry, can't test this. :)

John> 2) Does it still happen if you remove PREEMPTION from your
John> kernel config?
John> (Can't recall if that was removed in 6.x on
John> Alpha before or after 6.1)

All I can say is that PREEMPTION was in the kernel config.

Thanks for the reply. At this point we are assuming that the mobo is
busted and not worth fixing. (Unless somebody has the right
incantation.)