From: "Patrick M. Hausen"
To: freebsd-stable@freebsd.org
Date: Fri, 1 Sep 2006 18:42:38 +0200
Subject: LSI/amr driver controller cache problem?
Message-ID: <20060901164238.GA66726@hugo10.ka.punkt.de>

Hi, all!

We just put a brand new Intel SSR212CC box into production. This is basically a standard server with 2 LSI SATA RAID controllers and 12 drive bays in 2 rack units of height. Intel sells it as a storage product. There's a variant of Windows 2003 server that turns this box into an iSCSI target.
We want to use it for disk-based backup with Amanda. The system runs 6-STABLE at the moment.

amr0: mem 0xfbef0000-0xfbefffff, 0xfcd00000-0xfcdfffff irq 72 at device 14.0 on pci6
amr0: Firmware 814C, BIOS H431, 128MB RAM
amr1: mem 0xfbff0000-0xfbffffff, 0xfcf00000-0xfcffffff irq 96 at device 14.0 on pci8
amr1: Firmware 814C, BIOS H431, 128MB RAM
amrd0: on amr0
amrd0: 1907348MB (3906248704 sectors) RAID 5 (optimal)

Since the two RAID controllers come with a battery backup for their cache memory, I configured the logical drive with write-back cache policy and turned the individual disk drives' write caches off.

After cvsup and build/installworld, I noticed strange Sendmail failures (signal 11) on the box. Reinstalling Sendmail fixed the problem. Just to make sure, I did installworld again and rebooted: Sendmail signal 11.

Then it dawned on me that Sendmail is the last binary installed and written to the logical drive in the installworld process.

I can reproduce the problem any time: installworld, reboot, Sendmail broken. Installworld or just reinstall Sendmail, don't reboot, everything's fine. It makes no difference whether I use "reboot" or "shutdown -r".

Is it possible that the amr driver does not issue the necessary flush command to the controller (probably the first part of the problem) and, additionally, the controller loses its cache content at the following system reset despite its BBU (the second part of the problem; iir controllers by ICP Vortex handle a system reset just fine, syncing the drives during boot)?

Any ideas? I don't have a different explanation. A coworker suggested a possible, as yet unknown, UFS2 problem with large filesystems, but /usr is not large on this box. /var is.

The last couple of writes before a system reboot are lost. Reliably.

I will set the controller's cache policy back to "write through", but I'm still not sleeping well ...

Thanks,
Patrick

P.S. As a side note: no problems at all with the em(4) driver so far on this one.
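
To check the "last writes before a reboot are lost" symptom independently of installworld, something like the following could be used. This is only a rough sketch, not something that has been run on the box in question: the function names (write_probe, verify_probe, save_manifest, load_manifest) and the probe file names are made up for illustration, and it assumes the manifest is kept somewhere other than the suspect logical drive.

```python
import hashlib
import json
import os


def sha256_of(path):
    """Return the SHA-256 hex digest of the file at `path`."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def write_probe(directory, count=8, size=1 << 20):
    """Write `count` files of random data into `directory`.

    Each file is flushed and fsync()ed before the next one is written.
    Returns a dict mapping file path -> expected SHA-256 digest.
    """
    manifest = {}
    for i in range(count):
        path = os.path.join(directory, "probe-%02d.bin" % i)
        data = os.urandom(size)
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        manifest[path] = hashlib.sha256(data).hexdigest()
    return manifest


def save_manifest(manifest, manifest_path):
    """Store the manifest as JSON (ideally NOT on the suspect volume)."""
    with open(manifest_path, "w") as f:
        json.dump(manifest, f, indent=2, sort_keys=True)
        f.flush()
        os.fsync(f.fileno())


def load_manifest(manifest_path):
    """Read back a manifest written by save_manifest()."""
    with open(manifest_path) as f:
        return json.load(f)


def verify_probe(manifest):
    """Return the sorted list of paths that are missing or have changed."""
    bad = []
    for path, digest in sorted(manifest.items()):
        if not os.path.exists(path) or sha256_of(path) != digest:
            bad.append(path)
    return bad
```

The idea would be to call write_probe() on a directory on the amrd0 volume and save_manifest() to a different disk or host right before "shutdown -r", then run verify_probe(load_manifest(...)) after the box comes back up. An empty list means every probe file survived intact; any path returned was lost or corrupted across the reboot.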
-- 
punkt.de GmbH         Internet - Dienstleistungen - Beratung
Vorholzstr. 25        Tel. 0721 9109 -0 Fax: -100
76137 Karlsruhe       http://punkt.de