From owner-freebsd-geom@FreeBSD.ORG Sun Feb 10 14:30:06 2008
Return-Path:
Delivered-To: freebsd-geom@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id A491B16A421
    for ; Sun, 10 Feb 2008 14:30:06 +0000 (UTC)
    (envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28])
    by mx1.freebsd.org (Postfix) with ESMTP id 8B1D213C45A
    for ; Sun, 10 Feb 2008 14:30:06 +0000 (UTC)
    (envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1])
    by freefall.freebsd.org (8.14.2/8.14.2) with ESMTP id m1AEU6n9014099
    for ; Sun, 10 Feb 2008 14:30:06 GMT
    (envelope-from gnats@freefall.freebsd.org)
Received: (from gnats@localhost)
    by freefall.freebsd.org (8.14.2/8.14.1/Submit) id m1AEU6NQ014096;
    Sun, 10 Feb 2008 14:30:06 GMT
    (envelope-from gnats)
Date: Sun, 10 Feb 2008 14:30:06 GMT
Message-Id: <200802101430.m1AEU6NQ014096@freefall.freebsd.org>
To: freebsd-geom@FreeBSD.org
From: Volker
Cc:
Subject: Re: bin/110705: gmirror control utility does not exit with correct exit status
X-BeenThere: freebsd-geom@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: Volker
List-Id: GEOM-specific discussions and implementations
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 10 Feb 2008 14:30:06 -0000

The following reply was made to PR bin/110705; it has been noted by GNATS.

From: Volker
To: bug-followup@FreeBSD.org, tom@tomjudge.com
Cc:
Subject: Re: bin/110705: gmirror control utility does not exit with correct exit status
Date: Sun, 10 Feb 2008 15:28:28 +0100

MFC to RELENG_6 missing! If done, this PR can be closed.

From owner-freebsd-geom@FreeBSD.ORG Sun Feb 10 14:58:09 2008
Date: Sun, 10 Feb 2008 14:58:08 GMT
Message-Id: <200802101458.m1AEw89G015475@freefall.freebsd.org>
To: tom@tomjudge.com, rafan@FreeBSD.org, freebsd-geom@FreeBSD.org
From: rafan@FreeBSD.org
Subject: Re: bin/110705: gmirror control utility does not exit with correct exit status

Synopsis: gmirror control utility does not exit with correct exit status

State-Changed-From-To: patched->closed
State-Changed-By: rafan
State-Changed-When: Sun Feb 10 14:58:08 UTC 2008
State-Changed-Why:
Patch committed in RELENG_[67] and HEAD. Thanks!

http://www.freebsd.org/cgi/query-pr.cgi?pr=110705

From owner-freebsd-geom@FreeBSD.ORG Sun Feb 10 14:58:38 2008
Date: Sun, 10 Feb 2008 14:58:37 GMT
Message-Id: <200802101458.m1AEwbWi015522@freefall.freebsd.org>
To: tom@tomjudge.com, rafan@FreeBSD.org, freebsd-geom@FreeBSD.org
From: rafan@FreeBSD.org
Subject: Re: bin/110705: gmirror control utility does not exit with correct exit status

Synopsis: gmirror control utility does not exit with correct exit status

State-Changed-From-To: closed->patched
State-Changed-By: rafan
State-Changed-When: Sun Feb 10 14:58:22 UTC 2008
State-Changed-Why:
Oops, this requires a MFC to 6

http://www.freebsd.org/cgi/query-pr.cgi?pr=110705

From owner-freebsd-geom@FreeBSD.ORG Mon Feb 11 11:07:05 2008
Date: Mon, 11 Feb 2008 11:07:04 GMT
Message-Id: <200802111107.m1BB742a007380@freefall.freebsd.org>
From: FreeBSD bugmaster
To: freebsd-geom@FreeBSD.org
Subject: Current problem reports assigned to freebsd-geom@FreeBSD.org

Current FreeBSD problem reports

Critical problems

Serious problems

S Tracker      Resp.  Description
--------------------------------------------------------------------------------
o kern/73177   geom   kldload geom_* causes panic due to memory exhaustion
o kern/76538   geom   [gbde] nfs-write on gbde partition stalls and continue
o kern/83464   geom   [geom] [patch] Unhandled malloc failures within libgeo
o kern/84556   geom   [geom] GBDE-encrypted swap causes panic at shutdown
o kern/87544   geom   [gbde] mmaping large files on a gbde filesystem deadlo
s kern/89102   geom   [geom_vfs] [panic] panic when forced unmount FS from u
o bin/90093    geom   fdisk(8) incapable of altering in-core geometry
o kern/90582   geom   [geom_mirror] [panic] Restore cause panic string (ffs_
o kern/98034   geom   [geom] dereference of NULL pointer in acd_geom_detach
o kern/104389  geom   [geom] [patch] sys/geom/geom_dump.c doesn't encode XML
o kern/113419  geom   [geom] geom fox multipathing not failing back
o kern/113957  geom   [gmirror] gmirror is intermittently reporting a degrad
o kern/115572  geom   [gbde] [patch] gbde partitions fail at 28bit/48bit LBA
o kern/120021  geom   net-p2p/qbittorrent crashes system when it works thoug
o kern/120231  geom   [geom] GEOM_CONCAT error adding second drive

15 problems total.

Non-critical problems

S Tracker      Resp.  Description
--------------------------------------------------------------------------------
o bin/78131    geom   gbde "destroy" not working.
o kern/79251   geom   [2TB] newfs fails on 2.6TB gbde device
o kern/94632   geom   [geom] Kernel output resets input while GELI asks for
f kern/105390  geom   [geli] filesystem on a md backed by sparse file with s
o kern/107707  geom   [geom] [patch] [request] add new class geom_xbox360 to
p bin/110705   geom   gmirror control utility does not exit with correct exi
o kern/113837  geom   [geom] unable to access 1024 sector size storage
o kern/113885  geom   [geom] [patch] improved gmirror balance algorithm
o kern/114532  geom   [geom] GEOM_MIRROR shows up in kldstat even if compile
o kern/115547  geom   [geom] [patch] [request] let GEOM Eli get password fro
o kern/119743  geom   [geom] geom label for cds is keeped after dismount and
o kern/120044  geom   [msdosfs] [geom] incorrect MSDOSFS label fries adminis
f kern/120091  geom   [geom] [geli] [gjournal] geli does not prompt for pass

13 problems total.

From owner-freebsd-geom@FreeBSD.ORG Wed Feb 13 02:06:45 2008
Date: Tue, 12 Feb 2008 19:35:24 -0600
From: Damian Wiest
To: freebsd-geom@freebsd.org
Message-ID: <20080213013524.GE82589@dfwdamian.vail>
Subject: GEOM related panic during install

I recently tried to install FreeBSD 6.3 on an Intel based server and
encountered a geom related panic during the boot process. The symptoms
are very similar to those reported here,

http://www.nabble.com/bypassing-gmirror-to-recover-filesystems-to9302550.html#a9302550

Here's the relevant output from the boot process:

ad4: 476940MB at ata2-master SATA150
ad4: 976773168 sectors [969021C/16H/63S] 16 sectors/interrupt 1 depth queue
GEOM: new disk ad4
ad4: Intel check1 failed
ad4: Adaptec check1 failed
ad4: LSI (v3) check1 failed
ad4: LSI (v2) check1 failed
ad4: FreeBSD check1 failed
ata3-master: pio=PIO4 wdma=WDMA2 udma=UDMA133 cable=40 wire
ad6: 476940MB at ata3-master SATA150
ad6: 976773168 sectors [969021C/16H/63S] 16 sectors/interrupt 1 depth queue
GEOM: new disk ad6
ad6: Intel check1 failed
ad6: Adaptec check1 failed
ad6: LSI (v3) check1 failed
ad6: LSI (v2) check1 failed
ad6: FreeBSD check1 failed
WARNING: Device name truncated! (ad6p57p57p57p57p57p57p57p57p57p57p57p57p57p57p57p57p57p57p57p57)
WARNING: Device name truncated! (ad6p57p57p57p57p57p57p57p57p57p57p57p57p57p57p57p57p57p57p57p57)
...
[warning repeats many, many times]
...
Fatal double fault
rip = 0xffffffff803ee5d0
rsp = 0xffffffffb2162fc0
rbp = 0xffffff0076a74680
panic: double fault
Uptime: 54s
Cannot dump. No dump device defined.
Automatic reboot in 15 seconds - press a key on the console to abort
--> Press a key on the console to reboot,
--> or switch off the system now.

I believe that the panic occurs while geom is tasting the system's disks.
AFAIK, the disks were new, but someone here had configured the BIOS to
use the onboard soft-RAID controller to mirror the drives. I disabled
this setting before beginning the install, so I suspect that's how the
label on ad6 got messed up.

What's the proper way of recovering from this situation? I can't simply
pull the offending disk, boot into FreeBSD, reinsert the disk and then
use dd to zero the label because x86/amd64 servers won't notice the new
disk. I ended up using a Solaris install CD to write a new label to each
disk, but I suppose I could build a custom kernel that does not contain
any of the geom modules and use that as a fixit disk. Do I just need
to use boot option 6 and then have the loader unload any modules?

-Damian

From owner-freebsd-geom@FreeBSD.ORG Wed Feb 13 10:18:19 2008
To: freebsd-geom@freebsd.org
From: Ivan Voras
Date: Wed, 13 Feb 2008 11:19:56 +0100
References: <20080213013524.GE82589@dfwdamian.vail>
In-Reply-To: <20080213013524.GE82589@dfwdamian.vail>
Subject: Re: GEOM related panic during install

Damian Wiest wrote:

> I believe that the panic occurs while geom is tasting the system's disks.
> AFAIK, the disks were new, but someone here had configured the BIOS to
> use the onboard soft-RAID controller to mirror the drives. I disabled
> this setting before beginning the install, so I suspect that's how the
> label on ad6 got messed up.
>
> What's the proper way of recovering from this situation? I can't simply
> pull the offending disk, boot into FreeBSD, reinsert the disk and then
> use dd to zero the label because x86/amd64 servers won't notice the new
> disk. I ended up using a Solaris install CD to write a new label to each
> disk, but I suppose I could build a custom kernel that does not contain
> any of the geom modules and use that as a fixit disk. Do I just need
> to use boot option 6 and then have the loader unload any modules?

The problem here is that even if you do remove optional GEOM
modules/classes from the kernel, you'll still be left with the GEOM
framework which does the initial tasting, which you can't remove because
it's the kernel's interface to the drives.

Also, the "Intel check1 failed" messages are from the ATA driver, as it
tries to recognize BIOS/soft-raid configurations, and you can't remove
that. It (the ATA driver) is also the probable cause of the panic here.

It would be useful if you tried to debug the problem in the driver - try
and download a recent snapshot of 8-current, with debugging enabled, and
see if you can get a backtrace on panic which would help fix the driver.

Other than that, you'll probably have to boot another OS (Linux,
Solaris, etc.) and use dd to clear the first few and the last few
sectors of the drives.

From owner-freebsd-geom@FreeBSD.ORG Fri Feb 15 19:00:38 2008
Message-Id: <4A4329EB-B8EF-4CDA-98C0-4753289C4788@mac.com>
From: Marcel Moolenaar
To: geom@FreeBSD.org
Date: Fri, 15 Feb 2008 10:38:55 -0800
Subject: Brainstorm: NAND flash

All,

I've been thinking about supporting NAND flash for disk storage
and I've come up with some initial thoughts for people to shoot
at. The intent of this thread is to align thoughts and for people
to tell me that they already have an implementation ;-)

NAND class
----------

NAND flash devices present themselves to the GEOM layer with
GEOMs of the NAND class. This is similar to HDDs presenting
themselves with GEOMs of the DISK class. The idea here is that
we need some additional I/O requests and also want to be able
to distinguish flash from disks.

GEOMs of class NAND don't have the mediasize and sectorsize
attributes (or they have them with value 0). The mediasize is
dependent upon the number of bad blocks, which is not being
dealt with at this level. NANDs don't have sectors.
Attributes of this class include:
    blockcount - the raw number of blocks
    blocksize  - the number of bytes or pages in a block
    pagesize   - the number of bytes in a page
    oobsize    - the number of bytes per page used for OOB

The NAND class supports BIO_DELETE. It'll also need something
for random access to the OOB data.
For this we can introduce BIO_READOOB and BIO_WRITEOOB. This allows
byte-wise I/O. The standard BIO_READ and BIO_WRITE operate on pages
by default.

With the above, we have raw access to the NAND flash. That is
before any wear-leveling or sector mapping happens. A device
special file corresponding to GEOMs of this class can be used
by diagnostics and/or initialization tools.

Open issue: do we want this GEOM to deal with bad blocks?

WEARLEVEL class
---------------

GEOMs of the WEARLEVEL class (further referred to as WL class),
will taste GEOMs of the NAND class. In particular, they will use
the blockcount, blocksize, pagesize and oobsize in order to
determine whether a GEOM is suitable. The tasting process will
read OOB data to determine if wear-leveling is used. As such,
wear-leveling needs to be set up. For this a geom(8) library
exists.

GEOMs of this class export the same variables as GEOMs of the
NAND class, but also have a non-0 mediasize.

The primary purpose of the WL class is to present a NAND flash
device for which wear-leveling is not a concern and that does
not have any bad blocks. It can implement different policies,
such as block-based wear-leveling or page-based wear-leveling.
All configurable through geom(8).

NANDDISK class
--------------

GEOMs of the NANDDISK class (bad name, I know) attach to GEOMs
of either NAND or WEARLEVEL classes and present a consumer that
looks like a "regular" disk. It has the mediasize and sectorsize
attributes and not any of the blockcount, blocksize, pagesize or
oobsize attributes. Also BIO_READOOB and BIO_WRITEOOB are not
supported, though BIO_DELETE may be.

The primary purpose of this class is to provide standard sector
mapping for file systems that are not designed for NAND flash.
The mapping can be trivial.

NANDSIM class
-------------

Not needed in production, but it would be good to have a GEOM
that simulates a NAND flash and that keeps statistics. It is
configured by geom(8) and needs a provider for actual storage.
As such, you can use an underlying MD for storage and present
the GEOM layer with a NAND flash device. Statistics include
such things as erase count per block, read and write counts
per block or page. Other features could include the simulation
of power loss to test algorithms used for wear-leveling and/or
sector mapping.

Let the discussion begin...

--
Marcel Moolenaar
xcllnt@mac.com

From owner-freebsd-geom@FreeBSD.ORG Fri Feb 15 23:46:05 2008
From: Marcel Moolenaar
To: Poul-Henning Kamp
In-Reply-To: <93634.1203118109@critter.freebsd.dk>
Date: Fri, 15 Feb 2008 15:46:02 -0800
References: <93634.1203118109@critter.freebsd.dk>
Cc: geom@FreeBSD.org
Subject: Re: Brainstorm: NAND flash

On Feb 15, 2008, at 3:28 PM, Poul-Henning Kamp wrote:

> In message <4A4329EB-B8EF-4CDA-98C0-4753289C4788@mac.com>, Marcel
> Moolenaar writes:
>
>> GEOMs of class NAND don't have the mediasize and sectorsize
>> attributes (or they have them with value 0). The mediasize is
>> dependent upon the number of bad blocks, which is not being
>> dealt with at this level.
>
> Mediasize is about addressability, not about usability, so this
> assumption is wrong.
>
> A GEOM provider is just an addressable array of sectors, it
> doesn't guarantee that you can read them all or write them
> all, as is indeed the case when your disk develops a bad sector.
>
> NAND is only special due to the OOB stuff, the main page array
> is just a pretty spotty disk, for all GEOM cares.

The reason I thought this was good is that disks are
shipped without bad blocks visible to the "application".
That is: the norm is no bad blocks. With NAND flash
the norm is that bad blocks are part of the deal. I thought
that dealing with bad blocks explicitly for NAND would
level the playing field and make it more consistent...

>> dealt with at this level. NANDs don't have sectors.
>> Attributes of this class include:
>> blockcount - the raw number of blocks
>
> This goes in mediasize (as a byte count)
>
>> blocksize - the number of bytes or pages in a block
>
> This goes in sectorsize.

Can't this cause race conditions?

Suppose there happens to be an MBR in the first page at
offset 0. The MBR class could end up taking the provider,
when a wear-leveling geom should really take it.

>> Open issue: do we want this GEOM to deal with bad blocks?
>
> I'm not sure I understand this question. GEOM doesn't know about
> bad blocks, if you try to use them, GEOM happily transports the
> resulting error code back, but it does not care if the error code
> is "read error" or "values of beta gives rise to dom!"

See above.

>> NANDDISK class
>> --------------
>
>> The primary purpose of this class is to provide standard sector
>> mapping for file systems that are not designed for NAND flash.
>> The mapping can be trivial.
>
> I don't understand why this would be necessary, this is normally
> done in the wearleveling class (for reasons that should be obvious),
> so why do you want to split it into a separate class ?

I'm ignorant of the obviousness of why sector mapping and
wear-leveling are to be done at the same time...

...and I presume you can't elaborate...

--
Marcel Moolenaar
xcllnt@mac.com

From owner-freebsd-geom@FreeBSD.ORG Fri Feb 15 23:48:21 2008
To: Marcel Moolenaar
From: "Poul-Henning Kamp"
In-Reply-To: Your message of "Fri, 15 Feb 2008 10:38:55 PST."
    <4A4329EB-B8EF-4CDA-98C0-4753289C4788@mac.com>
Date: Fri, 15 Feb 2008 23:28:29 +0000
Message-ID: <93634.1203118109@critter.freebsd.dk>
Cc: geom@FreeBSD.org
Subject: Re: Brainstorm: NAND flash

In message <4A4329EB-B8EF-4CDA-98C0-4753289C4788@mac.com>, Marcel
Moolenaar writes:

>GEOMs of class NAND don't have the mediasize and sectorsize
>attributes (or they have them with value 0). The mediasize is
>dependent upon the number of bad blocks, which is not being
>dealt with at this level.

Mediasize is about addressability, not about usability, so this
assumption is wrong.

A GEOM provider is just an addressable array of sectors, it
doesn't guarantee that you can read them all or write them
all, as is indeed the case when your disk develops a bad sector.

NAND is only special due to the OOB stuff, the main page array
is just a pretty spotty disk, for all GEOM cares.

>dealt with at this level. NANDs don't have sectors.
>Attributes of this class include:
>    blockcount - the raw number of blocks

This goes in mediasize (as a byte count)

>    blocksize - the number of bytes or pages in a block

This goes in sectorsize.

>    pagesize - the number of bytes in a page
>    oobsize - the number of bytes per page used for OOB

These two are secondary attributes which are not likely to change
easily for a given NAND, so they should be handled by the BIO_GETATTR
(as "NAND::PAGESIZE" and "NAND::OOBSIZE" for instance).

>For this we can introduce
>BIO_READOOB and BIO_WRITEOOB.

Yes, this sounds sensible.

The original plan is that all BIO_ operations are powers of two and
providers should have a bitmap of which they support (G_PF_CANDELETE
is a mistake in this respect), so this shouldn't be a problem.
In general we should not introduce new BIO_ operations without
reason, but these two are very reasonable.

>With the above, we have raw access to the NAND flash. That is
>before any wear-leveling or sector mapping happens. A device
>special file corresponding to GEOMs of this class can be used
>by diagnostics and/or initialization tools.

Yes, given suitable ioctls to geom_dev, for the new BIO_*OOB.

>Open issue: do we want this GEOM to deal with bad blocks?

I'm not sure I understand this question. GEOM doesn't know about
bad blocks, if you try to use them, GEOM happily transports the
resulting error code back, but it does not care if the error code
is "read error" or "values of beta gives rise to dom!"

>WEARLEVEL class
>---------------

Sounds good. I'm under NDA on M-Systems algorithm and Sandisk is
suing left and right on those patents.

>NANDDISK class
>--------------
>The primary purpose of this class is to provide standard sector
>mapping for file systems that are not designed for NAND flash.
>The mapping can be trivial.

I don't understand why this would be necessary, this is normally
done in the wearleveling class (for reasons that should be obvious),
so why do you want to split it into a separate class ?

>NANDSIM class
>-------------
>
>Not needed in production, but it would be good to have a GEOM
>that simulates a NAND flash and that keeps statistics.

A very good idea.

--
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

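The attribute mapping suggested in the message above (blockcount and blocksize folded into mediasize and sectorsize, pagesize and oobsize fetched by name, operations tracked in a bitmap) can be illustrated with a small sketch. This is only a toy model in Python, not kernel code: the class `NandProvider`, its method names, and every numeric operation code are assumptions made for illustration, and BIO_READOOB and BIO_WRITEOOB exist only as proposals in this thread.

```python
from dataclasses import dataclass

# Hypothetical operation codes: distinct powers of two, so the set a
# provider supports fits in one integer bitmap. The values are
# illustrative and are not copied from FreeBSD's <sys/bio.h>.
BIO_READ = 0x01
BIO_WRITE = 0x02
BIO_DELETE = 0x04
BIO_GETATTR = 0x08
BIO_READOOB = 0x100   # proposed in this thread only
BIO_WRITEOOB = 0x200  # proposed in this thread only


@dataclass(frozen=True)
class NandProvider:
    """Toy model of a NAND provider; not the GEOM provider structure."""

    blockcount: int  # raw number of erase blocks
    blocksize: int   # bytes per erase block
    pagesize: int    # bytes per page
    oobsize: int     # out-of-band bytes per page
    ops: int         # bitmap of supported operations

    @property
    def mediasize(self) -> int:
        # Addressable bytes, bad blocks included.
        return self.blockcount * self.blocksize

    @property
    def sectorsize(self) -> int:
        # The erase block plays the role of the sector.
        return self.blocksize

    def getattr(self, name: str) -> int:
        # Secondary attributes, looked up by name as with BIO_GETATTR.
        attrs = {"NAND::PAGESIZE": self.pagesize,
                 "NAND::OOBSIZE": self.oobsize}
        return attrs[name]

    def supports(self, op: int) -> bool:
        return bool(self.ops & op)


nand = NandProvider(blockcount=1024, blocksize=128 * 1024,
                    pagesize=2048, oobsize=64,
                    ops=(BIO_READ | BIO_WRITE | BIO_DELETE | BIO_GETATTR
                         | BIO_READOOB | BIO_WRITEOOB))
print(nand.mediasize, nand.sectorsize, nand.getattr("NAND::PAGESIZE"))
```

With these example numbers the mediasize is 1024 * 131072 = 134217728 bytes regardless of how many blocks are bad, which matches the point that mediasize describes addressability rather than usability.
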
From owner-freebsd-geom@FreeBSD.ORG Sat Feb 16 00:33:09 2008 Return-Path: Delivered-To: geom@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2FB8416A418 for ; Sat, 16 Feb 2008 00:33:09 +0000 (UTC) (envelope-from phk@critter.freebsd.dk) Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222]) by mx1.freebsd.org (Postfix) with ESMTP id AB82913C45A for ; Sat, 16 Feb 2008 00:33:08 +0000 (UTC) (envelope-from phk@critter.freebsd.dk) Received: from critter.freebsd.dk (unknown [192.168.61.3]) by phk.freebsd.dk (Postfix) with ESMTP id 3BE5817104; Sat, 16 Feb 2008 00:33:07 +0000 (UTC) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.14.2/8.14.2) with ESMTP id m1G0X6tr094216; Sat, 16 Feb 2008 00:33:06 GMT (envelope-from phk@critter.freebsd.dk) To: Marcel Moolenaar From: "Poul-Henning Kamp" In-Reply-To: Your message of "Fri, 15 Feb 2008 15:46:02 PST." Date: Sat, 16 Feb 2008 00:33:06 +0000 Message-ID: <94215.1203121986@critter.freebsd.dk> Sender: phk@critter.freebsd.dk Cc: geom@FreeBSD.org Subject: Re: Brainstorm: NAND flash X-BeenThere: freebsd-geom@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: GEOM-specific discussions and implementations List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 16 Feb 2008 00:33:09 -0000 In message , Marcel Moolenaar wri tes: >> Mediasize is about addressability, not about usability, so this >> assumption is wrong. >> >> A GEOM provider is just an addressable array of sectors, it >> doesn't guarantee that you can read them all or write them >> all, as is indeed the case when your disk develops a bad sector. >> >> NAND is only special due to the OOB stuff, the main page array >> is just a pretty spotty disk, for all GEOM cares. > >The reason I thought this was good is that disks are >shipped without bad blocks visible to the "application". 
>That is: the norm is no bad blocks. With NAND flash
>the norm is that bad blocks are part of the deal. I thought
>that dealing with bad blocks explicitly for NAND would
>level the playing field and make it more consistent...

Well, if you want to take that route, you should not use GEOM to
connect the wear-leveling to the NAND flash in the first place.

Which option you prefer there is sort of a toss-up. Putting it under
GEOM gives you devices in /dev and other benefits; using a private
interface allows you to get it more precisely tailored to your needs.

I would say put it under GEOM: the bad blocks will not trouble GEOM,
and should somebody get perfect NAND (or care to handle the bad
blocks otherwise), they can stick their filesystem there directly,
if they don't need to write to it too much.

>>> dealt with at this level. NANDs don't have sectors.
>>> Attributes of this class include:
>>> blockcount - the raw number of blocks
>>
>> This goes in mediasize (as a byte count)
>>
>>> blocksize - the number of bytes or pages in a block
>>
>> This goes in sectorsize.
>
>Can't this cause race conditions?
>
>Suppose there happens to be a MBR in the first page at
>offset 0. The MBR class could end up taking the provider,
>when a wear-leveling geom should really take it.

At the moment the wear-leveling opens the NAND device for writing,
the MBR would get spoiled and disappear.

And the chances of MBR finding its metadata in the right physical
sector are pretty small to begin with if the wear leveling is worth
anything.

Of course if you do simple bad-block substitution, the chance would
be close to certainty, but the MBR would still get spoiled, so that
would still work.

>I'm ignorant of the obviousness of why sector mapping and
>wear-leveling are to be done at the same time...
>
>...and I presume you can't elaborate...

No I can't.

But I can tell you something about filesystems under BSD license
which might interest you.
Imagine you implement a filesystem that allocates space in 512 byte
sectors, even though the underlying device has a (much) larger
sector size.[1]

To reduce the amount of disk-I/O, you would obviously want to avoid
doing

	read 64k block
	modify 512 bytes of those
	write 64k block
	read same 64k block
	modify some other 512 bytes of those
	write 64k block again

particularly if writes were very slow or otherwise expensive.

You would of course do this by implementing, as UNIX has always
done, a buffer-cache that does the logical/physical translation.

BUT, imagine now as a complication, that your filesystem was
log-structured in somewhat the same hacked up way that Margo
Seltzer did with LFS.

The idea behind LFS is important in this context: The objective
was to gain write speed by always writing sequentially and basically
treating the disk as a circular buffer, hoping that the RAM cache
would limit the number of seeks for reading, and that the disk would
have enough free space to reduce the workload of the cleaner process.

The trouble with that, of course, is that both assumptions were wrong
until RAM and disk exploded in size just a few years ago. On a 95%
filled filesystem, LFS sank under the weight of the cleaner, and RAM
was never big enough to cache all you wanted, and it doesn't help
until the second access anyway.

The other important aspect of an LFS is that you need a "cleaner"
process to run ahead of the write pointer and scavenge space. If
it finds a fully used big block, it leaves it alone, but if it finds
a 64k block with only 512 bytes of data, it copies those 512 bytes
into the write stream so it can mark the 64k block as free, and
recycle it.
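
The cleaner pass described above can be sketched in a few lines. This
is only an illustration under stated assumptions, not how any real LFS
cleaner is written: blocks are modelled as in-memory Python lists of
live 512 byte sectors, and the names `clean`, `blocks` and
`write_stream` are made up for the example.

```python
SECTORS_PER_BLOCK = 128  # one 64k block holds 128 sectors of 512 bytes


def clean(blocks, write_stream):
    """Scavenge partially used blocks (hypothetical in-memory model).

    blocks: list of lists; each inner list holds the live 512 byte
    sector payloads still stored in that 64k block.
    write_stream: list that receives the live sectors copied forward.
    Returns the indices of the blocks that were freed by this pass.
    """
    freed = []
    for index, live in enumerate(blocks):
        if not live:
            continue  # already free, nothing to scavenge
        if len(live) == SECTORS_PER_BLOCK:
            continue  # fully used big block: leave it alone
        write_stream.extend(live)  # copy the live sectors into the log
        blocks[index] = []         # the whole 64k block can be recycled
        freed.append(index)
    return freed
```

With one full block and one block holding a single live sector, only
the second is freed and exactly one sector is rewritten. The sketch
assumes the cleaner already knows which sectors in a block are live,
which is the hard part on a real disk.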
Margo's LFS was a fiasco, but we can still learn from it:

The source of trouble, as far as I have been able to find out, is
that the filesystem naming layer (in her case UFS) needs a logical
block number which must be determined before the physical block
number has been allocated, so the logical block number must be
translated to a physical number through some sort of means or table.

You obviously would _not_ want two copies of the data in the cache,
one under the logical and one under the physical block number, so you
have to pick one or the other.

Margo's choice for the easy solution to the logical/physical mapping
problem in LFS sucked badly when it came to writing the "cleaner"
process:

A mapping that gives you only a logical->physical translation
cheaply, but requires you to read many blocks of disk to reverse
the mapping, doesn't help you when you read a physical sector and
need to find out if it is used in, and where it belongs in, the
logical space.

Which is exactly what the cleaner needs to do.

I believe in the end her choice made it so damn hard that the cleaner
never happened during the time she took an interest in LFS (exactly
until she got her PhD, I believe?)

Ousterhout had some very good and relevant, but harsh words for her
about that.

(Sprite's LFS, by Ousterhout, is also worth a study; it was better
designed but also more narrowly tailored to the Sprite OS, and thus
we cannot learn as much from it today.)

This is all from memory, I haven't bothered to look up the LFS
source code or the correspondence on Ousterhout's page, so some
details may be slightly off, for which I apologize.

Poul-Henning

[1] It's interesting that Sun gave up on this and had to get special
firmware for CD-ROM drives, but that's an entirely different story
and not relevant :-)

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
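
The mapping problem described in this message (cheap logical->physical
lookup, expensive reverse lookup) can be illustrated with a small
sketch. It is not taken from the LFS sources; `SectorMap`, `remap`,
`owner` and `owner_by_scan` are hypothetical names, and the tables are
plain in-memory dictionaries, whereas on a real disk the scan would
cost block reads rather than dictionary iterations.

```python
class SectorMap:
    """Logical<->physical sector map kept in both directions.

    Assumption for this sketch: a log-structured writer always remaps
    a logical sector to a fresh physical sector, never to one that is
    still owned by another logical sector.
    """

    def __init__(self):
        self.forward = {}  # logical sector number -> physical sector number
        self.reverse = {}  # physical sector number -> logical sector number

    def remap(self, logical, physical):
        """Record that `logical` now lives at `physical`."""
        old = self.forward.get(logical)
        if old is not None:
            del self.reverse[old]  # the old physical copy is now garbage
        self.forward[logical] = physical
        self.reverse[physical] = logical

    def owner(self, physical):
        """Return the logical sector stored at `physical`, or None if dead."""
        return self.reverse.get(physical)


def owner_by_scan(forward, physical):
    """Reverse lookup with only a forward map: visit every entry."""
    for logical, phys in forward.items():
        if phys == physical:
            return logical
    return None
```

Both lookups give the same answer; the difference is that `owner` is a
single table probe while `owner_by_scan` touches the whole forward
map, which is the cost a cleaner pays for every physical sector it
inspects when only the forward direction is cheap.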