From owner-freebsd-fs@FreeBSD.ORG Sun Oct 28 06:50:48 2007
Date: Sun, 28 Oct 2007 07:50:46 +0100
From: Peter Schuller
To: Indigo 23
Cc: freebsd-fs@freebsd.org
Subject: Re: Configuring an existing zpool to raidz2

> For example,
> You have a zpool called 'mypool' and it has ad0, ad1, and ad2. This
> pool has data on it.
> Is it safe to do: "zpool create mypool raidz2 ad0 ad1 ad2" and have
> that pool use raidz2 without any loss of data occurring?

Converting pools between different types is not possible, with the
exception of adding mirrors. You would have to create a new pool and
move your data onto it.

> Also, which is more recommended to use, raidz or raidz2?

That is completely dependent on the situation. raidz offers n+1
redundancy (you can lose up to one drive), while raidz2 offers n+2
redundancy (you can lose up to two drives). In both cases the extra
redundancy comes at the cost of disk space. In your case, with three
drives, you would probably want raidz; with raidz2 you would only get
the usable space of a single drive (thus wasting two thirds).
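A minimal sketch of such a migration, assuming a spare set of disks is
available to hold the new pool (the device names da3-da5 and the pool
and filesystem names below are illustrative only, not taken from the
original mail):

    # create the replacement pool; with three disks, single-parity raidz
    zpool create newpool raidz da3 da4 da5

    # snapshot the existing data and copy it over
    zfs snapshot mypool@migrate
    zfs send mypool@migrate | zfs receive newpool/data

    # after verifying the copy, the old pool can be destroyed and its
    # disks reused elsewhere
    zpool destroy mypool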
I recommend you have a look at the ZFS administrator guide:
http://www.opensolaris.org/os/community/zfs/docs/zfsadmin.pdf

-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller '
Key retrieval: Send an E-Mail to getpgpkey@scode.org
E-Mail: peter.schuller@infidyne.com Web: http://www.scode.org

From owner-freebsd-fs@FreeBSD.ORG Mon Oct 29 11:07:03 2007
Date: Mon, 29 Oct 2007 11:07:01 GMT
From: FreeBSD bugmaster
To: freebsd-fs@FreeBSD.org
Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org

Current FreeBSD problem reports

Critical problems

Serious problems
S Tracker      Resp. Description
--------------------------------------------------------------------------------
o kern/112658  fs    [smbfs] [patch] smbfs and caching problems (resolves b
o kern/114676  fs    [ufs] snapshot creation panics: snapacct_ufs2: bad blo
o kern/114856  fs    [ntfs] [patch] Bug in NTFS allows bogus file modes.
o kern/116170  fs    Kernel panic when mounting /tmp

4 problems total.

Non-critical problems
S Tracker      Resp. Description
--------------------------------------------------------------------------------
o kern/114847  fs    [ntfs] [patch] dirmask support for NTFS ala MSDOSFS

1 problem total.

From owner-freebsd-fs@FreeBSD.ORG Mon Oct 29 17:22:33 2007
Date: Mon, 29 Oct 2007 13:55:10 +0100
From: "Bellanger, Julien"
Subject: Problem concerning the UFS consistency with embedded FreeBSD system

Hi,

Summary: after I cut the power and ran fsck on the filesystem, the size
field of a file that was being written during the power cut corresponds
to the intended final size of the file, not to the length of the blocks
that were actually written. fsck only corrects the block-count metadata,
not the recorded file size.

I have installed an i386 FreeBSD 6.2 system on an embedded card (a
Kontron card). The system runs in an environment where power is cut off
many times a day.

Under those conditions I try to detect and preserve, as far as possible,
the consistency of the filesystem data flushed to disk, and I discovered
a problem with the consistency of the SIZE stored in the i-node
descriptor versus the real BLOCKCOUNT associated with that i-node.

More precisely, if I copy 50 files of 10 MB each from one directory to
another (with the cp command) and cut the power in the middle of the
operation:

- On reboot, fsck detects the inconsistency and prints:
  /dev/ad2s1f: INCORRECT BLOCK COUNT I=5205090 (20160 should be 19360) (CORRECTED)
  so it corrects the BLOCKCOUNT in the associated i-node 5205090.

- The problem is that fsck does not correct the SIZE in this i-node; the
  recorded size remains the intended final size even though fsck has
  detected the loss of data blocks (I can see that with the fsdb command).

- Thus, when I remount the filesystem, the file appears to have the
  correct size, but if I read it with vi I only get correct data at the
  start of the file (corresponding to the blocks that were actually
  written); the rest of the file is filled with garbage (uninitialized
  data returned because of the incorrect SIZE information, I suppose).

Can someone explain to me:

- why the SIZE stored in the i-node always seems to reflect the intended
  final size of the file and not the size actually written to disk?
- why fsck does not correct the SIZE metadata in the i-node?
- how I can detect such an inconsistency (fsck does not return an error
  code in this case)?

(For your information, I see the same problem on a filesystem with
SoftUpdates disabled.)

In your opinion, can this be considered a bug? To my mind, fsck should
at least be able to set the SIZE to match the amount of data actually
written (the BLOCKCOUNT).

Thanks in advance for your answers.

Julien BELLANGER
Capgemini Telecom Media

From owner-freebsd-fs@FreeBSD.ORG Tue Oct 30 08:22:41 2007
Date: Tue, 30 Oct 2007 17:16:02 +1000
From: David Cecil
To: David Cecil
Cc: freebsd-fs@freebsd.org, Pawel Jakub Dawidek, "Bhatt Kartikey (Nokia-ES/Robina)"
Subject: Re: FFS writes to read-only mount when remounting ro
Hi all,

a work colleague and I have been looking at the filesystem problem I
posted to the list some time ago. Note the stack trace about half way
through this message. This problem exists on 6.1, but from what I can
tell, it should exist in current too, or at least RELENG_7.

Anyhow, we may have a fix to this problem (thanks to my mate), which
involves a simple change to vop_stdfsync(): "bp->b_vflags |= BV_SCANNED"
is moved to after the attempt to lock the buffer. The theory is that we
shouldn't mark the buffer as scanned if we couldn't get the buffer lock.

However, I'm not really happy with that. I think something else is
involved, as I believe the logic in the code further down (where it
checks MNT_WAIT) attempts to catch the case where the buffer was busy
(write in progress) by waiting for it. At that point (when it wakes up),
it assumes either it's no longer dirty (so it doesn't need to be
written) or, if it is still dirty, that the dirty count will be
non-zero. Note though, from the info I sent earlier (below), the dirty
count is zero. Something doesn't add up...

To muddy the waters further, we can't reproduce the problem with the
"fix" mentioned above. My guess is that we are sometimes not getting the
lock; now that the buffer is not marked as scanned when the lock can't
be taken, the code gets more iterations to try and sync it, and it
eventually does. Again, I'm not sure that this is the correct solution.

Please let me know what you think.

Regards,
Dave

David Cecil wrote:
> Hi,
>
> in March I mailed the list about errors I'm seeing that look something
> like this:
> g_vfs_done():mirror/gmroots1f[WRITE(offset=1349091328, length=16384)]error = 1
> (Note that gmroots1f is not a complete mirror set, just one disk, and
> was never part of a complete mirror set. Also, partition f is the
> root partition.)
>
> The problem comes about because the underlying geom object has had its
> access permissions changed so that g_io_check fails due to acw being 0
> (as per my original message at the end of this email). This problem
> is most easily reproducible when, during boot, the root filesystem is
> changed from rw to ro mount. Our code is now 6.1-based.
>
> I have recently returned to this problem, but so far I have not been
> able to determine why this buffer was not sync'ed along with the
> others at the time the filesystem was remounted ro.
>
> I have included some information that was requested by Kostik back in
> March, in the hope it might trigger something useful. I have caused
> the system to panic in g_io_check when an illegal write attempt is found:
> #9 0x8057a3df in panic (fmt=0x80743c24 "Illegal write attempt!\n")
>     at ../../../kern/kern_shutdown.c:572
> #10 0x8052f0f0 in g_io_check (bp=0x866dbab0) at ../../../geom/geom_io.c:205
> ---Type to continue, or q to quit---
> #11 0x8052f5da in g_io_schedule_down (tp=0x852bd7d0)
>     at ../../../geom/geom_io.c:387
> #12 0x8052fb4a in g_down_procbody () at ../../../geom/geom_kern.c:118
> #13 0x80564a08 in fork_exit (callout=0x8052faf0 , arg=0x0,
>     frame=0xda68dd38) at ../../../kern/kern_fork.c:805
> #14 0x806ebd7c in fork_trampoline () at ../../../i386/i386/exception.s:208
>
> I have used bp to get the vnode, dirty buffer header, and buffer data.
> vnode: > (kgdb) p *((struct buf *)(bp->bio_caller2))->b_bufobj->__bo_vnode > $5 = {v_type = VCHR, v_tag = 0x807419d2 "devfs", v_op = 0x80fbb5a0, > v_data = 0x866b5680, v_mount = 0x8664f000, v_nmntvnodes = { > tqe_next = 0x866b1104, tqe_prev = 0x866b8118}, v_un = { > vu_mount = 0x8664cb00, vu_socket = 0x8664cb00, vu_cdev = 0x8664cb00, > vu_fifoinfo = 0x8664cb00}, v_hashlist = {le_next = 0x0, le_prev = > 0x0}, > v_hash = 0, v_cache_src = {lh_first = 0x0}, v_cache_dst = {tqh_first > = 0x0, > tqh_last = 0x866b8030}, v_dd = 0x0, v_cstart = 0, v_lasta = 0, > v_lastw = 0, v_clen = 0, v_lock = {lk_interlock = 0x80fed128, > lk_flags = 128, lk_sharecount = 0, lk_waitcount = 0, > lk_exclusivecount = 0, lk_prio = 80, lk_wmesg = 0x807419d2 "devfs", > lk_timo = 51, lk_lockholder = 0xffffffff, lk_newlock = 0x0}, > v_interlock = {mtx_object = {lo_name = 0x80750688 "vnode interlock", > lo_type = 0x80750688 "vnode interlock", lo_flags = 16973824, > lo_witness_data = {lod_list = {stqe_next = 0x80ffd508}, > lod_witness = 0x80ffd508}}, mtx_lock = 4, mtx_recurse = 0}, > v_vnlock = 0x866b8058, v_holdcnt = 46, v_usecount = 1, v_iflag = 0, > v_vflag = 0, v_writecount = 0, v_freelist = {tqe_next = 0x0, > tqe_prev = 0x0}, v_bufobj = {bo_mtx = 0x866b807c, bo_clean = > {bv_hd = { > tqh_first = 0xcd364fc8, tqh_last = 0xcd365290}, bv_root = > 0xcd38b060, > bv_cnt = 44}, bo_dirty = {bv_hd = {tqh_first = 0x0, > tqh_last = 0x866b80c8}, bv_root = 0x0, bv_cnt = 0}, > bo_numoutput = 1, > bo_flag = 0, bo_ops = 0x80fda81c, bo_bsize = 512, bo_object = > 0x8186d7c0, > bo_synclist = {le_next = 0x867d0ce4, le_prev = 0x860c21c4}, > bo_private = 0x8667f040, __bo_vnode = 0x866b8000}, v_pollinfo = 0x0, > v_label = 0x0} > > bufobj and buf: > (kgdb) p *((struct buf *)(bp->bio_caller2))->b_bufobj > $6 = {bo_mtx = 0x866b807c, bo_clean = {bv_hd = {tqh_first = 0xcd364fc8, > tqh_last = 0xcd365290}, bv_root = 0xcd38b060, bv_cnt = 44}, > bo_dirty = { > bv_hd = {tqh_first = 0x0, tqh_last = 0x866b80c8}, bv_root = 0x0, > bv_cnt = 0}, bo_numoutput = 1, bo_flag = 0, bo_ops = 0x80fda81c, > bo_bsize = 512, bo_object = 0x8186d7c0, bo_synclist = {le_next = > 0x867d0ce4, > le_prev = 0x860c21c4}, bo_private = 0x8667f040, __bo_vnode = > 0x866b8000} > (kgdb) p *((struct buf *)(bp->bio_caller2)) > $7 = {b_bufobj = 0x866b80b4, b_bcount = 16384, b_caller1 = 0x0, > b_data = 0xce0ab000 "\001", b_error = 0, b_iocmd = 2 '\002', > b_ioflags = 2 '\002', b_iooffset = 1541685248, b_resid = 0, b_iodone > = 0, > b_blkno = 3011104, b_offset = 1541685248, b_bobufs = {tqe_next = > 0xcd38ac88, > tqe_prev = 0xcd383828}, b_left = 0xcd3837f0, b_right = 0xcd369b60, > b_vflags = 1, b_freelist = {tqe_next = 0xcd36c318, tqe_prev = > 0xcd389ae4}, > b_qindex = 2, b_flags = 2684485668, b_xflags = 2 '\002', b_lock = { > lk_interlock = 0x80fed170, lk_flags = 262144, lk_sharecount = 0, > lk_waitcount = 0, lk_exclusivecount = 1, lk_prio = 80, > lk_wmesg = 0x8075476d "bufwait", lk_timo = 0, lk_lockholder = > 0xfffffffe, > lk_newlock = 0x0}, b_bufsize = 16384, b_runningbufspace = 16384, > b_kvabase = 0xce0ab000 "\001", b_kvasize = 16384, b_lblkno = 3011104, > b_vp = 0x866b8000, b_dirtyoff = 0, b_dirtyend = 0, b_rcred = 0x0, > b_wcred = 0x0, b_saveaddr = 0xce0ab000, b_pager = {pg_reqpage = 0}, > b_cluster = {cluster_head = {tqh_first = 0x0, tqh_last = 0x0}, > cluster_entry = {tqe_next = 0x0, tqe_prev = 0x0}}, b_pages = > {0x8253ce28, > 0x8251bb70, 0x824ff9b8, 0x8251c500, 0x0 }, > b_npages = 4, > b_dep = {lh_first = 0x0}} > > data: > (kgdb) x/32x 0xce0ab000 > 0xce0ab000: 0x0001a1ed 
0x00000000 0x00000000 0x00000000 > 0xce0ab010: 0x00000010 0x00000000 0x00000000 0x00000000 > 0xce0ab020: 0x00000000 0x00000000 0x4689bfd2 0x00000000 > 0xce0ab030: 0x4689d40d 0x00000000 0x4689bfd2 0x00000000 > 0xce0ab040: 0x00000000 0x00000000 0x00000000 0x00000000 > 0xce0ab050: 0x76688876 0x00000000 0x00000000 0x00000000 > 0xce0ab060: 0x00000000 0x00000000 0x00000000 0x00000000 > 0xce0ab070: 0x5f696c63 0x746d7373 0x6f732e70 0x302e322e > > I appreciate any thoughts you have that might help me solve this issue. > > Also, in looking at a few geom related issues, I have a couple of > changes that are generally applicable to FreeBSD that I plan to send > for your comments soon. > > Thanks, > Dave > > > ext Kostik Belousov wrote: >> On Thu, Mar 15, 2007 at 04:56:39PM +1000, David Cecil wrote: >> >>> Hi Pawel, >>> >>> here is what I've found. Note that this problem doesn't occur >>> frequently, but does happen often enough to be a problem. I am still >>> debugging the live system. >>> >>> I put breakpoints in g_print_bio and g_io_request. I then continued >>> until the breakpoint at g_print_bio breakpoint was hit, but each time >>> g_io_request was hit, I printed the backtrace. Then, at g_print_bio I >>> could identify the struct bio * and look for the corresponding >>> g_io_request trace with a matching struct bio *. >>> >>> That trace looks like this: >>> db> bt >>> Tracing pid 47 tid 100050 td 0x860d0640 >>> g_io_request(864258c4,860ec440,a0020024,98b92188,a5281bf0,...) at >>> g_io_request+0x107 >>> g_vfs_strategy(86104ce4,98b92188,a0020024,98b92188,86104c30,...) at >>> g_vfs_strategy+0x63 >>> ffs_geom_strategy(86104ce4,98b92188) at ffs_geom_strategy+0x129 >>> bufwrite(98b92188,4000,98b92188,a5281c54,805b20c1,...) at bufwrite+0x146 >>> ffs_bufwrite(98b92188) at ffs_bufwrite+0x282 >>> vfs_bio_awrite(98b92188) at vfs_bio_awrite+0x221 >>> vop_stdfsync(a5281cbc,a5281cbc,a5281c98,806eff3e,a5281cbc,...) at >>> vop_stdfsync+0x121 >>> >> It would be helpful, at this frame, to print the actual vnode, as well as >> the dirty buffer header together with dump of buffer data. It seems that >> the easiest way to do that would be to obtain coredump and then use kgdb. >> >> >>> devfs_fsync(a5281cbc) at devfs_fsync+0x23 >>> VOP_FSYNC_APV(80795080,a5281cbc) at VOP_FSYNC_APV+0x7e >>> sync_vnode(86104ce4,860d0640) at sync_vnode+0x100 >>> sched_sync(0,a5281d38,0,805c068c,0,...) at sched_sync+0x1ed >>> fork_exit(805c068c,0,a5281d38) at fork_exit+0xa0 >>> fork_trampoline() at fork_trampoline+0x8 >>> >>> >From ps, it identifies pid 47 as "syncer". >>> >>> So it appears that all I've discovered is that the syncer thread is >>> writing all the dirty buffers for gm0s1a to the disk :-( I guess the >>> question is, why are there any dirty buffers for the read-only >>> partition? I think the partition was mounted rw for a short while >>> during the boot process. Maybe something there that didn't get >>> flushed? Or is it something internal to FFS? >>> >>> The output of ps is listed below. Note that some processes might differ >>> from a standard FreeBSD system as we're modifying the code. As this >>> problem seems to be generic to some FreeBSD systems, I thought it valid >>> to pursue the problem here. I hope that's okay, and anticipate the fix >>> would apply to FBSD, as I've seen what appears to be the same problem >>> logged elsewhere. 
>>> >>> db> ps >>> pid ppid pgrp uid state wmesg wchan cmd >>> 856 841 856 0 S+ ttyin 0x858bd810 csh >>> 841 1 841 0 Ss+ wait 0x8612f800 login >>> 791 551 791 0 Ss select 0x8081231c sshd-x >>> 790 551 790 0 Ss nanslp 0x807c6400 cron >>> 789 551 789 0 Ss select 0x8081231c monitord >>> 788 551 788 0 Ss select 0x8081231c snmpd >>> 787 551 787 0 Ss select 0x8081231c clishd >>> 785 551 785 0 Ss select 0x8081231c inetd >>> 731 727 727 80 S accept 0x861cde7e httpd >>> 730 727 727 80 S accept 0x861cde7e httpd >>> 729 551 729 0 Ss select 0x8081231c ifm >>> 728 551 728 0 Ss select 0x8081231c xpand >>> 727 551 727 0 Ss select 0x8081231c httpd >>> 725 551 725 0 Ss select 0x8081231c ipsrd >>> 559 551 559 0 Ss select 0x8081231c syslogd >>> 551 1 551 0 Ss select 0x8081231c pm >>> 49 0 0 0 SL - 0xa5071d04 [schedcpu] >>> 48 0 0 0 SL sdflush 0x808ca6cc [softdepflush] >>> 47 0 0 0 SL syncer 0x807c5da4 [syncer] >>> 46 0 0 0 SL vlruwt 0x860ce600 [vnlru] >>> 45 0 0 0 SL psleep 0x8081276c [bufdaemon] >>> 44 0 0 0 SL pgzero 0x808cb41c [pagezero] >>> 43 0 0 0 SL psleep 0x808caf84 [vmdaemon] >>> 42 0 0 0 SL psleep 0x808caf40 [pagedaemon] >>> 41 0 0 0 SL m:w1 0x860a1800 [g_mirror gm0s1] >>> 40 0 0 0 SL ip_inp 0x80854c00 [ip_in_0] >>> 39 0 0 0 SL ip6_inpu 0x808c9560 [ip6 input [1]] >>> 38 0 0 0 SL ip6_inpu 0x808c9560 [ip6 input [0]] >>> 37 0 0 0 WL [swi0: sio] >>> 36 0 0 0 WL [irq17: ichsmb0] >>> 35 0 0 0 WL [irq18: atapci1] >>> 34 0 0 0 WL [irq15: ata1] >>> 33 0 0 0 WL [irq14: ata0] >>> 32 0 0 0 SL usbevt 0x84b7d210 [usb2] >>> 31 0 0 0 WL [irq23: ehci0] >>> 30 0 0 0 SL usbevt 0x8588c210 [usb1] >>> 29 0 0 0 WL [irq19: uhci1] >>> 28 0 0 0 SL usbtsk 0x807c2c64 [usbtask] >>> 27 0 0 0 SL usbevt 0x8587a210 [usb0] >>> 26 0 0 0 WL [irq16: uhci0] >>> 25 0 0 0 WL [irq144: cavium0] >>> 24 0 0 0 SL cbb cv 0x84b7dbe4 [cbb1] >>> 23 0 0 0 SL cbb cv 0x84b7f3e4 [cbb0] >>> 22 0 0 0 WL [irq96: cbb0 cbb1] >>> 21 0 0 0 WL [irq9: acpi0] >>> 20 0 0 0 WL [swi5: +] >>> 9 0 0 0 SL - 0x84b56bc0 [thread taskq] >>> 19 0 0 0 WL [swi6: +] >>> 18 0 0 0 WL [swi6: task queue] >>> 8 0 0 0 SL - 0x84b56d00 [acpi_task2] >>> 7 0 0 0 SL - 0x84b56d00 [acpi_task1] >>> 6 0 0 0 SL - 0x84b56d00 [acpi_task0] >>> 17 0 0 0 WL [swi2: cambio] >>> 5 0 0 0 SL - 0x84b56e80 [kqueue taskq] >>> 16 0 0 0 SL - 0x807c2440 [yarrow] >>> 4 0 0 0 SL - 0x807c335c [g_down] >>> 3 0 0 0 RL CPU 0 [g_up] >>> 2 0 0 0 SL - 0x807c3350 [g_event] >>> 15 0 0 0 WL [swi1: net] >>> 14 0 0 0 WL [swi3: vm] >>> 13 0 0 0 WL [swi4: clock sio] >>> 12 0 0 0 RL [idle: cpu0] >>> 11 0 0 0 RL CPU 1 [idle: cpu1] >>> 1 0 1 0 SLs wait 0x84ac8e00 [init] >>> 10 0 0 0 SL ktrace 0x807c4610 [ktrace] >>> 0 0 0 0 WLs [swapper] >>> >>> One list member asked if I have mounted the root partition with noatime >>> as well as ro. I have not specified noatime. It's worth noting that >>> files in / are regularly read/accessed, as you'd expect, but this bug >>> only occurs once a week or so on a number of machines. >>> >>> Thanks a lot for your time. >>> >>> Regards, >>> Dave >>> >>> ext Pawel Jakub Dawidek wrote: >>> >>>> On Wed, Mar 14, 2007 at 02:14:38PM +1000, David Cecil wrote: >>>> >>>> >>>>> Hi, >>>>> >>>>> I have seen the following message (or equivalent) occasionally on a >>>>> FreeBSD 6.1 system: >>>>> g_vfs_done():mirror/gm0s1a[WRITE(offset=1349091328, length=16384)]error = >>>>> 1 >>>>> >>>>> The partition in question is the root partition, and it is mounted >>>>> read-only. 
I have verified that the problem occurs due to the write >>>>> request returning EPERM due to the check in g_io_check: >>>>> case BIO_WRITE: >>>>> case BIO_DELETE: >>>>> if (cp->acw == 0) >>>>> return (EPERM); >>>>> >>>>> I have been trying to determine what within FFS would be trying to write >>>>> to the partition. The "bio_from" in the bio structure indicates (in the >>>>> geom) that it's ffs.mirror/gm0s1a that's trying to write. The contents >>>>> of the buffer looks somewhat like a directory (lots of files listed, but >>>>> comparison to the actual directory that contians these files reveals it's >>>>> somewhat different), followed by a binary (ELF header). However, I'm at >>>>> a loss to understand who's actually doing the writing. Is it coming from >>>>> within FFS or is there an application that's done the write? (I can't >>>>> understand how an application would be permitted to do it though.) >>>>> >>>>> I have seen this sort of problem (same error number) reported on the >>>>> Internet occasionally, but it doesn't seem it's been satisfactorily >>>>> resolved in all instances. >>>>> >>>>> Any help you can provide would be much appreciated. >>>>> >>>>> >>>> Will be good if you could place kdb_enter() into g_vfs_done() error path >>>> and once in DDB try which processes wait for I/O and collect their >>>> backtraces or just put output of 'alltrace' on some web page. >>>> >>>> >>>> >>> -- >>> Software Engineer >>> Secure and Mobile Connectivity >>> Nokia Enterprise Solutions >>> +61 7 5553 8307 (office) >>> +61 412 728 222 (cell) >>> >>> _______________________________________________ >>> freebsd-fs@freebsd.org mailing list >>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs >>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >>> > > -- Software Engineer Secure and Mobile Connectivity Enterprise Solutions Nokia +61 7 5553 8307 (office) +61 412 728 222 (cell) From owner-freebsd-fs@FreeBSD.ORG Wed Oct 31 12:52:54 2007 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DC5EF16A418; Wed, 31 Oct 2007 12:52:54 +0000 (UTC) (envelope-from henrik@brixandersen.dk) Received: from solow.pil.dk (relay.pil.dk [195.41.47.164]) by mx1.freebsd.org (Postfix) with ESMTP id 9750813C48E; Wed, 31 Oct 2007 12:52:54 +0000 (UTC) (envelope-from henrik@brixandersen.dk) Received: from tirith.brixandersen.dk (osgiliath.brixandersen.dk [87.53.223.189]) by solow.pil.dk (Postfix) with ESMTP id 92D9B1CC0B8; Wed, 31 Oct 2007 13:15:49 +0100 (CET) Received: by tirith.brixandersen.dk (Postfix, from userid 1001) id C283F1141D; Wed, 31 Oct 2007 13:15:48 +0100 (CET) Date: Wed, 31 Oct 2007 13:15:48 +0100 From: Henrik Brix Andersen To: Pawel Jakub Dawidek Message-ID: <20071031121548.GD988@tirith.brixandersen.dk> Mail-Followup-To: Pawel Jakub Dawidek , Claus Guttesen , freebsd-fs@freebsd.org, freebsd-current@freebsd.org References: <20071005000046.GC92272@garage.freebsd.pl> <20071008121523.GM2327@garage.freebsd.pl> <20071015082629.GB1290@tirith.brixandersen.dk> <20071015094716.GG10170@garage.freebsd.pl> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="Izn7cH1Com+I3R9J" Content-Disposition: inline In-Reply-To: <20071015094716.GG10170@garage.freebsd.pl> X-PGP-Key: http://www.brixandersen.dk/files/HenrikBrixAndersen.asc User-Agent: Mutt/1.5.16 (2007-06-09) Cc: freebsd-fs@freebsd.org, freebsd-current@freebsd.org, Claus 
Guttesen
Subject: Re: ZFS kmem_map too small.

Hi Pawel,

On Mon, Oct 15, 2007 at 11:47:16AM +0200, Pawel Jakub Dawidek wrote:
> This looks like VFS deadlock between process 42575 and 42585. You
> debugging is really nice, but unfortunately there is no backtrace of
> process 42575, so we don't know where and why it's stuck...
>
> Can you next time take 'show lockedvnods', 'show alllocks' (as you did
> this time), but also 'alltrace'?

Here's a new debug session with the above requested information:
http://www.brixandersen.dk/tmp/zfs-rsync-debug.2.txt

As you can see, there were a couple of LORs before I entered the
debugger. At least one of these (the last one) appeared when trying to
list the files in one of the zfs filesystems, where rsync was hanging.

This is with RELENG_7 from Thu Oct 25 14:30:37 CEST 2007.

Regards,
Brix
-- 
Henrik Brix Andersen

From owner-freebsd-fs@FreeBSD.ORG Wed Oct 31 16:57:06 2007
Date: Wed, 31 Oct 2007 08:51:40 -0700
From: "Ali Mashtizadeh"
To: "Pawel Jakub Dawidek"
Cc: Darren Reed, freebsd-fs@freebsd.org, freebsd-current@freebsd.org, Dag-Erling Smørgrav
Subject: Re: ZFS kmem_map too small.

For ZFS KVA_PAGES=512 seems to run very stable so it's unfortunate that we
have to give up our VA space in the userland. I would personally like to see
a move to a 4-4 Memory model or at least have it as a compile time option.
Does anyone know if that's been attempted on FreeBSD?

Ali

On 10/9/07, Pawel Jakub Dawidek <pjd@freebsd.org> wrote:
>
> On Tue, Oct 09, 2007 at 12:28:09PM -0700, Darren Reed wrote:
> > Pawel Jakub Dawidek wrote:
> > >...
> > >
> > >For i386 one has to set, eg. 'options KVA_PAGES=512' to the kernel
> > >config to be able to define kmem larger than ~700MB. I guess you're
> > >running amd64, maybe there is similar requirement?
> > >
> >
> > Given how much RAM PCs have these days, why isn't this a default for
> > GENERIC?
> >
> > Or why isn't it at least a tunable rather than an option?
>
> This may be a good reason - today's PCs have a lot of RAM and KVA_PAGES
> splits address space between userland and kernel - the more address
> space for the kernel, the less address space for the userland.
> KVA_PAGES=512 splits 4GB address space in half, so userland processes
> can address at most 2GB of memory.
>
> --
> Pawel Jakub Dawidek                       http://www.wheel.pl
> pjd@FreeBSD.org                           http://www.FreeBSD.org
> FreeBSD committer                         Am I Evil? Yes, I Am!

-- 
Ali Mashtizadeh
علی مشتی زاده
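For reference, a minimal sketch of the i386 tuning discussed in this
thread. Only 'options KVA_PAGES=512' comes from the posts above; the
loader tunables and the concrete values are common companions added here
as assumptions, not recommendations from the posters. KVA_PAGES enlarges
the kernel virtual address space (at the expense of userland address
space), and the loader tunables then let the kmem map, and with it the
ZFS ARC, grow into it:

    # kernel configuration file (i386); rebuild and install the kernel
    options KVA_PAGES=512

    # /boot/loader.conf (example values only)
    vm.kmem_size="1024M"
    vm.kmem_size_max="1024M"
    vfs.zfs.arc_max="512M"    # optionally cap the ARC below kmem_size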