From: bugzilla-noreply@freebsd.org
To: freebsd-bugs@FreeBSD.org
Subject: [Bug 220693] head -r320570 & -r320760 (e.g.): ufs snapshot creation broken & leads to fsck -B related SSD-trim "freeing free block" panics; more
Date: Wed, 12 Jul 2017 22:12:25 +0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220693

            Bug ID: 220693
           Summary: head -r320570 & -r320760 (e.g.): ufs snapshot creation
                    broken & leads to fsck -B related SSD-trim "freeing free
                    block" panics; more
           Product: Base System
           Version: CURRENT
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: freebsd-bugs@FreeBSD.org
          Reporter: markmi@dsl-only.net

See also the exchange of list submittals associated with:

https://lists.freebsd.org/pipermail/freebsd-current/2017-July/066505.html

and:

https://lists.freebsd.org/pipermail/freebsd-current/2017-July/066508.html

I freely quote material from these without attribution here.

Basic context material . . .

As I remember, the people reporting the problem happened to be using
non-debug/non-invariant kernel builds.

Multiple TARGET_ARCH's are involved: 32-bit and 64-bit, little-endian and
big-endian.

The basic create-snapshot test that fails:

After a short pause with disk activity, the same sorts of errors are
logged when using "mksnap_ffs /.snap2" where .snap2 did not previously
exist

The type of messages was (e.g.):

g_vfs_done():ada0s3a[READ(offset=6050375794688, length=32768)]error = 5
Jul 7 00:10:24 toshi kernel

Note the huge offset; this is true of the messages in general. Also, the
messages come from the kernel and its nmount-related snapshot creation
activity, not from the user-space program.

The original list notice was about dump (and its snapshot creation), but
the issue is not specific to dump.

fsck -B related panic material . . .

My original context for this: 32-bit powerpc.

boot -s (so: single user mode)
# The next 3 lines are the content of a generic, manually-run script.
mount -u /
mount -a -t ufs (but there is no other file system)
swapon -a (there is a swap partition)
# fsck -B

That "fsck -B" caused the same kinds of lines reported by Michael Butler,
happening as fsck makes a snapshot for the background processing to use.

After the g_vfs_done lines was text like the following (typed in from an
example camera picture):

** //.snap/fsck_snapshot
** Last Mount on /
** Root file system
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
Reclaimed: 0 directories, 1 files, 22680 fragments
780914 files, 4797127 used, 19552199 free (443479 frags, 3288590 blocks, 1.8% fragmentation)

***** FILE SYSTEM MARKED CLEAN *****

But waiting a while always leads to a panic that looks like the following
example.

(Note: context is an SSD with trim enabled)
(typed in from camera picture)

panic: ffs_blkfree_cq: freeing free block
cpuid = 2 (varies, of course)
time = (varies)
KDB: stack backtrace (stack addresses can vary: just an example here)
0xd23b17e0: at kdb_backtrace+0x5c
0xd23b1850: at vpanic+0x1e8
0xd23b18c0: at panic+0x54
0xd23b1910: at ffs_blkfree_cq+0x278
0xd23b1980: at ffs_blkfree_trim_task+0x60
0xd23b19b0: at taskqueue_run_locked+0x10
0xd23b1a10: at taskqueue_thread_loop+0x174
0xd23b1a50: at fork_exit+0xf4
0xd23b1a80: at fork_trampoline+0xc
KDB: enter: panic
[ thread pid 0 tid 1000082 ]
Stopped at kdb_enter_0x70: addi r0,r0,0x0

I've tried this on a powerpc64 and it behaves the same, complete with the
"freeing free block" issue.

I've also had the problem with a normal multi-user boot that initiated
a fsck -B automatically in a context where the file system on the SSD had
not been marked clean.

To avoid this and fix such file systems I've been booting with "boot -s"
and using "fsck -F" from the single-user command prompt.
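For triage, the g_vfs_done lines described above can be picked out of a log and their offsets converted to a readable scale. What follows is a minimal sketch in Python, assuming the message layout quoted earlier in this report stays the same; the name parse_g_vfs_done and the regular expression are illustrative only and are not part of any FreeBSD tool.

```python
import re

# Assumed layout, copied from the one sample message quoted in this report:
#   g_vfs_done():<dev>[<OP>(offset=<n>, length=<n>)]error = <n>
G_VFS_DONE = re.compile(
    r"g_vfs_done\(\):(?P<dev>[^\[]+)\[(?P<op>[A-Z]+)"
    r"\(offset=(?P<offset>\d+), length=(?P<length>\d+)\)\]"
    r"error = (?P<error>\d+)"
)


def parse_g_vfs_done(line):
    """Return the fields of a g_vfs_done message, or None if the line does not match."""
    m = G_VFS_DONE.search(line)
    if m is None:
        return None
    return {
        "dev": m.group("dev"),
        "op": m.group("op"),
        "offset": int(m.group("offset")),
        "length": int(m.group("length")),
        "error": int(m.group("error")),
    }


# The sample message from the report (after quoted-printable decoding).
sample = "g_vfs_done():ada0s3a[READ(offset=6050375794688, length=32768)]error = 5"
info = parse_g_vfs_done(sample)

# Express the byte offset in TiB (2**40 bytes) to make its size obvious.
offset_tib = info["offset"] / 2**40
print(f"{info['dev']}: {info['op']} of {info['length']} bytes "
      f"at about {offset_tib:.2f} TiB, error {info['error']}")
```

For the quoted sample the offset comes to about 5.50 TiB, which illustrates the "huge offset" remark; whether a given offset is out of range for a particular device still has to be judged against that device's actual size.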
Unfortunately, two problems with major consequences in my context limit
the svn revision range that I can cover for this activity. The problem
version ranges are:

-r319722 through -r320651 (fixed by -r320652)
(actually this is why I had originally used "boot -s" in what I report
above: I could get to a shell prompt that way instead of crashing
before any login prompt; the crashes left the file system in need of
repair)

-r320509 through -r320561 (fixed by -r320570)

So I was using -r320570 to avoid one of the two problems, now with a trial
patch for what was later fixed in -r320652.

I do not know if the problem was present back before -r319722 or before
-r320509.
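The two version ranges above overlap, which makes it easy to misjudge which problem applies at a given revision. A small sketch in Python, using only the revision numbers stated in this report; the names BROKEN_RANGES and covering_fixes are hypothetical helpers for illustration.

```python
# Each entry: (first affected revision, last affected revision, fixing revision),
# taken directly from the two ranges listed in the report.
BROKEN_RANGES = [
    (319722, 320651, 320652),
    (320509, 320561, 320570),
]


def covering_fixes(rev):
    """Return the fixing revisions of every listed problem range that contains rev."""
    return [fix for first, last, fix in BROKEN_RANGES if first <= rev <= last]


for rev in (320520, 320570, 320760):
    fixes = covering_fixes(rev)
    if fixes:
        needed = ", ".join(f"-r{f}" for f in fixes)
        print(f"-r{rev}: inside a problem range; fixed by {needed}")
    else:
        print(f"-r{rev}: outside both problem ranges")
```

This matches the situation described: -r320570 is clear of the second range but still inside the first, hence the need for a patch covering what -r320652 later fixed, while -r320760 is outside both.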