From: "Steven Hartland"
To: "Steven Hartland", "K. Macy"
Cc: "freebsd-fs@FreeBSD.org", FreeBSD Stable, mark
Subject: Re: zpool import hangs when out of space - Was: zfs pool import hangs on [tx->tx_sync_done_cv]
Date: Tue, 14 Oct 2014 12:19:58 +0100
Message-ID: <138CF459AA0B41EB8CB4E11B3DE932CF@multiplay.co.uk>

----- Original Message -----
From: "Steven Hartland"
To: "K. Macy"
Cc: "freebsd-fs@FreeBSD.org"; "mark"; "FreeBSD Stable"
Sent: Tuesday, October 14, 2014 9:14 AM
Subject: Re: zpool import hangs when out of space - Was: zfs pool import hangs on [tx->tx_sync_done_cv]

> ----- Original Message -----
> From: "K. Macy"
>
>>>> Thank you both for analysis and effort!
>>>>
>>>> I can't rule out the possibility that my main system pool
>>>> on a SSD was low on space at some point in time, but the
>>>> three 4 GiB cloned pools (sys1boot and its brothers) were all
>>>> created as zfs send / receive copies of the main / (root)
>>>> file system, and I haven't noticed anything unusual during
>>>> syncing. This syncing was done manually (using zxfer) and
>>>> independently from the upgrade on the system - on a steady/quiet
>>>> system, when the source file system definitely had sufficient
>>>> free space.
>>>>
>>>> The source file system now shows 1.2 GiB of usage according
>>>> to df:
>>>>   shiny/ROOT  61758388  1271620  60486768  2%  /
>>>> It seems unlikely that the 1.2 GiB has grown to 4 GiB of space
>>>> on a cloned filesystem.
>>>>
>>>> Will try to import the main two pools after re-creating
>>>> a sane boot pool...
>>>
>>> Yeah, zfs list only shows around 2-3GB used too, but zpool list
>>> shows the pool is out of space. Can't rule out an accounting
>>> issue though.
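
(Aside: one quick way to see where the space is going - the dataset names
below are the ones from this thread, and this assumes only the stock
zfs(8)/zpool(8) tools - is to compare the dataset-level and pool-level
views of usage:

  zfs list -o space -r sys1boot     # USED broken down into
                                    # USEDSNAP/USEDDS/USEDREFRESERV/USEDCHILD
  zfs list -t snapshot -r sys1boot  # snapshots holding space without mounts
  zpool list -v sys1boot            # per-vdev ALLOC/FREE as the pool sees it

If the zfs totals sum to far less than zpool's ALLOC, the gap is pool-level
overhead such as metadata and padding - or the kind of accounting problem
being suspected here.)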
>>
>> What is using the extra space in the pool? Is there an unmounted
>> dataset or snapshot? Do you know how to easily tell? Unlike txg and
>> zio processing, I don't have the luxury of having just read that part
>> of the codebase.
>
> It's not clear, but I believe it could just be fragmentation even
> though it's ashift=9.
>
> I sent the last snapshot to another pool of the same size and it
> resulted in:
>
> NAME       SIZE  ALLOC  FREE  FRAG  EXPANDSZ  CAP  DEDUP  HEALTH  ALTROOT
> sys1boot  3.97G  3.97G  190K    0%         -  99%  1.00x  ONLINE  -
> sys1copy  3.97G  3.47G  512M   72%         -  87%  1.00x  ONLINE  -
>
> I believe FRAG is 0% because the feature wasn't enabled for the
> lifetime of the pool, hence it's simply not showing a valid value.
>
> zfs list -t all -r sys1boot
> NAME                                  USED  AVAIL  REFER  MOUNTPOINT
> sys1boot                             1.76G  2.08G    11K  /sys1boot
> sys1boot/ROOT                        1.72G  2.08G  1.20G  /sys1boot/ROOT
> sys1boot/ROOT@auto-2014-08-16_04.00     1K      -  1.19G  -
> sys1boot/ROOT@auto-2014-08-17_04.00     1K      -  1.19G  -
> ..

Well, interesting issue: I left this pool alone this morning, literally
doing nothing, and it's now out of space.

zpool list
NAME       SIZE  ALLOC  FREE  FRAG  EXPANDSZ  CAP  DEDUP  HEALTH  ALTROOT
sys1boot  3.97G  3.97G  190K    0%         -  99%  1.00x  ONLINE  -
sys1copy  3.97G  3.97G    8K    0%         -  99%  1.00x  ONLINE  -

There's something very wrong here, as nothing has been accessing the pool.

  pool: zfs
 state: ONLINE
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run
        'zpool clear'.
   see: http://illumos.org/msg/ZFS-8000-HC
  scan: none requested
config:

        NAME   STATE   READ WRITE CKSUM
        zfs    ONLINE     0     2     0
          md1  ONLINE     0     0     0

I tried destroying the pool and even that failed, presumably because
the pool has suspended IO.

Regards
Steve
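
(A possible way out of the suspended-IO state, sketched from the status
output above rather than verified against this exact hang: the default
pool property failmode=wait suspends all IO on write failure, which would
explain the destroy hanging. Since the backing store here is the md1
memory disk:

  zpool clear zfs     # ask ZFS to retry/resume IO, as the status suggests
  zpool destroy zfs   # retry the destroy once IO is no longer suspended
  mdconfig -d -u 1    # failing that, tear down the backing md device
                      # (with -o force if it refuses while the pool is open)

Setting failmode=continue on a throwaway test pool beforehand would make
commands return EIO instead of hanging.)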