Date:      Fri, 14 May 2021 05:52:10 -0700 (PDT)
From:      "Rodney W. Grimes" <freebsd-rwg@gndrsh.dnsmgr.net>
To:        Mark Millard <marklmi@yahoo.com>
Cc:        freebsd-arm <freebsd-arm@freebsd.org>, freebsd-current <freebsd-current@freebsd.org>
Subject:   Re: FYI for aarch64 main [14] running a mid March version: I ended up with [usb{usbus2}] stuck at (near) 100% cpu
Message-ID:  <202105141252.14ECqA0h081135@gndrsh.dnsmgr.net>
In-Reply-To: <F8475C15-FBE4-40D3-B3D3-1F7E5A671D86@yahoo.com>

> Note: The context was using a non-debug main build
>       from mid-2021-Mar. (More details identified
>       later.)
> 
> The issue happened while attempting a:
> 
> # zfs send -R zpold@for-copy | zfs recv -Fdv zpnew
> 
> where the drives involved in the command were:
> 
> zpold: a USB3 SSD, using /dev/da0p3
> zpnew: a 480 GiByte Optane in the PCIe slot, using /dev/nda0p3
> 
> with:
> 
> # gpart show -pl
> =>       40  468862048    da0  GPT  (224G)
>          40     532480  da0p1  4C8GCA72EFI  (260M)
>      532520       2008         - free -  (1.0M)
>      534528   29360128  da0p2  4C8GCA72swp14  (14G)
>    29894656    4194304         - free -  (2.0G)
>    34088960   33554432  da0p4  4C8GCA72swp16  (16G)
>    67643392  401217536  da0p3  4C8GCA72zfs  (191G)
>   468860928       1160         - free -  (580K)
> 
> =>        40  2000409184    ada0  GPT  (954G)
>           40      409600  ada0p1  (null)  (200M)
>       409640  1740636160  ada0p2  FBSDmacchroot  (830G)
>   1741045800    58720256  ada0p3  FBSDmacchswp0  (28G)
>   1799766056   176160768  ada0p4  FBSDmacchswp1  (84G)
>   1975926824    24482400          - free -  (12G)
> 
> =>       40  937703008    nda0  GPT  (447G)
>          40     532480  nda0p1  CA72opt0EFI  (260M)
>      532520       2008          - free -  (1.0M)
>      534528  117440512  nda0p2  CA72opt0swp56  (56G)
>   117975040   16777216          - free -  (8.0G)
>   134752256  134217728  nda0p4  CA72opt0swp64  (64G)
>   268969984  668731392  nda0p3  CA72opt0zfs  (319G)
>   937701376       1672          - free -  (836K)
> 
> The system running was that on /dev/ada0p2 (FBSDmacchroot,
> which is UFS instead of ZFS).
> 
> The [usb{usbus2}] process eventually got stuck-busy, no
> more I/O:
> 
> CPU 0:  0.0% user,  0.0% nice,  100% system,  0.0% interrupt,  0.0% idle
> CPU 1:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
> CPU 2:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
> CPU 3:  0.4% user,  0.0% nice,  0.0% system,  0.0% interrupt, 99.6% idle
> 
>   PID USERNAME    PRI NICE     SIZE       RES STATE    C   TIME     CPU COMMAND
>    15 root        -72    -       0B   262144B CPU0     0   8:51  99.95% [usb{usbus2}]
> 
>  1295 root         -8    0  20108Ki    8092Ki q->bq_   2   0:04   0.00% zfs recv -Fdv zpnew{receive_writer_thre}
>  1295 root         48    0  20108Ki    8092Ki piperd   2   0:22   0.00% zfs recv -Fdv zpnew{zfs}
>  1294 root         -8    0  17544Ki    7740Ki q->bq_   2   0:01   0.00% zfs send -R zpold@for-copy{send_reader_thread}
>  1294 root         -8    0  17544Ki    7740Ki q->bq_   0   0:00   0.00% zfs send -R zpold@for-copy{send_merge_thread}
>  1294 root         -8    0  17544Ki    7740Ki hdr->b   2   0:00   0.00% zfs send -R zpold@for-copy{send_traverse_threa}
>  1294 root         52    0  17544Ki    7740Ki range-   3   0:20   0.00% zfs send -R zpold@for-copy{zfs}
> 
>  1036 root         -8    -       0B    1488Ki t->zth   0   0:00   0.00% [zfskern{z_checkpoint_discar}]
>  1036 root         -8    -       0B    1488Ki t->zth   1   0:00   0.00% [zfskern{z_livelist_condense}]
>  1036 root         -8    -       0B    1488Ki t->zth   2   0:00   0.00% [zfskern{z_livelist_destroy}]
>  1036 root         -8    -       0B    1488Ki t->zth   1   0:00   0.00% [zfskern{z_indirect_condense}]
>  1036 root         -8    -       0B    1488Ki mmp->m   3   0:00   0.00% [zfskern{mmp_thread_enter}]
>  1036 root         -8    -       0B    1488Ki tx->tx   1   0:00   0.00% [zfskern{txg_thread_enter}]
>  1036 root         -8    -       0B    1488Ki tx->tx   2   0:00   0.00% [zfskern{txg_thread_enter}]
> 
> I was unable to ^c or ^z the process where I
> typed the command. I eventually stopped the
> system with "shutdown -p now" from a ssh
> session (that had already been in place).

Should this occur again, before doing the shutdown run a
zpool status &
I have gotten into this state when the recv pool was a USB device
and for some reason it had a timeout and went offline.  The clues
that this occurred are in dmesg and zpool status.

Unplug/plug the USB device, check dmesg that it came online,
and do a zpool clear.
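
As a rough sketch of the steps above (not a tested procedure): the
function names and the RUN variable are made up for illustration, and
the pool name is whatever pool sits on the USB device.  Setting
RUN=echo only prints the commands instead of running them.

```shell
# Hypothetical helpers; set RUN=echo for a dry run that prints commands.

# Before shutting down: collect the clues.
pool_clues() {
    pool="$1"
    # Backgrounded, as suggested above, in case the command blocks.
    $RUN zpool status "$pool" &
    # Timeout or detach messages should be near the end of dmesg.
    $RUN dmesg | tail -n 40
}

# After unplug/plug, and once dmesg shows the device came back online.
pool_resume() {
    pool="$1"
    $RUN zpool clear "$pool"
}
```

For example, "pool_clues zpold" while things are stuck, then
"pool_resume zpold" once the device has reattached.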

> 
> When I retried after rebooting and scrubbing (no
> problems found), the problem did not repeat.
> 
> I do not have more information nor a way to repeat
> the problem on demand, unfortunately.
> 
> Details of the vintage of the system software and
> such:
> 
> # ~/fbsd-based-on-what-freebsd-main.sh 
> FreeBSD FBSDmacch 14.0-CURRENT FreeBSD 14.0-CURRENT mm-src-n245445-def0058cc690 GENERIC-NODBG  arm64 aarch64 1400005 1400005
> def0058cc690 (HEAD -> mm-src) mm-src snapshot for mm's patched build in git context.
> merge-base: 7381bbee29df959e88ec59866cf2878263e7f3b2
> merge-base: CommitDate: 2021-03-12 20:29:42 +0000
> 7381bbee29df (freebsd/main, freebsd/HEAD, pure-src, main) cam: Run all XPT_ASYNC ccbs in a dedicated thread
> n245444 (--first-parent --count for merge-base)
> 
> The system was a MACCHIATObin Double Shot.
> 
> ===
> Mark Millard
> marklmi at yahoo.com
> ( dsl-only.net went
> away in early 2018-Mar)
> 
> _______________________________________________
> freebsd-current@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
> 

-- 
Rod Grimes                                                 rgrimes@freebsd.org


