Date:      Tue, 1 Oct 2019 10:26:32 +0530
From:      Reshad Patuck <reshadpatuck1@gmail.com>
To:        freebsd-fs@freebsd.org
Subject:   [zfs] filesystem reads hanging
Message-ID:  <CADaJeD24HV0eW7nQT9jaQwEWp=1f4J2WL3OOLZiv--v1zyepwQ@mail.gmail.com>

Hi,

I have a FreeBSD 12.0-RELEASE-p9 system running ZFS.
The system runs an application that uses postgres, and python (among other
services).

I have noticed that python is suddenly unable to connect to postgres.
When I investigate further, certain files on disk cannot be read.
The commands `cat` and `ls -l` hang (no output, and I cannot Ctrl-C or
kill -9 them); `ps -aux` shows them in a D+ state.
After I kill the SSH session these processes keep running as orphans, and
I am still not able to kill them.
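
One way to capture which processes are stuck before a reboot is to filter
`ps` output for states beginning with `D` (uninterruptible disk wait). The
sketch below is only an illustration: the helper names `parse_ps`,
`uninterruptible` and `list_stuck` are made up for this example, and it
assumes `ps -axo pid,state,command` prints one header row followed by one
process per line. On FreeBSD, the PIDs it reports could then be passed to
`procstat -kk <pid>` to print the kernel stack of each stuck thread.

```python
import subprocess


def parse_ps(output):
    """Parse the text printed by `ps -axo pid,state,command` into dicts."""
    rows = []
    for line in output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)       # pid, state, rest of the line
        if len(parts) < 3 or not parts[0].isdigit():
            continue                      # ignore blank or malformed lines
        pid, state, command = parts
        rows.append({"pid": int(pid), "state": state, "command": command})
    return rows


def uninterruptible(rows):
    """Keep only processes whose state starts with D (disk wait)."""
    return [r for r in rows if r["state"].startswith("D")]


def list_stuck():
    """Run ps and return the processes currently in a D state."""
    result = subprocess.run(
        ["ps", "-axo", "pid,state,command"],
        capture_output=True, text=True, check=True,
    )
    return uninterruptible(parse_ps(result.stdout))


if __name__ == "__main__":
    for proc in list_stuck():
        print(proc["pid"], proc["state"], proc["command"])
```

Running this periodically and saving the output would at least record which
commands were blocked and when, even if the box later needs a hard reboot.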

Someone on IRC suggested running a zfs scrub to check for data corruption,
but running `zpool scrub zroot` has the same effect.
The command does not return, Ctrl-C does not kill it, and `zpool scrub -s
zroot` says "cannot cancel scrubbing zroot: there is no active scrub".

This has happened in the past month to two of my production servers.
Since the application was critical they were rebooted, and the boxes
have functioned normally since the reboot.
After the reboot, the files that could not be read with `cat` were
readable again, and a zfs scrub completed with 0 errors and 0 fixes.
One of these boxes needed a hard reboot because it got stuck in the
shutdown stage of a soft reboot.

I am not sure where to start debugging this or if there are any ways to get
metrics on a box stuck in this state.
Please let me know if you would like me to fetch any metrics or run any
commands, etc. for you.
Any help would be much appreciated.

Best regards,

Reshad


