From: Reshad Patuck
Date: Tue, 1 Oct 2019 17:39:27 +0530
Subject: Re: [zfs] filesystem reads hanging
To: Julien Cigar
Cc: FreeBSD FS
References: <20191001082837.GF49734@home.lan> <20191001110901.GL49734@home.lan>
In-Reply-To: <20191001110901.GL49734@home.lan>

Hi Julien,

Thanks, I will give it a shot, and check if it occurs again.

Best,

Reshad

On Tue, Oct 1, 2019 at 4:39 PM Julien Cigar wrote:

> On Tue, Oct 01, 2019 at 03:46:40PM +0530, Reshad Patuck wrote:
> > Hi Julien,
>
> Hi Reshad,
>
> > I did come across that one an hour or so back, can you let me know if there
> > is any way to confirm that it is the same issue I am running up against.
> > The command `procstat -kka` does have very similar (and in some cases
> > identical) output to the lines in the PR mentioned.
>
> I'm confident that it's the same issue.
>
> > Unfortunately I need to stick to 12.0 till 12.1 is out, any idea if I can
> > merge the same change into 12.0 and compile it?
> > I can see the changes in the 12.1 branch, just wondering if I should jump
> > to the beta or wait it out if I can't compile it into 12.0.
>
> I can speak only for myself, but applying
> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=202890&action=diff
> fixed the issue for me.
>
> > Thanks for your help,
> >
> > Reshad
>
> cheers,
> Julien
>
> > On Tue, Oct 1, 2019 at 1:58 PM Julien Cigar wrote:
> >
> > > On Tue, Oct 01, 2019 at 10:26:32AM +0530, Reshad Patuck wrote:
> > > > Hi,
> > >
> > > Hello,
> > >
> > > > I have a FreeBSD 12.0-RELEASE-p9 system running ZFS.
> > > > The system runs an application that uses postgres, and python (among
> > > > other services).
> > > >
> > > > I have noticed that python suddenly is not able to connect to postgres.
> > > > When I try to investigate further, certain files on disk can not be read.
> > > > The commands `cat` and `ls -l` hang (no output and I can not ctrl-c or
> > > > kill -9 them), ps -aux shows them in a D+ state.
> > > > On killing the SSH session these processes continue running as orphans,
> > > > I am not able to kill them.
> > > >
> > > > Someone on IRC suggested running a zfs scrub to check for data
> > > > corruption, but running `zpool scrub zroot` has the same effect.
> > > > The command does not return, ctrl-c does not kill it and `zpool scrub -s
> > > > zroot` says "cannot cancel scrubbing zroot: there is no active scrub".
> > > >
> > > > This has happened in the past month to two of my production servers and
> > > > since the application was critical they were rebooted and the boxes
> > > > function as normal after the reboot.
> > > > Files that were not cat-able on the production servers were working fine
> > > > and a zfs scrub worked fine to show 0 errors and 0 fixes.
> > > > One of these boxes needed a hard reboot as it got stuck in the shutting
> > > > down stage of a soft reboot.
> > > >
> > > > I am not sure where to start debugging this or if there are any ways to
> > > > get metrics on a box stuck in this state.
> > > > Please let me know if you would like me to fetch any metrics or run any
> > > > commands, etc. for you.
> > > > Any help would be much appreciated.
> > >
> > > This is a known problem (see PR 236220) and has been fixed by r350894
> > > (and MFC-ed into 12-STABLE, so I guess it should be in the upcoming
> > > 12.1-RELEASE)
> > >
> > > > Best regards,
> > > >
> > > > Reshad
> > >
> > > --
> > > Julien Cigar
> > > Belgian Biodiversity Platform (http://www.biodiversity.be)
> > > PGP fingerprint: EEF9 F697 4B68 D275 7B11 6A25 B2BB 3710 A204 23C0
> > > No trees were killed in the creation of this message.
> > > However, many electrons were terribly inconvenienced.
>
> --
> Julien Cigar
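
On the question in the thread of how to get metrics from a box stuck in this state: the following is only a minimal sketch, assuming Python 3.7+ is available on the host. It lists processes whose state starts with `D` (the uninterruptible wait that `ps` reported for the hung `cat` and `ls -l`) by parsing the output of `ps -axo pid,state,comm`. The function names `parse_ps_states` and `list_stuck_processes` are hypothetical and are not part of any FreeBSD tool.

```python
import subprocess


def parse_ps_states(ps_output):
    """Return (pid, state, command) tuples for rows whose state starts with 'D'.

    Expects text shaped like the output of `ps -axo pid,state,comm`:
    one header row, then one row per process.
    """
    stuck = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 2)  # pid, state, rest of the line
        if len(fields) < 3:
            continue  # ignore blank or truncated rows
        pid, state, command = fields
        if state.startswith("D"):
            stuck.append((int(pid), state, command))
    return stuck


def list_stuck_processes():
    """Run ps on the local host and return the processes in a D state."""
    result = subprocess.run(
        ["ps", "-axo", "pid,state,comm"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_ps_states(result.stdout)
```

Each PID returned by `list_stuck_processes()` could then be passed to `procstat -kk <pid>` so that its kernel stack can be compared with the traces in PR 236220. Whether `ps` itself stays responsive on an affected box is not something this sketch can guarantee.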