From owner-freebsd-current@FreeBSD.ORG Sat May 25 17:52:13 2013
Date: Sat, 25 May 2013 19:52:03 +0200
From: Roger Pau Monné <roger.pau@citrix.com>
To: John Baldwin
Cc: Konstantin Belousov, freebsd-current@freebsd.org, "current@freebsd.org"
Subject: Re: FreeBSD-HEAD gets stuck on vnode operations
Message-ID: <51A0FA43.2040503@citrix.com>
In-Reply-To: <201305201434.55406.jhb@freebsd.org>
References: <5190CBEC.5000704@citrix.com> <20130514163149.GS3047@kib.kiev.ua>
 <51927143.4080102@citrix.com> <201305201434.55406.jhb@freebsd.org>

On 20/05/13 20:34, John Baldwin wrote:
> On Tuesday, May 14, 2013 1:15:47 pm Roger Pau Monné wrote:
>> On 14/05/13 18:31, Konstantin Belousov wrote:
>>> On Tue, May 14, 2013 at 06:08:45PM +0200, Roger Pau Monné wrote:
>>>> On 13/05/13 17:00, Konstantin Belousov wrote:
>>>>> On Mon, May 13, 2013 at 04:33:04PM +0200, Roger Pau Monné wrote:
>>>>>> On 13/05/13 13:18, Roger Pau Monné wrote:
>>>>
>>>> Thanks for taking a look.
>>>>
>>>>>> I would like to explain this a little bit more: the syncer process
>>>>>> doesn't get blocked on the _mtx_trylock_flags_ call, it just keeps
>>>>>> looping forever in what seems to be an endless loop around
>>>>>> mnt_vnode_next_active/ffs_sync. Also, while in this state there is
>>>>>> no noticeable disk activity, so I'm unsure what is happening.
>>>>> How many CPUs does your VM have?
>>>>
>>>> 7 vCPUs, but I've also seen this issue with 4 and 16 vCPUs.
>>>>
>>>>>
>>>>> The loop you are describing means that another thread owns the vnode
>>>>> interlock. Can you track down what that thread is doing? E.g. look at
>>>>> vp->v_interlock.mtx_lock, which is basically a pointer to the struct
>>>>> thread owning the mutex, with the low bits cleared as needed. Then
>>>>> you can inspect the thread and get a backtrace.
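
For reference, "clear the low bits" is what the kernel's mtx_owner() macro
does; here is a minimal sketch, assuming the MTX_UNOWNED and MTX_FLAGMASK
definitions from sys/mutex.h (interlock_owner() is just an illustrative
name, not an existing kernel function):

    #include <sys/param.h>
    #include <sys/lock.h>
    #include <sys/mutex.h>

    /*
     * Recover the owning thread from a sleep mutex's lock word.  The
     * low bits of mtx_lock carry flags (MTX_RECURSED, MTX_CONTESTED),
     * so they must be masked off before the value can be used as a
     * struct thread pointer.
     */
    static struct thread *
    interlock_owner(struct mtx *m)
    {
            uintptr_t v = m->mtx_lock;

            if (v == MTX_UNOWNED)           /* nobody holds the mutex */
                    return (NULL);
            return ((struct thread *)(v & ~MTX_FLAGMASK));
    }
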
>>>> There are no other threads running; only syncer is running, on CPU 1
>>>> (see the ps in the previous email). All other CPUs are idle, and as
>>>> seen from the ps quite a lot of threads are blocked in vnode-related
>>>> operations, either "*Name Cac", "*vnode_fr" or "*vnode in". I've also
>>>> attached the output of "show alllocks" in the previous email.
>>> This is not useful. You need to look at the mutex which fails the
>>> trylock operation in mnt_vnode_next_active(), see who owns it, and
>>> then 'unwind' the locking dependencies from there.
>>
>> Sorry, now I get it; let's see if I can find the locked vnodes and the
>> thread that owns them...
>
> You can use 'show lock <address of vp->v_interlock>' to find the owning
> thread and then use 'show sleepchain <thread>'. If you are using kgdb
> on the live system (probably easier) then you can grab my scripts at
> www.freebsd.org/~jhb/gdb/ (do 'cd /path/to/scripts; source gdb6'). You
> can then find the offending thread and do 'mtx_owner &vp->v_interlock'
> and then 'sleepchain <thread>'.
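
For context, the loop under discussion retries each vnode's interlock with
a trylock and restarts on failure. Below is a simplified sketch of that
pattern, not the verbatim sys/kern/vfs_subr.c code (next_active_sketch()
is an illustrative name; vnode_free_list_mtx, v_actfreelist, VI_TRYLOCK
and kern_yield() are the kernel names of that era):

    /*
     * Walk the mount's active vnode list from the marker *mvp, taking
     * each vnode's interlock with a trylock.  On failure, drop the
     * list mutex, yield, and restart from the marker -- if the
     * interlock stays contended, the iterator restarts forever and the
     * syncer spins without sleeping and without issuing any I/O.
     */
    static struct vnode *
    next_active_sketch(struct vnode **mvp, struct mount *mp)
    {
            struct vnode *vp;

    restart:
            mtx_lock(&vnode_free_list_mtx);
            for (vp = TAILQ_NEXT(*mvp, v_actfreelist); vp != NULL;
                vp = TAILQ_NEXT(vp, v_actfreelist)) {
                    if (vp->v_type == VMARKER)
                            continue;       /* skip other iterators */
                    if (!VI_TRYLOCK(vp)) {
                            mtx_unlock(&vnode_free_list_mtx);
                            kern_yield(PRI_USER);
                            goto restart;   /* retry from the marker */
                    }
                    /* Got the interlock: return vp, still locked. */
                    mtx_unlock(&vnode_free_list_mtx);
                    return (vp);
            }
            mtx_unlock(&vnode_free_list_mtx);
            return (NULL);
    }
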
Hello,

Sorry for the delay in debugging this; the issue itself is hard to
reproduce, and lack of time prevented me from digging into it sooner.

It seems like the locked vnode is always the same one, or at least I
haven't been able to see this loop with more than one vnode locked (so
far):

db> show lock 0xfffffe0030cdf448
 class: sleep mutex
 name: vnode interlock
 flags: {DEF}
 state: {OWNED}
 owner: 0xfffffe008d1e9000 (tid 101020, pid 66630, "cc")
db> show sleepchain 0xfffffe008d1e9000
thread 101020 (pid 66630, cc) inhibited
db> tr 66630
Tracing pid 66630 tid 101020 td 0xfffffe008d1e9000
sched_switch() at sched_switch+0x482/frame 0xffffff8120ff3630
mi_switch() at mi_switch+0x179/frame 0xffffff8120ff3670
turnstile_wait() at turnstile_wait+0x3ac/frame 0xffffff8120ff36c0
__mtx_lock_sleep() at __mtx_lock_sleep+0x255/frame 0xffffff8120ff3740
__mtx_lock_flags() at __mtx_lock_flags+0xda/frame 0xffffff8120ff3780
vdropl() at vdropl+0x255/frame 0xffffff8120ff37b0
vputx() at vputx+0x27c/frame 0xffffff8120ff3810
namei() at namei+0x3dd/frame 0xffffff8120ff38c0
kern_statat_vnhook() at kern_statat_vnhook+0x99/frame 0xffffff8120ff3a40
sys_stat() at sys_stat+0x2d/frame 0xffffff8120ff3ae0
amd64_syscall() at amd64_syscall+0x265/frame 0xffffff8120ff3bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xffffff8120ff3bf0
--- syscall (188, FreeBSD ELF64, sys_stat), rip = 0x18280ea,
    rsp = 0x7fffffffa148, rbp = 0x7fffffffa170 ---
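
Reading the trace: vputx() enters vdropl() with vp->v_interlock held, and
__mtx_lock_flags() then blocks this "cc" thread on a second mutex inside
vdropl() (hence the turnstile_wait). So "cc" owns the interlock reported
by "show lock" above while itself waiting. Which mutex it is blocked on is
an assumption on my part, quite possibly the vnode free-list mutex that
vdropl() takes to move the vnode between lists; roughly:

    /* Lock order implied by the backtrace; the second mutex is an
     * assumption, not read from the dump. */
    VI_LOCK(vp);                        /* vnode interlock: held by "cc" */
    mtx_lock(&vnode_free_list_mtx);     /* blocks here -> turnstile_wait */
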
db> ps
  pid  ppid  pgrp   uid  state   wmesg     wchan               cmd
66630 66591 66588     0  L+      *vnode_fr 0xfffffe000bc9fa80  cc
66591 66590 66588     0  S+      wait      0xfffffe0030dbd4b8  cc
66590 66588 66588     0  S+      wait      0xfffffe008de20000  sh
66588 66572 66588     0  S+      wait      0xfffffe008d8a8970  sh
66572 49649 49649     0  S+      select    0xfffffe003029dac0  make
49649 49631 49649     0  S+      wait      0xfffffe000d801970  sh
49631 49629 49629     0  S+      select    0xfffffe00302b3cc0  make
49629 49614 49629     0  S+      wait      0xfffffe008dbfa970  sh
49614 45214 45214     0  S+      select    0xfffffe00303e3640  make
45214 45207 45214     0  S+      wait      0xfffffe008dbfa000  sh
45207 45205 45200     0  S+      select    0xfffffe000d66d3c0  make
45205 45200 45200     0  S+      wait      0xfffffe008da1b4b8  sh
45200   757 45200     0  S+      select    0xfffffe000d6093c0  make
30611   737 30611     0  S+      ttyin     0xfffffe000bd70ca8  bash
 2325   756  2325     0  Ss+     ttyin     0xfffffe008dd6eca8  bash
  757   756   757     0  Ss+     wait      0xfffffe000d7fd4b8  bash
  756   755   756     0  Ss      select    0xfffffe000d783a40  screen
  755   744   755     0  S+      pause     0xfffffe000d9560a8  screen
  744   743   744     0  S+      wait      0xfffffe000d4d6970  bash
  743   742   743  1001  S+      wait      0xfffffe000d4d5000  su
  742   741   742  1001  Ss+     wait      0xfffffe000d6674b8  sh
  741   738   738  1001  S       select    0xfffffe000d60a8c0  sshd
  738   680   738     0  Ss      select    0xfffffe000d60a940  sshd
  737     1   737     0  Ss+     wait      0xfffffe000d605970  login
  736     1   736     0  Ss+     ttyin     0xfffffe000bd718a8  getty
  735     1   735     0  Ss+     ttyin     0xfffffe000bd71ca8  getty
  734     1   734     0  Ss+     ttyin     0xfffffe000bfcc0a8  getty
  733     1   733     0  Ss+     ttyin     0xfffffe000bfcc4a8  getty
  732     1   732     0  Ss+     ttyin     0xfffffe000bfcc8a8  getty
  731     1   731     0  Ss+     ttyin     0xfffffe000bd700a8  getty
  730     1   730     0  Ss+     ttyin     0xfffffe000bd704a8  getty
  729     1   729     0  Ss+     ttyin     0xfffffe000bd708a8  getty
  690     1   690     0  Ss      nanslp    0xffffffff81343734  cron
  686     1   686    25  Ss      pause     0xfffffe000d77a560  sendmail
  683     1   683     0  Ss      select    0xfffffe000d7834c0  sendmail
  680     1   680     0  Ss      select    0xfffffe000d4a5e40  sshd
  580     1   580     0  Ss      select    0xfffffe000d66d040  syslogd
  486     1   486     0  Ss      select    0xfffffe000d783540  devd
  485     1   485    65  Ss      select    0xfffffe000d962040  dhclient
  445     1   445     0  Ss      select    0xfffffe000d66d1c0  dhclient
   18     0     0     0  DL      sdflush   0xffffffff8153675c  [softdepflush]
   17     0     0     0  DL      vlruwt    0xfffffe000d4d5970  [vnlru]
    9     0     0     0  RL      CPU 2                         [syncer]
    8     0     0     0  DL      psleep    0xffffffff8152f72c  [bufdaemon]
    7     0     0     0  DL      pgzero    0xffffffff81537e8c  [pagezero]
    6     0     0     0  DL      psleep    0xffffffff815370d0  [vmdaemon]
    5     0     0     0  DL      psleep    0xffffffff815b3304  [pagedaemon]
    4     0     0     0  DL      ccb_scan  0xffffffff813038b0  [xpt_thrd]
    3     0     0     0  DL      waiting_  0xffffffff815a73e0  [sctp_iterator]
    2     0     0     0  DL      balloon   0xffffffff8079d310  [balloon]
   16     0     0     0  SL      xbread    0xfffffe00feffc000  [xenstore_rcv]
   15     0     0     0  SL      waitev    0xffffffff81538098  [xenwatch]
   14     0     0     0  DL      -         0xffffffff8131f988  [yarrow]
   13     0     0     0  DL      (threaded)                    [geom]
100021                   D       -         0xffffffff8159bc70  [g_down]
100020                   D       -         0xffffffff8159bc68  [g_up]
100019                   D       -         0xffffffff8159bc60  [g_event]
   12     0     0     0  WL      (threaded)                    [intr]
100050                   I                                     [irq778: xn0]
100049                   I                                     [irq777: xbd1]
100048                   I                                     [irq776: xbd0]
100042                   I                                     [irq7: ppc0]
100041                   I                                     [swi0: uart]
100040                   I                                     [irq12: psm0]
100039                   I                                     [irq1: atkbd0]
100036                   I                                     [irq775: xenstore0]
100035                   I                                     [irq15: ata1]
100034                   I                                     [irq14: ata0]
100032                   I                                     [swi5: fast taskq]
100030                   I                                     [swi6: Giant taskq]
100029                   I                                     [swi6: task queue]
100024                   I                                     [swi2: cambio]
100018                   I                                     [swi3: vm]
100017                   I                                     [swi4: clock]
100016                   I                                     [swi4: clock]
100015                   I                                     [swi4: clock]
100014                   I                                     [swi4: clock]
100013                   I                                     [swi4: clock]
100012                   I                                     [swi4: clock]
100011                   I                                     [swi4: clock]
100010                   I                                     [swi1: netisr 0]
   11     0     0     0  RL      (threaded)                    [idle]
100009                   Run     CPU 6                         [idle: cpu6]
100008                   Run     CPU 5                         [idle: cpu5]
100007                   Run     CPU 4                         [idle: cpu4]
100006                   Run     CPU 3                         [idle: cpu3]
100005                   CanRun                                [idle: cpu2]
100004                   Run     CPU 1                         [idle: cpu1]
100003                   Run     CPU 0                         [idle: cpu0]
    1     0     1     0  SLs     wait      0xfffffe000bc77970  [init]
   10     0     0     0  DL      audit_wo  0xffffffff815aaae0  [audit]
    0     0     0     0  DLs     (threaded)                    [kernel]
100044                   D       -         0xffffffff81342593  [deadlkres]
100043                   D       -         0xfffffe000bfc5300  [mca taskq]
100033                   D       -         0xfffffe000bcaa600  [kqueue taskq]
100031                   D       -         0xfffffe000bcaaa00  [thread taskq]
100028                   D       -         0xfffffe000be5c100  [ffs_trim taskq]
100027                   D       -         0xfffffe000bca1300  [acpi_task_2]
100026                   D       -         0xfffffe000bca1300  [acpi_task_1]
100025                   D       -         0xfffffe000bca1300  [acpi_task_0]
100022                   D       -         0xfffffe000bc6e300  [firmware taskq]
100000                   D       sched     0xffffffff8159bd68  [swapper]