From owner-freebsd-stable@FreeBSD.ORG Thu May 8 16:20:24 2014 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id BA98598B; Thu, 8 May 2014 16:20:24 +0000 (UTC) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 8E768346; Thu, 8 May 2014 16:20:24 +0000 (UTC) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 9019EB9DD; Thu, 8 May 2014 12:20:23 -0400 (EDT) From: John Baldwin To: hiren panchasara Subject: Re: stable/10 panic Date: Thu, 8 May 2014 12:19:27 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.4-CBSD-20140415; KDE/4.5.5; amd64; ; ) References: <1398097892.1101.6.camel@powernoodle.corp.yahoo.com> <201405071431.38812.jhb@freebsd.org> In-Reply-To: MIME-Version: 1.0 Content-Type: Text/Plain; charset="utf-8" Content-Transfer-Encoding: 7bit Message-Id: <201405081219.27412.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Thu, 08 May 2014 12:20:23 -0400 (EDT) Cc: freebsd-stable@freebsd.org X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 08 May 2014 16:20:24 -0000 On Thursday, May 08, 2014 12:39:20 am hiren panchasara wrote: > On Wed, May 7, 2014 at 11:31 AM, John Baldwin wrote: > >> 1639. /* > >> 1640. * shutdown the pipe > >> 1641. */ > >> 1642. static void > >> 1643. pipeclose(cpipe) > >> 1644. struct pipe *cpipe; > >> 1645. { > >> 1646. struct pipepair *pp; > >> 1647. struct pipe *ppipe; > >> 1648. > >> 1649. KASSERT(cpipe != NULL, ("pipeclose: cpipe == NULL")); > >> 1650. > >> 1651. PIPE_LOCK(cpipe); > >> 1652. pipelock(cpipe, 0); > >> 1653. pp = cpipe->pipe_pair; > >> 1654. > >> 1655. pipeselwakeup(cpipe); > >> > >> pipeclose() gets called from pipe_dtor() > > > > Eh, so it seems like pipeclose() has been called twice on the same pipe which > > should never happen. > > How did you get to that conclusion? The pipe's state (cpipe->pipe_present) is set to 3 (PIPE_FINALIZED) which only happens at bottom of pipeclose(cpipe). However, in your panic, you are dying at the start of pipeclose() when cpipe's state should be the initial state still. That is also consistent with the panic as the knlist for the pipe has already been destroyed (which happens later in pipeclose()) and that is what zero's the lock fields in knlist causing the actual fault in knote(). > >> 384. void > >> 385. pipe_dtor(struct pipe *dpipe) > >> 386. { > >> 387. ino_t ino; > >> 388. > >> 389. ino = dpipe->pipe_ino; > >> 390. funsetown(&dpipe->pipe_sigio); > >> 391. pipeclose(dpipe); <--------------- > >> 392. if (dpipe->pipe_state & PIPE_NAMED) { > >> 393. dpipe = dpipe->pipe_peer; > >> 394. funsetown(&dpipe->pipe_sigio); > >> 395. pipeclose(dpipe); > >> 396. } > >> 397. if (ino != 0 && ino != (ino_t)-1) > >> 398. free_unr(pipeino_unr, ino); > >> 399. } > >> > >> (kgdb) f 11 > >> #11 0xffffffff80651019 in pipe_dtor (dpipe=) at > >> ../../../kern/sys_pipe.c:395 > >> 395 ../../../kern/sys_pipe.c: No such file or directory. > >> in ../../../kern/sys_pipe.c > >> (kgdb) p *dpipe > >> $8 = {pipe_buffer = {cnt = 2160683902, in = 4294967295, out = > >> 2165433456, size = 4294967295, buffer = 0xfffff803c220be70 ""}, > >> pipe_map = { > >> cnt = 18446735278181777408, pos = 18446735295352500696, npages = > >> -1841788584, ms = {0x0, 0x0, 0xfffff80125a6a798, 0x0, > >> 0xfffff803c2b7b2a0, > >> 0xfffff803c2b7b2c0, 0x0, 0xffffffff80c96b7e, 0x6fb0008, 0x0, > >> 0xfffff80060c08490, 0x1500000000, 0x60, 0xffffffff80cb7fa9, 0x1030000, > >> 0x0, 0x4}}, > >> pipe_sel = {si_tdlist = {tqh_first = 0xfffff801cb464068, tqh_last = > >> 0xfffff8020bfa9b10}, si_note = {kl_list = {slh_first = > >> 0xfffff8022bb72648}, > >> kl_lock = 0xffffffff80cb7fb9 , kl_unlock = > >> 0x5230000, kl_assert_locked = 0, kl_assert_unlocked = 0x1, > >> kl_lockarg = 0xffffffff81105f80}, si_mtx = 0x0}, pipe_atime = > >> {tv_sec = 0, tv_nsec = 0}, pipe_mtime = {tv_sec = -8788387676160, > >> tv_nsec = -8788387676160}, pipe_ctime = {tv_sec = 0, tv_nsec = > >> -8788387675872}, pipe_sigio = 0x0, pipe_peer = 0x0, pipe_pair = 0x0, > >> pipe_state = 3410379072, pipe_busy = -2047, pipe_present = 0, > >> pipe_wgen = 0, pipe_ino = 0} > >> (kgdb) > > > > This looks like complete garbage, but it should be matching 'cpipe' > > from earlier. Can you 'p dpipe' at this frame? > > > > (kgdb) l > 390 in ../../../kern/sys_pipe.c > (kgdb) p dpipe > $3 = > (kgdb) p *dpipe > Cannot access memory at address 0x100000012 > (kgdb) Eh, kgdb should be deterministic given a specific kernel and core file. That is to say, above 'f 11', 'p *dpipe' did something, but here it did not. Are you running kgdb on the same core or a different one? > >> which gets called from fifo_cleanup() > >> > >> 106. static void > >> 107. fifo_cleanup(struct vnode *vp) > >> 108. { > >> 109. struct fifoinfo *fip; > >> 110. > >> 111. ASSERT_VOP_ELOCKED(vp, "fifo_cleanup"); > >> 112. fip = vp->v_fifoinfo; > >> 113. if (fip->fi_readers == 0 && fip->fi_writers == 0) { > >> 114. vp->v_fifoinfo = NULL; > >> 115. pipe_dtor(fip->fi_pipe); <----------------------------- > >> 116. free(fip, M_VNODE); > >> 117. } > >> 118. } > >> > >> (kgdb) p *fip->fi_pipe > >> $7 = {pipe_buffer = {cnt = 7562869, in = 1701147238, out = 761557858, > >> size = 1970170230, > >> buffer = 0x736265657266006d
>> bounds>}, pipe_map = {cnt = 7596447167107706212, pos = > >> 7022344801864938862, > >> npages = 1852402688, ms = {0x78756e696c006d76, 0x696c00646961722d, > >> 0x706177732d78756e, 0x697361622d736d00, 0x6d00617461642d63, > >> 0x61642d6d646c2d73, > >> 0x646c2d736d006174, 0x61646174656d2d6d, 0x65722d736d006174, > >> 0x6e00646576726573, 0x63632d6473627465, 0x64736274656e0064, > >> 0x74656e006467632d, > >> 0x7366662d647362, 0x6c2d64736274656e, 0x736274656e007366, > >> 0x6e00646961722d64}}, pipe_sel = {si_tdlist = {tqh_first = > >> 0x77732d6473627465, > >> tqh_last = 0x7261776d76007061}, si_note = {kl_list = {slh_first > >> = 0x760073666d762d65}, kl_lock = 0x6d762d657261776d, kl_unlock = > >> 0x6d7600676169646b, > >> kl_assert_locked = 0x7365722d65726177, kl_assert_unlocked = > >> 0x6d76006465767265, kl_lockarg = 0x6173762d65726177}, si_mtx = > >> 0x4f4547007264686e}, > >> pipe_atime = {tv_sec = 2322261270122749773, tv_nsec = > >> 2675249699185255283}, pipe_mtime = {tv_sec = 8245071809811652723, > >> tv_nsec = 7450486642355495269}, > >> pipe_ctime = {tv_sec = 2406159642925495145, tv_nsec = > >> 3399705548884410378}, pipe_sigio = 0x2f6d6f65672f2e2e, pipe_peer = > >> 0x705f672f74726170, > >> pipe_pair = 0x5f6700632e747261, pipe_state = 1953653104, pipe_busy = > >> 1886547807, pipe_present = 7233896, pipe_wgen = 623407909, pipe_ino = > >> 620767603} > >> (kgdb) > > > > This should also be the same pipe, but it has an even different set > > of garbage. Can you 'p *fip'? > > kgdb gets confused I believe: > > (kgdb) p *fip > Cannot access memory at address 0xa00000001 Again, there's no way kgdb can figure out how to 'p *fip->fi_pipe' but not be able to 'p *fip'. > (kgdb) p *vp > $1 = {v_tag = 0xffffffff80c96b7e "ufs", v_op = 0xffffffff8111e470, > v_data = 0xfffff80334836498, v_mount = 0xfffff80021805000, > v_nmntvnodes = {tqe_next = 0xfffff803345aa760, tqe_prev = > 0xfffff803343ee5a8}, v_un = {vu_mount = 0x0, vu_socket = 0x0, > vu_cdev = 0x0, vu_fifoinfo = 0x0}, v_hashlist = {le_next = 0x0, > le_prev = 0xfffffe00012de3e8}, v_cache_src = {lh_first = 0x0}, > v_cache_dst = {tqh_first = 0xfffff80334f11d20, tqh_last = > 0xfffff80334f11d40}, v_cache_dd = 0x0, v_lock = {lock_object = { > lo_name = 0xffffffff80c96b7e "ufs", lo_flags = 117112840, > lo_data = 0, lo_witness = 0x0}, lk_lock = 18446735279493186704, > lk_exslpfail = 0, lk_timo = 21, lk_pri = 96}, v_interlock = > {lock_object = {lo_name = 0xffffffff80cb7fa9 "vnode interlock", > lo_flags = 16973824, lo_data = 0, lo_witness = 0x0}, mtx_lock = > 4}, v_vnlock = 0xfffff803346417c8, v_actfreelist = { > tqe_next = 0xfffff802f3a0c000, tqe_prev = 0xfffff803ab6139f8}, > v_bufobj = {bo_lock = {lock_object = { > lo_name = 0xffffffff80cb7fb9 "bufobj interlock", lo_flags = > 86179840, lo_data = 0, lo_witness = 0x0}, rw_lock = 1}, > bo_ops = 0xffffffff81105f80, bo_object = 0x0, bo_synclist = > {le_next = 0x0, le_prev = 0x0}, bo_private = 0xfffff80334641760, > __bo_vnode = 0xfffff80334641760, bo_clean = {bv_hd = {tqh_first = > 0x0, tqh_last = 0xfffff80334641880}, bv_root = {pt_root = 0}, > bv_cnt = 0}, bo_dirty = {bv_hd = {tqh_first = 0x0, tqh_last = > 0xfffff803346418a0}, bv_root = {pt_root = 0}, bv_cnt = 0}, > bo_numoutput = 0, bo_flag = 0, bo_bsize = 16384}, v_pollinfo = > 0x0, v_label = 0x0, v_lockf = 0x0, v_rl = {rl_waiters = { > tqh_first = 0x0, tqh_last = 0xfffff803346418e8}, rl_currdep = > 0x0}, v_cstart = 0, v_lasta = 0, v_lastw = 0, v_clen = 0, > v_holdcnt = 1, v_usecount = 1, v_iflag = 512, v_vflag = 0, > v_writecount = 0, v_hash = 14592100, v_type = VFIFO} v_fifoinfo is cleared before the pipe is destroyed, so v_fifoinfo is already NULL (as expected) > (kgdb) p *fip > $2 = {fi_pipe = 0xffffffff80c96b7e, fi_readers = -2129533840, > fi_writers = -8782327094120} The readers/writers counts here are garbage. I probably wouldn't trust the fi_pipe value either. Would be nice to know if the various pointers line up. Also, I would expect a FIFO to have the 'NAMED' flag set and thus for pipe_peer to be a different pipe (not itself), but I believe that was not true from the earlier posts. -- John Baldwin