Date:      Fri, 29 Mar 2013 14:19:50 -0400
From:      "Mikhail T." <mi+thun@aldan.algebra.com>
To:        stable@FreeBSD.org
Cc:        Boris Popov <bp@FreeBSD.ORG>
Subject:   smbfs: panic on the second attempt to reach unavailable server
Message-ID:  <5155DB46.3030601@aldan.algebra.com>

Hello!

I have my FreeBSD server dump nightly backups onto an entertainment device 
running embedded Linux.

The device has no NFS server, but does run Samba (3.0.30). It allows access to 
its internal hard drive, which my server mounts as:

    //dune/hdd750_..._32 /dune smbfs rw,noauto,-N,-Ekoi8-u:utf-8
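Because the entry is marked noauto, the share is not mounted at boot and has 
to be mounted explicitly, for example with "mount /dune". The following is a 
minimal sketch of how a script could check for the mount first. It is an 
illustration only: the helper names is_mounted_in and ensure_mounted are 
hypothetical, and nothing here is taken from the actual setup.

```shell
#!/bin/sh
# Hypothetical helpers; not taken from the original setup.

# Succeeds if the fstab-style lines on stdin list mount point $1 in field 2.
is_mounted_in() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }'
}

# Mounts $1 via its fstab entry unless it is already mounted.
# On FreeBSD, `mount -p` prints the current mounts in fstab format.
ensure_mounted() {
    mount -p | is_mounted_in "$1" || mount "$1"
}
```

A job could call "ensure_mounted /dune" before writing to the share and skip 
the backup if that fails.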

There are two nightly cron jobs using dump(8), xz(1), and ccrypt(1) to dump two 
"important" filesystems (/var/spool/imap and /home). The imap one kicks off at 
3:11am and the home one at 3:31am.
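The shape of such a job can be sketched as follows. This is a guess at the 
pipeline, not the actual crontab: the -L and -C 16 flags and the output naming 
are inferred from the dump log and file name quoted below, while the key-file 
path, the -u flag, and the function names are assumptions made for 
illustration.

```shell
#!/bin/sh
# Hypothetical reconstruction of a nightly backup job; not the real script.

# Builds the output path: /dune/backups/<host>.<label>.<level>.<weekday>.dump.xz.cpt
backup_name() {
    # $1 = host, $2 = label (e.g. imap), $3 = dump level, $4 = weekday
    printf '/dune/backups/%s.%s.%s.%s.dump.xz.cpt\n' "$1" "$2" "$3" "$4"
}

# Dumps filesystem $1 at level $3 to stdout, compresses it with xz,
# encrypts it with ccrypt, and writes the result to the share.
# KEYFILE is an assumed location for the ccrypt key.
run_backup() {
    fs=$1 label=$2 level=$3
    out=$(backup_name "$(hostname -s)" "$label" "$level" "$(date +%A)")
    dump "-$level" -L -C 16 -u -f - "$fs" \
        | xz \
        | ccrypt -e -k "${KEYFILE:-/root/.backup.key}" > "$out"
}
```

A cron entry could then run something like "run_backup /var/spool/imap imap 1". 
With a pipeline of this shape, if the output file cannot be created the shell 
reports the error and dump sees a broken pipe, which would be consistent with 
the failure output quoted below.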

This normally works perfectly fine every night, except when somebody 
accidentally sits on the remote control of the entertainment device in 
the living room, or otherwise manages to turn the box off. When this 
happens, the first dump simply fails, as one would expect:

    cannot create /dune/backups/narawntapu.imap.1.Tuesday.dump.xz.cpt: No such file or directory
       DUMP: Date of this level 1 dump: Tue Mar 12 03:11:07 2013
       DUMP: Date of last level 0 dump: Wed Mar  6 01:31:07 2013
       DUMP: Dumping snapshot of /dev/da0a (/var/spool/imap) to standard output
       DUMP: mapping (Pass I) [regular files]
       DUMP: Cache 16 MB, blocksize = 65536
       DUMP: mapping (Pass II) [directories]
       DUMP: estimated 169895 tape blocks.
       DUMP: dumping (Pass III) [directories]
       DUMP: Broken pipe
       DUMP: The ENTIRE dump is aborted.

However, when the second job tries to do the same twenty minutes later, the 
machine panics. This morning I was able to get a kernel coredump:

    ...
    #6  0xffffffff80750f2f in calltrap () at
    /cache/src/sys/amd64/amd64/exception.S:228
    No locals.
    #7  0xffffffff805a46ca in turnstile_broadcast (ts=0x0, queue=0) at
    /cache/src/sys/kern/subr_turnstile.c:838
             _tid = <value optimized out>
             ts1 = <value optimized out>
             td = <value optimized out>
    #8  0xffffffff80550e52 in _mtx_unlock_sleep (m=0xfffffe0105ecd8f0,
    opts=<value optimized out>, file=<value optimized out>, line=<value
    optimized out>) at /cache/src/sys/kern/kern_mutex.c:715
             ts = (struct turnstile *) 0x0
    #9  0xffffffff8101a0cd in smb_iod_invrq (iod=<value optimized out>) at
    /cache/src/sys/modules/smbfs/../../netsmb/smb_iod.c:91
             rqp = (struct smb_rq *) 0xfffffe0105ecd800
    #10 0xffffffff8101b172 in smb_iod_addrq (rqp=0xfffffe0105ecd800) at
    /cache/src/sys/modules/smbfs/../../netsmb/smb_iod.c:418
             vcp = <value optimized out>
             iod = (struct smbiod *) 0xfffffe009483b800
             error = <value optimized out>
             __func__ = "uЪ", '\220' <repeats 12 times>
    #11 0xffffffff81017da2 in smb_rq_simple (rqp=0xfffffe0105ecd800) at
    /cache/src/sys/modules/smbfs/../../netsmb/smb_rq.c:168
             vcp = (struct smb_vc *) 0xfffffe011f957000
             error = <value optimized out>
             i = 0
    #12 0xffffffff81016202 in smb_smb_treeconnect (ssp=0xfffffe015f069200,
    scred=0xfffffe009483b868) at
    /cache/src/sys/modules/smbfs/../../netsmb/smb_smb.c:574
             vcp = (struct smb_vc *) 0xfffffe011f957000
             rq = {sr_state = 1720810032, sr_vc = 0xfffffe0002a8c490, sr_share =
    0xffffff8366917a90, sr_mid = 40352, sr_seqno = 4294967295, sr_rseqno =
    1720810112, sr_rq = {mb_top = 0xffffffff80574fea, mb_cur = 0x100000001,
    mb_mleft = 1458488464, mb_count = -512, mb_copy = 0xffffff8366917a80,
    mb_udata = 0xffffffff80755149}, sr_rqflags = 0 '\0', sr_rqflags2 = 0,
    sr_wcount = 0x0, sr_bcount = 0xffffff8366917ac0, sr_rp = {md_top =
    0xffffffff8057546d, md_cur = 0x0, md_pos = 0xfffffe0056eec490
    "\2005л\200ЪЪЪЪ"}, sr_rpgen = -1803307004, sr_rplast = -512, sr_flags =
    1458488464, sr_rpsize = -512, sr_cred = 0xfffffe009483b804, sr_timo =
    1458488464, sr_rexmit = -512, sr_sendcnt = 1720810208, sr_timesent = {tv_sec
    = 582, tv_nsec = -2196531595260}, sr_lerror = 0, sr_rqsig =
    0xffffff8366917b10
    "\200{\221f\203ЪЪЪ\206╚V\200ЪЪЪЪ\200{\221f\203ЪЪЪ\035є\001\201п\a", sr_rqtid
    = 0xffffffff805a0e97, sr_rquid = 0xffffff8366917b10, sr_errclass = 1 '\001',
    sr_serror = 0, sr_error = 0, sr_rpflags = 208 'п', sr_rpflags2 = 0, sr_rptid
    = 0, sr_rppid = 0, sr_rpuid = 0, sr_rpmid = 0, sr_slock = {lock_object =
    {lo_name = 0xffffff8366917b80
    "Ю{\221f\203ЪЪЪ\032ґ\001\201ЪЪЪЪП{\221f\203ЪЪЪ\230╦\203\224", lo_flags =
    2153163654, lo_data = 4294967295, lo_witness = 0xffffff8366917b80}, mtx_lock
    = 8592098960413}, sr_t2 = 0xffffffff8102517c, sr_link = {tqe_next =
    0x9483b820, tqe_prev = 0x0}}
             rqp = (struct smb_rq *) 0xfffffe0105ecd800
             mbp = (struct mbchain *) 0xfffffe0105ecd828
             pp = <value optimized out>
             pbuf = 0x0
             encpass = 0x0
             error = <value optimized out>
             plen = 1
             upper = 0
    #13 0xffffffff8101ad1a in smb_iod_thread (arg=<value optimized out>) at
    /cache/src/sys/modules/smbfs/../../netsmb/smb_iod.c:206
             iod = (struct smbiod *) 0xfffffe009483b800
    #14 0xffffffff805365df in fork_exit (callout=0xffffffff8101aa83
    <smb_iod_thread>, arg=0xfffffe009483b800, frame=0xffffff8366917c40) at
    /cache/src/sys/kern/kern_fork.c:992
             p = (struct proc *) 0xfffffe0181104000
             td = (struct thread *) 0xfffffe0056eec490
    #15 0xffffffff8075145e in fork_trampoline () at
    /cache/src/sys/amd64/amd64/exception.S:602

Looking inside smb_iod_invrq() (smb_iod.c:91), I wonder whether an attempt is 
made to invalidate/release something twice, causing turnstile_broadcast() to 
be invoked with ts being NULL the second time. That would explain why the 
first attempt to use the absent server errors out as normal, and only the second 
attempt panics.

My kernel is 9.1-PRERELEASE as of Dec 19. Any ideas? Thanks! Yours,

    -mi



