From owner-freebsd-current@freebsd.org Sun Nov 18 00:35:59 2018 Return-Path: Delivered-To: freebsd-current@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 1DD0B110D5BF for ; Sun, 18 Nov 2018 00:35:59 +0000 (UTC) (envelope-from markjdb@gmail.com) Received: from mail-io1-xd44.google.com (mail-io1-xd44.google.com [IPv6:2607:f8b0:4864:20::d44]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 56D9D8979A for ; Sun, 18 Nov 2018 00:35:58 +0000 (UTC) (envelope-from markjdb@gmail.com) Received: by mail-io1-xd44.google.com with SMTP id u19so15588963ioc.2 for ; Sat, 17 Nov 2018 16:35:58 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-disposition:content-transfer-encoding:in-reply-to :user-agent; bh=q8baKP3wnTXN98j6uvq0rd/YPSoJ5l+TfVRzw7HVtt8=; b=EvMKNGKpunXtpWnYzOz83QMxS7bHKLyaj7jZGZ3z/YF0cFKeJN4dO4Qvo1kd2qrQXE eQ6jk1jx68ypd3f15/+VDfhkwUSihmmspWJEfVmaaP7z6qXpZX6QvWEn1jl6vzEgGzm4 b+RAELkXE3RBLTGbjrihON7+DrltGVPHLeHmjGraeXsu6NdbFkqObCc2q56ItAb76DoO zYr9lWwqoOQkyH4nd1oTD7eaGd42Dn2xDB3/NRMHxkej6cYvxs6p4Kw4GNr6hx9XuyHq cwKt5RjtvuMjV8+p1qHJ/QaIpShEE6jI/bzJiAWCNRlFTgD5sX5NbeMR/Dp7FRDOQgn/ 4YXg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:date:from:to:cc:subject:message-id :references:mime-version:content-disposition :content-transfer-encoding:in-reply-to:user-agent; bh=q8baKP3wnTXN98j6uvq0rd/YPSoJ5l+TfVRzw7HVtt8=; b=hDbMkgxLi1QBgrfPXfXKEkWjD/JSqy7xrgBEwIe8leNHnaJy0rAvKI4gbRWiANhO+Z Jitri+n62dbJ9nTOuUpUm02JRHVxs65HE1uRCmhSNFwlvWDQJiY7nTzq6Bw3qEsiA9A1 CR1uOXEgtkbsm3VIkauZzLa2blwVwLKqVa2cg7/0ks3InQ31Hah3FzyY8mr8QCvXKV+E X1RaPu2kAfgSe8osqmYw5A6mw3TPL1zGvAT6Xg6cUSl6Nq+/yBoBFaqM/EtQaYx1eRA5 4bd4y9Jm2dtIEF5u0t+9YALrMb3q6CUMjV9+CUJ1CRacS/gFtJobUfTh73BMLq414oBx dG0g== X-Gm-Message-State: AA+aEWYnuluEqIjXNkOH/fJ2iG2qX+XBEAVQhOR6pZUxMj/9RLSL2kkK LtsNESVujjLZDrKzDHXmNZo8KRlnQYw= X-Google-Smtp-Source: AFSGD/Xm6YW+MW732ouZSWNw/niFzs8LeRpUWglBhwJGFXRrbIKCAXGkve58XSVGFVsG1BrHwVqNNA== X-Received: by 2002:a6b:5116:: with SMTP id f22mr5285398iob.28.1542501357615; Sat, 17 Nov 2018 16:35:57 -0800 (PST) Received: from raichu (toroon0560w-lp130-07-64-229-95-98.dsl.bell.ca. [64.229.95.98]) by smtp.gmail.com with ESMTPSA id o14-v6sm12762585ito.3.2018.11.17.16.35.56 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Sat, 17 Nov 2018 16:35:56 -0800 (PST) Sender: Mark Johnston Date: Sat, 17 Nov 2018 19:35:54 -0500 From: Mark Johnston To: Sylvain GALLIANO Cc: freebsd-current@freebsd.org Subject: Re: Panic on kern_event.c Message-ID: <20181118003554.GC2799@raichu> References: <20181107043503.GB30861@raichu> <20181115221019.GA2514@raichu> <20181116154210.GB17379@raichu> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: User-Agent: Mutt/1.10.1 (2018-07-13) X-Rspamd-Queue-Id: 56D9D8979A X-Spamd-Result: default: False [-1.30 / 15.00]; ARC_NA(0.00)[]; RCVD_VIA_SMTP_AUTH(0.00)[]; R_DKIM_ALLOW(-0.20)[gmail.com]; FROM_HAS_DN(0.00)[]; TO_DN_SOME(0.00)[]; R_SPF_ALLOW(-0.20)[+ip6:2607:f8b0:4000::/36]; MIME_GOOD(-0.10)[text/plain]; PREVIOUSLY_DELIVERED(0.00)[freebsd-current@freebsd.org]; DMARC_NA(0.00)[freebsd.org]; NEURAL_HAM_SHORT(-0.95)[-0.947,0]; RCVD_COUNT_THREE(0.00)[3]; TO_MATCH_ENVRCPT_SOME(0.00)[]; DKIM_TRACE(0.00)[gmail.com:+]; MX_GOOD(-0.01)[cached: alt3.gmail-smtp-in.l.google.com]; RCPT_COUNT_TWO(0.00)[2]; RCVD_IN_DNSWL_NONE(0.00)[4.4.d.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.0.0.4.6.8.4.0.b.8.f.7.0.6.2.list.dnswl.org : 127.0.5.0]; RCVD_TLS_LAST(0.00)[]; FORGED_SENDER(0.30)[markj@freebsd.org,markjdb@gmail.com]; MID_RHS_NOT_FQDN(0.50)[]; FREEMAIL_ENVFROM(0.00)[gmail.com]; ASN(0.00)[asn:15169, ipnet:2607:f8b0::/32, country:US]; FROM_NEQ_ENVFROM(0.00)[markj@freebsd.org,markjdb@gmail.com]; IP_SCORE(-0.65)[ip: (1.23), ipnet: 2607:f8b0::/32(-2.61), asn: 15169(-1.76), country: US(-0.10)] X-Rspamd-Server: mx1.freebsd.org X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 18 Nov 2018 00:35:59 -0000 On Fri, Nov 16, 2018 at 05:56:54PM +0100, Sylvain GALLIANO wrote: > Le ven. 16 nov. 2018 à 16:42, Mark Johnston a écrit : > > > On Fri, Nov 16, 2018 at 03:47:39PM +0100, Sylvain GALLIANO wrote: > > > Le jeu. 15 nov. 2018 à 23:10, Mark Johnston a écrit > > : > > > > > > > On Thu, Nov 08, 2018 at 05:05:03PM +0100, Sylvain GALLIANO wrote: > > > > > Hi, > > > > > > > > > > I replaced > > > > > << printf("XXX knote %p already in tailq status:%x kq_count:%d [%p > > %p] > > > > > > > > > > > %u\n",kn,kn->kn_status,kq->kq_count,kn->kn_tqe.tqe_next,kn->kn_tqe.tqe_prev,__LINE__); > > > > > by > > > > > >> panic("XXX knote %p already in tailq status:%x kq_count:%d [%p > > %p] > > > > > > > > > > > %u\n",kn,kn->kn_status,kq->kq_count,kn->kn_tqe.tqe_next,kn->kn_tqe.tqe_prev,__LINE__); > > > > > > > > > > Here is the stack during panic: > > > > > panic: XXX knote 0xfffff801e1c6ddc0 already in tailq status:1 > > kq_count:2 > > > > > [0 0xfffff8000957a978] 2671 > > > > > > > > > Could you please give the following patch a try? > > > > > > > > If possible, could you also ktrace one of the active syslog-ng > > processes > > > > for some time, perhaps 15 seconds, and share the kdump? I have been > > > > trying to reproduce the problem without any luck. > > > > > > > Unfortunately patched kernel is not stable: > > > - some processes run at 100% CPU (STOP state) and cannot be killed > > > - sometime the system completely freeze (need a hard reboot) Could you please give this updated a patch a try? diff --git a/sys/kern/kern_event.c b/sys/kern/kern_event.c index d9c670e29d60..f989d612bac4 100644 --- a/sys/kern/kern_event.c +++ b/sys/kern/kern_event.c @@ -224,6 +224,7 @@ SYSCTL_UINT(_kern, OID_AUTO, kq_calloutmax, CTLFLAG_RW, #define KQ_LOCK(kq) do { \ mtx_lock(&(kq)->kq_lock); \ } while (0) +#define KQ_LOCKPTR(kq) (&(kq)->kq_lock) #define KQ_FLUX_WAKEUP(kq) do { \ if (((kq)->kq_state & KQ_FLUXWAIT) == KQ_FLUXWAIT) { \ (kq)->kq_state &= ~KQ_FLUXWAIT; \ @@ -687,8 +688,11 @@ filt_timerexpire(void *knx) struct kq_timer_cb_data *kc; kn = knx; + KQ_OWNED(kn->kn_kq); kn->kn_data++; - KNOTE_ACTIVATE(kn, 0); /* XXX - handle locking */ + + if (!kn_in_flux(kn) || (kn->kn_status & KN_SCAN) != 0) + KNOTE_ACTIVATE(kn, 1); if ((kn->kn_flags & EV_ONESHOT) != 0) return; @@ -753,7 +757,7 @@ filt_timerattach(struct knote *kn) kn->kn_flags |= EV_CLEAR; /* automatically set */ kn->kn_status &= ~KN_DETACHED; /* knlist_add clears it */ kn->kn_ptr.p_v = kc = malloc(sizeof(*kc), M_KQUEUE, M_WAITOK); - callout_init(&kc->c, 1); + callout_init_mtx(&kc->c, KQ_LOCKPTR(kn->kn_kq), 0); filt_timerstart(kn, to); return (0); @@ -772,8 +776,10 @@ filt_timerstart(struct knote *kn, sbintime_t to) kc->next = to + sbinuptime(); kc->to = to; } + KQ_LOCK(kn->kn_kq); callout_reset_sbt_on(&kc->c, kc->next, 0, filt_timerexpire, kn, PCPU_GET(cpuid), C_ABSOLUTE); + KQ_UNLOCK(kn->kn_kq); } static void @@ -826,7 +832,6 @@ filt_timertouch(struct knote *kn, struct kevent *kev, u_long type) KQ_LOCK(kq); if (kn->kn_status & KN_QUEUED) knote_dequeue(kn); - kn->kn_status &= ~KN_ACTIVE; kn->kn_data = 0; KQ_UNLOCK(kq); @@ -1843,12 +1848,13 @@ kqueue_scan(struct kqueue *kq, int maxevents, struct kevent_copyops *k_ops, } TAILQ_INSERT_TAIL(&kq->kq_head, marker, kn_tqe); + marker->kn_status |= KN_QUEUED; influx = 0; while (count) { KQ_OWNED(kq); kn = TAILQ_FIRST(&kq->kq_head); - if ((kn->kn_status == KN_MARKER && kn != marker) || + if (((kn->kn_status & KN_MARKER) != 0 && kn != marker) || kn_in_flux(kn)) { if (influx) { influx = 0; @@ -1861,24 +1867,21 @@ kqueue_scan(struct kqueue *kq, int maxevents, struct kevent_copyops *k_ops, } TAILQ_REMOVE(&kq->kq_head, kn, kn_tqe); - if ((kn->kn_status & KN_DISABLED) == KN_DISABLED) { - kn->kn_status &= ~KN_QUEUED; - kq->kq_count--; + kn->kn_status &= ~KN_QUEUED; + if ((kn->kn_status & KN_DISABLED) == KN_DISABLED) continue; - } if (kn == marker) { KQ_FLUX_WAKEUP(kq); if (count == maxevents) goto retry; goto done; } + kq->kq_count--; KASSERT(!kn_in_flux(kn), ("knote %p is unexpectedly in flux", kn)); if ((kn->kn_flags & EV_DROP) == EV_DROP) { - kn->kn_status &= ~KN_QUEUED; kn_enter_flux(kn); - kq->kq_count--; KQ_UNLOCK(kq); /* * We don't need to lock the list since we've @@ -1888,9 +1891,7 @@ kqueue_scan(struct kqueue *kq, int maxevents, struct kevent_copyops *k_ops, KQ_LOCK(kq); continue; } else if ((kn->kn_flags & EV_ONESHOT) == EV_ONESHOT) { - kn->kn_status &= ~KN_QUEUED; kn_enter_flux(kn); - kq->kq_count--; KQ_UNLOCK(kq); /* * We don't need to lock the list since we've @@ -1910,10 +1911,9 @@ kqueue_scan(struct kqueue *kq, int maxevents, struct kevent_copyops *k_ops, if (kn->kn_fop->f_event(kn, 0) == 0) { KQ_LOCK(kq); KQ_GLOBAL_UNLOCK(&kq_global, haskqglobal); - kn->kn_status &= ~(KN_QUEUED | KN_ACTIVE | - KN_SCAN); + /* XXXMJ races with EVFILT_TIMER activation. */ + kn->kn_status &= ~(KN_ACTIVE | KN_SCAN); kn_leave_flux(kn); - kq->kq_count--; kn_list_unlock(knl); influx = 1; continue; @@ -1937,11 +1937,10 @@ kqueue_scan(struct kqueue *kq, int maxevents, struct kevent_copyops *k_ops, } if (kn->kn_flags & EV_DISPATCH) kn->kn_status |= KN_DISABLED; - kn->kn_status &= ~(KN_QUEUED | KN_ACTIVE); - kq->kq_count--; - } else - TAILQ_INSERT_TAIL(&kq->kq_head, kn, kn_tqe); - + kn->kn_status &= ~KN_ACTIVE; + } else if ((kn->kn_status & KN_QUEUED) == 0) + knote_enqueue(kn); + kn->kn_status &= ~KN_SCAN; kn_leave_flux(kn); kn_list_unlock(knl);