From: Andriy Gapon
Date: Tue, 29 May 2018 16:50:14 +0300
Subject: Bad link elm in vm_object_terminate [Was: crash on process exit.. current at about r332467]
To: freebsd-current
Cc: Julian Elischer, Bryan Drewery, Mark Johnston
Message-ID: <9bf4b2b0-65a2-90ef-c8c0-3022e80bc149@FreeBSD.org>
In-Reply-To: <9479e941-39be-e6e2-869e-aac475c5e33a@freebsd.org>
References: <9479e941-39be-e6e2-869e-aac475c5e33a@freebsd.org>

On 23/04/2018 17:50, Julian Elischer wrote:
> back trace at:  http://www.freebsd.org/~julian/bob-crash.png
>
> If anyone wants to take a look..
>
> In the exit syscall, while deallocating a vm object.
>
> I haven't see references to a similar crash in the last 10 days or so.. But if
> it rings any bells...

We have just got another one:

panic: Bad link elm 0xfffff80cc3938360 prev->next != elm

Matching the disassembled code to the C code, it seems that the crash is
somewhere in vm_object_terminate_pages (inlined into vm_object_terminate),
probably in one of the TAILQ_REMOVE-s there:

		if (p->queue != PQ_NONE) {
			KASSERT(p->queue < PQ_COUNT, ("vm_object_terminate: "
			    "page %p is not queued", p));
			pq1 = vm_page_pagequeue(p);
			if (pq != pq1) {
				if (pq != NULL) {
					vm_pagequeue_cnt_add(pq, dequeued);
					vm_pagequeue_unlock(pq);
				}
				pq = pq1;
				vm_pagequeue_lock(pq);
				dequeued = 0;
			}
			p->queue = PQ_NONE;
			TAILQ_REMOVE(&pq->pq_pl, p, plinks.q);
			dequeued--;
		}
		if (vm_page_free_prep(p, true))
			continue;
unlist:
		TAILQ_REMOVE(&object->memq, p, listq);
	}

Please note that this is the code before r332974 ("Improve VM page queue
scalability"). I am not sure whether r332974 + r333256 would fix the problem or
just move it to a different place.

Does this ring a bell for anyone who has tinkered with that part of the VM code
recently?

Looking a little bit further, I think that object->memq somehow got corrupted.
memq contains just two elements and the reported element is not there.
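
For readers unfamiliar with the message: as far as I can tell, it comes from the
debug check that the kernel's sys/queue.h performs before TAILQ_REMOVE when
INVARIANTS is enabled, which verifies that *(elm->field.tqe_prev) == elm. The
following is only a userland sketch of that condition, not the kernel code.
struct page, link_prev_ok(), on_list() and demo_stale_link() are hypothetical
names made up for illustration, and the stale element is fabricated on the
assumption that its tqe_prev holds the address of the list head's tqh_first
while the head points at a different element.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for struct vm_page: only the fields needed here. */
struct page {
	TAILQ_ENTRY(page) listq;	/* linkage on the object's memq */
	unsigned long pindex;
};

TAILQ_HEAD(pglist, page);

/* The condition whose failure is reported as "prev->next != elm". */
static bool
link_prev_ok(const struct page *p)
{
	return (*p->listq.tqe_prev == p);
}

/* Walk the list through tqe_next and report whether p is reachable. */
static bool
on_list(struct pglist *head, const struct page *p)
{
	struct page *it;

	TAILQ_FOREACH(it, head, listq)
		if (it == p)
			return (true);
	return (false);
}

/*
 * Build a two-element list, then fabricate a third element that still
 * claims to be the first one on the list.  Returns 1 if that element
 * fails the link check and is not reachable from the head.
 */
static int
demo_stale_link(void)
{
	struct pglist memq;
	struct page first = { .pindex = 515 };
	struct page last = { .pindex = 1548 };
	struct page stale = { .pindex = 1548 };

	TAILQ_INIT(&memq);
	TAILQ_INSERT_TAIL(&memq, &first, listq);
	TAILQ_INSERT_TAIL(&memq, &last, listq);

	/* Assumed stale linkage: points at the head, but the head disagrees. */
	stale.listq.tqe_next = NULL;
	stale.listq.tqe_prev = &memq.tqh_first;

	/* Properly linked elements pass the check and are reachable. */
	assert(link_prev_ok(&first) && link_prev_ok(&last));
	assert(on_list(&memq, &first) && on_list(&memq, &last));

	return (!link_prev_ok(&stale) && !on_list(&memq, &stale));
}
```

Calling demo_stale_link() from a small main() is enough to try it. Two elements
that carry the same tqe_prev value cannot both satisfy the check, which is the
situation the listq fields marked in the kgdb output show.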

(kgdb) p *(struct vm_page *)0xfffff80cc3938360
$22 = {
  plinks = {
    q = {
      tqe_next = 0xfffff80cd7175398,
      tqe_prev = 0xfffff80cb9f69170
    },
    s = {
      ss = {
        sle_next = 0xfffff80cd7175398
      },
      pv = 0xfffff80cb9f69170
    },
    memguard = {
      p = 18446735332764767128,
      v = 18446735332276081008
    }
  },
  listq = {
    tqe_next = 0xfffff80cc3938568, <=============
    tqe_prev = 0xfffff8078c11b848 <=============
  },
  object = 0x0,
  pindex = 1548,
  phys_addr = 14695911424,
  md = {
    pv_list = {
      tqh_first = 0x0,
      tqh_last = 0xfffff80cc3938398
    },
    pv_gen = 1205766,
    pat_mode = 6
  },
  wire_count = 0,
  busy_lock = 1,
  hold_count = 0,
  flags = 0,
  aflags = 0 '\000',
  oflags = 0 '\000',
  queue = 255 '\377',
  psind = 0 '\000',
  segind = 5 '\005',
  order = 13 '\r',
  pool = 0 '\000',
  act_count = 5 '\005',
  valid = 0 '\000',
  dirty = 0 '\000'
}

(kgdb) p object->memq
$11 = {
  tqh_first = 0xfffff80cb861cfb8,
  tqh_last = 0xfffff80cc3938780
}

(kgdb) p *object->memq.tqh_first
$25 = {
  plinks = {
    q = {
      tqe_next = 0xfffff80cb9f69108,
      tqe_prev = 0xfffff80cd7175398
    },
    s = {
      ss = {
        sle_next = 0xfffff80cb9f69108
      },
      pv = 0xfffff80cd7175398
    },
    memguard = {
      p = 18446735332276080904,
      v = 18446735332764767128
    }
  },
  listq = {
    tqe_next = 0xfffff80cb56eafb0, <=============
    tqe_prev = 0xfffff8078c11b848 <=============
  },
  object = 0xfffff8078c11b800,
  pindex = 515,
  phys_addr = 7299219456,
  md = {
    pv_list = {
      tqh_first = 0xfffff80b99e4ff88,
      tqh_last = 0xfffff80b99e4ff90
    },
    pv_gen = 466177,
    pat_mode = 6
  },
  wire_count = 0,
  busy_lock = 2,
  hold_count = 0,
  flags = 0,
  aflags = 0 '\000',
  oflags = 0 '\000',
  queue = 255 '\377',
  psind = 0 '\000',
  segind = 5 '\005',
  order = 13 '\r',
  pool = 0 '\000',
  act_count = 5 '\005',
  valid = 255 '\377',
  dirty = 0 '\000'
}

(kgdb) p *object->memq.tqh_first->listq.tqe_next
$26 = {
  plinks = {
    q = {
      tqe_next = 0x0,
      tqe_prev = 0xfffff80cc92e1d18
    },
    s = {
      ss = {
        sle_next = 0x0
      },
      pv = 0xfffff80cc92e1d18
    },
    memguard = {
      p = 0,
      v = 18446735332531379480
    }
  },
  listq = {
    tqe_next = 0x0, <=============
    tqe_prev = 0xfffff80cb861cfc8 <=============
  },
  object = 0xfffff8078c11b800,
  pindex = 1548,
  phys_addr = 5350158336,
  md = {
    pv_list = {
      tqh_first = 0xfffff80a07222808,
      tqh_last = 0xfffff80a07222810
    },
    pv_gen = 7085,
    pat_mode = 6
  },
  wire_count = 0,
  busy_lock = 1,
  hold_count = 0,
  flags = 0,
  aflags = 1 '\001',
  oflags = 0 '\000',
  queue = 1 '\001',
  psind = 0 '\000',
  segind = 5 '\005',
  order = 13 '\r',
  pool = 0 '\000',
  act_count = 5 '\005',
  valid = 255 '\377',
  dirty = 255 '\377'
}

Pages 0xfffff80cc3938360 (the reported one) and 0xfffff80cb56eafb0 (the last one
on memq) have the same index 1548. Also, memq.tqh_last points to the reported
page, but it is not reachable via tqe_next pointers.

It is also potentially interesting that the reported page looks like it has
already been freed, while the replacement page is both valid and dirty.

The object, just in case:

(kgdb) p *object
$34 = {
  lock = {
    lock_object = {
      lo_name = 0xffffffff81202c27 "vm object",
      lo_flags = 627245056,
      lo_data = 0,
      lo_witness = 0xfffff80cffd6a700
    },
    rw_lock = 18446735286009226592
  },
  object_list = {
    tqe_next = 0xfffff80b2481e200,
    tqe_prev = 0xfffff80b2481e020
  },
  shadow_head = {
    lh_first = 0x0
  },
  shadow_list = {
    le_next = 0xfffff809c070f900,
    le_prev = 0xfffff80869c06c30
  },
  memq = {
    tqh_first = 0xfffff80cb861cfb8,
    tqh_last = 0xfffff80cc3938780
  },
  rtree = {
    rt_root = 18446735279843613792
  },
  size = 1561,
  domain = {
    dr_policy = 0x0,
    dr_iterator = 0
  },
  generation = 1,
  ref_count = 0,
  shadow_count = 0,
  memattr = 6 '\006',
  type = 0 '\000',
  flags = 12296,
  pg_color = 1809,
  paging_in_progress = 0,
  resident_page_count = 5,
  backing_object = 0x0,
  backing_object_offset = 0,
  pager_object_list = {
    tqe_next = 0x0,
    tqe_prev = 0x0
  },
  rvq = {
    lh_first = 0xfffff80cad278b60
  },
  handle = 0x0,
  un_pager = {
    vnp = {
      vnp_size = 19444,
      writemappings = 0
    },
    devp = {
      devp_pglist = {
        tqh_first = 0x4bf4,
        tqh_last = 0x0
      },
      ops = 0x0,
      dev = 0x0
    },
    sgp = {
      sgp_pglist = {
        tqh_first = 0x4bf4,
        tqh_last = 0x0
      }
    },
    swp = {
      swp_tmpfs = 0x4bf4,
      swp_blks = {
        pt_root = 0
      }
    }
  },
  cred = 0xfffff806811adc00,
  charge = 6393856,
  umtx_data = 0x0
}

It is interesting that the object is on a shadow list.

-- 
Andriy Gapon