From owner-freebsd-hackers@FreeBSD.ORG Mon Mar 16 18:56:08 2015
Date: Mon, 16 Mar 2015 20:56:02 +0200
From: Konstantin Belousov
To: Michael Fuckner
Cc: freebsd-hackers@freebsd.org, Ryan Stone, Steven Hartland
Subject: Re: Server with 3TB Crashing at boot
Message-ID: <20150316185602.GJ2379@kib.kiev.ua>
In-Reply-To: <55072195.40609@fuckner.net>

On Mon, Mar 16, 2015 at 07:31:49PM +0100, Michael Fuckner wrote:
> first patch doesn't look good, looks like ahci explodes

The patch I sent only changes ixgbe code.  This is not ahci; it is nvme
causing the trouble.  The bug is a genuine DMAR bug: the driver specified
an allocation with a size greater than the boundary, and the code failed
to split the request.

>
> ahcich0: AHCI reset...
> ahcich0: SATA connect timeout time=10000us status=00000000
> ahcich0: AHCI reset: device not found
> ahcich1: AHCI reset...
> ahcich1: SATA connect time=1800us status=00000113
> ahcich1: AHCI reset: device found
> ahcich1: AHCI reset: device ready after 0ms
> ahcich2: AHCI reset...
> ahcich2: SATA connect timeout time=10000us status=00000000
> ahcich2: AHCI reset: device not found
> ahcich3: AHCI reset...
> ahcich3: SATA connect timeout time=10000us status=00000000
> ahcich3: AHCI reset: device not found
> ahcich4: panic: boundary failed: ctx 0xfffff801a4c2ca00 start 0x131000
> end 0x133000 boundary 0x1000
> cpuid = 0
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
> 0xffffffff81c5e6f0
> vpanic() at vpanic+0x189/frame 0xffffffff81c5e770
> kassert_panic() at kassert_panic+0x132/frame 0xffffffff81c5e7e0
> dmar_bus_dmamap_load_something() at
> dmar_bus_dmamap_load_something+0x35e/frame 0xffffffff81c5e890
> dmar_bus_dmamap_load_buffer() at dmar_bus_dmamap_load_buffer+0x246/frame
> 0xffffffff81c5e910
> bus_dmamap_load() at bus_dmamap_load+0x8d/frame 0xffffffff81c5e990
> _nvme_qpair_submit_request() at _nvme_qpair_submit_request+0x1ca/frame
> 0xffffffff81c5e9e0
> nvme_qpair_submit_request() at nvme_qpair_submit_request+0x38/frame
> 0xffffffff81c5ea10
> nvme_ctrlr_start() at nvme_ctrlr_start+0x7b/frame 0xffffffff81c5ea80
> nvme_ctrlr_start_config_hook() at nvme_ctrlr_start_config_hook+0xe/frame
> 0xffffffff81c5eaa0
> run_interrupt_driven_config_hooks() at
> run_interrupt_driven_config_hooks+0x7c/frame 0xffffffff81c5eac0
> boot_run_interrupt_driven_config_hooks() at
> boot_run_interrupt_driven_config_hooks+0x20/frame 0xffffffff81c5eb50
> mi_startup() at mi_startup+0x118/frame 0xffffffff81c5eb70
> btext() at btext+0x2c
> KDB: enter: panic
> [ thread pid 0 tid 100000 ]
> Stopped at kdb_enter+0x3e: movq $0,kdb_why
> db>
>
> http://dedi3.fuckner.net/~molli123/temp/11-ixgbe.log
>
> I'll reboot now and check if I patched the file correctly. But this
> takes 45min.
>
> At least I figured out how to remove the empty Lines (Ctrl-a, shift-A).
> Don't run minicom inside a screen ;-)

I do not quite understand why such programs as minicom are needed at all.
Isn't tip (AKA cu) good enough?

Please add the following patch to the kernel.

diff --git a/sys/x86/iommu/intel_gas.c b/sys/x86/iommu/intel_gas.c
index 51ad151..aa59c1b 100644
--- a/sys/x86/iommu/intel_gas.c
+++ b/sys/x86/iommu/intel_gas.c
@@ -327,13 +327,15 @@ dmar_gas_match_one(struct dmar_gas_match_args *a, struct dmar_map_entry *prev,
 	start = roundup2(bs, a->common->alignment);
 	/* DMAR_PAGE_SIZE to create gap after new entry. */
 	if (start + a->size + DMAR_PAGE_SIZE <= prev->end + prev->free_after &&
-	    start + a->size <= end) {
+	    start + a->size <= end && dmar_test_boundary(start, a->size,
+	    a->common->boundary)) {
 		a->entry->start = start;
 		return (true);
 	}
 
 	/*
-	 * Not enough space to align at boundary, but allowed to split.
+	 * Not enough space to align at the requested boundary, or
+	 * boundary is smaller than the size, but allowed to split.
 	 * We already checked that start + size does not overlap end.
 	 *
 	 * XXXKIB.  It is possible that bs is exactly at the start of
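
To illustrate the condition that the added dmar_test_boundary() call in
the hunk above guards against, here is a minimal userland sketch.  The
helper name fits_within_boundary() is made up for this illustration and
is not the kernel function; it assumes the boundary is either 0 (no
restriction) or a power of two.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper for illustration only, not the kernel's
 * dmar_test_boundary().  Returns true when the range
 * [start, start + size) does not cross a multiple of boundary.
 * Assumption: boundary is 0 (unrestricted) or a power of two.
 */
bool
fits_within_boundary(uint64_t start, uint64_t size, uint64_t boundary)
{
	uint64_t next;

	if (boundary == 0)
		return (true);
	/* First multiple of boundary strictly above start. */
	next = (start + boundary) & ~(boundary - 1);
	return (start + size <= next);
}
```

With the values from the panic message (start 0x131000, end 0x133000,
so size 0x2000, and boundary 0x1000) the check fails, because the size
alone exceeds the boundary; no placement can satisfy it and the request
has to be split.  A single 0x1000-byte piece at the same start passes.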