Date: Mon, 24 Oct 2016 14:03:37 +0200
From: "O. Hartmann"
To: YongHyeon PYUN
Cc: FreeBSD CURRENT
Subject: Re: CURRENT: re(4) crashing system
Message-ID: <20161024140337.47af924e@freyja.zeit4.iv.bundesimmobilien.de>
In-Reply-To: <20161024051359.GA1185@michelle.fasterthan.co.kr>
References: <20161023132538.6bf55fb2@hermann> <20161024051359.GA1185@michelle.fasterthan.co.kr>
Organization: FU Berlin
List-Id: Discussions about the use of FreeBSD-current

On Mon, 24 Oct 2016 14:14:00 +0900
YongHyeon PYUN wrote:

> On Sun, Oct 23, 2016 at 01:25:38PM +0200, Hartmann, O. wrote:
> > I tried to report earlier here that CURRENT does have some serious
> > problems right now and one of those problems seems to be triggered by
> > the recent re(4) driver. The problem is also present in recent 11-STABLE!
> >
> > Below, you'll find pciconf output regarding the device on a Lenovo E540
> > laptop on which I can test and trigger the problem.
> >
> > The phenomenon is that this NIC does not negotiate 1000baseTX; it
> > always falls back to 100baseTX although the device claims to be a
> > 1 GBit capable device.
> >
> > When I try to put the device manually into 1000baseTX mode via
> >
> > ifconfig re0 media 1000baseTX mediaopt full-duplex (with re(4) driver)
> >
> > it is possible to crash the system. The system also crashes when
> > plugging/unplugging the LAN cord - I guess the renegotiation is
> > triggering this crash immediately.
> >
> > I tried with several switches and routers capable of 1 GBit and it
> > seems to be independent from the network hardware in use.
> >
> > I tried to capture a backtrace when the kernel crashes, but I do not
> > know how to save the kernel debugger output. Although I configured
> > debugging according to the handbook, there is no coredump at all.
> >
> > Advice is appreciated - if anybody is interested in solving this.
> >
>
> There were several instability reports on re(4). I vaguely guess
> it would be related to some missing initializations for certain
> controllers. Unfortunately, there is no publicly available
> datasheet for those controllers and it's not likely that we will get
> access to one in the near future. It seems the vendor's FreeBSD driver
> accesses lots of magic registers as well as loading DSP fixups. I have
> no idea what it wants to do, and re(4) used to rely heavily on power-on
> default register values. The engineering samples I have do not show
> instabilities, so it wouldn't be easy to identify the issue.
>
> Probably the first step to address the issue would be identifying
> those chips and narrowing down the scope of guessing. Would you
> show me the dmesg output (re(4) and rgephy(4) only)? pciconf(8)
> output is useless here since RealTek uses the same PCI id for
> PCIe variants.
>
> BTW, I was told that the vendor's FreeBSD driver seems to work fine
> for normal usage patterns. The vendor's driver triggered an instant
> panic and lacked H/W offloading features in the past. It might
> have changed though.

The problems with the re(4) driver arose again when I bought some
"green" equipment, mainly switches, which reduce power output on
short cables or unconnected ports. This immediately brought down some
servers with re(4) chipsets and I had no clue what had happened. I do
not know whether this is an isolated case, so to speak, or whether this
problem will arise for others, too. On our server hardware we replaced
all Realtek NICs with ones from Intel, and luckily some server
mainboards already have Intel PHYs or NICs.
The Broadcom devices we have in some older Fujitsu hardware are also
stable and work like a charm, even with the new power-saving switches.

While we can swap the NIC on server or workstation platforms, it is
almost impossible on laptops, and the number of laptops with Realtek
chips seems to grow. It is a pity that the vendor of the chipsets
refuses to support OSes other than Windows - or, in some rare cases,
Linux. After you wrote your answer, I checked on the net for suitable
drivers, and the situation seems bad for almost all OSes apart from
commercial ones like Windooze and Apple OS X.

As soon as I get my hands on the laptop again, I'll send the requested
information. I know that I played around with re(4) and rgephy(4) in
the kernel; rgephy(4) showed up in the dmesg output, but I didn't see
any effect - except that it offered some additional
"media xxx-options-xxx" settings, mostly with "flow" appended - but
trying them also brought down the system, just as plugging or
unplugging did. The last kernel I compiled was then without
rgephy(4) - the NIC worked as expected, but plugging/unplugging or
some power-down activity on a Netgear SoHo green-power switch brings
the system down as usual.
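
For anyone collecting the dmesg lines requested above, the re(4) and
rgephy(4) entries can be filtered out of the boot messages. The
following is only a minimal sketch: it assumes the boot log is saved in
/var/run/dmesg.boot (the usual place on FreeBSD), and filter_re_lines is
a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Hypothetical helper (not part of FreeBSD): keep only the lines that
# start with a re(4) or rgephy(4) device name such as "re0:" or "rgephy0:".
filter_re_lines() {
    grep -E '^(re|rgephy)[0-9]+:'
}

# Assumption: the boot messages are saved in /var/run/dmesg.boot.
boot_log=/var/run/dmesg.boot
if [ -r "$boot_log" ]; then
    filter_re_lines < "$boot_log" || echo "no re(4)/rgephy(4) lines in $boot_log"
else
    echo "$boot_log not readable; try: dmesg | grep -E '^(re|rgephy)[0-9]+:'"
fi
```

The pattern is anchored at the start of the line, so messages from
other drivers that merely mention re0 are left out; loosen it if those
lines are wanted as well.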