From: Karim
Date: Mon, 17 Oct 2011 10:28:15 -0400
To: pyunyh@gmail.com
Cc: freebsd-net@freebsd.org
Subject: Re: if_msk.c link negotiation / packet drops [solved!]
Message-ID: <4E9C3B7F.8090700@gmail.com>
In-Reply-To: <20111012192730.GB9138@michelle.cdnetworks.com>

Hi,

On 11-10-12 03:27 PM, YongHyeon PYUN wrote:
> On Wed, Oct 12, 2011 at 02:35:23PM -0400, Karim wrote:
>> Hi,
>> On 11-10-12 01:03 PM, YongHyeon PYUN wrote:
>>> On Wed, Oct 12, 2011 at 10:07:02AM -0400, Karim wrote:
> [...]
>
>>> Hmm, that indicates driver lost established link. msk(4) will
>>> detect this condition and stop RX/TX MACs until it knows PHY
>>> re-established a link. This may be the reason why you see occasional
>>> packet drops. However I don't know why PHY loses established link
>>> in the middle of working.
>>>
>> Yes, I am convinced this lost of link is related to the packet drops as
>> well. At this point we can safely discard cabling issues or router
>> issues (physical ones that is) since the same happens on a different
>> network with different cables.
>>>> From the code in e1000phy_status:
>>>>
>>>> static void
>>>> e1000phy_status(struct mii_softc *sc)
>>>> {
>>>>         struct mii_data *mii = sc->mii_pdata;
>>>>         int bmcr, bmsr, ssr;
>>>>
>>>>         mii->mii_media_status = IFM_AVALID;
>>>>         mii->mii_media_active = IFM_ETHER;
>>>>
>>>>         bmsr = PHY_READ(sc, E1000_SR) | PHY_READ(sc, E1000_SR);
>>>>         bmcr = PHY_READ(sc, E1000_CR);
>>>>         ssr = PHY_READ(sc, E1000_SSR);
>>>>
>>>>         if (bmsr & E1000_SR_LINK_STATUS)
>>>>                 mii->mii_media_status |= IFM_ACTIVE;
>>>>
>>>>
>>>> I can see the bmsr & E1000_SR_LINK_STATUS check failing when the problem
>>>> occurs. As a side note why are we ORing the same call twice isn't the
>>>> same thing as calling it once:
>>>>
>>>>         bmsr = PHY_READ(sc, E1000_SR) | PHY_READ(sc, E1000_SR);
>>>>
>>> The E1000_SR_LINK_STATUS bit is latched low so it should be read
>>> twice.
>>> If you want to read once use E1000_SSR_LINK bit of
>>> E1000_SSR register but I remember that bit was not reliable on some
>>> PHY models.
>> Thanks for the explanation and the alternative. The ssr register seems
>> to give me the right bit (E1000_SSR_LINK) but it also gives me an extra
>> bit 0x0100 that is not defined in e1000phyreg.h. Any idea what that bit
>> would be/means?
>>
> I guess it's related with advanced power saving. It would indicate
> current Energy detect status in PHY POV.
> Generally Marvell's PHY will enter into automatic power saving mode
> when it does not see any energy signal on the link. I don't know
> exact time when it enters into that mode but it would take less
> than 10 seconds if PHY do not see energy signal from link partner
> once it initiated auto-negotiation.
> However, e1000phy(4) always disables energy detect feature in
> e1000phy_reset() so it wouldn't affect your issue, I guess.
>
> One interesting thing is that 0x100 of E1000_SSR register indicates
> energy detect status is in "Sleep mode" which means it didn't
> detect energy signal(i.e. lost link). I'm not sure whether this bit
> report correct status when energy detect feature is disabled
> though.
>
> Can you check whether your switch supports energy detect feature?
> Or if your switch support EEE feature, try disabling it.
>
>>> By chance, does your back-ported driver include r222219?
>>> If yes, did you cold boot after applying the change?
>>> Warm boot does have effect.
>> I do have this patch in the back-ported driver and due to several
>> reasons I didn't cold boot the appliance. We will give that a try and see.
>>
> Ok, let me know whether that makes any difference or not.
>
>> To be more precises I have included msk patches up to r222516.
>>
>> Thanks!
> [...]

After a weekend of testing I can confirm the problem is gone with the
msk driver back-ported from FreeBSD 9 and a little bit of patching.

Apart from the packet drops I also had various reports from my snmp trap
daemon. It was reporting that the interface was going inactive, and for a
while I thought the packet drops and the inactivity reports were linked.
It turned out there was a small race condition between the various
polling components in msk_mediastatus() that was confusing the snmp
daemon, while the packet drops were solved by the back port.

The race can be easily solved with the following patch:

@@ -995,9 +996,11 @@ msk_mediastatus(struct ifnet *ifp, struct ifmediareq *ifmr)
 	mii = device_get_softc(sc_if->msk_miibus);
 	mii_pollstat(mii);
-	MSK_IF_UNLOCK(sc_if);
+
 	ifmr->ifm_active = mii->mii_media_active;
 	ifmr->ifm_status = mii->mii_media_status;
+
+	MSK_IF_UNLOCK(sc_if);
 }

Without moving the msk unlock down, it is possible for one thread to see
mii_media_status reset to IFM_AVALID in e1000phy_status() right before
its assignment to ifmr->ifm_status. This resulted in false reports of
interface inactivity on rare occasions between a kernel based probe and
the snmp trap daemon.

Thanks to everyone that chipped in to help,

Karim.
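
For anyone who wants to see the ordering issue outside the kernel, here is
a minimal user-space sketch of the patched ordering, using pthreads. It is
not driver code: fake_mii, fake_pollstat(), fake_mediastatus(),
run_two_callers() and the FAKE_IFM_* values are all made-up stand-ins for
the mii state, mii_pollstat(), msk_mediastatus() and the IFM_* flags. The
only point it tries to show is that the status is copied out while the
lock is still held, so a concurrent poll cannot reset it in between.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the IFM_* flags; values are illustrative only. */
#define FAKE_IFM_AVALID 0x1
#define FAKE_IFM_ACTIVE 0x2

/* Hypothetical stand-in for the shared mii state. */
struct fake_mii {
    pthread_mutex_t lock;   /* plays the role of the msk interface lock */
    int media_status;       /* plays the role of mii_media_status */
};

/* Mimics the shape of e1000phy_status(): reset to AVALID first, then add
 * ACTIVE if the link is up. The caller must hold m->lock. */
static void fake_pollstat(struct fake_mii *m, bool link_up)
{
    m->media_status = FAKE_IFM_AVALID;   /* transient "inactive" window */
    if (link_up)
        m->media_status |= FAKE_IFM_ACTIVE;
}

/* Patched ordering: poll and copy under the same lock, unlock last. */
static int fake_mediastatus(struct fake_mii *m)
{
    int status;

    pthread_mutex_lock(&m->lock);
    fake_pollstat(m, true);
    status = m->media_status;            /* snapshot taken before unlock */
    pthread_mutex_unlock(&m->lock);
    return status;
}

struct worker_arg {
    struct fake_mii *mii;
    int iterations;
    int bad;                             /* snapshots that lacked ACTIVE */
};

static void *worker(void *p)
{
    struct worker_arg *a = p;

    for (int i = 0; i < a->iterations; i++) {
        if (!(fake_mediastatus(a->mii) & FAKE_IFM_ACTIVE))
            a->bad++;
    }
    return NULL;
}

/* Runs two concurrent callers (think: kernel probe and snmp daemon) and
 * returns how many snapshots were missing ACTIVE. Error handling for the
 * pthread calls is omitted to keep the sketch short. */
static int run_two_callers(int iterations)
{
    struct fake_mii mii = { .media_status = 0 };
    struct worker_arg a = { &mii, iterations, 0 };
    struct worker_arg b = { &mii, iterations, 0 };
    pthread_t ta, tb;

    pthread_mutex_init(&mii.lock, NULL);
    pthread_create(&ta, NULL, worker, &a);
    pthread_create(&tb, NULL, worker, &b);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    pthread_mutex_destroy(&mii.lock);
    return a.bad + b.bad;
}
```

With this ordering, run_two_callers() should always return 0 because the
link is simulated as up. If the copy in fake_mediastatus() were moved
after the unlock, the other thread could sit between its reset and its
|= when the copy happens, which is the window the patch closes.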