From owner-freebsd-net@FreeBSD.ORG Mon Oct 17 17:51:40 2011 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CEA87106566B for ; Mon, 17 Oct 2011 17:51:40 +0000 (UTC) (envelope-from pyunyh@gmail.com) Received: from mail-wy0-f182.google.com (mail-wy0-f182.google.com [74.125.82.182]) by mx1.freebsd.org (Postfix) with ESMTP id 526DD8FC16 for ; Mon, 17 Oct 2011 17:51:39 +0000 (UTC) Received: by wyi40 with SMTP id 40so2557544wyi.13 for ; Mon, 17 Oct 2011 10:51:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=from:date:to:cc:subject:message-id:reply-to:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=QSOfqkm0jXhAlfDJzCSZBCmoLtmtHuhbzugDyhKLirE=; b=Y9QypJMip0rZSh+pjHpCW86bEP/TBsx8WB4BkVBhBqIdcABuNHVVbk1dLp0H9YfdYj ayx4GKcoh5DjsJLBhXvGSyi+XXFseToqV043ZMDCjy/Y2aZZHNHm9uF2xf3i9/xcBjAA mLk7EtTC2YZ/kQ5HgYBYUTrjz4J0uAiMio748= Received: by 10.216.160.83 with SMTP id t61mr120221wek.77.1318873898995; Mon, 17 Oct 2011 10:51:38 -0700 (PDT) Received: from pyunyh@gmail.com ([174.35.1.224]) by mx.google.com with ESMTPS id q30sm16628576wbn.17.2011.10.17.10.51.35 (version=TLSv1/SSLv3 cipher=OTHER); Mon, 17 Oct 2011 10:51:37 -0700 (PDT) Received: by pyunyh@gmail.com (sSMTP sendmail emulation); Mon, 17 Oct 2011 10:49:55 -0700 From: YongHyeon PYUN Date: Mon, 17 Oct 2011 10:49:55 -0700 To: Karim Message-ID: <20111017174955.GA3509@michelle.cdnetworks.com> References: <4E94637A.5090607@gmail.com> <20111011171029.GA5661@michelle.cdnetworks.com> <4E959F06.6040906@gmail.com> <20111012170347.GA9138@michelle.cdnetworks.com> <4E95DDEB.1090500@gmail.com> <20111012192730.GB9138@michelle.cdnetworks.com> <4E9C3B7F.8090700@gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4E9C3B7F.8090700@gmail.com> User-Agent: Mutt/1.4.2.3i Cc: freebsd-net@freebsd.org Subject: Re: if_msk.c link negotiation / packet drops [solved!] X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: pyunyh@gmail.com List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Oct 2011 17:51:40 -0000 On Mon, Oct 17, 2011 at 10:28:15AM -0400, Karim wrote: > Hi, > > On 11-10-12 03:27 PM, YongHyeon PYUN wrote: > >On Wed, Oct 12, 2011 at 02:35:23PM -0400, Karim wrote: > >>Hi, > >>On 11-10-12 01:03 PM, YongHyeon PYUN wrote: > >>>On Wed, Oct 12, 2011 at 10:07:02AM -0400, Karim wrote: > >[...] > > > >>>Hmm, that indicates driver lost established link. msk(4) will > >>>detect this condition and stop RX/TX MACs until it knows PHY > >>>re-established a link. This may be the reason why you see occasional > >>>packet drops. However I don't know why PHY loses established link > >>>in the middle of working. > >>> > >>Yes, I am convinced this lost of link is related to the packet drops as > >>well. At this point we can safely discard cabling issues or router > >>issues (physical ones that is) since the same happens on a different > >>network with different cables. > >>>> From the code in e1000phy_status: > >>>> > >>>>static void > >>>>e1000phy_status(struct mii_softc *sc) > >>>>{ > >>>> struct mii_data *mii = sc->mii_pdata; > >>>> int bmcr, bmsr, ssr; > >>>> > >>>> mii->mii_media_status = IFM_AVALID; > >>>> mii->mii_media_active = IFM_ETHER; > >>>> > >>>> bmsr = PHY_READ(sc, E1000_SR) | PHY_READ(sc, E1000_SR); > >>>> bmcr = PHY_READ(sc, E1000_CR); > >>>> ssr = PHY_READ(sc, E1000_SSR); > >>>> > >>>> if (bmsr& E1000_SR_LINK_STATUS) > >>>> mii->mii_media_status |= IFM_ACTIVE; > >>>> > >>>> > >>>>I can see the bmsr& E1000_SR_LINK_STATUS check failing when the > >>>>problem > >>>>occurs. As a side note why are we ORing the same call twice isn't the > >>>>same thing as calling it once: > >>>> > >>>>bmsr = PHY_READ(sc, E1000_SR) | PHY_READ(sc, E1000_SR); > >>>> > >>>The E1000_SR_LINK_STATUS bit is latched low so it should be read > >>>twice. If you want to read once use E1000_SSR_LINK bit of > >>>E1000_SSR register but I remember that bit was not reliable on some > >>>PHY models. > >>Thanks for the explanation and the alternative. The ssr register seems > >>to give me the right bit (E1000_SSR_LINK) but it also gives me an extra > >>bit 0x0100 that is not defined in e1000phyreg.h. Any idea what that bit > >>would be/means? > >> > >I guess it's related with advanced power saving. It would indicate > >current Energy detect status in PHY POV. > >Generally Marvell's PHY will enter into automatic power saving mode > >when it does not see any energy signal on the link. I don't know > >exact time when it enters into that mode but it would take less > >than 10 seconds if PHY do not see energy signal from link partner > >once it initiated auto-negotiation. > >However, e1000phy(4) always disables energy detect feature in > >e1000phy_reset() so it wouldn't affect your issue, I guess. > > > >One interesting thing is that 0x100 of E1000_SSR register indicates > >energy detect status is in "Sleep mode" which means it didn't > >detect energy signal(i.e. lost link). I'm not sure whether this bit > >report correct status when energy detect feature is disabled > >though. > > > >Can you check whether your switch supports energy detect feature? > >Or if your switch support EEE feature, try disabling it. > > > >>>By chance, does your back-ported driver include r222219? > >>>If yes, did you cold boot after applying the change? > >>>Warm boot does have effect. > >>I do have this patch in the back-ported driver and due to several > >>reasons I didn't cold boot the appliance. We will give that a try and see. > >> > >Ok, let me know whether that makes any difference or not. > > > >>To be more precises I have included msk patches up to r222516. > >> > >>Thanks! > >[...] > >_______________________________________________ > >freebsd-net@freebsd.org mailing list > >http://lists.freebsd.org/mailman/listinfo/freebsd-net > >To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" > After a weekend of test I can confirm the problem is gone with the back > ported msk driver from FreeBSD 9 and a little bit of patching. > > Apart from the packet drops I also had various report from my snmp trap > daemon. It was reporting the interface was going inactive and for a > while I though the packet drop and inactivity reports were linked. It > turned out there was a small race condition between the various polling > components in msk_mediastatus() that was confusing the snmp daemon while > the packet drops got solved by the back port. The race can be easily > solved with the following patch: > > > @@ -995,9 +996,11 @@ msk_mediastatus(struct ifnet *ifp, struct > ifmediareq *ifmr) > mii = device_get_softc(sc_if->msk_miibus); > > mii_pollstat(mii); > - MSK_IF_UNLOCK(sc_if); > + > ifmr->ifm_active = mii->mii_media_active; > ifmr->ifm_status = mii->mii_media_status; > + > + MSK_IF_UNLOCK(sc_if); > } > > Without moving down the msk lock its possible for one thread to see its > mii_media_status reset to IFM_AVALID in e1000phy_status() right before > the assignment to ifmr->ifm_status. This resulted in false reports about > interface inactivity in rare occasions between a kernel based probe and > the snmp trap daemon. This may happen when user application issues multiple SIOCGIFMEDIA ioctls without waiting for the completion of previous ioctl request. On second thought, I think it would be better to close the race because it's hard to predict how user application behaves. Thanks for pointing out. I'll handle this. > > Thanks to everyone that chipped in to help, > > Karim.