Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 17 Oct 2011 10:49:55 -0700
From:      YongHyeon PYUN <pyunyh@gmail.com>
To:        Karim <fodillemlinkarim@gmail.com>
Cc:        freebsd-net@freebsd.org
Subject:   Re: if_msk.c link negotiation / packet drops [solved!]
Message-ID:  <20111017174955.GA3509@michelle.cdnetworks.com>
In-Reply-To: <4E9C3B7F.8090700@gmail.com>
References:  <4E94637A.5090607@gmail.com> <20111011171029.GA5661@michelle.cdnetworks.com> <CAN6yY1tWQZwdqgYdN3uBBdXiGQ2OFDMYbSjhEUeTimHjBnR9iA@mail.gmail.com> <4E959F06.6040906@gmail.com> <20111012170347.GA9138@michelle.cdnetworks.com> <4E95DDEB.1090500@gmail.com> <20111012192730.GB9138@michelle.cdnetworks.com> <4E9C3B7F.8090700@gmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help
On Mon, Oct 17, 2011 at 10:28:15AM -0400, Karim wrote:
> Hi,
> 
> On 11-10-12 03:27 PM, YongHyeon PYUN wrote:
> >On Wed, Oct 12, 2011 at 02:35:23PM -0400, Karim wrote:
> >>Hi,
> >>On 11-10-12 01:03 PM, YongHyeon PYUN wrote:
> >>>On Wed, Oct 12, 2011 at 10:07:02AM -0400, Karim wrote:
> >[...]
> >
> >>>Hmm, that indicates driver lost established link. msk(4) will
> >>>detect this condition and stop RX/TX MACs until it knows PHY
> >>>re-established a link. This may be the reason why you see occasional
> >>>packet drops. However I don't know why PHY loses established link
> >>>in the middle of working.
> >>>
> >>Yes, I am convinced this lost of link is related to the packet drops as
> >>well. At this point we can safely discard cabling issues or router
> >>issues (physical ones that is) since the same happens on a different
> >>network with different cables.
> >>>> From the code in e1000phy_status:
> >>>>
> >>>>static void
> >>>>e1000phy_status(struct mii_softc *sc)
> >>>>{
> >>>>     struct mii_data *mii = sc->mii_pdata;
> >>>>     int bmcr, bmsr, ssr;
> >>>>
> >>>>     mii->mii_media_status = IFM_AVALID;
> >>>>     mii->mii_media_active = IFM_ETHER;
> >>>>
> >>>>     bmsr = PHY_READ(sc, E1000_SR) | PHY_READ(sc, E1000_SR);
> >>>>     bmcr = PHY_READ(sc, E1000_CR);
> >>>>     ssr = PHY_READ(sc, E1000_SSR);
> >>>>
> >>>>     if (bmsr&   E1000_SR_LINK_STATUS)
> >>>>         mii->mii_media_status |= IFM_ACTIVE;
> >>>>
> >>>>
> >>>>I can see the bmsr&   E1000_SR_LINK_STATUS check failing when the 
> >>>>problem
> >>>>occurs. As a side note why are we ORing the same call twice isn't the
> >>>>same thing as calling it once:
> >>>>
> >>>>bmsr = PHY_READ(sc, E1000_SR) | PHY_READ(sc, E1000_SR);
> >>>>
> >>>The E1000_SR_LINK_STATUS bit is latched low so it should be read
> >>>twice. If you want to read once use E1000_SSR_LINK bit of
> >>>E1000_SSR register but I remember that bit was not reliable on some
> >>>PHY models.
> >>Thanks for the explanation and the alternative. The ssr register seems
> >>to give me the right bit (E1000_SSR_LINK) but it also gives me an extra
> >>bit 0x0100 that is not defined in e1000phyreg.h. Any idea what that bit
> >>would be/means?
> >>
> >I guess it's related with advanced power saving. It would indicate
> >current Energy detect status in PHY POV.
> >Generally Marvell's PHY will enter into automatic power saving mode
> >when it does not see any energy signal on the link. I don't know
> >exact time when it enters into that mode but it would take less
> >than 10 seconds if PHY do not see energy signal from link partner
> >once it initiated auto-negotiation.
> >However, e1000phy(4) always disables energy detect feature in
> >e1000phy_reset() so it wouldn't affect your issue, I guess.
> >
> >One interesting thing is that 0x100 of E1000_SSR register indicates
> >energy detect status is in "Sleep mode" which means it didn't
> >detect energy signal(i.e. lost link). I'm not sure whether this bit
> >report correct status when energy detect feature is disabled
> >though.
> >
> >Can you check whether your switch supports energy detect feature?
> >Or if your switch support EEE feature, try disabling it.
> >
> >>>By chance, does your back-ported driver include r222219?
> >>>If yes, did you cold boot after applying the change?
> >>>Warm boot does have effect.
> >>I do have this patch in the back-ported driver and due to several
> >>reasons I didn't cold boot the appliance. We will give that a try and see.
> >>
> >Ok, let me know whether that makes any difference or not.
> >
> >>To be more precises I have included msk patches up to r222516.
> >>
> >>Thanks!
> >[...]
> >_______________________________________________
> >freebsd-net@freebsd.org mailing list
> >http://lists.freebsd.org/mailman/listinfo/freebsd-net
> >To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
> After a weekend of test I can confirm the problem is gone with the back 
> ported msk driver from FreeBSD 9 and a little bit of patching.
> 
> Apart from the packet drops I also had various report from my snmp trap 
> daemon. It was reporting the interface was going inactive and for a 
> while I though the packet drop and inactivity reports were linked. It 
> turned out there was a small race condition between the various polling 
> components in msk_mediastatus() that was confusing the snmp daemon while 
> the packet drops got solved by the back port. The race can be easily 
> solved with the following patch:
> 
> 
> @@ -995,9 +996,11 @@ msk_mediastatus(struct ifnet *ifp, struct 
> ifmediareq *ifmr)
>         mii = device_get_softc(sc_if->msk_miibus);
> 
>         mii_pollstat(mii);
> -       MSK_IF_UNLOCK(sc_if);
> +
>         ifmr->ifm_active = mii->mii_media_active;
>         ifmr->ifm_status = mii->mii_media_status;
> +
> +       MSK_IF_UNLOCK(sc_if);
>  }
> 
> Without moving down the msk lock its possible for one thread to see its 
> mii_media_status reset to IFM_AVALID in e1000phy_status() right before 
> the assignment to ifmr->ifm_status. This resulted in false reports about 
> interface inactivity in rare occasions between a kernel based probe and 
> the snmp trap daemon.

This may happen when user application issues multiple SIOCGIFMEDIA
ioctls without waiting for the completion of previous ioctl
request.
On second thought, I think it would be better to close the race
because it's hard to predict how user application behaves.
Thanks for pointing out. I'll handle this.

> 
> Thanks to everyone that chipped in to help,
> 
> Karim.



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20111017174955.GA3509>