Date:      Sun,  6 Dec 1998 14:59:05 -0600 (CST)
From:      Tony Kimball <alk@pobox.com>
To:        net@FreeBSD.ORG
Subject:   Re: resolver behaviour 
Message-ID:  <13930.53261.509843.979179@avalon.east>
References:  <13930.17883.922553.625725@avalon.east> <48026.912946905@gjp.erols.com>

Quoth Gary Palmer on Sun, 6 December:
: Tony Kimball wrote in message ID
: <13930.17883.922553.625725@avalon.east>:
: > Frankly, the current behaviour is just plain broken:  Bum nameservers
: > too often prevent FreeBSD applications from connecting to extant
: > hosts on the Internet.
: 
: If the local nameserver is bum, then that suggests a local administrative 
: failure, does it not? This is exactly the situation you are describing ... the 
: local nameserver that the resolver contacts cannot find the information it is 
: looking for. 

I'm talking about bad nameservers on the Internet at large.

: If, on the other hand, the local nameserver cannot find
: authoritative information from a *NON-LOCALLY* hosted zone, then that is a
: failure which no amount of hackery in libc will be able to overcome because in
: all likelihood the data you are looking for just *doesn't* exist, because of a
: remote administrative failure.

The data does exist, just not on all nameservers.

: Slowing down the application's acceptance of
: that fact will do nothing to help

The slow-down can be very small relative to human perception while
being quite long enough to avoid the 'packet storm' bugaboo --
actually I don't see the 'packet storm' because there is no cascade
effect -- at this point I'm talking about behaviour of an application
using gethostby*, not about named behaviour.

: if you did this change in the 
: environment I run at work, that is *exactly* what would happen. We'd have 
: sendmail processes hanging around `n' times longer than they should have, 
: because our nameserver setup *works*. 

No, that is misleading at best.  If you are sending mail to
a name that resolves, your sendmail process would not be delayed at
all.  If you are sending mail to a name that resolves on a fallback
server only, with a bogus record on your primary server, the mail
would get through, as opposed to failing.  The only case
in which a longer delay would result is if you are sending mail to
a bogus hostname, which is a very odd and rare case indeed!
In that case it would take not N*M as you imply, but N*F+M, where
F<<M (and hence N*F+M approaches M for fixed N).
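
The arithmetic above can be made concrete with a small sketch.  It
assumes one reading of the symbols, which the message does not spell
out: N is the number of nameservers, M is the full per-query timeout,
and F is the short delay before the next server is tried.  The function
names and the sample values are hypothetical.

```python
# Hypothetical helper names; N, F and M are interpreted as described above.

def sequential_worst_case(n_servers: int, timeout_ms: int) -> int:
    """Each server is tried in turn and each one runs to its full timeout."""
    return n_servers * timeout_ms


def staggered_worst_case(n_servers: int, pause_ms: int, timeout_ms: int) -> int:
    """Servers are started pause_ms apart; only one full timeout is paid."""
    return n_servers * pause_ms + timeout_ms


if __name__ == "__main__":
    n, f, m = 3, 100, 2000  # illustrative values only
    print(sequential_worst_case(n, m))    # N*M
    print(staggered_worst_case(n, f, m))  # N*F+M
```

With these sample values the two bounds come to 6000 ms and 2300 ms,
which is the sense in which N*F+M stays close to M when F is small.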

: Going to a different nameserver will get 
: you exactly the same answer. 

This simply is not the case.  I have encountered numerous occasions
in recent weeks where resorting to alternate nameservers has resulted
in a successful lookup.  In retrospect, I was observing the same sort
of behaviour of the Internet at large in the mid-80's, so this has
proven to be a long-term condition common to a wide variety of regions
of the global DNS graph.

: about handling internet failures in general. It's not libresolv's job to try
: and second guess what bind is doing. I say again: your nameserver setup is 
: broken. You are really confusing the work that bind does with the work that 
: libresolv does.

For the sake of illustration, assume that I don't have or want a nameserver:
nameservers are for network administrators, while I am a user with an
application -- I want my application to work.

: Perhaps you are suggesting a kludge in gethostby* to work around a broken
: setup? That's sure the way it reads to me.

Exactly!  The universe is broken, and gethostby* should recognize that
fact and provide a way to deal with the reality.

Perhaps what is wanting is a novel, orthogonal configuration
parameter --  instead of an expansion of the function of the 'nameserver'
entry in resolv.conf, one might provide a distinct entry type.

nameserver primary.nameserver.net
parallel alternate.nameserver.net 
nameserver fallback.nameserver.net
timeout 2000
pausetime 1000

The parallel nameserver entry would specify an alternate nameserver
which would be queried after pausetime milliseconds, if the primary
has not responded, or immediately if the primary gave a negative
response.  If no responses were received within timeout milliseconds of
the initial query, the fallback nameserver would be queried.  The only
type of query which would suffer API latency greater than timeout*N as
a result would be a query for a truly bad name.  All other cases
are improved.  Moreover, this is a configurable parameter.  If you
aren't having problems with your primary nameserver, you don't need
to add a parallel alternate at all.
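
The schedule described above can be sketched as a small simulation.
This is only an illustration of the proposal, not resolver code: the
'parallel', 'timeout' and 'pausetime' entries are the hypothetical
resolv.conf keywords from the example, and the function name, the reply
encoding, and the handling of cases the text leaves open (late replies,
all-negative replies, an unresponsive fallback) are assumptions marked
in the comments.

```python
from typing import Optional, Tuple

# A reply is (delay_ms, answer): delay_ms is None if the server never
# replies, answer is None for a negative response.
Reply = Tuple[Optional[int], Optional[str]]


def resolve(primary: Reply, parallel: Reply, fallback: Reply,
            pausetime: int = 1000, timeout: int = 2000
            ) -> Tuple[Optional[str], int]:
    """Return (answer, elapsed_ms) under the proposed query schedule."""
    p_delay, p_answer = primary
    replies = []
    if p_delay is not None:
        replies.append((p_delay, p_answer))

    # The parallel server is queried at pausetime, or as soon as the
    # primary answers negatively, whichever comes first.
    if p_delay is not None and p_delay < pausetime:
        if p_answer is not None:
            return p_answer, p_delay  # primary answered before the pause
        parallel_start = p_delay
    else:
        parallel_start = pausetime

    q_delay, q_answer = parallel
    if q_delay is not None:
        replies.append((parallel_start + q_delay, q_answer))

    # Assumption: replies arriving after `timeout` are ignored.
    in_time = sorted((r for r in replies if r[0] <= timeout),
                     key=lambda r: r[0])
    for when, answer in in_time:
        if answer is not None:
            return answer, when  # first positive answer wins

    if len(in_time) == 2:
        return None, in_time[-1][0]  # assumption: both said no, give up
    if len(in_time) == 1:
        return None, timeout  # assumption: wait out the silent server

    # Nothing heard within timeout: query the fallback server.
    f_delay, f_answer = fallback
    if f_delay is None:
        return None, 2 * timeout  # assumption: fallback gets its own timeout
    return f_answer, timeout + f_delay
```

For example, resolve((50, None), (80, "192.0.2.1"), (None, None))
models a primary that answers negatively after 50 ms and an alternate
that answers correctly 80 ms after being asked, so the lookup succeeds
at 130 ms instead of failing.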

: > But this only pushes the problem out one level, to named.
: 
: I don't follow. You tell named that data for `x' is found on `x's nameserver,
: and data for everything else is found on `y's nameserver, and it works. That's
: how named is designed to work! It is *not* how libresolv is designed to work!

Named will try a server, get a negative response, and return a
negative response, just like gethostby*.  The problem still exists.
I can configure named correctly, even with Archie's expanded function
(which is not really relevant to this particular problem),
and unless I provide gethostby* with alternate service, the lookup
will still fail, while it should succeed because it could succeed
if only it were applied to the alternate server.  The problem still
exists because it has not been addressed.

: > Archie's patch then fixes the problem.  (I'd like to see that patch in
: > current!)
: 
: If it goes in -current, then it had better be off by default. I firmly believe 
: that this is a negatively impacting change for the majority of freebsd users 
: out there. 

You might easily have been misled by the fact that my earlier mail
discussed two distinct problems, often switching back and forth
between them, for which I apologize.  Archie's patch does not address
the problem of cache-polluted or otherwise corrupt name server
information.  It merely allows the DNS administrator to specify
forwarding servers on a per-zone basis.  This is new optional
functionality.  The problem which it fixes is not the same problem
that I discuss in the context of gethostby*.





