Date:      Thu, 11 Jan 2001 12:21:51 -0500 (EST)
From:      Mike Andrews <mandrews@bit0.com>
To:        stable@freebsd.org
Subject:   Weird sporadic DNS resolution problems
Message-ID:  <Pine.BSF.4.21.0101111217090.23537-100000@mindcrime.bit0.com>

I've got a bizarre DNS resolution problem that I'm having a hell of a
time tracking down.  Someone tell me I'm just being stupid. :)

For a few months now, I've been *sporadically* unable to resolve *some*
external domains.  This started happening somewhere between 4.1.1-RELEASE
and 4.2-RELEASE, when I believe BIND 8 was upgraded in the source tree.
(I don't remember the exact date, sorry.)

Here's what appears to be going on:

When one (but not both) of the nameservers for a domain replies
non-authoritatively, named will cache a negative response, rather than
asking the other nameserver.  Subsequent lookups return an immediate
failure.  Restarting the nameserver, and then immediately querying the
same problematic domain DOES work, but only the first query.  After a few
minutes/hours the domain stops working again.
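
As an illustration only, the suspected behaviour can be written down as a
toy model in Python.  This is a sketch of the theory, not of how named is
actually implemented: the class name ToyResolver, the 600-second negative
TTL and the server names "good" and "lame" are all made up for the example,
and positive answers are deliberately not cached to keep it short.

```python
import random


class ToyResolver:
    """Toy model of the suspected behaviour; NOT how named really works."""

    def __init__(self, servers, pick=random.choice, negative_ttl=600):
        self.servers = servers            # server name -> True if it answers properly
        self.pick = pick                  # callable that chooses which server to ask
        self.negative_ttl = negative_ttl  # hypothetical negative-cache lifetime (seconds)
        self.negative_until = {}          # domain -> time until which the failure is cached

    def lookup(self, domain, now):
        # A cached failure short-circuits the lookup: no server is asked at all.
        if self.negative_until.get(domain, 0) > now:
            return "cached failure"
        server = self.pick(list(self.servers))
        if self.servers[server]:
            return "resolved via " + server
        # Suspected problem: the bad answer is cached instead of the
        # resolver falling back to the other listed nameserver.
        self.negative_until[domain] = now + self.negative_ttl
        return "failed via " + server
```

With one good and one lame server, lookups succeed until the lame server
happens to be picked; after that every lookup fails immediately until the
cached failure expires, which matches the "works for a while, then stops"
pattern described above.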

This is especially painful because Sendmail tries to resolve the sender's
domain on incoming email (for spam protection purposes)...  it will give
"Domain of sender address foo@bar does not resolve" and return a 451 code.
This causes the other end to retry periodically, unless the other end is
something like Outlook Express, in which case the customer calls me and
complains. :)

One example domain is "farmersfrankfort.com".  This was moved from us to
another site yesterday, but we still do MX for them.  Looking at whois and
at the root servers, you can see that their two new nameservers are now
"cerberus.sbscorp.com" and "ns1.qwest.net".  Querying the sbscorp server
works great, querying qwest doesn't -- it appears Qwest never added them
to their nameserver config at all.  (It has been only about 24 hours, so
it's not *too* surprising I guess...) Anyway, when someone on one of our
dialups tries to send mail with @farmersfrankfort.com on the end, our
Sendmail is unable to resolve it and rejects the message. If I restart
(not reload) named, it will start working for a while, then die on its own
again.  My theory is that if named happens to query sbscorp it's happy; if
it happens to query qwest it isn't, and it caches that failure.
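
One way to check each listed nameserver separately is to send it a
non-recursive query and look at the AA (authoritative answer) bit and the
response code in the reply header.  The following is a minimal sketch using
only the Python standard library; the function names build_query,
parse_header and check_server are made up for this example, it only speaks
UDP, and it does not decode the answer records.

```python
import socket
import struct


def build_query(name, qtype=1, query_id=0x1234):
    """Build a minimal non-recursive DNS query (qtype 1 = A, class IN)."""
    # Header: id, flags (RD clear, so the server answers from its own data),
    # QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    )
    return header + labels + b"\x00" + struct.pack("!HH", qtype, 1)


def parse_header(packet):
    """Return (authoritative, rcode, answer_count) from a DNS response."""
    _id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", packet[:12])
    authoritative = bool(flags & 0x0400)  # AA bit
    rcode = flags & 0x000F                # 0 = NOERROR, 2 = SERVFAIL, 3 = NXDOMAIN
    return authoritative, rcode, ancount


def check_server(server, name, timeout=3.0):
    """Ask one nameserver about one name and summarise its reply header."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server, 53))
        data, _addr = sock.recvfrom(512)
    return parse_header(data)


if __name__ == "__main__":
    # Server and domain names are the ones mentioned in this message.
    for ns in ("cerberus.sbscorp.com", "ns1.qwest.net"):
        try:
            print(ns, check_server(ns, "farmersfrankfort.com"))
        except OSError as exc:  # covers timeouts and failed lookups of ns itself
            print(ns, "no usable reply:", exc)
```

A server that is properly set up for the zone should come back with the AA
bit set; one that was never configured for it would typically show AA clear,
an error rcode, or no answers.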

Another example is "setel.com" and "se-tel.com".  We sometimes have
problems exchanging mail with them because one of their DNS servers
appears to be answering non-authoritatively.  Again, I can flush my
backlog by restarting named and immediately running the sendmail queue
manually (and I could probably flush their backlog by telnetting to their
SMTP port and issuing an ETRN)...  but obviously that's not exactly
elegant :)

I've tried adding "max-ncache-ttl 1" to my named config, hoping it would
help.  It didn't.

In one sense this is "not my problem" because their name server shouldn't
be answering non-authoritatively in the first place.  But the fact that
this started happening after a make world a few months ago, and that I
feel named should be a bit more tolerant of other people's sloppy
configurations, makes it my problem.

Anyone have any ideas as to what's going on, or can tell me what debugging
output to enable that I could send here that would help figure it out?  
Configuration options to named that would revert to older behavior?  A
whack on the head?  (I could just compile an older named I guess, but I
fear opening up security holes/DoS attacks.)


Mike Andrews * mandrews@dcr.net * mandrews@bit0.com * http://www.bit0.com
VP, sysadmin, & network guy, Digital Crescent Inc, Frankfort KY
Internet access for Frankfort, Lexington, Louisville and surrounding counties
www.fark.com: If it's not news, it's Fark.  (Or something like that.)






