From owner-freebsd-hackers Mon Jul 16 17:46:28 2001
Delivered-To: freebsd-hackers@freebsd.org
Received: from earth.backplane.com (earth-nat-cw.backplane.com [208.161.114.67])
    by hub.freebsd.org (Postfix) with ESMTP id 849E337B406
    for ; Mon, 16 Jul 2001 17:46:14 -0700 (PDT)
    (envelope-from dillon@earth.backplane.com)
Received: (from dillon@localhost)
    by earth.backplane.com (8.11.4/8.11.2) id f6H0j8N33016;
    Mon, 16 Jul 2001 17:45:08 -0700 (PDT)
    (envelope-from dillon)
Date: Mon, 16 Jul 2001 17:45:08 -0700 (PDT)
From: Matt Dillon
Message-Id: <200107170045.f6H0j8N33016@earth.backplane.com>
To: Len Conrad
Cc: freebsd-hackers@FreeBSD.ORG
Subject: Weird named problem - IN A for nameservers being lost!
References: <5.1.0.14.0.20010622153827.02fa0da0@mail.Go2France.com>
    <5.1.0.14.0.20010622153827.02fa0da0@mail.Go2France.com>
    <5.1.0.14.0.20010623185802.04051eb0@mail.Go2France.com>
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.ORG

I've been trying to track down a weird problem with our mail system
suddenly believing that a host does not exist, or timing out in DNS.
I tracked it down to the DNS server, but I am not entirely sure what
is going on.

What appears to be happening is that the glue IN A record for the NS
server for a domain is getting lost, and the NS record is remaining.
When named gets into this state, it doesn't seem to be able to
recover... it sees the NS record but it can't resolve it because the
glue record is gone, and it doesn't try to get it after that.

If you look at the cache dumps and dig output below, you can clearly
see the timeout for fuji.jamcracker.com is less than the timeout for
jamcracker.com AFTER we've looked up other elements for fuji, which
means that when it times out, that IN A record will be gone. But that
IN A record is the IP address for the NS.
So when it times out, the jamcracker entry is left there with no NS
records whatsoever.

I believe what is happening is that something is looking up other
records for fuji, and this is replacing the original glue record with
the real IN A record, but also changing the timeouts somehow and
causing fuji's record to time out early.

As far as I can tell, this is an extremely serious bug in named. I am
running 8.2.3. This has occurred with several mail destinations, not
just jamcracker. I went through jamcracker's whole DNS hierarchy and
everything is set up properly, including all the timeouts (they are
all set to 3600 seconds).

Has anyone else seen this? Anyone know what is going on here?

                                        -Matt

---

Here is a cache dump of a case where 'nslookup -query=mx jamcracker.com'
no longer works. Everything with jamcracker in it is being dumped:

jamcracker      2436    IN      SOA     fuji.jamcracker.com. hostmaster.jamcracker.com. (
                2001062900 10800 3600 1728000 3600 )    ;Cr=auth [216.32.126.150]
;       2436    IN      AAAA    fuji.jamcracker.com. hostmaster.jamcracker.com. (
;               2001062900 10800 3600 1728000 3600 );jamcracker.com.;NODATA ;-$ ;Cr=auth [216.32.126.150]
        2436    IN      NS      fuji.jamcracker.com.    ;Cr=auth [216.32.126.150]
        2436    IN      A       66.35.217.100   ;Cr=auth [216.32.126.150]

And here is a cache dump after I restart named and do the same nslookup:

jamcracker      3591    IN      NS      fuji.jamcracker.com.    ;Cr=auth [216.32.126.150]
        3591    IN      MX      5 va2mc.ummailbox.net.  ;Cr=auth [216.32.126.150]
$ORIGIN jamcracker.com.
fuji    3591    IN      A       66.35.220.151   ;Cr=addtnl [216.32.126.150]

And here is a dump after named has been running a while:

jamcracker      2016    IN      NS      fuji.jamcracker.com.    ;Cr=auth [216.32.126.150]
        2016    IN      MX      5 va2mc.ummailbox.net.  ;Cr=auth [216.32.126.150]
        2206    IN      SOA     fuji.jamcracker.com. hostmaster.jamcracker.com. (
                2001062900 10800 3600 1728000 3600 )    ;Cr=auth [66.35.220.151]
;       2206    IN      AAAA    fuji.jamcracker.com. hostmaster.jamcracker.com. (
;               2001062900 10800 3600 1728000 3600 );jamcracker.com.;NODATA ;-$ ;Cr=auth [66.35.220.151]
        3140    IN      A       66.35.217.100   ;Cr=auth [66.35.220.151]
$ORIGIN jamcracker.com.
fuji    1846    IN      A       66.35.220.151   ;NT=13 Cr=addtnl [216.32.126.150]
;       2213    IN      AAAA    fuji.jamcracker.com. hostmaster.jamcracker.com. (
;               2001062900 10800 3600 1728000 3600 );jamcracker.com.;NODATA ;-$ ;

And here is the dig output.

earth:/etc/namedb# dig jamcracker.com

; <<>> DiG 8.3 <<>> jamcracker.com
;; res options: init recurs defnam dnsrch
;; got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 4
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 1
;; QUERY SECTION:
;;      jamcracker.com, type = A, class = IN

;; ANSWER SECTION:
jamcracker.com.         50m27s  IN A    66.35.217.100

;; AUTHORITY SECTION:
jamcracker.com.         31m43s  IN NS   fuji.jamcracker.com.

;; ADDITIONAL SECTION:
fuji.jamcracker.com.    28m53s  IN A    66.35.220.151

;; Total query time: 1 msec
;; FROM: earth.backplane.com to SERVER: default -- 127.0.0.1
;; WHEN: Mon Jul 16 17:36:13 2001
;; MSG SIZE  sent: 32  rcvd: 83

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message
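
The TTL comparison described in the message can be checked mechanically.
What follows is a minimal Python sketch, not part of the original post and
not anything named itself does. The helper names ttl_seconds and glue_gap
are hypothetical, and the unit letters handled (w, d, h, m, s) are an
assumption about the TTL notation in the dig output shown. The sketch
computes how long the NS record would stay cached after the address
record for the nameserver has expired.

```python
import re

# Seconds per unit letter; assumed to cover the TTL notation in the dig output.
UNITS = {"w": 604800, "d": 86400, "h": 3600, "m": 60, "s": 1}


def ttl_seconds(text):
    """Convert a TTL such as '31m43s', or a bare number of seconds, to an int."""
    text = text.strip().lower()
    if text.isdigit():
        return int(text)
    parts = re.findall(r"(\d+)([wdhms])", text)
    # Reject input that is not made up entirely of <number><unit> pairs.
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"unrecognized TTL: {text!r}")
    return sum(int(n) * UNITS[u] for n, u in parts)


def glue_gap(ns_ttl, glue_ttl):
    """Seconds the NS record stays cached after its address record expires.

    Returns 0 when the address record lasts at least as long as the NS record.
    """
    return max(0, ttl_seconds(ns_ttl) - ttl_seconds(glue_ttl))


# Values copied from the dig output in the message.
ns_ttl = "31m43s"    # jamcracker.com.       IN NS  fuji.jamcracker.com.
glue_ttl = "28m53s"  # fuji.jamcracker.com.  IN A   66.35.220.151

print(f"NS outlives its address record by {glue_gap(ns_ttl, glue_ttl)} seconds")
```

The same helper accepts the raw second counts from the cache dump, for
example glue_gap("2016", "1846"), so the dump and the dig output can be
compared directly.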