Date:      Mon, 16 Jul 2001 17:45:08 -0700 (PDT)
From:      Matt Dillon <dillon@earth.backplane.com>
To:        Len Conrad <LConrad@Go2France.com>
Cc:        freebsd-hackers@FreeBSD.ORG
Subject:   Weird named problem - IN A for nameservers being lost!
Message-ID:  <200107170045.f6H0j8N33016@earth.backplane.com>
References:  <5.1.0.14.0.20010622153827.02fa0da0@mail.Go2France.com> <5.1.0.14.0.20010622153827.02fa0da0@mail.Go2France.com> <5.1.0.14.0.20010623185802.04051eb0@mail.Go2France.com>

    I've been trying to track down a weird problem with our mail system 
    suddenly believing that a host does not exist, or timing out in DNS.

    I tracked it down to the DNS server, but I am not entirely sure what is
    going on.  What appears to be happening is that the glue IN A record
    for the NS server for a domain is getting lost, and the NS record is 
    remaining.  When named gets into this state, it doesn't seem to be able
    to recover... it sees the NS record but it can't resolve it because
    the glue record is gone, and it doesn't try to get it after that.

    If you look at the cache dumps and dig output below, you can clearly
    see that the timeout for fuji.jamcracker.com is less than the timeout
    for jamcracker.com AFTER we've looked up other elements for fuji
    (1846 seconds versus 2016 in the last dump), which means that when it
    times out, that IN A record will be gone.  But that IN A record is the
    IP address for the NS.  So when it times out, the jamcracker entry is
    left there with an NS record but no address for it whatsoever.
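
    As a rough illustration of that window, here is a minimal Python
    sketch.  It uses the remaining TTLs from the last cache dump below
    (2016 seconds for the jamcracker.com NS record, 1846 seconds for
    fuji's A record).  The function name is hypothetical, and the sketch
    assumes named never refetches the address on its own, which is the
    behavior observed here and not something confirmed from the BIND
    source.

```python
def stranded_window(ns_ttl, glue_ttl):
    """Seconds during which the NS record is still cached but the
    address record for its nameserver has already expired."""
    return max(0, ns_ttl - glue_ttl)


# Remaining TTLs taken from the last cache dump in this message.
ns_ttl = 2016    # jamcracker.com.      IN NS fuji.jamcracker.com.
glue_ttl = 1846  # fuji.jamcracker.com. IN A  66.35.220.151

window = stranded_window(ns_ttl, glue_ttl)
print(window)  # 170
```

    With those numbers the NS record outlives its address by at least
    170 seconds; a fresh cache, where both records carry the same TTL,
    gives a window of zero.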

    I believe what is happening is that something is looking up other
    records for fuji, and this is replacing the original glue record with
    the real IN A record, but also changing the timeouts somehow and
    causing fuji's record to time out early.

    As far as I can tell, this is an extremely serious bug in named.  I am
    running 8.2.3.

    This has occurred with several mail destinations, not just jamcracker.
    I went through jamcracker's whole DNS hierarchy and everything is set up
    properly, including all the timeouts (they are all set to 3600 seconds).

    Has anyone else seen this?  Anyone know what is going on here?

						-Matt

					---

    Here is a cache dump of a case where 'nslookup -query=mx jamcracker.com'
    no longer works.  Everything with jamcracker in it is being dumped:

jamcracker      2436    IN      SOA     fuji.jamcracker.com. hostmaster.jamcracker.com. (
                2001062900 10800 3600 1728000 3600 )    ;Cr=auth [216.32.126.150]
;       2436    IN      AAAA    fuji.jamcracker.com. hostmaster.jamcracker.com. (
;               2001062900 10800 3600 1728000 3600 );jamcracker.com.;NODATA     ;-$     ;Cr=auth [216.32.126.150]
        2436    IN      NS      fuji.jamcracker.com.    ;Cr=auth [216.32.126.150]
        2436    IN      A       66.35.217.100   ;Cr=auth [216.32.126.150]

    And here is a cache dump after I restart named and do the same nslookup:

jamcracker      3591    IN      NS      fuji.jamcracker.com.    ;Cr=auth [216.32.126.150]
        3591    IN      MX      5 va2mc.ummailbox.net.  ;Cr=auth [216.32.126.150]
$ORIGIN jamcracker.com.
fuji    3591    IN      A       66.35.220.151   ;Cr=addtnl [216.32.126.150]

    And here is a dump after named has been running a while:

jamcracker      2016    IN      NS      fuji.jamcracker.com.    ;Cr=auth [216.32.126.150]
        2016    IN      MX      5 va2mc.ummailbox.net.  ;Cr=auth [216.32.126.150]
        2206    IN      SOA     fuji.jamcracker.com. hostmaster.jamcracker.com. (
                2001062900 10800 3600 1728000 3600 )    ;Cr=auth [66.35.220.151]
;       2206    IN      AAAA    fuji.jamcracker.com. hostmaster.jamcracker.com. (
;               2001062900 10800 3600 1728000 3600 );jamcracker.com.;NODATA     ;-$     ;Cr=auth [66.35.220.151]
        3140    IN      A       66.35.217.100   ;Cr=auth [66.35.220.151]
$ORIGIN jamcracker.com.
fuji    1846    IN      A       66.35.220.151   ;NT=13 Cr=addtnl [216.32.126.150]
;       2213    IN      AAAA    fuji.jamcracker.com. hostmaster.jamcracker.com. (
;               2001062900 10800 3600 1728000 3600 );jamcracker.com.;NODATA     ;-$     ;

    And here is the dig output.

earth:/etc/namedb# dig jamcracker.com

; <<>> DiG 8.3 <<>> jamcracker.com 
;; res options: init recurs defnam dnsrch
;; got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 4
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 1
;; QUERY SECTION:
;;      jamcracker.com, type = A, class = IN

;; ANSWER SECTION:
jamcracker.com.         50m27s IN A     66.35.217.100

;; AUTHORITY SECTION:
jamcracker.com.         31m43s IN NS    fuji.jamcracker.com.

;; ADDITIONAL SECTION:
fuji.jamcracker.com.    28m53s IN A     66.35.220.151

;; Total query time: 1 msec
;; FROM: earth.backplane.com to SERVER: default -- 127.0.0.1
;; WHEN: Mon Jul 16 17:36:13 2001
;; MSG SIZE  sent: 32  rcvd: 83
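
    The same gap shows up in the dig output above once the TTLs are
    converted to seconds.  A small Python sketch, assuming the TTL strings
    only use w/d/h/m/s suffixes or plain seconds; ttl_seconds is a made-up
    helper for illustration, not part of dig or BIND.

```python
import re

UNIT_SECONDS = {"w": 604800, "d": 86400, "h": 3600, "m": 60, "s": 1}


def ttl_seconds(text):
    """Convert a TTL such as '31m43s' (or plain '3600') to seconds."""
    text = text.strip().lower()
    if text.isdigit():
        return int(text)
    parts = re.findall(r"(\d+)([wdhms])", text)
    # Reject anything that is not purely a run of <number><unit> pairs.
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError("unrecognized TTL: %r" % text)
    return sum(int(n) * UNIT_SECONDS[u] for n, u in parts)


ns_ttl = ttl_seconds("31m43s")    # AUTHORITY:  jamcracker.com. NS
glue_ttl = ttl_seconds("28m53s")  # ADDITIONAL: fuji.jamcracker.com. A

print(ns_ttl, glue_ttl, ns_ttl - glue_ttl)  # 1903 1733 170
```

    The A record for fuji will expire 170 seconds before the NS record
    that points at it, matching the difference seen in the cache dump.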

