Date:      Tue, 3 Oct 2006 17:55:02 -0700
From:      perikillo <perikillo@gmail.com>
To:        "FreeBSD Mailing List" <freebsd-questions@freebsd.org>
Subject:   Re: vr0: watchdog timeout FreeBSD 6.1-p10 Crashing my backups
Message-ID:  <51d7a5160610031755m643de45dk48b2c4d26d3f511a@mail.gmail.com>
In-Reply-To: <51d7a5160610031351h62e33f42kbbdd7e4001c9342d@mail.gmail.com>
References:  <51d7a5160610031351h62e33f42kbbdd7e4001c9342d@mail.gmail.com>

On 10/3/06, perikillo <perikillo@gmail.com> wrote:
>
>   Hi people, I have read some mails about this problem, but it looks like
> all of them were running some 5.X branch. I have been using FreeBSD 6.1 for
> some months; yesterday I ran the buildworld process, and right now my box
> is at FreeBSD 6.1-p10.
>
>   This box runs the Bacula server with this NIC:
>
> vr0: <VIA VT6102 Rhine II 10/100BaseTX> port 0xe400-0xe4ff mem
> 0xee022000-0xee0220ff at device 18.0 on pci0
> vr0: Reserved 0x100 bytes for rid 0x10 type 4 at 0xe400
> miibus0: <MII bus> on vr0
> vr0: bpf attached
> vr0: Ethernet address: 00:01:6c:2c:09:90
> vr0: [MPSAFE]
>
>   This NIC is integrated on the motherboard. I used this box with
> FreeBSD 5.4-pX for almost a year, running Bacula 1.38.5 without a problem.
>
>   One full backup takes almost 140 GB of data.
>
> Last week I lost one Full backup job from one of my biggest servers, running
> RH9 with approx. 80 GB of data; Bacula backed up only 35 GB and marked the job -> Error:
>
> 26-Sep 00:28 bacula-dir: MBXBDCB.2006-09-25_21.30.00 Fatal error: Network
> error with FD during Backup: ERR=Operation timed out
> 26-Sep 00:28 bacula-dir: MBXBDCB.2006-09-25_21.30.00 Fatal error: No Job
> status returned from FD.
> 26-Sep 00:28 bacula-dir: MBXBDCB.2006-09-25_21.30.00 Error: Bacula 1.38.11(28Jun06): 26-Sep-2006 00:28:48
>
> FD termination status:  Error
> SD termination status:  Error
> Termination:            *** Backup Error ***
>
>   I have no problems with the client; it runs our ERP software and nothing
> is wrong there.
>
> This appeared on my FreeBSD console:
>
> vr0: watchdog timeout
>
>   I reset the server, and all the Differential backups have been working
> fine. I ran the buildworld yesterday and left my Bacula server ready to do a
> full backup of all my clients, and whoops...
>
> I lost two client jobs:
>
> Client 1:
>
> 02-Oct 18:30 bacula-dir: Start Backup JobId 176, Job=PDC.2006-10-02_18.30.00
> 02-Oct 20:40 bacula-dir: PDC.2006-10-02_18.30.00 Fatal error: Network
> error with FD during Backup: ERR=Operation timed out
> 02-Oct 20:40 bacula-dir: PDC.2006-10-02_18.30.00 Fatal error: No Job
> status returned from FD.
> 02-Oct 20:40 bacula-dir: PDC.2006-10-02_18.30.00 Error: Bacula 1.38.11(28Jun06): 02-Oct-2006 20:40:11
>   JobId:                  176
>   Job:                    PDC.2006-10-02_18.30.00
>   Backup Level:           Full
>   Client:                 "PDC" Windows NT 4.0,MVS,NT 4.0.1381
>   FileSet:                "PDC-FS" 2006-08-21 18:04:12
>   Pool:                   "FullTape"
>   Storage:                "LTO-1"
>   Scheduled time:         02-Oct-2006 18:30:00
>   Start time:             02-Oct-2006 18:30:06
>   End time:               02-Oct-2006 20:40:11
>   Elapsed time:           2 hours 10 mins 5 secs
>   Priority:               11
>   FD Files Written:       0
>   SD Files Written:       0
>   FD Bytes Written:       0 (0 B)
>   SD Bytes Written:       0 (0 B)
>   Rate:                   0.0 KB/s
>   Software Compression:   None
>   Volume name(s):         FullTape-0004
>   Volume Session Id:      2
>   Volume Session Time:    1159832414
>   Last Volume Bytes:      38,857,830,949 ( 38.85 GB)
>   Non-fatal FD errors:    0
>   SD Errors:              0
>   FD termination status:  Error
>   SD termination status:  Error
>   Termination:            *** Backup Error ***
>
> Client 2:
>
> 02-Oct 21:30 bacula-dir: Start Backup JobId 178, Job=MBXBDCB.2006-10-02_21.30.00
> 02-Oct 21:31 bacula-dir: MBXBDCB.2006-10-02_21.30.00 Warning: bnet.c:853
> Could not connect to File daemon on 192.168.2.9:9102. ERR=Host is down
> Retrying ...
> 02-Oct 21:37 bacula-dir: MBXBDCB.2006-10-02_21.30.00 Warning: bnet.c:853
> Could not connect to File daemon on 192.168.2.9:9102. ERR=Host is down
> Retrying ...
> 02-Oct 21:44 bacula-dir: MBXBDCB.2006-10-02_21.30.00 Warning: bnet.c:853
> Could not connect to File daemon on 192.168.2.9:9102. ERR=Host is down
> Retrying ...
> 02-Oct 21:51 bacula-dir: MBXBDCB.2006-10-02_21.30.00 Warning: bnet.c:853
> Could not connect to File daemon on 192.168.2.9:9102. ERR=Host is down
> Retrying ...
> 02-Oct 21:58 bacula-dir: MBXBDCB.2006-10-02_21.30.00 Warning: bnet.c:853
> Could not connect to File daemon on 192.168.2.9:9102. ERR=Host is down
> Retrying ...
> 02-Oct 22:04 bacula-dir: MBXBDCB.2006-10-02_21.30.00 Warning: bnet.c:853
> Could not connect to File daemon on 192.168.2.9:9102. ERR=Host is down
> Retrying ...
> 02-Oct 22:10 bacula-dir: MBXBDCB.2006-10-02_21.30.00 Fatal error: bnet.c:859
> Unable to connect to File daemon on 192.168.2.9:9102 . ERR=Host is down
> 02-Oct 22:10 bacula-dir: MBXBDCB.2006-10-02_21.30.00 Error: Bacula 1.38.11(28Jun06): 02-Oct-2006 22:10:03
>   JobId:                  178
>   Job:                    MBXBDCB.2006-10-02_21.30.00
>   Backup Level:           Full
>   Client:                 "MBXBDCB" i686-pc-linux-gnu,redhat,9
>   FileSet:                "MBXBDCB-FS" 2006-08-21 23:00:02
>   Pool:                   "FullTape"
>   Storage:                "LTO-1"
>   Scheduled time:         02-Oct-2006 21:30:00
>   Start time:             02-Oct-2006 21:30:02
>   End time:               02-Oct-2006 22:10:03
>   Elapsed time:           40 mins 1 sec
>   Priority:               13
>   FD Files Written:       0
>   SD Files Written:       0
>   FD Bytes Written:       0 (0 B)
>   SD Bytes Written:       0 (0 B)
>   Rate:                   0.0 KB/s
>   Software Compression:   None
>   Volume name(s):
>   Volume Session Id:      4
>   Volume Session Time:    1159832414
>   Last Volume Bytes:      38,857,830,949 (38.85 GB)
>   Non-fatal FD errors:    0
>   SD Errors:              0
>   FD termination status:
>   SD termination status:  Waiting on FD
>   Termination:            *** Backup Error ***
>
> My console again:
>
> vr0: watchdog timeout
>
> But my catalog backup completed successfully.
>
> 03-Oct 03:00 bacula-dir: Start Backup JobId 179, Job=BackupCatalog.2006-10-03_03.00.00
> 03-Oct 03:03 bacula-dir: Bacula 1.38.11 (28Jun06): 03-Oct-2006 03:03:00
>   JobId:                  179
>   Job:                    BackupCatalog.2006-10-03_03.00.00
>   Backup Level:           Full
>   Client:                 "BACULA" i386-portbld-freebsd6.1,freebsd,6.1-RELEASE-p3
>   FileSet:                "CATALOG-FS" 2006-08-22 05:00:02
>   Pool:                   "FullTape"
>   Storage:                "LTO-1"
>   Scheduled time:         03-Oct-2006 03:00:00
>   Start time:             03-Oct-2006 03:00:50
>   End time:               03-Oct-2006 03:03:00
>   Elapsed time:           2 mins 10 secs
>   Priority:               14
>   FD Files Written:       7,646
>   SD Files Written:       7,646
>   FD Bytes Written:       360,432,688 (360.4 MB)
>   SD Bytes Written:       361,320,457 (361.3 MB)
>   Rate:                   2772.6 KB/s
>   Software Compression:   None
>   Volume name(s):         FullTape-0004
>   Volume Session Id:      5
>   Volume Session Time:    1159832414
>   Last Volume Bytes:      39,219,629,264 (39.21 GB)
>   Non-fatal FD errors:    0
>   SD Errors:              0
>   FD termination status:  OK
>   SD termination status:  OK
>   Termination:            Backup OK
>
> I wasn't at that office. I noticed this in the morning because when I
> tried to access that server from the other building with PuTTY, I
> couldn't connect at first, and then I thought "it's happened again :-(". I
> called my friend there to unplug and replug the cable, and with just that I
> was able to connect to the server.
>
>    It looks like this NIC has problems under a high workload. There are
> two things I can do here:
>
> 1. Change the cable and try again.
> 2. Change the NIC and try again.
>
>    What else can I do?
>
>    But I really hope someone can fix this problem. Thanks, all, for your time.
>
> Part of my dmesg output:
>
> Copyright (c) 1992-2006 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
>         The Regents of the University of California. All rights reserved.
> FreeBSD 6.1-RELEASE-p10 #5: Mon Oct  2 13:26:52 PDT 2006
>     root@bacula.MBX.local:/usr/obj/usr/src/sys/BACULA
> Preloaded elf kernel "/boot/kernel/kernel" at 0xc0a14000.
> Preloaded elf module "/boot/kernel/acpi.ko" at 0xc0a14188.
> Table 'FACP' at 0x1bff3040
> Table 'APIC' at 0x1bff7dc0
> MADT: Found table at 0x1bff7dc0
> MP Configuration Table version 1.1 found at 0xc00f1400
> APIC: Using the MADT enumerator.
> MADT: Found CPU APIC ID 0 ACPI ID 0: enabled
> ACPI APIC Table: <KM400  AWRDACPI>
> Calibrating clock(s) ... i8254 clock: 1193181 Hz
> CLK_USE_I8254_CALIBRATION not specified - using default frequency
> Timecounter "i8254" frequency 1193182 Hz quality 0
> Calibrating TSC clock ... TSC clock: 1600072446 Hz
> CPU: AMD Duron(tm) processor (1600.07-MHz 686-class CPU)
>   Origin = "AuthenticAMD"  Id = 0x681  Stepping = 1
>   Features=0x383fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
>   AMD Features=0xc0400800<SYSCALL,MMX+,3DNow+,3DNow>
> Data TLB: 32 entries, fully associative
> Instruction TLB: 16 entries, fully associative
> L1 data cache: 64 kbytes, 64 bytes/line, 1 lines/tag, 2-way associative
> L1 instruction cache: 64 kbytes, 64 bytes/line, 1 lines/tag, 2-way associative
> L2 internal cache: 64 kbytes, 64 bytes/line, 1 lines/tag, 8-way associative
> real memory  = 469696512 (447 MB)
> Physical memory chunk(s):
> 0x0000000000001000 - 0x000000000009efff, 647168 bytes (158 pages)
> 0x0000000000100000 - 0x00000000003fffff, 3145728 bytes (768 pages)
> 0x0000000000c25000 - 0x000000001b7d7fff, 448475136 bytes (109491 pages)
> avail memory = 450490368 (429 MB)
> bios32: Found BIOS32 Service Directory header at 0xc00fac70
> bios32: Entry = 0xfb0f0 (c00fb0f0)  Rev = 0  Len = 1
> pcibios: PCI BIOS entry at 0xf0000+0xb160
> pnpbios: Found PnP BIOS data at 0xc00fbc20
> pnpbios: Entry = f0000:bc50  Rev = 1.0
> Other BIOS signatures found:
> APIC: CPU 0 has ACPI ID 0
> MADT: Found IO APIC ID 2, Interrupt 0 at 0xfec00000
> ioapic0: Routing external 8259A's -> intpin 0
>
> Greetings.
>
>
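A quick way to see which jobs actually moved data is to pull the numbers out of the job summaries quoted above. A minimal Python sketch, assuming the summary keeps the `Key:   value` layout shown there and that "KB" in the Rate line means 1000 bytes (the function names are hypothetical, not part of Bacula):

```python
from datetime import datetime

# The catalog job summary quoted above (JobId 179), trimmed to the fields
# used here and with the "> " quote markers removed.
REPORT = """\
  Start time:             03-Oct-2006 03:00:50
  End time:               03-Oct-2006 03:03:00
  FD Bytes Written:       360,432,688 (360.4 MB)
  Rate:                   2772.6 KB/s
  FD termination status:  OK
  SD termination status:  OK
"""

TIME_FMT = "%d-%b-%Y %H:%M:%S"  # e.g. "03-Oct-2006 03:00:50"


def parse_report(text):
    """Split 'Key:   value' summary lines into a dict.

    Only the first colon separates key from value, so timestamps
    such as "03:00:50" stay intact in the value.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def elapsed_seconds(fields):
    """Seconds between the job's Start time and End time."""
    start = datetime.strptime(fields["Start time"], TIME_FMT)
    end = datetime.strptime(fields["End time"], TIME_FMT)
    return int((end - start).total_seconds())


def fd_bytes(fields):
    """'360,432,688 (360.4 MB)' -> 360432688"""
    return int(fields["FD Bytes Written"].split()[0].replace(",", ""))


def rate_kb_per_s(fields):
    # Assumption: "KB" means 1000 bytes; with that reading the recomputed
    # value agrees with the Rate line of this job.
    return fd_bytes(fields) / elapsed_seconds(fields) / 1000


def job_ok(fields):
    """True only if both the File daemon and Storage daemon report OK."""
    return (fields.get("FD termination status") == "OK"
            and fields.get("SD termination status") == "OK")


fields = parse_report(REPORT)
print(elapsed_seconds(fields))          # 130
print(round(rate_kb_per_s(fields), 1))  # 2772.6
print(job_ok(fields))                   # True
```

Fed the summaries of JobId 176 or 178 instead, `job_ok` returns False (status "Error", or an empty FD status with the SD "Waiting on FD"), and both show 0 bytes written.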


Wow....

   Today I replaced the NIC with a 3Com:

xl0: <3Com 3c905C-TX Fast Etherlink XL> port 0xd400-0xd47f mem
0xee021000-0xee02107f irq 17 at device 9.0 on pci0
xl0: Reserved 0x80 bytes for rid 0x14 type 3 at 0xee021000
xl0: using memory mapped I/O
xl0: media options word: a
xl0: found MII/AUTO
miibus0: <MII bus> on xl0
xl0: bpf attached
xl0: Ethernet address: 00:01:02:6d:e8:a4
xl0: [MPSAFE]

And my first backup has crashed again:

xl0: watchdog timeout

Right now I'm moving the cable from one port to another to see what happens.

Guys, please, if anyone has anything to tell me, this is critical for me.

This is the second NIC I have tried.
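Since the same message has now shown up on two different NICs, it may help to count how often each interface logs it. A minimal Python sketch, assuming the messages have the `<ifname>: watchdog timeout` form seen on the console, optionally behind a syslog prefix such as `kernel: ` (the function name is hypothetical):

```python
import re
from collections import Counter

# Matches e.g. "vr0: watchdog timeout" on the console, or the same message
# behind a syslog prefix such as "... kernel: xl0: watchdog timeout".
# Interface names are assumed to be letters followed by a unit number.
WATCHDOG_RE = re.compile(r"\b([a-z]+[0-9]+): watchdog timeout")


def count_watchdogs(lines):
    """Count 'watchdog timeout' messages per network interface."""
    counts = Counter()
    for line in lines:
        m = WATCHDOG_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


# The three console messages reported in this thread.
console = [
    "vr0: watchdog timeout",
    "vr0: watchdog timeout",
    "xl0: watchdog timeout",
]
print(dict(count_watchdogs(console)))  # {'vr0': 2, 'xl0': 1}
```

The same function accepts an open file, so it can be pointed at the system log (usually `/var/log/messages` on FreeBSD), where the syslog timestamps on each matching line can then be compared with the backup job start and end times.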


