Date:      Tue, 21 Mar 2006 20:45:39 -0500
From:      Mikhail Teterin <mi+mx@aldan.algebra.com>
To:        Matthew Dillon <dillon@apollo.backplane.com>
Cc:        alc@freebsd.org, stable@freebsd.org
Subject:   Re: more weird bugs with mmap-ing via NFS
Message-ID:  <200603212045.39845.mi%2Bmx@aldan.algebra.com>
In-Reply-To: <200603220109.k2M19GVS007470@apollo.backplane.com>
References:  <200603211607.30372.mi%2Bmx@aldan.algebra.com> <200603211948.28178.mi%2Bmx@aldan.algebra.com> <200603220109.k2M19GVS007470@apollo.backplane.com>

On Tuesday 21 March 2006 20:09, Matthew Dillon wrote:
>     If the network bandwidth is still going full bore then the program is
>     doing something.  NFS retries would not account for it.  A simple
>     test for that would be to ^Z the program once it gets into this state
>     and see if the network bandwidth goes to zero.

Pressing ^Z moves the process's state from ``nfs'' to ``STOP'' according to
top(1), but the shell does not give the prompt back for many minutes. Only
when it does, does the bandwidth drop to negligible amounts.

>     So if we assume that packets aren't being lost, then the question
>     becomes: what is the program doing that is causing the network
>     bandwidth to go nuts?

You have the program's source... I run it simply as:

	mzip -g -v -b 16k -w /meow/tmp/db.dmp /backup/tmp/db.dmp.gz.part

/meow is local, /backup is mounted this way:

	mount_nfs -r 5120 -w 5120 -ointr pandora:/backup /backup

>     ktrace on the program would tell us if read() or write() or ftruncate()
>     were causing an issue.

According to `kdump -l', which I launched in parallel to the ktrace-ed mzip, 
the last syscall is madvise. But that returns long before the bandwidth 
shoots up...

>     'vmstat 1' while the program is running would tell us if VM faults
>     are creating an issue.

Just like `systat -vm', `vmstat 1' hangs -- and stalls everything else for many
minutes. Maybe this is a hint of too much faulting?

>   50 lines of output from something like this after the program has gotten
>   into its weird state might give us a clue:
>    tcpdump -s 4096 -n -i <interface> -l port 2049

Now I am thoroughly confused; the lines are very repetitive:

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on em0, link-type EN10MB (Ethernet), capture size 4096 bytes
20:41:55.788436 IP 172.21.128.43.2049 > 172.21.130.86.1445243414: reply ok 60
20:41:55.788502 IP 172.21.130.86.1445243415 > 172.21.128.43.2049: 1472 write fh 1090,6005/15141914 5120 (5120) bytes @ 4943872
20:41:55.788811 IP 172.21.128.43.2049 > 172.21.130.86.1445243415: reply ok 60 write ERROR: Permission denied
20:41:55.788872 IP 172.21.130.86.1445243416 > 172.21.128.43.2049: 1472 write fh 1090,6005/15141914 5120 (5120) bytes @ 4947968
[...]

The only cause of "permission denied" that I know of is a firewall, but neither
the server nor the client even has ipfw loaded...
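
For anyone who wants to tally a longer capture than the excerpt above, here is
a minimal sketch in Python. It is not part of mzip; the function names are made
up for illustration, and the regular expressions assume nothing beyond the text
format tcpdump printed above. It counts the write requests, the offset stride
between consecutive requests, and the error replies:

```python
import re
from collections import Counter

# Patterns assume the text format tcpdump printed in the excerpt above.
WRITE_RE = re.compile(r"write fh \S+ (\d+) \(\d+\) bytes @ (\d+)")
# An error message runs until the next timestamp or the end of the capture.
ERROR_RE = re.compile(r"write ERROR: (.+?)(?= \d{2}:\d{2}:\d{2}\.\d+ |$)")


def summarize(text):
    """Return ([(offset, length), ...], Counter of error messages)."""
    flat = " ".join(text.split())  # undo line wrapping added by the mailer
    writes = [(int(off), int(length)) for length, off in WRITE_RE.findall(flat)]
    errors = Counter(ERROR_RE.findall(flat))
    return writes, errors


def strides(writes):
    """Differences between the offsets of consecutive write requests."""
    return [b[0] - a[0] for a, b in zip(writes, writes[1:])]


# The four packets quoted above, wrapped the same way as in this message.
SAMPLE = """\
20:41:55.788436 IP 172.21.128.43.2049 > 172.21.130.86.1445243414: reply ok 60
20:41:55.788502 IP 172.21.130.86.1445243415 > 172.21.128.43.2049: 1472 write
fh 1090,6005/15141914 5120 (5120) bytes @ 4943872
20:41:55.788811 IP 172.21.128.43.2049 > 172.21.130.86.1445243415: reply ok 60
write ERROR: Permission denied
20:41:55.788872 IP 172.21.130.86.1445243416 > 172.21.128.43.2049: 1472 write
fh 1090,6005/15141914 5120 (5120) bytes @ 4947968
"""

writes, errors = summarize(SAMPLE)
print("write requests:", len(writes))
print("offset strides:", strides(writes))
print("errors:", dict(errors))
```

In the two requests quoted above, the offsets differ by 4096 while each write
carries 5120 bytes; whether that pattern holds across the whole capture is
something a longer run through the script would have to show.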

Yours,

	-mi


