Date:      Thu, 18 Mar 2010 11:59:34 -0700
From:      Garrett Cooper <yanefbsd@gmail.com>
To:        Anton Shterenlikht <mexas@bristol.ac.uk>
Cc:        jhell <jhell@dataix.net>, FreeBSD Current <freebsd-current@freebsd.org>, freebsd-ia64@freebsd.org
Subject:   Re: ldd leaves the machine unresponsive
Message-ID:  <7d6fde3d1003181159t3eb1a665ge07f5673cf096e67@mail.gmail.com>
In-Reply-To: <20100318155113.GE1552@mech-cluster241.men.bris.ac.uk>
References:  <20100317163230.GJ87732@mech-cluster241.men.bris.ac.uk> <alpine.BSF.2.00.1003181013370.91777@pragry.qngnvk.ybpny> <20100318155113.GE1552@mech-cluster241.men.bris.ac.uk>

On Thu, Mar 18, 2010 at 8:51 AM, Anton Shterenlikht <mexas@bristol.ac.uk> wrote:
> On Thu, Mar 18, 2010 at 11:29:36AM -0400, jhell wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>>
>>
>> On Wed, 17 Mar 2010 12:32, Anton Shterenlikht wrote:
>> In Message-Id: <20100317163230.GJ87732@mech-cluster241.men.bris.ac.uk>
>>
>> > Just updated to ia64 r205248
>> >
>> > If my problem is due to my mis-configuration,
>> > I apologise in advance.
>> >
>> > I run this shell script after each upgrade
>> > and 'make delete-old-libs' to check
>> > if any shared objects need to be rebuilt:
>> >
>> > <start script>
>> >
>> > #!/bin/sh
>> >
>> > for file in `find /bin /sbin /usr/bin /usr/sbin /usr/lib /usr/libexec /usr/local -name "*"`
>> > do
>> >        echo $file
>> >        ldd $file >> /root/ldd_results 2> /dev/zero
>> > done
>> >
>> > <end script>
>> >
>>
>> This will probably come closer to what you actually want to look for.
>>
>> Writing to /dev/zero... I don't know, I've never tried it, since /dev/null
>> is usually the standard place to throw trash.
>>
>> #!/bin/sh
>> for file in `find /*bin /usr/*bin /usr/lib* /usr/local/*bin -type f`; do
>>         echo $file
>>         ldd $file >>/root/ldd_results 2>/dev/null
>> done
>>
>> The problem with your script is that it picks up files that ldd cannot
>> handle, or that are not useful to run it on, and leaves you with junk in
>> return.
>>
>> It might be more useful if you searched for dynamically linked ELF
>> binaries to run ldd against like the following.
>>
>> === Script starts here ===
>> #!/bin/sh
>>
>> SEARCHPATH="/*bin /usr/*bin /usr/lib* /usr/local/*bin"
>>
>> trap 'exit 1' 2
>>
>> check_libs() {
>> for spath in $SEARCHPATH; do
>>         for ifelf in `find $spath -type f`; do
>>                 ldd `file $ifelf | grep dynamically | cut -f1 -d:`
>>         done
>> done
>> }
>>
>> check_libs 2>/dev/null
>> === Script ends here ===
>>
>> The above will find all ELF files (of any type) within the SEARCHPATH
>> directories that are dynamically linked, run ldd on them, and print the
>> results to stdout.
>>
>> Obviously, since you are going to have thousands of files being questioned,
>> stdout is not going to be useful.
>>
>> So, with the above stated:
>> save the script to: checklibs.sh
>> run with: "sh checklibs.sh >/root/checklibs_output"
>> or: "script /root/checklibs_output checklibs.sh"
>>
>> > After the upgrade to r205248, the script
>> > freezes at seemingly random points.
>> >
>>
>> Unneeded disk usage & execution.
>>
>> > I can still ssh to the machine (using keys), i.e.
>> > I see the welcome message, but cannot get to the console prompt.
>>
>> Of course... too many open files or processes in wait. SSH already has the
>> information it needs loaded into memory; that's why you can get sort-of-in.
>>
>> ZFS file system, perhaps?
>>
>> >
>> > On the serial console I cannot get the prompt
>> > after entering the root password.
>> >
>>
>> See above.
>>
>> > I have top(1) running interactively in another window.
>> > The sh process is in "getblk" state, and ignores kill -9.
>> > But there's no ldd process.
>> >
>> > And shutdown requests are also ignored:
>> >
>> > # shutdown -r now
>> > Shutdown NOW!
>> > shutdown: [pid 8019]
>> > #
>> > and nothing happens after that
>> >
>> > So I have to do a cold reset via MP.
>> >
>> > On ia64 r204322, this script causes no problems.
>> >
>> > Please advise
>> >
>>
>> The above edited script should help limit disk usage and the number of open
>> processes that cause your machine to bog down like that. This script does
>> have its limitations, and there is one bug in it... I'll let you figure out
>> how to get rid of that bug, but it really does not affect the intended
>> output, so I left it alone and sent the error output (fd 2) to /dev/null.
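One possible fix, sketched here on the assumption that the bug meant above is that ldd(1) gets called with no arguments whenever file(1) does not report a file as dynamically linked (so ldd only produces an error). It tests each file first and runs ldd only on matches. Reading names line by line also copes with spaces in file names, which the backquoted `find` splits apart. Letting SEARCHPATH be preset from the environment is an added convenience, not something the original script does.

```shell
#!/bin/sh
# Sketch: run ldd(1) only on files that file(1) reports as dynamically
# linked, so ldd is never invoked with an empty argument list.
# Assumption: SEARCHPATH may be preset in the environment; otherwise the
# same globs as the original script are used.
: "${SEARCHPATH:=/*bin /usr/*bin /usr/lib* /usr/local/*bin}"

check_libs() {
	# $SEARCHPATH is left unquoted on purpose so the globs expand.
	for spath in $SEARCHPATH; do
		# Read names line by line so spaces in file names survive.
		find "$spath" -type f | while IFS= read -r f; do
			# -b omits the file name, so only the description is matched.
			if file -b "$f" | grep -q 'dynamically linked'; then
				echo "== $f"
				ldd "$f"
			fi
		done
	done
}
```

As in the original, the script would end with `check_libs 2>/dev/null` and be run with its output redirected to a file.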
>>
>> The limitation you'll run into is how many files ldd(1) or file(1) can
>> handle at one time. But if you specify specific paths, as already done in
>> SEARCHPATH, then you will most likely never see this, unless the files in
>> /*bin grow beyond the maximum number of files that file(1) or ldd(1) can
>> handle at one time. In short: use direct paths or short globs like the ones
>> above.
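If the limit meant above is the kernel's cap on the total size of a command's arguments (ARG_MAX), which is an assumption, find(1) can sidestep it: with `-exec ... {} +` it passes as many names to each file(1) call as fit and starts further calls for the rest. A minimal sketch; the function names `list_dynamic` and `check_libs_batched` are hypothetical.

```shell
#!/bin/sh
# Sketch: let find(1) batch file names for file(1) with "-exec ... {} +",
# which keeps every file(1) command line under the system's ARG_MAX.
# list_dynamic and check_libs_batched are hypothetical helper names.

list_dynamic() {
	# file(1) prints "name: description"; keep the dynamically linked
	# objects and strip the description (names containing ':' would be
	# cut short, as in the original script).
	find "$@" -type f -exec file {} + 2>/dev/null |
	    grep 'dynamically linked' |
	    cut -d: -f1
}

check_libs_batched() {
	list_dynamic "$@" | while IFS= read -r f; do
		echo "== $f"
		ldd "$f"
	done
}
```

For example: `check_libs_batched /bin /sbin /usr/bin /usr/sbin >/root/checklibs_output 2>/dev/null`.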
>>
>> > many thanks
>> > anton
>> >
>>
>> A final note: you might want to just install sysutils/libchk and run that.
>>
>> Standard Disclaimer: NONE OF THIS CONTAINED HEREIN "THIS MESSAGE" EXCUSES
>> ANY OF THE UNEXPLAINED DISK LOCKING THAT IS GOING ON AND THE INFORMATION
>> FOR WHICH IT MAY CONTAIN BECOMING UNAVAILABLE AT ANY POINT IN TIME DURING
>> THE ORIGINAL RUN OF THE FIRST SCRIPT OR THE SECOND SCRIPT THAT WAS POSTED
>> EITHER AS AN ATTACHMENT OR IN-LINE.
>>
>> ;) JK!
>>
>> Good Luck.
>
> many thanks, this is very helpful
>
> I don't seem to have this lockup anymore.
> Don't know what was happening. I've now run
> it several times on 3 different ia64
> -CURRENT boxes (at different revisions), with
> disks of different speeds, and can't reproduce it.
> My script was very crude, of course.
> I'll try sysutils/libchk

FWIW I've been seeing some performance issues with iir(4)- and
mfi(4)-backed UFS2 filesystems with soft updates on my new machine,
which has some other drivers loaded [a PCI-based em(4) card and an
nvidia-driver-enabled card, which still uses Giant locking].

The machine is a Core i7 on an ASUS W6T Professional motherboard with
12GB RAM, running with debug symbols, ddb, kgdb, the anti-reslock
contention manager, etc. (no witness).

I don't have much other than that to provide at this time, but it
might help to see if and when there's an overlap in the drivers noted
here.

Thanks,
-Garrett


