From owner-freebsd-hackers@FreeBSD.ORG Wed Feb 20 15:49:17 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 0EDC1222; Wed, 20 Feb 2013 15:49:17 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net [IPv6:2001:470:1f10:75::2]) by mx1.freebsd.org (Postfix) with ESMTP id AFB1AD80; Wed, 20 Feb 2013 15:49:16 +0000 (UTC) Received: from pakbsde14.localnet (unknown [38.105.238.108]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 08955B98D; Wed, 20 Feb 2013 10:49:16 -0500 (EST) From: John Baldwin To: Mikolaj Golub Subject: Re: libprocstat(3): retrieve process command line args and environment Date: Wed, 20 Feb 2013 09:04:14 -0500 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p25; KDE/4.5.5; amd64; ; ) References: <20130119151253.GB88025@gmail.com> <201301251531.43540.jhb@freebsd.org> <20130212215054.GA9839@gmail.com> In-Reply-To: <20130212215054.GA9839@gmail.com> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201302200904.15324.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Wed, 20 Feb 2013 10:49:16 -0500 (EST) Cc: Stanislav Sedov , Kostik Belousov , "Robert N. M. Watson" , Attilio Rao , freebsd-hackers@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Feb 2013 15:49:17 -0000 On Tuesday, February 12, 2013 4:50:54 pm Mikolaj Golub wrote: > On Fri, Jan 25, 2013 at 03:31:43PM -0500, John Baldwin wrote: > > > BTW, one off-ball thought I have is that I would like to have a mode where > > libprocstat operates on a core file (of a process, not a kernel crash dump), > > so it could list the threads from a core dump, and possibly file descriptor > > info (if PR kern/173723 is implemented). > > > > We certainly could have a 'raw' mode where it spat out name: value or XML > > of the entire kinfo_proc perhaps. > > > > It looks very interesting! Do you mean something like this? Yes, exactly like this! Very nice indeed. > root@lisa:/root # sh -c 'kill -5 $$' > Trace/BPT trap (core dumped) > root@lisa:/root # ls -l sh.core > -rw------- 1 root wheel 8790016 Feb 12 21:17 sh.core > root@lisa:/root # procstat sh.core > PID PPID PGID SID TSID THR LOGIN WCHAN EMUL COMM > 674 657 674 657 657 1 root - FreeBSD ELF32 sh > root@lisa:/root # procstat -f sh.core > PID COMM FD T V FLAGS REF OFFSET PRO NAME > 674 sh text v r r-------- - - - /bin/sh > 674 sh ctty v c rw------- - - - /dev/pts/0 > 674 sh cwd v d r-------- - - - /root > 674 sh root v d r-------- - - - / > 674 sh 0 v c rw------- 8 2537 - /dev/pts/0 > 674 sh 1 v c rw------- 8 2537 - /dev/pts/0 > 674 sh 2 v c rw------- 8 2537 - /dev/pts/0 > root@lisa:/root # procstat -v sh.core > PID START END PRT RES PRES REF SHD FL TP PATH > 674 0x8048000 0x8064000 r-x 28 0 1 0 CN-- vn /bin/sh > 674 0x8064000 0x8066000 rw- 2 0 1 0 ---- df > 674 0x28064000 0x2807a000 r-x 22 0 17 0 CN-- vn /libexec/ld-elf.so.1 > 674 0x2807a000 0x28083000 rw- 9 0 1 0 ---- df > 674 0x28084000 0x280a3000 r-x 31 32 2 1 CN-- vn /lib/libedit.so.7 > 674 0x280a3000 0x280a5000 rw- 2 0 1 0 C--- vn /lib/libedit.so.7 > 674 0x280a5000 0x280a7000 rw- 0 0 0 0 ---- -- > 674 0x280a7000 0x280e0000 r-x 57 0 4 2 CN-- vn /lib/libncurses.so.8 > 674 0x280e0000 0x280e3000 rw- 3 0 1 0 C--- vn /lib/libncurses.so.8 > 674 0x280e3000 0x28213000 r-x 304 0 34 17 CN-- vn /lib/libc.so.7 > 674 0x28213000 0x2821a000 rw- 7 0 1 0 C--- vn /lib/libc.so.7 > 674 0x2821a000 0x28243000 rw- 16 0 2 0 ---- df > 674 0x28400000 0x28c00000 rw- 24 0 2 0 ---- df > 674 0xbfbdf000 0xbfbff000 rwx 3 0 1 0 ---D df > 674 0xbfbff000 0xbfc00000 r-x 0 0 20 0 CN-- ph > > Here is my attempt to implement it: > > http://people.freebsd.org/~trociny/procstat_core.1.patch > > The patch needs much work yet, especially the userland part, but looks > like it is good enough for demonstration purposes to discuss how this > should be done properly. > > So, procstat data is stored in a core as note sections with name > "FreeBSD" and types NT_PROCSTAT_PROC, NT_PROCSTAT_FILES, ... > > The current format of notes is a header of sizeof(int) and data in the > format as it is returned by a related sysctl call. I think the > header should provide some versioning and for the cases I implemented > (kinfo_proc, kinfo_file, kinfo_vmentry) it contains a value of the > corresponding kinfo struct size (e.g. KINFO_VMENTRY_SIZE). It might be > not the best solution and I would be glad for suggestions. I think including the size is good and is probably sufficient for versioning. > (BTW, why don't we have constants like KINFO_VMENTRY_SIZE defined for > all archs?) I cannot speak to that. > To avoid code duplication I changed the code of kinfo sysctl handlers > to output kinfo data to sbuf instead of calling SYSCTL_OUT directly so > these functions could be used by both sysctl handlers and the coredump > routine. Ok. > Another thing I am not sure about is writing procstat data in the > coredump routine. The coredump routine on the first pass calculates > core sizes and on the second pass does actual writing. I added > procstat in that way that procstat data is collected on the first run > to internal buffers and on the second pass is copied from the buffers > to the core. I could do this another way, e.g running kern_proc_*out() > twice, on the fist pass with tiny buffer and a drain routine that > would calculate data length, but this looks less efficient, > complicates things and currently I am not sure that procstat sizes are > not changeable between the passes. The issue with my approach I see is > that after the first pass, before actually doing allocations we check > corefile limits. I do allocations before checking the limit, so > potentially using some kernel space when it might not be allowed by > limits. Is this a problem I should worry and try another approach? The process should be stopped by the time we dump a core, so running it multiple times should be ok in that the sizes should not change. I would say that you should try to implement a "determine sizes" pass that doesn't allocate anything, but others should comment on that. > I do nothing yet for freebsd32 compat case. I suppose I should check > if the dumping process is running 32 binary and store procstat data in > freebsd32 format? Hmmmmm, that would allow you to use a 32-bit procstat to examine it, but then you'd also need to extend libprocstat to support 32-bit structures so that the native 64-bit procstat can still examine the structures. Also, this information is nice to have for Linux binaries as well. I'm not sure it is worth doing this since usually you want to examine a core dump on the machine that generated the core (that has all the same libraries, etc.). > As I said there is much to do in userland, and I would like to > concentrate on the kernel part and the format first, but I would be > glad if somebody helps me with the problem I faced linking libelf to > libprocstat. With the modifications to the libprocstat makefile: > > -DPADD= ${LIBKVM} ${LIBUTIL} > -LDADD= -lkvm -lutil > +DPADD= ${LIBELF} ${LIBKVM} ${LIBUTIL} > +LDADD= -lelf -lkvm -lutil > > buildworld fails with the error: > > make: don't know how to make /scratch2/tmp/trociny/obj/i386.i386/home/trociny/svn/base/head/tmp/usr/lib/libelf.a. Stop This I have no idea on. One other thing to consider is if gcore needs to be updated to output these records as well. It could use the sysctls to fetch the data and then write out appropriate notes I think, so perhaps it wouldn't be too difficult to add this as a followup commit once the kernel version has settled and the file format is set? -- John Baldwin