Date:      Fri, 28 Dec 2018 21:19:09 +0100
From:      "Kristof Provost" <kp@FreeBSD.org>
To:        "Mark Saad" <nonesuch@longcount.org>
Cc:        "FreeBSD Hackers" <freebsd-hackers@freebsd.org>
Subject:   Re: libxo question
Message-ID:  <4ADD32A9-22D6-4983-BEC2-B3881EB59C81@FreeBSD.org>
In-Reply-To: <CAMXt9NZOa0CRdAB2jERK7iP3VXB367g0Y0oYaL6q893RBr3aTw@mail.gmail.com>
References:  <CAMXt9NZOa0CRdAB2jERK7iP3VXB367g0Y0oYaL6q893RBr3aTw@mail.gmail.com>

On 28 Dec 2018, at 20:31, Mark Saad wrote:
> All
>   I am playing around with procstat and libxo on 12-STABLE from
> yesterday. I wanted to get a list of thread_ids for some processes.
> I wrote a quick Python script to grab the data, but the XML output is
> not well-formed. Here is my sample script, which should work on
> Python 2.7:
>
> ----8<-----------------------
> import subprocess as sp
> import os, sys
> import pprint as pp
> import xml.etree.cElementTree as ET
>
>
> FNULL = open(os.devnull, 'w')
> cmd = "procstat --libxo xml -ta"
> p = sp.Popen(cmd, shell=True, stdout=sp.PIPE, stderr=FNULL, executable="/bin/sh")
> text, err = p.communicate()
>
> root = ET.fromstring(text)
>
> pp.pprint(root)
>
> sys.exit(1)
> ------------>8-----------------------
>
> I keep getting this odd error saying the XML is not well-formed:
>
> Traceback (most recent call last):
>   File "/tmp/test.py", line 12, in <module>
>     root = ET.fromstring(text)
>   File "<string>", line 124, in XML
> cElementTree.ParseError: not well-formed (invalid token): line 1, column 32
>
> Attached is a copy of the XML. Any guidance would be helpful.
>
The attachment seems to have been eaten by a grue, but I can trivially 
reproduce the problem.
Passing the output of `procstat --libxo xml -ta` to xmllint gives us:

	-:1: parser error : StartTag: invalid element name
	<procstat version="1"><threads><0><process_id>0</process_id><command>kernel</com

The libxo code doesn’t quite cope with some of the subtle differences
between JSON and XML. In this case, the problem is that XML tag names
must start with a letter or an underscore. They may contain digits, but
may not start with one.
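
The naming rule can be checked mechanically. The following is a minimal Python sketch, not anything from libxo: it uses a simplified, ASCII-only approximation of the XML 1.0 Name rule (the full grammar also allows ':' and many non-ASCII characters), and the helper name is_simple_xml_name is made up for illustration.

```python
import re

# Simplified, ASCII-only approximation of the XML 1.0 "Name" rule:
# first character a letter or '_', then letters, digits, '_', '.' or '-'.
# (The full grammar also allows ':' and many non-ASCII characters.)
_XML_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*\Z")


def is_simple_xml_name(tag: str) -> bool:
    """Return True if `tag` is acceptable as an element name under the
    simplified rule above."""
    return _XML_NAME.match(tag) is not None


# Tag names seen in procstat's output, before and after the patch.
for tag in ["process_id", "thread_id", "0", "100000", "pid_0", "ID_100000"]:
    print(tag, is_simple_xml_name(tag))
```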

I’ve used the following very quick-and-dirty patch to make xmllint happy:

	diff --git a/usr.bin/procstat/procstat.c b/usr.bin/procstat/procstat.c
	index 0269d3c5a5f..5c042322e83 100644
	--- a/usr.bin/procstat/procstat.c
	+++ b/usr.bin/procstat/procstat.c
	@@ -152,7 +152,7 @@ procstat(const struct procstat_cmd *cmd, struct procstat *prstat,
	 {
	        char *pidstr = NULL;

	-       asprintf(&pidstr, "%d", kipp->ki_pid);
	+       asprintf(&pidstr, "pid_%d", kipp->ki_pid);
	        if (pidstr == NULL)
	                xo_errc(1, ENOMEM, "Failed to allocate memory in procstat()");
	        xo_open_container(pidstr);
	diff --git a/usr.bin/procstat/procstat_rusage.c b/usr.bin/procstat/procstat_rusage.c
	index 3d8c76370c0..f9caef49a2f 100644
	--- a/usr.bin/procstat/procstat_rusage.c
	+++ b/usr.bin/procstat/procstat_rusage.c
	@@ -126,7 +126,7 @@ print_rusage(struct kinfo_proc *kipp)
	            format_time(&kipp->ki_rusage.ru_stime));

	        if ((procstat_opts & PS_OPT_PERTHREAD) != 0) {
	-               asprintf(&threadid, "%d", kipp->ki_tid);
	+               asprintf(&threadid, "ID_%d", kipp->ki_tid);
	                if (threadid == NULL)
	                        xo_errc(1, ENOMEM,
	                            "Failed to allocate memory in print_rusage()");
	diff --git a/usr.bin/procstat/procstat_sigs.c b/usr.bin/procstat/procstat_sigs.c
	index 984d5d57f95..ceb36ca0dcb 100644
	--- a/usr.bin/procstat/procstat_sigs.c
	+++ b/usr.bin/procstat/procstat_sigs.c
	@@ -155,7 +155,7 @@ procstat_threads_sigs(struct procstat *procstat, struct kinfo_proc *kipp)
	        kinfo_proc_sort(kip, count);
	        for (i = 0; i < count; i++) {
	                kipp = &kip[i];
	-               asprintf(&threadid, "%d", kipp->ki_tid);
	+               asprintf(&threadid, "ID_%d", kipp->ki_tid);
	                if (threadid == NULL)
	                        xo_errc(1, ENOMEM, "Failed to allocate memory in "
	                            "procstat_threads_sigs()");
	diff --git a/usr.bin/procstat/procstat_threads.c b/usr.bin/procstat/procstat_threads.c
	index c62bb516175..17f11044021 100644
	--- a/usr.bin/procstat/procstat_threads.c
	+++ b/usr.bin/procstat/procstat_threads.c
	@@ -66,7 +66,7 @@ procstat_threads(struct procstat *procstat, struct kinfo_proc *kipp)
	        kinfo_proc_sort(kip, count);
	        for (i = 0; i < count; i++) {
	                kipp = &kip[i];
	-               asprintf(&threadid, "%d", kipp->ki_tid);
	+               asprintf(&threadid, "ID_%d", kipp->ki_tid);
	                if (threadid == NULL)
	                        xo_errc(1, ENOMEM, "Failed to allocate memory in "
	                            "procstat_threads()");

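To see the effect of the patch from the consumer side, here is a hedged Python 3 sketch using the standard library's xml.etree.ElementTree. The two fragments are hand-written and abbreviated, modelled on the `procstat -ta` output quoted in this thread, with and without the pid_/ID_ prefixes the patch adds; they are not real procstat output. The helper names parses and thread_ids are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written, abbreviated fragment modelled on unpatched procstat -ta output.
UNPATCHED = (
    '<procstat version="1"><threads><0>'
    '<process_id>0</process_id><command>kernel</command>'
    '<threads><100000><thread_id>100000</thread_id>'
    '<thread_name>swapper</thread_name></100000></threads>'
    '</0></threads></procstat>'
)

# The same data with the "pid_" / "ID_" prefixes the patch adds.
PATCHED = (
    '<procstat version="1"><threads><pid_0>'
    '<process_id>0</process_id><command>kernel</command>'
    '<threads><ID_100000><thread_id>100000</thread_id>'
    '<thread_name>swapper</thread_name></ID_100000></threads>'
    '</pid_0></threads></procstat>'
)


def parses(text: str) -> bool:
    """Return True if ElementTree accepts `text` as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def thread_ids(text: str) -> list:
    """Collect every <thread_id> value, regardless of what the
    (dynamically named) wrapper elements are called."""
    root = ET.fromstring(text)
    return [int(el.text) for el in root.iter("thread_id")]


print(parses(UNPATCHED))   # False: <0> is not a valid element name
print(parses(PATCHED))     # True
print(thread_ids(PATCHED)) # [100000]
```

Looking elements up by the fixed name thread_id, rather than by the wrapper tags, keeps the consumer independent of whatever naming scheme the wrapper elements end up with.
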
It’s probably not the prettiest XML, and I’m not sure how useful the 
tags are now, but arguably tags with dynamic names are a bad idea 
anyway.
I think you wouldn’t see this problem with JSON, so perhaps that’s a 
workaround you can consider as well.
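
A minimal Python 3 sketch of the JSON route, assuming procstat accepts `--libxo json` the same way it accepts `--libxo xml`. The exact nesting of the JSON document is not shown in this thread, so rather than hard-coding a path, the hypothetical collect_thread_ids() helper walks the whole decoded structure and gathers every "thread_id" value; only that field name is taken from the XML output discussed here.

```python
import json
import subprocess


def collect_thread_ids(node) -> list:
    """Recursively walk decoded JSON and return every "thread_id" value.

    This does not depend on the exact nesting procstat/libxo uses, which
    (as in the XML output) keys objects by numeric process and thread IDs.
    """
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "thread_id":
                found.append(value)
            else:
                found.extend(collect_thread_ids(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_thread_ids(item))
    return found


def procstat_thread_ids() -> list:
    """Run procstat with libxo JSON output (FreeBSD only) and collect
    the thread IDs it reports."""
    out = subprocess.run(
        ["procstat", "--libxo", "json", "-ta"],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        check=True,
    ).stdout
    return collect_thread_ids(json.loads(out))
```

On a FreeBSD host, `print(procstat_thread_ids())` would print the list; since JSON object keys may be any string, the numeric keys that break XML are harmless here.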

Regards,
Kristof

Date:      Fri, 28 Dec 2018 17:12:43 -0500
From:      Mark Saad <nonesuch@longcount.org>
To:        Chris Torek <torek@elf.torek.net>
Cc:        FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject:   Re: libxo question
Message-ID:  <CAMXt9Na=+HvdUPsmHMWC5W=FTxNPj0y7_D28PSC3a=Q9KXrkSQ@mail.gmail.com>
In-Reply-To: <201812282040.wBSKeWPL023999@elf.torek.net>
References:  <CAMXt9NZOa0CRdAB2jERK7iP3VXB367g0Y0oYaL6q893RBr3aTw@mail.gmail.com> <201812282040.wBSKeWPL023999@elf.torek.net>

On Fri, Dec 28, 2018 at 3:40 PM Chris Torek <torek@elf.torek.net> wrote:
>
> >Attached is a copy of the xml.   Any guidance would be helpful.
>
> Your attachment was stripped before it got here, but the problem
> is clear enough.  Procstat / libxo is generating invalid XML.
>
> Here's a bit of sample "procstat --libxo xml" output, which
> I generated locally by running
>
>     procstat --libxo xml -ta
>
> and hand massaging the result:
>
>     <procstat version="1">
>         <threads>
>             <0>
>                 <process_id>0</process_id>
>                 <command>kernel</command>
>                 <threads>
>                     <100000>
>                         <thread_id>100000</thread_id>
>                         <thread_name>swapper</thread_name>
>                         <cpu>-1</cpu>
>      [snip]
>
> Valid XML tags must begin with an alphabetic character or an
> underscore (see https://www.w3schools.com/xml/xml_elements.asp),
> and neither <0> nor <100000> does so.
>
> A quick workaround is to use json instead.  However, libxo
> probably should "work smarter" with tags.
>
> (XML is a terrible data-encoding language because of all of its
> special rules.  If you think you've found them all, watch out for
> CDATA!  JSON is better but still has some issues with encoding,
> requiring that arbitrary binary data be atob or base64 encoded or
> similar.)
>
> Chris
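
One way an encoder could "work smarter" with tags, sketched in Python: rewrite any key that is not a valid XML name before using it as an element name, much as the patch does by hand with its pid_ and ID_ prefixes. xml_safe_tag() is a hypothetical helper, not part of libxo, and it uses a simplified ASCII-only notion of a valid name.

```python
import re
import xml.etree.ElementTree as ET

# Simplified, ASCII-only rules: names start with a letter or '_' and
# continue with letters, digits, '_', '.' or '-'.
_START_OK = re.compile(r"[A-Za-z_]")
_BAD_CHARS = re.compile(r"[^A-Za-z0-9_.-]")


def xml_safe_tag(key: str, prefix: str = "_") -> str:
    """Hypothetical helper: turn an arbitrary key (e.g. a PID such as "0")
    into a string usable as an XML element name.

    `prefix` must itself start with a letter or '_'.
    """
    tag = _BAD_CHARS.sub("_", key)   # replace disallowed characters
    if not _START_OK.match(tag):     # empty, or starts with a digit, '.' or '-'
        tag = prefix + tag
    return tag


# Each rewritten key now parses as an element name.
for key, prefix in [("0", "pid_"), ("100000", "ID_"), ("process_id", "_")]:
    tag = xml_safe_tag(key, prefix)
    ET.fromstring(f"<{tag}>{key}</{tag}>")
    print(key, "->", tag)
```

The mapping is lossy ("a b" and "a_b" both become a_b), so a consumer that needs the original key has to get it from somewhere else, such as the existing <thread_id> child element.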

I updated the patch from kp to work on 12:
https://mirrors.nycbug.org/pub/patches/procstat-libxo-12-STABLE.patch

Here is the XML output as well:
https://mirrors.nycbug.org/pub/patches/procstat.xml

This works better than before, and Python's XML parser, Mozilla and
Edge all think it's valid XML.

I think this should be fixed. What should we do to make it happen?




-- 
mark saad | nonesuch@longcount.org


