Date: Fri, 28 Dec 2018 21:19:09 +0100
From: "Kristof Provost" <kp@FreeBSD.org>
To: "Mark Saad" <nonesuch@longcount.org>
Cc: "FreeBSD Hackers" <freebsd-hackers@freebsd.org>
Subject: Re: libxo question
Message-ID: <4ADD32A9-22D6-4983-BEC2-B3881EB59C81@FreeBSD.org>
In-Reply-To: <CAMXt9NZOa0CRdAB2jERK7iP3VXB367g0Y0oYaL6q893RBr3aTw@mail.gmail.com>
References: <CAMXt9NZOa0CRdAB2jERK7iP3VXB367g0Y0oYaL6q893RBr3aTw@mail.gmail.com>

On 28 Dec 2018, at 20:31, Mark Saad wrote:

> All
> I am playing around with procstat and libxo on 12-STABLE from
> yesterday . I wanted to get a list of thread_id's for some processes.
> I wrote a quick python script to grab the data but xml output is not
> well formed. Here is my sample script , which should work on python
> 2.7
>
> ----8<-----------------------
> 1 import subprocess as sp
> 2 import os,sys
> 3 import pprint as pp
> 4 import xml.etree.cElementTree as ET
> 5
> 6
> 7 FNULL = open(os.devnull, 'w')
> 8 cmd = "procstat --libxo xml -ta"
> 9 p = sp.Popen(cmd, shell=True, stdout=sp.PIPE,stderr=FNULL,
> executable="/bin/sh")
> 10 text , err = p.communicate()
> 11
> 12 root = ET.fromstring(text)
> 13
> 14 pp.pprint(root)
> 15
> 16 sys.exit(1)
> ------------>8-----------------------
>
> I am constantly getting this odd issue about the xml being not well
> formatted
>
> Traceback (most recent call last):
>   File "/tmp/test.py", line 12, in <module>
>     root = ET.fromstring(text)
>   File "<string>", line 124, in XML
> cElementTree.ParseError: not well-formed (invalid token): line 1,
> column 32
>
> Attached is a copy of the xml. Any guidance would be helpful.
>

The attachment seems to have been eaten by a grue, but I can trivially
reproduce the problem. Passing the output of `procstat --libxo xml -ta`
to xmllint gives us:

    -:1: parser error : StartTag: invalid element name
    <procstat version="1"><threads><0><process_id>0</process_id><command>kernel</com

The libxo code doesn’t quite cope with some of the subtle differences
between JSON and XML. In this case, the difference is that XML tag names
must start with a letter or an underscore. They may contain digits, but
may not start with one. procstat uses the bare PID and thread ID as tag
names, so it emits tags such as <0>, which no XML parser will accept.
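
The naming rule is easy to confirm from Python without running procstat.
What follows is only a minimal sketch, assuming Python 3 and the standard
xml.etree.ElementTree module (the quoted script targets Python 2.7 and
cElementTree). Both documents are hand-written stand-ins for the real
output, and the helper name is_well_formed is made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Hand-written stand-in for the current output: the container element is
# named after the bare PID, so its name starts with a digit.
bad = ('<procstat version="1"><threads><0>'
       '<process_id>0</process_id></0></threads></procstat>')

# Same data, but the container name starts with a letter.
good = ('<procstat version="1"><threads><pid_0>'
        '<process_id>0</process_id></pid_0></threads></procstat>')

print(is_well_formed(bad))   # False
print(is_well_formed(good))  # True
```

The first document fails at the same position the traceback reports
(line 1, column 32), which is where the digit follows the opening "<".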

I’ve used the following very quick&dirty patch to make xmllint happy:

diff --git a/usr.bin/procstat/procstat.c b/usr.bin/procstat/procstat.c
index 0269d3c5a5f..5c042322e83 100644
--- a/usr.bin/procstat/procstat.c
+++ b/usr.bin/procstat/procstat.c
@@ -152,7 +152,7 @@ procstat(const struct procstat_cmd *cmd, struct procstat *prstat,
 {
 	char *pidstr = NULL;
 
-	asprintf(&pidstr, "%d", kipp->ki_pid);
+	asprintf(&pidstr, "pid_%d", kipp->ki_pid);
 	if (pidstr == NULL)
 		xo_errc(1, ENOMEM, "Failed to allocate memory in procstat()");
 	xo_open_container(pidstr);
diff --git a/usr.bin/procstat/procstat_rusage.c b/usr.bin/procstat/procstat_rusage.c
index 3d8c76370c0..f9caef49a2f 100644
--- a/usr.bin/procstat/procstat_rusage.c
+++ b/usr.bin/procstat/procstat_rusage.c
@@ -126,7 +126,7 @@ print_rusage(struct kinfo_proc *kipp)
 	    format_time(&kipp->ki_rusage.ru_stime));
 	if ((procstat_opts & PS_OPT_PERTHREAD) != 0) {
-		asprintf(&threadid, "%d", kipp->ki_tid);
+		asprintf(&threadid, "ID_%d", kipp->ki_tid);
 		if (threadid == NULL)
 			xo_errc(1, ENOMEM,
 			    "Failed to allocate memory in print_rusage()");
diff --git a/usr.bin/procstat/procstat_sigs.c b/usr.bin/procstat/procstat_sigs.c
index 984d5d57f95..ceb36ca0dcb 100644
--- a/usr.bin/procstat/procstat_sigs.c
+++ b/usr.bin/procstat/procstat_sigs.c
@@ -155,7 +155,7 @@ procstat_threads_sigs(struct procstat *procstat, struct kinfo_proc *kipp)
 	kinfo_proc_sort(kip, count);
 	for (i = 0; i < count; i++) {
 		kipp = &kip[i];
-		asprintf(&threadid, "%d", kipp->ki_tid);
+		asprintf(&threadid, "ID_%d", kipp->ki_tid);
 		if (threadid == NULL)
 			xo_errc(1, ENOMEM, "Failed to allocate memory in "
 			    "procstat_threads_sigs()");
diff --git a/usr.bin/procstat/procstat_threads.c b/usr.bin/procstat/procstat_threads.c
index c62bb516175..17f11044021 100644
--- a/usr.bin/procstat/procstat_threads.c
+++ b/usr.bin/procstat/procstat_threads.c
@@ -66,7 +66,7 @@ procstat_threads(struct procstat *procstat, struct kinfo_proc *kipp)
 	kinfo_proc_sort(kip, count);
 	for (i = 0; i < count; i++) {
 		kipp = &kip[i];
-		asprintf(&threadid, "%d", kipp->ki_tid);
+		asprintf(&threadid, "ID_%d", kipp->ki_tid);
 		if (threadid == NULL)
 			xo_errc(1, ENOMEM, "Failed to allocate memory in "
 			    "procstat_threads()");

It’s probably not the prettiest XML, and I’m not sure how useful the tags
are now, but arguably tags with dynamic names are a bad idea anyway.

I think you wouldn’t see this problem with JSON, so perhaps that’s a
workaround you can consider as well.

Regards,
Kristof

Date: Fri, 28 Dec 2018 17:12:43 -0500
From: Mark Saad <nonesuch@longcount.org>
To: Chris Torek <torek@elf.torek.net>
Cc: FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject: Re: libxo question
Message-ID: <CAMXt9Na=+HvdUPsmHMWC5W=FTxNPj0y7_D28PSC3a=Q9KXrkSQ@mail.gmail.com>
In-Reply-To: <201812282040.wBSKeWPL023999@elf.torek.net>
References: <CAMXt9NZOa0CRdAB2jERK7iP3VXB367g0Y0oYaL6q893RBr3aTw@mail.gmail.com> <201812282040.wBSKeWPL023999@elf.torek.net>

On Fri, Dec 28, 2018 at 3:40 PM Chris Torek <torek@elf.torek.net> wrote:
>
> >Attached is a copy of the xml. Any guidance would be helpful.
>
> Your attachment was stripped before it got here, but the problem
> is clear enough. Procstat / libxo is generating invalid XML.
>
> Here's a bit of sample "procstat --libxo xml" output, which
> I generated locally by running
>
>     procstat --libxo xml -ta
>
> and hand massaging the result:
>
>     <procstat version="1">
>       <threads>
>         <0>
>           <process_id>0</process_id>
>           <command>kernel</command>
>           <threads>
>             <100000>
>               <thread_id>100000</thread_id>
>               <thread_name>swapper</thread_name>
>               <cpu>-1</cpu>
>     [snip]
>
> Valid XML tags must begin with an alphabetic character or an
> underscore (see https://www.w3schools.com/xml/xml_elements.asp),
> and neither <0> nor <100000> do so.
>
> A quick workaround is to use json instead. However, libxo
> probably should "work smarter" with tags.
>
> (XML is a terrible data-encoding language because of all of its
> special rules. If you think you've found them all, watch out for
> CDATA! JSON is better but still has some issues with encoding,
> requiring that arbitrary binary data be atob or base64 encoded or
> similar.)
>
> Chris

I updated the patch from kp to work on 12:
https://mirrors.nycbug.org/pub/patches/procstat-libxo-12-STABLE.patch

Here is the XML output as well:
https://mirrors.nycbug.org/pub/patches/procstat.xml

This works better than before, and Python's XML parser, Mozilla and Edge
all think it's valid XML.

I think this should be fixed. What should we do to make it happen?

--
mark saad | nonesuch@longcount.org
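
Until procstat is patched, the JSON workaround suggested in this thread
could look roughly like the sketch below. It assumes Python 3 and assumes
that the JSON output nests the same way as the XML sample quoted above
(procstat, then threads, then one object per PID holding its own threads
object); that layout is a guess and should be checked against real output.
The function names and the sample dictionary are illustrative only.

```python
import json
import subprocess


def load_procstat_json():
    """Run procstat with JSON output and parse it (FreeBSD only)."""
    out = subprocess.check_output(["procstat", "--libxo", "json", "-ta"])
    return json.loads(out)


def thread_ids_by_pid(doc):
    """Map each PID key to the sorted thread-ID keys nested under it.

    Assumes the layout doc["procstat"]["threads"][pid]["threads"][tid],
    mirroring the XML nesting; verify this on your system.
    """
    processes = doc["procstat"]["threads"]
    return {pid: sorted(proc.get("threads", {}))
            for pid, proc in processes.items()}


# Hand-written sample that mirrors the XML snippet from the thread.
sample = {
    "procstat": {
        "version": "1",
        "threads": {
            "0": {
                "process_id": 0,
                "command": "kernel",
                "threads": {
                    "100000": {"thread_id": 100000,
                               "thread_name": "swapper",
                               "cpu": -1},
                },
            },
        },
    },
}

print(thread_ids_by_pid(sample))  # {'0': ['100000']}
```

On a FreeBSD machine, thread_ids_by_pid(load_procstat_json()) would replace
the sample. JSON object keys are plain strings, so numeric PIDs and thread
IDs are legal there, which is why the problem does not show up in that
encoding.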