Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 24 Aug 2004 17:38:49 -0400
From:      "Dan Langille" <dan@langille.org>
To:        "Jacques A. Vidrine" <nectar@FreeBSD.org>
Cc:        freebsd-vuxml@freebsd.org
Subject:   Re: database tables for VuXML
Message-ID:  <412B7D29.31547.1472F80@localhost>
In-Reply-To: <20040822192915.GA17478@madman.celabo.org>
References:  <41279E59.13631.76AD2AB9@localhost>

next in thread | previous in thread | raw e-mail | index | archive | help
On 22 Aug 2004 at 14:29, Jacques A. Vidrine wrote:

> [Added freebsd-vuxml@]
> 
> On Sat, Aug 21, 2004 at 07:11:21PM -0400, Dan Langille wrote:
> > Hi folks,
> >
> > I'm in the early stage of incorporating VuXML information into
> > FreshPorts.
> 
> Alright!  That'll be cool.
> 
> > I have a preliminary database table design which appears
> > below.  If you think I've missed any essential information out of
> > this design, please let me know.
> >
> > First: has anyone already worked on creating database tables for the
> > VuXML data?
> 
> I have an old prototype application which uses SQLite.  FWIW, here is
> the schema I used then. (the formatting is a little funny because this
> was embedded in some other code)
> 
>         CREATE TABLE vuxml (
>               vid          VARCHAR PRIMARY KEY UNIQUE
>            ,  topic        VARCHAR
>            ,  description  VARCHAR
>            ,  discovery    DATE
>            ,  entry        DATE
>            ,  modified     DATE
> 	   -- computed max(entry, modified) for convenience in
> 	   -- other expressions
>            ,  last         DATE
>            );

This is pretty much identical to what I have.

> 
>         CREATE TABLE refs (
>              vid           VARCHAR
>            , type          VARCHAR
>            , text          VARCHAR
>            );

Ahh, yes, that repeating group.  I'm not storing that.  I should be

>         CREATE TABLE affected (
>               vid          VARCHAR
>            ,  type         VARCHAR
>            );

What is this type field?  I didn't follow what the affected table 
contains.

>         CREATE TABLE names (
>               affected     INTEGER
>            ,  name         VARCHAR
>            );

>From your example below, the names table would have entries for mutt 
and ja-mutt.  I think I'll do that as well.  See the end of my 
message for how I plan to relate VuXML entries to specific ports.

>         CREATE TABLE ranges (
>               affected     INTEGER
>            ,  lowop        VARCHAR
>            ,  low          VARCHAR
>            ,  glow0        INTEGER
>            ,  glow1        INTEGER
>            ,  glow2        INTEGER
>            ,  glow3        INTEGER
>            ,  highop       VARCHAR
>            ,  high         VARCHAR
>            ,  ghigh0       INTEGER
>            ,  ghigh1       INTEGER
>            ,  ghigh2       INTEGER
>            ,  ghigh3       INTEGER
>            );
> 
> The `ranges' table perhaps looks erm interesting.  In this case,
> I encoded each package version number into four 32-bit integers.
> This allowed one to lookup an affected package completely in SQL.
> However, the encoding I used may now be incorrect in some ways, since
> pkg_install et al were recently changed to treat version numbers
> differently.  

FreshPorts stores PORTVERSION and PORTREVISION as separate text 
fields.  I take it that the ranges equations should compare only 
PORTVERSION and ignore PORTREVISION.

> Also, SQLite could not really optimize the complex SQL
> statement that resulted.  So I think it is interesting for further
> study, but I doubt I'd do it that way again unless I had a very
> particular need to use pure SQL or the database was huge.

SQLite is a great little database.  I like what I've seen of it 
(mostly via my work on Bacula).  I'll be using PostgreSQL for this.

> 
> > create table vuxml
> > (
> >     id              serial                not null,
> > -- internal FreshPorts ID
> >     vid             text                  not null,
> > -- the vuln vid. e.g. c4b025bb-f05d-11d8-9837-000c41e2cdad
> >     topic           text                  not null,
> >     description     text                  not null,
> >     date_discovery  date                          ,
> >     date_entry      date                          ,
> >     date_modified   date                          ,
> >     status          date                  not null,
> >     primary key (id)
> > );
> > 
> > create table vuxml_port_xref
> > (
> >     id                    serial                not null,
> >     vuxml_id              integer               not null,
> >     port_id               integer               not null,
> >     primary key (id)
> > );
> > 
> > create table vuxml_port_xref_range
> > (
> >     id                    serial                not null,
> >     vuxml_port_xref_id    text                          ,
> >     range_operator_start  text                          ,
> >     range_operator_end    text                          ,
> >     range_version_start   text                          ,
> >     range_version_end     text                          ,
> >     primary key (id)
> > );
> >
> > If you consider the output from: vxquery -t vuxml ~/VuXML/vuln.xml
> > tnftpd, the entry in vuxml_port_xref_range for tnftpd might be:
> >
> > (1, 'lt', 20040810, NULL, NULL)
> 
> Seem reasonable.  For what its worth, storing the range operators as
> `>', `=>', etc lets you get something human readable as the output of a
> SELECT, e.g.
> 
>    SELECT range_version_start, range_operator_start, port_name,
>           range_operator_end, range_version_end FROM ... WHERE ...;
> 
>       ('20040810', '<', 'tnftpd', '', '')
> 
> I found that handy for manual inspection of the database.

Whatever I store,  need to be able to run queries upon it.  I was 
thinking of keeping the existing values as that might be easier with 
perl.  Not sure yetp.

> You may notice from my example that I used separate `name', `ranges',
> and `affected' tables.  I found this easier when filling the database,
> since the <package>/<system> elements are kind of a cross-product
> operator.  For example, when processing something like
> 
>       <package>
> 	<name>mutt</name>
> 	<name>ja-mutt</name>
> 	<range><ge>1.4</ge><lt>1.4.2</lt></range>
>       </package>

This is what I do with my vuxml_port_xref table.  That table relates 
a vid to a port.

> I first created an `affected' entry to contain the other references.
> Then, as I hit each `name' I just associated it with the `affected'
> entry, and as I hit each `range' I did the same.  (Otherwise one
> needs to construct a few lists, and then make all entries once
> the </package> end tag is seen.)  I think this is also easier to
> modify if we get new children of <package>, e.g. the <category> and
> <architecture> tags that are currently in the VuXML pre-1.2 DTD.

My plan is to populate empty the vuxml_* tables each time there is a 
commit to the data file.  This keeps the vuxml system totally 
separate from FreshPorts.

Each row under Commit History (e.g. 
http://beta.freshports.org/sysutils/bacula/) relates to a row from 
the commit_log_ports table.  That looks something like this:

create table commit_log_ports
(
    commit_log_id      integer               not null,
    port_id            integer               not null,
    needs_refresh      smallint              not null,
    port_version       text                          ,
    port_revision      text                          ,
    primary key (commit_log_id, port_id)
);

I'll probably create another table commit_log_ports_vuxml:

create table commit_log_ports
(
    commit_log_id      integer               not null,
    port_id            integer               not null,
    vuxml_id         integer     not null                 ,
    primary key (commit_log_id, port_id)
);

Looking at the current data, there's about 140 affected ports, but I 
haven't broken that with respect to ranges, which is what the above 
will do.

I'll either do that, or add the vuxml_id column to the 
commit_log_ports table, but I'd rather keep it separate.  Time will 
tell.

-- 
Dan Langille : http://www.langille.org/



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?412B7D29.31547.1472F80>