Date:      Wed, 9 Dec 2020 02:31:36 +0100
From:      Peter <pmc@citylink.dinoex.sub.org>
To:        Kristof Provost <kp@freebsd.org>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: Panic: 12.2 fails to use VIMAGE jails
Message-ID:  <X9Ao%2BBKDXADds36A@gate.oper.dinoex.org>
In-Reply-To: <1AAE98C9-ADF9-4869-B863-601542CEBB67@FreeBSD.org>
References:  <20201207125451.GA11406@gate.oper.dinoex.org> <39DBEA53-960F-4D70-86D7-847E6DFA437D@FreeBSD.org> <20201207233449.GA11025@gate.oper.dinoex.org> <DDDE7802-1C8C-4EB7-AA0C-DFCD7E5D2BAB@FreeBSD.org> <X8/Kr0td1cxI%2BP%2BV@gate.oper.dinoex.org> <1AAE98C9-ADF9-4869-B863-601542CEBB67@FreeBSD.org>


On Tue, Dec 08, 2020 at 08:02:47PM +0100, Kristof Provost wrote:

! > Sorry for the bad news.
! >
! You appear to be triggering two or three different bugs there.

That is possible. Then there are two or three different bugs in the
production code.

In any case, my current workaround, i.e. adding a delay in exec.poststop,

>         exec.poststop = "
>            sleep 6 ;
>            /usr/sbin/ngctl shutdown ${ifname1l}: ;
>        ";

helps with all of it and makes the system behave solidly. This is true
with and without Your patch.
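The fixed `sleep 6` only helps as long as the teardown reliably finishes within six seconds. Below is a minimal sketch of a bounded poll that could replace it, assuming there is some observable condition that marks the end of the VNET teardown; which condition is actually sufficient is exactly what is unclear in this thread. The helper name `wait_until` is hypothetical.

```sh
#!/bin/sh
# wait_until MAX_TRIES COMMAND [ARGS...]
# Hypothetical helper: run COMMAND (output discarded) once per second
# until it succeeds. Returns 0 on success, 1 after MAX_TRIES failures.
wait_until()
{
    local _max _n
    _max=$1
    shift
    _n=0
    while ! "$@" > /dev/null 2>&1; do
        _n=$((_n + 1))
        if [ "$_n" -ge "$_max" ]; then
            return 1
        fi
        sleep 1
    done
    return 0
}

# Untested usage idea (assumption: the eiface reappearing in the host
# means the jail's vnet has handed it back):
#   wait_until 10 /sbin/ifconfig nge_rail_1l
#   /usr/sbin/ngctl shutdown nge_rail_1l:
```

On timeout the shutdown in this usage idea still runs, after roughly ten attempts, so it is a variation of the same workaround, not a fix.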

! Can you reduce your netgraph use case to a small test case that can trigger
! the problem?

I'm sorry, I fear I don't get Your point.
Assuming there are actually two or three bugs here, You are asking me
to reduce the config so that it triggers only one of them? Is that
correct?

Then let me put it differently: assume this is the OS for the life
support system of the manned Jupiter mission. Which of the bugs
do You want fixed, and which would You prefer to keep, letting it
cut off Your oxygen supply?

https://www.youtube.com/watch?v=BEo2g-w545A

! I’m not likely to be able to do anything unless I can reproduce
! the problem(s).

I understand that.
From Your previous mail I get the impression that you prefer to rely
on tests. I consider this a bad habit[1] and prefer logical thinking.

So let's try that:
We know that there is a problem with taking down an interface from a
VIMAGE in the way it is done by "jail -r". We know this problem can
be reliably worked around by delaying the interface takedown for a
short time.

Now with Your patch, we do not get the typical crash at interface
takedown. Instead, all of a sudden, there are strange crashes from
various other places. And, interestingly, we also get these when
STARTING a jail.

I think this is not an additional problem; it is instead valuable
information (albeit not the kind You might like to get).

Furthermore, these new crashes are always invoked by "ifconfig",
and they seem to have in common that something tries to obtain
information about some interface configuration and receives bogus
data. I might conclude, from gut feeling and without looking into
the details, that either
 - your patch manages to garble some internal interface data
   instead of doing what it is intended to do, or
 - the original problem manages to garble internal interface data
   (leading to the usual crash), and Your patch does not solve
   this, but only protects against the immediate consequence.

It might also be worth considering that, while the problem may be
easier to reproduce with epair, this effect may or may not be
netgraph-specific[2].

Now let's keep in mind that a successful test means EXACTLY NOTHING.
By which other means can we confirm that Your patch fully achieves
what it is intended to do? (E.g. something like dumping and verifying
the respective internal tables in vivo.)
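A crude userland approximation of such a check, and explicitly not a dump of the kernel tables: snapshot `ifconfig -l` on the host before and after a jail start/stop cycle and compare the two lists. This is a minimal sketch; the helper name `iflist_diff` is hypothetical, and an empty result only says that the interface names match, nothing about the internal state behind them.

```sh
#!/bin/sh
# iflist_diff "BEFORE" "AFTER"
# Hypothetical helper: compare two space-separated interface lists
# (as printed by `ifconfig -l`). Prints "-name" for each interface
# that disappeared and "+name" for each interface that appeared.
iflist_diff()
{
    local _i
    for _i in $1; do
        case " $2 " in
            *" $_i "*) ;;
            *) printf '%s\n' "-$_i" ;;
        esac
    done
    for _i in $2; do
        case " $1 " in
            *" $_i "*) ;;
            *) printf '%s\n' "+$_i" ;;
        esac
    done
}

# Usage idea (needs root and the jail config below):
#   before=$(ifconfig -l)
#   jail -c rail && jail -r rail
#   after=$(ifconfig -l)
#   iflist_diff "$before" "$after"
```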

(Background: It is not that I would be unwilling to create clean and
precisely reproducible scenarios. But one of my problems is that I
currently have only two machines available: the graphical one where
I'm typing right now, and the backend server with the jails that does
practically everything.
Therefore, experimenting on either of them creates considerable pain.
I'm working on that issue, trying to get a real server board for the
backend so as to free the current one for testing - but what I would
like to use, e.g. ASUS Z10PE+cores+regECC, is not something one would
easily find at yard sales - and seldom for an acceptable price.)


cheerio,
PMc

[1] Rationale: a failing test tells us that either the test or the
    application has a bug (50/50 chance). A succeeding test tells us
    that 1 equals 1, which we already knew.
    In fact, tests tell us *nothing at all* about the state of our
    code, and specifically, 'successful' outcomes do NOT mean that
    things are all correct.
    The only true usefulness of tests is to protect against
    re-introducing a fault that was already fixed before,
    i.e. regressions.

[2] My netgraph configuration consists of bringing up some bridges
    and then attaching the jails to them.

    Here is the bridge starter (only the relevant component; there
    are more of these populated, but they are probably not
    influencing the issue):
------------------------------------------------
#! /bin/sh

# PROVIDE: netgraphs
# REQUIRE: netwait
# BEFORE: NETWORKING

. /etc/rc.subr

name="netgraphs"
start_cmd="${name}_start"
stop_cmd="${name}_stop"

load_rc_config $name

netgraphs_graphs="svc"

netgraphs_svc_if1_name="nge_svc_1u"
netgraphs_svc_if1_mac="00:1d:92:01:02:01"
netgraphs_svc_if1_addr="***.***.***.***/29"

netgraphs_svc_start()
{
    local _ifname
    if ngctl info svcswitch: > /dev/null 2>&1; then
        netgraphs_svc_stop
    fi

    echo "Creating SVC Switch"
    ngctl -f - <<EOF
        mkpeer bridge crhook link16
        name .:crhook svcswitch
        mkpeer svcswitch: eiface link0 ether
        name svcswitch:link0 $netgraphs_svc_if1_name
EOF
    _ifname=`ngctl msg ${netgraphs_svc_if1_name}: getifname | \
                awk '$1 == "Args:" { print substr($2, 2, length($2)-2)}'`
    ifconfig $_ifname name $netgraphs_svc_if1_name
    ifconfig $netgraphs_svc_if1_name link $netgraphs_svc_if1_mac
    ifconfig $netgraphs_svc_if1_name inet $netgraphs_svc_if1_addr
}
netgraphs_svc_stop()
{
    echo "Shutting down SVC switch"
    ngctl shutdown svcswitch:
    ngctl shutdown ${netgraphs_svc_if1_name}:
}

netgraphs_start()
{
    local _cmd
    for i in "$@"; do
        eval _cmd=netgraphs_${i}_start
        if type $_cmd > /dev/null 2>&1; then
            $_cmd
        else
            echo "netgraphs-start: object $i not found" >&2
        fi
    done
}

netgraphs_stop()
{
    local _cmd
    for i in "$@"; do
        eval _cmd=netgraphs_${i}_stop
        if type $_cmd > /dev/null 2>&1; then
            $_cmd
        else
            echo "netgraphs-stop: object $i not found" >&2
        fi
    done
}

# Collect the graphs to process; on "stop", reverse the order so that
# the graphs are torn down in the reverse order of their creation.
netgraphs_tasks=""
if test $# -eq 1; then
    if test "$1" = "stop"; then
        for i in $netgraphs_graphs; do
            netgraphs_tasks="$i $netgraphs_tasks"
        done
    else
        for i in $netgraphs_graphs; do
            netgraphs_tasks="$netgraphs_tasks $i"
        done
    fi
fi

run_rc_command "$@" "$netgraphs_tasks"
------------------------------------------------
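The same awk extraction appears twice, once in this script and once in the jail's exec.prestart. Below is a minimal sketch of factoring it into a reusable function. The helper names are hypothetical, and the assumed input format (a line whose first field is `Args:` followed by the quoted interface name) is inferred from what the awk program expects, not taken from the ngctl documentation.

```sh
#!/bin/sh
# ng_ifname_parse: read the output of `ngctl msg NODE: getifname` on
# stdin and print the interface name with its surrounding double
# quotes stripped. Same awk program as in the script above.
ng_ifname_parse()
{
    awk '$1 == "Args:" { print substr($2, 2, length($2) - 2) }'
}

# ng_ifname NODE: hypothetical wrapper that queries the node directly.
# Needs ngctl(8) and an existing ng_eiface node named NODE.
ng_ifname()
{
    /usr/sbin/ngctl msg "$1:" getifname | ng_ifname_parse
}

# Usage idea inside netgraphs_svc_start:
#   _ifname=$(ng_ifname "$netgraphs_svc_if1_name")
```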

    And here is the full jail config (only the relevant jail):
------------------------------------------------
allow.set_hostname = "false";
allow.mount.procfs = "false";
allow.mount.devfs = "false";
allow.raw_sockets = "false";
enforce_statfs = 1;
devfs_ruleset = 4;
securelevel = 2;

mount.devfs;
exec.start = "/bin/sh /etc/rc";
exec.stop = "/bin/sh /etc/rc.shutdown";
exec.consolelog = "/var/log/jail_${name}_console.log";

path = "/j/$name";
interface = "lo0";
ip4.saddrsel = "false";
rail {
        jid = 10;
        devfs_ruleset = 11;
        host.hostname = "rail.***********.org";
        vnet = "new";
        sysvshm;
        $ifname1l = nge_${name}_1l;
        $ifname1l_mac = 00:1d:92:01:01:0a;
        vnet.interface = "$ifname1l";
        exec.prestart = "
            echo -e \"mkpeer eiface crhook ether\nname .:crhook $ifname1l\" \
                | /usr/sbin/ngctl -f -
            /usr/sbin/ngctl connect ${ifname1l}: svcswitch: ether link2
            ifname=`/usr/sbin/ngctl msg ${ifname1l}: getifname | \
                awk '$1 == \"Args:\" { print substr($2, 2, length($2)-2)}'`
            /sbin/ifconfig \$ifname name $ifname1l
            /sbin/ifconfig $ifname1l link $ifname1l_mac
        ";
        exec.poststart = "
            /usr/sbin/jexec $name /sbin/sysctl kern.securelevel=3 ;
        ";
        exec.poststop = "
#            sleep 6 ;
            /usr/sbin/ngctl shutdown ${ifname1l}: ;
        ";
        exec.start = "/bin/sleep 4 &";
}
------------------------------------------------
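The exec.prestart above builds its ngctl batch with `echo -e`, which FreeBSD's /bin/sh accepts but POSIX does not specify. Below is a minimal sketch of a portable equivalent using printf. The helper name is hypothetical, and it only produces the batch text; the rest of the prestart (connect, rename, set MAC) would stay as it is.

```sh
#!/bin/sh
# ng_eiface_batch NAME
# Hypothetical helper: print the two-line ngctl(8) batch that the
# exec.prestart above feeds to `ngctl -f -`: create an eiface node
# peered with ngctl's own socket node, then give that node NAME.
# printf is used because POSIX does not specify `echo -e`.
ng_eiface_batch()
{
    printf 'mkpeer eiface crhook ether\nname .:crhook %s\n' "$1"
}

# Usage idea (requires ngctl(8) and root):
#   ng_eiface_batch nge_rail_1l | /usr/sbin/ngctl -f -
```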


