Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 16 Jul 2015 16:01:48 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Jan Mikkelsen <janm@transactionware.com>
Cc:        FreeBSD Stable Mailing List <freebsd-stable@freebsd.org>, Karl Denninger <karl@denninger.net>
Subject:   Re: amd64 kernel dynamic linking allows extern references to statics
Message-ID:  <20150716130148.GM2404@kib.kiev.ua>
In-Reply-To: <C3196483-BFE4-4F65-A685-0FCF23693391@transactionware.com>
References:  <13C52D9D-E1E6-4F83-A881-4E867C336B31@transactionware.com> <20150715132744.GD2404@kib.kiev.ua> <C3196483-BFE4-4F65-A685-0FCF23693391@transactionware.com>

next in thread | previous in thread | raw e-mail | index | archive | help
On Thu, Jul 16, 2015 at 09:18:15AM +1000, Jan Mikkelsen wrote:
> 
> > On 15 Jul 2015, at 11:27 pm, Konstantin Belousov <kostikbel@gmail.com> wrote:
> > 
> > On Wed, Jul 15, 2015 at 06:17:20PM +1000, Jan Mikkelsen wrote:
> >> Hi,
> >> 
> >> (All on 10.2-BETA1.)
> >> 
> >> I noticed that the latest patch in the bug https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594 <https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594>; works on amd64 but fails to load zfs.ko on i386 with a symbol not found error.
> >> 
> >> Looking at the patch, there is one file that has ???extern int zio_use_uma??? to reference a variable that is declared elsewhere as ???static int zio_use_uma???. To me this obviously should not work. However it does work on amd64 but fails on i386.
> >> 
> >> Below is a small test case that reproduces the problem. The generated kernel module loads on amd64 but fails on i386. On amd64 one compilation unit is accessing a static in from another compilation unit by declaring the variable ???extern???. 
> >> 
> >> I haven???t looked further to attempt to find the bug. However, it looks like a Bad Thing??? to me.
> >> 
> > 
> > I am not sure that this is fixable.  Issue is that amd64 modules are
> > relinked object files, and they might have unresolved relocations against
> > local symbols.  Change like the following probably fix your test case,
> > but also quite possible would break legitimate local references.
> > 
> > diff --git a/sys/kern/link_elf_obj.c b/sys/kern/link_elf_obj.c
> > index 021381d..6fa5276 100644
> > --- a/sys/kern/link_elf_obj.c
> > +++ b/sys/kern/link_elf_obj.c
> > @@ -1096,7 +1096,8 @@ link_elf_lookup_symbol(linker_file_t lf, const char *name, c_linker_sym_t *sym)
> > 
> > 	for (i = 0, symp = ef->ddbsymtab; i < ef->ddbsymcnt; i++, symp++) {
> > 		strp = ef->ddbstrtab + symp->st_name;
> > -		if (symp->st_shndx != SHN_UNDEF && strcmp(name, strp) == 0) {
> > +		if (symp->st_shndx != SHN_UNDEF && strcmp(name, strp) == 0 &&
> > +		    ELF_ST_BIND(symp->st_info) != STB_LOCAL) {
> > 			*sym = (c_linker_sym_t) symp;
> > 			return 0;
> > 		}
> 
> I don???t know why there could be an unresolved relocation against a local symbol. My (admittedly trivial) tests with nm and static functions/variables have not led to a ???U??? record. Under what circumstances would this happen?

Assembler cannot know the final location for any non-local data, so
any variable, global or static, requires a relocation in the object file
for the final code to work properly.  Look at the most trivial example

static int a;
int
f(int x)
{
	return (a + x);
}
int *g(void)
{
	return (&a);
}
The g() function is needed for prevent too smart gcc optimizer from
noting that nothing could modify a and replace its uses with the literal
0.

The generated object file, after disassembly, is
0000000000000000 <f>:
   0:   89 f8                   mov    %edi,%eax
   2:   03 05 00 00 00 00       add    0x0(%rip),%eax        # 8 <f+0x8>
                        4: R_X86_64_PC32        .bss-0x4
   8:   c3                      retq   
The relocation is X86_64_PC32, which takes a symbol as displacement.
In this case, compiler decided to use the segment base for the symbol,
but it is allowed to use the local symbol directly.

Only linker during the final link (not the incremental -r link) can fill
the word which provides displacement for the rip-relative addressing to
fetch a value.

> 
> Separately to this is that this behaviour also defeats multiple definition checking at link time. Modifying my test slightly to have one compilation unit with ???static int testvar???, one compilation unit with ???int testvar??? and one with ???extern int testvar??? gives nm output like this from the .ko file:
> 
> 00000000000000d0 d testvar
> 00000000000000c0 d testvar
> 
> It is unclear which instance of testvar the ???extern??? declaration used for resolution. Simple testing seems to show it was the one with the non-static storage class. However, a piece of code depending on the previous resolution to local names could break by the addition a file with a name clash. Also, not no ???U??? record ??? the ???extern int testvar??? declaration has been resolved at this point.
> 
> Making both definitions static gives this:
> 
>                  U testvar
> 00000000000000c0 d testvar
> 00000000000000d0 d testvar
> 
> Which testvar will I get at load time? Who knows? Adding a file with a static variable with a name clash in another compilation unit can change the behaviour of a module.
> 
> The big question for me is: under which legitimate case is this feature necessary? The downsides seem significant.
> 

Well, might be that adding a check for multiple definitions of the same
symbol into the link_elf_obj.c is reasonable, I do not have very strong
opinion against it. But also, I do not see a way to do it in non-O(n^2)
time, which is probably not the best thing to put into kernel.



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20150716130148.GM2404>