Discussion:
[GOLD] Support --icf=safe with -pie for x86_64
(too old to reply)
Rahul Chaudhry via binutils
2017-01-12 21:28:49 UTC
Permalink
--icf=safe folds identical functions only if the functions do not have their
address taken. It uses relocation types to distinguish between a function call
and taking the address of a function. This works fine for all architectures
except x86_64, where the same relocation type (R_X86_64_PC32) may be used for
both function call and taking the address of a function (when compiled with
-fPIE). Currently Target_x86_64::do_can_check_for_function_pointers returns
false if the linker is invoked with -pie, and --icf=safe folds only ctors and
dtors.

This patch adds improved support for --icf=safe when used with -pie. For the
ambiguous case of R_X86_64_PC32 relocation, it checks the instruction opcode to
distinguish between function calls and taking the address of a function.

OK?

* x86_64.cc (Target_x86_64::do_can_check_for_function_pointers):
Return true even when building pie binaries.
(Target_x86_64::possible_function_pointer_reloc): Check opcode
for R_X86_64_PC32 relocations.
(Target_x86_64::local_reloc_may_be_function_pointer): Pass
extra arguments to local_reloc_may_be_function_pointer.
(Target_x86_64::global_reloc_may_be_function_pointer): Likewise.
* testsuite/Makefile.am (icf_safe_pie_test): New test case.
* testsuite/Makefile.in: Regenerate.
* testsuite/icf_safe_pie_test.sh: New shell script.
--
Rahul
Alan Modra
2017-01-13 01:23:25 UTC
Permalink
+ {
+ // This relocation may be used both for function calls and
+ // for taking address of a function. We distinguish between
+ // them by checking the opcodes.
+ section_size_type stype;
+ const unsigned char* view = src_obj->section_contents(src_indx,
+ &stype,
+ true);
+
+ // call
+ if (r_offset >= 1
+ && view[r_offset - 1] == 0xe8)
+ return false;
Is it safe to assume that 0xe8 is really the start of an instruction?

What if instead you are looking at a modrm or sib for a rip-relative read?
It may not match in this case (I'm rusty at x86 and would have to look
at an AMD or Intel manual to know) but your should check this and of
course for the other encodings below.

Also, might you have an R_X86_64_PC32 in data and so be looking at the
high byte of the previous word?
+
+ // jmp
+ if (r_offset >= 1
+ && view[r_offset - 1] == 0xe9)
+ return false;
+
+ // jo/jno/jb/jnb/je/jne/jna/ja/js/jns/jp/jnp/jl/jge/jle/jg
+ if (r_offset >= 2
+ && view[r_offset - 2] == 0x0f
+ && view[r_offset - 1] >= 0x80
+ && view[r_offset - 1] <= 0x8f)
+ return false;
+
+ // Be conservative and treat all others as function pointers.
+ return true;
+ }
}
return false;
}
--
Alan Modra
Australia Development Lab, IBM
Rahul Chaudhry via binutils
2017-01-13 20:10:55 UTC
Permalink
Post by Alan Modra
Is it safe to assume that 0xe8 is really the start of an instruction?
What if instead you are looking at a modrm or sib for a rip-relative read?
It may not match in this case (I'm rusty at x86 and would have to look
at an AMD or Intel manual to know) but your should check this and of
course for the other encodings below.
You're right, this may result in false positives with some instance getting
classified as a function call when it's not, which in turn might result in some
section getting folded when it shouldn't be.

Ideally, we'd want a way to go from a relocation offset to the beginning of the
instruction containing the relocation. Is there a good way to do that for
x86_64 (short of disassembling the whole section)?

The approach here is inspired by other places in the same file that seem to be
doing something similar (can_convert_mov_to_lea/can_convert_callq_to_direct).
Maybe I'm missing something and there's more context in those other functions
that makes it safe to do so.
Post by Alan Modra
Also, might you have an R_X86_64_PC32 in data and so be looking at the
high byte of the previous word?
Not sure what you mean here, but this code is only called for text sections, so
we can be sure that the relocation offset is part of an instruction. Did you
mean an instruction with an encoding containning opcode bytes, followed by
immediate operand, which is followed by a R_X86_64_PC32 relocation?
--
Rahul
Post by Alan Modra
+ {
+ // This relocation may be used both for function calls and
+ // for taking address of a function. We distinguish between
+ // them by checking the opcodes.
+ section_size_type stype;
+ const unsigned char* view = src_obj->section_contents(src_indx,
+ &stype,
+ true);
+
+ // call
+ if (r_offset >= 1
+ && view[r_offset - 1] == 0xe8)
+ return false;
Is it safe to assume that 0xe8 is really the start of an instruction?
What if instead you are looking at a modrm or sib for a rip-relative read?
It may not match in this case (I'm rusty at x86 and would have to look
at an AMD or Intel manual to know) but your should check this and of
course for the other encodings below.
Also, might you have an R_X86_64_PC32 in data and so be looking at the
high byte of the previous word?
+
+ // jmp
+ if (r_offset >= 1
+ && view[r_offset - 1] == 0xe9)
+ return false;
+
+ // jo/jno/jb/jnb/je/jne/jna/ja/js/jns/jp/jnp/jl/jge/jle/jg
+ if (r_offset >= 2
+ && view[r_offset - 2] == 0x0f
+ && view[r_offset - 1] >= 0x80
+ && view[r_offset - 1] <= 0x8f)
+ return false;
+
+ // Be conservative and treat all others as function pointers.
+ return true;
+ }
}
return false;
}
--
Alan Modra
Australia Development Lab, IBM
Alan Modra
2017-01-13 22:23:48 UTC
Permalink
Post by Rahul Chaudhry via binutils
Post by Alan Modra
Also, might you have an R_X86_64_PC32 in data and so be looking at the
high byte of the previous word?
Not sure what you mean here, but this code is only called for text sections, so
OK, if you know it is text then you're a lot safer, but even so..
Post by Rahul Chaudhry via binutils
we can be sure that the relocation offset is part of an instruction. Did you
mean an instruction with an encoding containning opcode bytes, followed by
immediate operand, which is followed by a R_X86_64_PC32 relocation?
.. what if you have a jump table of 32-bit relative offsets in text?
I think such a table would use R_X86_64_PC32 relocs if the
destinations were in other sections or other files. If your new code
happens to look at one of those relocs then one byte before r_offset
is not part of an instruction using an R_X86_64_PC32 reloc.
--
Alan Modra
Australia Development Lab, IBM
Rahul Chaudhry via binutils
2017-01-14 00:28:36 UTC
Permalink
Post by Alan Modra
Post by Rahul Chaudhry via binutils
Post by Alan Modra
Also, might you have an R_X86_64_PC32 in data and so be looking at the
high byte of the previous word?
Not sure what you mean here, but this code is only called for text sections, so
OK, if you know it is text then you're a lot safer, but even so..
Post by Rahul Chaudhry via binutils
we can be sure that the relocation offset is part of an instruction. Did you
mean an instruction with an encoding containning opcode bytes, followed by
immediate operand, which is followed by a R_X86_64_PC32 relocation?
.. what if you have a jump table of 32-bit relative offsets in text?
I think such a table would use R_X86_64_PC32 relocs if the
destinations were in other sections or other files. If your new code
happens to look at one of those relocs then one byte before r_offset
is not part of an instruction using an R_X86_64_PC32 reloc.
For a jump table containing relocations, it should be fine to treat all of them
as function calls/jumps, right?

If anything, the check for opcode 1-2 bytes before r_offset is likely to fail
and classify the relocation as address taken instead of a function call. This
is the reverse problem of above, it may result in a function section not
getting folded, that could have been folded safely.
--
Rahul
Alan Modra
2017-01-16 03:01:49 UTC
Permalink
Post by Rahul Chaudhry via binutils
Post by Alan Modra
Post by Rahul Chaudhry via binutils
Post by Alan Modra
Also, might you have an R_X86_64_PC32 in data and so be looking at the
high byte of the previous word?
Not sure what you mean here, but this code is only called for text sections, so
OK, if you know it is text then you're a lot safer, but even so..
Post by Rahul Chaudhry via binutils
we can be sure that the relocation offset is part of an instruction. Did you
mean an instruction with an encoding containning opcode bytes, followed by
immediate operand, which is followed by a R_X86_64_PC32 relocation?
.. what if you have a jump table of 32-bit relative offsets in text?
I think such a table would use R_X86_64_PC32 relocs if the
destinations were in other sections or other files. If your new code
happens to look at one of those relocs then one byte before r_offset
is not part of an instruction using an R_X86_64_PC32 reloc.
For a jump table containing relocations, it should be fine to treat all of them
as function calls/jumps, right?
I don't know. I could look at gold source and figure out the answer,
but it's really your job to convince Cary and others that what you're
doing is safe. ie. Prove to some level of confidence that comparing
the byte before r_offset of an R_X86_64_PC32 against 0xe8 really is a
valid test for a call. Note that I'm *not* saying your patch was
wrong. It may in fact be the case that all insns that could use an
R_X86_64_PC32 besides a call will have modrm or sib bytes different
to 0xe8. Ditto for the other opcodes you test. It might also be the
case that gcc won't put jump tables using R_X86_64_PC32 in code
sections. (What crazy assembly programmers do is another story.)

x86 code editing/analysis is difficult. If I had to do it, I think
I'd define a few more relocs for use in code so that you could find
the start of an instruction. You could even do that without bloating
objects with extra relocs. eg. instead of R_X86_64_32 in code you'd
use a series of relocs that all behave like R_X86_64_32 for relocation
but also say something about the insn opcode:
R_X86_64_32_I1 insn opcode one byte before r_offset
R_X86_64_32_I2 insn opcode two bytes before r_offset
etc.
--
Alan Modra
Australia Development Lab, IBM
Cary Coutant
2017-01-18 18:55:13 UTC
Permalink
Post by Rahul Chaudhry via binutils
Post by Alan Modra
Also, might you have an R_X86_64_PC32 in data and so be looking at the
high byte of the previous word?
Not sure what you mean here, but this code is only called for text sections, so
we can be sure that the relocation offset is part of an instruction. Did you
mean an instruction with an encoding containning opcode bytes, followed by
immediate operand, which is followed by a R_X86_64_PC32 relocation?
I looked (maybe not carefully enough), but I didn't find anything in
this call chain that restricts the check to text sections:

gc_process_relocs
-> scan.[global|local]_reloc_may_be_function_pointer
-> possible_function_pointer_reloc

You need to check the SHF_EXECINST flag before assuming you're looking
at an opcode. (Look in x86_64.cc for "is_executable".)
Post by Rahul Chaudhry via binutils
Post by Alan Modra
For a jump table containing relocations, it should be fine to treat all of them
as function calls/jumps, right?
I don't know. I could look at gold source and figure out the answer,
but it's really your job to convince Cary and others that what you're
doing is safe. ie. Prove to some level of confidence that comparing
the byte before r_offset of an R_X86_64_PC32 against 0xe8 really is a
valid test for a call. Note that I'm *not* saying your patch was
wrong. It may in fact be the case that all insns that could use an
R_X86_64_PC32 besides a call will have modrm or sib bytes different
to 0xe8. Ditto for the other opcodes you test. It might also be the
case that gcc won't put jump tables using R_X86_64_PC32 in code
sections. (What crazy assembly programmers do is another story.)
I think you should also be checking that the target symbol is
STT_FUNC; that should rule out most jump table cases. (I see a check
for != STT_OBJECT in the local symbol path, but nothing in the global
symbol path. It may be the case that STT_NOTYPE is used for some
extern function symbols, but you want to be conservative, right?)
Post by Rahul Chaudhry via binutils
x86 code editing/analysis is difficult. If I had to do it, I think
I'd define a few more relocs for use in code so that you could find
the start of an instruction. You could even do that without bloating
objects with extra relocs. eg. instead of R_X86_64_32 in code you'd
use a series of relocs that all behave like R_X86_64_32 for relocation
R_X86_64_32_I1 insn opcode one byte before r_offset
R_X86_64_32_I2 insn opcode two bytes before r_offset
etc.
Actually, I'd much prefer separate reloc types that just tells the
linker whether it's a call or a materialization of a function pointer.
In an ideal world, the linker should never have to load section
contents until it's applying relocations in pass 2. I really dislike
looking at opcodes. Everything we need to know in pass 1 should be
available by scanning the relocations.

-cary
Rahul Chaudhry via binutils
2017-01-21 00:53:26 UTC
Permalink
Hi Alan and Cary,

Thanks for your feedback. Some comments below.
Post by Cary Coutant
I looked (maybe not carefully enough), but I didn't find anything in
gc_process_relocs
-> scan.[global|local]_reloc_may_be_function_pointer
-> possible_function_pointer_reloc
You need to check the SHF_EXECINST flag before assuming you're looking
at an opcode. (Look in x86_64.cc for "is_executable".)
I'll take a closer look at this, add a check, and send an updated patch.
Post by Cary Coutant
I think you should also be checking that the target symbol is
STT_FUNC; that should rule out most jump table cases. (I see a check
for != STT_OBJECT in the local symbol path, but nothing in the global
symbol path. It may be the case that STT_NOTYPE is used for some
extern function symbols, but you want to be conservative, right?)
Ditto for this one.
Post by Cary Coutant
Post by Alan Modra
x86 code editing/analysis is difficult. If I had to do it, I think
I'd define a few more relocs for use in code so that you could find
the start of an instruction. You could even do that without bloating
objects with extra relocs. eg. instead of R_X86_64_32 in code you'd
use a series of relocs that all behave like R_X86_64_32 for relocation
R_X86_64_32_I1 insn opcode one byte before r_offset
R_X86_64_32_I2 insn opcode two bytes before r_offset
etc.
Actually, I'd much prefer separate reloc types that just tells the
linker whether it's a call or a materialization of a function pointer.
In an ideal world, the linker should never have to load section
contents until it's applying relocations in pass 2. I really dislike
looking at opcodes. Everything we need to know in pass 1 should be
available by scanning the relocations.
I agree completely. The need for this patch (and any related heuristics) comes
from the fact that the same relocation type is being used for both function
calls and taking the address of the function. The compiler already knows the
difference, and an ideal solution would be to simply use different relocation
types for different use cases.

What would it take to reach that solution? How involved is it to add a new
relocation type for something like this? I would guess that some coordination
is needed with gcc (and other compilers) for this to work. On the binutils
side, a new relocation type would need support in gas/objdump/ld/gold and maybe
other places as well. Can you point me to a recent example for adding a similar
new relocation type so I can get an idea of what it entails?

Thanks,
Rahul
Cary Coutant
2017-01-21 02:00:07 UTC
Permalink
Post by Rahul Chaudhry via binutils
Post by Cary Coutant
Actually, I'd much prefer separate reloc types that just tells the
linker whether it's a call or a materialization of a function pointer.
In an ideal world, the linker should never have to load section
contents until it's applying relocations in pass 2. I really dislike
looking at opcodes. Everything we need to know in pass 1 should be
available by scanning the relocations.
I agree completely. The need for this patch (and any related heuristics) comes
from the fact that the same relocation type is being used for both function
calls and taking the address of the function. The compiler already knows the
difference, and an ideal solution would be to simply use different relocation
types for different use cases.
What would it take to reach that solution? How involved is it to add a new
relocation type for something like this? I would guess that some coordination
is needed with gcc (and other compilers) for this to work. On the binutils
side, a new relocation type would need support in gas/objdump/ld/gold and maybe
other places as well. Can you point me to a recent example for adding a similar
new relocation type so I can get an idea of what it entails?
The x86 psABI is pretty well established, and changing it involves
overcoming a lot of inertia. It's not impossible, though -- there have
been some recent changes, and a few new relocations have been added.
I'd suggest starting with a proposal on the psABI mailing list at
x86-64-***@googlegroups.com. If you can recruit enough support, it's
possible that others would contribute the GCC and Gnu ld changes.

-cary
Rahul Chaudhry via binutils
2017-02-03 20:17:45 UTC
Permalink
Post by Rahul Chaudhry via binutils
Post by Cary Coutant
I looked (maybe not carefully enough), but I didn't find anything in
gc_process_relocs
-> scan.[global|local]_reloc_may_be_function_pointer
-> possible_function_pointer_reloc
You need to check the SHF_EXECINST flag before assuming you're looking
at an opcode. (Look in x86_64.cc for "is_executable".)
I'll take a closer look at this, add a check, and send an updated patch.
Post by Cary Coutant
I think you should also be checking that the target symbol is
STT_FUNC; that should rule out most jump table cases. (I see a check
for != STT_OBJECT in the local symbol path, but nothing in the global
symbol path. It may be the case that STT_NOTYPE is used for some
extern function symbols, but you want to be conservative, right?)
Ditto for this one.
Please see the updated patch below. There are two updates:

* Added a check for SHF_EXECINSTR flag before checking the call/jump opcodes.

* Added a check for STT_FUNC in the global symbol path.


Also, to add some background behind this patch, we're linking a large C++
binary (chrome-browser) with -pie.

* With --icf=none, the binary size is 177M.

* With --icf=safe, before this patch, the binary size is 176M.
ICF is able to fold 16730 sections (only ctors and dtors, since
Target_x86_64::do_can_check_for_function_pointers returns
false when using -pie).

* After applying this patch, with --icf=safe, the binary size is 169M.
ICF is able to fold 55623 sections.


* x86_64.cc (Target_x86_64::do_can_check_for_function_pointers):
Return true even when building pie binaries.
(Target_x86_64::possible_function_pointer_reloc): Check opcode
for R_X86_64_PC32 relocations.
(Target_x86_64::local_reloc_may_be_function_pointer): Pass
extra arguments to local_reloc_may_be_function_pointer.
(Target_x86_64::global_reloc_may_be_function_pointer): Likewise.
* gc.h (gc_process_relocs): Add check for STT_FUNC.
* testsuite/Makefile.am (icf_safe_pie_test): New test case.
* testsuite/Makefile.in: Regenerate.
* testsuite/icf_safe_pie_test.sh: New shell script.
--
Rahul
Cary Coutant
2017-02-15 08:39:15 UTC
Permalink
Post by Rahul Chaudhry via binutils
Return true even when building pie binaries.
(Target_x86_64::possible_function_pointer_reloc): Check opcode
for R_X86_64_PC32 relocations.
(Target_x86_64::local_reloc_may_be_function_pointer): Pass
extra arguments to local_reloc_may_be_function_pointer.
(Target_x86_64::global_reloc_may_be_function_pointer): Likewise.
* gc.h (gc_process_relocs): Add check for STT_FUNC.
* testsuite/Makefile.am (icf_safe_pie_test): New test case.
* testsuite/Makefile.in: Regenerate.
* testsuite/icf_safe_pie_test.sh: New shell script.
This is OK. I've committed it on your behalf.

Thanks!

-cary

Loading...