Discussion:
[PATCH] RISC-V: Adjust __global_pointer$ value to reduce code size.
Jim Wilson
2018-10-15 23:01:37 UTC
Permalink
For RISC-V, the compiler doesn't emit gp-relative references, they are created
during relaxation. Before this patch, the gp value is sdata+0x800 which is
guaranteed to cover as much as sdata+sbss as possible, but when they are
smaller than 0x1000 we end up wasting some of the gp-relative range. This
changes the calculation to shift gp down when possible, while still covering
all of sdata+sbss. This reduces code size for a trivial hello world program
by about 0.5%. For a more realistic benchmark set, EEMBC Automotive-1.1, I'm
seeing about 0.1% code size reduction on average. For large programs, this
has no effect.

This was tested with riscv{32,64}-{elf,linux} cross binutils build and check,
there were no regressions. I did have to update one linker testcase to disable
relaxation because it assumed that variable accesses in data would not be
relaxed. It was also tested with riscv32-elf and riscv64-linux cross gcc
build and check, and a linux+buildroot build and boot.

Committed.

Jim

ld/
* emulparams/elf32lriscv-defs.sh (DATA_START_SYMBOLS): New.
(SDATA_START_SYMBOLS): Define __SDATA_BEGIN__. Don't define
__global_pointer$.
(OTHER_END_SYMBOLS): New. Define __global_pointer$.
* testsuite/ld-riscv-elf/pcrel-lo-addend-2.d (#ld): Add --no-relax.
---
ld/emulparams/elf32lriscv-defs.sh | 13 ++++++++++++-
ld/testsuite/ld-riscv-elf/pcrel-lo-addend-2.d | 2 +-
2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/ld/emulparams/elf32lriscv-defs.sh b/ld/emulparams/elf32lriscv-defs.sh
index 91015d4471..5ac3b6023d 100644
--- a/ld/emulparams/elf32lriscv-defs.sh
+++ b/ld/emulparams/elf32lriscv-defs.sh
@@ -30,8 +30,19 @@ TEXT_START_ADDR=0x10000
MAXPAGESIZE="CONSTANT (MAXPAGESIZE)"
COMMONPAGESIZE="CONSTANT (COMMONPAGESIZE)"

-SDATA_START_SYMBOLS="${CREATE_SHLIB-__global_pointer$ = . + 0x800;}
+DATA_START_SYMBOLS="${CREATE_SHLIB-__DATA_BEGIN__ = .;}"
+
+SDATA_START_SYMBOLS="${CREATE_SHLIB-__SDATA_BEGIN__ = .;}
*(.srodata.cst16) *(.srodata.cst8) *(.srodata.cst4) *(.srodata.cst2) *(.srodata .srodata.*)"

INITIAL_READONLY_SECTIONS=".interp : { *(.interp) } ${CREATE_PIE-${INITIAL_READONLY_SECTIONS}}"
INITIAL_READONLY_SECTIONS="${RELOCATING+${CREATE_SHLIB-${INITIAL_READONLY_SECTIONS}}}"
+
+# We must cover as much of sdata as possible if it exists. If sdata+bss is
+# smaller than 0x1000 then we should start from bss end to cover as much of
+# the program as possible. But we can't allow gp to cover any of rodata, as
+# the address of variables in rodata may change during relaxation, so we start
+# from data in that case.
+OTHER_END_SYMBOLS="${CREATE_SHLIB-__BSS_END__ = .;
+ __global_pointer$ = MIN(__SDATA_BEGIN__ + 0x800,
+ MAX(__DATA_BEGIN__ + 0x800, __BSS_END__ - 0x800));}"
diff --git a/ld/testsuite/ld-riscv-elf/pcrel-lo-addend-2.d b/ld/testsuite/ld-riscv-elf/pcrel-lo-addend-2.d
index 9e94c5c399..039de102c3 100644
--- a/ld/testsuite/ld-riscv-elf/pcrel-lo-addend-2.d
+++ b/ld/testsuite/ld-riscv-elf/pcrel-lo-addend-2.d
@@ -1,5 +1,5 @@
#name: %pcrel_lo overflow with an addend
#source: pcrel-lo-addend-2.s
#as: -march=rv32ic
-#ld: -melf32lriscv
+#ld: -melf32lriscv --no-relax
#error: .*dangerous relocation: %pcrel_lo overflow with an addend
--
2.17.1
Palmer Dabbelt
2018-10-15 23:43:03 UTC
Permalink
Post by Jim Wilson
For RISC-V, the compiler doesn't emit gp-relative references, they are created
during relaxation. Before this patch, the gp value is sdata+0x800 which is
guaranteed to cover as much as sdata+sbss as possible, but when they are
smaller than 0x1000 we end up wasting some of the gp-relative range. This
changes the calculation to shift gp down when possible, while still covering
all of sdata+sbss. This reduces code size for a trivial hello world program
by about 0.5%. For a more realistic benchmark set, EEMBC Automotive-1.1, I'm
seeing about 0.1% code size reduction on average. For large programs, this
has no effect.
This was tested with riscv{32,64}-{elf,linux} cross binutils build and check,
there were no regressions. I did have to update one linker testcase to disable
relaxation because it assumed that variable accesses in data would not be
relaxed. It was also tested with riscv32-elf and riscv64-linux cross gcc
build and check, and a linux+buildroot build and boot.
Committed.
Jim
ld/
* emulparams/elf32lriscv-defs.sh (DATA_START_SYMBOLS): New.
(SDATA_START_SYMBOLS): Define __SDATA_BEGIN__. Don't define
__global_pointer$.
(OTHER_END_SYMBOLS): New. Define __global_pointer$.
* testsuite/ld-riscv-elf/pcrel-lo-addend-2.d (#ld): Add --no-relax.
---
ld/emulparams/elf32lriscv-defs.sh | 13 ++++++++++++-
ld/testsuite/ld-riscv-elf/pcrel-lo-addend-2.d | 2 +-
2 files changed, 13 insertions(+), 2 deletions(-)
diff --git a/ld/emulparams/elf32lriscv-defs.sh b/ld/emulparams/elf32lriscv-defs.sh
index 91015d4471..5ac3b6023d 100644
--- a/ld/emulparams/elf32lriscv-defs.sh
+++ b/ld/emulparams/elf32lriscv-defs.sh
@@ -30,8 +30,19 @@ TEXT_START_ADDR=0x10000
MAXPAGESIZE="CONSTANT (MAXPAGESIZE)"
COMMONPAGESIZE="CONSTANT (COMMONPAGESIZE)"
-SDATA_START_SYMBOLS="${CREATE_SHLIB-__global_pointer$ = . + 0x800;}
+DATA_START_SYMBOLS="${CREATE_SHLIB-__DATA_BEGIN__ = .;}"
+
+SDATA_START_SYMBOLS="${CREATE_SHLIB-__SDATA_BEGIN__ = .;}
*(.srodata.cst16) *(.srodata.cst8) *(.srodata.cst4) *(.srodata.cst2) *(.srodata .srodata.*)"
INITIAL_READONLY_SECTIONS=".interp : { *(.interp) } ${CREATE_PIE-${INITIAL_READONLY_SECTIONS}}"
INITIAL_READONLY_SECTIONS="${RELOCATING+${CREATE_SHLIB-${INITIAL_READONLY_SECTIONS}}}"
+
+# We must cover as much of sdata as possible if it exists. If sdata+bss is
+# smaller than 0x1000 then we should start from bss end to cover as much of
+# the program as possible. But we can't allow gp to cover any of rodata, as
+# the address of variables in rodata may change during relaxation, so we start
+# from data in that case.
This smells like a bug lurking somewhere. IIRC we avoided relaxing anything
that points to a mergeable segment due to these sorts of constraints, are we
just missing something? I would consider it a bug if users can end up with
incorrect relaxations, regardless of where they're targeting the GP.
Post by Jim Wilson
+OTHER_END_SYMBOLS="${CREATE_SHLIB-__BSS_END__ = .;
+ __global_pointer$ = MIN(__SDATA_BEGIN__ + 0x800,
+ MAX(__DATA_BEGIN__ + 0x800, __BSS_END__ - 0x800));}"
FWIW, when drag racing linker relaxation I always ended up using an offset of
0x7C0 instead of 0x800 because of that extra bit of pessimism in the linker
relaxations that eats a few bytes of offset. IIRC that saves you something
like 3 instructions in Dhrystone's inner loop because it allows relaxing
against the first few bytes of .sdata.

I don't know if it matters any more here, as if I understand how this is all
working we'll end up with GP pointing quite a way before ".sdata+0x800" for
small programs and for big programs the extra few symbols will just be noise.
Post by Jim Wilson
diff --git a/ld/testsuite/ld-riscv-elf/pcrel-lo-addend-2.d b/ld/testsuite/ld-riscv-elf/pcrel-lo-addend-2.d
index 9e94c5c399..039de102c3 100644
--- a/ld/testsuite/ld-riscv-elf/pcrel-lo-addend-2.d
+++ b/ld/testsuite/ld-riscv-elf/pcrel-lo-addend-2.d
@@ -1,5 +1,5 @@
#name: %pcrel_lo overflow with an addend
#source: pcrel-lo-addend-2.s
#as: -march=rv32ic
-#ld: -melf32lriscv
+#ld: -melf32lriscv --no-relax
#error: .*dangerous relocation: %pcrel_lo overflow with an addend
Jim Wilson
2018-10-16 00:39:06 UTC
Permalink
Post by Palmer Dabbelt
This smells like a bug lurking somewhere. IIRC we avoided relaxing anything
that points to a mergeable segment due to these sorts of constraints, are we
just missing something? I would consider it a bug if users can end up with
incorrect relaxations, regardless of where they're targeting the GP.
We avoid relaxation in mergeable and code sections. In this case the
problem is rodata which is neither, but immediately follows .text and
hence variables in rodata can change address when .text shrinks. We
probably never tried to relax references to rodata before. I didn't
look at trying to handle this in relaxation itself, it was easier to
just fix gp to avoid overlapping rodata. We should probably look at
the rodata problem later though.
Post by Palmer Dabbelt
FWIW, when drag racing linker relaxation I always ended up using an offset of
0x7C0 instead of 0x800 because of that extra bit of pessimism in the linker
relaxations that eats a few bytes of offset. IIRC that saves you something
like 3 instructions in Dhrystone's inner loop because it allows relaxing
against the first few bytes of .sdata.
The current definition is .sdata+0x800 which makes everything .sdata
and later relaxable. If you used .sdata+0x7c0, then you would get the
last 64 bytes of the .data section as relaxable. My patch should do
this automatically for you if .sdata+.sbss is smaller than 0x1000.
Though if you have your own linker script you don't get any benefit
from this unless you update your linker script.;
Post by Palmer Dabbelt
I don't know if it matters any more here, as if I understand how this is all
working we'll end up with GP pointing quite a way before ".sdata+0x800" for
small programs and for big programs the extra few symbols will just be noise.
For big programs there will be no extra symbols, because if
.sdata+.bss is 0x1000 or larger we get the exact same gp value as
before. The smaller the program the more the benefit. The larger the
program the less the benefit, with the benefit dropping to zero once
the program gets big enough.

Jim
Palmer Dabbelt
2018-10-16 01:20:25 UTC
Permalink
Post by Jim Wilson
Post by Palmer Dabbelt
This smells like a bug lurking somewhere. IIRC we avoided relaxing anything
that points to a mergeable segment due to these sorts of constraints, are we
just missing something? I would consider it a bug if users can end up with
incorrect relaxations, regardless of where they're targeting the GP.
We avoid relaxation in mergeable and code sections. In this case the
problem is rodata which is neither, but immediately follows .text and
hence variables in rodata can change address when .text shrinks. We
probably never tried to relax references to rodata before. I didn't
look at trying to handle this in relaxation itself, it was easier to
just fix gp to avoid overlapping rodata. We should probably look at
the rodata problem later though.
Ah, OK. I guess I just assemed .rodata was mergeable because I assumed strings
ended up in there. Either way: it seems like a bug, but only one that you just
found when writing this patch and not one introduced by this patch. Certainly
it shouldn't block a merge.
Post by Jim Wilson
Post by Palmer Dabbelt
FWIW, when drag racing linker relaxation I always ended up using an offset of
0x7C0 instead of 0x800 because of that extra bit of pessimism in the linker
relaxations that eats a few bytes of offset. IIRC that saves you something
like 3 instructions in Dhrystone's inner loop because it allows relaxing
against the first few bytes of .sdata.
The current definition is .sdata+0x800 which makes everything .sdata
and later relaxable. If you used .sdata+0x7c0, then you would get the
last 64 bytes of the .data section as relaxable. My patch should do
this automatically for you if .sdata+.sbss is smaller than 0x1000.
Though if you have your own linker script you don't get any benefit
from this unless you update your linker script.;
Ah, yes, I guess that was the point: while ".sdata+0x800" should allow relaxing
against a symbol at ".sdata+0", in practice we have a bit of pessimism in the
relaxation code. I'm actually afraid I don't understand exactly where it comes
from, but if you look at "max_alignment" and "reserve_size" in
"_bfd_riscv_relax_section()" you'll see there's a bit -- 0x7C0 seems a bit
conservative, I'd guess that you can relax against ".sdata+0" down to
"__global_pointer$=.sdata+0x7F0".

It actually looks like my misunderstanding of "max_alignment" is the source of
this auipc+addend bug you're run in to: I appear to have just blindly copied
this alignment pessimism (IIUC, this shrinks the relaxable region by the symbol
size, which would avoid the overflows for LUI-based sequences) from the LUI
relaxations into the AUIPC relaxations. Now that I actually understand the
code (or at least, I think I understand it) that seems completely bogus for the
AUIPC case.

Sorry!
Post by Jim Wilson
Post by Palmer Dabbelt
I don't know if it matters any more here, as if I understand how this is all
working we'll end up with GP pointing quite a way before ".sdata+0x800" for
small programs and for big programs the extra few symbols will just be noise.
For big programs there will be no extra symbols, because if
.sdata+.bss is 0x1000 or larger we get the exact same gp value as
before. The smaller the program the more the benefit. The larger the
program the less the benefit, with the benefit dropping to zero once
the program gets big enough.
Yes, I agree -- and I think that's why this is really splitting hairs. This
really only matter when drag racing Dhrystone, which isn't worth the time --
and also certainly shouldn't block a merge :)

Loading...