Discussion:
MIPS JAL/JALR to BAL transformation for Linux (o32 ABI)
Fu, Chao-Ying
2009-07-31 00:57:38 UTC
Hi All,

We tried to implement an optimization that transforms JALR to BAL
for function calls inside a shared library, to improve performance.
It turned out that BFD_RELOC_MIPS_JALR was designed as a hint to help the JALR
transformation. However, this relocation is only enabled for the N32 and N64 ABIs.
So, we made a patch to enable BFD_RELOC_MIPS_JALR for mips32, mips32r2,
mips64, and mips64r2 for all ABIs.
To use this optimization, GCC must be run with -mno-explicit-relocs
so that the assembler emits BFD_RELOC_MIPS_JALR for shared libraries (-mshared).

The JAL to BAL transformation is enabled by the same mechanism in this patch.
Please see the examples and check whether this patch may break something. Thanks a lot!

Ex 1: (Calls inside a shared library)
# cat call.c
int t2() { return 1984 + t3(); }
int t3() { return 0; }

# cc1 -quiet call.c -O2 -mabicalls -mshared -G0 -mno-explicit-relocs -o call.s -fno-inline-small-functions
# as-new call.s -o call.o -mips32r2
# objdump -dr call.o
call.o: file format elf32-tradbigmips

Disassembly of section .text:

00000000 <t3>:
0: 3c1c0000 lui gp,0x0
0: R_MIPS_HI16 _gp_disp
4: 279c0000 addiu gp,gp,0
4: R_MIPS_LO16 _gp_disp
8: 0399e021 addu gp,gp,t9
c: 03e00008 jr ra
10: 00001021 move v0,zero

00000014 <t2>:
14: 3c1c0000 lui gp,0x0
14: R_MIPS_HI16 _gp_disp
18: 279c0000 addiu gp,gp,0
18: R_MIPS_LO16 _gp_disp
1c: 0399e021 addu gp,gp,t9
20: 27bdffe0 addiu sp,sp,-32
24: afbf001c sw ra,28(sp)
28: afbc0010 sw gp,16(sp)
2c: 8f990000 lw t9,0(gp)
2c: R_MIPS_CALL16 t3
30: 0320f809 jalr t9 <------------------
30: R_MIPS_JALR t3
34: 00000000 nop
38: 8fbc0010 lw gp,16(sp)
3c: 8fbf001c lw ra,28(sp)
40: 244207c0 addiu v0,v0,1984
44: 03e00008 jr ra
48: 27bd0020 addiu sp,sp,32

# ld-new -shared call.o -o libcall.so
# objdump -dr libcall.so
libcall.so: file format elf32-tradbigmips
Disassembly of section .text:
000002d4 <t3>:
2d4: 3c1c0002 lui gp,0x2
2d8: 279c803c addiu gp,gp,-32708
2dc: 0399e021 addu gp,gp,t9
2e0: 03e00008 jr ra
2e4: 00001021 move v0,zero

000002e8 <t2>:
2e8: 3c1c0002 lui gp,0x2
2ec: 279c8028 addiu gp,gp,-32728
2f0: 0399e021 addu gp,gp,t9
2f4: 27bdffe0 addiu sp,sp,-32
2f8: afbf001c sw ra,28(sp)
2fc: afbc0010 sw gp,16(sp)
300: 8f998018 lw t9,-32744(gp)
304: 0411fff3 bal 2d4 <t3> <--------------------
308: 00000000 nop
30c: 8fbc0010 lw gp,16(sp)
310: 8fbf001c lw ra,28(sp)
314: 244207c0 addiu v0,v0,1984
318: 03e00008 jr ra
31c: 27bd0020 addiu sp,sp,32

Ex 2: (Calls not in a shared library)
# cat call.c
int t2() { return 1984 + t3(); }
int t3() { return 0; }

# cc1 -quiet call.c -O2 -mabicalls -mno-shared -G0 -o call.s -fno-inline-small-functions
# as-new call.s -o call.o -mips32r2
# objdump -dr call.o
call.o: file format elf32-tradbigmips


Disassembly of section .text:

00000000 <t3>:
0: 03e00008 jr ra
4: 00001021 move v0,zero

00000008 <t2>:
8: 27bdffe0 addiu sp,sp,-32
c: afbf001c sw ra,28(sp)
10: 0c000000 jal 0 <t3> <-----------------
10: R_MIPS_26 t3
14: 00000000 nop
18: 8fbf001c lw ra,28(sp)
1c: 244207c0 addiu v0,v0,1984
20: 03e00008 jr ra
24: 27bd0020 addiu sp,sp,32

# ld-new call.o -o call
# objdump -dr call
call: file format elf32-tradbigmips


Disassembly of section .text:

0040006c <t3>:
40006c: 03e00008 jr ra
400070: 00001021 move v0,zero

00400074 <t2>:
400074: 27bdffe0 addiu sp,sp,-32
400078: afbf001c sw ra,28(sp)
40007c: 0411fffb bal 40006c <t3> <-----------------
400080: 00000000 nop
400084: 8fbf001c lw ra,28(sp)
400088: 244207c0 addiu v0,v0,1984
40008c: 03e00008 jr ra
400090: 27bd0020 addiu sp,sp,32
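
For readers checking the listings above by hand, here is a minimal sketch of how a BAL word can be derived from the call site and the target. It is not the code in elfxx-mips.c; the function name bal_insn and the "return 0 on failure" convention are assumptions made for illustration. It relies only on the architectural definition of BAL (bgezal $zero) with a signed 16-bit word offset relative to the delay slot.

```c
#include <assert.h>
#include <stdint.h>

/* BAL is "bgezal $zero, offset": REGIMM opcode with rt = 0x11.  */
#define BAL_OPCODE 0x04110000u

/* Return the BAL instruction word for a call at address SITE to DEST,
   or 0 if DEST cannot be reached.  The 16-bit offset counts words and
   is relative to the delay slot (SITE + 4), so the reachable window is
   -131072 .. +131068 bytes around that address.
   Hypothetical helper, not a BFD function.  */
static uint32_t
bal_insn (uint32_t site, uint32_t dest)
{
  int64_t off = (int64_t) dest - ((int64_t) site + 4);

  assert ((site & 3) == 0);     /* instructions are word aligned */
  if ((off % 4) != 0 || off < -0x20000 || off > 0x1fffc)
    return 0;                   /* out of range: keep jal/jalr */
  return BAL_OPCODE | ((uint32_t) (off / 4) & 0xffffu);
}
```

With the addresses from Ex 1, bal_insn (0x304, 0x2d4) gives 0x0411fff3, and with those from Ex 2, bal_insn (0x40007c, 0x40006c) gives 0x0411fffb, matching the objdump output.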

Regards,
Chao-ying

gas/ChangeLog
2009-07-30 Chao-ying Fu <***@mips.com>

* config/tc-mips.c (MIPS_JALR_HINT_P): New define. True for mips32,
mips32r2, mips64, and mips64r2.
(macro_build_jalr): If MIPS_JALR_HINT_P, emit BFD_RELOC_MIPS_JALR.

bfd/ChangeLog
2009-07-30 Chao-ying Fu <***@mips.com>

* elf32-mips.c (mips_reloc_map): Add BFD_RELOC_MIPS_JALR.
* elfxx-mips.c (JAL_JALR_TO_BAL_P): New define to transform JAL/JALR
to BAL for CPUs that include RM9000, mips32, mips32r2, mips64, and mips64r2.
(mips_elf_perform_relocation): Use JAL_JALR_TO_BAL_P to guard the transformation.


Index: src/gas/config/tc-mips.c
===================================================================
--- src.orig/gas/config/tc-mips.c 2009-07-30 16:31:53.379834000 -0700
+++ src/gas/config/tc-mips.c 2009-07-30 16:54:32.814022000 -0700
@@ -290,6 +290,12 @@ static int file_ase_mips16;
|| mips_opts.isa == ISA_MIPS64 \
|| mips_opts.isa == ISA_MIPS64R2)

+/* True if we want to create BFD_RELOC_MIPS_JALR for jalr $25. */
+#define MIPS_JALR_HINT_P (mips_opts.isa == ISA_MIPS32 \
+ || mips_opts.isa == ISA_MIPS32R2 \
+ || mips_opts.isa == ISA_MIPS64 \
+ || mips_opts.isa == ISA_MIPS64R2)
+
/* True if -mips3d was passed or implied by arguments passed on the
command line (e.g., by -march). */
static int file_ase_mips3d;
@@ -3923,12 +3929,11 @@ macro_build_jalr (expressionS *ep)
char *f = NULL;

if (HAVE_NEWABI)
- {
- frag_grow (8);
- f = frag_more (0);
- }
+ frag_grow (8);
+ if (HAVE_NEWABI || MIPS_JALR_HINT_P)
+ f = frag_more (0);
macro_build (NULL, "jalr", "d,s", RA, PIC_CALL_REG);
- if (HAVE_NEWABI)
+ if (HAVE_NEWABI || MIPS_JALR_HINT_P)
fix_new_exp (frag_now, f - frag_now->fr_literal,
4, ep, FALSE, BFD_RELOC_MIPS_JALR);
}
Index: src/bfd/elf32-mips.c
===================================================================
--- src.orig/bfd/elf32-mips.c 2009-07-30 16:31:53.587623000 -0700
+++ src/bfd/elf32-mips.c 2009-07-30 16:40:12.216495000 -0700
@@ -1261,6 +1261,7 @@ static const struct elf_reloc_map mips_r
{ BFD_RELOC_MIPS_GOT_PAGE, R_MIPS_GOT_PAGE },
{ BFD_RELOC_MIPS_GOT_OFST, R_MIPS_GOT_OFST },
{ BFD_RELOC_MIPS_GOT_DISP, R_MIPS_GOT_DISP },
+ { BFD_RELOC_MIPS_JALR, R_MIPS_JALR },
{ BFD_RELOC_MIPS_TLS_DTPMOD32, R_MIPS_TLS_DTPMOD32 },
{ BFD_RELOC_MIPS_TLS_DTPREL32, R_MIPS_TLS_DTPREL32 },
{ BFD_RELOC_MIPS_TLS_DTPMOD64, R_MIPS_TLS_DTPMOD64 },
Index: src/bfd/elfxx-mips.c
===================================================================
--- src.orig/bfd/elfxx-mips.c 2009-07-30 16:31:53.713500000 -0700
+++ src/bfd/elfxx-mips.c 2009-07-30 16:40:12.261451000 -0700
@@ -668,6 +668,16 @@ static bfd *reldyn_sorting_bfd;
( ((elf_elfheader (abfd)->e_flags & EF_MIPS_ARCH) != E_MIPS_ARCH_1) \
|| ((elf_elfheader (abfd)->e_flags & EF_MIPS_MACH) == E_MIPS_MACH_3900))

+/* True if ABFD is for CPUs that are faster if jal/jalr is converted to bal.
+ This should be safe for all architectures, but for now we enable it
+ for RM9000, mips32, mips32r2, mips64, and mips64r2. */
+#define JAL_JALR_TO_BAL_P(abfd) \
+ ( ((elf_elfheader (abfd)->e_flags & EF_MIPS_MACH) == E_MIPS_MACH_9000) \
+ || ((elf_elfheader (abfd)->e_flags & EF_MIPS_ARCH) == E_MIPS_ARCH_32) \
+ || ((elf_elfheader (abfd)->e_flags & EF_MIPS_ARCH) == E_MIPS_ARCH_32R2) \
+ || ((elf_elfheader (abfd)->e_flags & EF_MIPS_ARCH) == E_MIPS_ARCH_64) \
+ || ((elf_elfheader (abfd)->e_flags & EF_MIPS_ARCH) == E_MIPS_ARCH_64R2))
+
/* True if ABFD is a PIC object. */
#define PIC_OBJECT_P(abfd) \
((elf_elfheader (abfd)->e_flags & EF_MIPS_PIC) != 0)
@@ -5590,7 +5600,7 @@ mips_elf_perform_relocation (struct bfd_
prediction hardware. If we are linking for the RM9000, and we
see jal, and bal fits, use it instead. Note that this
transformation should be safe for all architectures. */
- if (bfd_get_mach (input_bfd) == bfd_mach_mips9000
+ if (JAL_JALR_TO_BAL_P (input_bfd)
&& !info->relocatable
&& !require_jalx
&& ((r_type == R_MIPS_26 && (x >> 26) == 0x3) /* jal addr */
Adam Nemet
2009-07-31 18:16:08 UTC
Post by Fu, Chao-Ying
In order to utilize this optimization, we need to
use -mno-explicit-relocs for GCC to let the assembler emit
BFD_RELOC_MIPS_JALR for shared libraries (-mshared).
I agree that we should try to use BAL in shared libraries. However it
seems to me that requiring -mno-explicit-relocs is a high price to pay.
Can't we instead change the calls in shared libraries to also use the
PLT (or the locally binding function directly if possible)?
Post by Fu, Chao-Ying
+/* True if ABFD is for CPUs that are faster if jal/jalr is converted to bal.
+ This should be safe for all architectures, but for now we enable it
+ for RM9000, mips32, mips32r2, mips64, and mips64r2. */
+#define JAL_JALR_TO_BAL_P(abfd) \
+ ( ((elf_elfheader (abfd)->e_flags & EF_MIPS_MACH) == E_MIPS_MACH_9000) \
+ || ((elf_elfheader (abfd)->e_flags & EF_MIPS_ARCH) == E_MIPS_ARCH_32) \
+ || ((elf_elfheader (abfd)->e_flags & EF_MIPS_ARCH) == E_MIPS_ARCH_32R2) \
+ || ((elf_elfheader (abfd)->e_flags & EF_MIPS_ARCH) == E_MIPS_ARCH_64) \
+ || ((elf_elfheader (abfd)->e_flags & EF_MIPS_ARCH) == E_MIPS_ARCH_64R2))
I think this should be a negative predicate. As you say JALR->BAL
should be a profitable transformation on most CPUs.

Adam
Adam Nemet
2009-07-31 19:46:33 UTC
Post by Adam Nemet
Post by Fu, Chao-Ying
In order to utilize this optimization, we need to
use -mno-explicit-relocs for GCC to let the assembler emit
BFD_RELOC_MIPS_JALR for shared libraries (-mshared).
I agree that we should try to use BAL in shared libraries. However it
seems to me that requiring -mno-explicit-relocs is a high price to pay.
Can't we instead change the calls in shared libraries to also use the
PLT (or the locally binding function directly if possible)?
If we can't do this there is also the other idea which we used to do to
avoid -mno-explicit-relocs in our toolchain. Emit jalr and jr with an
extra optional operand that is the symbol name of the function for PIC
calls and use that to create the JALR relocation.

Adam
Richard Sandiford
2009-08-01 08:15:50 UTC
Post by Adam Nemet
Post by Adam Nemet
Post by Fu, Chao-Ying
In order to utilize this optimization, we need to
use -mno-explicit-relocs for GCC to let the assembler emit
BFD_RELOC_MIPS_JALR for shared libraries (-mshared).
I agree that we should try to use BAL in shared libraries. However it
seems to me that requiring -mno-explicit-relocs is a high price to pay.
Can't we instead change the calls in shared libraries to also use the
PLT (or the locally binding function directly if possible)?
If we can't do this there is also the other idea which we used to do to
avoid -mno-explicit-relocs in our toolchain. Emit jalr and jr with an
extra optional operand that is the symbol name of the function for PIC
calls and use that to create the JALR relocation.
Yeah. These days we could also use (thanks Alan!):

.reloc 1f, R_MIPS_JALR, foo
1: jalr $25

Does that mean you already have a GCC patch to do this kind of thing?

Richard
Adam Nemet
2009-08-01 19:53:40 UTC
Post by Richard Sandiford
Post by Adam Nemet
If we can't do this there is also the other idea which we used to do to
avoid -mno-explicit-relocs in our toolchain. Emit jalr and jr with an
extra optional operand that is the symbol name of the function for PIC
calls and use that to create the JALR relocation.
.reloc 1f, R_MIPS_JALR, foo
1: jalr $25
Nice!
Post by Richard Sandiford
Does that mean you already have a GCC patch to do this kind of thing?
Yes, I had this on my todo list but it obviously became less important after
non-PIC executable support; the big improvements were due to JALR->JAL
replacements in the executable. I'll prepare a patch.

Adam
Fu, Chao-Ying
2009-08-26 20:37:51 UTC
Post by Adam Nemet
Post by Richard Sandiford
Post by Adam Nemet
If we can't do this there is also the other idea which we used to do to
avoid -mno-explicit-relocs in our toolchain. Emit jalr and jr with an
extra optional operand that is the symbol name of the function for PIC
calls and use that to create the JALR relocation.
.reloc 1f, R_MIPS_JALR, foo
1: jalr $25
Nice!
Post by Richard Sandiford
Does that mean you already have a GCC patch to do this kind of thing?
Yes, I had this on my todo list but it obviously became less important after
non-PIC executable support; the big improvements were due to JALR->JAL
replacements in the executable. I'll prepare a patch.
Adam
Hi Adam,

Any update about this GCC patch? Thanks a lot!

Regards,
Chao-ying
Adam Nemet
2009-08-26 21:19:52 UTC
Post by Fu, Chao-Ying
Any update about this GCC patch? Thanks a lot!
It's basically done. If testing goes fine overnight I'll post tomorrow.

Adam
Fu, Chao-Ying
2009-08-01 00:43:55 UTC
Post by Adam Nemet
I agree that we should try to use BAL in shared libraries. However it
seems to me that requiring -mno-explicit-relocs is a high price to pay.
Can't we instead change the calls in shared libraries to also use the
PLT (or the locally binding function directly if possible)?
Maybe Richard can give some feedback on this.
Post by Adam Nemet
Post by Fu, Chao-Ying
+/* True if ABFD is for CPUs that are faster if jal/jalr is converted to bal.
+ This should be safe for all architectures, but for now we enable it
+ for RM9000, mips32, mips32r2, mips64, and mips64r2. */
+#define JAL_JALR_TO_BAL_P(abfd) \
+ ( ((elf_elfheader (abfd)->e_flags & EF_MIPS_MACH) == E_MIPS_MACH_9000) \
+ || ((elf_elfheader (abfd)->e_flags & EF_MIPS_ARCH) == E_MIPS_ARCH_32) \
+ || ((elf_elfheader (abfd)->e_flags & EF_MIPS_ARCH) == E_MIPS_ARCH_32R2) \
+ || ((elf_elfheader (abfd)->e_flags & EF_MIPS_ARCH) == E_MIPS_ARCH_64) \
+ || ((elf_elfheader (abfd)->e_flags & EF_MIPS_ARCH) == E_MIPS_ARCH_64R2))
I think this should be a negative predicate. As you say JALR->BAL
should be a profitable transformation on most CPUs.
Yes. If everyone is ok, we can just set JAL_JALR_TO_BAL_P(abfd) to 1.
(And, fix new test failures due to BAL mismatching.)
Thanks!

Regards,
Chao-ying
Richard Sandiford
2009-08-01 08:36:50 UTC
Post by Fu, Chao-Ying
Post by Fu, Chao-Ying
Post by Fu, Chao-Ying
+/* True if ABFD is for CPUs that are faster if jal/jalr is converted to bal.
+ This should be safe for all architectures, but for now we enable it
+ for RM9000, mips32, mips32r2, mips64, and mips64r2. */
+#define JAL_JALR_TO_BAL_P(abfd) \
+ ( ((elf_elfheader (abfd)->e_flags & EF_MIPS_MACH) == E_MIPS_MACH_9000) \
+ || ((elf_elfheader (abfd)->e_flags & EF_MIPS_ARCH) == E_MIPS_ARCH_32) \
+ || ((elf_elfheader (abfd)->e_flags & EF_MIPS_ARCH) == E_MIPS_ARCH_32R2) \
+ || ((elf_elfheader (abfd)->e_flags & EF_MIPS_ARCH) == E_MIPS_ARCH_64) \
+ || ((elf_elfheader (abfd)->e_flags & EF_MIPS_ARCH) == E_MIPS_ARCH_64R2))
I think this should be a negative predicate. As you say JALR->BAL
should be a profitable transformation on most CPUs.
Yes. If everyone is ok, we can just set JAL_JALR_TO_BAL_P(abfd) to 1.
(And, fix new test failures due to BAL mismatching.)
Sounds good to me FWIW. Also, I don't think MIPS_JALR_HINT_P should
be predicated on ISA level. It should be a format-based decision
instead. The question then is whether we want to unconditionally
enable this for all non-IRIX targets, at the potential risk of
breaking compatibility with other non-GNU & non-IRIX o32 abicalls
linkers, or whether we want to be more conservative.

I'm not sure whether such linkers exist. And even if they do,
anyone with specific requirements could easily create a new GAS
configuration that had this macro turned off. So how about:

#ifdef TE_IRIX
#define MIPS_JALR_HINT_P HAVE_NEWABI
#else
/* As a GNU extension, we use R_MIPS_JALR for o32 too. */
#define MIPS_JALR_HINT_P 1
#endif

Richard
Fu, Chao-Ying
2009-08-04 01:18:05 UTC
Post by Fu, Chao-Ying
Post by Fu, Chao-Ying
Post by Fu, Chao-Ying
Post by Fu, Chao-Ying
+/* True if ABFD is for CPUs that are faster if jal/jalr is converted to bal.
+ This should be safe for all architectures, but for now we enable it
+ for RM9000, mips32, mips32r2, mips64, and mips64r2. */
+#define JAL_JALR_TO_BAL_P(abfd) \
+ ( ((elf_elfheader (abfd)->e_flags & EF_MIPS_MACH) == E_MIPS_MACH_9000) \
+ || ((elf_elfheader (abfd)->e_flags & EF_MIPS_ARCH) == E_MIPS_ARCH_32) \
+ || ((elf_elfheader (abfd)->e_flags & EF_MIPS_ARCH) == E_MIPS_ARCH_32R2) \
+ || ((elf_elfheader (abfd)->e_flags & EF_MIPS_ARCH) == E_MIPS_ARCH_64) \
+ || ((elf_elfheader (abfd)->e_flags & EF_MIPS_ARCH) == E_MIPS_ARCH_64R2))
I think this should be a negative predicate. As you say JALR->BAL
should be a profitable transformation on most CPUs.
Yes. If everyone is ok, we can just set JAL_JALR_TO_BAL_P(abfd) to 1.
(And, fix new test failures due to BAL mismatching.)
Sounds good to me FWIW. Also, I don't think MIPS_JALR_HINT_P should
be predicated on ISA level. It should be a format-based decision
instead. The question then is whether we want to unconditionally
enable this for all non-IRIX targets, at the potential risk of
breaking compatibility with other non-GNU & non-IRIX o32 abicalls
linkers, or whether we want to be more conservative.
I'm not sure whether such linkers exist. And even if they do,
anyone with specific requirements could easily create a new GAS
configuration that had this macro turned off. So how about:
#ifdef TE_IRIX
#define MIPS_JALR_HINT_P HAVE_NEWABI
#else
/* As a GNU extension, we use R_MIPS_JALR for o32 too. */
#define MIPS_JALR_HINT_P 1
#endif
Yes. Here is the new patch to use your define in "tc-mips.c".

I used JALR_TO_BAL_P and JAL_TO_BAL_P to guard the transformation
in "elfxx-mips.c". JALR_TO_BAL_P is true for all CPUs, because
BAL should be faster than JALR. JAL_TO_BAL_P is true only for RM9000, since
the original code supported only RM9000. On other CPUs, JAL should perform
about as well as BAL. In the future, we can still add more CPUs to JAL_TO_BAL_P.

I fixed three GAS tests to include R_MIPS_JALR relocations after JALR.
The target "mips-linux-gnu" was tested and there are no new failures
under GAS, LD, and Binutils. Is the patch ok? Thanks!

Note: We still need a GCC patch to insert the R_MIPS_JALR reloc when -mexplicit-relocs is used.

Regards,
Chao-ying

gas/ChangeLog
2009-08-03 Chao-ying Fu <***@mips.com>

* config/tc-mips.c (MIPS_JALR_HINT_P): New define. For IRIX, it is true
for new abi. For non-IRIX targets, it is always true.
(macro_build_jalr): If MIPS_JALR_HINT_P, emit BFD_RELOC_MIPS_JALR.

bfd/ChangeLog
2009-08-03 Chao-ying Fu <***@mips.com>

* elf32-mips.c (mips_reloc_map): Add BFD_RELOC_MIPS_JALR.
* elfxx-mips.c (JAL_TO_BAL_P): New define to transform JAL
to BAL for CPUs. It is true for RM9000.
(JALR_TO_BAL_P): New define to transform JALR to BAL. It is true for all CPUs.
(mips_elf_perform_relocation): Use JAL_TO_BAL_P and JALR_TO_BAL_P
to guard the transformation.

gas/testsuite/ChangeLog
2009-08-03 Chao-ying Fu <***@mips.com>

* gas/mips/jal-svr4pic.d, gas/mips/jal-xgot.d, gas/mips/mips-abi32-pic2.d:
Add R_MIPS_JALR relocations after jalr.

Index: src/gas/config/tc-mips.c
===================================================================
--- src.orig/gas/config/tc-mips.c 2009-08-03 17:51:53.079188000 -0700
+++ src/gas/config/tc-mips.c 2009-08-03 17:52:15.061183000 -0700
@@ -290,6 +290,14 @@ static int file_ase_mips16;
|| mips_opts.isa == ISA_MIPS64 \
|| mips_opts.isa == ISA_MIPS64R2)

+/* True if we want to create R_MIPS_JALR for jalr $25. */
+#ifdef TE_IRIX
+#define MIPS_JALR_HINT_P HAVE_NEWABI
+#else
+/* As a GNU extension, we use R_MIPS_JALR for o32 too. */
+#define MIPS_JALR_HINT_P 1
+#endif
+
/* True if -mips3d was passed or implied by arguments passed on the
command line (e.g., by -march). */
static int file_ase_mips3d;
@@ -3922,13 +3930,13 @@ macro_build_jalr (expressionS *ep)
{
char *f = NULL;

- if (HAVE_NEWABI)
+ if (MIPS_JALR_HINT_P)
{
frag_grow (8);
f = frag_more (0);
}
macro_build (NULL, "jalr", "d,s", RA, PIC_CALL_REG);
- if (HAVE_NEWABI)
+ if (MIPS_JALR_HINT_P)
fix_new_exp (frag_now, f - frag_now->fr_literal,
4, ep, FALSE, BFD_RELOC_MIPS_JALR);
}
Index: src/bfd/elf32-mips.c
===================================================================
--- src.orig/bfd/elf32-mips.c 2009-08-03 17:51:53.358906000 -0700
+++ src/bfd/elf32-mips.c 2009-08-03 17:52:15.074169000 -0700
@@ -1261,6 +1261,7 @@ static const struct elf_reloc_map mips_r
{ BFD_RELOC_MIPS_GOT_PAGE, R_MIPS_GOT_PAGE },
{ BFD_RELOC_MIPS_GOT_OFST, R_MIPS_GOT_OFST },
{ BFD_RELOC_MIPS_GOT_DISP, R_MIPS_GOT_DISP },
+ { BFD_RELOC_MIPS_JALR, R_MIPS_JALR },
{ BFD_RELOC_MIPS_TLS_DTPMOD32, R_MIPS_TLS_DTPMOD32 },
{ BFD_RELOC_MIPS_TLS_DTPREL32, R_MIPS_TLS_DTPREL32 },
{ BFD_RELOC_MIPS_TLS_DTPMOD64, R_MIPS_TLS_DTPMOD64 },
Index: src/bfd/elfxx-mips.c
===================================================================
--- src.orig/bfd/elfxx-mips.c 2009-08-03 17:51:53.412852000 -0700
+++ src/bfd/elfxx-mips.c 2009-08-03 17:53:13.611573000 -0700
@@ -668,6 +668,17 @@ static bfd *reldyn_sorting_bfd;
( ((elf_elfheader (abfd)->e_flags & EF_MIPS_ARCH) != E_MIPS_ARCH_1) \
|| ((elf_elfheader (abfd)->e_flags & EF_MIPS_MACH) == E_MIPS_MACH_3900))

+/* True if ABFD is for CPUs that are faster if JAL is converted to BAL.
+ This should be safe for all architectures. We enable this predicate
+ for RM9000 for now. */
+#define JAL_TO_BAL_P(abfd) \
+ ((elf_elfheader (abfd)->e_flags & EF_MIPS_MACH) == E_MIPS_MACH_9000)
+
+/* True if ABFD is for CPUs that are faster if JALR is converted to BAL.
+ This should be safe for all architectures. We enable this predicate for
+ all CPUs. */
+#define JALR_TO_BAL_P(abfd) 1
+
/* True if ABFD is a PIC object. */
#define PIC_OBJECT_P(abfd) \
((elf_elfheader (abfd)->e_flags & EF_MIPS_PIC) != 0)
@@ -5590,11 +5601,12 @@ mips_elf_perform_relocation (struct bfd_
prediction hardware. If we are linking for the RM9000, and we
see jal, and bal fits, use it instead. Note that this
transformation should be safe for all architectures. */
- if (bfd_get_mach (input_bfd) == bfd_mach_mips9000
- && !info->relocatable
+ if (!info->relocatable
&& !require_jalx
- && ((r_type == R_MIPS_26 && (x >> 26) == 0x3) /* jal addr */
- || (r_type == R_MIPS_JALR && x == 0x0320f809))) /* jalr t9 */
+ && ((JAL_TO_BAL_P (input_bfd)
+ && (r_type == R_MIPS_26 && (x >> 26) == 0x3)) /* jal addr */
+ || (JALR_TO_BAL_P (input_bfd) && (r_type == R_MIPS_JALR
+ && x == 0x0320f809)))) /* jalr t9 */
{
bfd_vma addr;
bfd_vma dest;
Index: src/gas/testsuite/gas/mips/jal-svr4pic.d
===================================================================
--- src.orig/gas/testsuite/gas/mips/jal-svr4pic.d 2009-08-03 17:51:53.210056000 -0700
+++ src/gas/testsuite/gas/mips/jal-svr4pic.d 2009-08-03 17:52:15.130113000 -0700
@@ -26,6 +26,7 @@ Disassembly of section .text:
0+0034 <[^>]*> addiu t9,t9,0
[ ]*34: R_MIPS_LO16 .text
0+0038 <[^>]*> jalr t9
+[ ]*38: R_MIPS_JALR .text
0+003c <[^>]*> nop
0+0040 <[^>]*> lw gp,0\(sp\)
0+0044 <[^>]*> nop
@@ -33,6 +34,7 @@ Disassembly of section .text:
[ ]*48: R_MIPS_CALL16 weak_text_label
0+004c <[^>]*> nop
0+0050 <[^>]*> jalr t9
+[ ]*50: R_MIPS_JALR weak_text_label
0+0054 <[^>]*> nop
0+0058 <[^>]*> lw gp,0\(sp\)
0+005c <[^>]*> nop
@@ -40,6 +42,7 @@ Disassembly of section .text:
[ ]*60: R_MIPS_CALL16 external_text_label
0+0064 <[^>]*> nop
0+0068 <[^>]*> jalr t9
+[ ]*68: R_MIPS_JALR external_text_label
0+006c <[^>]*> nop
0+0070 <[^>]*> lw gp,0\(sp\)
0+0074 <[^>]*> b 0+0000 <text_label>
Index: src/gas/testsuite/gas/mips/jal-xgot.d
===================================================================
--- src.orig/gas/testsuite/gas/mips/jal-xgot.d 2009-08-03 17:51:53.246022000 -0700
+++ src/gas/testsuite/gas/mips/jal-xgot.d 2009-08-03 17:52:15.135111000 -0700
@@ -27,6 +27,7 @@ Disassembly of section .text:
0+0034 <[^>]*> addiu t9,t9,0
[ ]*34: R_MIPS_LO16 .text
0+0038 <[^>]*> jalr t9
+[ ]*38: R_MIPS_JALR .text
0+003c <[^>]*> nop
0+0040 <[^>]*> lw gp,0\(sp\)
0+0044 <[^>]*> lui t9,0x0
@@ -36,6 +37,7 @@ Disassembly of section .text:
[ ]*4c: R_MIPS_CALL_LO16 weak_text_label
0+0050 <[^>]*> nop
0+0054 <[^>]*> jalr t9
+[ ]*54: R_MIPS_JALR weak_text_label
0+0058 <[^>]*> nop
0+005c <[^>]*> lw gp,0\(sp\)
0+0060 <[^>]*> lui t9,0x0
@@ -45,6 +47,7 @@ Disassembly of section .text:
[ ]*68: R_MIPS_CALL_LO16 external_text_label
0+006c <[^>]*> nop
0+0070 <[^>]*> jalr t9
+[ ]*70: R_MIPS_JALR external_text_label
0+0074 <[^>]*> nop
0+0078 <[^>]*> lw gp,0\(sp\)
0+007c <[^>]*> b 0+0000 <text_label>
Index: src/gas/testsuite/gas/mips/mips-abi32-pic2.d
===================================================================
--- src.orig/gas/testsuite/gas/mips/mips-abi32-pic2.d 2009-08-03 17:51:53.283983000 -0700
+++ src/gas/testsuite/gas/mips/mips-abi32-pic2.d 2009-08-03 17:52:15.140104000 -0700
@@ -16,6 +16,7 @@ Disassembly of section \.text:
0+014 <[^>]*> 273900cc addiu t9,t9,204
14: R_MIPS_LO16 \.text
0+018 <[^>]*> 0320f809 jalr t9
+ 18: R_MIPS_JALR \.text
0+01c <[^>]*> 00000000 nop
0+020 <[^>]*> 8fbc0008 lw gp,8\(sp\)
0+024 <[^>]*> 00000000 nop
@@ -35,6 +36,7 @@ Disassembly of section \.text:
0+050 <[^>]*> 273900cc addiu t9,t9,204
50: R_MIPS_LO16 \.text
0+054 <[^>]*> 0320f809 jalr t9
+ 54: R_MIPS_JALR \.text
0+058 <[^>]*> 00000000 nop
0+05c <[^>]*> 3c010001 lui at,0x1
0+060 <[^>]*> 003d0821 addu at,at,sp
@@ -58,6 +60,7 @@ Disassembly of section \.text:
0+09c <[^>]*> 273900cc addiu t9,t9,204
9c: R_MIPS_LO16 \.text
0+0a0 <[^>]*> 0320f809 jalr t9
+ a0: R_MIPS_JALR \.text
0+0a4 <[^>]*> 00000000 nop
0+0a8 <[^>]*> 3c010001 lui at,0x1
0+0ac <[^>]*> 003d0821 addu at,at,sp
Richard Sandiford
2009-08-05 19:34:31 UTC
Post by Fu, Chao-Ying
@@ -5590,11 +5601,12 @@ mips_elf_perform_relocation (struct bfd_
prediction hardware. If we are linking for the RM9000, and we
see jal, and bal fits, use it instead. Note that this
transformation should be safe for all architectures. */
You need to update this comment too. With the new macros, it can
be a lot simpler, such as:

/* Try converting JAL and JALR to BAL, if the target is in range. */
Post by Fu, Chao-Ying
- if (bfd_get_mach (input_bfd) == bfd_mach_mips9000
- && !info->relocatable
+ if (!info->relocatable
&& !require_jalx
- && ((r_type == R_MIPS_26 && (x >> 26) == 0x3) /* jal addr */
- || (r_type == R_MIPS_JALR && x == 0x0320f809))) /* jalr t9 */
+ && ((JAL_TO_BAL_P (input_bfd)
+ && (r_type == R_MIPS_26 && (x >> 26) == 0x3)) /* jal addr */
+ || (JALR_TO_BAL_P (input_bfd) && (r_type == R_MIPS_JALR
+ && x == 0x0320f809)))) /* jalr t9 */
Odd formatting. I think it should be:

if (!info->relocatable
&& !require_jalx
&& ((JAL_TO_BAL_P (input_bfd)
&& r_type == R_MIPS_26
&& (x >> 26) == 0x3) /* jal addr */
|| (JALR_TO_BAL_P (input_bfd)
&& r_type == R_MIPS_JALR
&& x == 0x0320f809))) /* jalr t9 */
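
As a standalone illustration of what the two instruction tests in this condition match, here is a minimal sketch. The enum and function names are hypothetical stand-ins, not BFD identifiers; only the two bit patterns come from the patch. The top six bits being 0x3 is the JAL opcode, and 0x0320f809 is exactly "jalr $31, $25" (SPECIAL, rs = 25, rd = 31, funct = 9).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for R_MIPS_26 and R_MIPS_JALR.  */
enum reloc_kind { RELOC_26, RELOC_JALR, RELOC_OTHER };

/* jal addr: primary opcode field (bits 31..26) is 0x3.  */
static bool
is_jal (uint32_t x)
{
  return (x >> 26) == 0x3;
}

/* jalr t9: only the exact encoding with rs = $25 and rd = $31.  */
static bool
is_jalr_t9 (uint32_t x)
{
  return x == 0x0320f809u;
}

/* Mirror of the guard: JAL_OK and JALR_OK play the role of
   JAL_TO_BAL_P and JALR_TO_BAL_P.  */
static bool
bal_candidate (enum reloc_kind r, uint32_t x, bool jal_ok, bool jalr_ok)
{
  return (jal_ok && r == RELOC_26 && is_jal (x))
         || (jalr_ok && r == RELOC_JALR && is_jalr_t9 (x));
}
```

Note that a jalr through any register other than $25, or one that links to a register other than $31, does not match, so only the standard PIC call sequence is considered.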

More importantly, I think we should be checking the output_bfd
rather than the input bfd. E.g. if you were linking legacy MIPS 3
n32 objects with MIPS64 objects, you'd want the MIPS64-related
optimisations to be applied to both.

Looks good to me with those changes.

Richard
Fu, Chao-Ying
2009-08-05 21:38:16 UTC
Post by Richard Sandiford
Post by Fu, Chao-Ying
@@ -5590,11 +5601,12 @@ mips_elf_perform_relocation (struct bfd_
prediction hardware. If we are linking for the RM9000, and we
see jal, and bal fits, use it instead. Note that this
transformation should be safe for all architectures. */
You need to update this comment too. With the new macros, it can
be a lot simpler, such as:
/* Try converting JAL and JALR to BAL, if the target is in range. */
Yes.
Post by Richard Sandiford
Post by Fu, Chao-Ying
- if (bfd_get_mach (input_bfd) == bfd_mach_mips9000
- && !info->relocatable
+ if (!info->relocatable
&& !require_jalx
- && ((r_type == R_MIPS_26 && (x >> 26) == 0x3) /* jal addr */
- || (r_type == R_MIPS_JALR && x == 0x0320f809))) /* jalr t9 */
+ && ((JAL_TO_BAL_P (input_bfd)
+ && (r_type == R_MIPS_26 && (x >> 26) == 0x3)) /* jal addr */
+ || (JALR_TO_BAL_P (input_bfd) && (r_type == R_MIPS_JALR
+ && x == 0x0320f809)))) /* jalr t9 */
Odd formatting. I think it should be:
if (!info->relocatable
&& !require_jalx
&& ((JAL_TO_BAL_P (input_bfd)
&& r_type == R_MIPS_26
&& (x >> 26) == 0x3) /* jal addr */
|| (JALR_TO_BAL_P (input_bfd)
&& r_type == R_MIPS_JALR
&& x == 0x0320f809))) /* jalr t9 */
Yes.
Post by Richard Sandiford
More importantly, I think we should be checking the output_bfd
rather than the input bfd. E.g. if you were linking legacy MIPS 3
n32 objects with MIPS64 objects, you'd want the MIPS64-related
optimisations to be applied to both.
Looks good to me with those changes.
Yes, I checked in the patch with the first two changes.

To check output_bfd, we need to add a new parameter of output_bfd to
mips_elf_perform_relocation. But now because JALR_TO_BAL_P is true for all
and JAL_TO_BAL_P is true for RM9000, checking input_bfd should be the same
as checking output_bfd for non-RM9000 objects. We still can change to check
output_bfd later. Thanks a lot!

Regards,
Chao-ying
Richard Sandiford
2009-08-05 21:41:42 UTC
Post by Fu, Chao-Ying
To check output_bfd, we need to add a new parameter of output_bfd to
mips_elf_perform_relocation. But now because JALR_TO_BAL_P is true for all
and JAL_TO_BAL_P is true for RM9000, checking input_bfd should be the same
as checking output_bfd for non-RM9000 objects. We still can change to check
output_bfd later.
The output bfd is also available via info->output_bfd. (bfd has
a habit of passing both around, but the info parameter makes the
output_bfd parameter redundant.)
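
To make the input-versus-output point concrete, here is a minimal sketch that applies the JAL_TO_BAL_P test to raw e_flags words. The flag values are written from memory of binutils' include/elf/mips.h and should be treated as assumptions to verify against the header; convert_jal is a hypothetical helper, and the idea that the merged output carries the RM9000 machine flag is an assumption of the example, not something established in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values; check include/elf/mips.h before relying on them.  */
#define EF_MIPS_MACH      0x00ff0000u
#define E_MIPS_MACH_9000  0x00990000u
#define E_MIPS_ARCH_4     0x30000000u

/* Same test as JAL_TO_BAL_P in the patch, on a raw e_flags word.  */
static bool
jal_to_bal_p (uint32_t e_flags)
{
  return (e_flags & EF_MIPS_MACH) == E_MIPS_MACH_9000;
}

/* Decide whether a JAL in one input object is converted, looking either
   at that object's own flags or at the flags of the link output.  */
static bool
convert_jal (uint32_t input_flags, uint32_t output_flags, bool use_output)
{
  return jal_to_bal_p (use_output ? output_flags : input_flags);
}
```

With a generic MIPS IV input (E_MIPS_ARCH_4) linked into an output flagged E_MIPS_ARCH_4 | E_MIPS_MACH_9000, the per-input test rejects the generic object's JALs while the per-output test accepts them, which is the difference being discussed here.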

Richard
Fu, Chao-Ying
2009-08-05 21:46:41 UTC
Post by Fu, Chao-Ying
Post by Fu, Chao-Ying
To check output_bfd, we need to add a new parameter of output_bfd to
mips_elf_perform_relocation. But now because JALR_TO_BAL_P is true for all
and JAL_TO_BAL_P is true for RM9000, checking input_bfd should be the same
as checking output_bfd for non-RM9000 objects. We still can change to check
output_bfd later.
The output bfd is also available via info->output_bfd. (bfd has
a habit of passing both around, but the info parameter makes the
output_bfd parameter redundant.)
Richard
Yes. Here is the patch to fix this. Ok to install? Thanks a lot!

Regards,
Chao-ying

2009-08-05 Chao-ying Fu <***@mips.com>

* elfxx-mips.c (mips_elf_perform_relocation): Pass info->output_bfd
to JAL_TO_BAL_P and JALR_TO_BAL_P.

Index: elfxx-mips.c
===================================================================
RCS file: /cvs/src/src/bfd/elfxx-mips.c,v
retrieving revision 1.257
diff -u -p -r1.257 elfxx-mips.c
--- elfxx-mips.c 5 Aug 2009 21:17:51 -0000 1.257
+++ elfxx-mips.c 5 Aug 2009 21:44:54 -0000
@@ -5600,10 +5600,10 @@ mips_elf_perform_relocation (struct bfd_
/* Try converting JAL and JALR to BAL, if the target is in range. */
if (!info->relocatable
&& !require_jalx
- && ((JAL_TO_BAL_P (input_bfd)
+ && ((JAL_TO_BAL_P (info->output_bfd)
&& r_type == R_MIPS_26
&& (x >> 26) == 0x3) /* jal addr */
- || (JALR_TO_BAL_P (input_bfd)
+ || (JALR_TO_BAL_P (info->output_bfd)
&& r_type == R_MIPS_JALR
&& x == 0x0320f809))) /* jalr t9 */
{
Fu, Chao-Ying
2009-08-07 18:41:29 UTC
Hi All,

I tested a new target "mips64-linux-gnu" and found new LD test failures
due to my patch. While fixing the mismatches, I found an issue in the relaxation
test "relax-jalr-n32-shared.d".
Ex:
# relax-jalr.s
.globl __start
.space 8
.ent __start
__start:
.Lstart:
.space 16
jal __start <------------ NOT relaxed
.space 32
jal __start <------------ NOT relaxed
.space 64
jal .Lstart <------------ relaxed
.end __start

# make objdump print ...
.space 8

The first two JALRs aren't relaxed in "_bfd_mips_relax_section" because of the
following check:
/* If a symbol is undefined, or if it may be overridden,
skip it. */
if (! ((h->root.root.type == bfd_link_hash_defined
|| h->root.root.type == bfd_link_hash_defweak)
&& h->root.root.u.def.section)
|| (link_info->shared && ! link_info->symbolic
&& ! (h->root.elf_link_hash_flags & ELF_LINK_FORCED_LOCAL)))
continue;
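
The condition quoted above can be modeled roughly as follows for a global symbol. This is a simplified sketch with hypothetical parameter names, not the BFD data structures: "defined" stands for the defined/defweak-with-section test, and the remaining flags stand for link_info->shared, link_info->symbolic and ELF_LINK_FORCED_LOCAL.

```c
#include <stdbool.h>

/* Return true if the relaxation pass would skip a call to this global
   symbol, i.e. if the symbol is undefined or may be overridden
   (preempted) at run time.  Hypothetical helper for illustration.  */
static bool
relax_skips_symbol (bool defined, bool shared, bool symbolic,
                    bool forced_local)
{
  return !defined || (shared && !symbolic && !forced_local);
}
```

Under this model, a defined global such as __start in a plain -shared link is skipped (it could be preempted), while the same symbol in an executable, in a -Bsymbolic link, or forced local by a version script is eligible for relaxation.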

But later, these two JALRs are transformed to BAL in "mips_elf_perform_relocation".
Is it safe to use BAL for the first two JALRs?
Also, the relaxation code seems redundant now that we can do the same thing in
"mips_elf_perform_relocation".

The patch to fix LD failures is attached. Thanks!

Regards,
Chao-ying

2009-08-07 Chao-ying Fu <***@mips.com>

* ld-mips-elf/elf-rel-got-n32.d,
ld-mips-elf/elf-rel-got-n64-linux.d,
ld-mips-elf/elf-rel-got-n64.d,
ld-mips-elf/elf-rel-xgot-n32.d,
ld-mips-elf/relax-jalr-n32-shared.d,
ld-mips-elf/relax-jalr-n64-shared.d,
ld-mips-elf/elf-rel-xgot-n64-linux.d,
ld-mips-elf/elf-rel-xgot-n64.d: Change JALR to BAL.

Index: src/ld/testsuite/ld-mips-elf/elf-rel-got-n32.d
===================================================================
--- src.orig/ld/testsuite/ld-mips-elf/elf-rel-got-n32.d 2009-08-06 18:03:32.000000000 -0700
+++ src/ld/testsuite/ld-mips-elf/elf-rel-got-n32.d 2009-08-07 10:28:02.870923000 -0700
@@ -127,10 +127,10 @@ Disassembly of section \.text:
10000260: 8f99805c lw t9,-32676\(gp\)
10000264: 8f998030 lw t9,-32720\(gp\)
10000268: 8f99805c lw t9,-32676\(gp\)
-1000026c: 0320f809 jalr t9
+1000026c: 0411ff90 bal 100000b0 <fn>
10000270: 00000000 nop
10000274: 8f998030 lw t9,-32720\(gp\)
-10000278: 0320f809 jalr t9
+10000278: 0411ff8d bal 100000b0 <fn>
1000027c: 00000000 nop
10000280: 8f858068 lw a1,-32664\(gp\)
10000284: 8f858068 lw a1,-32664\(gp\)
@@ -243,10 +243,10 @@ Disassembly of section \.text:
10000430: 8f998060 lw t9,-32672\(gp\)
10000434: 8f998048 lw t9,-32696\(gp\)
10000438: 8f998060 lw t9,-32672\(gp\)
-1000043c: 0320f809 jalr t9
+1000043c: 0411001d bal 100004b4 <fn2>
10000440: 00000000 nop
10000444: 8f998048 lw t9,-32696\(gp\)
-10000448: 0320f809 jalr t9
+10000448: 0411001a bal 100004b4 <fn2>
1000044c: 00000000 nop
10000450: 1000ff17 b 100000b0 <fn>
10000454: 8f858064 lw a1,-32668\(gp\)
Index: src/ld/testsuite/ld-mips-elf/elf-rel-got-n64-linux.d
===================================================================
--- src.orig/ld/testsuite/ld-mips-elf/elf-rel-got-n64-linux.d 2009-08-06 18:03:32.000000000 -0700
+++ src/ld/testsuite/ld-mips-elf/elf-rel-got-n64-linux.d 2009-08-07 10:28:02.877917000 -0700
@@ -129,10 +129,10 @@ Disassembly of section \.text:
120000290: df9980a8 ld t9,-32600\(gp\)
120000294: df998050 ld t9,-32688\(gp\)
120000298: df9980a8 ld t9,-32600\(gp\)
- 12000029c: 0320f809 jalr t9
+ 12000029c: 0411ff90 bal 1200000e0 <fn>
1200002a0: 00000000 nop
1200002a4: df998050 ld t9,-32688\(gp\)
- 1200002a8: 0320f809 jalr t9
+ 1200002a8: 0411ff8d bal 1200000e0 <fn>
1200002ac: 00000000 nop
1200002b0: df8580c0 ld a1,-32576\(gp\)
1200002b4: df8580c0 ld a1,-32576\(gp\)
@@ -245,10 +245,10 @@ Disassembly of section \.text:
120000460: df9980b0 ld t9,-32592\(gp\)
120000464: df998080 ld t9,-32640\(gp\)
120000468: df9980b0 ld t9,-32592\(gp\)
- 12000046c: 0320f809 jalr t9
+ 12000046c: 0411001d bal 1200004e4 <fn2>
120000470: 00000000 nop
120000474: df998080 ld t9,-32640\(gp\)
- 120000478: 0320f809 jalr t9
+ 120000478: 0411001a bal 1200004e4 <fn2>
12000047c: 00000000 nop
120000480: 1000ff17 b 1200000e0 <fn>
120000484: df8580b8 ld a1,-32584\(gp\)
Index: src/ld/testsuite/ld-mips-elf/elf-rel-got-n64.d
===================================================================
--- src.orig/ld/testsuite/ld-mips-elf/elf-rel-got-n64.d 2009-08-06 18:03:32.000000000 -0700
+++ src/ld/testsuite/ld-mips-elf/elf-rel-got-n64.d 2009-08-07 10:28:02.899899000 -0700
@@ -128,10 +128,10 @@ Disassembly of section \.text:
10000290: df9980a8 ld t9,-32600\(gp\)
10000294: df998050 ld t9,-32688\(gp\)
10000298: df9980a8 ld t9,-32600\(gp\)
- 1000029c: 0320f809 jalr t9
+ 1000029c: 0411ff90 bal 100000e0 <fn>
100002a0: 00000000 nop
100002a4: df998050 ld t9,-32688\(gp\)
- 100002a8: 0320f809 jalr t9
+ 100002a8: 0411ff8d bal 100000e0 <fn>
100002ac: 00000000 nop
100002b0: df8580c0 ld a1,-32576\(gp\)
100002b4: df8580c0 ld a1,-32576\(gp\)
@@ -244,10 +244,10 @@ Disassembly of section \.text:
10000460: df9980b0 ld t9,-32592\(gp\)
10000464: df998080 ld t9,-32640\(gp\)
10000468: df9980b0 ld t9,-32592\(gp\)
- 1000046c: 0320f809 jalr t9
+ 1000046c: 0411001d bal 100004e4 <fn2>
10000470: 00000000 nop
10000474: df998080 ld t9,-32640\(gp\)
- 10000478: 0320f809 jalr t9
+ 10000478: 0411001a bal 100004e4 <fn2>
1000047c: 00000000 nop
10000480: 1000ff17 b 100000e0 <fn>
10000484: df8580b8 ld a1,-32584\(gp\)
Index: src/ld/testsuite/ld-mips-elf/elf-rel-xgot-n32.d
===================================================================
--- src.orig/ld/testsuite/ld-mips-elf/elf-rel-xgot-n32.d 2009-08-06 18:03:32.000000000 -0700
+++ src/ld/testsuite/ld-mips-elf/elf-rel-xgot-n32.d 2009-08-07 10:28:02.908884000 -0700
@@ -183,11 +183,11 @@ Disassembly of section \.text:
10000340: 3c190000 lui t9,0x0
10000344: 033cc821 addu t9,t9,gp
10000348: 8f39802c lw t9,-32724\(t9\)
-1000034c: 0320f809 jalr t9
+1000034c: 0411ff58 bal 100000b0 <fn>
10000350: 00000000 nop
10000354: 8f998020 lw t9,-32736\(gp\)
10000358: 273900b0 addiu t9,t9,176
-1000035c: 0320f809 jalr t9
+1000035c: 0411ff54 bal 100000b0 <fn>
10000360: 00000000 nop
10000364: 3c050000 lui a1,0x0
10000368: 00bc2821 addu a1,a1,gp
@@ -356,11 +356,11 @@ Disassembly of section \.text:
100005f4: 3c190000 lui t9,0x0
100005f8: 033cc821 addu t9,t9,gp
100005fc: 8f398030 lw t9,-32720\(t9\)
-10000600: 0320f809 jalr t9
+10000600: 0411002b bal 100006b0 <fn2>
10000604: 00000000 nop
10000608: 8f998020 lw t9,-32736\(gp\)
1000060c: 273906b0 addiu t9,t9,1712
-10000610: 0320f809 jalr t9
+10000610: 04110027 bal 100006b0 <fn2>
10000614: 00000000 nop
10000618: 3c050000 lui a1,0x0
1000061c: 00bc2821 addu a1,a1,gp
Index: src/ld/testsuite/ld-mips-elf/relax-jalr-n32-shared.d
===================================================================
--- src.orig/ld/testsuite/ld-mips-elf/relax-jalr-n32-shared.d 2009-08-06 18:03:32.000000000 -0700
+++ src/ld/testsuite/ld-mips-elf/relax-jalr-n32-shared.d 2009-08-07 10:28:02.913882000 -0700
@@ -10,11 +10,11 @@ Disassembly of section \.text:
\.\.\.
\.\.\.
.* lw t9,.*
-.* jalr t9
+.* bal .* <__start>
.* nop
\.\.\.
.* lw t9,.*
-.* jalr t9
+.* bal .* <__start>
.* nop
\.\.\.
.* lw t9,.*
Index: src/ld/testsuite/ld-mips-elf/relax-jalr-n64-shared.d
===================================================================
--- src.orig/ld/testsuite/ld-mips-elf/relax-jalr-n64-shared.d 2009-08-06 18:03:32.000000000 -0700
+++ src/ld/testsuite/ld-mips-elf/relax-jalr-n64-shared.d 2009-08-07 10:28:02.919874000 -0700
@@ -10,11 +10,11 @@ Disassembly of section \.text:
\.\.\.
\.\.\.
.* ld t9,.*
-.* jalr t9
+.* bal .* <__start>
.* nop
\.\.\.
.* ld t9,.*
-.* jalr t9
+.* bal .* <__start>
.* nop
\.\.\.
.* ld t9,.*
Index: src/ld/testsuite/ld-mips-elf/elf-rel-xgot-n64-linux.d
===================================================================
--- src.orig/ld/testsuite/ld-mips-elf/elf-rel-xgot-n64-linux.d 2009-08-06 18:03:32.000000000 -0700
+++ src/ld/testsuite/ld-mips-elf/elf-rel-xgot-n64-linux.d 2009-08-07 10:28:02.925870000 -0700
@@ -185,11 +185,11 @@ Disassembly of section \.text:
120000370: 3c190000 lui t9,0x0
120000374: 033cc82d daddu t9,t9,gp
120000378: df398048 ld t9,-32696\(t9\)
- 12000037c: 0320f809 jalr t9
+ 12000037c: 0411ff58 bal 1200000e0 <fn>
120000380: 00000000 nop
120000384: df998030 ld t9,-32720\(gp\)
120000388: 673900e0 daddiu t9,t9,224
- 12000038c: 0320f809 jalr t9
+ 12000038c: 0411ff54 bal 1200000e0 <fn>
120000390: 00000000 nop
120000394: 3c050000 lui a1,0x0
120000398: 00bc282d daddu a1,a1,gp
@@ -358,11 +358,11 @@ Disassembly of section \.text:
120000624: 3c190000 lui t9,0x0
120000628: 033cc82d daddu t9,t9,gp
12000062c: df398050 ld t9,-32688\(t9\)
- 120000630: 0320f809 jalr t9
+ 120000630: 0411002b bal 1200006e0 <fn2>
120000634: 00000000 nop
120000638: df998030 ld t9,-32720\(gp\)
12000063c: 673906e0 daddiu t9,t9,1760
- 120000640: 0320f809 jalr t9
+ 120000640: 04110027 bal 1200006e0 <fn2>
120000644: 00000000 nop
120000648: 3c050000 lui a1,0x0
12000064c: 00bc282d daddu a1,a1,gp
Index: src/ld/testsuite/ld-mips-elf/elf-rel-xgot-n64.d
===================================================================
--- src.orig/ld/testsuite/ld-mips-elf/elf-rel-xgot-n64.d 2009-08-07 10:24:52.000000000 -0700
+++ src/ld/testsuite/ld-mips-elf/elf-rel-xgot-n64.d 2009-08-07 10:36:28.332542000 -0700
@@ -184,11 +184,11 @@ Disassembly of section \.text:
10000370: 3c190000 lui t9,0x0
10000374: 033cc82d daddu t9,t9,gp
10000378: df398048 ld t9,-32696\(t9\)
- 1000037c: 0320f809 jalr t9
+ 1000037c: 0411ff58 bal 100000e0 <fn>
10000380: 00000000 nop
10000384: df998030 ld t9,-32720\(gp\)
10000388: 673900e0 daddiu t9,t9,224
- 1000038c: 0320f809 jalr t9
+ 1000038c: 0411ff54 bal 100000e0 <fn>
10000390: 00000000 nop
10000394: 3c050000 lui a1,0x0
10000398: 00bc282d daddu a1,a1,gp
@@ -357,11 +357,11 @@ Disassembly of section \.text:
10000624: 3c190000 lui t9,0x0
10000628: 033cc82d daddu t9,t9,gp
1000062c: df398050 ld t9,-32688\(t9\)
- 10000630: 0320f809 jalr t9
+ 10000630: 0411002b bal 100006e0 <fn2>
10000634: 00000000 nop
10000638: df998030 ld t9,-32720\(gp\)
1000063c: 673906e0 daddiu t9,t9,1760
- 10000640: 0320f809 jalr t9
+ 10000640: 04110027 bal 100006e0 <fn2>
10000644: 00000000 nop
10000648: 3c050000 lui a1,0x0
1000064c: 00bc282d daddu a1,a1,gp
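For anyone checking the expected dumps in this patch by hand, the following is a minimal C sketch (not part of the patch) of how a BAL instruction word maps back to its branch target. The helper name bal_target is made up for illustration; the only facts assumed are the 0x0411 upper halfword that appears in the dumps and the usual MIPS rule that a branch offset is a signed 16-bit word count relative to the delay-slot address.

```c
#include <assert.h>
#include <stdint.h>

/* Return the branch target of the BAL instruction word INSN located at
   address PC.  Illustrative helper; the name is not taken from BFD.  */
static uint64_t
bal_target (uint64_t pc, uint32_t insn)
{
  /* The upper halfword 0x0411 identifies "bal" (bgezal with rs = $zero).  */
  assert ((insn & 0xffff0000u) == 0x04110000u);

  /* Sign-extend the 16-bit immediate without relying on
     implementation-defined narrowing conversions.  */
  int32_t imm = (int32_t) (insn & 0xffffu);
  if (imm & 0x8000)
    imm -= 0x10000;

  /* The offset counts instructions (4 bytes each) from the delay slot.
     Unsigned wrap-around makes the addition correct for negative offsets.  */
  return pc + 4 + (uint64_t) ((int64_t) imm * 4);
}
```

With this helper, the word 0411ff90 at 1000026c decodes to 100000b0, and the same word at 12000029c decodes to 1200000e0, which agrees with the <fn> targets in the n32 and n64-linux dumps.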
Fu, Chao-Ying
2009-08-26 20:49:16 UTC
Permalink
Post by Fu, Chao-Ying
Hi All,
I tested a new target "mips64-linux-gnu" and found new LD test failures
due to my patch. While fixing mismatching, I found one issue from a relaxation
test "relax-jalr-n32-shared.d".
# relax-jalr.s
.globl __start
.space 8
.ent __start
.space 16
jal __start <------------ NOT relaxed
.space 32
jal __start <------------ NOT relaxed
.space 64
jal .Lstart <------------ relaxed
.end __start
# make objdump print ...
.space 8
The first two JALRs aren't relaxed in "_bfd_mips_relax_section" due to the check
as follows:
/* If a symbol is undefined, or if it may be overridden,
skip it. */
if (! ((h->root.root.type == bfd_link_hash_defined
|| h->root.root.type == bfd_link_hash_defweak)
&& h->root.root.u.def.section)
|| (link_info->shared && ! link_info->symbolic
&& ! (h->root.elf_link_hash_flags & ELF_LINK_FORCED_LOCAL)))
continue;
But later, these two JALRs are transformed to BAL in "mips_elf_perform_relocation".
Is it safe to use BAL for the first two JALRs?
And, the relaxation code seems redundant, after we can do the same thing in
"mips_elf_perform_relocation".
The patch to fix LD failures is attached. Thanks!
Regards,
Chao-ying
* ld-mips-elf/elf-rel-got-n32.d,
ld-mips-elf/elf-rel-got-n64-linux.d,
ld-mips-elf/elf-rel-got-n64.d,
ld-mips-elf/elf-rel-xgot-n32.d,
ld-mips-elf/relax-jalr-n32-shared.d,
ld-mips-elf/relax-jalr-n64-shared.d,
ld-mips-elf/elf-rel-xgot-n64-linux.d,
ld-mips-elf/elf-rel-xgot-n64.d: Change JALR to BAL.
Hi,

Does anyone have feedback about the safety issue (JALR->BAL) and this patch?
Thanks a lot!

Regards,
Chao-ying
Adam Nemet
2009-08-27 16:29:28 UTC
Permalink
Post by Fu, Chao-Ying
.globl __start
.space 8
.ent __start
.space 16
jal __start <------------ NOT relaxed
.space 32
jal __start <------------ NOT relaxed
.space 64
jal .Lstart <------------ relaxed
.end __start
# make objdump print ...
.space 8
The first two JALRs aren't relaxed in "_bfd_mips_relax_section" due to the check
as follows:
/* If a symbol is undefined, or if it may be overridden,
skip it. */
if (! ((h->root.root.type == bfd_link_hash_defined
|| h->root.root.type == bfd_link_hash_defweak)
&& h->root.root.u.def.section)
|| (link_info->shared && ! link_info->symbolic
&& ! (h->root.elf_link_hash_flags & ELF_LINK_FORCED_LOCAL)))
continue;
But later, these two JALRs are transformed to BAL in "mips_elf_perform_relocation".
Is it safe to use BAL for the first two JALRs?
No, it's not safe in a shared library unless -Bsymbolic is used. This is
actually causing a regression with my GCC testing of the R_MIPS_JALR
patch. It seems that my patch:

http://sourceware.org/ml/binutils/2009-08/msg00404.html

needs to exclude global symbols in shared libs just like it's done under
relaxation. I'll update that patch.
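To make the condition concrete, here is a rough C sketch of the "may this call be bound at link time" test, modeled on the relaxation check quoted above. It is a simplified illustration under stated assumptions, not the BFD code: the two structs and the function name call_binds_locally are hypothetical stand-ins for the link_info fields and hash-entry flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of the link options that matter here.  */
struct link_options
{
  bool shared;    /* producing a shared library */
  bool symbolic;  /* -Bsymbolic was given */
};

/* Hypothetical, simplified view of a call target.  */
struct call_target
{
  bool defined;       /* defined (or weakly defined) in this link */
  bool forced_local;  /* forced local, so it cannot be overridden */
};

/* Return true when a call to SYM can safely be turned into a direct
   branch: the symbol must be defined, and it must not be overridable
   by another module at run time.  */
static bool
call_binds_locally (const struct link_options *opts,
                    const struct call_target *sym)
{
  assert (opts != NULL && sym != NULL);

  if (!sym->defined)
    return false;
  /* A global symbol in a shared library may be preempted unless
     -Bsymbolic is in effect or the symbol was forced local.  */
  if (opts->shared && !opts->symbolic && !sym->forced_local)
    return false;
  return true;
}
```

In the relax-jalr example, the calls to the global __start would fail this test in a plain -shared link, while the same calls would pass it with -Bsymbolic or in an executable.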

Also, did you test your binutils R_MIPS_JALR patch with a GCC bootstrap?
If not, then next time you probably should. The binutils testsuite is
not very good in this area and I don't think the problems I have found
since your patch are "-mexplicit-relocs"-specific.
Post by Fu, Chao-Ying
And, the relaxation code seems redundant, after we can do the same thing in
"mips_elf_perform_relocation".
Yes, it does seem redundant.

Adam

Adam Nemet
2009-08-01 19:57:39 UTC
Permalink
Post by Fu, Chao-Ying
Post by Fu, Chao-Ying
Post by Fu, Chao-Ying
+/* True if ABFD is for CPUs that are faster if jal/jalr is converted to bal.
+ This should be safe for all architectures, but for now we enable it
+ for RM9000, mips32, mips32r2, mips64, and mips64r2. */
+#define JAL_JALR_TO_BAL_P(abfd) \
+ ( ((elf_elfheader (abfd)->e_flags & EF_MIPS_MACH) == E_MIPS_MACH_9000) \
+ || ((elf_elfheader (abfd)->e_flags & EF_MIPS_ARCH) == E_MIPS_ARCH_32) \
+ || ((elf_elfheader (abfd)->e_flags & EF_MIPS_ARCH) == E_MIPS_ARCH_32R2) \
+ || ((elf_elfheader (abfd)->e_flags & EF_MIPS_ARCH) == E_MIPS_ARCH_64) \
+ || ((elf_elfheader (abfd)->e_flags & EF_MIPS_ARCH) == E_MIPS_ARCH_64R2))
I think this should be a negative predicate. As you say JALR->BAL
should be a profitable transformation on most CPUs.
Yes. If everyone is ok, we can just set JAL_JALR_TO_BAL_P(abfd) to 1.
(And, fix new test failures due to BAL mismatching.)
Just to be sure, what I said applies to JALR->BAL for Octeon. JAL->BAL is not
necessarily profitable on Octeon but I thought the relaxation code was
performing JALR->BAL or JALR->JAL and not JAL->BAL? Am I missing something
here?

Adam
David Daney
2009-08-03 16:19:34 UTC
Permalink
Post by Adam Nemet
Post by Fu, Chao-Ying
Post by Fu, Chao-Ying
Post by Fu, Chao-Ying
+/* True if ABFD is for CPUs that are faster if jal/jalr is converted to bal.
+ This should be safe for all architectures, but for now we enable it
+ for RM9000, mips32, mips32r2, mips64, and mips64r2. */
+#define JAL_JALR_TO_BAL_P(abfd) \
+ ( ((elf_elfheader (abfd)->e_flags & EF_MIPS_MACH) == E_MIPS_MACH_9000) \
+ || ((elf_elfheader (abfd)->e_flags & EF_MIPS_ARCH) == E_MIPS_ARCH_32) \
+ || ((elf_elfheader (abfd)->e_flags & EF_MIPS_ARCH) == E_MIPS_ARCH_32R2) \
+ || ((elf_elfheader (abfd)->e_flags & EF_MIPS_ARCH) == E_MIPS_ARCH_64) \
+ || ((elf_elfheader (abfd)->e_flags & EF_MIPS_ARCH) == E_MIPS_ARCH_64R2))
I think this should be a negative predicate. As you say JALR->BAL
should be a profitable transformation on most CPUs.
Yes. If everyone is ok, we can just set JAL_JALR_TO_BAL_P(abfd) to 1.
(And, fix new test failures due to BAL mismatching.)
Just to be sure, what I said applies to JALR->BAL for Octeon. JAL->BAL is not
necessarily profitable on Octeon but I thought the relaxation code was
performing JALR->BAL or JALR->JAL and not JAL->BAL? Am I missing something
here?
In -fPIC code we would want JALR -> BAL, one would hope that JAL would
not be emitted with -fPIC. It probably varies on a per CPU basis
whether or not JAL or BAL is preferable, but if the branch range is
exceeded you would probably want to fall back to JAL.
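The fallback order described here can be sketched in C as follows. This is only an illustration under assumptions: the enum, the function name choose_call_form, and the allow_jal parameter are invented for the example. JAL encodes absolute address bits, so the sketch lets the caller forbid it (as one would for position-independent code), and it uses the standard MIPS constraints that BAL reaches a signed 18-bit byte offset from the delay slot and JAL reaches only targets in the same 256 MB region as the delay slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum call_form { CALL_BAL, CALL_JAL, CALL_KEEP_JALR };

/* Pick a call form for a call at PC to TARGET.  ALLOW_JAL is a
   hypothetical knob: pass false when absolute jumps are not acceptable.  */
static enum call_form
choose_call_form (uint64_t pc, uint64_t target, bool allow_jal)
{
  assert ((pc & 3) == 0 && (target & 3) == 0);

  uint64_t delay_slot = pc + 4;
  int64_t off = (int64_t) (target - delay_slot);

  /* BAL: signed 16-bit instruction count, i.e. an 18-bit byte offset.  */
  if (off >= -0x20000 && off <= 0x1ffff)
    return CALL_BAL;

  /* JAL: the upper address bits come from the delay-slot address, so the
     target must lie in the same 256 MB-aligned region.  */
  const uint64_t region_mask = ~(uint64_t) 0x0fffffff;
  if (allow_jal && (delay_slot & region_mask) == (target & region_mask))
    return CALL_JAL;

  /* Otherwise leave the indirect call through t9 alone.  */
  return CALL_KEEP_JALR;
}
```

Whether BAL or JAL is the better first choice is CPU-dependent, as noted above; the sketch simply tries BAL first.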

David Daney
Fu, Chao-Ying
2009-08-03 17:29:37 UTC
Permalink
Post by Fu, Chao-Ying
Post by Fu, Chao-Ying
Post by Fu, Chao-Ying
Post by Fu, Chao-Ying
+/* True if ABFD is for CPUs that are faster if jal/jalr is converted to bal.
+ This should be safe for all architectures, but for now we enable it
+ for RM9000, mips32, mips32r2, mips64, and mips64r2. */
+#define JAL_JALR_TO_BAL_P(abfd) \
+ ( ((elf_elfheader (abfd)->e_flags & EF_MIPS_MACH) == E_MIPS_MACH_9000) \
+ || ((elf_elfheader (abfd)->e_flags & EF_MIPS_ARCH) == E_MIPS_ARCH_32) \
+ || ((elf_elfheader (abfd)->e_flags & EF_MIPS_ARCH) == E_MIPS_ARCH_32R2) \
+ || ((elf_elfheader (abfd)->e_flags & EF_MIPS_ARCH) == E_MIPS_ARCH_64) \
+ || ((elf_elfheader (abfd)->e_flags & EF_MIPS_ARCH) == E_MIPS_ARCH_64R2))
I think this should be a negative predicate. As you say JALR->BAL
should be a profitable transformation on most CPUs.
Yes. If everyone is ok, we can just set JAL_JALR_TO_BAL_P(abfd) to 1.
(And, fix new test failures due to BAL mismatching.)
Just to be sure, what I said applies to JALR->BAL for Octeon. JAL->BAL is not
necessarily profitable on Octeon but I thought the relaxation code was
performing JALR->BAL or JALR->JAL and not JAL->BAL? Am I missing something
here?
The transformation checks two things: JAL and JALR, to convert them to BAL.
Maybe we can split the predicate to two: JAL_TO_BAL_P and JALR_TO_BAL_P.
Then, you can disable JAL_TO_BAL_P for Octeon.

Ex:
/* On the RM9000, bal is faster than jal, because bal uses branch
prediction hardware. If we are linking for the RM9000, and we
see jal, and bal fits, use it instead. Note that this
transformation should be safe for all architectures. */
if (!info->relocatable
&& !require_jalx
&& ((JAL_TO_BAL_P && (r_type == R_MIPS_26 && (x >> 26) == 0x3)) /* jal addr */
|| (JALR_TO_BAL_P && (r_type == R_MIPS_JALR && x == 0x0320f809)))) /* jalr t9 */
{
bfd_vma addr;
bfd_vma dest;
bfd_signed_vma off;

addr = (input_section->output_section->vma
+ input_section->output_offset
+ relocation->r_offset
+ 4);
if (r_type == R_MIPS_26)
dest = (value << 2) | ((addr >> 28) << 28);
else
dest = value;
off = dest - addr;
if (off <= 0x1ffff && off >= -0x20000)
x = 0x04110000 | (((bfd_vma) off >> 2) & 0xffff); /* bal addr */
}
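The offset arithmetic in the snippet above can be tried in isolation with a small standalone C sketch. It is not the BFD code: the function name try_encode_bal and the macro BAL_OPCODE are made up for illustration, and the target address is assumed to be already resolved (the R_MIPS_26 region handling is left out).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* "bal" is bgezal with rs = $zero: upper halfword 0x0411.  */
#define BAL_OPCODE 0x04110000u

/* Try to encode "bal TARGET" for a call instruction located at PC.
   On success store the instruction word in *INSN and return true;
   return false when TARGET is outside the branch range.  */
static bool
try_encode_bal (uint64_t pc, uint64_t target, uint32_t *insn)
{
  assert ((pc & 3) == 0 && (target & 3) == 0);

  /* Branch offsets are relative to the delay-slot address, PC + 4.  */
  int64_t off = (int64_t) (target - (pc + 4));
  if (off < -0x20000 || off > 0x1ffff)
    return false;

  /* Shift as unsigned so that negative offsets are handled portably,
     then keep the low 16 bits as the immediate field.  */
  *insn = BAL_OPCODE | (uint32_t) (((uint64_t) off >> 2) & 0xffff);
  return true;
}
```

For the Ex 2 binary, a call at 40007c to t3 at 40006c gives an offset of -20 bytes from the delay slot, so the immediate is 0xfffb and the word is 0411fffb, matching the objdump output.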

Regards,
Chao-ying
Adam Nemet
2009-08-03 18:45:47 UTC
Permalink
Post by Fu, Chao-Ying
The transformation checks two things: JAL and JALR, to convert them to BAL.
Oh I see. I was talking about the relaxation code where we do JALR->BAL or
JR->B; I didn't know about this other place. Thanks for clarifying.
Post by Fu, Chao-Ying
Maybe we can split the predicate to two: JAL_TO_BAL_P and JALR_TO_BAL_P.
Then, you can disable JAL_TO_BAL_P for Octeon.
Yes that would work, thanks.
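A minimal C sketch of the split, assuming a simplified CPU classification in place of the ELF header flag tests: the enums and function names below are hypothetical, and the only policy encoded is the one discussed in this thread (JALR->BAL everywhere, JAL->BAL disabled for Octeon).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical CPU classification; real code would inspect the
   e_flags of the output BFD instead.  */
enum cpu_kind { CPU_OTHER, CPU_RM9000, CPU_OCTEON, CPU_KIND_COUNT };

/* The two call instructions the linker may rewrite.  */
enum call_insn { CALL_JAL, CALL_JALR_T9 };

/* JAL->BAL is not necessarily profitable on Octeon, so leave it off there.  */
static bool
jal_to_bal_p (enum cpu_kind cpu)
{
  assert ((int) cpu >= 0 && (int) cpu < CPU_KIND_COUNT);
  return cpu != CPU_OCTEON;
}

/* JALR->BAL is treated as profitable on every CPU in this sketch.  */
static bool
jalr_to_bal_p (enum cpu_kind cpu)
{
  assert ((int) cpu >= 0 && (int) cpu < CPU_KIND_COUNT);
  return true;
}

/* Combined test, mirroring the two-armed condition in the proposal.  */
static bool
may_convert_to_bal (enum cpu_kind cpu, enum call_insn insn)
{
  return insn == CALL_JAL ? jal_to_bal_p (cpu) : jalr_to_bal_p (cpu);
}
```

Keeping the two predicates separate means a CPU-specific exception touches only one of them.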

Adam
Richard Sandiford
2009-08-01 08:08:36 UTC
Permalink
Post by Fu, Chao-Ying
Ex 2: (Calls not in a shared library)
# cat call.c
int t2() { return 1984 + t3(); }
int t3() { return 0; }
# cc1 -quiet call.c -O2 -mabicalls -mno-shared -G0 -o call.s -fno-inline-small-functions
# as-new call.s -o call.o -mips32r2
# objdump -dr call.o
call.o: file format elf32-tradbigmips
0: 03e00008 jr ra
4: 00001021 move v0,zero
8: 27bdffe0 addiu sp,sp,-32
c: afbf001c sw ra,28(sp)
10: 0c000000 jal 0 <t3> <-----------------
10: R_MIPS_26 t3
14: 00000000 nop
18: 8fbf001c lw ra,28(sp)
1c: 244207c0 addiu v0,v0,1984
20: 03e00008 jr ra
24: 27bd0020 addiu sp,sp,32
# ld-new call.o -o call
# objdump -dr call
call: file format elf32-tradbigmips
40006c: 03e00008 jr ra
400070: 00001021 move v0,zero
400074: 27bdffe0 addiu sp,sp,-32
400078: afbf001c sw ra,28(sp)
40007c: 0411fffb bal 40006c <t3> <-----------------
400080: 00000000 nop
400084: 8fbf001c lw ra,28(sp)
400088: 244207c0 addiu v0,v0,1984
40008c: 03e00008 jr ra
400090: 27bd0020 addiu sp,sp,32
You probably know this already, but since it wasn't explicitly
mentioned: -mno-shared -mexplicit-relocs will achieve the same effect
in cases like these, and should be more efficient. The optimisation is
still useful for cross-TU calls though. Hopefully LTO will eventually
make that case work with -mno-shared -mexplicit-relocs too.

Richard