extract ELF load address with binutils?
Radouch, Zdenek
2014-03-18 17:02:32 UTC
I am writing a firmware updater that takes an ELF executable and needs to extract the RAM data
and the address at which that data should be loaded. I create the data chunk with objcopy -O binary,
and need the address of the first section that went into that chunk. I'd like to do that
from a shell script invoking binutils (rather than writing my own version of a binutil),
but can't figure out how. My first, intuitive attempt, "readelf -l", does not work at all.

Here is an example file (b2.axf) I get from my vendor.
[the file represents a RAM image with 3032 bytes @ 0x15f000]

$ file b2.axf
b2.axf: ELF 32-bit LSB executable, ARM, version 1 (SYSV), statically linked, not stripped
$ arm-none-eabi-size b2.axf
   text    data     bss     dec     hex filename
   3028       4    2052    5084    13dc b2.axf
$ arm-none-eabi-objcopy -O binary b2.axf xxx
$ wc -c xxx
3032 xxx
$

So far, all is well. I got my 3032-byte chunk of data and confirmed
its size (3028 text + 4 data). The question is where is this chunk loaded?

$ arm-none-eabi-readelf -l b2.axf

Elf file type is EXEC (Executable file)
Entry point 0x15f001
There are 2 program headers, starting at offset 52

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x000000 0x00158000 0x00158000 0x07bd8 0x07bd8 RWE 0x8000
  LOAD           0x00f000 0x2001f000 0x2001f000 0x00000 0x00804 RW  0x8000

Section to Segment mapping:
Segment Sections...
00 .text .data
01 .bss .main_stack
$

I don't understand the purpose of this output; it appears (certainly from the loading
perspective) wrong, as the second segment should not be loaded at all, and the first
one includes some 28k of alignment-related padding loaded at addresses that may not
even exist (0x158000) within the hardware.

Clearly, the ELF file has what I need: the Addr field of the .text section ([1])
is the load address. See below

$
$ arm-none-eabi-readelf -S b2.axf
There are 18 section headers, starting at offset 0x246a4:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        0015f000 007000 000bd4 00  AX  0   0  4
  [ 2] .data             PROGBITS        0015fbd4 007bd4 000004 00  WA  0   0  4
  [ 3] .bss              NOBITS          2001f000 00f000 000004 00  WA  0   0  4
  [ 4] .main_stack       NOBITS          2001f004 00f000 000800 00  WA  0   0  1
  [ 5] .debug_info       PROGBITS        00000000 007bd8 00e4e3 00      0   0  1
  [ 6] .debug_abbrev     PROGBITS        00000000 0160bb 001a2a 00      0   0  1
  [ 7] .debug_loc        PROGBITS        00000000 017ae5 002e67 00      0   0  1
  [ 8] .debug_aranges    PROGBITS        00000000 01a94c 000690 00      0   0  1
  [ 9] .debug_ranges     PROGBITS        00000000 01afdc 000688 00      0   0  1
  [10] .debug_line       PROGBITS        00000000 01b664 0025eb 00      0   0  1
  [11] .debug_str        PROGBITS        00000000 01dc4f 005aeb 01  MS  0   0  1
  [12] .comment          PROGBITS        00000000 02373a 000030 01  MS  0   0  1
  [13] .ARM.attributes   ARM_ATTRIBUTES  00000000 02376a 000033 00      0   0  1
  [14] .debug_frame      PROGBITS        00000000 0237a0 000e4c 00      0   0  4
  [15] .shstrtab         STRTAB          00000000 0245ec 0000b7 00      0   0  1
  [16] .symtab           SYMTAB          00000000 024974 000bf0 10     17 142  4
  [17] .strtab           STRTAB          00000000 025564 0003aa 00      0   0  1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings)
I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)
$

The question is can I somehow convince one of the binutils to give me the load address alone,
so that I don't have to invent an algorithm extracting the address from the section dump?

Thanks,
-Z
P***@Dell.com
2014-03-18 17:12:30 UTC
Post by Radouch, Zdenek
> I am writing a firmware updater that takes an ELF executable and needs to extract the RAM data
> and the address to where the data should be loaded. ...
> The question is can I somehow convince one of the binutils to give me the load address alone,
> so that I don't have to invent an algorithm extracting the address from the section dump?

I’m not sure the notion of “THE load address” makes sense. It may be valid for your specific case, but not in general. ELF files can have multiple load sections, each of which has a load address.

Normally I would say: look in the program headers. Each header of type LOAD describes something that’s loaded, and it shows the addresses.

paul
Radouch, Zdenek
2014-03-18 17:44:11 UTC
But as I (and objcopy) have illustrated:
1. Not all LOAD types get loaded
2. The address where the segment is loaded can be "wrong", when the loaded segment
has been padded.

I agree about the multiple segments, but I can't make it work even for one.
The padding clearly comes from the non-embedded world where throwing away 32k of memory
may be OK; I don't have that luxury.

The objcopy documentation says:
"When objcopy generates a raw binary file, it will essentially produce a memory dump
of the contents of the input object file." A memory dump is what I want; to me, "undumping"
the dump is synonymous with loading the right data at the right place in memory.
It is my understanding that the LOAD segment itself can contain an ELF header so it
can be loaded without additional metadata. I have no idea how (or whether) I can make
readelf give me the actual data to be loaded.
That's where I see a big difference between objcopy and readelf.

-Z


Alan Modra
2014-03-18 23:44:19 UTC
Post by Radouch, Zdenek
> 1. Not all LOAD types get loaded
Correct, only those with non-zero p_filesz (the FileSiz column of
readelf -l). Any p_memsz beyond p_filesz specifies a bss-type area that
is usually cleared to zero by a program loader.
Post by Radouch, Zdenek
> 2. The address where the segment is loaded can be "wrong", when the loaded segment
> has been padded.
The address you showed isn't due to padding. You're seeing 0x158000
when .text starts at 0x15f000 because you linked the object for
dynamic paging with a page size of 0x8000. That imposes constraints
on p_vaddr. You will also be loading the ELF file header and program
headers, which may not be what you want. See the ld -n and ld -N
options.
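
The relationship between the segment address and the .text address can be checked with a little arithmetic. The following is only a sketch in Python, assuming the 32-bit `readelf -l` row layout shown earlier in this thread; the function names `file_backed_segments` and `address_of` are made up for illustration. It keeps only LOAD rows with a non-zero FileSiz and maps a section's file offset to its address inside the segment.

```python
# Program-header rows copied from the `readelf -l b2.axf` output in this thread.
READELF_L = """\
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x000000 0x00158000 0x00158000 0x07bd8 0x07bd8 RWE 0x8000
  LOAD           0x00f000 0x2001f000 0x2001f000 0x00000 0x00804 RW  0x8000
"""


def file_backed_segments(readelf_l_output):
    """Return the LOAD rows whose FileSiz (p_filesz) is non-zero."""
    segments = []
    for line in readelf_l_output.splitlines():
        fields = line.split()
        if len(fields) >= 6 and fields[0] == "LOAD":
            # int(..., 16) accepts the leading "0x" that readelf prints.
            offset, vaddr, paddr, filesz, memsz = (int(f, 16) for f in fields[1:6])
            if filesz:
                segments.append({"offset": offset, "vaddr": vaddr, "paddr": paddr,
                                 "filesz": filesz, "memsz": memsz})
    return segments


def address_of(segment, file_offset):
    """Map a file offset inside a segment to its load (PhysAddr-based) address."""
    delta = file_offset - segment["offset"]
    if not 0 <= delta < segment["filesz"]:
        raise ValueError("offset is not backed by this segment")
    return segment["paddr"] + delta


segments = file_backed_segments(READELF_L)
# 0x007000 is the Off column of .text in the `readelf -S` output.
text_addr = address_of(segments[0], 0x007000)
print(len(segments), hex(text_addr))  # prints: 1 0x15f000
```

With the numbers from this file, the first 0x7000 bytes of the segment are the ELF header, the program headers and filler, and 0x7000 + 0xbd4 (.text) + 0x4 (.data) gives exactly the FileSiz of 0x7bd8.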
--
Alan Modra
Australia Development Lab, IBM
Radouch, Zdenek
2014-03-19 12:04:18 UTC
I'll re-phrase my question to make it clear what I am looking for.

First some background -- while I have never looked at any of the BFD code,
I do have a pretty good idea about both the linking process and ARM/ELF
issues, including the 0x8000 page size (OK, so I called it "alignment-related
padding" and now I've been corrected -- it's not "padding" :-)).

What I am after is improving the back end of the embedded development process, the
firmware loading at manufacturing or (as in my case) at run time.
The commonly used paradigm today (I have seen it on numerous ARM
projects, and I always set it up the same way, too) is the following:

1. Hard-code the memory load address in the linker script
2. Build and link (often with a black-box environment - forget ld -n -N)
or, as in my case, don't link; get an already linked file from someone
3. Run objcopy -O binary to extract the memory image
4. Load the memory image using the hard-coded address from the step #1

That's the reality of embedded life. Asking people to change how they link,
when they can see that what they produce works fine, won't fly. They wouldn't
do it even if they were willing, since linking and linker scripts in the embedded
space are typically not understood even by the people who set up the projects.

I am simply questioning whether or not I could, with some moderate effort (i.e., shell/python)
fix the very fragile last step (4) that requires the load address to be "carried along" with
the object file. I do know that all of the necessary info is in the ELF file, objcopy knows
that, too, and successfully extracts the data, the only problem is that objcopy is silent
about what it did. So my question is:

Can I get the load address (i.e., the address corresponding to the first section
that went into the image objcopy made)?
And yes it is THE address, as objcopy produces a single image.

If the answer is no, not with binutils as they are today, then my second question is what is the
algorithm objcopy uses? I can duplicate it on the output of "readelf -S".
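
One candidate algorithm over the `readelf -S` text, sketched in Python. This is a guess at what `objcopy -O binary` does, not something confirmed in this thread: it assumes the image starts at the lowest address among sections that are allocated (flag A), are not NOBITS, and have a non-zero size. It also assumes the one-line 32-bit row layout shown above, and that the Addr column equals the load address (true for b2.axf, where VirtAddr and PhysAddr match). The name `image_extent` is made up for illustration.

```python
import re

# Rows copied from the `readelf -S b2.axf` dump in this thread (abridged).
READELF_S = """\
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        0015f000 007000 000bd4 00  AX  0   0  4
  [ 2] .data             PROGBITS        0015fbd4 007bd4 000004 00  WA  0   0  4
  [ 3] .bss              NOBITS          2001f000 00f000 000004 00  WA  0   0  4
  [ 4] .main_stack       NOBITS          2001f004 00f000 000800 00  WA  0   0  1
  [ 5] .debug_info       PROGBITS        00000000 007bd8 00e4e3 00      0   0  1
"""

# [Nr] Name Type Addr Off Size ES [Flg] Lk Inf Al; the Flg column may be empty.
# The nameless NULL row has too few fields to match and is skipped.
ROW = re.compile(
    r"^\s*\[\s*\d+\]\s+(\S+)\s+(\S+)\s+([0-9a-f]+)\s+[0-9a-f]+\s+([0-9a-f]+)\s+"
    r"[0-9a-f]+\s+([A-Za-z]*)\s*\d+\s+\d+\s+\d+\s*$"
)


def image_extent(readelf_s_output):
    """Return (load_address, image_size) under the assumptions stated above."""
    spans = []
    for line in readelf_s_output.splitlines():
        m = ROW.match(line)
        if not m:
            continue
        sh_type, flags = m.group(2), m.group(5)
        addr, size = int(m.group(3), 16), int(m.group(4), 16)
        # Keep sections that occupy memory and have bytes in the file.
        if "A" in flags and sh_type != "NOBITS" and size > 0:
            spans.append((addr, addr + size))
    if not spans:
        raise ValueError("no allocated sections with file contents found")
    start = min(s for s, _ in spans)
    end = max(e for _, e in spans)
    return start, end - start


load_address, image_size = image_extent(READELF_S)
print(hex(load_address), image_size)  # prints: 0x15f000 3032
```

The computed size can be compared with `wc -c` on the objcopy output (3032 bytes here) as a sanity check; if the two disagree, the assumptions above do not hold for that file.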

Thanks
-Z


Erik Christiansen
2014-03-19 13:24:04 UTC
Post by Radouch, Zdenek
> 1. Hard-code the memory load address in the linker script
> 2. Build and link (often with a black-box environment - forget ld -n -N)
> or as in my case don't link, get an already linked file from someone
> 3. Run objcopy -O binary to extract the memory image
> 4. Load the memory image using the hard-coded address from the step #1
...
Post by Radouch, Zdenek
> I am simply questioning whether or not I could, with some moderate
> effort (i.e., shell/python) fix the very fragile last step (4) that
> requires the load address to be "carried along" with the object file.
> I do know that all of the necessary info is in the ELF file, objcopy
> knows that, too, and successfully extracts the data, the only problem

Since you can run a script at the programming site, is there any reason
not to just send the elf file, so the script can extract address(es),
(E.g. from "objdump -h xxx.elf" and a line or two of awk, or
nm xxx.elf | grep <Your_start_address_symbol>),
as well as run the "objcopy -O binary"?
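
A sketch of the `objdump -h` route in Python rather than awk. The sample text is not output from this thread; it is a reconstruction of how `objdump -h` would typically render the sections listed by `readelf -S` above, so treat the exact spacing as an assumption. `objdump -h` prints both VMA and LMA plus a second line of flags per section, and it omits the NULL entry that `readelf -S` shows as [0], which is why the two tools number .text differently. The function name `lowest_lma` is made up for illustration, and picking the lowest LMA among LOAD sections with CONTENTS is an assumed rule, not a documented one.

```python
import re

# Reconstructed `objdump -h` style listing (assumed layout, values from the thread).
OBJDUMP_H = """\
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         00000bd4  0015f000  0015f000  00007000  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .data         00000004  0015fbd4  0015fbd4  00007bd4  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  2 .bss          00000004  2001f000  2001f000  0000f000  2**2
                  ALLOC
  3 .debug_info   0000e4e3  00000000  00000000  00007bd8  2**0
                  CONTENTS, READONLY, DEBUGGING
"""

# Idx Name Size VMA LMA File-off Algn
SECTION = re.compile(
    r"^\s*\d+\s+(\S+)\s+([0-9a-f]+)\s+([0-9a-f]+)\s+([0-9a-f]+)\s+[0-9a-f]+\s+2\*\*\d+\s*$"
)


def lowest_lma(objdump_h_output):
    """Return (lma, name) of the lowest-placed section flagged LOAD and CONTENTS."""
    lines = objdump_h_output.splitlines()
    candidates = []
    for i, line in enumerate(lines):
        m = SECTION.match(line)
        if not m or i + 1 >= len(lines):
            continue
        name, size, lma = m.group(1), int(m.group(2), 16), int(m.group(4), 16)
        # The flags for a section are on the line that follows it.
        flags = {flag.strip() for flag in lines[i + 1].split(",")}
        if {"LOAD", "CONTENTS"} <= flags and size > 0:
            candidates.append((lma, name))
    if not candidates:
        raise ValueError("no loadable sections with contents found")
    return min(candidates)


print(lowest_lma(OBJDUMP_H))
```

Using the LMA column rather than VMA matters for images whose initialised data is stored at one address and copied to another at startup; for b2.axf the two columns are the same.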

For myself, I've always just sent intel or motorola hex files for
loading to the platform - then the load address is also included,
though not so easily human-readable. (But the programming tools handle
it without human intervention, since that's what these formats are for.)

Erik
--
"No one, however smart, however well-educated, however experienced, is the
suppository of all wisdom." - Tony Abbott, then Australian opposition leader, now PM.
Radouch, Zdenek
2014-03-20 18:11:28 UTC
Post by Erik Christiansen
> Since you can run a script at the programming site, is there any reason
> not to just send the elf file, so the script can extract address(es),
That *is* the idea, i.e., sending a single file.
Post by Erik Christiansen
> (E.g. from "objdump -h xxx.elf" and a line or two of awk,
Well, if you re-read my original post, I said that I simply do not
know how to do it. I am an end user; I don't speak binutils internals.
That is, I don't know how much variation I may face in the sections.
I have no idea why "objdump -h" gives me .text as the first section, and
"readelf -S" on the same file gives it to me as the second one.
I do not know if .text will always be the first chunk in the segment.
That's why my preferred solution would be to rely on one of the binutils
to figure it out for me.

My second solution is to extract the info from a binutil output,
as you suggested. For that, I need an algorithm for finding the section
that will be placed at the beginning of the first segment.
I was willing to throw ten or twenty lines of python at it, if you can
do it with awk, then could I please have the one English sentence that
describes what needs to be done to get the load address?
Post by Erik Christiansen
> or nm xxx.elf | grep <Your_start_address_symbol>),
Can't use nm as there may not be a symbol for the load address
Post by Erik Christiansen
> For myself, I've always just sent intel or motorola hex files for
> loading to the platform - then the load address is also included,
> though not so easily human-readable. (But the programming tools handle
> it without human intervention, since that's what these formats are for.)
> Erik
You just gave me a great idea for an alternate solution (i.e., if there is
no straight way to parse the section headers):

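objcopy -O binary foo.elf foo.data
objcopy -O srec foo.elf foo.srec
s19toladdr <foo.srec >foo.addr

(objcopy needs an explicit output file here; without one it rewrites foo.elf in place.)
All I have to do in s19toladdr is to parse the first data record (the line
after the S0 header) to extract the load address.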
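
A possible shape for the hypothetical s19toladdr helper, sketched in Python. It assumes standard Motorola S-record framing: S1, S2 and S3 are data records with 16-, 24- and 32-bit addresses, and other record types (the S0 header, counts, terminators) carry no image data. The sample records are made up for illustration, with data bytes that do not come from b2.axf. Rather than trusting record order, the sketch takes the lowest address over all data records.

```python
# Number of hex digits in the address field of each data record type.
ADDR_HEX_DIGITS = {"S1": 4, "S2": 6, "S3": 8}

# Made-up sample: S0 header, two S2 data records, S8 terminator.
SAMPLE_SREC = """\
S00B0000666F6F2E73726563D5
S20815F00000F80120D9
S20815F00401F01500E8
S80415F001F5
"""


def srec_load_address(lines):
    """Return the lowest address found in the S1/S2/S3 data records."""
    lowest = None
    for raw in lines:
        line = raw.strip()
        kind = line[:2]
        if kind not in ADDR_HEX_DIGITS:
            continue  # header, count, or termination record
        body = bytes.fromhex(line[2:])
        # body[0] counts the bytes after it; all bytes must sum to 0xFF mod 256.
        if body[0] != len(body) - 1 or sum(body) & 0xFF != 0xFF:
            raise ValueError("corrupt S-record: " + line)
        addr = int(line[4:4 + ADDR_HEX_DIGITS[kind]], 16)
        if lowest is None or addr < lowest:
            lowest = addr
    if lowest is None:
        raise ValueError("no data records found")
    return lowest


print(hex(srec_load_address(SAMPLE_SREC.splitlines())))  # prints: 0x15f000
```

Checking the byte count and checksum costs two lines and catches a truncated or hand-edited file before anything is written to the target.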

Thanks!

-Z
