Discussion:
[Patch] multibyte encodings in strings
m***@exitno.de
2018-10-30 11:04:09 UTC
Permalink
Dear binutils team,

I am sending you a patch fixing an issue in the binary 'strings'.
The issue concerned finding multibyte encoded strings at odd offsets.

To test this issue try the following line

echo "aa53007400720069006e0067003100aa53007400720069006e0067003200bb" | xxd -p -r | strings -el

Only the second string will be found.

The attached patch fixes this problem by doing only a single byte step if an
invalid character has been found.

Do not hesitate to contact me in case of questions or feedback.

Happy to contribute,

Mathias
Nick Clifton
2018-11-09 12:49:28 UTC
Permalink
Hi Mathias,
Post by m***@exitno.de
I am sending you a patch fixing an issue in the binary 'strings'.
The issue concerned finding multibyte encoded strings at odd offsets.
Thanks for the bug report and the patch.

I have decided to consider the patch as "obvious", in the sense that
it does not need an FSF copyright assignment, although I would still
encourage you to apply for such an assignment should you want to
contribute further patches.

I have applied the patch along with a couple of extra updates.
Specifically I have created a ChangeLog entry for the patch, and a
new test in the binutils testsuite in order to make sure that the
patch continues to work.

There is one issue however that I would like to check with you.
The original (unpatched) decoding of the test binary that you
supplied produces the output: "String2" whereas the patched
version of string produces: "String1" and "tring2". Is this
correct ? I was kind of expecting the output to be "String1"
and "String2".

Cheers
Nick

binutils/ChangeLog
2018-11-09 Mathias <***@exitno.de>

* strings.c (print_strings): Check for multibyte encodings.
* binutils-all/strings-1.bin: New file. Test binary for string decoding.
* testsuite/binutils-all/strings.exp: New file. Test the strings program.
* testsuite/config/default.exp (STRINGS): Define if not provided
by the environment.
(STRINGSFLAGS): Likewise.
m***@exitno.de
2018-11-13 10:38:55 UTC
Permalink
Hi Nick,
Post by Nick Clifton
I have applied the patch along with a couple of extra updates.
Specifically I have created a ChangeLog entry for the patch, and a
new test in the binutils testsuite in order to make sure that the
patch continues to work.
Thanks.
Post by Nick Clifton
There is one issue however that I would like to check with you.
The original (unpatched) decoding of the test binary that you
supplied produces the output: "String2" whereas the patched
version of string produces: "String1" and "tring2". Is this
correct ? I was kind of expecting the output to be "String1"
and "String2".
Yes the output should be "String1" and "String2". As I am getting the expected
output on all my systems it is difficult for me to guess what is going wrong.
It looks as if your system accepts "aa53" as a printable utf-16-le char.
Could you please run

echo "aa53007400720069006e0067003100aa53007400720069006e0067003200bb" | xxd -p -r | ./strings -el | xxd

to check if an additional character gets printed after "String1" and run

echo "aa53007400720069006e00670031000053007400720069006e0067003200bb" | xxd -p -r | ./strings -el
echo "aa53007400720069006e0067003100aabbcc53007400720069006e0067003200bb" | xxd -p -r | ./strings -el

to check if these yield the expected results?

Kind regards
Mathias
Nick Clifton
2018-11-13 15:02:22 UTC
Permalink
Hi Mathias,
Post by m***@exitno.de
Post by Nick Clifton
correct ? I was kind of expecting the output to be "String1"
and "String2".
Yes the output should be "String1" and "String2". As I am getting the expected
output on all my systems it is difficult for me to guess what is going wrong.
Ah, so if "strings -el" is working for you, then at least I can be sure that
I applied the patch correctly.
Post by m***@exitno.de
It looks as if your system accepts "aa53" as a printable utf-16-le char.
Could you please run
echo "aa53007400720069006e0067003100aa53007400720069006e0067003200bb" | xxd -p -r | ./strings -el | xxd
to check if an additional character gets printed after "String1"
If by "additional" you mean a period character then yes:

00000000: 5374 7269 6e67 310a 5374 7269 6e67 320a String1.String2.
Post by m***@exitno.de
and run
echo "aa53007400720069006e00670031000053007400720069006e0067003200bb" | xxd -p -r | ./strings -el
Gives:

String1
String2
Post by m***@exitno.de
echo "aa53007400720069006e0067003100aabbcc53007400720069006e0067003200bb" | xxd -p -r | ./strings -el
Gives the same:

String1
String2
Post by m***@exitno.de
to check if these yield the expected results?
Which I assume is not right. Presumably this is due to my environment. In particular
I have LC_CTYPE=C which I think affects character display, right ?

Cheers
Nick
m***@exitno.de
2018-11-13 19:45:46 UTC
Permalink
Hi Nick,
Post by Nick Clifton
Post by m***@exitno.de
Could you please run
echo "aa53007400720069006e0067003100aa53007400720069006e0067003200bb" | xxd -p -r | ./strings -el | xxd
to check if an additional character gets printed after "String1"
00000000: 5374 7269 6e67 310a 5374 7269 6e67 320a String1.String2.
As the period is just a placeholder for the newline (0x0a) the output would be

String1
String2

Which is correct.
Post by Nick Clifton
Post by m***@exitno.de
echo "aa53007400720069006e00670031000053007400720069006e0067003200bb" | xxd -p -r | ./strings -el
String1
String2
Post by m***@exitno.de
echo "aa53007400720069006e0067003100aabbcc53007400720069006e0067003200bb" | xxd -p -r | ./strings -el
String1
String2
All commands should give

String1
String2

So to me it looks like everything works just fine. If you can reproduce the
conditions where the output is "String1" and "tring2" please let me know.
Otherwise it should be ok to change the testcase to expect "String1" and
"String2"

Thanks a lot
Mathias

Loading...