source link: https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553212.html
PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]


Qing Zhao [email protected]
Thu Sep 3 14:29:54 GMT 2020


Hi,

Per request, I collected runtime performance data and code size data with CPU2017 on an x86 platform.

*** Machine info:
model name      : Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
$ lscpu | grep NUMA
NUMA node(s):          2
NUMA node0 CPU(s):     0-21,44-65
NUMA node1 CPU(s):     22-43,66-87

***CPU2017 benchmarks:
All the C/C++ benchmarks: 9 integer benchmarks and 10 FP benchmarks.

***Configuration:
intrate and fprate, 22 copies.

***Compiler options:
no:            -g -O2 -march=native
used_gpr_arg:  no + -fzero-call-used-regs=used-gpr-arg
used_arg:      no + -fzero-call-used-regs=used-arg
all_arg:       no + -fzero-call-used-regs=all-arg
used_gpr:      no + -fzero-call-used-regs=used-gpr
all_gpr:       no + -fzero-call-used-regs=all-gpr
used:          no + -fzero-call-used-regs=used
all:           no + -fzero-call-used-regs=all
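
To make concrete what these options act on, here is a minimal sketch in C. It uses the per-function attribute form, zero_call_used_regs, as found in released GCC (11 and later); that this matches the patch under discussion exactly is an assumption. The function check_key and the macro ZERO_USED_GPR_ARG are hypothetical names made up for illustration.

```c
#include <stddef.h>

/* Use the attribute only where the compiler knows it; otherwise expand to
   nothing so the code still builds (assumption: GCC 11+ provides it). */
#if defined(__has_attribute)
#  if __has_attribute(zero_call_used_regs)
#    define ZERO_USED_GPR_ARG \
        __attribute__((zero_call_used_regs("used-gpr-arg")))
#  endif
#endif
#ifndef ZERO_USED_GPR_ARG
#  define ZERO_USED_GPR_ARG
#endif

/* Hypothetical example: compares a candidate against a key.
   With "used-gpr-arg", the general-purpose argument-passing registers
   that this function used are zeroed before it returns. */
ZERO_USED_GPR_ARG
int check_key(const unsigned char *key, const unsigned char *candidate,
              size_t len)
{
    unsigned char diff = 0;
    for (size_t i = 0; i < len; i++)
        diff |= (unsigned char)(key[i] ^ candidate[i]);
    return diff == 0;
}
```

The same effect can be requested for a whole translation unit with the command-line option, as in the configurations above. Comparing the assembly from "gcc -O2 -S" with and without the attribute should show the extra register-clearing instructions before the return.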

***Each benchmark was run 3 times.

***runtime performance data:
Please see the attached CSV file.


From the data, we can see that:
On average, all the options starting with “used_…” (i.e., only the registers that are used in the routine are zeroed) have very low runtime overhead: at most 1.72% for integer benchmarks and 1.17% for FP benchmarks.
When all the registers are zeroed, the runtime overhead is larger: on average for the integer benchmarks, all_arg is 5.7%, all_gpr is 3.5%, and all is 17.56%.
It looks like the overhead of zeroing vector registers is much bigger.
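
For readers who want to reproduce this kind of figure from their own runs, the arithmetic can be sketched as below. This is a minimal sketch under stated assumptions: it combines the three runs of a benchmark by taking their median (the post does not say how the runs were combined), and the helper names median3 and overhead_percent are hypothetical.

```c
/* Median of three run times. Assumption: the three runs of a benchmark
   are combined by their median; the post does not state the method. */
double median3(double a, double b, double c)
{
    if ((a <= b && b <= c) || (c <= b && b <= a))
        return b;
    if ((b <= a && a <= c) || (c <= a && a <= b))
        return a;
    return c;
}

/* Runtime overhead of an option, as a percentage of the baseline
   ("no") run time. Positive means the option made the run slower.
   baseline_seconds must be greater than zero. */
double overhead_percent(double baseline_seconds, double option_seconds)
{
    return (option_seconds - baseline_seconds) / baseline_seconds * 100.0;
}
```

Usage would be to call median3 on the three times for the "no" configuration and on the three times for, say, "used_gpr_arg", then pass the two medians to overhead_percent.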

For ROP mitigation, -fzero-call-used-regs=used-gpr-arg should be enough, and its runtime overhead is very small.

***code size increase data:

Please see the attached file.


From the data, we can see that:
The code size impact in general is very small; the biggest is “all_arg”, which is 1.06% for integer benchmarks and 1.13% for FP benchmarks.

So, from the data collected, I think that the run-time overhead and code size increase from this option are very reasonable. 

Let me know your comments and opinions.

Thanks.

Qing

> On Aug 25, 2020, at 4:54 PM, Qing Zhao via Gcc-patches <[email protected]> wrote:
>>>>> On Aug 24, 2020, at 3:20 PM, Segher Boessenkool <[email protected]> wrote:
>>>> Hi!
>>>> On Mon, Aug 24, 2020 at 01:02:03PM -0500, Qing Zhao wrote:
>>>> On Aug 24, 2020, at 12:49 PM, Segher Boessenkool <[email protected]> wrote:
>>>> On Wed, Aug 19, 2020 at 06:27:45PM -0500, Qing Zhao wrote:
>>>>>> On Aug 19, 2020, at 5:57 PM, Segher Boessenkool <[email protected]> wrote:
>>>>>> Numbers on how expensive this is (for what arch, in code size and in
>>>>>> execution time) would be useful.  If it is so expensive that no one will
>>>>>> use it, it helps security at most none at all :-(
>>>> Without numbers on this, no one can determine if it is a good tradeoff
>>>> for them.  And we (the GCC people) cannot know if it will be useful for
>>>> enough users that it will be worth the effort for us.  Which is why I
>>>> keep hammering on this point.
>>> I can collect some run-time overhead data on this. Do you have a recommendation on what test suite I can use
>>> for this testing? (Is CPU2017 good enough?)
>>>> I would use something more real-life, not 12 small pieces of code.
>> There is some basic information about the CPU2017 benchmarks at the link below:
>> https://www.spec.org/cpu2017/Docs/overview.html#suites
>> GCC itself is one of the benchmarks in CPU2017 (502.gcc_r). And 526.blender_r is even larger than 502.gcc_r. 
> And there are several other quite big benchmarks as well (perlbench, xalancbmk, parest, imagick, etc).
>> thanks.
>> Qing




