PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

Thu Sep 3 14:29:54 GMT 2020

Hi,

Per request, I collected runtime performance data and code size data with CPU2017 on an x86 platform.

*** Machine info:
model name      : Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
$ lscpu | grep NUMA
NUMA node(s):          2
NUMA node0 CPU(s):     0-21,44-65
NUMA node1 CPU(s):     22-43,66-87

***CPU2017 benchmarks:
All the C/C++ benchmarks: 9 integer benchmarks and 10 FP benchmarks.

***Configuration:
intrate and fprate, 22 copies.

***Compiler options:
no:            -g -O2 -march=native
used_gpr_arg:  no + -fzero-call-used-regs=used-gpr-arg
used_arg:      no + -fzero-call-used-regs=used-arg
all_arg:       no + -fzero-call-used-regs=all-arg
used_gpr:      no + -fzero-call-used-regs=used-gpr
all_gpr:       no + -fzero-call-used-regs=all-gpr
used:          no + -fzero-call-used-regs=used
all:           no + -fzero-call-used-regs=all

***Each benchmark was run 3 times.

***runtime performance data:
Please see the attached CSV file.

From the data, we can see that:
On average, all the options starting with “used_…” (i.e., only the registers that are used in the routine are zeroed) have very low runtime overhead: at most 1.72% for the integer benchmarks and 1.17% for the FP benchmarks.
When all the registers are zeroed, the runtime overhead is larger: on average for the integer benchmarks, all_arg is 5.7%, all_gpr is 3.5%, and all is 17.56%.
It looks like the overhead of zeroing vector registers is much larger.

For ROP mitigation, -fzero-call-used-regs=used-gpr-arg should be enough, and its runtime overhead is very small.

***code size increase data:

Please see the attached file 

From the data, we can see that:
The code size impact is in general very small; the biggest is “all_arg”, at 1.06% for the integer benchmarks and 1.13% for the FP benchmarks.

So, from the data collected, I think that the run-time overhead and code size increase from this option are very reasonable. 

Let me know your comments and opinions.

thanks.

Qing

> On Aug 25, 2020, at 4:54 PM, Qing Zhao via Gcc-patches <[email protected]> wrote:
>>>>> On Aug 24, 2020, at 3:20 PM, Segher Boessenkool <[email protected]> wrote:
>>>> Hi!
>>>> On Mon, Aug 24, 2020 at 01:02:03PM -0500, Qing Zhao wrote:
>>>> On Aug 24, 2020, at 12:49 PM, Segher Boessenkool <[email protected]> wrote:
>>>> On Wed, Aug 19, 2020 at 06:27:45PM -0500, Qing Zhao wrote:
>>>>>> On Aug 19, 2020, at 5:57 PM, Segher Boessenkool <[email protected]> wrote:
>>>>>> Numbers on how expensive this is (for what arch, in code size and in
>>>>>> execution time) would be useful.  If it is so expensive that no one will
>>>>>> use it, it helps security at most none at all :-(
>>>>>>>> Without numbers on this, no one can determine if it is a good tradeoff
>>>> for them.  And we (the GCC people) cannot know if it will be useful for
>>>> enough users that it will be worth the effort for us.  Which is why I
>>>> keep hammering on this point.
>>> I can collect some run-time overhead data on this. Do you have a recommendation on what test suite I can use
>>> for this testing? (Is CPU2017 good enough?)
>>>> I would use something more real-life, not 12 small pieces of code.
>> There is some basic information about the benchmarks of CPU2017 in below link:
>> https://www.spec.org/cpu2017/Docs/overview.html#suites
>> GCC itself is one of the benchmarks in CPU2017 (502.gcc_r). And 526.blender_r is even larger than 502.gcc_r. 
> And there are several other quite big benchmarks as well (perlbench, xalancbmk, parest, imagick, etc).
>> thanks.
>> Qing
