Arm announces first Armv9 cores, including powerhouse Cortex-X2

source link: http://linuxgizmos.com/arm-announces-first-armv9-cores-including-powerhouse-cortex-x2/

May 25, 2021 — by Eric Brown

Arm unveiled its first Armv9 cores: Cortex-X2 (16% faster than -X1), Cortex-A710 (10% faster than -A78), and Cortex-A510 (35% faster than -A55), all with at least twice the AI power of their predecessors.

After announcing the Armv9 architecture in March, Arm has quickly followed up with the first three core IP designs based on it: the flagship, customizable Cortex-X2, as well as the “Big” Cortex-A710 and “Little” Cortex-A510. Arm skipped over the Cortex-A79 and -A56-59 naming schemes, perhaps to emphasize the fresh start with Armv9. Arm also announced a DSU-110 L3 bus follow-on to DynamIQ, which enables 8MB L3 on the Armv9 designs, as well as a high-end Mali-G710 GPU (see farther below).

Armv9 features SVE2 (Scalable Vector Extension 2), the successor to both the short-lived, HPC-oriented SVE and the widely adopted NEON, enabling greater machine learning (ML) support and DSP-like operations. SVE2 offers a broader instruction set than SVE, with scalable SIMD capabilities ranging from 128-bit to 2048-bit. Among other advantages, this enables easier porting of SVE2-enabled code from low-end IoT processors to datacenter chips and back again. Intel has something similar with its Advanced Vector Extensions (AVX) technology, and RISC-V has its V vector extension (the “V” in RV64GCV).
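The portability claim rests on vector-length-agnostic code: a loop asks how many lanes the hardware has and masks off the tail rather than hard-coding a width. The sketch below is a plain-Python simulation of that pattern, not real SVE2 intrinsics. The function name `vla_add` and the `vector_bits` parameter are hypothetical stand-ins for what the hardware would report.

```python
def vla_add(a, b, vector_bits, element_bits=32):
    """Add two equal-length lists chunk by chunk, simulating a
    vector-length-agnostic loop.

    vector_bits is a hypothetical stand-in for the hardware vector length;
    SVE/SVE2 permit multiples of 128 bits from 128 up to 2048.
    """
    if vector_bits % 128 != 0 or not 128 <= vector_bits <= 2048:
        raise ValueError("vector length must be a multiple of 128 in [128, 2048]")
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")

    lanes = vector_bits // element_bits  # elements per simulated vector operation
    out = []
    for start in range(0, len(a), lanes):
        # Predicate: True only for lanes that map to real elements (covers the tail).
        active = [start + i < len(a) for i in range(lanes)]
        for i, lane_on in enumerate(active):
            if lane_on:
                out.append(a[start + i] + b[start + i])
    return out


data_a = list(range(10))
data_b = [10 * x for x in range(10)]
# The same function is run unchanged at three simulated vector widths.
results = {bits: vla_add(data_a, data_b, bits) for bits in (128, 512, 2048)}
```

Only the number of loop iterations changes with the width. The results are identical, which is the property that lets one binary span small and large implementations.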

A typical Armv9 SoC configuration compared to Armv8 (at left)
Source: Arm

Armv9 greatly improves security with a Confidential Compute Architecture, which introduces secured, containerized execution environments called realms. The realms are managed by realm manager code that is 10 percent the size of a hypervisor and that software can test for trust in a way that is impossible with hypervisors. Other new security features include the Memory Tagging Extension (MTE), which can help detect buffer overflows and other potential memory vulnerabilities.
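To make the memory tagging idea concrete, here is a toy Python model, assuming the publicly documented MTE scheme of a 4-bit tag per 16-byte granule, with a matching tag carried in the pointer. The `TaggedHeap` class and its methods are hypothetical illustrations. Real hardware performs the check on every load and store and typically picks tags at random rather than with the deterministic rule used here.

```python
GRANULE = 16  # bytes covered by one tag
TAG_BITS = 4  # tags are 4 bits wide, so values 0..15


class TaggedHeap:
    """Toy model of tag-checked memory (hypothetical, not a real allocator)."""

    def __init__(self, size):
        self.mem = bytearray(size)
        self.tags = [0] * (size // GRANULE)  # one tag per granule
        self.next_free = 0

    def alloc(self, nbytes):
        granules = -(-nbytes // GRANULE)  # round up to whole granules
        base = self.next_free
        if base + granules * GRANULE > len(self.mem):
            raise MemoryError("toy heap exhausted")
        first = base // GRANULE
        prev = self.tags[first - 1] if first > 0 else 0
        # Deterministic stand-in for random tagging: a nonzero tag that
        # always differs from the tag of the preceding granule.
        tag = (prev % ((1 << TAG_BITS) - 1)) + 1
        for g in range(first, first + granules):
            self.tags[g] = tag
        self.next_free = base + granules * GRANULE
        return (tag, base)  # a "pointer" is modelled as (tag, address)

    def store(self, ptr, offset, value):
        tag, base = ptr
        addr = base + offset
        if not 0 <= addr < len(self.mem):
            raise MemoryError("address outside toy heap")
        if self.tags[addr // GRANULE] != tag:
            raise MemoryError(f"tag mismatch at address {addr}")
        self.mem[addr] = value


heap = TaggedHeap(64)
buf_a = heap.alloc(16)
buf_b = heap.alloc(16)
heap.store(buf_a, 15, 0xFF)  # last byte of buf_a: tags match
try:
    heap.store(buf_a, 16, 0xFF)  # one byte past buf_a lands in buf_b's granule
    overflow_caught = False
except MemoryError:
    overflow_caught = True
```

In this model the off-by-one write is rejected because the adjacent granule carries a different tag than the pointer.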

As noted by AnandTech, the Cortex-X2 and Cortex-A510 are the first AArch64-only Cortex-A microarchitectures, which means they cannot execute AArch32 code. Oddly, the Cortex-A710 does support AArch32, at least through 2023, which is intended to offer better legacy support for the Chinese Android app market.

AnandTech also notes that the Cortex-X2 and Cortex-A710 offer more modest generational performance gains than other recent Arm Cortex-A introductions. This is said to reflect the need to focus on the new Armv9 architecture first, with heavier optimization presumably to follow in the next generation.

Cortex-X2

The Cortex-X2 design follows last year’s Cortex-X1, which has appeared on Qualcomm’s 5nm-fabricated Snapdragon 888 and Samsung LSI’s Exynos 2100. The Cortex-X1 offers 22 percent faster integer performance than the Cortex-A78, and Cortex-X2 is at least 16 percent faster overall than Cortex-X1, claims Arm.

Armv9 conceptual diagram
Source: Arm

The Cortex-X line is a departure from the usual Arm Cortex-A balancing act between performance, power, and area considerations. Instead, Arm has focused almost entirely on performance, especially with single-threaded “bursty” workloads. This is one reason why the Snapdragon 888 and Exynos 2100 use the X1 only on a single core clocked to 2.84GHz and 2.9GHz, respectively.

The Cortex-X line also breaks from the norm in that Arm is willing to customize the architecture for leading semiconductor licensees. This is a far cry from the freedom allowed by RISC-V, but a step in the right direction.

A 3.3GHz Cortex-X2 core enables a 30 percent single-threaded performance improvement over the single X1 core on Snapdragon 888 and Exynos 2100 based smartphones, claims Arm. A 3.5GHz X2 offers 40 percent faster single-threaded performance than a quad-core Intel Core i5-1135G7 from Intel’s 11th Gen Tiger Lake family, claims the company.
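Part of that 30 percent comes from clock speed alone, since the X1 cores cited above run at 2.84GHz and 2.9GHz. As a back-of-the-envelope sketch, assuming performance scales linearly with frequency (a simplification that ignores memory and cache effects), the clock contribution can be separated from the per-clock gain. The helper name `implied_per_clock_gain` is hypothetical.

```python
def implied_per_clock_gain(total_gain, new_ghz, old_ghz):
    """Return the per-clock gain left over after removing the clock-speed ratio.

    Assumes performance is proportional to frequency, which is only a rough
    approximation for real workloads.
    """
    freq_ratio = new_ghz / old_ghz
    return (1 + total_gain) / freq_ratio - 1


# Figures from the article: a 30 percent claimed gain at 3.3GHz versus the
# 2.84GHz X1 (Snapdragon 888) and the 2.9GHz X1 (Exynos 2100).
vs_snapdragon = implied_per_clock_gain(0.30, 3.3, 2.84)
vs_exynos = implied_per_clock_gain(0.30, 3.3, 2.9)

print(f"Implied per-clock gain vs Snapdragon 888 X1: {vs_snapdragon:.1%}")
print(f"Implied per-clock gain vs Exynos 2100 X1:    {vs_exynos:.1%}")
```

Under that assumption the per-clock gain works out to roughly 12 and 14 percent, with the rest of the headline figure attributable to the higher clock.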

According to AnandTech, Cortex-X2 provides improved branch prediction accuracy and reduces its pipeline length from 11 cycles to 10 for improved latency. The out-of-order ROB (reorder buffer) increases by 30 percent to 288 entries.

On the back end, the Cortex-X2 increases load-store windows and structure sizes by 33 percent. The X2 also provides all the Armv9 improvements, including SVE2 and the security enhancements.

Cortex-A710

Unlike the Cortex-X2, the Cortex-A710 and Cortex-A510 continue to find a balance between performance and power consumption, as required by the mobile market. The Cortex-A710 is claimed to offer 10 percent higher “uplift performance at the same power envelope” than Cortex-A78, which is used for the mid-range cores on the Snapdragon 888. AnandTech suggests the comparison is unfair since Arm is comparing an -A710 with an 8MB L3 cache design, enabled by the new DSU-110 bus (see farther below), to 4MB L3 cache designs.

Arm also claims up to 30 percent lower power consumption than Cortex-A78. Although this is a major leap, AnandTech points out that it is a long way from catching up with the lean consumption of the Apple M1.

On the front end, Cortex-A710 offers the same branch prediction and dispatch stage improvements as on the -X2. There are also changes to the prefetcher design for greater optimization from the DSU-110.

Cortex-A510

Like Cortex-A55, the Cortex-A510 is intended to run low-power tasks on a multi-core SoC. Its performance improvements mean “workloads can run longer on the ‘Little’ CPUs before switching to the ‘Big’ CPUs,” says Arm. This in turn “boosts the overall efficiency in the CPU cluster.”

Cortex-A510 offers a substantial 35 percent performance improvement over Cortex-A55, although once again Arm is comparing an -A510 with 8MB L3 to an -A55 with 4MB. Power efficiency is claimed to improve by up to 20 percent, and ML uplift is 3x, compared to 2x for the Cortex-X2 and -A710. Arm told AnandTech the Cortex-A510 was roughly equivalent to Cortex-A73, which powers SBCs such as the Odroid-N2 via an Amlogic S922X.

Although Cortex-A510 sticks with the in-order execution flow of the Cortex-A55, it is a “clean sheet” design that breaks with Cortex-A55 in other ways aside from switching to Armv9. For one thing, it’s designed by Arm’s Cambridge, UK team instead of the Austin, Texas operation that has been building recent Cortex-A cores.

Like Cortex-A710, the -A510 borrows branch prediction and prefetching enhancements from the X2. The Cortex-A510’s new 3-wide in-order design features a new merged-core microarchitecture to improve area efficiency. With merged core, manufacturers can design SoCs with complexes of up to two cores each, which share L2 cache and SVE2 pipelines.

DSU-110 and Mali-G710

Thanks to the newly announced DSU-110 L3 DynamIQ Shared Unit, the Cortex-X2 can be deployed in configurations of up to eight cores, all of them X2, in a single cluster. The DSU-110 bus is the follow-on to Arm’s DynamIQ version of its Big.Little multi-core scheme.

DSU-110 scalability
Source: Arm

The DSU-110 enables the Cortex-X2 to support an L3 cache of up to 16MB, although 8MB will be the norm for the first batch of X2-enabled SoCs. The technology offers up to 5x the L3 bandwidth and is scalable across the Cortex-X2, -A710 and -A510. The DSU-110 also improves single-core bandwidth, says AnandTech. Its report takes a deep dive into the DSU-110’s bi-directional dual-ring transport topology, in which each ring has 4x ring-stops, and which supports up to 8x cache slices.

Finally, Arm announced a Mali-G710 GPU, which it claims offers a 20 percent performance improvement for compute-intensive experiences and a 35 percent ML uplift for tasks such as image enhancement for new camera and video modes. There is also a scaled-down Mali-G610 model. With the acquisition of Arm by Nvidia moving forward, one wonders whether Mali GPUs will disappear or become Nvidified.

Further information

We did not see any availability information for processors based on the new Armv9 Cortex-X2, Cortex-A710, and Cortex-A510 IP. More information may be found in Arm’s announcement and product page.

