[2306.11987] Training Transformers with 4-bit Integers
source link: https://arxiv.org/abs/2306.11987
Computer Science > Machine Learning
[Submitted on 21 Jun 2023 (v1), last revised 22 Jun 2023 (this version, v2)]
Training Transformers with 4-bit Integers
Quantizing activations, weights, and gradients to 4 bits is promising for accelerating neural network training. However, existing 4-bit training methods require custom numerical formats which are not supported by contemporary hardware. In this work, we propose a training method for transformers with all matrix multiplications implemented with INT4 arithmetic. Training at ultra-low INT4 precision is challenging. To achieve this, we carefully analyze the specific structures of activations and gradients in transformers and propose dedicated quantizers for them. For forward propagation, we identify the challenge of outliers and propose a Hadamard quantizer to suppress them. For backpropagation, we leverage the structural sparsity of gradients by proposing bit splitting and leverage score sampling techniques to quantize gradients accurately. Our algorithm achieves competitive accuracy on a wide range of tasks including natural language understanding, machine translation, and image classification. Unlike previous 4-bit training methods, our algorithm can be implemented on the current generation of GPUs. Our prototypical linear operator implementation is up to 2.2 times faster than its FP16 counterpart and speeds up training by up to 35.1%.
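The Hadamard quantizer is only described at a high level in the abstract. The sketch below illustrates the general idea in NumPy: rotate blocks of the feature dimension with a normalized Hadamard matrix so that outliers are spread across many coordinates, then quantize symmetrically to 4-bit integers. The function names, the block size of 16, and the per-tensor scale are illustrative assumptions, not taken from the paper; a real implementation would run inside fused GPU kernels.

```python
# Minimal sketch of a block-wise Hadamard quantizer for 4-bit activations.
# Assumptions (not from the paper): block size 16, symmetric per-tensor scale,
# INT4 values stored in an int8 container since NumPy has no 4-bit dtype.
import numpy as np

def hadamard(k: int) -> np.ndarray:
    """Normalized 2^k x 2^k Hadamard matrix (Sylvester construction)."""
    H = np.array([[1.0]])
    for _ in range(k):
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(H.shape[0])

def hadamard_quantize_int4(x: np.ndarray, block: int = 16):
    """Rotate each feature block with a Hadamard matrix to spread outliers,
    then quantize symmetrically to 4-bit integers in [-8, 7]."""
    n, d = x.shape
    assert d % block == 0, "feature dim must be a multiple of the block size"
    H = hadamard(int(np.log2(block)))
    # (n, d/block, block) @ (block, block) -> rotated activations
    xr = (x.reshape(n, d // block, block) @ H).reshape(n, d)
    scale = np.abs(xr).max() / 7.0 + 1e-12  # symmetric per-tensor scale
    q = np.clip(np.round(xr / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float, block: int = 16) -> np.ndarray:
    """Undo the quantization and the (orthogonal) Hadamard rotation."""
    n, d = q.shape
    H = hadamard(int(np.log2(block)))
    xr = q.astype(np.float32) * scale
    return (xr.reshape(n, d // block, block) @ H.T).reshape(n, d)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 64)).astype(np.float32)
    x[0, 3] = 50.0  # inject an outlier entry
    q, s = hadamard_quantize_int4(x)
    err = np.abs(dequantize(q, s) - x).mean()
    print(f"mean absolute quantization error: {err:.4f}")
```

Because the Hadamard matrix is orthogonal, the rotation is lossless in exact arithmetic; its purpose is to make the rotated block's entries more uniform in magnitude so that a single 4-bit scale wastes fewer levels on rare outliers.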
Comments: 9 pages, 8 figures
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Cite as: arXiv:2306.11987 [cs.LG] (or arXiv:2306.11987v2 [cs.LG] for this version)
DOI: https://doi.org/10.48550/arXiv.2306.11987
Submission history
From: Haocheng Xi
[v1] Wed, 21 Jun 2023 02:45:01 UTC (3,822 KB)
[v2] Thu, 22 Jun 2023 20:09:02 UTC (3,822 KB)