[2304.11062] Scaling Transformer to 1M tokens and beyond with RMT

source link: https://arxiv.org/abs/2304.11062

[Submitted on 19 Apr 2023]

Scaling Transformer to 1M tokens and beyond with RMT

This technical report presents the application of a recurrent memory to extend the context length of BERT, one of the most effective Transformer-based models in natural language processing. By leveraging the Recurrent Memory Transformer architecture, we have successfully increased the model's effective context length to an unprecedented two million tokens, while maintaining high memory retrieval accuracy. Our method allows for the storage and processing of both local and global information and enables information flow between segments of the input sequence through the use of recurrence. Our experiments demonstrate the effectiveness of our approach, which holds significant potential to enhance long-term dependency handling in natural language understanding and generation tasks as well as enable large-scale context processing for memory-intensive applications.
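The recurrence described in the abstract can be illustrated with a toy sketch: the input is split into fixed-length segments, and a small fixed-size memory is passed from one segment to the next, so each step sees only one segment plus the memory. This is not the report's implementation. The names `split_into_segments`, `toy_segment_model`, and `run_recurrent` are hypothetical, and the hand-written update rule stands in for a Transformer pass with learned memory tokens.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Memory = List[float]


def split_into_segments(tokens: Sequence[int], segment_len: int) -> List[List[int]]:
    """Cut a long token sequence into consecutive fixed-length segments."""
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    return [list(tokens[i:i + segment_len]) for i in range(0, len(tokens), segment_len)]


def toy_segment_model(memory: Memory, segment: List[int]) -> Tuple[Memory, float]:
    """Hypothetical stand-in for a model pass over [memory; segment].

    Assumption: a real recurrent-memory model would learn how to update its
    memory. Here slot 0 tracks the largest token seen so far and slot 1
    counts the tokens processed so far.
    """
    new_memory = [max(memory[0], float(max(segment))), memory[1] + len(segment)]
    output = new_memory[0]  # the "prediction" depends on all earlier segments
    return new_memory, output


def run_recurrent(
    tokens: Sequence[int],
    segment_len: int,
    step: Callable[[Memory, List[int]], Tuple[Memory, float]] = toy_segment_model,
    init_memory: Optional[Memory] = None,
) -> Tuple[Memory, Optional[float]]:
    """Process segments in order, carrying the memory across segment boundaries."""
    memory = list(init_memory) if init_memory is not None else [float("-inf"), 0.0]
    output: Optional[float] = None
    for segment in split_into_segments(tokens, segment_len):
        # Each step sees only the current segment plus the fixed-size memory.
        memory, output = step(memory, segment)
    return memory, output


if __name__ == "__main__":
    final_memory, final_output = run_recurrent([3, 1, 4, 1, 5, 9, 2, 6], segment_len=3)
    print(final_memory, final_output)
```

In this sketch the value 9 appears in the second segment, yet it is still available when the third segment is processed, because it was written into the memory. The per-step cost depends on the segment length and memory size, not on the total sequence length.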

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2304.11062 [cs.CL]
	(or arXiv:2304.11062v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2304.11062
