[2304.11062] Scaling Transformer to 1M tokens and beyond with RMT

source link: https://arxiv.org/abs/2304.11062

[Submitted on 19 Apr 2023]

Scaling Transformer to 1M tokens and beyond with RMT

This technical report presents the application of a recurrent memory to extend the context length of BERT, one of the most effective Transformer-based models in natural language processing. By leveraging the Recurrent Memory Transformer architecture, we have successfully increased the model's effective context length to an unprecedented two million tokens, while maintaining high memory retrieval accuracy. Our method allows for the storage and processing of both local and global information and enables information flow between segments of the input sequence through the use of recurrence. Our experiments demonstrate the effectiveness of our approach, which holds significant potential to enhance long-term dependency handling in natural language understanding and generation tasks as well as enable large-scale context processing for memory-intensive applications.
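
The information flow between segments described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the real model passes memory through a Transformer, whereas here `toy_step`, `FACT_THRESHOLD`, and `MEMORY_SIZE` are hypothetical stand-ins chosen only to show the control flow. A long input is cut into fixed-size segments, each segment is processed together with the memory carried over from the previous one, and the updated memory is handed to the next segment.

```python
from typing import Callable, List, Sequence, Tuple

Memory = List[int]
Step = Callable[[Memory, List[int]], Tuple[Memory, int]]

# Hypothetical constants for the toy example (not from the paper).
FACT_THRESHOLD = 1000  # token ids >= this value count as "facts" worth remembering
MEMORY_SIZE = 2        # maximum number of items kept in memory


def split_into_segments(tokens: Sequence[int], segment_len: int) -> List[List[int]]:
    """Cut a long token sequence into consecutive segments of at most segment_len."""
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    return [list(tokens[i:i + segment_len]) for i in range(0, len(tokens), segment_len)]


def toy_step(memory: Memory, segment: List[int]) -> Tuple[Memory, int]:
    """Stand-in for one model pass over (memory, segment).

    Assumption: instead of a Transformer, we simply copy "fact" tokens
    into memory and return the segment length as a placeholder output.
    """
    facts = [t for t in segment if t >= FACT_THRESHOLD]
    new_memory = (memory + facts)[-MEMORY_SIZE:]  # keep only the most recent items
    return new_memory, len(segment)


def run_recurrent(
    tokens: Sequence[int],
    segment_len: int,
    step: Step,
    initial_memory: Memory,
) -> Tuple[Memory, List[int]]:
    """Process segments in order, passing the memory from each to the next."""
    memory = list(initial_memory)
    outputs = []
    for segment in split_into_segments(tokens, segment_len):
        memory, out = step(memory, segment)
        outputs.append(out)
    return memory, outputs


if __name__ == "__main__":
    # The fact token 1001 appears in the first segment and is still
    # available in memory after the last segment has been processed.
    final_memory, _ = run_recurrent([1, 2, 1001, 3, 4, 5, 6, 7, 8, 9], 4, toy_step, [])
    print(final_memory)
```

Because each step sees only one segment plus a fixed-size memory, the cost per step in this sketch does not grow with the total input length; only the number of steps does.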

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as: arXiv:2304.11062 [cs.CL]
  (or arXiv:2304.11062v1 [cs.CL] for this version)
  https://doi.org/10.48550/arXiv.2304.11062
