
[2301.13779] FLAME: A small language model for spreadsheet formulas

source link: https://arxiv.org/abs/2301.13779

[Submitted on 31 Jan 2023]

The widespread use of spreadsheet environments by billions of users presents a unique opportunity for formula-authoring assistance. Although large language models, such as Codex, can assist in general-purpose languages, they are expensive to train and challenging to deploy due to their large model sizes (up to billions of parameters). Moreover, they require hundreds of gigabytes of training data. We present FLAME, a T5-based model trained on Excel formulas that leverages domain insights to achieve competitive performance while being substantially smaller (60M parameters) and trained on two orders of magnitude less data. We curate a training dataset using sketch deduplication, introduce an Excel-specific formula tokenizer, and use domain-specific versions of masked span prediction and noisy auto-encoding as pretraining objectives. We evaluate FLAME on formula repair, formula auto-completion, and a novel task called syntax reconstruction. FLAME (60M) outperforms much larger models, such as Codex-Davinci (175B), Codex-Cushman (12B), and CodeT5 (220M), in 6 out of 10 settings.
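As a rough illustration of what sketch deduplication could look like, the Python sketch below canonicalizes each formula by masking string literals, cell references, and numbers with placeholder tokens, then keeps one representative per distinct sketch. The regexes, placeholder names, and masking order are assumptions for illustration, not FLAME's actual curation pipeline.

```python
import re

# Hypothetical canonicalization rules: mask strings first (so their digits
# are not mistaken for numbers), then cell references, then bare numbers.
STRING = re.compile(r'"[^"]*"')
CELL_REF = re.compile(r"\$?[A-Z]{1,3}\$?\d+(:\$?[A-Z]{1,3}\$?\d+)?")
NUMBER = re.compile(r"\b\d+(\.\d+)?\b")

def sketch(formula: str) -> str:
    """Map an Excel formula to its sketch by replacing literals and
    cell references with placeholder tokens."""
    s = STRING.sub("<str>", formula)
    s = CELL_REF.sub("<ref>", s)
    s = NUMBER.sub("<num>", s)
    return s

def dedupe_by_sketch(formulas):
    """Keep the first formula seen for each distinct sketch."""
    seen, kept = set(), []
    for f in formulas:
        k = sketch(f)
        if k not in seen:
            seen.add(k)
            kept.append(f)
    return kept

if __name__ == "__main__":
    corpus = ['=SUM(A1:A10)', '=SUM(B2:B20)', '=IF(C1>0,"yes","no")']
    # Both SUM formulas share the sketch '=SUM(<ref>)', so only one is kept.
    print(dedupe_by_sketch(corpus))
```

Deduplicating at the sketch level rather than on raw formula text would remove near-duplicates that differ only in constants or referenced cells, which helps explain how a small training corpus can still cover diverse formula structures.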

Subjects: Programming Languages (cs.PL); Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
Cite as: arXiv:2301.13779 [cs.PL]
  (or arXiv:2301.13779v1 [cs.PL] for this version)
  https://doi.org/10.48550/arXiv.2301.13779

