source link: https://devblogs.microsoft.com/visualstudio/typing-less-coding-more-how-we-delivered-intellicode-whole-line-completions-with-a-transformer-model/?WT_mc_id=DOP-MVP-4025064
Typing Less, Coding More: How we delivered IntelliCode whole line completions with a transformer model
Shengyu
February 16th, 2022
Introduction
Great code completions make you more productive while composing your code. Visual Studio 2022 now automatically completes C# code up to a whole line at a time, using a rich knowledge of your coding context. We have also released the IntelliCode Completions extension in Visual Studio Code (VSCode) to speed up coding in Python/TypeScript/JavaScript. Both Visual Studio and VSCode achieve this using a transformer model trained on a large volume of code data; the research was published at ESEC/FSE 2020. In this post we'll dive deeper into the technical advances we've made to deliver the IntelliCode whole line completions experience.
Example of IntelliCode whole line completions for C# in Visual Studio
Example of IntelliCode whole line completions for Python in Visual Studio Code
Multilingual Transformer Model for Code (GPT-C)
The IntelliCode whole line completion task is modeled to predict a sequence of tokens $m = \{m_i\}$, $i = 1 \dots M$, conditioned on preceding code tokens $\{c_t\}$, $t = 1 \dots T$. We need to estimate the following conditional probability distribution:
$$P(m_1, \dots, m_M \mid c_1, \dots, c_T) = \prod_{i=1}^{M} P(m_i \mid c_1, \dots, c_T, m_1, \dots, m_{i-1})$$
With autoregressive generation, the objective is to maximize the sum of the log-likelihoods:
$$L(m) = \sum_i \log P(m_i \mid c_1, \dots, c_T, m_{i-k}, m_{i-k+1}, \dots, m_{i-1}; \Theta)$$
where $k$ is the length of the predicted code sequence, and the conditional probability $P$ is modeled using a neural network with parameters $\Theta$. The parameters $\Theta$ are learned through stochastic gradient descent optimization.
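As a small numeric illustration of the objective, the sketch below sums per-token log-probabilities for a single predicted line. The probability values and the function name `sequence_log_likelihood` are made up for illustration; in the real system each probability would come from the transformer model.

```python
import math


def sequence_log_likelihood(token_probs):
    """Sum of log P(m_i | context) over the predicted tokens."""
    if any(p <= 0.0 or p > 1.0 for p in token_probs):
        raise ValueError("each probability must be in (0, 1]")
    return sum(math.log(p) for p in token_probs)


# Hypothetical per-token probabilities for a four-token completion.
probs = [0.9, 0.8, 0.95, 0.7]
log_likelihood = sequence_log_likelihood(probs)

# Exponentiating the sum recovers the product of the token probabilities,
# i.e. the probability the model assigns to the whole sequence.
joint_probability = math.exp(log_likelihood)
```

Working in log space turns the product over tokens into a sum, which is numerically safer for long sequences.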
Recurrent Neural Networks (RNNs) and their variant, Long Short-Term Memory (LSTM) networks, formed the basis of many Natural Language Processing (NLP) models. The main limitation of RNNs is their difficulty capturing long-range sequence dependencies. Transformers are a family of neural networks introduced to capture such dependencies through the attention mechanism. They have found numerous applications in NLP, including machine translation, question answering, and document summarization. Inspired by the GPT-2 transformer model developed by OpenAI, we trained a multi-layer transformer model for code generation (GPT-C) on more than half a million public open-source repositories covering multiple programming languages.
During data pre-processing, we parse the source code into a sequence of tokens with a syntactic parser. Instead of learning a representation for each token, we learn representations for sub-tokens generated through Byte Pair Encoding (BPE) tokenization. BPE tokenization is known for mitigating the out-of-vocabulary problem and for substantially reducing the size of the vocabulary.
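To make the sub-token idea concrete, here is a minimal sketch of the classic BPE merge-learning loop on a handful of identifier-like strings. It is a toy, character-level version with a made-up corpus, not the tokenizer used in GPT-C; the function names are hypothetical.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace each adjacent occurrence of `pair` with one merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe_merges(words, num_merges):
    """Learn up to `num_merges` merge rules, most frequent pair first."""
    vocab = Counter(tuple(word) for word in words)  # start from characters
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # every word is already a single symbol
        # Break ties deterministically by comparing the pairs themselves.
        best = max(pair_counts, key=lambda p: (pair_counts[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def encode(word, merges):
    """Split a word into sub-tokens by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Made-up training corpus of identifiers.
corpus = ["getName", "getValue", "setName", "setValue", "getNameList"]
merges = learn_bpe_merges(corpus, num_merges=10)

# An identifier that never appeared in the corpus still encodes into
# known sub-tokens instead of becoming an out-of-vocabulary token.
unseen = encode("setNameList", merges)
```

Because every string can fall back to single characters, an unseen identifier is always representable, which is how BPE sidesteps the out-of-vocabulary problem.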
During model training, we scale the computation using a synchronous data-parallel distributed training algorithm with local gradient accumulation. The training module is implemented by integrating PyTorch and Horovod, with the AdaSum algorithm for gradient summation. The model is trained on ND-series virtual machines provisioned through the Azure Machine Learning platform.
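The control flow of synchronous data-parallel training with local gradient accumulation can be sketched in plain Python. This is a single-process simulation on a one-parameter linear model with made-up data, and it combines worker gradients by simple averaging, not AdaSum; it uses neither PyTorch nor Horovod, and all names are hypothetical.

```python
def grad_mse(w, batch):
    """Gradient of the mean squared error for y_hat = w * x on one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def train_step(w, worker_shards, lr):
    """One synchronous step: accumulate locally, then average across workers."""
    local_grads = []
    for micro_batches in worker_shards:
        accumulated = 0.0
        for batch in micro_batches:
            # Local accumulation: no communication between micro-batches.
            accumulated += grad_mse(w, batch)
        local_grads.append(accumulated / len(micro_batches))
    # Synchronization point: every worker receives the same combined gradient.
    # (Plain averaging here; the production system used AdaSum instead.)
    global_grad = sum(local_grads) / len(local_grads)
    return w - lr * global_grad


# Made-up data following y = 3x, split over 2 workers x 2 micro-batches.
samples = [(x, 3.0 * x) for x in (0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0)]
worker_shards = [
    [samples[0:2], samples[2:4]],
    [samples[4:6], samples[6:8]],
]

w = 0.0
for _ in range(50):
    w = train_step(w, worker_shards, lr=0.05)
```

With equally sized micro-batches, the averaged gradient equals the gradient over the full batch, so accumulation trades communication frequency for a larger effective batch size.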
Besides evaluating our model with the NLP metrics presented in the research paper, we also ran an extensive offline evaluation based on the location, length, and log-likelihood of the completion suggestions. This offline evaluation, together with online metrics collected through internal previews, guided us in setting the right completion-triggering locations and confidence threshold.
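One way to picture a confidence threshold is as a filter on candidate completions before anything is shown. The sketch below scores each candidate by its geometric-mean token probability and suppresses suggestions below a cutoff. The scoring rule, the threshold values, and all names are assumptions for illustration; the post does not describe the exact criterion used in the product.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    token_log_probs: list  # one log-probability per generated token


def confidence(candidate):
    """Geometric-mean token probability (length-normalized likelihood)."""
    mean_log_prob = sum(candidate.token_log_probs) / len(candidate.token_log_probs)
    return math.exp(mean_log_prob)


def select_completion(candidates, threshold):
    """Return the most confident candidate, or None if none clears the bar."""
    scored = [(confidence(c), c) for c in candidates if c.token_log_probs]
    kept = [item for item in scored if item[0] >= threshold]
    if not kept:
        return None  # showing nothing beats showing a low-confidence line
    return max(kept, key=lambda item: item[0])[1]


# Hypothetical candidates with made-up token probabilities.
candidates = [
    Candidate("return count;", [math.log(0.9), math.log(0.8)]),
    Candidate("return null;", [math.log(0.3), math.log(0.4)]),
]
shown = select_completion(candidates, threshold=0.5)
suppressed = select_completion(candidates, threshold=0.9)
```

Normalizing by length keeps longer completions from being penalized simply for having more tokens.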
The IntelliCode whole line completions run efficiently right on your local machine while you're coding. To make this happen, we needed to overcome the technical challenges of deploying the model in Visual Studio and VSCode with limited memory on CPU. Below are the key steps we took to reduce the model size and boost the inference speed:
- By distilling the model from 26 layers to 8 layers, we reduced the model size from ~370 MB to ~200 MB and boosted the inference speed by ~4x.
- By applying model quantization from FP32 to INT8 through the ONNX (Open Neural Network Exchange) Runtime, we further reduced the model size from ~200 MB to ~80 MB.
- By moving the beam search implementation from managed code to the ONNX computing graph, we further boosted the inference speed by ~4x. The beam search optimization work has been contributed back to the ONNX Runtime on GitHub.
- By leveraging Microsoft's open-sourced BlingFire tokenizer, we reduced the time spent on BPE tokenization by ~3x.
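The arithmetic behind the FP32-to-INT8 step in the list above can be sketched as follows. This is a symmetric, per-tensor quantization written in plain Python on made-up weights; it is an illustration only, and the scheme that ONNX Runtime actually applies may differ (for example, in its use of zero-points or per-channel scales).

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]


# Made-up weights standing in for one tensor of the model.
weights = [0.12, -0.5, 0.33, 0.0, 0.49]
q_weights, scale = quantize_int8(weights)
restored = dequantize(q_weights, scale)

# Rounding to the nearest step bounds the error by half a step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Each quantized weight is stored in one byte instead of four, at the cost of a rounding error of at most half a quantization step per value.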
Through the optimizations above, we successfully shipped the GPT-C transformer model running locally in both Visual Studio and VSCode, thanks to our collaborators across Microsoft: Microsoft Research Asia, Azure AI Platform and Turing team.
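For readers unfamiliar with beam search, the decoding strategy mentioned among the optimizations, here is a minimal plain-Python sketch over a toy next-token table. The table, the tokens, and the function names are invented for illustration; the shipped implementation runs inside the ONNX computing graph and is not shown here.

```python
import math

# Hypothetical next-token probabilities, keyed by the previous token.
TABLE = {
    "<s>": {"var": 0.6, "int": 0.4},
    "var": {"x": 0.5, "y": 0.5},
    "int": {"count": 0.9, "i": 0.1},
    "x": {"<eol>": 1.0},
    "y": {"<eol>": 1.0},
    "count": {"<eol>": 1.0},
    "i": {"<eol>": 1.0},
}


def next_token_probs(tokens):
    """Toy stand-in for the model: depends only on the last token."""
    return TABLE.get(tokens[-1], {})


def beam_search(start, beam_width, max_len, end_token="<eol>"):
    """Keep the `beam_width` best partial lines by total log-probability."""
    beams = [(0.0, [start])]
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, tokens in beams:
            for token, p in next_token_probs(tokens).items():
                candidates.append((score + math.log(p), tokens + [token]))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, tokens in candidates[:beam_width]:
            if tokens[-1] == end_token:
                finished.append((score, tokens))
            else:
                beams.append((score, tokens))
        if not beams:
            break
    finished.extend(beams)  # keep hypotheses that hit max_len unfinished
    return max(finished, key=lambda c: c[0])


beam_score, beam_tokens = beam_search("<s>", beam_width=2, max_len=5)
greedy_score, greedy_tokens = beam_search("<s>", beam_width=1, max_len=5)
```

In this toy table, a beam width of 1 (greedy decoding) commits to the locally best first token and ends with a less likely line, while a width of 2 keeps the alternative alive long enough to find the higher-probability completion.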
What's Next: More context for better predictions
Currently, we use only limited code context to generate the recommendations. In the next version of the model, we will incorporate extended code context from within the document to improve model accuracy. We have published our research at EMNLP 2021 and are in the process of productizing the new model with extended code context.
Help Us to Improve
If you are a C# coder in VS, please install Visual Studio 2022 to try out the new IntelliCode. For VSCode users, please install the IntelliCode Completions extension to code in Python/TypeScript/JavaScript, and watch for more languages (e.g., Java) to be enabled. IntelliCode has benefited from all the constructive feedback received from you. Thank you!
Please report any issues you see via Developer Community and file feature requests. Happy coding!
Shengyu Fu
Principal Applied Science Manager, DevDiv Data&AI Team