10

Introducing code referencing for GitHub Copilot

 1 year ago
source link: https://github.blog/2023-08-03-introducing-code-referencing-for-github-copilot/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

Introducing code referencing for GitHub Copilot

Today, we’re announcing a private beta of GitHub Copilot with code referencing that includes a filter to detect code suggestions matching public code on GitHub.

Introducing code referencing for GitHub Copilot
Author
August 3, 2023

Make more informed decisions about the code you use. In the rare case where a GitHub Copilot suggestion matches public code, this update will show a list of repositories where that code appears and their licenses. Sign up for the private beta today.

Over the course of the last year, GitHub Copilot, the world’s first at-scale AI pair programmer trained on billions of lines of public code, has attracted more than 1 million developers and helped over 27,000 organizations build faster and more productively. During that time, many developers told us they want to see when GitHub Copilot’s suggestions match public code.

Today, we’re announcing a private beta of GitHub Copilot with code referencing that includes an updated filter which detects and shows context of code suggestions matching public code on GitHub. When the filter is enabled, GitHub Copilot checks code suggestions with surrounding code of about 150 characters and compares it against an index of all the public code on GitHub.com. Matches—along with information about every repository in which they appear—are displayed right in the editor. Developers can now choose whether to block suggestions containing matching code, or allow those suggestions with information about matches.

Why? Some want to learn from others’ work, others may want to take a dependency rather than introduce new app logic, and still others want to give or receive credit for similar work. Whatever the reason, it’s nice to know when similar code is out there.

Let’s see how it works.

How GitHub Copilot code referencing works

With billions of files to index and a latency budget of only 10-20ms, it’s a miracle of engineering that this is even possible. Still, if there’s a match, a notification appears in the editor showing: (1) the matching code, (2) the repositories where that code appears, and (3) the license governing each repository.

Why code referencing matters

In our journey to create a code referencing tool, we discovered a few interesting things:

First, our previous research suggests that matches occur in less than one percent of GitHub Copilot suggestions. But that one percent isn’t evenly distributed across all use cases. In the context of an existing application with surrounding code, we almost never see a match. But in an empty or nearly empty file, we see matches far more often.

Suggestions are heavily biased toward the prompt so GitHub Copilot can provide suggestions tailor-made for your current task. That means, in an existing app with lots of context, you’ll get a suggestion customized for your code. But in an empty, or nearly empty file, there’s little to no context. So, you’re more likely to get a suggestion that matches public code.

We’ve also found that when suggestions match public code, those matches frequently appear in dozens, if not hundreds of repositories. In some ways, this isn’t surprising because the models that power GitHub Copilot are akin to giant probability machines. A code fragment that appears in many repositories is more likely to be a “pattern” detected by the model—similar to the patterns we see elsewhere in public code.

For example, research on Java projects finds that up to 11% of repositories may contain code that resembles solutions posted to Stack Overflow, and the vast majority of those snippets appear without attribution. Another study on Python found that many matches are too generic to trace to an original usage. A smaller-scale study found that Stack Overflow answers contain code from Android applications.

Finally, the repositories with matching code are often governed by multiple, sometimes conflicting licenses, which makes attributing a match to its source more challenging. By consulting a list of references, developers can now:

  • Determine whether, what, and who to attribute rather than having matches simply blocked from the outset.
  • Learn from other developers by studying their approaches to similar problems.
  • Take a dependency on an open source library to avoid new business logic.
  • Evaluate the context of code before accepting a matching suggestion.
  • Discover new projects and contribute upstream.

Building a better developer experience and community

This is just the beginning. As GitHub continues to innovate, we will always strive to help developers stay in the flow, build creatively, and maintain a transparent connection to the community. We’re excited for you to try GitHub Copilot with code referencing.

Happy coding.

The GitHub Insider Newsletter

Discover tips, technical guides, and best practices in our monthly newsletter for developers.

Subscribe

More on GitHub Copilot

Smarter, more efficient coding: GitHub Copilot goes beyond Codex with improved AI model

Smarter, more efficient coding: GitHub Copilot goes beyond Codex with improved AI model

We're thrilled to announce two major updates to GitHub Copilot code Completion's capabilities that will help developers work even more efficiently and effectively.

How to build a GPT-3 App with Nextjs, React, and GitHub Copilot

In this step-by-step tutorial, you will learn how to use GitHub Copilot to build an application with OpenAI’s gpt-3.5-turbo model.

How to responsibly adopt GitHub Copilot with the GitHub Copilot Trust Center

We’re launching the GitHub Copilot Trust Center to provide transparency about how GitHub Copilot works and help organizations innovate responsibly with generative AI.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK