Amazon's CodeWhisperer
source link: https://lwn.net/Articles/900045/
CodeWhisperer’s reference tracker detects whether a code recommendation may be similar to particular CodeWhisperer training data, and can provide those references to you. This allows you to easily find and review that reference code and how it is used in the context of another project.
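To make the idea of "may be similar to training data" concrete, here is a minimal sketch of one way such a check could work: comparing overlapping token windows (shingles) with Jaccard similarity. This is purely an illustrative assumption; the announcement does not say how the reference tracker is implemented, and the names `shingles`, `jaccard`, `find_references`, the corpus layout, and the 0.5 threshold are all hypothetical.

```python
import re


def shingles(code: str, n: int = 3) -> set:
    """Return the set of n-token windows found in a code string."""
    # Crude tokenizer: identifiers, integer literals, or any single
    # non-whitespace character. Whitespace and layout are ignored.
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", code)
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_references(suggestion: str, corpus: dict, threshold: float = 0.5) -> list:
    """Return (source, score) pairs for corpus entries similar to the suggestion.

    `corpus` maps a hypothetical source label (e.g. a repository path)
    to the text of a known snippet.
    """
    suggestion_shingles = shingles(suggestion)
    hits = []
    for source, code in corpus.items():
        score = jaccard(suggestion_shingles, shingles(code))
        if score >= threshold:
            hits.append((source, score))
    # Most similar reference first.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

A caller would pass a suggested snippet and a dictionary of known snippets, then review whichever sources come back along with their licenses.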
Amazon's CodeWhisperer
Posted Jul 5, 2022 14:45 UTC (Tue) by NightMonkey (subscriber, #23051) [Link]
So, let's say that a developer uses GitHub's Copilot or Amazon's CodeWhisperer or other similar code Mad Lib tools. They love the MIT or Apache-licensed code (maybe even some GPL2?) that they see and use lots of it. Six months later, a court finds that the 'training data' code is patented and, therefore, no longer Free. What then for the developer? How are they alerted to this problem? Or is it only a problem for the services, not the developer? Cheers.
Amazon's CodeWhisperer
Posted Jul 5, 2022 16:16 UTC (Tue) by dskoll (subscriber, #1630) [Link]
Also not a lawyer, but I know a little about patents from a previous job. A patent is different from copyright. To infringe copyright, you have to distribute a work contrary to the terms of its license, or derive a work from a copyrighted work and distribute it contrary to the original work's license.
For a patent, the only thing that matters is what you do, not how you got there. So for example, when the LZW compression algorithm was patented, it wouldn't matter if you copied a reference implementation, created a brand-new implementation on your own, or used a Copilot-derived implementation... you'd still be infringing the patent.
If you do infringe on a patent, it's sometimes better not to know, because willful infringement carries a much higher penalty than inadvertent infringement.
I doubt Amazon or MSFT would be responsible for notifying users of their AI code-generating software about potential patent infringement... that risk lies entirely with the users.
Amazon's CodeWhisperer
Posted Jul 5, 2022 17:22 UTC (Tue) by Wol (subscriber, #4433) [Link]
So when CodeWhisperer makes a suggestion, it looks like it tells you where it got it from, and you have the information you need to do due diligence.
It seems Copilot doesn't bother ...
Cheers,
Wol
Amazon's CodeWhisperer
Posted Jul 5, 2022 16:21 UTC (Tue) by nim-nim (subscriber, #34454) [Link]
That service aspect apart, it changes nothing for you as a consumer or publisher of code. The service can be sued as accessory to copyright infringement, but the infringement is still yours (unless the service promises legal insurance as part of its terms of use).
As a consumer, you’re still supposed to perform legal due diligence on the third party code you integrate.
As a publisher, you’re supposed to make sure your legal terms are clearly written and clearly notified.
Copyright is still the same dangerous hairball as when AT&T published Unix (Lions book and all) and everyone involved ended up in court due to general carelessness.
Amazon's CodeWhisperer
Posted Jul 5, 2022 18:30 UTC (Tue) by nickodell (subscriber, #125165) [Link]
For a patent, if you invent the same thing as a previous patent, then you're infringing on that patent. It doesn't matter if you invented it independently. (However, the penalties for willful infringement are higher.)
For copyright, if you come up with the same idea, the way you came up with it matters. One interpretation is that language models are doing some form of reasoning, so a similar work appearing in the training data isn't necessarily proof that the language model is copying that previous work. Another interpretation is that a language model is just copying part of its input and changing a few things.
There are awkward effects for both possible interpretations. If you accept the first interpretation, then how do you measure whether a model is doing "enough" reasoning? If you accept the second interpretation, that implies that the output of e.g. GPT-3 is jointly owned by every person who's written anything on the internet. Practically speaking, it would become illegal to train an AI on Common Crawl data.
I don't think any court has ruled on it either way.
Amazon's CodeWhisperer
Posted Jul 5, 2022 18:46 UTC (Tue) by nim-nim (subscriber, #34454) [Link]
You can take all the words in a text, and arrange them in sentences meaning something else, and the result will be non-infringing.
You can take the same text, and replace every single word with a synonym, and the result will be definitely infringing. None of the words survived but the structure is still the same.
That makes models that analyze the structure of the code being written, and suggest bits to bring it closer to someone else’s structure, especially problematic.
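The "synonym" point carries over to code fairly directly: renaming every identifier changes all the "words" but leaves the structure untouched. As a rough illustration only (not a legal test, and not a description of how any of these tools work), the sketch below uses Python's standard `ast` module to replace identifiers with positional placeholders and then compares the resulting trees. The class `_Canonicalize` and the function `structure` are hypothetical names introduced for this example.

```python
import ast


class _Canonicalize(ast.NodeTransformer):
    """Rename identifiers to v0, v1, ... in order of first appearance."""

    def __init__(self):
        self.names = {}

    def _canon(self, name: str) -> str:
        # The placeholder index is the number of names seen so far.
        return self.names.setdefault(name, f"v{len(self.names)}")

    def visit_FunctionDef(self, node):
        node.name = self._canon(node.name)
        self.generic_visit(node)  # continue into arguments and body
        return node

    def visit_arg(self, node):
        node.arg = self._canon(node.arg)
        return node

    def visit_Name(self, node):
        node.id = self._canon(node.id)
        return node


def structure(code: str) -> str:
    """Return a name-independent dump of the code's syntax tree."""
    tree = _Canonicalize().visit(ast.parse(code))
    # ast.dump omits line and column attributes by default, so only
    # the shape of the tree and the placeholder names remain.
    return ast.dump(tree)


original = (
    "def total(items):\n"
    "    s = 0\n"
    "    for x in items:\n"
    "        s += x\n"
    "    return s\n"
)
# Every identifier replaced by a "synonym"; the structure is unchanged.
renamed = (
    "def sum_all(values):\n"
    "    acc = 0\n"
    "    for v in values:\n"
    "        acc += v\n"
    "    return acc\n"
)
# Same function name and argument, different structure.
different = "def total(items):\n    return len(items)\n"

same_structure = structure(original) == structure(renamed)
changed_structure = structure(original) != structure(different)
```

Here `same_structure` holds even though no identifier survived the renaming, while `changed_structure` holds even though the function name and argument are shared with the original.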
Amazon's CodeWhisperer
Posted Jul 5, 2022 21:11 UTC (Tue) by rgmoore (✭ supporter ✭, #75) [Link]
The classic example with writing is that you can change the medium or genre of a work and it can still be a derivative. All those comic book movies are still derivatives of the original comics, even if they don't directly swipe story lines. Similarly, The Magnificent Seven is still a derivative of Seven Samurai even though the setting, character names, and even the language all changed.
That said, the functional nature of code makes it a more difficult case than something purely expressive like fiction or poetry. If there are few enough ways of achieving the same purpose efficiently, it's possible to argue the code is determined purely by functional constraints and therefore isn't expressive. This is especially true if the code is implementing a published algorithm, like quicksort or the sieve of Eratosthenes.
Amazon's CodeWhisperer
Posted Jul 6, 2022 5:48 UTC (Wed) by nim-nim (subscriber, #34454) [Link]
Amazon's CodeWhisperer
Posted Jul 6, 2022 7:18 UTC (Wed) by LtWorf (subscriber, #124958) [Link]