
Show HN: Verify LLM Generated Code with a Spreadsheet

source link: https://news.ycombinator.com/item?id=36152787

23 points by narush | 3 comments

Hey HN! Been a minute. We launched Mito here last year (https://news.ycombinator.com/item?id=32723766).

Mito is a spreadsheet that generates Python code as you edit it. We've spent the past three years trying to lower the startup cost to use Python for data work. In doing so, we’ve been thrust into the middle of many Python transition processes at larger enterprises, and we’ve seen up-close how non-technical folks interact with generated code.

The Mito AI chatbot lives inside the Mito spreadsheet (https://www.trymito.io/). The obvious benefit is that you can use the chatbot to transform your data and write a repeatable Python script. The less obvious (but equally important) benefit is that, by connecting a spreadsheet and a chatbot, Mito helps you understand the impact of your edits and verify LLM-generated code. Every time you use the chatbot, Mito highlights the changed data in the spreadsheet. You can see a quick demo here (https://www.tella.tv/video/clibtwssv00000fl65oky13nu/view).

Three main insights shaped our approach to LLM code generation:

# Consumers of generated code don't know enough Python to verify and correct the code

Mito users span the range of Python experience. For new programmers, generating code using LLMs is an easy step one. Ensuring the generated code is correct is the forgotten step two.

In practice, LLMs often generate incorrect code, or code with unexpected side effects. For example, a user might prompt an LLM to calculate a total_revenue column from the price and quantity columns. The LLM correctly calculates total_revenue = price * quantity, but then mistakenly deletes the price and quantity columns.

New programmers find it almost impossible to verify generated code by reading it alone. They need tooling designed for their skillsets.
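To make the total_revenue failure mode concrete, here is a minimal pandas sketch. The data values and the `intended`/`generated` function names are hypothetical, and the "generated" version only illustrates the kind of side effect described above; it is not real model output. The point is that the new column looks right in both versions, and only a comparison of the resulting data exposes the deletion.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({"price": [2.0, 5.0], "quantity": [3, 4]})

def intended(df: pd.DataFrame) -> pd.DataFrame:
    """What the user asked for: add total_revenue and keep the inputs."""
    df = df.copy()
    df["total_revenue"] = df["price"] * df["quantity"]
    return df

def generated(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative LLM output with an unrequested side effect."""
    df = df.copy()
    df["total_revenue"] = df["price"] * df["quantity"]
    # The side effect: the input columns are silently dropped.
    df = df.drop(columns=["price", "quantity"])
    return df

expected = intended(df)
actual = generated(df)

# The requested column has the same values in both versions...
assert actual["total_revenue"].tolist() == expected["total_revenue"].tolist()

# ...but comparing columns reveals what the generated code removed.
missing = [c for c in expected.columns if c not in actual.columns]
print("Unexpectedly removed columns:", missing)
```

Reading the `generated` function line by line would catch the `drop` call, but only if you already know what `drop` does; the column comparison catches it without that knowledge.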

# Not everyone knows how to use a chat interface for transformations

We were surprised to learn that many Mito users a) had no experience with ChatGPT and b) didn’t understand the chat interface at all! Mito AI presents users with a few example prompts and an input field. A surprising number of users thought the example prompts were all they could use Mito AI for.

AI chatbots are new. We builders might be using them for natural language interactions every day, but users are still learning how to use them in new contexts. This stands in stark contrast to spreadsheets, which pretty much every business user has experience with. Shout out to 40 years of Excel dominance!

# The more context a prompt has about the user’s data + edits, the better the LLM results

For the LLM to generate code that can execute correctly, the prompt should include the names of the dataframes, the column headers, (some) dataframe values, and a few previous edits as examples. Duh.

But there’s no reason users should be responsible for writing this prompt. No one loves writing long chats, and in practice Mito AI users expect to be able to write ~12 words. Spreadsheets are well-suited to building the rest of the prompt for you - they have all of your data context, and know your recent edits.
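A minimal sketch of what such a prompt builder could look like, wrapping a short user request with the dataframe names, column headers, a few sample rows, and recent edits. The `build_prompt` function, the prompt wording, and the defaults of three sample rows and three recent edits are all assumptions for illustration; this is not Mito's actual prompt.

```python
import pandas as pd

def build_prompt(
    user_input: str,
    dataframes: dict,
    recent_edits: list,
    n_sample_rows: int = 3,
    n_edits: int = 3,
) -> str:
    """Hypothetical prompt builder: supplements a short request with data context."""
    parts = ["You are editing pandas dataframes. Write Python code for the request."]
    for name, df in dataframes.items():
        # Dataframe name and column headers, so the code references real columns.
        parts.append(f"Dataframe `{name}` has columns: {', '.join(map(str, df.columns))}")
        # A few values, so the model can infer types and formats.
        parts.append(f"First {n_sample_rows} rows of `{name}`:\n{df.head(n_sample_rows).to_string()}")
    if recent_edits:
        # Only the most recent edits, as examples of the expected code style.
        parts.append("Recent edits, as examples of the expected code style:")
        parts.extend(recent_edits[-n_edits:])
    parts.append(f"Request: {user_input}")
    return "\n\n".join(parts)

# Usage: the user types a handful of words; the spreadsheet supplies the rest.
sales = pd.DataFrame({"price": [2.0, 5.0, 1.5, 8.0], "quantity": [3, 4, 10, 1]})
prompt = build_prompt(
    "add a total_revenue column",
    {"sales": sales},
    ["sales['price'] = sales['price'].round(2)"],
)
print(prompt)
```

Capping the sample rows and edits keeps the prompt short on large sheets; how many to include is a tuning choice, not something the post specifies.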

With these three insights, it became very clear to us what role a spreadsheet could play in LLM-based code generation: the spreadsheet is the prompt builder, and the spreadsheet is the code verifier.

Mito AI builds an effective prompt by supplementing your input with the context of your data and recent edits.

Mito AI then helps you verify the LLM-generated code by highlighting the added, modified, and removed data both in the chat interface and in the spreadsheet. This way, you can check that the generated code did what you asked, and nothing else.
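Highlighting added, modified, and removed data comes down to diffing the dataframe before and after the generated code runs. Below is a rough pandas sketch of that comparison. It assumes rows are matched by a unique index label and columns by name; `diff_frames` and the returned keys are hypothetical, not Mito's implementation.

```python
import pandas as pd

def diff_frames(before: pd.DataFrame, after: pd.DataFrame) -> dict:
    """Hypothetical before/after diff; assumes unique index labels."""
    added_cols = [c for c in after.columns if c not in before.columns]
    removed_cols = [c for c in before.columns if c not in after.columns]
    added_rows = [i for i in after.index if i not in before.index]
    removed_rows = [i for i in before.index if i not in after.index]

    # Compare only the cells present in both frames, aligned on the same labels.
    shared_cols = [c for c in before.columns if c in after.columns]
    shared_rows = [i for i in before.index if i in after.index]
    old = before.loc[shared_rows, shared_cols]
    new = after.loc[shared_rows, shared_cols]

    # A cell is modified if its values differ, treating NaN == NaN as unchanged.
    changed = (old != new) & ~(old.isna() & new.isna())
    modified = [(i, c) for i in shared_rows for c in shared_cols if changed.at[i, c]]

    return {
        "added_columns": added_cols,
        "removed_columns": removed_cols,
        "added_rows": added_rows,
        "removed_rows": removed_rows,
        "modified_cells": modified,
    }

# Usage: simulate generated code that adds one column, drops another,
# and changes a single value.
before = pd.DataFrame({"price": [2.0, 5.0], "quantity": [3, 4]})
after = before.copy()
after["total_revenue"] = after["price"] * after["quantity"]
after = after.drop(columns=["quantity"])
after.loc[1, "price"] = 5.5
result = diff_frames(before, after)
print(result)
```

Each list maps naturally onto a highlight color in a grid. A real tool would also have to handle reordered, re-indexed, or duplicate-labeled rows, which this sketch does not attempt.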

Give it a spin. Let us know what you think of the recon and how we can make it more helpful!

Also, if you like what we’re doing, we’re hiring – come help us build! (https://www.ycombinator.com/companies/mito/jobs)

