
Launch HN: Vocode (YC W23) – Library for voice conversation with LLMs

source link: https://news.ycombinator.com/item?id=35358873

173 points by KianHooshmand 7 hours ago | 74 comments

Hey everyone! Kian and Ajay here from Vocode, an open-source library for building LLM applications you can talk to. Vocode makes it easy to take any text-based LLM and make it voice-based. Our repo is at https://github.com/vocodedev/vocode-python and our docs are at https://docs.vocode.dev.

Building realtime voice apps with LLMs is powerful but hard: you have to orchestrate speech recognition, the LLM, and speech synthesis in real time (all async), while also handling the mechanics of conversation, like knowing when someone has finished speaking and dealing with interruptions.
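
To make that concrete, here is a minimal, self-contained sketch of the kind of async orchestration involved. Everything in it (the transcribe, generate_reply, and synthesize_and_play stand-ins, and the fake microphone) is hypothetical and for illustration only, not Vocode's actual API; it just shows the shape of the pipeline and a barge-in style interruption check.

```python
import asyncio

# Hypothetical stand-ins for real providers (speech-to-text, an LLM, text-to-speech).
# None of these names are Vocode's API; they only illustrate the pipeline shape.
async def transcribe(audio_chunks):
    """Stream audio in, yield finished utterances out (speech recognition)."""
    async for chunk in audio_chunks:
        yield f"<utterance decoded from {len(chunk)} bytes>"

async def generate_reply(text: str) -> str:
    """Ask the LLM for a response to the transcribed utterance."""
    return f"You said: {text}"

async def synthesize_and_play(text: str) -> None:
    """Turn the reply into audio and stream it to the speaker."""
    await asyncio.sleep(0.5)  # stands in for TTS + playback latency

async def conversation(audio_chunks):
    speaking_task = None
    async for utterance in transcribe(audio_chunks):
        # Barge-in: if the user starts talking while we're still speaking,
        # cancel playback instead of talking over them.
        if speaking_task and not speaking_task.done():
            speaking_task.cancel()
        reply = await generate_reply(utterance)
        speaking_task = asyncio.create_task(synthesize_and_play(reply))
    if speaking_task:
        await speaking_task  # let the final reply finish before exiting

async def fake_microphone():
    """Pretend audio source: three 100 ms chunks of 16 kHz / 16-bit mono silence."""
    for _ in range(3):
        yield b"\x00" * 3200
        await asyncio.sleep(0.1)

if __name__ == "__main__":
    asyncio.run(conversation(fake_microphone()))
```

Doing all of this for real providers, with streaming audio in both directions, is where the difficulty lives.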

Our library is easy to get up and running–you can set up a conversation in <15 lines of code. Check out our Gen Z GPT hotline demo: https://replit.com/@vocode/Gen-Z-Phone (try it out at +1-650-729-9536).

It all started with our PrankGPT project that we built for fun (quick demo at https://www.loom.com/share/0d0d68f1a62f409eb5ae24521293d2dc). We realized how powerful voice + LLMs are, but also how hard they are to build with.

Once we got everything working, it was really cool and useful. Talking to LLMs is better than any voice AI experience we’ve had before, and we could imagine a host of cool applications people could build on top of it.

So, we decided to build a developer tool to make it easy. Our library is open source and gives you everything you need in a single place.

We give you a bunch of out-of-the-box integrations with speech recognition/synthesis providers and let you swap them out easily. We have platform support across web and telephony (via Twilio), with mobile coming soon. We also provide abstractions for streaming conversation (good for realtime apps like phone calls) and for command-based/turn-based applications (like voice-based chess). And we provide customizability around how the conversation is handled: things like knowing when someone is finished speaking, changing emotion, and sending filler audio if there are delays.
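
To illustrate the swap-out idea, here is a hypothetical sketch of the pattern (the Synthesizer protocol, AzureSynthesizer/ElevenLabsSynthesizer classes, and ConversationConfig fields below are illustrative stand-ins, not Vocode's actual classes): providers sit behind a shared interface, and conversation behavior like endpointing and filler audio is plain configuration.

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol

# Hypothetical names for illustration only; see the repo for the real integrations.
class Synthesizer(Protocol):
    async def speak(self, text: str) -> None: ...

class AzureSynthesizer:
    async def speak(self, text: str) -> None:
        print(f"[azure tts] {text}")

class ElevenLabsSynthesizer:
    async def speak(self, text: str) -> None:
        print(f"[elevenlabs tts] {text}")

@dataclass
class ConversationConfig:
    endpoint_silence_seconds: float = 0.7  # how long a pause counts as "done speaking"
    send_filler_audio: bool = True         # mask LLM latency with "mm-hmm" / typing sounds

async def respond(synthesizer: Synthesizer, config: ConversationConfig, reply: str) -> None:
    if config.send_filler_audio:
        print("[filler] mm-hmm...")        # played while waiting on the LLM
    await synthesizer.speak(reply)

# Swapping synthesis providers is a one-argument change; the conversation logic is untouched.
asyncio.run(respond(ElevenLabsSynthesizer(), ConversationConfig(), "Hey! What's up?"))
```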

In terms of “how do you make money” – we have a hosted version that we’re going to charge for (though right now you can get it for free! https://app.vocode.dev) and we're also going to build enterprise products in the future.

We’d love for you to try it out and give us some feedback! And, if you have any demos you'd like to see – let us know and we’ll take a crack at building them. We’re curious about your experiences using or building voice AI, what features or use cases you’d love to see, and any other ideas you have to share!

