
Launch HN: Vocode (YC W23) – Library for voice conversation with LLMs

source link: https://news.ycombinator.com/item?id=35358873

173 points by KianHooshmand 7 hours ago | 74 comments

Hey everyone! Kian and Ajay here from Vocode, an open-source library for building LLM applications you can talk to. Vocode makes it easy to take any text-based LLM and make it voice-based. Our repo is at https://github.com/vocodedev/vocode-python and our docs are at https://docs.vocode.dev.

Building realtime voice apps with LLMs is powerful but hard: you have to orchestrate speech recognition, the LLM, and speech synthesis in real time (all async), while also handling the mechanics of conversation, like knowing when someone has finished speaking and dealing with interruptions.
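
To make that concrete, here is a minimal, self-contained sketch of the kind of async orchestration involved. Everything in it (the transcribe, generate_reply, and synthesize_and_play stand-ins, and the fake microphone) is hypothetical and for illustration only, not Vocode's actual API; it just shows the shape of the pipeline and a barge-in style interruption check.

```python
import asyncio

# Hypothetical stand-ins for real providers (speech-to-text, an LLM, text-to-speech).
# None of these names are Vocode's API; they only illustrate the pipeline shape.
async def transcribe(audio_chunks):
    """Stream audio in, yield finished utterances out (speech recognition)."""
    async for chunk in audio_chunks:
        yield f"<utterance decoded from {len(chunk)} bytes>"

async def generate_reply(text: str) -> str:
    """Ask the LLM for a response to the transcribed utterance."""
    return f"You said: {text}"

async def synthesize_and_play(text: str) -> None:
    """Turn the reply into audio and stream it to the speaker."""
    await asyncio.sleep(0.5)  # stands in for TTS + playback latency

async def conversation(audio_chunks):
    speaking_task = None
    async for utterance in transcribe(audio_chunks):
        # Barge-in: if the user starts talking while we're still speaking,
        # cancel playback instead of talking over them.
        if speaking_task and not speaking_task.done():
            speaking_task.cancel()
        reply = await generate_reply(utterance)
        speaking_task = asyncio.create_task(synthesize_and_play(reply))
    if speaking_task:
        await speaking_task  # let the final reply finish before exiting

async def fake_microphone():
    """Pretend audio source: three 100 ms chunks of 16 kHz / 16-bit mono silence."""
    for _ in range(3):
        yield b"\x00" * 3200
        await asyncio.sleep(0.1)

if __name__ == "__main__":
    asyncio.run(conversation(fake_microphone()))
```

Doing all of this for real providers, with streaming audio in both directions, is where the difficulty lives.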

Our library is easy to get up and running–you can set up a conversation in <15 lines of code. Check out our Gen Z GPT hotline demo: https://replit.com/@vocode/Gen-Z-Phone (try it out at +1-650-729-9536).

It all started with our PrankGPT project that we built for fun (quick demo at https://www.loom.com/share/0d0d68f1a62f409eb5ae24521293d2dc). We realized how powerful voice + LLMs are, but also how hard they are to build with.

Once we got everything working, it was really cool and useful. Talking to LLMs is better than any voice AI experience we’ve had before, and we could imagine a host of cool applications people could build on top of it.

So, we decided to build a developer tool to make it easy. Our library is open source and gives you everything you need in a single place.

We give you a bunch of out-of-the-box integrations with speech recognition/synthesis providers and let you swap them out easily. We have platform support across web and telephony (via Twilio), with mobile coming soon. We also provide abstractions for streaming conversation (good for realtime apps like phone calls) and for command-based/turn-based applications (like voice-based chess). And we provide customizability around how the conversation is handled: things like knowing when someone is finished speaking, changing emotion, and sending filler audio if there are delays.
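
To illustrate the swap-out idea, here is a hypothetical sketch of the pattern (the Synthesizer protocol, AzureSynthesizer/ElevenLabsSynthesizer classes, and ConversationConfig fields below are illustrative stand-ins, not Vocode's actual classes): providers sit behind a shared interface, and conversation behavior like endpointing and filler audio is plain configuration.

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol

# Hypothetical names for illustration only; see the repo for the real integrations.
class Synthesizer(Protocol):
    async def speak(self, text: str) -> None: ...

class AzureSynthesizer:
    async def speak(self, text: str) -> None:
        print(f"[azure tts] {text}")

class ElevenLabsSynthesizer:
    async def speak(self, text: str) -> None:
        print(f"[elevenlabs tts] {text}")

@dataclass
class ConversationConfig:
    endpoint_silence_seconds: float = 0.7  # how long a pause counts as "done speaking"
    send_filler_audio: bool = True         # mask LLM latency with "mm-hmm" / typing sounds

async def respond(synthesizer: Synthesizer, config: ConversationConfig, reply: str) -> None:
    if config.send_filler_audio:
        print("[filler] mm-hmm...")        # played while waiting on the LLM
    await synthesizer.speak(reply)

# Swapping synthesis providers is a one-argument change; the conversation logic is untouched.
asyncio.run(respond(ElevenLabsSynthesizer(), ConversationConfig(), "Hey! What's up?"))
```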

In terms of “how do you make money” – we have a hosted version that we’re going to charge for (though right now you can get it for free! https://app.vocode.dev) and we're also going to build enterprise products in the future.

We’d love for you to try it out and give us some feedback! And, if you have any demos you'd like to see – let us know and we’ll take a crack at building them. We’re curious about your experiences using or building voice AI, what features or use cases you’d love to see, and any other ideas you have to share!

