Ask HN: Why isn't there a standard network audio protocol?
source link: https://news.ycombinator.com/item?id=31027526
97 points by armagon 11 hours ago | 124 comments
Having been frustrated again in using bluetooth from a computer to a smart speaker -- ugh! I swear connections only work half the time, and it isn't due to RF interference -- I'm wondering why there isn't a standard protocol for transmitting audio over the network. I think it would be so much easier to use.
[I'm talking about having my devices at home talk to each other. They are already on the same network.]
Edit/Addendum: Are there any streaming audio protocols that work from Mac/Windows/iOS to Amazon Echo Dots? I'm looking for a drop-in replacement for bluetooth audio streaming, where I can play sounds on my computer (ex. a youtube video) and hear it on a louder speaker.
AES50 works over cat5 cables, but doesn't use ethernet; it uses a synchronous clock to transmit PCM audio. A lot of the Midas/X32 product lineup uses this to great effect.
Dante allows normal IP equipment to function as audio distribution devices, but has noticeable latency for close-quarters stuff (sound travels ~0.9 ms/foot ±10%).
AES50 has extraordinarily low latency, pretty much as good as analog, but only allows point-to-point links.
On the consumer side, RAOP existed for a while before Silicon Valley elitism infected Apple: https://en.wikipedia.org/wiki/Remote_Audio_Output_Protocol
EDIT ====
I had it in my head that RAOP was an open standard; it's not.
Or even better: https://en.wikipedia.org/wiki/Cut-through_switching
Let's say your original source could be heard by the audience. The audio can travel from source, to mic, be digitized, transmitted over IP, arrive at console, be digitally mixed, then transmitted over IP to speaker, then broadcast at an unknown distance from the source. How much latency occurred there?
The trick is to get the original source wave to roughly line up with the time the amplified wave leaves the PA speaker, otherwise weird echoes are heard by the audience at best, and awful band-specific noise cancellation happens at worst.
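To put numbers on that alignment problem, here is a rough sketch in Python. It uses the ~0.9 ms per foot rule of thumb quoted earlier in the thread; the function names and the example figures (20 feet, 5 ms of chain latency) are made up for illustration and do not describe any particular console or network.

```python
# Rule of thumb from the thread: sound covers one foot in roughly 0.9 ms.
MS_PER_FOOT = 0.9


def acoustic_delay_ms(distance_feet: float) -> float:
    """Time for the unamplified sound to travel distance_feet through air."""
    return distance_feet * MS_PER_FOOT


def alignment_delay_ms(source_to_speaker_feet: float, chain_latency_ms: float) -> float:
    """Extra delay to add to the PA feed so it lines up with the direct sound.

    chain_latency_ms is the total mic -> network -> console -> network -> speaker
    latency. A negative result means the chain is already later than the
    direct sound, so no added delay can fix it.
    """
    return acoustic_delay_ms(source_to_speaker_feet) - chain_latency_ms


if __name__ == "__main__":
    # Hypothetical stage: PA speaker 20 feet from the source, 5 ms chain latency.
    print(f"{alignment_delay_ms(20, 5.0):.1f} ms of delay to add")
```

The closer the speaker is to the source, the smaller the latency budget: at 5 feet the direct sound arrives after about 4.5 ms, so a 5 ms chain is already too slow.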
Bluetooth isn't the same as your WiFi network. Most of the comments here are talking about IP-based protocols that aren't relevant for Bluetooth anyway.
Bluetooth is probably the best example of a widely adopted protocol for connecting to devices and sending audio streams. The protocol isn't exactly the problem. It's the buggy implementations of Bluetooth stacks and Bluetooth software in embedded devices.
Getting it right is actually extremely difficult because Bluetooth grew in complexity to be everything to everyone. It isn't only an audio sending protocol. Almost nobody owns the entire Bluetooth stack, so it's a mix of pieces from different companies and vendors.
Apple's implementation isn't perfect, but from experience I can tell you it's 10X better than the nightmare that is Android Bluetooth. It's getting better, but for years you had to collect a lot of different Android phones so you could make your software work around all of the different quirks in each vendor's different Bluetooth stacks.
Seems like I have a different experience with them. I don't have issues with Android Bluetooth; I do have issues with Apple Bluetooth.
Half of the time, my iPad can't detect that my Bluetooth devices (keyboard and audio accessory) are trying to connect to it (already paired). When that occurs, I have to go to the Command Center to force connect my Bluetooth devices, and half of the time the iPad will oblige and connect. Other times it just gives up and says it couldn't connect or cannot find it (while my Bluetooth device is poking the iPad to connect). It is a hassle to use my Bluetooth devices with the iPad daily.
On the Android side, it instantly connects, even when my phone is sleeping.
The cheap anker headsets I mostly use are rock solid. I have an android head unit (second one actually, first one was garbage) and a Bluetooth radar detector. The detector always works with my phones, and never with my head unit(s).
The spec is way too complicated at some 3000 pages long which apparently leads to faulty implementations left and right.
https://www.wired.com/story/bluetooth-complex-security-risk/
I'm aware of that. I want audio over WiFi and audio over LAN, as Bluetooth has left me scarred.
But your problem doesn't have anything to do with a lack of standards; Amazon has no incentive to just let you send RTP to the Echo Dot on a port — nobody is asking for that, and they would have no control over the "experience".
I don't see that this is any worse for the experience than the bluetooth situation. It'd certainly make the device more valuable for me. But that doesn't mean Amazon will see it as worth their time.
[1] https://github.com/badaix/snapcast [2] https://github.com/geekuillaume/soundsync
Like a lot of other people doing (or trying to do) Whole Home Audio, I'm using the Home Assistant open source platform as the central automation controller. You may want to look at creating a Home Assistant integration for SoundSync as it will expose it to the massive HA community (https://developers.home-assistant.io/docs/development_index/).
I'm boycotting spotify, so I'm looking for something for soundcloud, deezer, or youtube music.
Tbh, skip deezer, as they actively refuse to create something similar to spotify connect. IMO this is the USP of sonos.. it acts as spotify connect for all services
Those are working and used in live venues and studios, the hardware used for those might however be out of scope in terms of price for the typical user and it certainly won't work with your smart speaker.
Does that smart speaker have line in (3.5 mm TRS)? If so you could just send your audio analog over the ethernet cable and build an ethernet to TRS adapter on the other end : ) For longer distances balanced line drivers might be needed, tho.
But shielded Ethernet cables work surprisingly well for analog audio purposes, especially if you send balanced signals through them. If you transformer-balance them you even get galvanic isolation for free.
bluetooth sucks because it was invented by a bunch of guys in suits and consumer electronics companies rather than people who understand latency, performance etc. i designed my own protocol in the 2.4ghz band and wrote firmware and middleware for it and it deals with all the weaknesses of BT.
BT should have been designed by those who design the products and applications and deal first hand with end users.
It was not designed to solely carry audio. It just sort of morphed into being primarily used as an audio exchange format (because it's "good enough"). A little bit like how USB morphed into a peripheral bus even though it was designed to be more all encompassing (USB Ethernet, for example). In fact, the USB protocol is somewhat mucked up by the fact that it was designed to be a network instead of a more direct connection.
i think BT was first designed for exchanging photos. so mass storage transfer. it should have been designed for streaming latency sensitive data like audio first, and then the “easier” scenarios could have been built on top of that.
at least with USB there was the common sense to include ISO transfers although drivers for that in OSes happened relatively late and OS vendors have ignored the standard for many years, requiring the purchase of analyzers.
in that regard there is similarity with BT but with USB it seems easier to come up with a solution as a firmware/driver/application developer. at least in my experience.
I then came to love the simplicity and reliability of SDI. Nowadays I work in uncompressed ST2110, and while there are many advantages of network based video and audio, paying $1,000 for a QSFP to handle just a few streams is a hard pill to swallow!
There's Ravenna and AES67 (which I believe Dante supports), which are open standards but are not as common as Dante.
as for your bluetooth issues, PC bluetooth is a mess.
some of bluetooth's messiness comes from having the higher level elements of the stack designed 20+ years ago to operate on microcontrollers of that era. they've got N different audio profiles because the hardware it was expected to operate on originally would've been hard pressed to handle a single audio profile that could negotiate the gamut of use cases.
From my Windows 10 PC, TuneBlade AirPlay streaming provides a great experience:
I can stream anything that is playing on the PC to any AirPlay device on my LAN, and all the playback devices will be in perfect audio sync.
AirPlay 1 devices on my network include an AppleTV, Apple HomePod Minis, Nexum Airplay receivers attached to powered speakers, and DAPs with Airplay reception.
There is a significant buffer delay—about 2 seconds—that messes with video streaming. TuneBlade has the ability to stream video to VLC with synced audio, but doesn't support other video streaming endpoints. There is a bufferless mode with no delay, but it doesn't work well on my network.
There is also DLNA, which is actually a standard. I think it's rarely supported for push audio streaming since the protocol is poorly specified.
― Andrew S. Tanenbaum
But really, RTP is the closest thing. Outside of the consumer space nearly everything is RTP + some out of band signalling protocol. It's low latency, designed to be multicast, has RFCs for every codec under the sun, etc.
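For anyone who hasn't looked at RTP, the fixed header is only 12 bytes. Below is a minimal Python sketch of building and parsing it, following the field layout in RFC 3550. The function names are hypothetical, payload type 96 is just the first value in the dynamic range, and the sketch ignores CSRC lists, header extensions, and padding.

```python
import struct

RTP_VERSION = 2
HEADER_FORMAT = "!BBHII"  # network byte order, 12 bytes in total


def build_rtp_packet(payload: bytes, seq: int, timestamp: int, ssrc: int,
                     payload_type: int = 96, marker: bool = False) -> bytes:
    """Prepend a fixed RTP header (no CSRCs, no extension, no padding)."""
    # Byte 0: version (2 bits), padding (1), extension (1), CSRC count (4).
    byte0 = RTP_VERSION << 6
    # Byte 1: marker (1 bit), payload type (7 bits).
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack(HEADER_FORMAT, byte0, byte1, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the fixed header fields from the first 12 bytes of a packet."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack(HEADER_FORMAT, packet[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


if __name__ == "__main__":
    pkt = build_rtp_packet(b"\x00" * 160, seq=1, timestamp=160, ssrc=0x1234ABCD)
    print(len(pkt), parse_rtp_header(pkt))
```

The sequence number lets a receiver detect loss and reordering, and the timestamp drives playout timing; discovery and session setup are left to the out-of-band signalling protocol.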
But more broadly, not everything can be in scope. At the time of design having 10 MB and a decompressor in earbuds wasn’t realistic.
But blaming your headphones is ignorant - the headphones implement a protocol. They don’t have control over the protocol.
That's not how those words work.
Twitch is streaming, right? Under certain flaky playback conditions it can buffer a full minute. Which is 50 megabytes at full quality.
Yes, the headphones could store up N seconds of audio data ahead of playback. However, the value of buffering is that if you miss a chunk of data, you can tell the sender "give me that again". Protocols that allow buffering account for that by giving the data sink a means to tell the source "send me chunk F again". Bluetooth A2DP and other streaming protocols, because they prioritize constant latency over data reliability, don't have a means to allow that; the source keeps sending new chunks even if the sink didn't receive one.
As a result, there would be no value in headphones storing up a bunch of audio before playback; if a chunk is missing, there are no means to remedy that in the protocol, so it will still be missing when you play it back.
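A toy model of that argument, in Python. It is not the A2DP wire format, and the names are invented for illustration; it only shows that a buffer helps when the sink has some way to ask for a missing chunk again, and that without such a channel the gap is still there at playback time. The resend path assumes every retransmission arrives before its playback deadline.

```python
def chunks_at_playback(sent_ids, lost_ids, sink_can_request_resend):
    """Return the chunk ids the sink holds when playback starts.

    sent_ids: chunk ids in the order the source transmitted them.
    lost_ids: chunk ids dropped on their first transmission.
    """
    received = [cid for cid in sent_ids if cid not in lost_ids]
    if sink_can_request_resend:
        # Buffered lead time gives the sink room to ask for each gap again.
        # Assumption: every resend arrives before its playback deadline.
        missing = [cid for cid in sent_ids if cid not in received]
        received = sorted(received + missing)
    return received


if __name__ == "__main__":
    sent = list(range(8))
    lost = {2, 5}
    print("no resend channel:", chunks_at_playback(sent, lost, False))
    print("with resend channel:", chunks_at_playback(sent, lost, True))
```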
Of course, the experience of clicking play on a song and having it only start a number of seconds later is not something that'd sell particularly well, I guess. And then you'd have to renegotiate the BT profile if a call comes in that has to happen live. And switching back to the song will have another big delay.
And let's not forget this was a discussion of buffering. A buffer of 5 minutes (50MiB) buys you 5 minutes of not having to be real-time, or to be slowly lagging behind — if that covers 3h of continuous listening time, you probably covered 99% of uses where latency is not a big deal anyway (like playing music — calls and movies are another game).
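As a sanity check on the 5 minutes / 50 MiB figure, here is the arithmetic in Python, assuming uncompressed CD-quality PCM (44.1 kHz, 16-bit, stereo). The comment above doesn't say which format it had in mind, so that choice is an assumption; a compressed codec would need far less memory.

```python
MIB = 1024 * 1024


def pcm_bytes(seconds: float, sample_rate_hz: int = 44_100,
              channels: int = 2, bytes_per_sample: int = 2) -> int:
    """Bytes needed to hold `seconds` of uncompressed PCM audio."""
    return int(seconds * sample_rate_hz * channels * bytes_per_sample)


if __name__ == "__main__":
    five_minutes = pcm_bytes(5 * 60)
    print(f"{five_minutes} bytes = {five_minutes / MIB:.1f} MiB")
```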
I already acknowledge practical UX problems with just relying on buffering, but it doesn't make much sense to say how it can't be done because of the protocol either.
But wanting a better protocol isn't 'silly'.
If the headphones are implementing a protocol that isn't suitable for purpose, there is very good reason to blame the headphones. What's the point in having headphones if you need to be in a Faraday cage to use them?
> What's the point in having headphones if you need to be in a Faraday cage to use them?
Surely it’s the opposite? They don’t work in a Faraday cage, because they’re streaming and need to be connected.
What is the use case for headphones that cut out every couple of seconds?
> Surely it’s the opposite? They don’t work in a Faraday cage, because they’re streaming and need to be connected.
In this case the broadcast source would be in the Faraday cage along with the listener.
I can use my hdmi ARC soundbar from my computer. We live in a backwards world.
Otherwise, I see a huge opportunity for a consortium to develop a new hardware and software stack for high quality low latency audio and bring them as a package to their products. I would love a completely wireless Dolby Atmos like setup with no central receiver, your mobile device itself being the av receiver. New speakers from any manufacturer and form factor could be added wherever you want as you buy them. Calibration according to your speaker placement would be wireless and automatic.
Microsoft did it with Xbox Wireless Protocol which is used to transfer input from controllers and high quality sound without latency.
But, yes. It only works on Xbox or on Windows with an adapter and you can count the manufacturers using it on one hand. Microsoft being the thumb.
I found SnapCast which lets me send audio from laptop to phone (with huge latency) but not the other way (phone mic to laptop).
Hate to say it but you are probably better off getting something other than Echo Dots for music. Too bad Google discontinued the Chromecast Audio - I love mine. The biggest plus for me is that once you have a compatible app (such as Spotify), streaming is done entirely on the Chromecast Audio from your internet itself instead of continuing to use your phone's battery and wifi.
For that reason, I have extensively worked with pulseaudio over network. There is no UI that works for this. NTP for some reason is important which seems like bad design to me. zeroconf doesn't work at all.
Once you get it working... don't dare change anything. It will break in inexplicable ways that drive you up a wall.
And then the RIAA and MPAA discovered the plan and killed it good.
One problem is the fact that the codec needs to be negotiated, and if you're unlucky with codec compatibility, both callers fall back to crappy old codecs. Then there are tons of options for audio profile selection depending on requirements and bandwidth available (see https://en.wikipedia.org/wiki/Enhanced_Voice_Services for an overview) which makes it difficult to say what causes your specific problems.
Without VoLTE, you're falling back to 3G audio, probably AMR or AMR-WB, which is quite old and doesn't compress as well as modern standards.
Unless you mean the headphone profile for Bluetooth headsets: that's terrible because the standard is ancient, back when Bluetooth had even less capacity for low latency data transfer, and the codec is suboptimal making the situation even worse. There are better codecs out there, and some headsets will support what some call mSBC, which massively improves the audio quality (but not exactly to a HD audio stream because of limitations). There have been several proprietary attempts to fix this issue, but implementing those solutions costs money so many headphones ship without them.
Cabled connections are superior to wireless ones, even more so because traditional landlines had dedicated connections and as such had no need to compress anything.
The problem is that it's a protocol with a ton of warts -- having two connections, one UDP and one TCP, has been a massive headache for decades now. But it's not awful enough to get ripped up and redone.
The Asterisk VOIP platform had a really awesome protocol called IAX that was basically RTP with the two streams merged into a single UDP connection (and a bare-bones TCP-like reliability layer for the control frames inside of UDP). IAX was never meant for anything other than VOIP, but I wish it had been turned into a wholesale replacement for RTP. If that had happened, it would have been wonderful.
I guess the downside is that your neighbors could listen to whatever you're listening to but who listens to terrestrial radio in their home that is received OTA anymore?
https://en.wikipedia.org/wiki/HD_Radio
https://www.amazon.com/Home-FM-Transmitter-Whole-House/dp/B0...
Also, authorities like the FCC take a dim view of FM broadcasting beyond minuscule power levels as seen in car radio adapters due to the easy potential for intentional or unintentional abuse. For example, a 5 Watt FM transmitter sold on eBay may have you thinking it will yield a small amount of power, but spitballing some numbers: outputting it through an FM band turnstile antenna atop a high building or hill could have an Effective Radiated Power in the 7 or 8 kW range, great enough to cover a small city in a round pattern.
Your proposed devices would therefore fall into that very low power range for certification but there would need to be some sort of clear channel hopping required. That's fine in rural areas but quite difficult in large metropolitan areas.
I think this paints a better picture of the situation than any one person can provide.
It's because the replies are interpreting the op's question differently from the intent.
When op asks: "why there isn't a standard protocol?" -- he's asking "why isn't there a SingleDominantThatWorksOnEveryDevice audio protocol that lets me connect devices seamlessly?"
The op's word "standard" is just doing a lot of heavy lifting to convey a frustration with stuff not working intuitively.
The analogy is TCP/IP being a standard (SingleDominantThatWorksOnEveryDevice) network protocol that won over Apple AppleTalk, Novell SPX/IPX, and Microsoft LANMAN NETBIOS.
But many replies interpreted "standard" as "any available existing specification regardless of marketshare or device availability" -- so that's where you get various examples of audio protocols that are idiosyncratic to particular domains which are not analogous to the ubiquity and reliability of TCP/IP. E.g. the Dante audio protocol which doesn't seem relevant to op's use case.
And what's the scope of an "audio protocol"? Is it a "media query of music files" protocol like DLNA? Or is it a "virtual hardware audio device endpoint" like Bluetooth Audio?
Why isn't there a widely interoperable audio-over-the-network transmission protocol I can use, so that when I am playing sound (from a song, a video, or a game), I can hear it on an external speaker? [The scope is just a 'virtual hardware audio device endpoint' like bluetooth audio]
> ...(from) a video, or a game
So then you need low latency, like less than 10ms? So that lip-sync works, and the game is playable?
Do you need it distributed across different endpoints, also with low latency?
Does it need to run using unreliable WiFi connections, and not kill all audio just because one endpoint is under-performing?
These are all hard, hard enough that doing it well (and keeping it proprietary) makes companies like Sonos big.
OTOH, streaming mp3 from one endpoint to another is trivial.
I for one would accept latency and the audio going silent (or better, an audio indicator) if the connection isn't up-to-snuff but I don't know if other people would.
It manages it... sometimes.
I have an NVIDIA Shield I use for my video needs, and attempting to pair my Sony WH-1000XM4 headphones with it results in crappy latency and out-of-sync audio. These are both high end products from respected companies, and they work together with pretty shitty results.
Edit: I just tried this again after writing that and magically things work much better than they did before... but I stick by the general point.
In general, I'd describe the Bluetooth experience as mediocre at best.
The reason there isn't a standard (other than Sonos, or those discontinued Chromecast dongles) is that you need the following to work seamlessly:
- network attached DAC of some sort (in-speaker, or not; don't care)
- iOS app
- Android app
- the top 10 streaming services
- radio streaming directories, like TuneIn, or the open source ones.
- airplay
- Chromecast
- network / device auto discovery
- sound synchronization
- power management
- desktop apps
- NFS/cifs/etc bridge
- hdmi/fiber/??? bridge
- N.M surround sound (for N = 2, 3, 5, 7, 9 and M = 0, 1, 2)
- Some battery powered, waterproof speaker that works in direct sunlight on hot days
- Hardware distribution at places like IKEA, BestBuy, Amazon, etc.
- A healthy used hardware market
- 10+ year support lifetimes on the speakers + amps (note: discrete, cheap DAC dongles could disrupt sonos on this point)
And other things I forgot about.
I bought my first Sonos device last year, the Roam. Using it as a bluetooth speaker is fine and I love the sound and portability, but oh boy do I hate the experience of trying to use Sonos services over wifi.
Nine times out of ten, perhaps even more often, the iOS app says it can't connect and "let's fix it". If I go through the slow reconnection wizard it invariably ends up telling me to reboot my router(!?). I learned to either switch the Roam on/off a bunch of times, or kill and restart the app a bunch of times, before the app eventually decides yes, it can find the device ... only to then fail again when half an hour later I want to add something else to the queue or switch station.
Try plugging into Ethernet, or placing it close to your router. If that fixes it, then you have a root cause.
market leader? no idea. first-mover? wrong. Slim Devices were the first mover in this space with the Squeezebox (subsequently purchased by Logitech). Sonos came shortly afterwards.
I have multi-room streaming using “dumb” speakers, and copper wire (for audio and network). I control one content box and aim it at different speakers from my phone, tablet, laptop. Siri Shortcuts decouple me from waiting for an MBA to approve adding voice commands.
I know; brave flex sticking with simple wire versus going wireless.
There's not much in the way of open source solutions for using it, and not many devices you would want to buy as a consumer that use it, however.
But there are some? Can an arbitrary Linux box or Raspberry Pi be fitted with free software to receive AES67 over Ethernet from commercial solutions, or is there a catch?
So for AES67 receive, in principle no as PTP stack exists for RPI yet. You could cheat like the majority of manufacturers do and just play the audio as it arrives instead of using the timestamps. You'd also need a way of drifting the audio out clock to match the frequency of the PTP clock. If you didn't care about bit-exact audio, you can resample, though ALSA's clock measurement kind of sucks.
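To make the clock-matching step concrete, here is a small Python sketch of the bookkeeping behind the resampling option: measure how far the local audio clock drifts from the reference clock, then derive a resampling ratio from it. The function names and the 48 kHz default are assumptions for illustration; a real receiver would smooth the measurement over time and feed the ratio to an actual resampler.

```python
def drift_ppm(samples_consumed: int, reference_seconds: float,
              nominal_rate_hz: float = 48_000.0) -> float:
    """Drift of the local audio clock relative to the reference clock.

    samples_consumed: samples the audio device played during the interval.
    reference_seconds: interval length as measured by the reference (PTP) clock.
    Positive means the device runs fast and will drain its buffer.
    """
    measured_rate_hz = samples_consumed / reference_seconds
    return (measured_rate_hz - nominal_rate_hz) / nominal_rate_hz * 1e6


def resample_ratio(ppm: float) -> float:
    """Output samples to generate per input sample so the buffer stays level."""
    return 1.0 + ppm / 1e6


if __name__ == "__main__":
    # Hypothetical measurement: 480,048 samples played in 10 reference seconds.
    ppm = drift_ppm(480_048, 10.0)
    print(f"drift: {ppm:.1f} ppm, resample ratio: {resample_ratio(ppm):.6f}")
```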
I listen to my Apple devices on a knock-off add-on Bluetooth for my car with no issues. I’ve sent audio to a vast variety of non-Apple Bluetooth devices. In fact the only Apple-branded BT device I use are my AirPods.