Ask HN: Why isn't there a standard network audio protocol?

 2 years ago
source link: https://news.ycombinator.com/item?id=31027526

97 points by armagon 11 hours ago | 124 comments

Having been frustrated again in using Bluetooth from a computer to a smart speaker -- ugh! I swear connections only work half the time, and it isn't due to RF interference -- I'm wondering why there isn't a standard protocol for transmitting audio over the network. I think it would be so much easier to use.

[I'm talking about having my devices at home talk to each other. They are already on the same network.]

Edit/Addendum: Are there any streaming audio protocols that work from Mac/Windows/iOS to Amazon Echo Dots? I'm looking for a drop-in replacement for Bluetooth audio streaming, where I can play sounds on my computer (e.g., a YouTube video) and hear them on a louder speaker.

There are several standards in the professional space, Dante being the biggest for "audio over ethernet". It's packet switched, so buffering is required.

AES50 works over cat5 cables, but doesn't use ethernet; it uses a synchronous clock to transmit PCM audio. A lot of the Midas/X32 product lineup uses this to great effect.

Dante allows normal IP equipment to function as audio distribution devices, but has noticeable latency for close-quarters stuff (sound travels at roughly 0.9 ms per foot, ±10%).

AES50 has extraordinarily low latency, pretty much as good as analog, but only allows point-to-point links.

On the consumer side, RAOP existed for a while before Silicon Valley elitism infected Apple: https://en.wikipedia.org/wiki/Remote_Audio_Output_Protocol

EDIT ====

I had it in my head that RAOP was an open standard; it's not.

Regular ethernet can do much better than .9 ms / ft, but it would indeed require some buffer size tuning to a level most consumer routers/switches wouldn't allow.

Or even better: https://en.wikipedia.org/wiki/Cut-through_switching

The real problem with Dante is the buffering required; the wire speed is more than sufficient. Ethernet packets are "best effort", making latency less predictable.

Let's say your original source could be heard by the audience. The audio can travel from source, to mic, be digitized, transmitted over IP, arrive at console, be digitally mixed, then transmitted over IP to speaker, then broadcast at an unknown distance from the source. How much latency occurred there?

The trick is to get the original source wave to roughly line up with the time the amplified wave leaves the PA speaker; otherwise the audience hears weird echoes at best, and awful band-specific noise cancellation at worst.
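
To put rough numbers on the alignment problem described in this comment, here is a minimal Python sketch. The per-stage latencies are made-up placeholders, not measurements of Dante or of any real console; the only physical figure is the speed of sound, taken as roughly 1125 ft/s at room temperature (which is where the "about 0.9 ms per foot" rule of thumb comes from).

```python
# Rough latency budget for a digitally mixed live-sound chain.
# All stage latencies are hypothetical placeholders, not measured values.

SPEED_OF_SOUND_FT_PER_S = 1125.0  # approximate, air at about 20 C


def acoustic_delay_ms(distance_ft: float) -> float:
    """Milliseconds for sound to travel distance_ft through air."""
    return distance_ft / SPEED_OF_SOUND_FT_PER_S * 1000.0


def equivalent_distance_ft(latency_ms: float) -> float:
    """Distance sound covers in latency_ms of air travel."""
    return latency_ms / 1000.0 * SPEED_OF_SOUND_FT_PER_S


# Hypothetical per-stage latencies in milliseconds.
stages_ms = {
    "a/d conversion": 0.5,
    "network to console": 1.0,
    "digital mixing": 1.0,
    "network to speaker": 1.0,
    "d/a conversion": 0.5,
}

chain_ms = sum(stages_ms.values())
print(f"sound in air: {acoustic_delay_ms(1.0):.2f} ms per foot")
print(f"electronic chain: {chain_ms:.1f} ms")
print(f"equivalent to {equivalent_distance_ft(chain_ms):.1f} ft of air")
```

With these placeholder values the chain adds 4 ms, about what 4.5 ft of extra air would add. The point is only that every buffered hop widens the gap between the direct sound and the amplified sound.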

> I'm wondering why there isn't a standard protocol for transmitting audio over the network.

Bluetooth isn't the same as your WiFi network. Most of the comments here are talking about IP-based protocols that aren't relevant for Bluetooth anyway.

Bluetooth is probably the best example of a widely adopted protocol for connecting to devices and sending audio streams. The protocol isn't exactly the problem. It's the buggy implementations of Bluetooth stacks and Bluetooth software in embedded devices.

Getting it right is actually extremely difficult because Bluetooth grew in complexity to be everything to everyone. It isn't only an audio sending protocol. Almost nobody owns the entire Bluetooth stack, so it's a mix of pieces from different companies and vendors.

Apple's implementation isn't perfect, but from experience I can tell you it's 10X better than the nightmare that is Android Bluetooth. It's getting better, but for years you had to collect a lot of different Android phones so you could make your software work around all of the different quirks in each vendor's different Bluetooth stacks.

> Apple's implementation isn't perfect, but from experience I can tell you it's 10X better than the nightmare that is Android Bluetooth

Seems like I have a different experience with them. I don't have issues with Android Bluetooth; I do have issues with Apple Bluetooth.

Half of the time, my iPad can't detect that my Bluetooth devices (keyboard and audio accessory), which are already paired, are trying to connect to it. When that happens, I have to go to the Control Center to force-connect my Bluetooth devices, and half of the time the iPad will oblige and connect. Other times it just gives up and says it couldn't connect or cannot find the device (while my Bluetooth device is poking the iPad to connect). It is a hassle to use my Bluetooth devices with the iPad daily.

On the Android side, it connects instantly, even when my phone is sleeping.

It sounds like you are talking from a user perspective and the parent is talking from a vendor perspective, no?

I've had the exact same experience when trying to connect my AirPods to my iPhone SE 2nd Gen. When I still used a Samsung S8 the phone would instantly connect to my AirPods. Same experience with Bluetooth headphones.

That is the exact reason I don't like to use Bluetooth for audio devices. Nothing beats physical jack cables.

Seconded. Any Bluetooth issues I have on Android are specific to a particular device.

The cheap Anker headsets I mostly use are rock solid. I have an Android head unit (second one actually, first one was garbage) and a Bluetooth radar detector. The detector always works with my phones, and never with my head unit(s).

> The protocol isn't exactly the problem. It's the buggy implementations of Bluetooth stacks and Bluetooth software in embedded devices.

The spec is far too complicated, at some 3,000 pages, which apparently leads to faulty implementations left and right.

https://www.wired.com/story/bluetooth-complex-security-risk/

> Bluetooth isn't the same as your WiFi network. Most of the comments here are talking about IP-based protocols that aren't relevant for Bluetooth anyway.

I'm aware of that. I want audio over WiFi and audio over LAN, as Bluetooth has left me scarred.

There is a standard protocol for audio (and other realtime media) over an IP network: RTP.

But your problem doesn't have anything to do with a lack of standards; Amazon has no incentive to just let you send RTP to the Echo Dot on a port — nobody is asking for that, and they would have no control over the "experience".

Good point -- I just put in a feature request, so now someone has asked for it.

I don't see that this is any worse for the experience than the bluetooth situation. It'd certainly make the device more valuable for me. But that doesn't mean Amazon will see it as worth their time.

There are some projects like Snapcast[1] or SoundSync[2] (disclaimer: I'm the creator of Soundsync) to let multiple devices communicate together on the same network. The transmission-side isn't that complex: you choose an audio codec, transmit chunks of data and add a synchronization layer (to keep multiple outputs in sync and to correctly delay video playback to match the soundtrack). The bigger problem is building an ecosystem big enough to make it attractive. Bluetooth sucks but is everywhere.

[1] https://github.com/badaix/snapcast [2] https://github.com/geekuillaume/soundsync
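
The "chunks plus a synchronization layer" idea in this comment can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how Snapcast or Soundsync is actually implemented: the `Chunk` type, the function names, and the fixed `buffer_ms` playout delay are all hypothetical, and each receiver is assumed to already know its clock offset relative to the sender.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    seq: int            # position of the chunk in the stream
    play_at_ms: float   # scheduled playout time on the sender's clock
    data: bytes         # encoded or raw audio for this chunk


def make_chunks(pcm: bytes, chunk_bytes: int, chunk_ms: float,
                start_ms: float, buffer_ms: float) -> list[Chunk]:
    """Split audio into chunks and stamp each with a future playout time."""
    chunks = []
    for seq, offset in enumerate(range(0, len(pcm), chunk_bytes)):
        play_at = start_ms + buffer_ms + seq * chunk_ms
        chunks.append(Chunk(seq, play_at, pcm[offset:offset + chunk_bytes]))
    return chunks


def local_play_time_ms(chunk: Chunk, clock_offset_ms: float) -> float:
    """Translate the sender's playout time to a receiver's own clock.

    clock_offset_ms is (receiver clock - sender clock), assumed to be
    estimated elsewhere, for example by exchanging timestamps.
    """
    return chunk.play_at_ms + clock_offset_ms
```

Because every receiver converts the same sender-side timestamp to its own clock, outputs with different clock offsets still start a given chunk at the same real instant. The `buffer_ms` delay is what gives slow links time to deliver the chunk, and it is also the delay a video player would have to apply to stay in sync.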

I hadn't seen SoundSync before. It looks neat.

Like a lot of other people doing (or trying to do) Whole Home Audio, I'm using the Home Assistant open source platform as the central automation controller. You may want to look at creating a Home Assistant integration for SoundSync as it will expose it to the massive HA community (https://developers.home-assistant.io/docs/development_index/).

Multi-output audio is one thing, but for me, something similar to Spotify Connect (one master player, either elected or dedicated, with the others acting as remote controls for it) is more important.

I'm boycotting Spotify, so I'm looking for something for SoundCloud, Deezer, or YouTube Music.

Tbh, skip Deezer, as they actively refuse to create something similar to Spotify Connect. IMO this is the USP of Sonos: it acts as Spotify Connect for all services.

What’s the latency like on Soundsync compared to Snapcast?
Ooh, SoundSync sounds awesome (no pun intended).
That looks really neat, I'm not this far in my home automation system dreams (yet), but as I get closer to settling on how I will communicate back and forth to each room, I may need to take a closer look here.
I was thinking about something like Dante, AES50, AES67, AVB, Ultranet, Ravenna or any other of those professional Audio over Ethernet standards out there.

Those work and are used in live venues and studios. The hardware might, however, be out of reach in terms of price for the typical user, and it certainly won't work with your smart speaker.

Does that smart speaker have line in (3.5 mm TRS)? If so you could just send your audio analog over the ethernet cable and build an ethernet to TRS adapter on the other end : ) For longer distances balanced line drivers might be needed, tho.

But shielded Ethernet cables work surprisingly well for analog audio purposes, especially if you send balanced signals through them. If you transformer-balance them you even get galvanic isolation for free.

Probably get flamed for this, but PulseAudio is good enough for IP networks and handles delay calculation pretty well. Used via multicast it's reasonable, but a lack of ecosystem means non-Linux support is poor and control is basically nonexistent. I did operate PulseAudio as my home audio for TV/PlayStation/phone audio for a time; with some extras like casting receivers it's almost useful, but not convenient (there is a gap here someone could fill).
Yeah, I've used pulseaudio to play same music on multiple computers and their speakers in multiple rooms, and it worked well enough for that: however, that won't solve the issue for the original poster who wants their music to go to a "smart" speaker.
there is already AVB and DANTE in the pro audio world. you are not aware of it because you probably are not in the recording/audio/music business.

bluetooth sucks because it was invented by a bunch of guys in suits and consumer electronics companies rather than people who understand latency, performance etc. i designed my own protocol in the 2.4ghz band and wrote firmware and middleware for it and it deals with all the weaknesses of BT.

BT should have been designed by those who design the products and applications and deal first hand with end users.

BT was designed to be a general purpose peer to peer wireless communication protocol.

It was not designed to solely carry audio. It just sort of morphed into being primarily used as an audio exchange format (because it's "good enough"). A little bit like how USB morphed into a peripheral bus even though it was designed to be more all encompassing (USB Ethernet, for example). In fact, the USB protocol is somewhat mucked up by the fact that it was designed to be a network instead of a more direct connection.

Now it's actually used in this way with USB4/Thunderbolt 4.
yes, general purpose, another expression for mediocre or garbage.

i think BT was first designed for exchanging photos. so mass storage transfer. it should have been designed for streaming latency sensitive data like audio first, and then the "easier" scenarios could have been built on top of that.

at least with USB there was the common sense to include ISO transfers although drivers for that in OSes happened relatively late and OS vendors have ignored the standard for many years, requiring the purchase of analyzers.

in that regard there is similarity with BT but with USB it seems easier to come up with a solution as a firmware/driver/application developer. at least in my experience.

Pro stuff gives a glimpse of what we'd have if we lived in a perfect world: SDI (and HD-SDI) would have been the de facto standard for video everywhere.
It's almost like including a BNC connector automatically rules a product out for consumer use, as if there's some ridiculous royalty payment owed or something. I love BNC over every other type of connection for a coax cable. Nothing in the consumer world makes as sure of a connection.
Before entering the pro A/V industry I used to equate BNC with "ewww, old as dirt."

I then came to love the simplicity and reliability of SDI. Nowadays I work in uncompressed ST2110, and while there are many advantages of network based video and audio, paying $1,000 for a QSFP to handle just a few streams is a hard pill to swallow!

Dante is great but sadly it's proprietary. Low latency and allows you to replace a loom of analog cable with a single ethernet run.

There's Ravenna and AES67 (which I believe Dante supports), which are open standards but are not as common as Dante.

Dante supports AES67 in a degraded mode (multicast only, 1ms minimum latency, 48kHz only, at least if you're not using Dante Domain Manager).
Bluetooth is for sure designed by committee as no sane person would intertwine software protocols with wire protocols. But here we are with an endless myriad of profile/protocol mixes all doing essentially the same thing of moving bytes back and forth through the air but with different levers for each.
USB suffers similarly but it’s not as bad IMHO
Some people might say DLNA, but trust me you want absolutely nothing to do with that disaster of a protocol and tech. I have tried off and on for _15 years_ to use different DLNA tech and every single time it ends in total disappointment and failure.
I've got an external HDD with battery and its own small WiFi, it makes its contents available through DLNA. It works great, I usually connect through VLC or a gaming console.
I'm using DLNA to play music from my laptop at the moment (pulseaudio sink, opus encoded) to a raspberry pi (gmediastreamer) that uses pulseaudio to upmix to 5.1 and play on a usb soundcard. It works, and the quality is good, but the lag is crap and I had to wrap everything in crappy scripts that would fix everything if it died. It's been in place for a year but I'd love to ditch it.
clearly (given all the other responses), there are a bunch of different conflicting requirements which lead to different protocols.

as for your bluetooth issues, PC bluetooth is a mess.

some of bluetooth's messiness comes from having the higher level elements of the stack designed 20+ years ago to operate on microcontrollers of that era. they've got N different audio profiles because the hardware it was expected to operate on originally would've been hard pressed to handle a single audio profile that could negotiate the gamut of use cases.

Specifically for computers to smart speakers, I use AirPlay 1, but this works better from Windows with a 3rd-party app than from iOS or MacOS—the 3rd party app is perfectly happy to play to as many endpoints as I like, while Apple will only transmit to one endpoint at a time if it's an AirPlay 1 device.

From my Windows 10 PC, TuneBlade AirPlay streaming provides a great experience:

I can stream anything that is playing on the PC to any AirPlay device on my LAN, and all the playback devices will be in perfect audio sync.

AirPlay 1 devices on my network include an AppleTV, Apple HomePod Minis, Nexum Airplay receivers attached to powered speakers, and DAPs with Airplay reception.

There is a significant buffer delay—about 2 seconds—that messes with video streaming. TuneBlade has the ability to stream video to VLC with synced audio, but doesn't support other video streaming endpoints. There is a bufferless mode with no delay, but it doesn't work well on my network.

There is, it's called AES67. It just isn't used much in consumer products. The acronym to google is AoIP ("audio over IP")
There is Airplay 1, which is the only widely supported protocol I'm aware of. See for example https://github.com/mikebrady/shairport-sync.

There is also DLNA, which is actually a standard. I think it's rarely supported for push audio streaming since the protocol is poorly specified.

> The good thing about standards is that there are so many to choose from.

― Andrew S. Tanenbaum

But really, RTP is the closest thing. Outside of the consumer space nearly everything is RTP + some out-of-band signalling protocol. It's low latency, designed to be multicast, has RFCs for every codec under the sun, etc.
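
For readers who have not looked at RTP, here is a minimal Python sketch of the 12-byte fixed header defined in RFC 3550, packed in front of an audio payload. The function names are illustrative, and payload type 96 is just an arbitrary pick from the dynamic range; a real sender would also need signalling (for example SDP) to tell the receiver which codec that number means.

```python
import struct


def build_rtp_packet(payload: bytes, seq: int, timestamp: int, ssrc: int,
                     payload_type: int = 96, marker: bool = False) -> bytes:
    """Pack the 12-byte fixed RTP header (RFC 3550) followed by the payload."""
    version = 2
    byte0 = version << 6  # no padding, no header extension, zero CSRC entries
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload


def parse_rtp_header(packet: bytes) -> dict:
    """Unpack the fixed header fields from the first 12 bytes of a packet."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }
```

The resulting bytes would typically be sent as one UDP datagram per packet. The sequence number lets the receiver detect loss and reordering, and the timestamp (counted in samples) is what a receiver uses to schedule playout.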

Another question: why aren't my Bluetooth headphones better at buffering larger amounts of data? I should be able to load a complete song so it plays without skipping when there is interference.
Because you might be unhappy if there were 30 second latency on a bluetooth voice call, and there would be a whole lot of overhead in an already complex protocol to enable buffered audio instead of live audio.
Imagine watching a movie with this. I believe apple actually does something like this, slightly delaying the video playback so the AirPods can buffer and the video stays in sync. But this only works if the video player and headphones can communicate.
In fact, Apple aren’t doing this alone. It’s a pretty common feature of video players. I’m pretty sure even VLC supports this.
Because that adds a massive amount of latency, something that is a no. 1 complaint for Bluetooth headphones.
This will change (hopefully) soon with Bluetooth LE audio!
The protocol doesn’t support that - it’s streaming audio.
Why wouldn't a streaming audio protocol allow for that?
I don’t know if you understand what ‘streaming’ means? Streaming doesn’t support large buffering… because that’s not streaming.

But more broadly, not everything can be in scope. At the time of design having 10 MB and a decompressor in earbuds wasn’t realistic.

But blaming your headphones is ignorant - the headphones implement a protocol. They don’t have control over the protocol.

> I don’t know if you understand what ‘streaming’ means? Streaming doesn’t support large buffering… because that’s not streaming.

That's not how those words work.

Twitch is streaming, right? Under certain flaky playback conditions it can buffer a full minute. Which is 50 megabytes at full quality.

The headphones and earbuds could easily and realistically incorporate a buffer today. How’s that being ignorant?
To be clearer:

Yes, the headphones could store up N seconds of audio data ahead of playback. However, the value of buffering is that if you miss a chunk of data, you can tell the sender "give me that again". Protocols that allow buffering account for that by giving the data sink a means to tell the source "send me chunk F again". Bluetooth A2DP and other streaming protocols, because they prioritize constant latency over data reliability, don't have a means to allow that; the source keeps sending new chunks even if the sink didn't receive one.

As a result, there would be no value in headphones storing up a bunch of audio before playback; if a chunk is missing, there are no means to remedy that in the protocol, so it will still be missing when you play it back.
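
The argument in this comment can be shown with a toy model in Python. It does not use any real Bluetooth API and is not a model of A2DP itself; the function names are hypothetical, and the resend case assumes the resent chunk is never lost a second time.

```python
def gaps_without_retransmit(n_chunks: int, lost: set[int]) -> list[int]:
    """Chunks missing at playback when the source never resends anything.

    Buffer depth does not appear here at all: waiting longer before
    playback cannot bring back a chunk that was never received.
    """
    return [seq for seq in range(n_chunks) if seq in lost]


def gaps_with_retransmit(n_chunks: int, lost: set[int],
                         buffer_chunks: int,
                         resend_delay_chunks: int) -> list[int]:
    """Chunks still missing when the sink may ask the source to resend.

    A resent chunk arrives resend_delay_chunks chunk-times after the
    original was due, so it is usable only if the buffer holds at least
    that much audio ahead of playback.
    """
    if buffer_chunks >= resend_delay_chunks:
        return []
    return [seq for seq in range(n_chunks) if seq in lost]
```

For example, `gaps_without_retransmit(10, {3, 7})` still reports chunks 3 and 7 as missing however long playback is delayed, while `gaps_with_retransmit(10, {3, 7}, buffer_chunks=5, resend_delay_chunks=2)` reports none. A buffer only helps when it is paired with some way to recover lost data.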

The protocol doesn’t support that. The headphones can do nothing about that.
In theory, headphones could store music in a buffer instead of playing it, and then delay playing it by say 2 minutes (or 5 seconds or whatever). Even if existing BT profiles preferred losing quality, you could have BT headsets that pretend to be storage devices and accept file uploads and which then play them after they've been completely received. Ideally though, you'd use one of the BT profiles that already provide guaranteed lossless audio transmission (or develop one if there's none). In a sense, BT profiles are protocols within a protocol, so you can develop almost anything you want (ofc, you need devices to support those profiles too).

Of course, the experience of clicking play on a song and having it only start a number of seconds later is not something that'd sell particularly well, I guess. And then you'd have to renegotiate the BT profile if a call comes in that has to happen live. And switching back to the song will have another big delay.

So the upload speed per song is real-time? Come on - this conversation has turned silly.
BT 3.0 offered up to 24Mbps bandwidth, with other variants offering up to 3Mbps. CD quality music is 1.4Mbps. If you cannot come up with an error correcting scheme that will let you upload music in real time with those parameters, what parameters would you need? (And sure, these rates are hard to achieve with BT in real world because of varying distance and interference, and yes, CD quality music is not the highest quality encoding you can use, but you can achieve similar or better quality with less bandwidth too)

And let's not forget this was a discussion of buffering. A buffer of 5 minutes (50MiB) buys you 5 minutes of not having to be real-time, or to be slowly lagging behind — if that covers 3h of continuous listening time, you probably covered 99% of uses where latency is not a big deal anyway (like playing music — calls and movies are another game).

I already acknowledged the practical UX problems with relying on buffering alone, but it doesn't make much sense to say it can't be done because of the protocol either.
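
The figures quoted in this comment can be checked with a few lines of Python. The sketch assumes uncompressed stereo PCM at CD parameters (44.1 kHz, 16 bits, 2 channels); the function names are illustrative.

```python
def pcm_bitrate_bps(sample_rate_hz: int, bits_per_sample: int,
                    channels: int) -> int:
    """Bit rate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def buffer_size_bytes(bitrate_bps: int, seconds: int) -> int:
    """Bytes needed to hold `seconds` of audio at the given bit rate."""
    return bitrate_bps * seconds // 8


cd_bps = pcm_bitrate_bps(44_100, 16, 2)             # CD-quality stereo
five_minutes = buffer_size_bytes(cd_bps, 5 * 60)    # five-minute buffer

print(f"CD quality: {cd_bps / 1e6:.2f} Mbps")
print(f"5 minute buffer: {five_minutes / 2**20:.1f} MiB")
```

This gives about 1.41 Mbps and about 50.5 MiB, in line with the "1.4Mbps" and "50MiB" figures above. A compressed codec would shrink both numbers.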

But the protocol just doesn't support sending audio faster than it's supposed to be played. The sender doesn't know what to send to do what you want. There's no mechanism to do what you want for the headphones.
Sure, the current protocol doesn't support it.

But wanting a better protocol isn't 'silly'.

Why can’t the headphones buffer the sound for a second? Why would it need protocol support? I’m thinking something like anti-disk-skipping on portable CD players.
I only suggested a buffer, not one of an entire song length, so maybe you’ve mistaken me for someone. What I’m trying to figure out is why we can’t apply the same concept as in the anti skipping technology to Bluetooth cutouts.
> But blaming your headphones is ignorant - the headphones implement a protocol. They don’t have control over the protocol.

If the headphones are implementing a protocol that isn't suitable for purpose, there is very good reason to blame the headphones. What's the point in having headphones if you need to be in a Faraday cage to use them?

If you buy Bluetooth headphones and complain they don’t buffer full songs then that’s your problem, not the headphones.

> What's the point in having headphones if you need to be in a Faraday cage to use them?

Surely it’s the opposite? They don’t work in a Faraday cage, because they’re streaming and need to be connected.

> If you buy Bluetooth headphones and complain they don’t buffer full songs then that’s your problem, not the headphones.

What is the use case for headphones that cut out every couple of seconds?

> Surely it’s the opposite? They don’t work in a Faraday cage, because they’re streaming and need to be connected.

In this case the broadcast source would be in the Faraday cage along with the listener.

I’ll link to https://news.ycombinator.com/item?id=29514876 here, it may have valuable insight.

I can use my hdmi ARC soundbar from my computer. We live in a backwards world.

Making a Bluetooth alternative, I think apple is the only company that can pull it off. But they will absolutely do it such that only their headphones will work with Apple devices. And then they will license other manufacturers to be able to use their tech to connect to apple devices.

Otherwise, I see a huge opportunity for a consortium to develop a new hardware and software stack for high quality low latency audio and bring them as a package to their products. I would love a completely wireless Dolby Atmos like setup with no central receiver, your mobile device itself being the av receiver. New speakers from any manufacturer and form factor could be added wherever you want as you buy them. Calibration according to your speaker placement would be wireless and automatic.

> Making a Bluetooth alternative, I think apple is the only company that can pull it off.

Microsoft did it with Xbox Wireless Protocol which is used to transfer input from controllers and high quality sound without latency.

But, yes. It only works on Xbox or on Windows with an adapter and you can count the manufacturers using it on one hand. Microsoft being the thumb.

Ha! Came to post this...I assumed I was the only one to remember it. I got it working when it was part of NCDWare for the NCD X terminals (mostly on the later 700-series terms). Worked, though the audio hardware on the terminals was basic, so it wasn't exactly an audiophile experience. Very clever work, tho.
I remember it from the times when you had ESD (the Enlightened Sound Daemon) running on Linux, and this in addition. At least that was the default on some Red Hat systems, IIRC?
There is. You build a network of analog cables. Use a sound board to 'switch' the channels. This leads to headphones on 3.5mm jacks, and one or more zones of stereos connected by RCA. This 'network' is as solid in 2022 as it was in 1975.
Thankfully someone brought this up. Wire is always better than wireless. I wish the world would go back to everything being wired; it's more secure as well.
If you want to turn your home into a TV/radio station, have a look at Audio Video Bridging[1]. It requires special hardware, but once you're set up, devices can reserve bandwidth for their streams, which switches will prioritize over other Ethernet traffic, thus ensuring 100% reliability and sub-2ms latency across 7 hops.

[1] https://en.m.wikipedia.org/wiki/Audio_Video_Bridging

I've run into this. I would like to use my Android phone as speaker and microphone for my laptop, so I can walk around without leaving my call. For some reason this is impossible: Bluetooth supports it, of course, and so does PulseAudio on my laptop, but an Android phone will only act as the host, not the speaker/mic.

I found SnapCast which lets me send audio from laptop to phone (with huge latency) but not the other way (phone mic to laptop).

If you don't like audio over ethernet do not even think about doing high quality video :-D

Hate to say it, but you are probably better off getting something other than Echo Dots for music. Too bad Google discontinued the Chromecast Audio - I love mine. The biggest plus for me is that once you have a compatible app (such as Spotify), streaming is done entirely by the Chromecast Audio from the internet itself instead of continuing to use your phone's battery and Wi-Fi.

I too found bluetooth to be unreliable.

For that reason, I have extensively worked with pulseaudio over network. There is no UI that works for this. NTP for some reason is important which seems like bad design to me. zeroconf doesn't work at all.

Once you get it working... don't dare change anything. It will break in inexplicable ways that drive you up a wall.

Firewire was supposed to be an AV standard that allowed connecting anything to anything and completely eliminate the RCA cables etc.

And then the RIAA and MPAA discovered the plan and killed it good.

And while we’re on the subject, why is cell phone audio so horrible? It is worse than that delivered by the cast metal telephones with rotating dials of my youth.
It doesn't need to be. With VoLTE the sound quality is usually pretty crisp in my experience. It all depends on the carrying technology, bandwidth, compression parameters and codecs used. EVS supports up to 128kbps audio streams, which makes voice data come across crystal clear, and that's a technology from 8 years ago.

One problem is the fact that the codec needs to be negotiated, and if you're unlucky with codec compatibility, both callers fall back to crappy old codecs. Then there are tons of options for audio profile selection depending on requirements and bandwidth available (see https://en.wikipedia.org/wiki/Enhanced_Voice_Services for an overview), which makes it difficult to say what causes your specific problems.

Without VoLTE, you're falling back to 3G audio, probably AMR or AMR-WB, which is quite old and doesn't compress as well as modern standards.

Unless you mean the headphone profile for Bluetooth headsets: that's terrible because the standard is ancient, back when Bluetooth had even less capacity for low latency data transfer, and the codec is suboptimal making the situation even worse. There are better codecs out there, and some headsets will support what some call mSBC, which massively improves the audio quality (but not exactly to a HD audio stream because of limitations). There have been several proprietary attempts to fix this issue, but implementing those solutions costs money so many headphones ship without them.

Most likely because those landline phones transmitted via a copper cable while mobile phones send the audio via a heavily compressed and shared wireless connection that isn't exactly all that reliable.

Cabled connections are superior to wireless ones, even more so because traditional landlines had dedicated connections and as such had no need to compress anything.

Analog-only phones had great quality because they didn't sample voice. Once phone systems were changed to digital backbones, it became necessary to sample voices, and the sampling rates were chosen for efficiency using the tech of the time: usually 8 kHz sampling, which captures audio only up to about 4 kHz. While there are better quality standards today, many phone systems will fall back on old standards.
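
For reference, classic digital telephony (G.711) samples at 8 kHz with 8 bits per sample. By the Nyquist limit that can carry audio only up to 4 kHz, at 64 kbit/s per call. The short Python sketch below works through the arithmetic; the 16 kHz line is included only as a comparison with wideband codecs such as AMR-WB, and the function names are illustrative.

```python
def nyquist_limit_hz(sample_rate_hz: int) -> float:
    """Highest audio frequency a given sampling rate can represent."""
    return sample_rate_hz / 2


def pcm_bitrate_bps(sample_rate_hz: int, bits_per_sample: int,
                    channels: int = 1) -> int:
    """Bit rate of an uncompressed (or companded) PCM stream."""
    return sample_rate_hz * bits_per_sample * channels


# G.711-style narrowband telephony: 8 kHz sampling, 8 bits per sample.
print(f"narrowband limit: {nyquist_limit_hz(8_000):.0f} Hz")
print(f"narrowband rate:  {pcm_bitrate_bps(8_000, 8)} bit/s")

# Wideband voice codecs sample at 16 kHz, doubling the usable bandwidth.
print(f"wideband limit:   {nyquist_limit_hz(16_000):.0f} Hz")
```

Consonants such as "s" and "f" carry much of their energy above 4 kHz, which is one reason narrowband calls sound muffled.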
Does your phone not support VoLTE? You might have to explicitly turn it on. Sounds great on my phone.
To pile on further, you may have better success getting a small device (like a Pi) and connecting its audio out to your speaker.
I'm using Amazon Alexa Echo Dots. I really wish they had a line-in connection, as it, too, would make life much easier when I want to play audio from a device.
The PulseAudio protocol supports network audio.
At work: why do we have 9 different ways to identify a physical location? Because there are 5 different teams that need to do that, and our team hasn't gotten around to re-inventing the wheel like the other teams have.
There sort of is, RTP/RTSP, and in fact it's been around since the earliest pre-web days of the Internet.

The problem is that it's a protocol with a ton of warts -- having two connections, one UDP and one TCP, has been a massive headache for decades now. But it's not awful enough to get ripped up and redone.

The Asterisk VOIP platform had a really awesome protocol called IAX that was basically RTP with the two streams merged into a single UDP connection (and a bare-bones TCP-like reliability layer for the control frames inside of UDP). IAX was never meant for anything other than VOIP, but I wish it had been turned into a wholesale replacement for RTP. If that had happened, it would have been wonderful.

audio signal is enough. jackd is nice as an option.
The answer is DRM. In fact, almost any audio/video standard attempts have to address the elephants in the room: Disney, Warner Media, Universal Music Group, etc, and they all require DRM.
Is there DRM added to bluetooth audio connections?
What about HDRadio? A home scale FM broadcast could accomplish this efficiently and cheaply. Each speaker would just need an FM receiver.

I guess the downside is that your neighbors could listen to whatever you're listening to but who listens to terrestrial radio in their home that is received OTA anymore?

https://en.wikipedia.org/wiki/HD_Radio

https://www.amazon.com/Home-FM-Transmitter-Whole-House/dp/B0...

iBiquity (now owned by DTS) has never, to the best of my knowledge, open sourced their HDC codec, nor has it been reverse-engineered. To me that's a show-stopper towards any kind of widespread buy-in of HD Radio beyond commercial stations.

Also, authorities like the FCC take a dim view of FM broadcasting beyond minuscule power levels as seen in car radio adapters due to the easy potential for intentional or unintentional abuse. For example, a 5 Watt FM transmitter sold on eBay may have you thinking it will yield a small amount of power, but spitballing some numbers: outputting it through an FM band turnstile antenna atop a high building or hill could have an Effective Radiated Power in the 7 or 8 kW range, great enough to cover a small city in a round pattern.
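A rough way to sanity-check spitballed ERP numbers like these: ERP is transmitter power multiplied by antenna gain (relative to a half-wave dipole), less feedline loss. The sketch below assumes only that textbook relation; the function names and the zero-loss default are illustrative and not from the comment.

```python
import math


def erp_watts(tx_power_w, gain_dbd, line_loss_db=0.0):
    """Effective radiated power for a given antenna gain (dBd) and feedline loss (dB)."""
    return tx_power_w * 10 ** ((gain_dbd - line_loss_db) / 10)


def gain_needed_dbd(tx_power_w, target_erp_w):
    """Net gain in dB required to turn tx_power_w into target_erp_w."""
    return 10 * math.log10(target_erp_w / tx_power_w)


# Net gain needed to go from a 5 W transmitter to 7.5 kW ERP.
needed = gain_needed_dbd(5, 7500)
```

With these definitions, going from 5 W to 7.5 kW works out to about 31.8 dB of net gain.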

Your proposed devices would therefore fall into that very low power range for certification but there would need to be some sort of clear channel hopping required. That's fine in rural areas but quite difficult in large metropolitan areas.

It's a bit funny that there are already a bunch of comments that are stating "There already is, it's called 'X'", each with a different value for X.

I think this paints a better picture of the situation than any one person can provide.

>It's a bit funny that there are already a bunch of comments that are stating "There already is, it's called 'X'", each with a different value for X.

It's because the replies are interpreting the op's question differently from the intent.

When op asks: "why there isn't a standard protocol?" -- he's asking "why isn't there a SingleDominantThatWorksOnEveryDevice audio protocol that lets me connect devices seamlessly?"

The op's word of "standard" is just doing a lot of heavy lifting to convey a frustration with stuff not working intuitively.

The analogy is TCP/IP being a standard (SingleDominantThatWorksOnEveryDevice) network protocol that won over Apple AppleTalk, Novell SPX/IPX, and Microsoft LANMAN NETBIOS.

But many replies interpreted "standard" as "any available existing specification regardless of marketshare or device availability" -- so that's where you get various examples of audio protocols that are idiosyncratic to particular domains which are not analogous to the ubiquity and reliability of TCP/IP. E.g. the Dante audio protocol which doesn't seem relevant to op's use case.

And what's the scope of an "audio protocol"? Is it a "media query of music files" protocol like DLNA? Or is it a "virtual hardware audio device endpoint" like Bluetooth Audio?

Yes, that's what I meant.

Why isn't there a widely interoperable audio-over-the-network transmission protocol I can use, so that when I am playing sound (from a song, a video, or a game), I can hear it on an external speaker? [The scope is just a 'virtual hardware audio device endpoint' like bluetooth audio]

As someone who works on the code for a competitor of Sonos, the answer is that it is hard to do, depending on your requirements.

> ...(from) a video, or a game

So then you need low latency, like less than 10ms? So that lip-sync works, and the game is playable?

Do you need it distributed across different endpoints, also with low latency?

Does it need to run using unreliable WiFi connections, and not kill all audio just because one endpoint is under-performing?

These are all hard, hard enough that doing it well (and keeping it proprietary) makes companies like Sonos big.

OTOH, streaming mp3 from one endpoint to another is trivial.
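To put the "less than 10ms" figure in concrete terms, here is a small sketch of how much audio a buffer can hold before buffering alone uses up the latency budget. The 48 kHz sample rate and the function names are assumptions for illustration; network transit and codec delay would come on top of this.

```python
def buffer_latency_ms(frames, sample_rate_hz):
    """Delay added by holding `frames` audio frames before playback."""
    return 1000.0 * frames / sample_rate_hz


def max_frames_for_budget(budget_ms, sample_rate_hz):
    """Largest buffer (in frames) that fits inside a latency budget."""
    return int(budget_ms * sample_rate_hz / 1000)


# Assumed example: 48 kHz playback with a 10 ms end-to-end budget.
frames = max_frames_for_budget(10, 48000)
delay = buffer_latency_ms(frames, 48000)
```

At 48 kHz a 10 ms budget is only 480 frames, which leaves very little room to ride out WiFi jitter. That is one way to see why the low-latency case is so much harder than streaming a file with several seconds of buffer.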

True enough. Somehow bluetooth audio manages these issues.

I for one would accept latency and the audio going silent (or better, an audio indicator) if the connection isn't up-to-snuff but I don't know if other people would.

> Somehow bluetooth audio manages these issues.

It manages it... sometimes.

I have an NVIDIA Shield I use for my video needs, and attempting to pair my Sony WH-1000XM4 headphones with it results in crappy latency and out-of-sync audio. These are both high end products from respected companies, and they work together with pretty shitty results.

Edit: I just tried this again after writing that and magically things work much better than they did before... but I stick by the general point.

In general, I'd describe the Bluetooth experience as mediocre at best.

In true HN fashion, first-mover / market-leader Sonos isn't even mentioned yet.

The reason there isn't a standard (other than Sonos, or those discontinued Chromecast dongles) is that you need the following to work seamlessly:

- network attached DAC of some sort (in-speaker, or not; don't care)

- iOS app

- Android app

- the top 10 streaming services

- radio streaming directories, like TuneIn, or the open source ones.

- airplay

- Chromecast

- network / device auto discovery

- sound synchronization

- power management

- desktop apps

- NFS/cifs/etc bridge

- hdmi/fiber/??? bridge

- N.M surround sound (for N = 2, 3, 5, 7, 9 and M = 0, 1, 2)

- Some battery powered, waterproof speaker that works in direct sunlight on hot days

- Hardware distribution at places like IKEA, BestBuy, Amazon, etc.

- A healthy used hardware market

- 10+ year support lifetimes on the speakers + amps (note: discrete, cheap DAC dongles could disrupt sonos on this point)

And other things I forgot about.

> In true HN fashion, first-mover / market-leader Sonos isn't even mentioned yet.

I bought my first Sonos device last year, the Roam. Using it as a bluetooth speaker is fine and I love the sound and portability, but oh boy do I hate the experience of trying to use Sonos services over wifi.

Nine times out of ten, perhaps even more often, the iOS app says it can't connect and "let's fix it". If I go through the slow reconnection wizard it invariably ends up telling me to reboot my router(!?). I learned to either switch the Roam on/off a bunch of times, or kill and restart the app a bunch of times, before the app eventually decides yes, it can find the device ... only to then fail again when half an hour later I want to add something else to the queue or switch station.

Interesting. My experience with the Sonos app has been a revelation in GOOD audio networking experiences. It just works. I download the app - connect to a play 1/3/5 near me and stream music. All in the space of about 2 minutes. Nothing else I've tried comes close to this experience.
I've had a (S1) sonos for many years. That only happens to me if the speakers (or phone) are repeatedly falling off the WiFi network.

Try plugging into Ethernet, or placing it close to your router. If that fixes it, then you have a root cause.

> In true HN fashion, first-mover / market-leader Sonos isn't even mentioned yet.

market leader? no idea. first-mover? wrong. Slim Devices were the first mover in this space with the Squeezebox (subsequently purchased by Logitech). Sonos came shortly afterwards.

In true HN fashion they ignore that people were pirating and streaming content to multiple rooms before a big company brand caught onto the idea and profited from it.

I have multi-room streaming using “dumb” speakers, and copper wire (for audio and network). I control one content box and aim it at different speakers from my phone, tablet, laptop. Siri Shortcuts decouple me from waiting for an MBA to approve adding voice commands.

I know; brave flex sticking with simple wire versus going wireless.

In the pro world there were Dante, Ravenna, and to a lesser extent AVB. People didn't like that nothing worked with each other. The AES got the AoIP manufacturers together, standardized a union of these technologies, and called it AES67. Now most pro gear is compatible and it is in widespread use in (mostly) large audio installations (think stadiums/venues, broadcast, theme parks, etc).

There's not much in the way of open source solutions to using it, and not many devices you would want to buy as a consumer that use it, however.

> There's not much in the way of open source solutions to using it

But there are some? Can an arbitrary Linux box or Raspberry Pi be fitted with free software to receive AES67 over Ethernet from commercial solutions, or is there a catch?

Ooooh something I know quite a lot about:

So for AES67 receive, in principle no, as no PTP stack exists for the RPi yet. You could cheat like the majority of manufacturers do and just play the audio as it arrives instead of using the timestamps. You'd also need a way of drifting the audio output clock to match the frequency of the PTP clock. If you don't care about bit-exact audio, you can resample, though ALSA's clock measurement kind of sucks.
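To illustrate why the clock matching matters, here is a rough sketch of the arithmetic: how far a local audio clock drifts from the sender's clock, and how long a buffer lasts before it under- or overruns if nothing corrects for it. The sample counts, the 480-frame buffer, and the function names are made up for illustration and are not taken from AES67 or ALSA.

```python
def drift_ppm(sender_samples, local_samples):
    """Local clock error in parts per million (+ means local runs fast)."""
    return (local_samples - sender_samples) / sender_samples * 1e6


def resample_ratio(sender_samples, local_samples):
    """Factor to stretch incoming audio by so it matches the local clock."""
    return local_samples / sender_samples


def seconds_until_xrun(buffer_frames, sample_rate_hz, ppm):
    """Time before uncorrected drift eats (or overfills) the buffer."""
    return buffer_frames / (sample_rate_hz * abs(ppm) * 1e-6)


# Assumed measurement: sender produced 48,000,000 samples while the
# local DAC consumed 48,000,480 over the same interval.
ppm = drift_ppm(48_000_000, 48_000_480)
ratio = resample_ratio(48_000_000, 48_000_480)
lifetime = seconds_until_xrun(480, 48000, ppm)
```

Even a 10 ppm mismatch drains a 480-frame buffer in roughly a quarter of an hour, so a receiver either has to steer its output clock or resample by a ratio like the one above.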

There's a kernel module for handling the networking connection and exposing it as an alsa device: https://bitbucket.org/MergingTechnologies/ravenna-alsa-lkm/s..., and some FOSS stuff for managing the discovery/control layer. It's not as simple as plugging in a USB device and selecting your i/o, though.
That kernel module's userland part has an EULA that makes it very much non-free. Is it required, or do the FOSS alternatives work with the kernel module?
That's because "sending audio over a network" isn't a single self-contained problem but a huge area which requires lots of different approaches depending on the specific use case.
You can't even send anything from Apple to non-Apple devices by Bluetooth. Why do you expect audio would work?
I don’t understand what you’re saying here.

I listen to my Apple devices on a knock-off add-on Bluetooth adapter for my car with no issues. I’ve sent audio to a vast variety of non-Apple Bluetooth devices. In fact the only Apple-branded BT devices I use are my AirPods.

