State of screen reading reading on desktop Linux – Sam Thursfield

source link: https://samthursfield.wordpress.com/2023/06/07/state-of-screen-reading-reading-on-desktop-linux/

Reading a computer screen wears out your delicate eyeballs. I would like the computer to read some web pages aloud for me so I can use my ears instead.

Here’s what I found out recently about the available text-to-speech technology we have on desktop Linux today. (This is not a comprehensive survey, just the result of some basic web searches on the topic).

The Read Aloud browser extension

Read Aloud is a browser extension that can read web pages out for you. That seems like a nice way to take a break from screen-staring.

I tried this in Firefox and it worked, but it sounded like a robot made from garbage. It wasn’t pleasant to listen to articles like that.

Read Aloud supports some for-pay cloud services that probably sound better, but I want TTS running on my laptop, not on Amazon or Google’s servers.

Speech Dispatcher

The central component for text-to-speech on Linux is Speech Dispatcher. Firefox uses Speech Dispatcher to implement the TTS part of the Web Speech API, and this is what the Read Aloud extension uses to read web pages.

You can test Speech Dispatcher on your system using the spd-say tool, e.g.

spd-say "Let's see how text-to-speech works"

You might hear the old-skool espeak-ng voice robotically reading out the text. espeak was incredible technology when it was released in 1995 on RISC OS as a 7KB text-to-speech engine. It sounds a little outdated in 2023.
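If you want to poke at Speech Dispatcher a bit more, `spd-say` can also show you which backends and voices are wired up. A short sketch (flags taken from `spd-say --help`; the module and voice names shown are examples and depend on what is installed on your system):

```shell
# List the available output modules (synthesis backends), e.g. espeak-ng.
spd-say -O

# List the synthesis voices offered by the current output module.
spd-say -L

# Choose a module, language and speaking rate explicitly
# (rate ranges from -100, slowest, to +100, fastest).
spd-say -o espeak-ng -l en -r -20 "A slower espeak voice"
```

Swapping in a different output module here is the hook that later makes it possible to route Speech Dispatcher to a better-sounding engine.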

Coqui TTS

Mozilla did some significant open research into text-to-speech as part of the “Mozilla TTS” project. After making great progress they stopped development (you may have heard this story before), and the main developers set up Coqui AI to continue working on the project. Today this is available for you as Coqui TTS.

You can try it out fairly easily via a Docker image; the instructions are in the README file. I spent some time playing with Coqui TTS and learned a lot about modern speech synthesis, which I will write up separately.
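For the curious, here is roughly what trying it out looks like. This is a sketch based on the Coqui TTS README; the image tag and the model name are examples, and the available models change over time:

```shell
# Run the demo server from the prebuilt CPU image, then open
# http://localhost:5002 in a browser to type text and hear it.
docker run --rm -p 5002:5002 ghcr.io/coqui-ai/tts-cpu

# Alternatively, install the Python package (pip install TTS) and use the
# command-line tool to synthesize a WAV file with a named model:
tts --text "Hello from Coqui" \
    --model_name "tts_models/en/ljspeech/vits" \
    --out_path hello.wav
```

The first run downloads the chosen model, which is where the gigabytes of disk space mentioned below come from.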

The resource consumption of Coqui TTS is fairly high, at least for the higher quality models. We’re talking GBs of disk space, and minutes to generate audio.

It’s possible that GPU acceleration would help, but I can’t use that on my laptop as it requires a proprietary API that only works on a certain brand of GPU. It’s also likely that exporting the models from PyTorch, using TorchScript or ONNX, would make them a lot more lightweight. This is on the roadmap.

Piper

Thanks to an issue comment I then discovered Piper. This rather amazing project does TTS at a similar quality to Coqui TTS, but additionally can export the models in ONNX format and then use onnxruntime to execute them, which makes them lightweight enough to run on single-board computers like the Raspberry Pi (remember those?).
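Piper is also pleasantly simple to drive from the command line: it reads text on stdin and writes audio out. A minimal sketch, assuming you have downloaded a voice model from the Piper releases (the `en_US-lessac-medium` voice used here is just an example):

```shell
# Synthesize a WAV file from text piped on stdin.
echo 'Welcome to the world of speech synthesis!' | \
  piper --model en_US-lessac-medium.onnx --output_file welcome.wav

# Play the result with any audio player, e.g.:
aplay welcome.wav
```

On my hardware this class of ONNX model generates audio in seconds rather than the minutes Coqui’s heavier models can take.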

It’s part of a project I wasn’t aware of called Home Assistant, which aims to develop an open-source home assistant, and is being driven by a company called Nabu Casa. Something to keep an eye on.

Thanks to Piper I can declare success on this mini-project to get some basic screen reading functionality on my desktop. When I get time I will write up how I’ve integrated Piper with Speech Dispatcher – it was a little tricky. And I will write up the short research I did into the different Coqui TTS models that are available. Speak soon!
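Until that write-up lands, here is a rough sketch of the general approach: Speech Dispatcher has a “generic” output module that can wrap any command-line synthesizer. This is an assumption-laden outline, not necessarily how I did it; the paths, voice name and audio parameters are examples you would need to adapt:

```
# /etc/speech-dispatcher/modules/piper-generic.conf (sketch)
# $DATA is replaced by Speech Dispatcher with the text to speak.
GenericExecuteSynth "printf %s '$DATA' | piper --model /usr/share/piper/en_US-lessac-medium.onnx --output_raw | aplay -r 22050 -f S16_LE -t raw -"
AddVoice "en" "MALE1" "en_US-lessac-medium"
```

You would then register the module in `speechd.conf` with a line like `AddModule "piper-generic" "sd_generic" "piper-generic.conf"` and select it as the default output module.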
