State of screen reading reading on desktop Linux – Sam Thursfield

source link: https://samthursfield.wordpress.com/2023/06/07/state-of-screen-reading-reading-on-desktop-linux/

Reading a computer screen wears out your delicate eyeballs. I would like the computer to read some web pages aloud for me so I can use my ears instead.

Here’s what I found out recently about the available text-to-speech technology we have on desktop Linux today. (This is not a comprehensive survey, just the result of some basic web searches on the topic).

The Read Aloud browser extension

Read Aloud is a browser extension that can read web pages out for you. That seems like a nice way to take a break from screen-staring.

I tried this in Firefox and it worked, but it sounded like a robot made from garbage. It wasn’t pleasant to listen to articles like that.

Read Aloud supports some for-pay cloud services that probably sound better, but I want TTS running on my laptop, not on Amazon or Google’s servers.

Speech Dispatcher

The central component for text-to-speech on Linux is Speech Dispatcher. Firefox uses Speech Dispatcher to implement the TTS part of the Web Speech API, and this is what the Read Aloud extension uses to read web pages.

You can test Speech Dispatcher on your system using the spd-say tool, e.g.

spd-say "Let's see how text-to-speech works"

You might hear the old-skool espeak-ng voice robotically reading out the text. espeak was incredible technology when it was released in 1995 on RISC OS as a 7KB text-to-speech engine. It sounds a little outdated in 2023.
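If you want to poke at Speech Dispatcher a bit more, `spd-say` can also show you which backends and voices are wired up. A short sketch (flags taken from `spd-say --help`; the module and voice names shown are examples and depend on what is installed on your system):

```shell
# List the available output modules (synthesis backends), e.g. espeak-ng.
spd-say -O

# List the synthesis voices offered by the current output module.
spd-say -L

# Choose a module, language and speaking rate explicitly
# (rate ranges from -100, slowest, to +100, fastest).
spd-say -o espeak-ng -l en -r -20 "A slower espeak voice"
```

Swapping in a different output module here is the hook that later makes it possible to route Speech Dispatcher to a better-sounding engine.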

Coqui TTS

Mozilla did some significant open research into text-to-speech as part of the “Mozilla TTS” project. After making great progress they stopped development (you may have heard this story before), and the main developers set up Coqui AI to continue working on the project. Today this is available for you as Coqui TTS.

You can try it out fairly easily via a Docker image; the instructions are in the README file. I spent some time playing with Coqui TTS and learned a lot about modern speech synthesis, which I will write up separately.
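For the curious, here is roughly what trying it out looks like. This is a sketch based on the Coqui TTS README; the image tag and the model name are examples, and the available models change over time:

```shell
# Run the demo server from the prebuilt CPU image, then open
# http://localhost:5002 in a browser to type text and hear it.
docker run --rm -p 5002:5002 ghcr.io/coqui-ai/tts-cpu

# Alternatively, install the Python package (pip install TTS) and use the
# command-line tool to synthesize a WAV file with a named model:
tts --text "Hello from Coqui" \
    --model_name "tts_models/en/ljspeech/vits" \
    --out_path hello.wav
```

The first run downloads the chosen model, which is where the gigabytes of disk space mentioned below come from.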

The resource consumption of Coqui TTS is fairly high, at least for the higher quality models. We’re talking GBs of disk space, and minutes to generate audio.

It’s possible that GPU acceleration would help, but I can’t use that on my laptop as it requires a proprietary API that only works on a certain brand of GPU. It’s also likely that exporting the models from PyTorch, using TorchScript or ONNX, would make them a lot more lightweight. This is on the roadmap.

Piper

Thanks to an issue comment I then discovered Piper. This rather amazing project does TTS at a similar quality to Coqui TTS, but additionally can export the models in ONNX format and then use onnxruntime to execute them, which makes them lightweight enough to run on single-board computers like the Raspberry Pi (remember those?).
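Piper is also pleasantly simple to drive from the command line: it reads text on stdin and writes audio out. A minimal sketch, assuming you have downloaded a voice model from the Piper releases (the `en_US-lessac-medium` voice used here is just an example):

```shell
# Synthesize a WAV file from text piped on stdin.
echo 'Welcome to the world of speech synthesis!' | \
  piper --model en_US-lessac-medium.onnx --output_file welcome.wav

# Play the result with any audio player, e.g.:
aplay welcome.wav
```

On my hardware this class of ONNX model generates audio in seconds rather than the minutes Coqui’s heavier models can take.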

It’s part of a project I wasn’t aware of called Home Assistant, which aims to develop an open-source home assistant, and is being driven by a company called Nabu Casa. Something to keep an eye on.

Thanks to Piper I can declare success on this mini-project to get some basic screen reading functionality on my desktop. When I get time I will write up how I’ve integrated Piper with Speech Dispatcher – it was a little tricky. And I will write up the short research I did into the different Coqui TTS models that are available. Speak soon!
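Until that write-up lands, here is a rough sketch of the general approach: Speech Dispatcher has a “generic” output module that can wrap any command-line synthesizer. This is an assumption-laden outline, not necessarily how I did it; the paths, voice name and audio parameters are examples you would need to adapt:

```
# /etc/speech-dispatcher/modules/piper-generic.conf (sketch)
# $DATA is replaced by Speech Dispatcher with the text to speak.
GenericExecuteSynth "printf %s '$DATA' | piper --model /usr/share/piper/en_US-lessac-medium.onnx --output_raw | aplay -r 22050 -f S16_LE -t raw -"
AddVoice "en" "MALE1" "en_US-lessac-medium"
```

You would then register the module in `speechd.conf` with a line like `AddModule "piper-generic" "sd_generic" "piper-generic.conf"` and select it as the default output module.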
