Accelerate augmentation of bird audio
source link: https://donghao.org/2023/07/14/accelerate-augmentation-of-bird-audio/
audiomentations is a very convenient library for my bird sound classification. The code below shows how I use it:
from audiomentations import Compose, AddGaussianNoise, AddGaussianSNR, TimeStretch, PitchShift

self.augment = Compose([
    AddGaussianNoise(min_amplitude=0.005, max_amplitude=0.015, p=poss),
    AddGaussianSNR(min_snr_in_db=5.0, max_snr_in_db=40.0, p=poss),
    TimeStretch(min_rate=0.8, max_rate=1.2, p=poss),
    PitchShift(min_semitones=-2, max_semitones=2, p=poss)
])
These four augmentation methods are enough for the current training. But the PitchShift method costs a lot of CPU resources, so the CPU usage jumps to 100% and the GPU cannot run at full load.
Having failed to find an audio augmentation library that uses the GPU, I started to check the source code of “audiomentations” and noticed that it uses librosa for its implementation:
try:
    pitch_shifted_samples = librosa.effects.pitch_shift(
        samples, sr=sample_rate, n_steps=self.parameters["num_semitones"]
    )
except librosa.util.exceptions.ParameterError:
Then I looked at the signature of “pitch_shift” in “librosa”:
def pitch_shift(
    y: np.ndarray,
    *,
    sr: float,
    n_steps: float,
    bins_per_octave: int = 12,
    res_type: str = "soxr_hq",
    scale: bool = False,
    **kwargs: Any,
) -> np.ndarray:
The default “res_type” for “pitch_shift” in this source is “soxr_hq”, which I assumed was the slow part. After changing “res_type” to “linear” in “audiomentations”, the CPU usage dropped back to 50% on my desktop and the GPU ramped up to 100% during training.
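To see why the resampler choice matters at all, it helps to look at the arithmetic of a pitch shift built from a time stretch followed by a resample. The sketch below assumes that this is the approach librosa takes, as far as I understand its source; the helper name `pitch_shift_plan` is hypothetical and not part of librosa.

```python
# Sketch of the arithmetic behind a time-stretch + resample pitch shift.
# Assumption: this mirrors the approach of librosa.effects.pitch_shift.
# The helper name below is hypothetical and not part of librosa.


def pitch_shift_plan(sr: float, n_steps: float, bins_per_octave: int = 12):
    """Return (stretch_rate, intermediate_sr) for a pitch shift of n_steps."""
    # Time-stretch rate: below 1.0 (slower audio) when n_steps is positive,
    # above 1.0 (faster audio) when n_steps is negative.
    rate = 2.0 ** (-float(n_steps) / bins_per_octave)
    # The stretched signal is then treated as if sampled at sr / rate and
    # resampled back to sr, restoring the duration and moving the pitch.
    intermediate_sr = float(sr) / rate
    return rate, intermediate_sr


for steps in (-2, 0, 2, 12):
    rate, inter_sr = pitch_shift_plan(22050, steps)
    print(f"n_steps={steps:+d}: stretch rate={rate:.4f}, "
          f"resample {inter_sr:.1f} Hz -> 22050 Hz")
```

Every call with a non-zero `n_steps` ends in a full resampling pass over the clip, so the cost of the `res_type` resampler is paid on each augmented sample.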
—— 2023.07.28 ——
Thanks for the correction from Iver.
I ran this test snippet:
import time
import librosa

sound, sr = librosa.load("./song/background/AirportAnnouncements_1.wav")
for resource in [None, "linear", "soxr_hq", "kaiser_best"]:
    begin = time.time()
    for _ in range(10):
        if resource:
            librosa.effects.pitch_shift(sound, sr=sr, n_steps=1, res_type=resource)
        else:
            librosa.effects.pitch_shift(sound, sr=sr, n_steps=1)
    if resource:
        print(f"{resource} time:", time.time() - begin)
    else:
        print("default time:", time.time() - begin)
and got this result:
default time: 8.455572366714478
linear time: 3.3037502765655518
soxr_hq time: 3.3474862575531006
kaiser_best time: 8.467342615127563
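A comparison like this can be wrapped in a small reusable helper. The following is only a sketch: `time_variants` is a hypothetical name, it uses `time.perf_counter` instead of `time.time`, and the demo workloads are stand-ins so that it runs without librosa or an audio file.

```python
import time
from typing import Callable, Dict


def time_variants(variants: Dict[str, Callable[[], object]],
                  repeats: int = 10) -> Dict[str, float]:
    """Run each zero-argument callable `repeats` times; return total seconds."""
    results = {}
    for name, fn in variants.items():
        # perf_counter is monotonic and has higher resolution than time.time()
        begin = time.perf_counter()
        for _ in range(repeats):
            fn()
        results[name] = time.perf_counter() - begin
    return results


if __name__ == "__main__":
    # Stand-in workloads (not audio code) so the sketch runs anywhere
    demo = time_variants(
        {
            "small": lambda: sum(range(1_000)),
            "large": lambda: sum(range(100_000)),
        },
        repeats=5,
    )
    for name, seconds in demo.items():
        print(f"{name} time: {seconds:.6f}")
```

With librosa installed, each variant could be something like `functools.partial(librosa.effects.pitch_shift, sound, sr=sr, n_steps=1, res_type="soxr_hq")`, which keeps the timing loop identical for every resampler.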
Iver is right: soxr_hq is as fast as linear, and the actual default res_type of the librosa version I was using is kaiser_best.
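The confusion came from reading a different version of the source than the one installed. One way to avoid that is to ask the installed function for its default directly. This is a minimal sketch using the standard `inspect` module; `default_of` and `fake_pitch_shift` are hypothetical names, and the stand-in function exists only so the example runs without librosa.

```python
import inspect


def default_of(func, param_name):
    """Return the default value of `param_name` in func's signature."""
    param = inspect.signature(func).parameters[param_name]
    if param.default is inspect.Parameter.empty:
        raise ValueError(f"{param_name!r} has no default")
    return param.default


# Hypothetical stand-in with the same keyword as librosa.effects.pitch_shift,
# so the sketch runs without librosa installed.
def fake_pitch_shift(y, *, sr, n_steps, res_type="kaiser_best"):
    return y


print(default_of(fake_pitch_shift, "res_type"))  # prints "kaiser_best"
```

With librosa installed, `default_of(librosa.effects.pitch_shift, "res_type")` together with `librosa.__version__` should report what the training environment really uses, rather than what the latest source on GitHub says.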
July 14, 2023 - 3:30
RobinDong
machine learning
librosa, python
2 thoughts on “Accelerate augmentation of bird audio”
-
soxr_hq is actually just as fast as linear, or faster, while also being a higher quality resampler (less aliasing). The reason you were experiencing slow execution was that it was actually using kaiser_best, not soxr_hq.
Ref https://github.com/iver56/audiomentations/pull/280#issuecomment-1576609746
Since audiomentations 0.31.0 the fast resampler is used for pitch shifting.
-
Thanks a lot for your reply.
You are right. `soxr_hq` is as fast as `linear`, and the actual default res_type of my version of librosa is `kaiser_best`. I had cloned the up-to-date source code of librosa, which is not the version I was actually using. Also, thanks for your `audiomentations`. It really helped my project.