Accelerate augmentation of bird audio

source link: https://donghao.org/2023/07/14/accelerate-augmentation-of-bird-audio/

audiomentations is a very convenient library for my bird sound classification work. I use it as follows:

from audiomentations import Compose, AddGaussianNoise, AddGaussianSNR, TimeStretch, PitchShift

        # Excerpt from a class method; poss is the probability of applying each transform.
        self.augment = Compose([
            AddGaussianNoise(min_amplitude=0.005, max_amplitude=0.015, p=poss),
            AddGaussianSNR(min_snr_in_db=5.0, max_snr_in_db=40.0, p=poss),
            TimeStretch(min_rate=0.8, max_rate=1.2, p=poss),
            PitchShift(min_semitones=-2, max_semitones=2, p=poss)
        ])

These four augmentation methods are enough for my current training. However, the PitchShift method is CPU-intensive: CPU usage jumps to 100%, and the GPU cannot run at full load.
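To confirm which transform is the bottleneck, each one can be timed in isolation. The sketch below is a minimal helper that uses only the standard library. `time_each` and the two stand-in workloads are hypothetical names for illustration, not part of audiomentations.

```python
import time
from typing import Callable, Dict


def time_each(transforms: Dict[str, Callable[[], object]], repeats: int = 5) -> Dict[str, float]:
    """Return the mean wall-clock seconds per call for each named callable."""
    results = {}
    for name, fn in transforms.items():
        start = time.perf_counter()
        for _ in range(repeats):
            fn()
        results[name] = (time.perf_counter() - start) / repeats
    return results


# Stand-in workloads (hypothetical); replace them with real transform calls.
timings = time_each({
    "cheap": lambda: sum(range(1_000)),
    "expensive": lambda: sum(range(200_000)),
})

# Print the slowest entry first.
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {seconds * 1000:.3f} ms per call")
```

With audiomentations, an entry could look like `"PitchShift": lambda: PitchShift(min_semitones=-2, max_semitones=2, p=1.0)(samples=clip, sample_rate=sr)`, where `clip` and `sr` are assumed to hold a representative audio clip and its sample rate. Setting `p=1.0` ensures the transform is applied on every call, so the timing is not diluted by skipped runs.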

Having failed to find an audio augmentation library that runs on the GPU, I checked the source code of “audiomentations” and noticed that it uses librosa for its implementation:

        try:
            pitch_shifted_samples = librosa.effects.pitch_shift(
                samples, sr=sample_rate, n_steps=self.parameters["num_semitones"]
            )
        except librosa.util.exceptions.ParameterError:

And here is the signature of “pitch_shift” in “librosa”:

def pitch_shift(
    y: np.ndarray,
    *,
    sr: float,
    n_steps: float,
    bins_per_octave: int = 12,
    res_type: str = "soxr_hq",
    scale: bool = False,
    **kwargs: Any,
) -> np.ndarray:

The default “res_type” for “pitch_shift” is “soxr_hq”, which I took to be a slow resampler. After changing “res_type” to “linear” in “audiomentations”, the CPU usage dropped back to 50% on my desktop and the GPU ramped up to 100% during training.

—— 2023.07.28 ——

Thanks to Iver for the correction.

I ran this test snippet:

import time
import librosa

sound, sr = librosa.load("./song/background/AirportAnnouncements_1.wav")

for resource in [None, "linear", "soxr_hq", "kaiser_best"]:
    begin = time.time()
    for _ in range(10):
        if resource:
            librosa.effects.pitch_shift(sound, sr=sr, n_steps=1, res_type=resource)
        else:
            librosa.effects.pitch_shift(sound, sr=sr, n_steps=1)
    if resource:
        print(f"{resource} time:", time.time() - begin)
    else:
        print("default time:", time.time() - begin)

and got this result:

default time: 8.455572366714478
linear time: 3.3037502765655518
soxr_hq time: 3.3474862575531006
kaiser_best time: 8.467342615127563

Iver is right: soxr_hq is as fast as linear, and the actual default res_type in the librosa version I was using is kaiser_best.
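One way to avoid this kind of mix-up is to ask the installed package for its default instead of reading a separately cloned source tree. The sketch below uses only the standard library. `default_of` and `installed_version` are hypothetical helper names, and the librosa part runs only if librosa can be imported.

```python
import inspect
from importlib import metadata
from typing import Any, Callable, Optional


def default_of(func: Callable[..., Any], param: str) -> Any:
    """Return the default of `param` in func's signature.

    Returns inspect.Parameter.empty when the parameter has no default.
    """
    return inspect.signature(func).parameters[param].default


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    try:
        import librosa  # only needed for this demonstration
    except ImportError:
        print("librosa is not installed")
    else:
        print("librosa version:", installed_version("librosa"))
        print("pitch_shift res_type default:",
              default_of(librosa.effects.pitch_shift, "res_type"))
```

Because the check runs against the interpreter's own installed copy, it reports the default that training code will actually use, whatever a newer checkout of the repository says.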

July 14, 2023 - 3:30 RobinDong machine learning
librosa, python

2 thoughts on “Accelerate augmentation of bird audio”

  1. soxr_hq is actually just as fast as linear, or faster, while also being a higher quality resampler (less aliasing). The reason you were experiencing slow execution was that it was actually using kaiser_best, not soxr_hq.

    Ref https://github.com/iver56/audiomentations/pull/280#issuecomment-1576609746

    Since audiomentations 0.31.0 the fast resampler is used for pitch shifting.

    • Thank you very much for your reply.
      You are right: `soxr_hq` is as fast as `linear`, and the actual default res_type in my installed version of librosa is `kaiser_best`. I had cloned the latest librosa source code, which was not the version I was actually running.

      Also, thanks for `audiomentations`. It really helped my project.

