[SoX-devel] silence problems
Jan Stary
2014-03-18 11:09:21 UTC
There seem to be problems with the silence effect.
I believe it has been brought up some time ago,
but here is a (longer) complete story with examples.

This is the test file I will be testing on, using 14.4.1:
sox -D -n -c 1 file.wav synth 3 trap 440 sin 480 gain -6 pad 1@0 1@1 1@2 1@3
That makes it three seconds of a dial tone, interpadded with four
seconds of silence: "silence TONE silence TONE silence TONE silence",
seven seconds in total.
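
The layout just described can be written out as a timeline. This is only a sketch: it assumes every silent stretch and every tone burst is exactly one second long, which is what the seven-second total implies, and the name `build_timeline` is made up for illustration.

```python
# Sketch of the test file layout: alternating one-second segments,
# "silence TONE silence TONE silence TONE silence".
# Assumption: every segment is exactly one second long.

def build_timeline(tones=3, seg_len=1.0):
    """Return (label, start, end) tuples, times in seconds."""
    timeline = []
    t = 0.0
    for i in range(2 * tones + 1):
        # Even positions are silence, odd positions are tone bursts.
        label = "TONE" if i % 2 else "silence"
        timeline.append((label, t, t + seg_len))
        t += seg_len
    return timeline

timeline = build_timeline()
for label, start, end in timeline:
    print(f"{start:4.1f} - {end:4.1f}  {label}")
print("total:", timeline[-1][2], "seconds")
```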

First, basic silence trimming at the beginning of the file:

When above-periods is non-zero, you must also specify a duration
and threshold. Duration indications the amount of time that non-
silence must be detected before it stops trimming audio.

I think it would be an improvement if the manpage said explicitly
that the non-silence that is detected (and stops the trimming)
remains itself in the output stream; as opposed to only starting
the output with samples that come _after_ that non-silence.

At least that's how I understand what the manpage says,
and it is what any of the following commands does,
resulting in six seconds starting with the first tone.

sox file.wav out.wav silence 1 0.1 10%
sox file.wav out.wav silence 1 0.5 10%
sox file.wav out.wav silence 1 1.0 10%

There seems to be some rounding (buffers?) involved, e.g.

sox file.wav out.wav silence 1 1.01 10%

produces the same, although there is no occurrence
of a non-silence of length 1.01 in the source file.
On the other hand,

sox file.wav out.wav silence 1 1.02 10%

already does the expected thing, i.e. results in an empty file.
However, SoX does not fill in the zero length in the header:

Input File : 'out.wav'
Channels : 1
Sample Rate : 48000
Precision : 32-bit
Sample Encoding: 32-bit Signed Integer PCM
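
To make the expectation for the "silence 1" runs above concrete, here is a small model of how I read the manpage. It is not SoX's code, just my reading: everything before the first non-silent run of at least the given duration is dropped, and that run itself is kept. The segment table assumes exact one-second segments, and `expected_length` is a name invented for this sketch.

```python
# Model of one reading of "silence 1 DURATION THRESHOLD" (not SoX's code):
# drop everything before the first non-silent run that lasts at least
# `duration` seconds, and keep that run and everything after it.

# Test file layout in seconds, assuming exact one-second segments.
SEGMENTS = [("silence", 0, 1), ("TONE", 1, 2), ("silence", 2, 3),
            ("TONE", 3, 4), ("silence", 4, 5), ("TONE", 5, 6),
            ("silence", 6, 7)]

def expected_length(segments, duration):
    """Output length in seconds under this reading; 0 if no run qualifies."""
    total = segments[-1][2]
    for label, start, end in segments:
        if label == "TONE" and end - start >= duration:
            return total - start
    return 0

for d in (0.1, 0.5, 1.0, 1.01, 1.02):
    print(f"silence 1 {d} 10% -> {expected_length(SEGMENTS, d)} s expected")
```

Under this reading 1.01 should already give an empty file, so the six seconds that 14.4.1 produces for 1.01 is the discrepancy described above.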


Now, trimming up to the _second_ non-silence
already presents a problem for me:

sox file.wav out.wav silence 2 0.1 10%

I would expect this to trim the leading "silence TONE silence"
and result in an output file starting with the second TONE
(as the second above-period). That's the intended behaviour, right?

For example, if you had an audio file with two songs that each
contained 2 seconds of silence before the song, you could specify
an above-period of 2 to strip out both silence periods and the first
song.

That's my situation. But no, the result is a 00:00:05.90 file
where the first silence and the first 0.1 second of the first
tone are removed. If this is the intended behaviour,
the two-songs example is wrong.

It seems that instead of the first TONE counting
as the first above-period (to be trimmed) and the second TONE
counting as the second above-period (to start the output),
only the first 0.1 seconds of the first TONE count as
the first above-period (trimmed), and after that the output begins.
That's what the above command seems to do.
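
The two readings can be put side by side as arithmetic. Neither function below is SoX's implementation; both are models of the text, with invented names, assuming exact one-second segments. Times are in milliseconds to keep the arithmetic exact.

```python
# Two readings of "silence 2 0.1 10%" on the seven-second test file.
# Neither function is SoX's implementation; both only model the prose.

TONE_STARTS_MS = [1000, 3000, 5000]   # assumed exact one-second segments
TOTAL_MS = 7000

def manpage_reading(periods):
    """Output starts at the N-th tone (the N-th above-period)."""
    return TOTAL_MS - TONE_STARTS_MS[periods - 1]

def observed_reading(periods, duration_ms):
    """Each earlier above-period eats only `duration_ms` of the first tone."""
    start = TONE_STARTS_MS[0] + (periods - 1) * duration_ms
    return TOTAL_MS - start

print("manpage reading :", manpage_reading(2) / 1000, "s")
print("observed reading:", observed_reading(2, 100) / 1000, "s")
```

The second figure is the 00:00:05.90 that 14.4.1 actually produces; the first is what the two-songs example in the manpage would lead one to expect.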

But is that intended? With the above two-songs example
from the manpage, specifying "silence 2 3 2%" would
just trim the first silence and the first 3 seconds
of the first song, as in my example, right? Let's try:

sox -D -n -c 1 songs.wav synth 60 trap 440 sin 480 gain -6 pad 2@0 2@30
That's 00:02 of silence, 00:30 of song, 00:02 of silence, 00:30 of song,
as in the manpage example. Now running

sox songs.wav out.wav silence 1 3 10%

does the expected thing: trims the first 00:02 of silence away,
and leaves the rest as 00:30 + 00:02 + 00:30 of output.

Now running "sox songs.wav out.wav silence 2 3 10%" should trim
the first silence, the first song, and the second silence - right?
That's what the example says, but that's not the case:
the result is the same as before, i.e. only the first
00:02 of silence is removed.

That seems wrong, and is also inconsistent with the previous example:
if it was to do the same, the first 00:03 above-period (i.e. the first
00:03 of the first song) would be removed and the rest would
go in the output, right?
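
For reference, these are the three candidate outcomes for "silence 2 3 10%" on songs.wav, computed from the layout described above (2 s of silence before each 30 s song). This is a sketch with made-up names; it models the three interpretations, not what SoX does internally.

```python
# Candidate outcomes for "silence 2 3 10%" on songs.wav, in seconds.
# Layout taken from the description: 2 s silence, 30 s song, twice.
LAYOUT = [("silence", 2), ("song", 30), ("silence", 2), ("song", 30)]
TOTAL = sum(length for _, length in LAYOUT)

def song_start(n):
    """Start time in seconds of the n-th song (1-based)."""
    t, seen = 0, 0
    for label, length in LAYOUT:
        if label == "song":
            seen += 1
            if seen == n:
                return t
        t += length
    raise ValueError("no such song")

candidates = {
    "manpage example (start at 2nd song)": TOTAL - song_start(2),
    "as in the 0.1 s test (drop 3 s of 1st song)": TOTAL - (song_start(1) + 3),
    "reported result (same as silence 1)": TOTAL - song_start(1),
}
for name, seconds in candidates.items():
    print(f"{seconds:3d} s  {name}")
```

Only the last line matches what 14.4.1 does here, and it agrees with neither the manpage example nor the behaviour seen on the seven-second file.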

Whichever the expected behaviour is, there seems to be a bug.
Or am I missing something in what the manpage says?

There are other problems with the silence effect
(trimming from the end), but let's resolve this first.

Thank you for your time

Jan
