Converting and Processing Audio

Questions and answers on how to get the most out of FFAStrans
PA79
Posts: 20
Joined: Wed Feb 23, 2022 2:37 pm

Re: Converting and Processing Audio

Post by PA79 »

I just wonder where you got that version from. Did you compile it yourself?
There is a GitHub page where you can download the ZIP file.

fefanto on github:
Hi there @jarl-dk @nikreiman. Don't know if this is still relevant, but I recently forked and modified MrsWatson in order to have DelayCompensation "neutralized", as well as having the output file exactly the same length of the input file.
code is here : https://github.com/fefanto/MrsWatson
all I did was adding a "validSamples" field to the sample buffer object, that is updated accordingly to plugin latency (at first read) and to required samples for aligning output to input (at last read).
Well done :P
Many thanks ;)
emcodem
Posts: 1653
Joined: Wed Sep 19, 2018 8:11 am

Re: Converting and Processing Audio

Post by emcodem »

Oh, now I see, the build is contained in the source code, got it.
It's a little unfortunate that MrsWatson is no longer maintained, but I believe it is really powerful and ready to be used in production for certain scenarios anyway.
emcodem, wrapping since 2009 you got the rhyme?
PA79
Posts: 20
Joined: Wed Feb 23, 2022 2:37 pm

Re: Converting and Processing Audio

Post by PA79 »

Could you give me one last hint (for this project)?

I am trying to remove the last "x" seconds or milliseconds of the audio in the last (make MP3) node. I think the VST plugin StereoTool generates these click sounds when the audio stops. (See pic.)

- I tried the filter "afade" with

Code:

afade=t=out:st=%f_duration%:d=1
- apad with negative values doesn't work.
- and many, many more (about 120 variations :| )

I'm able to cut X seconds at the beginning, but not X seconds at the end of the file.

This is the original command without any fade or cut options. Which option can I use, and where do I place it?

Code:

"%s_ffmpeg%" -i "%s_job_work%\%s_original_name%_WATSON.wav" -filter:a volume=-0.5dB -b:a 320k -codec:a libmp3lame -y "C:\Users\USERNAME\Desktop\VST Host Test\OUT\%s_original_name%_EDIT_%i_hour%_%i_min%_%i_sec%.mp3"
Attachments
apad-cliks-at-end.jpg
apad-cliks-at-end.jpg (1.16 MiB) Viewed 2982 times
emcodem
Posts: 1653
Joined: Wed Sep 19, 2018 8:11 am

Re: Converting and Processing Audio

Post by emcodem »

Sure, but unfortunately I don't know any way to cut "relative to the end" using FFmpeg, so I can only show you how to do it using a calculation.

What this workflow does is generate a silent WAV file (10 s duration, but that doesn't matter), which simulates your MrsWatson-generated WAV file.
Also at this step, I use the command line processor's "set s_source to:" feature at the bottom, and I set the s_source variable to the generated WAV file. (Note that it does not matter which file you submit to the workflow, as we always generate a silent file at the start and discard whatever you submitted.)

%s_source% is set to: %s_job_work%\temp.wav

You would need to do this at the command line processor that executes MrsWatson and insert "set s_source to:" %s_job_work%\%s_original_name%_WATSON.wav

The point of setting s_source to the current working file is that only this way are all the media variables updated (the source file is analyzed by FFAStrans), so we can go on and use the analyzed duration to calculate our final duration. We store the new duration in a "user variable" called f_new_duration. (The f_ at the beginning means it is a float, i.e. a number with a decimal point; this enables us to do calculations with this variable and the duration variable.)
The middle processor does the calculation to get the new duration minus a hardcoded value of 0.99 seconds: %f_duration% - 0.99
Instead of 0.99 you can insert your desired value (again in seconds, with a dot as the decimal separator! ;-) )


After calculating the new duration in seconds (at 48 kHz, for example, 1 second is 48000 samples and 0.1 seconds is 4800 samples), we just use "-t" in the final FFmpeg command to set the new duration.
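As a minimal sketch of the same calculation outside FFAStrans (the values here are placeholders standing in for the %f_duration% variable and the hardcoded trim amount):

```shell
# Placeholder values standing in for the FFAStrans variables:
duration=10.0   # %f_duration%: the analyzed source duration in seconds
trim=0.99       # how many seconds to discard from the end

# New duration = old duration minus the trim amount
new_duration=$(awk -v d="$duration" -v t="$trim" 'BEGIN { printf "%.2f", d - t }')
echo "$new_duration"   # 9.01

# At a 48 kHz sample rate that corresponds to this many samples (rounded):
samples=$(awk -v n="$new_duration" 'BEGIN { printf "%d", n * 48000 + 0.5 }')
echo "$samples"        # 432480

# The trim itself is then just "-t" on the FFmpeg output, e.g.:
# ffmpeg -i in.wav -t "$new_duration" out.wav
```

In the workflow, %f_new_duration% would take the place of the shell variable and be passed to "-t" in the final encode command.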

Let me know if you have any questions:
discard_last_x_seconds_emcodem.json
(4.13 KiB) Downloaded 124 times
emcodem, wrapping since 2009 you got the rhyme?
PA79
Posts: 20
Joined: Wed Feb 23, 2022 2:37 pm

Re: Converting and Processing Audio

Post by PA79 »

That works!

I made the rollout for the audio encoder today. "ProudAsHell"
It is now available to my work colleagues in the network environment. 8-)

Further development is planned:

We have a text-to-speech engine project for traffic and weather information in the nightly on-air program.
I think the development of this project can help a lot with that new text-to-speech project.
And implementation in an MS Teams/OneDrive environment, or even Microsoft Power Automate (send rough audio via Teams and receive the processed audio back in MS Teams), could be the next step.
So you could encode the audio during mobile reporting.

...so many ideas :D

Many, many thanks to emcodem and Thomas!!!
Wish you guys the best.

PA
emcodem
Posts: 1653
Joined: Wed Sep 19, 2018 8:11 am

Re: Converting and Processing Audio

Post by emcodem »

Yeeey, well done!
Enjoy your new productive workflow :D and let us know any other topics that come up for you over time ;-)
emcodem, wrapping since 2009 you got the rhyme?
PA79
Posts: 20
Joined: Wed Feb 23, 2022 2:37 pm

Re: Converting and Processing Audio

Post by PA79 »

Hello people,
I have now been able to realize exciting projects with this great tool. I am currently automating many production steps in our radio program.

I have created an FFmpeg command which creates a radio report with an intro, moderation, and outro. The outro only starts when the moderation is finished. So far so good.

This is my working "merge" command for the version with an intro with attached music bed, talk, and outro:

Code:

"%s_ffmpeg%" -i "\\10.10.10.213\daten\Dokumente\AUDIO TRANSCODER\In - AHA-Momente\INTRO\intro.wav" -i "%s_job_work%\%s_original_name%_W2.wav" -i "\\10.10.10.213\daten\Dokumente\AUDIO TRANSCODER\In - AHA-Momente\OUTRO\outro.wav" -filter_complex "[1]adelay=6000|6000[delayed];[0][delayed]amix=inputs=2:duration=shortest[audio1];[1]atrim=0:-1[audio2];[audio1][audio2][2]concat=n=3:v=0:a=1[out]" -map "[out]" -f wav -acodec pcm_s16le -ar 44100 "%s_job_work%\%s_original_name%_MIX.wav"
Now I want to go one step further and need your help. Google and ChatGPT do not help at this point because the command seems very complex.
Maybe someone has an idea how I can realize my project with FFmpeg.
The scheme looks like this (see picture):

Does anyone have an idea how the scheme shown in the picture could be implemented with FFmpeg within FFAStrans?

Can this be solved at all with one command, or does it rather need two watchfolders for "Talk1" and "Talk2"?
First create a file that contains OPENER, BED and TALK1?
I am grateful for any idea.

BUMPER and MUSIC BED 2 can also be one file if that makes it easier.

I assume that the big problem will be the total length of OPENER+TALK+BUMPER, since this always differs in length.

My goal is to put the two talks into folders, and at the end there is a ready-mixed radio report.

I wish you all the best
Attachments
VST Host Test_emcodem_3.json
(67.99 KiB) Downloaded 67 times
Production scheme.png
Production scheme.png (14.84 KiB) Viewed 1589 times
emcodem
Posts: 1653
Joined: Wed Sep 19, 2018 8:11 am

Re: Converting and Processing Audio

Post by emcodem »

Aye PA, good to hear from you again!
It can of course be done with a single FFmpeg command, but I fear it would become pretty bulky and hard to debug. (Another option would be to use a custom AviSynth script, but that's FrankBB's department :D)
Best in such cases is usually to first create the individual clips in a way that they only have to be concatenated at the end, e.g. you first create the standalone segments:
*) talk1+music.wav
*) talk2+music.wav

All you have to do from here is to simply concat the 5 segments.

Regarding the FFAStrans-specific part, there are lots of choices for how you can design the workflow; it all depends on many things. You could, for example, have a watchfolder looking for ONE of the files and tell your users to throw in this ONE file as the LAST thing into the watchfolder. Then you check whether all 7 related files exist (or use HOLD to wait for them), create talk1+music, create talk2+music, concat, and deliver.

But be aware: starting a workflow from a watchfolder with multiple files requires users to follow a strict naming convention, and as users are not machines, they will fail pretty often to provide the correct naming or the correct order, etc.
Personally, for such complex cases, I tend to use GUI submit instead of watchfolder submit; e.g. on the web interface we could let the user add 5 files and start one job with the 5 selected files.

Now I depend on you to give some more input about how you locate your input files, which of them are always the same, and so on...

On another topic, since you are here already, I wanted to ask you as an audio expert: what is the very best method you know to filter out voices? Is there a way to, for example, filter the voices nicely and cleanly out of a whole movie?
emcodem, wrapping since 2009 you got the rhyme?
PA79
Posts: 20
Joined: Wed Feb 23, 2022 2:37 pm

Re: Converting and Processing Audio

Post by PA79 »

Hi Emcodem,

First, to your voice filter question: in my opinion the best software to filter voice is iZotope RX 10. Especially the modules "Dialogue Isolate" and "Music Rebalance" work amazingly well.
There are also some freeware tools and websites on the market that offer this as well, but I think the cleanest results are achieved with iZotope.
You are welcome to send me an audio file for testing, which I can run through iZotope for you. I have the RX 9 suite.
and as users are no machines, they will fail pretty often to provide the correct naming or the correct order etc...
Oh Yes!

My goal is for a non-technical person to be able to place their two or even three recorded audio files somewhere, idiot-proof, and the machine takes over the production process.
Personally for such complex cases, i tend to use GUI submit instead of watchfolder submit, e.g. on webinterface we could let the user add 5 files and start one job with the 5 selected files.
That would be amazing :shock: However, this is completely new territory for me. But it would be boring not to get involved with new things. :? :D
give some more input about how you could locate your input files, which ones of them are always the same and so on...
The elements that are always the same are the opener with the attached music bed, so it is a single audio file.
Then the bumper (middle part), also with an attached music bed to talk over, and then the outro. (See pic.)

In the picture in the original post I separated the bumper and the music bed because I thought it would be easier to produce that way. But I think it's a bit easier if you leave the bumper like the opener.

Three fixed elements that always have the same duration, and two elements spoken by the presenter with individual lengths.

I can store the production elements in our server environment so that anyone on the network here in the house could access the future web interface.

Then I could move the entire existing workflow that you helped create to a web interface :shock: 8-)
Attachments
Tracks.png
Tracks.png (101.42 KiB) Viewed 1566 times
emcodem
Posts: 1653
Joined: Wed Sep 19, 2018 8:11 am

Re: Converting and Processing Audio

Post by emcodem »

Thanks a lot for the recommendation, I'll check it out. The problem is that I'm looking for something generic, so sending a sample clip would not really help, because it should do a good job on any kind of input (movies, shows, even music clips, etc.).
Anyway, I don't want to hijack your thread, so back to the topic.
Of course you can come up with FFmpeg commands that mix in the voice clips after X seconds or such, but in my head the easiest solution is to work like this:
openbumpout.png
openbumpout.png (36.27 KiB) Viewed 1555 times
Each section separated by a black line represents one prepared audio file, so you have ready-to-use, static WAV files for:
*) opener
*) music bed 1
*) bumper
*) music bed 2
*) outro

This way, you only need to collect talk1 and talk2 from user input/watchfolder, and you have a very easy time mixing talk1+musicbed1 and talk2+musicbed2 with two individual, simple FFmpeg commands.

Code:

ffmpeg -i \\server\share\static_audios\musicbed1.wav -i \\server\some\folder\talk1.wav -filter_complex "amix=inputs=2:duration=shortest" "%s_job_work%\mix1.wav"
Same for musicbed2+talk2.

After that, you just need to stitch opener, mix1, bumper, mix2 and outro into a single file.
This keeps everything very simple to set up and to debug. E.g. you don't need to specify any duration or other variables if you just make sure that the static music bed files 1 and 2 are much longer than the actual voiceover can possibly be. The duration=shortest option in amix simply hard-cuts the music bed at the end of the voiceover.
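As a sketch of the final stitch (the file names here are placeholders, and all five WAVs are assumed to share the same sample rate and channel layout), FFmpeg's concat filter can join the segments:

```shell
# Hypothetical file names; substitute the real static files and mix outputs.
ffmpeg -i opener.wav -i mix1.wav -i bumper.wav -i mix2.wav -i outro.wav \
  -filter_complex "[0:a][1:a][2:a][3:a][4:a]concat=n=5:v=0:a=1[out]" \
  -map "[out]" -acodec pcm_s16le -ar 44100 final_report.wav
```

If the segments are guaranteed to be format-identical PCM WAVs, the concat demuxer (-f concat with a file list and -c copy) would avoid re-encoding, but the concat filter is more forgiving about small format differences.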

Regarding web interface submission, I fear you will need to spend a little time getting into it, but in my head, from a user perspective, the interface could look like this: the user browses for files on the left side and selects exactly 2 files for the right side. The workflow can optionally present some options to the user (in my example, I present 3 dropdowns that would link to static WAV files on the NAS).
webint123.png
webint123.png (113.6 KiB) Viewed 1555 times
This way you can provide some helpful text to the user explaining what they have to do, and in your workflow you would not need to check and search for any files; you would just work with the 2 files the user provided.
emcodem, wrapping since 2009 you got the rhyme?