Stitch or Concatenate

Questions and answers on how to get the most out of FFAStrans
emcodem
Posts: 1649
Joined: Wed Sep 19, 2018 8:11 am

Re: Stitch or Concatenate

Post by emcodem »

Hey Crispy, thanks for sharing your workflows!

Hm, I am not very experienced with watchfolders in FFAStrans and try to avoid them whenever possible, but norquas does this in a cmd executor:

Code:

%comspec% /c "if exist "%s_cache_record%" del /f /q  "%s_cache_record%""
Maybe this helps? (The command force-deletes the cached record file if it exists.) To be honest, I don't 100% understand your questions, because as far as I am aware monitors don't really remember folders as such, only files:
http://www.ffastrans.com/frm/forum/view ... cord#p4993

I spent some time and pimped the find files processor so it can build an ffconcat list for you (not sure if it's really helpful). It would be good to get this whole thing into one single workflow. It would be preferable if your users were able to start the process by creating a special file in the folder of interest when they are done moving all their stuff there, e.g. watch only for *.txt files and tell the users to start the process by creating such a txt file.

Get the 1.3 Find Files here, set it to output ffconcat, and write the contents of the corresponding "found files" user_var to a file using the text file processor.
http://ffastrans.com/wiki/doku.php?id=c ... processors
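
For reference, the ffconcat list it builds should look roughly like this (this is the format ffmpeg's concat demuxer reads; the filenames below are just placeholders):

Code:

ffconcat version 1.0
file 'gopro0001.mp4'
file 'gopro0002.mp4'
file 'gopro0003.mp4'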

What I don't currently understand is how you get the sorting done correctly, or don't you care about the sorting? Well, anyway, I built sorting by name and date created into the Find Files processor.

If you also want to use the Find Files proc for the deletion when you are done, you would have to insert it a second time in your workflow, set the output type to "json", and connect the output to the custom processor "Files Delete".

Also, if this is somehow important for you, I imagine it would be pretty easy to pimp the webinterface and allow users to select a folder on the webinterface to start a workflow. Or they could select the list of files to be stitched and submit the whole list as one workflow...
emcodem, wrapping since 2009 you got the rhyme?
crispyjones
Posts: 104
Joined: Wed Dec 27, 2017 3:21 am

Re: Stitch or Concatenate

Post by crispyjones »

Thanks for pimping out "files find", very generous! Unfortunately I could not get it to work. I would submit a bunch of files to it but end up with a text file that was blank except for the single line 'ffconcat version 1.0' at the top. Is that how I am supposed to use it, manually submitting files to it?

My use of watch folders comes from my background using Vantage (and FlipFactory before that); nearly all of our workflows use watch folders. Our users are familiar with the "drop a file in here and it magically appears over there" flow. So habit and environment are the reason why I use watch folders so much. I don't think modifying the webinterface would benefit us much, but thanks for the idea.

I am getting inconsistent behavior when it comes to "forget missing files" in subdirectories. Testing on my production machine, where it runs as a service, it seems to work. Testing on my desktop, where I run it as an application, sometimes it will process a few files the second time, sometimes none. I agree that it would be ideal to have this as a single workflow, but I've never been able to figure out how to make the front half run on every file and create the concat file while having the second half run only once.

As for sorting, I thought, like you, that this would be a big problem when I was building this workflow, but that wasn't the case. Maybe that's because it only gets used for things like GoPro media, or other large groups of files with numerically incremented filenames, i.e. gopro0001, gopro0002, etc. Perhaps Windows copy behavior drops files in the watch folder in numerical order? I bet if someone dropped in a bunch of randomly named files the outcome would be a mess.

I've modified the original workflow slightly to not recurse sub-directories and simply use the name of the last submitted file as the file name. I'll keep an eye on it and see how that behaves.
emcodem
Posts: 1649
Joined: Wed Sep 19, 2018 8:11 am

Re: Stitch or Concatenate

Post by emcodem »

OK, sorry for all the quoting, but I didn't see a better way to split off all the questions...
crispyjones wrote: Fri Sep 18, 2020 4:26 pm Thanks for pimping out "files find" very generous! Unfortunately I could not get it to work. I would submit a bunch of files to it but end up with a text file that was blank except for the single line at the top 'ffconcat version 1.0'. Is that how I am supposed to use it, manually submitting files to it?
The Find Files proc does not do anything automatically; you need to tell it where to look for files, so use %s_original_path% in the "path" input. Manually submitting a file is mostly the same as a watchfolder picking up a file; you just use manual submission for testing and developing.

In order to give you some insight into my thinking, I created an example workflow for this use case; maybe this should go into a "workflow template" section in the wiki. The idea is that it picks up ONE file in the watchfolder, e.g. *0001.mp4*, lists all *.mp4 files in the directory where this file is, sorted by name, concats them and delivers the final output to c:\temp, then deletes the input files. Note that the delete processor is set to "test mode"; you need to uncheck test mode if you want it to actually delete.

NOTE: please install the "Files Find" and "Files Delete" processors from here to be able to drive this workflow:
http://www.ffastrans.com/wiki/doku.php? ... processors

Workflow:
ffconcat_example_with_delete.json
(10.6 KiB) Downloaded 289 times
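
For context, the concat step in such a workflow boils down to roughly this ffmpeg call (the output path and the stream-copy choice are just examples; the real settings come from your encoder node):

Code:

ffmpeg -f concat -safe 0 -i concat.txt -c copy c:\temp\joined.mp4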
crispyjones wrote: Fri Sep 18, 2020 4:26 pm I am getting inconsistent behavior when it comes to "forget missing files" in subdirectories. Testing on my production machine it seems to work, running as a service. Testing on my desktop where I run as an application sometimes it will process a few files the second time, sometimes none.
That sounds like a bug, a call for @admin ;-)
crispyjones wrote: Fri Sep 18, 2020 4:26 pm I agree that it would be ideal to have this a single workflow but I've never been able to figure out how to make the front half run on every file and create the concat file but have the second half only run once.
This is why I said that it would be preferable that the watchfolder only picks up a single file, e.g. only watching for *001* in case your files are always numbered in that pattern. Or, as I said, the user could just generate some txt file in the watchfolder to kick off the processing. There are many options for starting the process...
Kicking off one job for each file feels a little bit too messy to me; are you sure you cannot configure the watchfolder to pick up only one file?
Maybe you will understand my idea better when using the files find processor :-)
crispyjones wrote: Fri Sep 18, 2020 4:26 pm As for sorting ...
Hehe, I am happy that you were surprised by the accidentally correct order; I would have been surprised too.
emcodem, wrapping since 2009 you got the rhyme?
admin
Site Admin
Posts: 1667
Joined: Sat Feb 08, 2014 10:39 pm

Re: Stitch or Concatenate

Post by admin »

Hi crispyjones,

You write:
"...sometimes it will process a few files the second time, sometimes none."
Could you please elaborate on that? You remove a file, put it back, and sometimes it's picked up again and sometimes it's not?

-steinar
crispyjones
Posts: 104
Joined: Wed Dec 27, 2017 3:21 am

Re: Stitch or Concatenate

Post by crispyjones »

Hi Steinar. The behavior was odd. For example, I would put multiple .mov files (let's call them mov1 to mov10) in a folder called TestConcat and drop that in the watch folder, and all the files would be processed; the workflow would complete and delete everything. I would drop the exact same folder with the same files in the watch folder again and only mov1 and mov2 would be processed. Unfortunately, I've run through so many iterations of this workflow that I am not sure I can find the exact version that exhibited that behavior. When I have a bit more time I will try to track it down.

What I can say with some certainty is that the workflow I posted here does not seem to obey the "forget missing files" check box. If I drop the same example folder into this workflow it will process it the first time. If I drop the folder in again, nothing will happen until I either 1) change the subdirectory name (TestConcat to TestConcat1), in which case all files within it are processed, or 2) change a filename within the subdirectory (TestConcat>mov1 to TestConcat>mov1a), in which case only files with new names are processed.

@emcodem, thanks for the example workflow. I am kind of busy tonight but at first glance I see some of the mistakes I made in assuming how the find files node functioned. I will try it out soon and let you know the results.
admin
Site Admin
Posts: 1667
Joined: Sat Feb 08, 2014 10:39 pm

Re: Stitch or Concatenate

Post by admin »

Hi Crispyjones,

I have found the cause of your issue with forgetting missing files, and it was a bug directly related to sub-folders. It has now been fixed in the upcoming versions and hopefully it should work in all scenarios now. Thanks for notifying! :-)

-steinar
crispyjones
Posts: 104
Joined: Wed Dec 27, 2017 3:21 am

Re: Stitch or Concatenate

Post by crispyjones »

Thanks for the bug squashing, Steinar!

@emcodem, I have had a bit of time to spend with the findfiles processor as well as your sample workflow.

The sample workflow was a good starting point and helped me troubleshoot what I was doing wrong. Unfortunately, while it does concat files, it does so in a progressive way and I end up with four (or whatever # max active jobs is set to) concat files at the output.

The sample workflow did help me figure out why I was getting concat files that contained no files, and that is the quirky accept/deny behavior. It appears you must enter some sort of accept regex or nothing gets processed, which is slightly different behavior than I am familiar with. Normally, if I leave the accept filter blank (in a watch folder) and only populate the deny attributes, the node will process everything except for the items that match the deny regex. In the case of findfiles the accept regex seems to override the deny, so if I accept *.* (without that, nothing gets added to the final concat file) and deny *.txt, I still end up with a something.txt file in the final concat text file. This leaves the unappetizing solution of filling the accept row with every kind of media file I can think of.
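
To make the difference concrete, here is a little sketch of the two behaviors as I understand them (just an illustration using glob matching, not the processor's actual code):

Code:

import fnmatch

def accept_overrides_deny(name, accept, deny):
    # what I seem to be seeing in findfiles: only the accept list matters,
    # a file matching an accept pattern gets through even if a deny pattern matches
    return any(fnmatch.fnmatch(name, p) for p in accept)

def deny_overrides_accept(name, accept, deny):
    # what I am used to from watch folders: an empty accept list means "everything",
    # and a deny match always filters the file out
    accepted = not accept or any(fnmatch.fnmatch(name, p) for p in accept)
    return accepted and not any(fnmatch.fnmatch(name, p) for p in deny)

# with accept=["*.*"] and deny=["*.txt"], "something.txt" passes the first
# function but is filtered out by the second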

In the end I think this is all probably beside the point. My goal was to leverage the efficiency of the concat step: to very quickly concatenate a large number of identical file types (GoPro, dashcam, security footage, etc.) and then do a single transcode to my final file type. If it is a single workflow, you lose the efficiency of the concat step because you are progressively building the final file in steps; adding a transcode to each step would make it very slow. If I used the example workflow and sorted out how to make it output a single final file, I would still have to transcode it to my final file format, making it a two-part workflow again.
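
In other words, what I am after is roughly this two-step approach (filenames and encoder settings are just placeholders):

Code:

rem step 1: stitch everything with stream copy - fast, no re-encode
ffmpeg -f concat -safe 0 -i concat.txt -c copy stitched.mp4
rem step 2: one single transcode of the stitched file to the final format
ffmpeg -i stitched.mp4 -c:v libx264 -crf 18 -c:a aac final.mp4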

The issue I am struggling with is that the two-part workflow presents a problem more related to the way watch folders deal with growing files. Step 1 will process files at the rate of "max active jobs" * sleep timer and create an ever-growing concat file. Step 2 is set to "check growing once" so that, while step 1 is creating the concat.txt file, step 2 shouldn't trigger until step 1 is complete. The issue I've run into is that if the concat.txt file grows at an uneven rate, let's say the watch folder is filled with 5 small 50 MB files but the sixth file is a large 2 GB file, the concat.txt stops growing while the large file is being uploaded into the watch folder, so step 2 triggers and creates a concatenated file of just the first 5. Lengthening the sleep timer of step 2 helps, but that's a rather blunt instrument because it introduces unnecessary delays for small files and still might fail on large files or if network congestion is high. I think the best solution with the tools available is to have step 1 create a concatNOT_READY.txt file and, when it is finished, change the name manually to concat.txt and have step 2 trigger on that. This requires manual user intervention, which is a big downer.

This is a very long-winded way of saying I haven't yet figured out how to make the concat workflow function to my liking. A big thanks for all the help so far, but I'd rather not pester any of you with the issue unless you have nothing better to do. I am putting this on my back burner for now. If I have any lightbulb moments I will share them here.
emcodem
Posts: 1649
Joined: Wed Sep 19, 2018 8:11 am

Re: Stitch or Concatenate

Post by emcodem »

Aye Mr Jones :-)
OK, I have to admit that I also wanted to advertise the Files Find proc a bit and introduce it to you. Now that you know it, I am happy and we can get back to solving your problem, hehe ;-)

Until now I assumed that your users were "moving" the files into the watchfolder, so no check for growing files or such would be needed. Now that I know this is not the case, I understand your overall design a little better, and my thinking now goes in the direction of using the webinterface scheduler to detect whether all files are there and a job needs to be started.

What is the condition that qualifies as "all files are there" for you? Is it something like "the youngest file has not changed for more than x minutes"?
emcodem, wrapping since 2009 you got the rhyme?
crispyjones
Posts: 104
Joined: Wed Dec 27, 2017 3:21 am

Re: Stitch or Concatenate

Post by crispyjones »

The most concise way I could state the requirement using my current two-part workflow would be: "If the step 1 watch folder has growing files, do not trigger step 2." If this logic could be enforced, then step 2 would "know" that step 1 is ingesting one large clip and, even though the resulting concat.txt file has not changed for some time, not trigger.

I think I would be better equipped if I had a better understanding of the watch folder process. The system I am familiar with is Telestream Vantage; it uses a cycle and count method of dealing with watch folders. The cycle time is the amount of time in seconds between file checks (this seems analogous to the FFAStrans sleep timer). The count is the number of iterations the file in the watch folder must remain unaltered (no change in size or modification date) for the job to be initiated. So a cycle of 5 seconds and a count of 2 would mean that a minimum delay of 10 seconds occurs after a file transfer is complete. There is also a "submit immediately" option that overrides the cycle/count check. Because Vantage uses modification date as a check, it essentially always "forgets" files of the same name. I give this info so you can see how I think of watch folder performance and how I might understand FFAStrans behavior via analogy.
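
A minimal sketch of that cycle/count idea (only the concept, simplified to file size; not Vantage's or FFAStrans's actual code):

Code:

import os, time

def wait_until_stable(path, cycle_seconds=5, count=2):
    # the file must keep the same size for `count` consecutive cycles
    # before the job is allowed to start
    stable_cycles = 0
    last_size = -1
    while stable_cycles < count:
        time.sleep(cycle_seconds)
        size = os.path.getsize(path)
        if size == last_size:
            stable_cycles += 1   # unchanged for another full cycle
        else:
            stable_cycles = 0    # still changing, start counting again
        last_size = size
    # with cycle_seconds=5 and count=2 this waits at least 10 seconds
    # after the last observed change before returning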

FFAStrans has three watch folder states: the "check growing files" checkbox is unchecked, the checkbox is checked with "once" selected, and the checkbox is checked with "continuously" selected. Let's assume a sleep timer of 10 seconds for these questions. Let's also assume a rather large hypothetical file is being transferred that takes more than 10 seconds to reach the watch folder. I have read both the wiki and the node helpfile; this is just to help me clarify exactly what is happening.

Check box state:
Unchecked: What is the default behavior? The fact that "check growing files" isn't checked makes it seem like this will submit as soon as the file is present and the sleep timer expires, regardless of whether the file is growing. Is this like Vantage's "submit immediately"?
Checked, once: The wiki states "This option will make FFAStrans wait to see if incoming files are still growing". Does this mean that if the file keeps growing for 60 seconds it will only check once and kick off on the third cycle? I.e. cycle 1 > file present > cycle 2 > file still growing (checked once) > cycle 3 > start job even though the file is still growing. Would this be like a Vantage cycle of 10 and count of 2?
Checked, continuously: The wiki states "This option will make FFAStrans execute new job every time the file(s) have changed file size". Sounds like the job will be initiated every time the file changes size. I have trouble thinking of a use case, but I understand what is happening. I am not sure Vantage could perform this action.
admin
Site Admin
Posts: 1667
Joined: Sat Feb 08, 2014 10:39 pm

Re: Stitch or Concatenate

Post by admin »

OK, so let me try to explain how it works:

Unchecked:
The default behavior is to disregard file size entirely. This means FFAStrans will only check whether the file is readable and not locked in any way. In the case of files being copied over a typical SMB share, the file is locked until copying is finished. This will prevent FFAStrans from picking it up until the lock is released. In the case of, let's say, an MXF being written to a share from an NLE, the file is probably readable and NOT locked. This would make FFAStrans "believe" the file is ok and ready for processing, hence you need to enable checking for growing files.
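
Roughly, the "unchecked" check amounts to something like this (a sketch of the idea, not the actual FFAStrans code):

Code:

def looks_ready(path):
    # only test whether the file can be opened at all;
    # ignore whether it is still growing
    try:
        with open(path, "rb"):
            return True    # readable and not exclusively locked
    except OSError:
        return False       # e.g. still locked by an ongoing SMB copy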


Checked once:
I can see that "once" might be a bit confusing, but here we go: "once" in this regard means "one complete cycle without the file size increasing". So if the file size grows for, let's say, 5 complete cycles, the file will not be picked up until the 6th cycle, provided there was no change from the 5th. So let me try to visualize:

cycle 1-> file size > initial pickup size
cycle 2-> file size > cycle 1
cycle 3-> file size > cycle 2
cycle 4-> file size > cycle 3
cycle 5-> file size > cycle 4
cycle 6-> file size = cycle 5 -> start job


Checked continuously:
Your understanding of this feature is pretty much accurate, except that it behaves exactly the same way as "once" every time it re-runs. The difference is that in order for a re-run to occur, the size must be different from the previous run.

cycle 1-> file size > initial pickup size
cycle 2-> file size > cycle 1
cycle 3-> file size > cycle 2
cycle 4-> file size > cycle 3
cycle 5-> file size > cycle 4
cycle 6-> file size = cycle 5 -> start job

...time... (...making coffee)

cycle 1-> file size = previous cycle

...time... (...picking a bale of cotton)

cycle 1-> file size = previous cycle

...time... (...changing file size)

cycle 1-> file size > previous cycle
cycle 2-> file size > cycle 1
cycle 3-> file size > cycle 2
cycle 4-> file size = cycle 3 -> re-start job
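
In other words, "continuously" behaves roughly like this sketch (just an illustration of the description above; start_job is a placeholder for actually kicking off the workflow):

Code:

import os, time

def start_job(path):
    print("starting job for", path)   # placeholder for the actual workflow start

def watch_continuously(path, sleep_seconds=10):
    last_size = None        # size seen in the previous cycle
    last_processed = None   # size at the moment the last job was started
    while True:
        size = os.path.getsize(path)
        # one full cycle without growth AND a size different from the last run -> (re)start
        if size == last_size and size != last_processed:
            start_job(path)
            last_processed = size
        last_size = size
        time.sleep(sleep_seconds)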


Hope this helps you understand how the shit works ;-) If not, just ask again.

-steinar