Combine a list element
Problem
Say that you have a tool that takes a parameter and a bunch of files and does something with them. This is exemplified by the process CAT below. My first approach was to collect the files before combining them with the parameter. See the following workflow as an example.
- Create a file with distinct content.
- Concatenate all files into one and use the parameter value.
- For better display, I'm only showing the filenames here and not the whole paths.
- The collect operator should turn this into a list.
- Don't worry too much about this, I'm again transforming the output to only display filenames and not the entire paths.
- Here, I want to show the content of the resulting file which is the second of the pair in the output.
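The workflow described by these annotations can be sketched as follows. This is a reconstruction under assumptions, not necessarily the original script: the process bodies, the names foo, bar and baz, and the parameter range 1..5 are inferred from the log output further down, and the closure parameter names are purely illustrative.

```nextflow
// Sketch only: process bodies and input values are assumptions
// inferred from the log output, not the original script.

process CREATE {
    input:
    val name

    output:
    path "${name}.txt"

    script:
    """
    echo "${name}.txt" > "${name}.txt"
    """
}

process CAT {
    input:
    tuple val(number), path(files)

    output:
    tuple val(number), path('result.txt')

    script:
    """
    cat ${files} > result.txt
    echo "Parameter: ${number}" >> result.txt
    """
}

workflow {
    ch_param = Channel.of(1..5)

    CREATE(Channel.of('foo', 'bar', 'baz'))
    // Show only the file names, not the whole paths.
    CREATE.out.view { created -> created.name }

    // collect is expected to turn the files into a single list.
    ch_input = ch_param.combine(CREATE.out.collect())
    // Rows are flat here: the parameter first, then the files.
    ch_input.view { row -> [row.head()] + row.tail()*.name }

    CAT(ch_input)
    // Show the content of the result file, the second element of the pair.
    CAT.out.view { number, result -> result.text }
}
```

Because CAT declares a two-element tuple, a flat row with four elements is what triggers the cardinality warning discussed next.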
Running the above workflow gives the following output. It looks like the combine operator, when given a single list of elements, treats that list just like a channel and forms the Cartesian product with every element. There is also a warning that the input cardinality does not match the one declared in CAT, and indeed we can see in the output that only one file is written to the result while the others are ignored.
executor > local (8)
[32/a72ef8] process > CREATE (1) [100%] 3 of 3 ✔
[0a/dfd2e7] process > CAT (4) [100%] 5 of 5 ✔
baz.txt
bar.txt
foo.txt
[1, baz.txt, bar.txt, foo.txt]
[2, baz.txt, bar.txt, foo.txt]
[3, baz.txt, bar.txt, foo.txt]
[4, baz.txt, bar.txt, foo.txt]
[5, baz.txt, bar.txt, foo.txt]
baz.txt
Parameter: 1
baz.txt
Parameter: 5
baz.txt
Parameter: 2
baz.txt
Parameter: 3
baz.txt
Parameter: 4
WARN: Input tuple does not match input set cardinality declared by process `CAT`
Solution
Well, if a single list gets treated just like a channel, maybe we can nest that list such that we have a list with a single element that is also a list. I tried quite a few different ways:
- Can we collect twice?
  This does not work correctly. Just like in the problem, we get a flat list.
- What if we place it into a list manually?
  This yields an error, which makes sense since we place the collected variable (of type DataflowVariable) inside the literal list and thus it gets passed to our CAT process directly.
- Instead of collect there is also toList...
  Same error.
- Then I got the correct advice:
  The corresponding comment on Slack was:
  Harshil Patel: Don't ask me why.
- Turns out that the following combination also works.
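To compare the resulting shapes without running any process, here is a minimal sketch that uses plain strings as stand-ins for the files (an assumption for illustration; ch_files is a hypothetical name, the real workflow uses CREATE.out). It contrasts collecting twice with applying toList twice, the variant used in the full solution.

```nextflow
workflow {
    ch_param = Channel.of(1, 2)
    // Stand-in for CREATE.out (assumption): strings instead of files.
    ch_files = Channel.of('foo.txt', 'bar.txt', 'baz.txt')

    // collect flattens nested lists by default, so collecting twice
    // still yields one flat list and combine spreads its elements.
    ch_param
        .combine(ch_files.collect().collect())
        .view { row -> "collect twice: ${row}" }

    // toList does not flatten, so the second call wraps the list in
    // another list and combine keeps the files together.
    ch_param
        .combine(ch_files.toList().toList())
        .view { row -> "toList twice: ${row}" }
}
```

The variant with a literal list around the collected channel is left out on purpose, since it hands a DataflowVariable rather than its content to the next step.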
So in full the solution looks as follows.
- Create a file with distinct content.
- Concatenate all files into one and use the parameter value.
- For better display, I'm only showing the filenames here and not the whole paths.
- Use the winning solution from above. The toList operator applied twice creates the nested list.
- Don't worry too much about this, I'm again transforming the output to only display filenames and not the entire paths.
- Again, I want to show the content of the resulting file which is the second of the pair in the output.
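A sketch of the full solution matching these annotations might look like this. As before, it is an assumption-laden reconstruction: the process bodies, the names foo, bar and baz, and the range 1..5 are inferred from the log output, not copied from the original script.

```nextflow
// Sketch only: process bodies and input values are assumptions
// inferred from the log output, not the original script.

process CREATE {
    input:
    val name

    output:
    path "${name}.txt"

    script:
    """
    echo "${name}.txt" > "${name}.txt"
    """
}

process CAT {
    input:
    tuple val(number), path(files)

    output:
    tuple val(number), path('result.txt')

    script:
    """
    cat ${files} > result.txt
    echo "Parameter: ${number}" >> result.txt
    """
}

workflow {
    ch_param = Channel.of(1..5)

    CREATE(Channel.of('foo', 'bar', 'baz'))
    // Show only the file names, not the whole paths.
    CREATE.out.view { created -> created.name }

    // toList applied twice nests the list, so combine yields
    // pairs of one parameter and one list of files.
    ch_input = ch_param.combine(CREATE.out.toList().toList())
    ch_input.view { number, files -> [number, files*.name] }

    CAT(ch_input)
    // Show the content of the result file, the second element of the pair.
    CAT.out.view { number, result -> result.text }
}
```

The only functional change compared with the problem statement is the channel passed to combine; the view closure changes simply because the rows are now pairs.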
Running the above workflow, this time both the shape of the input for CAT and the content of the resulting files are as expected.
executor > local (8)
[0c/731285] process > CREATE (3) [100%] 3 of 3 ✔
[e0/670c78] process > CAT (5) [100%] 5 of 5 ✔
bar.txt
foo.txt
baz.txt
[1, [bar.txt, foo.txt, baz.txt]]
[2, [bar.txt, foo.txt, baz.txt]]
[3, [bar.txt, foo.txt, baz.txt]]
[4, [bar.txt, foo.txt, baz.txt]]
[5, [bar.txt, foo.txt, baz.txt]]
bar.txt
foo.txt
baz.txt
Parameter: 3
bar.txt
foo.txt
baz.txt
Parameter: 1
bar.txt
foo.txt
baz.txt
Parameter: 4
bar.txt
foo.txt
baz.txt
Parameter: 2
bar.txt
foo.txt
baz.txt
Parameter: 5
Alternative solutions
DataflowVariable value
We saw above that placing the collected output inside a literal list caused an error because we are passing a groovyx.gpars.dataflow.DataflowVariable to the process.
It is possible, though highly discouraged, to access a DataflowVariable's inner value.
ch_input = ch_param.combine( [ CREATE.out.collect() ] ) // (1)
    .map { first, second -> [first, second.val] }
- This combination generates pairs where the first element is the val and the second the DataflowVariable containing the list.
Creating a list through transformation
In our problem statement we saw that combining the parameter channel with the collected files created lists of four elements each: the parameter and the three files. We can transform this shape ourselves.
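One way to do that is a map over the flat rows that keeps the first element and wraps the rest in a list. A minimal sketch, assuming plain strings as stand-ins for the files; head and tail are standard Groovy list methods, while ch_files is a hypothetical name standing in for CREATE.out:

```nextflow
workflow {
    ch_param = Channel.of(1..5)
    // Stand-in for CREATE.out (assumption): strings instead of files.
    ch_files = Channel.of('foo.txt', 'bar.txt', 'baz.txt')

    // Flat row [param, file, file, file] becomes [param, [file, file, file]].
    ch_param
        .combine(ch_files.collect())
        .map { row -> [row.head(), row.tail()] }
        .view()
}
```

This relies on the parameter always being the first element of the row, which holds here because ch_param is the left-hand side of combine.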
Done
Using combine and groupTuple
A very different approach is to first combine every parameter value with every file. This generates pairs of one value and one file. We can then group the pairs together as tuples.
- Create a file with distinct content.
- Concatenate all files into one and use the parameter value.
- For better display, I'm only showing the filenames here and not the whole paths.
- Use combine on the flat channels to generate pairs. Then collect tuples of files by grouping the pairs by their first element, the numeric value, with groupTuple.
- Don't worry too much about this, I'm again transforming the output to only display filenames and not the entire paths.
- Again, I want to show the content of the resulting file which is the second of the pair in the output.
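The core of this approach can be sketched without the processes, using plain strings as stand-ins for the files (an assumption for illustration; ch_files is a hypothetical name). In the real workflow, ch_files would be CREATE.out and the grouped channel would be passed to CAT.

```nextflow
workflow {
    ch_param = Channel.of(1..5)
    // Stand-in for CREATE.out (assumption): strings instead of files.
    ch_files = Channel.of('foo.txt', 'bar.txt', 'baz.txt')

    // combine pairs every parameter with every file: [1, foo.txt], [1, bar.txt], ...
    // groupTuple then groups the pairs by their first element, the parameter.
    ch_param
        .combine(ch_files)
        .groupTuple()
        .view()
}
```

By default groupTuple uses the first element of each tuple as the grouping key, which is why no extra options are needed here.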
Running it generates the exact same result. However, if you have a lot of elements in your channels, this might perform slightly worse, since you first generate many more pairs that you then have to group again.
executor > local (8)
[e9/c7a72b] process > CREATE (2) [100%] 3 of 3 ✔
[cb/44c510] process > CAT (5) [100%] 5 of 5 ✔
baz.txt
foo.txt
bar.txt
[1, [baz.txt, foo.txt, bar.txt]]
[2, [baz.txt, foo.txt, bar.txt]]
[3, [baz.txt, foo.txt, bar.txt]]
[4, [baz.txt, foo.txt, bar.txt]]
[5, [baz.txt, foo.txt, bar.txt]]
baz.txt
foo.txt
bar.txt
Parameter: 4
baz.txt
foo.txt
bar.txt
Parameter: 2
baz.txt
foo.txt
bar.txt
Parameter: 3
baz.txt
foo.txt
bar.txt
Parameter: 1
baz.txt
foo.txt
bar.txt
Parameter: 5