Joining channels using a map as key
Problem
You want to join two channels using identical maps as the join key.
Example file: examples/join-on-map-fails-resume/problem.nf
- A channel that contains a pair of a map with sample meta information and a number.
- A channel that contains a pair of a map with sample meta information and a list of file paths.
- The desired joined channel should contain the map, the number, and the file paths.
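The setup described in this list can be sketched roughly as follows. This is a minimal illustration rather than the original example file: the sample names, the `condition` key, the numbers, and the file names are all hypothetical. In a real pipeline the two channels would typically be process outputs, which is the situation where resuming becomes relevant.

```nextflow
workflow {
    // Hypothetical channel of [meta map, number] pairs.
    def counts = Channel.of(
        [[id: 'sample1', condition: 'control'], 10],
        [[id: 'sample2', condition: 'treated'], 20]
    )

    // Hypothetical channel of [meta map, list of file paths] pairs.
    def reads = Channel.of(
        [[id: 'sample1', condition: 'control'], [file('sample1_R1.fastq.gz'), file('sample1_R2.fastq.gz')]],
        [[id: 'sample2', condition: 'treated'], [file('sample2_R1.fastq.gz'), file('sample2_R2.fastq.gz')]]
    )

    // By default, join matches on the first element of each tuple,
    // which here is the whole meta map.
    // Each emitted element has the shape [meta, number, paths].
    counts.join(reads).view()
}
```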
This works perfectly fine when you execute your workflow from the beginning. However, when you resume your workflow, you will likely see that many samples are dropped from further processing at the point of such a join statement. The maps no longer evaluate as equal, so the tuples are discarded as incomplete. You can avoid elements being silently discarded by using the `failOnMismatch` option, which makes the join report an error when an element has no match in the other channel.
Example file: examples/join-on-map-fails-resume/problem.nf
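As a hedged sketch of the `failOnMismatch` option mentioned above, the join call could look like this. The channel contents are hypothetical placeholders, and this is not the original example file.

```nextflow
workflow {
    // Hypothetical channels keyed by a meta map.
    def counts = Channel.of(
        [[id: 'sample1'], 10],
        [[id: 'sample2'], 20]
    )
    def reads = Channel.of(
        [[id: 'sample1'], [file('sample1_R1.fastq.gz')]],
        [[id: 'sample2'], [file('sample2_R1.fastq.gz')]]
    )

    // With failOnMismatch: true, an element without a partner in the
    // other channel causes an error instead of being silently dropped.
    counts.join(reads, failOnMismatch: true).view()
}
```

This does not repair the mismatch itself. It only turns a silent loss of samples into a visible failure.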
Solution
Since maps, as mutable objects, may fail to evaluate as equal after resuming [1], we can pull an immutable value out of each map and join on that value instead. Your map likely contains an `id` key whose value is a unique string or integer that is equal in both channels to be joined. This requires a couple of channel transformations so that the resulting channel contains the desired map as the first element, followed by the remaining elements from both joined channels.
Example file: examples/join-on-map-fails-resume/solution.nf
- In both channels, we prepend the value of the `id` key, which is immutable, to each element.
- After the join on the `id` value, we remove that element and also drop one of the otherwise identical maps.
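The steps above can be sketched as follows. This is an illustrative reconstruction under assumptions, not the original solution file: the channel contents are hypothetical, and the closure parameter names are chosen only for readability.

```nextflow
workflow {
    // Prepend the immutable id value to each [meta, number] pair.
    def counts = Channel.of(
        [[id: 'sample1', condition: 'control'], 10],
        [[id: 'sample2', condition: 'treated'], 20]
    ).map { meta, count -> [meta.id, meta, count] }

    // Prepend the immutable id value to each [meta, paths] pair.
    def reads = Channel.of(
        [[id: 'sample1', condition: 'control'], [file('sample1_R1.fastq.gz')]],
        [[id: 'sample2', condition: 'treated'], [file('sample2_R1.fastq.gz')]]
    ).map { meta, paths -> [meta.id, meta, paths] }

    // The join now matches on the id string (first element).
    // Joined elements have the shape [id, meta, count, meta, paths].
    counts.join(reads)
        // Drop the id and the second, otherwise identical map.
        .map { _id, meta, count, _meta, paths -> [meta, count, paths] }
        .view()
}
```

The final `map` restores the shape `[meta, number, paths]`, so downstream steps see the same structure as a direct join on the map would have produced.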
To join two channels on a map key safely in the general case, I therefore propose using the following function, which was developed together with Mahesh Binzer-Panchal.
[1] My current hypothesis is that when you start a new pipeline, the different channels point to the same map object, whereas when you resume, different instances of the map with the same content are created. I then guess that the comparison Nextflow carries out to join channels is based on object identity rather than on comparing all key-value pairs.
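To make the distinction in this footnote concrete, the following plain Groovy statements (also valid in a Nextflow script) show two maps with equal content that are nevertheless distinct objects. The variable names and values are hypothetical, and the snippet only illustrates the difference between equality and identity. It does not confirm how Nextflow compares join keys internally.

```nextflow
// Two separately created maps with the same content.
def first = [id: 'sample1', condition: 'control']
def second = [id: 'sample1', condition: 'control']

// Groovy's == compares the key-value pairs, so the maps are equal.
assert first == second

// is() compares object identity, so the two instances differ.
assert !first.is(second)

// The extracted id strings compare equal regardless of the map instance.
assert first.id == second.id
```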