What you appear to have is a partial order over a set of tasks to be executed, where predecessor tasks have to complete before successor tasks can run.
If you have a measure of the cost of task T2's execution, and it is less than the scheduling overhead, then you can consider optimizing it out.
You can do this easily if it has only one predecessor T1 (append T2's code to the end of T1, and link T1 to T2's successors) or only one successor T3 (prepend T2's code to the beginning of T3, and link T2's predecessors to T3).
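To make the bookkeeping concrete, here is a minimal sketch in Python (not PARLANSE; the DAG-of-sets representation, the names, and the overhead parameter are all assumptions for illustration) of the single-predecessor / single-successor merge:

    # Sketch only: each task id maps to a set of successor ids, a set of
    # predecessor ids, a list of code fragments, and a numeric cost estimate.
    def try_merge(task, succs, preds, code, cost, overhead):
        """Fold `task` into a neighbor if its cost is below the scheduling
        overhead and it has exactly one predecessor or exactly one successor.
        Returns True if a merge happened."""
        if cost[task] >= overhead:
            return False
        if len(preds[task]) == 1:
            (host,) = preds[task]                 # the lone predecessor
            code[host] = code[host] + code[task]  # append task's work to it
            moved_preds, moved_succs = set(), succs[task]
        elif len(succs[task]) == 1:
            (host,) = succs[task]                 # the lone successor
            code[host] = code[task] + code[host]  # prepend task's work to it
            moved_preds, moved_succs = preds[task], set()
        else:
            return False
        cost[host] += cost[task]
        succs[host].discard(task)                 # drop the host<->task edge
        preds[host].discard(task)
        for s in moved_succs:                     # host inherits task's successors
            preds[s].discard(task)
            preds[s].add(host)
            succs[host].add(s)
        for p in moved_preds:                     # host inherits task's predecessors
            succs[p].discard(task)
            succs[p].add(host)
            preds[host].add(p)
        del succs[task], preds[task], code[task], cost[task]
        return True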
Assume you have:
(;| 1 (<< 2 4) a
2 b
3 (<< 4) c
4 d )
This is a partial-order PARLANSE program with tasks 1, 2, 3, 4 consisting of work a, b, c, d respectively, with task 1 occurring before tasks 2 and 4, and task 3 before task 4. The funny operator ";|" is motivated by partial orders mixing serial ";" and parallel "|" computations. The funny operator (<< n m) means "runs earlier in time than tasks n and m".
Note that tasks 1 and 3 can run in parallel; so can 2 and 3 as well as 2 and 4. [This is the smallest partial order one can express that isn't purely parallel].
If b is too small, you can optimize this to:
(;| 1 (<< 4) (;; a b)
3 (<< 4) c
4 d )
where a and b are executed serially.
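Running the try_merge sketch above on this example (with made-up costs, and the overhead set higher than b's cost) produces exactly that rewrite:

    # The example partial order: 1 << {2, 4} and 3 << 4, with b very cheap.
    succs = {1: {2, 4}, 2: set(), 3: {4}, 4: set()}
    preds = {1: set(), 2: {1}, 3: set(), 4: {1, 3}}
    code  = {1: ["a"], 2: ["b"], 3: ["c"], 4: ["d"]}
    cost  = {1: 100, 2: 1, 3: 100, 4: 100}       # made-up cost units

    try_merge(2, succs, preds, code, cost, overhead=10)
    print(code[1], succs[1])   # ['a', 'b'] {4} -- i.e., (;| 1 (<< 4) (;; a b) ...)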
If the too-cheap task has multiple predecessors and successors, it isn't quite so easy. What you can take advantage of is the fact that any partial-order computation has a number of perfectly legal linearizations. In this case you essentially want to collect a set of too-cheap tasks that appear consecutively in one of those linearizations, and replace them by their serial composition.
Imagine task 4 is too small. Because a b c d is a valid linearization of the partial order, you can combine tasks 3 and 4 so that c and d run serially:
(;| 1 (<< 2 3) a
2 b
3 (;; c d) )
Taken to an extreme, this process would serialize all the computations if their costs are too small, and that's exactly what you want in that case.
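Here is a sketch of that general case, again under the same made-up representation (an illustrative assumption, not how PARLANSE's compiler actually does it). A cheap task is folded into a direct predecessor that could sit immediately before it in some linearization; that choice is legal exactly when re-routing the task's other predecessors onto the chosen host creates no cycle. A fixpoint loop over this rule performs the serialization just described:

    def reachable(src, dst, succs):
        """True if dst can be reached from src along successor edges."""
        stack, seen = [src], set()
        while stack:
            n = stack.pop()
            if n == dst:
                return True
            if n not in seen:
                seen.add(n)
                stack.extend(succs[n])
        return False

    def merge_into_pred(task, succs, preds, code, cost):
        """Fold `task` into some direct predecessor, if any choice is legal."""
        for host in list(preds[task]):
            others = preds[task] - {host}
            # Host must be able to run immediately before task in some
            # linearization: none of task's other predecessors may already
            # be downstream of host, or re-routing them creates a cycle.
            if any(reachable(host, q, succs) for q in others):
                continue
            code[host] = code[host] + code[task]
            cost[host] += cost[task]
            succs[host].discard(task)
            for q in others:             # task's other preds now gate host
                succs[q].discard(task)
                succs[q].add(host)
                preds[host].add(q)
            for s in succs[task]:        # task's successors now wait on host
                preds[s].discard(task)
                preds[s].add(host)
                succs[host].add(s)
            del succs[task], preds[task], code[task], cost[task]
            return True
        return False

    def collapse_cheap(succs, preds, code, cost, overhead):
        """Repeatedly fold cheap non-root tasks into a predecessor."""
        changed = True
        while changed:
            changed = False
            for t in list(cost):
                if cost[t] < overhead and preds[t]:
                    changed |= merge_into_pred(t, succs, preds, code, cost)

Note that a merged task's cost accumulates, so once a combined task is no longer too cheap the loop leaves it alone; only work that genuinely costs less than the scheduling overhead gets serialized.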