Every time backward() is called, it performs a topological sort. On a very large network with many backward passes, this could become computationally expensive. Since the structure of the network does not change between forward and backward passes, does it make sense to compute the topological sort once, cache the result, and reuse it on every subsequent call?
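To make the question concrete, here is a minimal sketch of what such caching might look like. The post does not show the engine's code, so the `Value` class, the `_topo` cache attribute, and the `topo_builds` counter are all hypothetical names, loosely modelled on a scalar autograd node. The sketch assumes the same node objects are reused between passes; if every forward pass builds fresh nodes, a cached order would point at stale objects and would have to be rebuilt anyway.

```python
class Value:
    """Hypothetical scalar autograd node, used only to illustrate the idea."""

    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._prev = tuple(_children)
        self._backward = lambda: None
        self._topo = None       # cached topological order, built lazily
        self.topo_builds = 0    # instrumentation: how often the sort ran

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))

        def _backward():
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))

        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def _build_topo(self):
        # Iterative post-order DFS, so deep graphs do not hit the
        # recursion limit. Children end up before their parents.
        topo, visited = [], set()
        stack = [(self, False)]
        while stack:
            node, expanded = stack.pop()
            if expanded:
                topo.append(node)
            elif node not in visited:
                visited.add(node)
                stack.append((node, True))
                for child in node._prev:
                    if child not in visited:
                        stack.append((child, False))
        self.topo_builds += 1
        return topo

    def backward(self):
        # Assumption: the graph below this node never changes after the
        # first call, so the order is computed once and then reused.
        if self._topo is None:
            self._topo = self._build_topo()
        for node in self._topo:
            node.grad = 0.0     # reset so repeated calls do not accumulate
        self.grad = 1.0
        for node in reversed(self._topo):
            node._backward()


a, b, c = Value(2.0), Value(3.0), Value(4.0)
loss = a * b + c
loss.backward()
loss.backward()  # second call reuses the cached order
print(a.grad, b.grad, c.grad, loss.topo_builds)  # prints: 3.0 2.0 1.0 1
```

Two caveats with this approach: the sort is linear in the number of nodes and edges, roughly the same order of work as the backward pass itself, so the saving is at best a constant factor; and the cache would have to be invalidated whenever the graph changes, which this sketch does not handle.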