Spent several days working out a cool optimization for a denoising diffusion model (addressing the noise continuously, modeling its full distribution at every step of t instead of working with quantized cumulative sums) and have now discovered why people don’t do it this way.
Turns out the derivative of sqrt(x) does some colorful things around x=0. I prefer not to pry, but it seems not to want to be bothered around there. I’m going to leave it be.
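
For anyone who wants to see the colorful behavior for themselves: the derivative of sqrt(x) is 1 / (2 * sqrt(x)), which grows without bound as x approaches 0 from above. The post doesn't say which square root is the culprit, so what follows is only a rough sketch in plain Python. The names `sqrt_grad` and `clamped_sqrt_grad` and the `eps` floor are made up for illustration; clamping is a common generic workaround, not something the post describes.

```python
import math


def sqrt_grad(x: float) -> float:
    """Analytic derivative of sqrt(x), i.e. 1 / (2 * sqrt(x)).

    Returns infinity at x == 0, which is where gradient-based training falls over.
    """
    if x < 0:
        raise ValueError("sqrt is not real-valued for x < 0")
    if x == 0:
        return math.inf
    return 1.0 / (2.0 * math.sqrt(x))


def clamped_sqrt_grad(x: float, eps: float = 1e-8) -> float:
    """Hypothetical workaround: floor x at eps before differentiating.

    This keeps the gradient finite, at the cost of being wrong for x < eps.
    """
    return sqrt_grad(max(x, eps))


# Watch the gradient take off as x shrinks toward zero.
for x in (1.0, 1e-4, 1e-8, 1e-16, 0.0):
    print(f"x={x:g}  grad={sqrt_grad(x):g}  clamped={clamped_sqrt_grad(x):g}")
```

The clamp only caps the blow-up rather than fixing it: every x below `eps` gets the same large gradient, so anything that spends a lot of time near zero is still going to be unhappy.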