Spent several days working out a cool optimization for a denoising diffusion model (addressing the noise continuously, modeling its full distribution at every step of t instead of working with quantized cumulative sums) and have now discovered why people don’t do it this way.
Turns out the derivative of sqrt(x) does some colorful things around x=0. I prefer not to pry, but it seems not to want to be bothered around there. I’m going to leave it be.
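For anyone who wants to see the misbehavior for themselves, here is a minimal sketch in plain Python. It only uses the textbook identity d/dx sqrt(x) = 1 / (2 * sqrt(x)). The names sqrt_grad and safe_sqrt_grad, and the eps clamp, are made up for illustration and are not what my model actually does.

```python
import math


def sqrt_grad(x: float) -> float:
    """Analytic derivative of sqrt(x), i.e. 1 / (2 * sqrt(x)). Only valid for x > 0."""
    return 1.0 / (2.0 * math.sqrt(x))


def safe_sqrt_grad(x: float, eps: float = 1e-8) -> float:
    """Hypothetical workaround: clamp x away from zero before differentiating.

    The eps value is an arbitrary assumption; it caps the gradient at 1 / (2 * sqrt(eps)).
    """
    return 1.0 / (2.0 * math.sqrt(max(x, eps)))


if __name__ == "__main__":
    # The gradient grows without bound as x shrinks toward zero.
    for x in (1.0, 1e-4, 1e-8, 1e-16):
        print(f"x={x:g}  d/dx sqrt(x)={sqrt_grad(x):g}")

    # At exactly zero the unclamped version divides by zero.
    try:
        sqrt_grad(0.0)
    except ZeroDivisionError:
        print("x=0  d/dx sqrt(x) is undefined (division by zero)")

    # The clamped version stays finite, at the cost of a biased gradient.
    print(f"x=0  clamped gradient={safe_sqrt_grad(0.0):g}")
```

Clamping like this keeps the numbers finite, but the gradient near zero is then wrong in a different way, so it is a bandage rather than a fix.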
ML nonsense
Very mad at whoever did this to me.