Yes, the scaling on width wouldn't work for a delta function, but it could work on a pulse function approximating a delta. The scaling is more about the time delay between discovery and production. It sounds like the Shock model treats that time delay in a more elaborate way and is probably more accurate.

The delay and width scaling come about from convolutions in the shock model. Even taking the Lorentzian and Gaussian examples, you get summed (not multiplied) scalings if convolution is at the root. Two infinite Gaussians convolved together give a new width of sqrt(w1^2 + w2^2), and two infinite Lorentzians convolved together give a new width of w1 + w2. That's at least what I think is at the root of the scalings.
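
For anyone who wants to check those two width rules numerically, here is a rough sketch in Python with NumPy. It is only an illustration, not anything taken from the shock model itself: the function names (gaussian, lorentzian, fwhm, convolved_fwhm), the grid size, and the choice of full width at half maximum as the meaning of "width" are all my own assumptions, and the curves are truncated to a finite grid rather than being truly infinite.

```python
import numpy as np

# Symmetric grid with an odd number of points so that x = 0 sits at the center.
# The extent and resolution are arbitrary choices for this illustration.
N_HALF = 4000
X = np.linspace(-200.0, 200.0, 2 * N_HALF + 1)


def gaussian(x, width):
    """Gaussian whose full width at half maximum equals `width`."""
    sigma = width / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return np.exp(-x**2 / (2.0 * sigma**2))


def lorentzian(x, width):
    """Lorentzian whose full width at half maximum equals `width`."""
    gamma = width / 2.0
    return 1.0 / (1.0 + (x / gamma) ** 2)


def fwhm(x, y):
    """Full width at half maximum of a single-peaked curve (linear interpolation)."""
    half = y.max() / 2.0
    above = np.where(y >= half)[0]
    i0, i1 = above[0], above[-1]
    # Interpolate the two half-maximum crossings between neighboring samples.
    left = np.interp(half, [y[i0 - 1], y[i0]], [x[i0 - 1], x[i0]])
    right = np.interp(half, [y[i1 + 1], y[i1]], [x[i1 + 1], x[i1]])
    return right - left


def convolved_fwhm(shape, w1, w2):
    """Convolve two curves of the same shape and measure the resulting width."""
    result = np.convolve(shape(X, w1), shape(X, w2), mode="same")
    return fwhm(X, result)


if __name__ == "__main__":
    w1, w2 = 2.0, 3.0
    print("Gaussian:   measured", convolved_fwhm(gaussian, w1, w2),
          "vs sqrt(w1^2 + w2^2) =", np.hypot(w1, w2))
    print("Lorentzian: measured", convolved_fwhm(lorentzian, w1, w2),
          "vs w1 + w2 =", w1 + w2)
```

The measured widths should land close to the two formulas, with small differences coming from the finite grid and the interpolation at the half-maximum points.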