I recently compiled a list of problem sources I have seen, or can think of, that make it hard to reproduce (replicate?) an experiment. I have encountered most of them myself, except for hardware errors.
My thought was this: a typical computer vision experiment might train on a GPU for many hours or several days. Even if a bit flip happened only once every $10^9$ FLOPs, an Nvidia GeForce GTX 1080 Ti (roughly 11.3 TFLOPS of FP32 throughput) would produce on the order of 11,000 errors per second. I don't know how such an error would propagate through later calculations (i.e. how well the problem is numerically conditioned).
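To make the arithmetic explicit, here is a minimal back-of-the-envelope sketch; the peak-throughput figure and the one-per-$10^9$-FLOPs flip rate are assumptions for illustration, not measured values:

```python
# Back-of-the-envelope estimate of bit flips during a GPU training run,
# assuming (hypothetically) one flip per 1e9 FLOPs.
# Peak FP32 throughput is used, so these are upper bounds, not measurements.

PEAK_FP32_FLOPS = 11.3e12   # ~11.3 TFLOPS for a GTX 1080 Ti at boost clock
FLIPS_PER_FLOP = 1 / 1e9    # assumed: one bit flip every 10^9 FLOPs
TRAINING_HOURS = 48         # e.g. a two-day training run

errors_per_second = PEAK_FP32_FLOPS * FLIPS_PER_FLOP
errors_total = errors_per_second * TRAINING_HOURS * 3600

print(f"errors per second: {errors_per_second:,.0f}")
print(f"errors over {TRAINING_HOURS} h: {errors_total:,.0f}")
```

Even with a much lower assumed flip rate, a multi-day run would still accumulate a non-trivial number of errors, which is what prompts the question.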
So: are there any reports of hardware errors affecting experiments?
(Blog posts, journal articles, posters?)