With all due respect, you need to hand off your design problem to someone who knows what he's doing. Building a S/H which will respond to a nanosecond pulse is a major undertaking, and if you have no experience along these lines (as you clearly don't), frankly I see no hope that you will succeed.
If you still want to have heartburn, I'd advise you to use a high-speed A/D converter like this one, followed by a very high-speed digital peak detector instantiated in a high-speed FPGA.
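To make the "digital peak detector" idea concrete, here is a rough behavioral model in Python. It is only a sketch of the logic, not FPGA code: the function name `detect_peaks` and the `threshold` parameter are my own, and a real design would implement the same compare-and-hold in HDL across several parallel samples per clock.

```python
def detect_peaks(samples, threshold):
    """Return one (index, value) pair for each pulse that exceeds threshold.

    Behavioral model only (assumption: samples are plain ADC codes and a
    pulse is any run of consecutive samples above the threshold).
    """
    peaks = []
    in_pulse = False
    best_index, best_value = -1, threshold
    for i, value in enumerate(samples):
        if value > threshold:
            # Inside a pulse: start or update the running maximum.
            if not in_pulse or value > best_value:
                best_index, best_value = i, value
            in_pulse = True
        elif in_pulse:
            # Signal fell back below threshold: report the peak and re-arm.
            peaks.append((best_index, best_value))
            in_pulse = False
    if in_pulse:
        # The capture ended while still inside a pulse.
        peaks.append((best_index, best_value))
    return peaks
```

Feeding it something like `detect_peaks([0, 1, 5, 9, 4, 0, 0, 7, 3, 0], threshold=2)` reports one peak per pulse. The threshold keeps noise from re-triggering the detector between pulses.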
You'll note, I hope, that the ADC alone will cost you about 4,000 bucks, and the FPGA won't be cheap either. Plus, you'll have to design a printed circuit board to run at 4 GHz, which I'll bet is not currently in your skill set. Breadboards simply will not work. Don't even think about it. Oh yes, and the FPGA part is pretty clear, since high-end FPGAs will handle a 4 GHz sample stream (in practice by deserializing it into wider words that are processed at a lower fabric clock). You might look into the Xilinx Virtex line.
EDIT - You have not specified your pulse repetition rate. If it's low enough, you could probably use a (very good) GHz digital oscilloscope configured as a data collection instrument. It would be configured as single trace, with remote access. When it detects a pulse, the host is notified and the trace data transferred. The host would then reset the scope and analyze the trace data to extract the peak amplitude. This would be somewhat less accurate than the ADC/FPGA route, simply because scopes don't bother with high-resolution ADCs. This approach would completely avoid the learning curve associated with roll-your-own hardware, but I have no idea if the system throughput would be adequate. If the scope has a deep memory, the data transfer would take a ferociously long time by the standards of a nanosecond pulse (a 1 Msample buffer at 8 bits/sample will take about 20 msec via USB 2).
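For a rough feel for the numbers, here is a small Python sketch of the host side. The function names are mine, the 400 Mbit/s figure is an assumed effective USB 2 rate (the bus signals at 480 Mbit/s, and real throughput depends on the scope and driver), and the baseline estimate assumes the first few samples of the trace are pre-trigger and contain no pulse.

```python
def transfer_time_s(n_samples, bits_per_sample, link_bits_per_s):
    """Time to move one capture buffer over a link, ignoring protocol overhead."""
    return n_samples * bits_per_sample / link_bits_per_s


def peak_amplitude(trace, baseline_samples=16):
    """Peak height above the baseline of a non-empty trace.

    Assumption: the first `baseline_samples` points are quiet pre-trigger
    samples, so their mean is a usable baseline.
    """
    head = trace[:baseline_samples]
    baseline = sum(head) / len(head)
    return max(trace) - baseline


# 1 Msample at 8 bits/sample over an assumed 400 Mbit/s effective link.
print(transfer_time_s(1_000_000, 8, 400e6))  # about 0.02 s
```

At 20 msec per capture, transfer time alone limits you to at most about 50 pulses per second before any scope re-arm or host processing time is counted, which is why the repetition rate matters so much here.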
FURTHER EDIT - Your op amp is too slow. While the 6172 does have the output capability to charge the output cap in the time required, it does not have the bandwidth. Furthermore, your diode has a trr of 4 nsec nominal, which is pretty much guaranteed to make closing the loop impossible in 10 nsec.
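As a back-of-the-envelope check, here is a short Python sketch. It leans on the usual single-pole rule of thumb (bandwidth is roughly 0.35 divided by the 10-90% rise time), which is an approximation I'm bringing in, not something from your schematic. The function names are mine, and a real op amp with a diode inside the feedback loop will do worse than this.

```python
def required_bandwidth_hz(rise_time_s):
    """Approximate bandwidth needed to follow an edge with the given rise time.

    Assumption: single-pole response, 10-90% rise time, BW ~= 0.35 / t_r.
    """
    return 0.35 / rise_time_s


def loop_time_left_ns(loop_budget_ns, diode_trr_ns):
    """Time remaining for the amplifier once the diode recovery time is spent."""
    return loop_budget_ns - diode_trr_ns


print(required_bandwidth_hz(1e-9) / 1e6)  # roughly 350 (MHz) for a 1 nsec edge
print(loop_time_left_ns(10, 4))           # 6 nsec left for everything else
```

So a nanosecond-class edge calls for bandwidth in the hundreds of MHz, and the diode alone eats 4 of the 10 nsec you have to close the loop.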
Your simulation problems are caused by a simple fact - your timebase is far too long. Try rerunning it at 100 nsec, and you'll see all sorts of problems.