I get about the same utilization rate when I train models using TensorFlow. The reason is pretty clear in my case: I'm manually choosing a random batch of samples and calling the optimizer on each batch separately.
That means each batch of data starts out in main memory, gets copied into GPU memory (where the rest of the model lives), the forward/back propagation and weight update run on the GPU, and then execution is handed back to my code, where I grab another batch and call optimize on it. The GPU sits idle while each batch is selected and copied, which is what drags utilization down.
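For concreteness, here's a minimal graph-mode sketch of the kind of loop I'm describing; the toy data, model, and names (x_train, inputs, train_op, etc.) are hypothetical placeholders, not your actual code:

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

# Toy stand-ins for a real dataset and model; every name here is hypothetical.
x_train = np.random.rand(10_000, 784).astype(np.float32)
y_train = np.random.randint(0, 10, size=10_000)

inputs = tf.placeholder(tf.float32, [None, 784])
labels = tf.placeholder(tf.int64, [None])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
logits = tf.matmul(inputs, W) + b
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

batch_size = 64
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(1000):
        # Pick a random batch on the CPU...
        idx = np.random.choice(len(x_train), batch_size, replace=False)
        # ...then block while it's copied to the GPU and one update runs.
        # The GPU idles during the batch selection and the host-to-device copy.
        sess.run(train_op, feed_dict={inputs: x_train[idx], labels: y_train[idx]})
```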
There's a faster way to do this if you spend a few hours setting up TensorFlow to load batches in parallel from pre-prepared TFRecord files.
I realize you may or may not be using TensorFlow under Keras, but since my experience tends to produce very similar utilization numbers, I'm going out on a limb and suggesting that the same cause is reasonably likely here. If your framework is loading each batch from main memory into the GPU without the added efficiency/complexity of asynchronous loading (which the GPU itself can handle, overlapping the copy with computation), then this would be an expected result.
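If you're on a recent TensorFlow, the tf.data API is one way to get that parallel, asynchronous loading; here's a rough sketch, where the filenames, feature layout, and batch size are made-up placeholders you'd swap for your own:

```python
import tensorflow as tf

# Hypothetical pre-prepared TFRecord shards.
FILENAMES = ["train-0.tfrecord", "train-1.tfrecord"]
BATCH_SIZE = 64

def parse_example(serialized):
    # Assumed record layout: a flat float feature and an integer label.
    features = tf.io.parse_single_example(
        serialized,
        {
            "x": tf.io.FixedLenFeature([784], tf.float32),
            "y": tf.io.FixedLenFeature([], tf.int64),
        },
    )
    return features["x"], features["y"]

dataset = (
    tf.data.TFRecordDataset(FILENAMES, num_parallel_reads=tf.data.AUTOTUNE)
    .map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)  # parse on CPU threads in parallel
    .shuffle(10_000)
    .batch(BATCH_SIZE)
    .prefetch(tf.data.AUTOTUNE)  # prepare the next batch while the GPU trains on the current one
)

# With Keras, the dataset can be passed straight to fit(), so the batch copies
# overlap with computation instead of blocking it:
# model.fit(dataset, epochs=10)
```

The key piece is prefetch: the input pipeline prepares the next batch on the CPU while the GPU is still busy with the current one, so the GPU isn't left waiting between steps.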