Machine learning is inherently data intensive, and typical ML algorithms are massively data-parallel. Therefore, even when developing new algorithms, high-level mathy languages (like Python, R, Octave) can be reasonably fast if you are willing to describe your algorithm in terms of standard operations on matrices and vectors.
On the other hand, for deeper exploration of fundamental concepts it can be more interesting to treat individual components as objects whose internal state and interactions you want to conceptualize and visualize. This is a case where C++ may shine. Using C++ also means, of course, that a compiler will optimize your code for execution speed. Additionally, it opens the door to straightforward multi-core execution with OpenMP (or other available threading approaches).
C++ is a high-level language -- not inherently more verbose or tedious than Python for algorithm development. The biggest challenges of working with C++ are:
- A more anarchic library ecosystem means more effort in choosing and integrating existing components.
- Less stable language rules (or interpretations thereof) mean that something you create today might not compile a few years down the road (after compiler upgrades).
Consider, also, that TensorFlow documentation identifies some benefits of using C++ over Python for certain low-level cases. See TensorFlow: Create an op.
Low-level coding for GPU acceleration is an entirely different can of worms with very limited language options. This is not something to be concerned about until after you have a well-defined custom algorithm that you want to super-optimize. More likely, you would be better off using a framework (like TensorFlow) to handle GPU interactions for you.
For exploratory visualization purposes, don't discount the interactive power of JavaScript, which is also comparatively fast: