Working on what might be called "medium data" projects, I've been able to parallelize my code (mostly for modeling and prediction in Python) on a single machine across anywhere from 4 to 32 cores. Now I'm looking at scaling up to clusters on EC2 (probably with StarCluster/IPython, but I'm open to other suggestions as well), and I'm puzzled about how to reconcile distributing work across cores on an instance with distributing it across instances in a cluster.
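To make the question concrete, here's a minimal sketch of the two styles as I understand them. The first half is roughly what I do today with multiprocessing; the second half is my reading of the IPython.parallel docs, where the same map gets farmed out to whatever engines the controller knows about, whether they're cores on one instance or spread across many. The `score` function and `param_grid` are throwaway placeholders, and the cluster half assumes StarCluster has already started a controller and engines:

```python
from multiprocessing import Pool

def score(params):
    # Placeholder for one unit of work, e.g. fitting and scoring a model
    return sum(p ** 2 for p in params)

param_grid = [(a, b) for a in range(10) for b in range(10)]

if __name__ == '__main__':
    # What I do today -- one machine, local cores only
    pool = Pool(processes=8)
    local_results = pool.map(score, param_grid)
    pool.close()
    pool.join()

    # What I think the cluster version looks like -- the same map,
    # load-balanced across every engine attached to the controller
    from IPython.parallel import Client
    rc = Client()                    # connects to the running controller
    view = rc.load_balanced_view()   # dynamic load balancing across engines
    cluster_results = view.map_sync(score, param_grid)
```

The load-balanced view seems to treat all engines as interchangeable, which is part of what prompts the question below: if the scheduler doesn't distinguish between cores-on-one-box and cores-across-boxes, does the instance/core split still matter in practice?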
Is it even practical to parallelize across instances as well as across the cores on each instance? If so, can anyone give a quick rundown of the pros and cons of running many instances with few cores each vs. a few instances with many cores? Is there a rule of thumb for choosing the right ratio of instances to cores per instance?
Bandwidth and RAM are non-trivial concerns in my projects, but it's easy to spot when those are the bottleneck and readjust. It's much harder, I'd imagine, to benchmark the right balance of cores per instance against number of instances without repeated testing, and my projects vary too much for any single test to apply to all circumstances. Thanks in advance, and if I've just failed to google this one properly, feel free to point me to the right answer somewhere else!