I, like you, took it upon myself to change career paths into the growing field of data science. As a bit of background, I was working in a neuroscience research lab studying the effects of allelic variation on protein transport in rodent models of Alzheimer's disease and post-traumatic stress disorder. I was a competent R, bash, and MATLAB hacker and knew a bit of C and Java. I had a change of heart when I started applying for neuroscience PhD programs.
Over the next year, I took a few grad-level stats courses and got my programming skills back in order. Last year, I started an MS program in CS with the intent of 1) getting a job as a data scientist in industry, 2) learning how to think algorithmically / mathematically about data, 3) really sharpening my fundamental CS skills, and 4) having some fun in the process.
I won't list specific courses that helped, but the major winning topics for me were:
Algorithms
- Fundamental algorithms: complexity analysis, sorting and searching, graph algorithms, dynamic programming, randomized algorithms, etc. Tim Roughgarden has a great course for this.
- Data mining/massive dataset/stream algorithms: locality-sensitive hashing, sketch algorithms, k-d/ball trees, reservoir sampling, sliding window methods, etc.
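To give a flavor of the stream algorithms mentioned above, here is a minimal sketch of reservoir sampling (the classic "Algorithm R"). It keeps a uniform random sample of k items from a stream without ever holding the whole stream in memory. The function name `reservoir_sample` and the `seed` parameter are my own choices for illustration, not from any particular library.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return up to k items chosen uniformly at random from a stream of unknown length."""
    rng = random.Random(seed)  # hypothetical seed parameter, only for reproducibility
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Pick a slot in [0, i]; the new item survives with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Sample 5 values from a generator that is never materialized as a list.
    print(reservoir_sample((x * x for x in range(1_000_000)), k=5, seed=42))
```

The nice part is that memory use depends only on k, not on the stream length, which is exactly the property you want once the data stops fitting in RAM.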
Databases/Data streams:
- NoSQL, SQL, Hadoop, etc. Working with them forces you to appreciate that most of the data you will work with absolutely doesn't fit in memory, and that it takes different methods not only to extract information from it, but just to work with it and examine it. Learn how to use one by building a web scraper or data harvester to populate a database (or however else you want...)
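As a small illustration of the "doesn't fit in memory" point, here is a sketch that computes a mean over a database table by pulling rows in batches instead of loading everything at once. It uses Python's built-in `sqlite3` module; the `readings` table, its columns, and both function names are made up for the example, and a real harvester would populate the table from scraped data rather than a range of numbers.

```python
import sqlite3
from typing import Optional


def build_demo_db(n_rows: int = 10_000) -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical `readings` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, value REAL)")
    conn.executemany(
        "INSERT INTO readings (value) VALUES (?)",
        ((float(i),) for i in range(n_rows)),
    )
    conn.commit()
    return conn


def mean_in_batches(conn: sqlite3.Connection, batch_size: int = 1000) -> Optional[float]:
    """Compute the mean of `value`, holding at most batch_size rows in memory at a time."""
    cur = conn.execute("SELECT value FROM readings")
    total, count = 0.0, 0
    while True:
        rows = cur.fetchmany(batch_size)
        if not rows:
            break
        for (value,) in rows:
            total += value
            count += 1
    return total / count if count else None


if __name__ == "__main__":
    conn = build_demo_db()
    print(mean_in_batches(conn, batch_size=256))
    conn.close()
```

For a simple mean you would normally just let the database do it with `SELECT AVG(value)`; the batching pattern matters more when the per-row work can't be expressed in SQL.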
Machine learning
- Learn the fundamental methods in the field, e.g., tree-based methods, optimization, neural networks, SVMs, regression, Markov chains, graphical methods, ensemble methods, overfitting, regularization, clustering, k-means, k-NN, etc. As mentioned, Andrew Ng's Coursera class is probably the best free online option for this one.
- EDIT: The Coursera course is good iff you supplement it with material from the real version of the course (lectures, notes). I think the Coursera class is a nice overview of topics in ML but a bit light: it doesn't go nearly deep enough into the mechanics of those topics. Tom Mitchell's book is also a good resource.
Data visualization
- Nothing terribly fancy is strictly necessary, but knowing how to visualize multidimensional data is quite useful. Learn a good plotting package and perhaps experiment with other tech, like D3.js or mapping visualizations. If you get really good at this, you'll probably have a great job forever.
"Real-world" experience
- I learned about as much in three months at a big web company doing data science as I did in the previous ~1.5 years of grad school and self-study. This can be approximated by doing Kaggle competitions or the like, but honestly, data in the wild is considerably harder to work with than most of what is on Kaggle (note that the Microsoft malware detection project or some of the computer-vision projects are much more useful for learning to work with messy data that is also rather "big").
Notice that I didn't mention anything like "learn R it's the best" or "learn python it's way betterz than R". I use Python, C, MySQL, MongoDB, and R for most of my work and current research (though I really prefer the Python ecosystem these days). I'm sure that this will change in the future.
This falls a bit outside of your question, but perhaps the most critical thing about being an industry data scientist is the ability to work in a mostly unsupervised manner and communicate results/methodology clearly to a team of non-experts. Having a background in scientific research helps with this, as the questions you are trying to answer are difficult, often unstructured, and often sit right at the edge of what is known in your domain. My friends, acquaintances, and past coworkers working as data scientists in industry were nearly all ex-academic-path folks with an MS or PhD and at least a few publications under their belts. I absolutely do not believe this is a strict requirement, and if I were in a hiring position, I'd never filter someone out just because they didn't have an advanced degree, but the industry job postings do seem to be trending towards requiring an MS/PhD or equivalent experience.
Bear in mind, all of this comes from just some dude from nowhere who hasn't been directly in the field all that long but whose transition seems to be going well.