I'm working on a Kaggle challenge (Telstra Network Disruption) where some variables are represented by rows instead of columns. I am looking for equivalents of gather(), separate() and spread(), which are found in R's tidyr package.
https://github.com/pydata/pandas/issues/10109 – Emre, Mar 02 '16 at 16:29
4 Answers
8
I'd start with the melt() function in pandas. I wrote an article about it:
https://www.ibm.com/developerworks/community/blogs/jfp/entry/Tidy_Data_In_Python?lang=en
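As a rough illustration of that starting point, here is a minimal sketch of how the three tidyr verbs might map onto pandas: melt() for gather(), pivot() for spread(), and str.split(expand=True) for separate(). The table and column names (id, year, value, code, site) are made up for the example and are not taken from the Telstra data or from the linked article.

```python
import pandas as pd

# Hypothetical wide table: one row per id, one column per year.
wide = pd.DataFrame({
    "id": [1, 2],
    "2014": [10, 30],
    "2015": [20, 40],
})

# gather(): wide -> long, one row per (id, year) pair.
long = wide.melt(id_vars="id", var_name="year", value_name="value")

# spread(): long -> wide again, one column per distinct year.
back = long.pivot(index="id", columns="year", values="value").reset_index()
back.columns.name = None  # drop the leftover "year" label on the column axis

# separate(): split one string column into several new columns.
codes = pd.DataFrame({"code": ["a_2014", "b_2015"]})
codes[["site", "year"]] = codes["code"].str.split("_", expand=True)
```

Note that pivot() raises an error if an (id, year) pair occurs more than once; in that case pivot_table() with an aggregation function is the usual fallback.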
4
R's gather() essentially goes from wide to long. So,
- check the pandas documentation for how to use pandas.wide_to_long(),
- check this blog for a discussion on getting an elegant gather-like function in Python.
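For reference, a small sketch of what a wide_to_long() call can look like. It assumes the wide columns share a common prefix (the "stub") followed by a numeric suffix; the frame and the names id, score and year are hypothetical.

```python
import pandas as pd

# Hypothetical wide table whose value columns follow the pattern <stub><suffix>.
wide = pd.DataFrame({
    "id": [1, 2],
    "score2014": [10, 30],
    "score2015": [20, 40],
})

# stubnames: the shared column prefix; i: the id column(s);
# j: the name of the new column that receives the suffix.
long = pd.wide_to_long(wide, stubnames="score", i="id", j="year").reset_index()
```

Compared with melt(), this saves a separate step for stripping the prefix from the column names, but it only fits when the columns follow such a naming pattern.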
ximiki
2
I tried to mimic the syntax of the tidyr package in Python, in a package called tidypython. I made it compatible with the dplython package, which supports the >> operator for chaining commands.
It hasn't been fully tested, but it should work pretty well:
https://github.com/durrantmm/tidypython
Let me know if it works for you.
Matt Durrant