How do I get from a dataframe in which several columns hold the same kind of values and need to be merged:

import pandas as pd
df1 = pd.DataFrame({'firstcolumn': ['ab', 'ca', 'da', 'ta', 'la'],
                    'secondcolumn': ['ab', 'ca', 'ta', 'da', 'sa'],
                    'index': [2011, 2012, 2011, 2012, 2012]})

To a crosstab that tells me, for each year, how many times each value was collected?

Index  ab  ca  da  ta  sa  la
2011    2   0   1   1   0   0
2012    0   2   1   1   1   1

Also, how could I then plot the table?

Dawny33
Nicola
  • I'm looking at your desired example output, and it doesn't make any sense. There are two counts of 'ca' but they are both in 2012, not 2011. The counts of 'ta' similarly should be in 2011, not 2012, if I am reading your dataset correctly. If I am not, please explain. – kingledion Jan 10 '18 at 17:54
  • Thank you, I modified accordingly to reflect the output. Let me know if now it makes sense. My objective is to organise multiple categorical variables (first and second columns) into a value count per year. – Nicola Jan 10 '18 at 20:45

1 Answer

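It can be done by melting the two value columns into a single column and then cross-tabulating that column against the year: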

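import pandas as pd
import matplotlib.pyplot as plt

# Stack both value columns into one long column, keeping the year next to each value
melted = pd.melt(df1, id_vars=["index"], var_name="Var", value_name="Score").dropna()
# Count how often each value occurs in each year
table = pd.crosstab(index=melted["index"], columns=melted["Score"])
# In a Jupyter notebook, run %matplotlib inline first so the plot renders inline
table.plot.bar()  # one group of bars per year, one bar per value
plt.show()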
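
As a quick check, here is a minimal self-contained sketch that rebuilds `df1` from the question and applies the same melt and crosstab steps without plotting. The final column reordering is an addition of mine, assuming you want the columns in the same order as the desired table; `pd.crosstab` on its own returns them sorted alphabetically.

```python
import pandas as pd

# Data from the question
df1 = pd.DataFrame({
    "firstcolumn": ["ab", "ca", "da", "ta", "la"],
    "secondcolumn": ["ab", "ca", "ta", "da", "sa"],
    "index": [2011, 2012, 2011, 2012, 2012],
})

# Long format: one row per (year, value) pair, 10 rows in total
melted = pd.melt(df1, id_vars=["index"], var_name="Var", value_name="Score").dropna()

# Count occurrences of each value per year
table = pd.crosstab(index=melted["index"], columns=melted["Score"])

# Optional (not in the original answer): reorder columns to match the question's layout
table = table[["ab", "ca", "da", "ta", "sa", "la"]]

print(table)
# Row 2011 holds the counts 2, 0, 1, 1, 0, 0
# Row 2012 holds the counts 0, 2, 1, 1, 1, 1
```

The `dropna()` call makes no difference for this data, but it keeps missing entries out of the counts if one of the columns has gaps.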
Nicola