11

How to merge duplicate column and sum their value?

What I have

A   30
A   40
B   50

What I need

A   70
B   50

DF for this example

d = {'address': ["A", "A", "B"], 'balances': [30, 40, 50]}
df = pd.DataFrame(data=d)
df

2 Answers2

16

You may use

df2 = df.groupby(['address']).sum()

or

df2 = df.groupby(['address']).agg('sum')

If there are columns other than balances that you want to peak only the first or max value, or do mean instead of sum, you can go as follows:

d = {'address': ["A", "A", "B"], 'balances': [30, 40, 50], 'sessions': [2, 3, 4]} 
df = pd.DataFrame(data=d) 
df2 = df.groupby(['address']).agg({'balances': 'sum', 'sessions': 'mean'})

That outputs

         balances   sessions
address       
A              70       2.5  
B              50       4.0

You may add as_index = False to groupby arguments to have:

  address  balances  sessions
0       A        70       2.5
1       B        50       4.0
Esmailian
  • 9,147
  • 2
  • 31
  • 47
  • what if for every value in`adresses` the `sessions` value is the same and you just want to keep it? `{'address': ["A", "A", "B"], 'balances': [30, 40, 50], 'sessions': ["V","V","K"]}` to `{'address': ["A", "B"], 'balances': [70, 50], 'sessions': ["V","K"]}` – Tobias Kolb Jan 30 '20 at 17:24
  • 2
    @TobiasKolb in that case you can use `'sessions': 'first'` which keeps one of two (or more) `"V"`s – Esmailian Jul 17 '20 at 17:24
  • This answer is more useful than the accepted one! – spectre Dec 30 '21 at 06:32
7

In another case when you have a dataset with several duplicated columns and you wouldn't want to select them separately use:

df.groupby(by=df.columns, axis=1).sum()
Denisa
  • 86
  • 1
  • 2