Pandas merge column duplicate and sum value

Question

How to merge duplicate column and sum their value?

What I have

A   30
A   40
B   50

What I need

A   70
B   50

DF for this example

d = {'address': ["A", "A", "B"], 'balances': [30, 40, 50]}
df = pd.DataFrame(data=d)
df

Esmailian · Answer 1 · 2019-03-10T07:42:11.443

16

You may use

df2 = df.groupby(['address']).sum()

or

df2 = df.groupby(['address']).agg('sum')

If there are columns other than balances that you want to peak only the first or max value, or do mean instead of sum, you can go as follows:

d = {'address': ["A", "A", "B"], 'balances': [30, 40, 50], 'sessions': [2, 3, 4]} 
df = pd.DataFrame(data=d) 
df2 = df.groupby(['address']).agg({'balances': 'sum', 'sessions': 'mean'})

That outputs

         balances   sessions
address       
A              70       2.5  
B              50       4.0

You may add as_index = False to groupby arguments to have:

  address  balances  sessions
0       A        70       2.5
1       B        50       4.0

edited Mar 10 '19 at 07:42

answered Mar 10 '19 at 07:08

Esmailian

9,147
2
31
47

what if for every value in`adresses` the `sessions` value is the same and you just want to keep it? `{'address': ["A", "A", "B"], 'balances': [30, 40, 50], 'sessions': ["V","V","K"]}` to `{'address': ["A", "B"], 'balances': [70, 50], 'sessions': ["V","K"]}` – Tobias Kolb Jan 30 '20 at 17:24
2

@TobiasKolb in that case you can use `'sessions': 'first'` which keeps one of two (or more) `"V"`s – Esmailian Jul 17 '20 at 17:24
This answer is more useful than the accepted one! – spectre Dec 30 '21 at 06:32

score 7 · Accepted Answer · answered Feb 27 '20 at 12:53

7

In another case when you have a dataset with several duplicated columns and you wouldn't want to select them separately use:

df.groupby(by=df.columns, axis=1).sum()

answered Feb 27 '20 at 12:53

Denisa

86
1
2

Pandas merge column duplicate and sum value

2 Answers2

Linked