17

I have two data frames df1 and df2. For my analysis, I need to remove rows from df1 that have identical column values (Email) in df2?

>>df1
   First  Last  Email
0 Adam   Smith  email@email.com
1 John   Brown  email2@email.com
2 Joe    Max    email3@email.com
3 Will   Bill   email4@email.com


>>df2
  First  Last   Email
0 Adam   Smith  email@email.com
1 John   Brown  email2@email.com
user3503711
  • 103
  • 3
a_a_a
  • 817
  • 2
  • 8
  • 11

2 Answers2

17

You can try this:

cond = df1['Email'].isin(df2['Email'])
df1.drop(df1[cond].index, inplace = True)

>>df1
    First   Last    Email
2   Joe     Max     email3@email.com
3   Will    Bill    email4@email.com
Mohit Motwani
  • 601
  • 1
  • 7
  • 23
  • This method does not work if you don't have a unique column (email in this case). In this case you can join two or more columns into one with `df1[['First', 'Last']].agg('-'.join, axis=1)` if they are strings. – alvitawa Apr 03 '22 at 12:47
9

Simpler to use isin() with dropna()

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.isin.html

df1[~df1.isin(df2)].dropna()
Idodo
  • 191
  • 2
  • 3
  • Quick question, what does the tilde stands for? – Gerardo Jaramillo Apr 23 '21 at 18:23
  • It's a negation. Will turn all True to False and vice versa. – Idodo Apr 26 '21 at 08:50
  • 1
    This is wrong. It compares the values one at a time, a row can have mixed cases. Even when a row has all true, that doesn't mean that same row exists in the other dataframe, it means the values of this row exist in the columns of the other dataframe but in multiple rows. – anishtain4 May 13 '22 at 17:08