How to find and remove duplicate rows from a pandas DataFrame

Question 1

How can I find duplicate rows in a DataFrame? Also, how can I delete those duplicate rows and keep only the unique rows?

Answer 1

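To find duplicate rows, use df.duplicated(), which returns a boolean Series that is True for every row repeating an earlier row. To remove them, use df.drop_duplicates(), which returns a new DataFrame containing only the first occurrence of each row. Check the following example: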

>>> import pandas as pd
>>> df = pd.DataFrame({'a': [1, 2, 3, 4, 1, 2, 5], 'b': [11, 12, 13, 14, 11, 12, 15]})
>>> df
   a   b
0  1  11
1  2  12
2  3  13
3  4  14
4  1  11
5  2  12
6  5  15

To find the duplicate rows:
>>> df[df.duplicated()]
   a   b
4  1  11
5  2  12
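
By default, duplicated() flags only the second and later occurrences, so the first copy of each repeated row (rows 0 and 1 here) is not listed. As a minimal sketch on the same example data, the keep and subset parameters change what counts as a duplicate. The variable names all_copies, dup_on_a and n_duplicates are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 1, 2, 5],
                   'b': [11, 12, 13, 14, 11, 12, 15]})

# keep=False marks every member of a duplicate group, not just the repeats
all_copies = df[df.duplicated(keep=False)]
print(all_copies)

# subset limits the comparison to the listed columns (here only column 'a')
dup_on_a = df[df.duplicated(subset=['a'])]
print(dup_on_a)

# Summing the boolean mask counts the rows that repeat an earlier row
n_duplicates = int(df.duplicated().sum())
print(n_duplicates)
```

With keep=False, rows 0, 1, 4 and 5 are all returned, which is useful for inspecting both copies side by side before deciding which one to drop.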

To delete the duplicate rows:
>>> df.drop_duplicates()
   a   b
0  1  11
1  2  12
2  3  13
3  4  14
6  5  15
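
Note that drop_duplicates() returns a new DataFrame and leaves df unchanged, so the result has to be assigned to keep it. The sketch below reuses the example data to show this, along with the keep parameter and one way to renumber the index afterwards. The variable names unique_rows, last_kept, no_repeats and renumbered are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 1, 2, 5],
                   'b': [11, 12, 13, 14, 11, 12, 15]})

# drop_duplicates() returns a new DataFrame; df itself still has 7 rows
unique_rows = df.drop_duplicates()

# keep='last' retains the final occurrence of each duplicated row
last_kept = df.drop_duplicates(keep='last')

# keep=False drops every row that has a duplicate anywhere in the frame
no_repeats = df.drop_duplicates(keep=False)

# reset_index(drop=True) renumbers the surviving rows from 0
renumbered = unique_rows.reset_index(drop=True)

print(len(df), len(unique_rows))
print(last_kept.index.tolist())
print(no_repeats.index.tolist())
print(renumbered.index.tolist())
```

Whether to keep the first copy, the last copy, or none depends on the data: keep='last' suits cases where later rows are more recent, while keep=False suits cases where any repeated row is suspect.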

pkumar81 · Answer 1 · 2019-10-17T09:28:17+0000
