How can I find duplicate rows in a pandas DataFrame? And how do I delete those duplicates so that only unique rows remain?

1 Answer


To find duplicate rows, use df.duplicated(), which returns a boolean Series that is True for every row repeating an earlier one. To remove them, use df.drop_duplicates(). Check the following example:

>>> import pandas as pd
>>> df = pd.DataFrame({'a':[1,2,3,4,1,2,5],'b':[11,12,13,14,11,12,15]})
>>> df
   a   b
0  1  11
1  2  12
2  3  13
3  4  14
4  1  11
5  2  12
6  5  15

To find the duplicate rows (by default the first occurrence of each row is not flagged, so only the later copies are listed):

>>> df[df.duplicated()]
   a   b
4  1  11
5  2  12
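
If you also want to see the first occurrences, or compare only some columns, the keep and subset parameters of duplicated() can help. This is a small sketch on the same example data; the variable names (all_dupes, dupes_on_a, n_dupes) are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 1, 2, 5],
                   'b': [11, 12, 13, 14, 11, 12, 15]})

# keep=False flags every member of a duplicated group, including the first one
all_dupes = df[df.duplicated(keep=False)]

# subset limits the comparison to the listed columns (here only 'a')
dupes_on_a = df[df.duplicated(subset=['a'])]

# Summing the boolean mask counts the rows that repeat an earlier row
n_dupes = int(df.duplicated().sum())

print(all_dupes)
print(dupes_on_a)
print(n_dupes)
```

With keep=False you get rows 0, 1, 4 and 5 instead of only 4 and 5, which is handy when you want to inspect the whole group before deciding what to drop.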

To delete the duplicate rows (this returns a new DataFrame; df itself is unchanged unless you assign the result back):

>>> df.drop_duplicates()
   a   b
0  1  11
1  2  12
2  3  13
3  4  14
6  5  15
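
A few common variations, sketched on the same data. The names deduped, last_kept and renumbered are just placeholders for this example; whether you want keep='last' or a fresh index depends on your use case:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 1, 2, 5],
                   'b': [11, 12, 13, 14, 11, 12, 15]})

# drop_duplicates() returns a new DataFrame; df still has all 7 rows
deduped = df.drop_duplicates()

# keep='last' keeps the last occurrence of each duplicated row instead of the first
last_kept = df.drop_duplicates(keep='last')

# reset_index(drop=True) renumbers the remaining rows 0..n-1
renumbered = df.drop_duplicates().reset_index(drop=True)

print(deduped)
print(last_kept)
print(renumbered)
```

To make the change stick, assign the result back, for example df = df.drop_duplicates().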

...