I want to find all duplicate rows in a DataFrame. Which function should I use for it?

1 Answer


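The pandas DataFrame.duplicated() method finds duplicate rows. It returns a boolean Series in which True marks a row that duplicates another row. By default (keep='first'), every duplicate is marked True except its first occurrence. You can change this with the 'keep' parameter: keep='last' leaves the last occurrence unmarked instead, and keep=False marks every occurrence, which is what you need if you want all duplicate rows.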

Here is an example:

Mark duplicates as True except for the first occurrence.

>>> import pandas as pd
>>> df = pd.DataFrame({'A':[1,2,3,1,3], 'B':[10,20,30,10,30]})
>>> df.duplicated()
0    False
1    False
2    False
3     True
4     True
dtype: bool

Mark duplicates as True except for the last occurrence.

>>> df.duplicated(keep='last')
0     True
1    False
2     True
3    False
4    False
dtype: bool
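
To get the duplicate rows themselves rather than just a True/False mask, one option is to pass keep=False and use the result for boolean indexing. This is a minimal sketch that reuses the same example DataFrame; the variable names mask, dupes, and mask_a are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 1, 3], 'B': [10, 20, 30, 10, 30]})

# keep=False marks every row that has at least one identical twin,
# including the first and last occurrences.
mask = df.duplicated(keep=False)

# Boolean indexing with the mask returns the duplicate rows themselves.
dupes = df[mask]
print(dupes)

# subset limits the comparison to the listed columns (here only 'A').
mask_a = df.duplicated(subset=['A'], keep=False)
print(mask_a)
```

Here rows 0, 2, 3 and 4 are selected, because row 1 is the only one that has no identical copy.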


...