[Python] How to find duplicate rows in a Pandas DataFrame?

Question

[Python] How to find duplicate rows in a Pandas DataFrame?

1 Answer

answered Jun 24, 2022 by pkumar81 (348k points)
selected Oct 21, 2022 by pkumar81

Best answer

The pandas DataFrame.duplicated() method can be used to find duplicate rows. It returns a boolean Series in which True marks a row that duplicates another row. By default (keep='first'), every occurrence except the first is marked True. You can change this behavior with the 'keep' parameter, which accepts 'first', 'last', or False (mark all occurrences as True).

Here is an example:

Mark duplicates as True except for the first occurrence.

>>> import pandas as pd
>>> df = pd.DataFrame({'A':[1,2,3,1,3], 'B':[10,20,30,10,30]})
>>> df.duplicated()
0    False
1    False
2    False
3     True
4     True
dtype: bool

Mark duplicates as True except for the last occurrence.

>>> df.duplicated(keep='last')
0     True
1    False
2     True
3    False
4    False
dtype: bool
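
To get the duplicate rows themselves rather than a True/False Series, the mask can be used for boolean indexing. The following is a minimal sketch using the same DataFrame as above; the variable names (mask, dupes, mask_a) are only illustrative. It uses keep=False so that every member of a duplicate group is selected, and it also shows the 'subset' parameter, which limits the comparison to the listed columns.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 1, 3], 'B': [10, 20, 30, 10, 30]})

# keep=False marks every row that has a duplicate as True,
# including the first and the last occurrence.
mask = df.duplicated(keep=False)

# Boolean indexing with the mask selects the duplicate rows
# (here the rows with index 0, 2, 3 and 4).
dupes = df[mask]
print(dupes)

# subset compares only the listed columns; here only column 'A'
# is checked, with the default keep='first'.
mask_a = df.duplicated(subset=['A'])
print(mask_a)
```

If the goal is to remove duplicates rather than find them, drop_duplicates() accepts the same 'keep' and 'subset' parameters.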
