What function should I use to randomly select some rows from a pandas dataframe?

1 Answer


You can use the sample() method to select n random rows from a pandas DataFrame. By default it samples without replacement, so no row is selected more than once. To allow the same row to be selected repeatedly, pass replace=True. You can also pass random_state (for example, an integer seed) to make the selection reproducible.

Example:

>>> import pandas as pd
>>> df = pd.DataFrame({'A':[1,2,3,4,5,6,7,8], 'B':[10,20,30,40,50,60,70,80], 'C':[11,22,33,44,55,66,77,88]})
>>> df
   A   B   C
0  1  10  11
1  2  20  22
2  3  30  33
3  4  40  44
4  5  50  55
5  6  60  66
6  7  70  77
7  8  80  88
>>> df.sample(5, replace=True)
   A   B   C
2  3  30  33
1  2  20  22
3  4  40  44
2  3  30  33
4  5  50  55
>>> df.sample(5, replace=False)
   A   B   C
5  6  60  66
6  7  70  77
1  2  20  22
0  1  10  11
7  8  80  88
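Because no seed is given, the rows shown in the session above will change from run to run. The minimal sketch below reuses the same illustrative DataFrame and assumes a reasonably recent pandas version. It shows three things. First, random_state makes the result repeatable. Second, replace=False (the default) refuses to draw more rows than the frame has. Third, frac can be used instead of n to sample a fraction of the rows.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6, 7, 8],
                   'B': [10, 20, 30, 40, 50, 60, 70, 80],
                   'C': [11, 22, 33, 44, 55, 66, 77, 88]})

# Same integer seed -> same rows in the same order on every run
first = df.sample(n=3, random_state=42)
second = df.sample(n=3, random_state=42)
print(first.equals(second))

# Without replacement (the default), each index label appears at most once;
# asking for all 8 rows just returns a shuffled copy of the frame
no_repeat = df.sample(n=8, random_state=0)
print(no_repeat.index.is_unique)

# Without replacement you cannot ask for more rows than the frame has
raised = False
try:
    df.sample(n=20)
except ValueError:
    raised = True
print(raised)

# With replacement, 20 draws from 8 rows must repeat some rows
with_repeat = df.sample(n=20, replace=True, random_state=0)
print(len(with_repeat), with_repeat.index.is_unique)

# frac selects a fraction of the rows instead of a fixed count (here 4 of 8)
half = df.sample(frac=0.5, random_state=1)
print(len(half))
```

Note that n and frac cannot be passed together; pick whichever is more natural for your use case.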

...