I want to randomly select some rows from a dataframe. Is there any function for this operation?

1 Answer

Best answer

You can use the sample() method of a pandas DataFrame to draw a random sample of items along an axis. If you do not specify an axis, it samples rows by default. Pass the parameter 'n' to select an exact number of rows, or the parameter 'frac' to select a fraction of the rows. The two cannot be used together, and if you give neither, a single row is returned.

Here is an example:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'A': np.random.randint(10, 50, 10), 'B': np.random.randint(20, 60, 10)})
>>> df
    A   B
0  49  58
1  30  26
2  11  26
3  20  59
4  32  47
5  21  36
6  45  22
7  38  37
8  39  55
9  17  30

To select 20% of the rows:

>>> df.sample(frac=0.2)
    A   B
5  21  36
1  30  26

To select 4 rows:

>>> df.sample(n=4)
    A   B
9  17  30
8  39  55
5  21  36
4  32  47


...