+1 vote
in Programming Languages by (8.2k points)
I want to load the data from an RDS file into a pandas DataFrame. Which Python library should I use?

1 Answer

0 votes
by (16.1k points)

You can use the Python package 'pyreadr' to read an .rds file. To install it, run the following command:

$ pip install pyreadr

Once pyreadr is installed, you can load the data from your .rds file into a pandas DataFrame. Here is an example:

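>>> import pyreadr
>>> # read_r returns a dictionary-like object keyed by R object name
>>> data = pyreadr.read_r('104300.rds')
>>> # an .rds file holds a single unnamed object, stored under the key None
>>> df = data[None]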
>>> df
         id1    Y subset  ...  YKG1_CENTRAL_NERVOUS_SYSTEM  ZR751_BREAST  ZR7530_BREAST
0          1  neg   test  ...                    -0.165194     -0.150020      -0.088697
1          2  neg   test  ...                    -0.166452     -0.155515      -0.090676
2          3  neg   test  ...                    -0.162092     -0.150307      -0.090390
3          4  neg   test  ...                     0.000000      0.000000       0.000000
4          5  neg   test  ...                     0.619804     -0.156687      -0.090945
...      ...  ...    ...  ...                          ...           ...            ...
20232  20233  neg   test  ...                    -0.064905     -0.077581      -0.059459
20233  20234  neg   test  ...                     0.000000      0.000000       0.000000
20234  20235  neg  train  ...                     0.019277     -0.048828      -0.041774
20235  20236  neg   test  ...                     0.000000      0.000000       0.000000
20236  20237  neg   test  ...                    -0.118863      0.190226      -0.040727

[20237 rows x 22552 columns]
>>> df["Y"]
0        neg
1        neg
2        neg
3        neg
4        neg
        ...
20232    neg
20233    neg
20234    neg
20235    neg
20236    neg
Name: Y, Length: 20237, dtype: category
Categories (2, object): [neg, pos]
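
The data[None] lookup works because read_r returns a dictionary-like object, and an .rds file contains exactly one unnamed object. If you read several files, that step could be wrapped in a small helper. The following is only a sketch: load_rds and single_object are hypothetical names chosen for illustration, not part of pyreadr, and pyreadr is imported inside load_rds so that the first helper can be used on any plain mapping.

```python
def single_object(result):
    """Return the only value from a read_r-style mapping.

    Assumption: `result` behaves like the dictionary returned by
    pyreadr.read_r, which for an .rds file has one entry under the key None.
    """
    if len(result) != 1:
        raise ValueError(f"expected exactly one object, found {len(result)}")
    # Take the single value without hard-coding the None key.
    return next(iter(result.values()))


def load_rds(path):
    """Read an .rds file and return the object it contains."""
    # Imported here so single_object() stays usable without pyreadr installed.
    import pyreadr

    return single_object(pyreadr.read_r(path))
```

With these helpers, df = load_rds('104300.rds') should give the same DataFrame as the two-step version above, and it raises a ValueError if the file unexpectedly yields more than one object.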

...