+2 votes
in Programming Languages by (8.1k points)
I created a very large CSR_matrix (~50 million rows), but when I try to split it into train and test sets, I get 'Segmentation Faults' error. They machine I am using has ~800gb RAM. I checked the RAM usage and found that it never reached the memory limit, so I do not anticipate it to be memory error. I tried different options, but so far no success. I am using Python 3.6.

1 Answer

0 votes
by (15.8k points)

Check the version of Numpy and Scipy. The older versions of Numpy and Scipy do not handle such large CSR matrix properly and they give 'Segmentation Fault' error. You have enough RAM on your machine, so it may not be RAM issue. Upgrade numpy/scipy and it should solve the issue.

Use the following command to upgrade the required python libraries...

 pip install --upgrade numpy scipy matplotlib ipython jupyter pandas sympy nose

...