I am trying to save a very large compressed sparse row (CSR) matrix to disk as a .npy file, but my code fails with `SystemError: error return without exception set`. How can I save such a large matrix as a .npy file? I am using the following code:

    import numpy as np
    from scipy import sparse

    # one stored value per (row, col) pair; np.ones avoids building a billion-element Python list
    data = np.ones(len(rownums), dtype=np.int64)
    X = sparse.csr_matrix((data, (rownums, colnums)), shape=(total_row, total_col))
    # save the files for future use as it takes hours to generate X
    print('Saving the compressed sparse matrix to .npy file... ')
    w_csr_file = wv.csr_file + str(step)
    np.save(w_csr_file, X)  # save the data

The length of rownums is more than a billion. Here are the error details:

    Traceback (most recent call last):
      File "find_covariates.py", line 277, in <module>
      File "find_covariates.py", line 268, in main
        create_sparse_matrix_file(set_of_covariates, descendant_ancestor_dict)
      File "find_covariates.py", line 212, in create_sparse_matrix_file
        save_compressed_sparse_matrix_to_file(rownums, colnums, total_row, total_col, step)
      File "find_covariates.py", line 158, in save_compressed_sparse_matrix_to_file
        np.save(w_csr_file, X) #save the data
      File "/usr/lib64/python2.7/site-packages/numpy/lib/npyio.py", line 509, in save
      File "/usr/lib64/python2.7/site-packages/numpy/lib/format.py", line 576, in write_array
        pickle.dump(array, fp, protocol=2, **pickle_kwargs)
    SystemError: error return without exception set
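For reference, the last two frames show where the failure happens: `X` is a SciPy sparse matrix rather than a NumPy array, so `np.save` stores it as a 0-d `object` array and serializes it with `pickle.dump`. The sketch below reproduces that fallback on a tiny matrix (the 3×3 identity is only a stand-in, not the real data) and shows that reading such a file back requires `allow_pickle=True`.

```python
import io

import numpy as np
from scipy import sparse

# Tiny stand-in for X; the real matrix has over a billion stored entries.
X = sparse.csr_matrix(np.eye(3, dtype=np.int64))

buf = io.BytesIO()
# X is not an ndarray, so np.save wraps it in a 0-d object array and pickles it.
np.save(buf, X)
buf.seek(0)

# Object arrays can only be read back through pickle, hence allow_pickle=True.
loaded = np.load(buf, allow_pickle=True)
print(loaded.dtype, loaded.shape)
X_back = loaded.item()  # unwrap the pickled sparse matrix
print(X_back.toarray())
```

With a billion-entry matrix, that single pickled object is many gigabytes, which is what the pickle call at the bottom of the traceback has to handle.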

1 Answer


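The problem seems to be the sheer size of the matrix: with more than a billion entries in `rownums`, `X` is too large for `numpy.save()`. A SciPy sparse matrix is not a NumPy array, so `numpy.save()` wraps it in an object array and pickles it (that is the `pickle.dump(array, fp, protocol=2, ...)` call in your traceback), and pickle protocols below 4 cannot serialize a single object larger than 4 GB. I am not sure about the exact limit in Python 2, but Python 3 does raise an error when the pickled data is larger than 4 GB. I would recommend saving the data in multiple .npy files instead of just one. Divide and conquer should work :)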
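One way to apply divide and conquer, sketched below under some assumptions: save the CSR matrix's three component arrays (`data`, `indices`, `indptr`) plus its shape as separate .npy files. Each component is an ordinary numeric array, so `np.save` writes its raw bytes directly and skips the pickle step entirely. The helpers `save_csr_parts` / `load_csr_parts` and the file suffixes are hypothetical names chosen for this example, and the usage part assumes Python 3 and NumPy 1.17+ (for `default_rng` and `TemporaryDirectory`).

```python
import os
import tempfile

import numpy as np
from scipy import sparse


def save_csr_parts(prefix, X):
    """Write a CSR matrix as several plain .npy files (hypothetical helper).

    Each component is a numeric ndarray, so np.save writes it without pickling.
    """
    X = X.tocsr()  # no copy if X is already CSR
    np.save(prefix + "_data.npy", X.data)
    np.save(prefix + "_indices.npy", X.indices)
    np.save(prefix + "_indptr.npy", X.indptr)
    np.save(prefix + "_shape.npy", np.array(X.shape, dtype=np.int64))


def load_csr_parts(prefix):
    """Rebuild the matrix written by save_csr_parts (hypothetical helper)."""
    data = np.load(prefix + "_data.npy")
    indices = np.load(prefix + "_indices.npy")
    indptr = np.load(prefix + "_indptr.npy")
    shape = tuple(int(n) for n in np.load(prefix + "_shape.npy"))
    return sparse.csr_matrix((data, indices, indptr), shape=shape)


# Small stand-in built the same way as the question's X; swap in the real
# rownums/colnums and a real file prefix such as wv.csr_file + str(step).
rng = np.random.default_rng(0)
rownums = rng.integers(0, 100, size=1000)
colnums = rng.integers(0, 50, size=1000)
data = np.ones(rownums.size, dtype=np.int64)
X = sparse.csr_matrix((data, (rownums, colnums)), shape=(100, 50))

with tempfile.TemporaryDirectory() as tmp:
    prefix = os.path.join(tmp, "X_step0")
    save_csr_parts(prefix, X)
    Y = load_csr_parts(prefix)

print((X != Y).nnz)  # number of entries that differ after the round trip
```

For the billion-entry case each component file is still several gigabytes, but it is a plain array on disk rather than one pickled object. SciPy 0.19 and later also provide `scipy.sparse.save_npz` and `scipy.sparse.load_npz`, which store the same component arrays together in a single .npz archive.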