
I created a Compressed Sparse Row matrix using csr_matrix and saved it to disk with numpy.save() so that I could reuse it later, because building the matrix takes approximately 10 hours due to the enormous size of the data. Everything went fine up to that point. However, when I loaded the saved file with numpy.load(), I got back an array object instead of a sparse matrix. Because of this conversion, I cannot use the data in my classifier. Here are the details of the error:

>>> X=np.load('classifierdata.npy')
>>> Y=np.load('labeldata.npy')
>>> from sklearn.model_selection import train_test_split
>>> X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\pkumar81\Anaconda2\lib\site-packages\sklearn\model_selection\_split.py", line 1689, in train_test_split
    arrays = indexable(*arrays)
  File "C:\Users\pkumar81\Anaconda2\lib\site-packages\sklearn\utils\validation.py", line 206, in indexable
    check_consistent_length(*result)
  File "C:\Users\pkumar81\Anaconda2\lib\site-packages\sklearn\utils\validation.py", line 177, in check_consistent_length
    lengths = [_num_samples(X) for X in arrays if X is not None]
  File "C:\Users\pkumar81\Anaconda2\lib\site-packages\sklearn\utils\validation.py", line 126, in _num_samples
    " a valid collection." % x)
TypeError: Singleton array array(<4020784x50626 sparse matrix of type '<type 'numpy.int64'>'
        with 151426374 stored elements in Compressed Sparse Row format>, dtype=object) cannot be considered a valid collection.

When I inspected X, it showed the following:

>>> X
array(<4020784x50626 sparse matrix of type '<type 'numpy.int64'>'
        with 151426374 stored elements in Compressed Sparse Row format>, dtype=object)

The actual type of X should be as follows:

>>> X
<4020784x50626 sparse matrix of type '<type 'numpy.int64'>'
        with 151426374 stored elements in Compressed Sparse Row format>

So the real culprit behind the error is that X comes back as an array object wrapping the matrix rather than as the matrix itself. Is there any way to convert the array object back to a sparse matrix?

1 Answer

Best answer

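numpy.save() does not store a sparse matrix natively, so it pickles the csr_matrix and wraps it in a zero-dimensional object array, and that wrapper is what numpy.load() returns. Call tolist() on X (X.item() works as well) to pull the original sparse matrix back out of the wrapper. On NumPy 1.16.3 and later you also need to pass allow_pickle=True to numpy.load(), because loading pickled objects is disabled by default. Check the following example: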

>>> import numpy as np
>>> from scipy.sparse import csr_matrix
>>> row = np.array([0, 0, 1, 2, 2, 2])
>>> col = np.array([0, 2, 2, 0, 1, 2])
>>> data = np.array([1, 1, 1, 1, 1, 1])
>>> csr_matrix((data, (row, col)), shape=(3, 3)).toarray()
array([[1, 0, 1],
       [0, 0, 1],
       [1, 1, 1]])
>>> t=csr_matrix((data, (row, col)), shape=(3, 3))
>>> np.save('tfile',t)
>>> l=np.load('tfile.npy')
>>> l
array(<3x3 sparse matrix of type '<type 'numpy.int32'>'
        with 6 stored elements in Compressed Sparse Row format>, dtype=object)
>>> l.tolist()
<3x3 sparse matrix of type '<type 'numpy.int32'>'
        with 6 stored elements in Compressed Sparse Row format>
>>>
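
If you can re-save the matrix once, it may be worth avoiding the object-array wrapper altogether. The sketch below is only an illustration under a few assumptions: it uses a recent NumPy and SciPy under Python 3, the 3x3 matrix stands in for the real data, and the temporary directory and file names are made up for the example. It first recovers the matrix from a file written with np.save(), then stores it with scipy.sparse.save_npz(), which scipy.sparse.load_npz() reads back directly as a sparse matrix.

```python
import os
import tempfile

import numpy as np
from scipy import sparse

# Small stand-in for the large matrix from the question.
row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 1, 1, 1, 1, 1])
t = sparse.csr_matrix((data, (row, col)), shape=(3, 3))

# Hypothetical scratch directory, used only for this example.
workdir = tempfile.mkdtemp()

# 1) Recover a matrix that was already written with np.save().
npy_path = os.path.join(workdir, "tfile.npy")
np.save(npy_path, t)  # pickles the matrix inside a 0-d object array
wrapped = np.load(npy_path, allow_pickle=True)  # needed on NumPy >= 1.16.3
recovered = wrapped.item()  # for a 0-d array this matches wrapped.tolist()

# 2) Re-save in SciPy's own sparse format so later loads need no unwrapping.
npz_path = os.path.join(workdir, "tfile.npz")
sparse.save_npz(npz_path, recovered)
reloaded = sparse.load_npz(npz_path)

print(type(reloaded), reloaded.shape, reloaded.nnz)
```

After this, reloaded can be passed to train_test_split in place of X, since it is a sparse matrix again rather than an array wrapping one.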


...