
I have the predicted probabilities for class 0 and class 1 of my test data in a 2D NumPy array, and the true labels of the test data in a 1D NumPy array. For each class, I want to calculate the mean predicted probability of that class over the records that truly belong to it. I could use a "for" loop for this, but is there a more Pythonic way to do the calculation?

E.g.

psx =

array([[0.3, 0.7],
       [0.6, 0.4],
       [0.1, 0.9],
       [0.8, 0.2]])

y_test=np.array([1,0,0,1])

The answer should be: array([0.35, 0.45])

1 Answer

Best answer

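You can use NumPy's mean() function together with boolean indexing. For each class k, take column k of psx, keep only the rows where y_test == k, and average them. A short list comprehension over the classes then replaces the explicit "for" loop.

Here is an example using boolean indexing: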

>>> import numpy as np
>>> psx=np.array([[0.3,0.7],[0.6,0.4],[0.1,0.9],[0.8,0.2]])
>>> psx
array([[0.3, 0.7],
       [0.6, 0.4],
       [0.1, 0.9],
       [0.8, 0.2]])
>>> y_test=np.array([1,0,0,1])
>>> y_test
array([1, 0, 0, 1])
>>> K = len(np.unique(y_test))
>>> K
2
>>> th = np.asarray([np.mean(psx[:, k][y_test == k]) for k in range(K)])
>>> th
array([0.35, 0.45])
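
If you want to avoid the list comprehension as well, one possible fully vectorized variant is sketched below. It is not part of the original answer: it swaps in integer (fancy) indexing plus np.bincount, and it assumes the labels are non-negative integers 0..K-1 that match the column order of psx, with every class appearing at least once (otherwise the division hits a zero count). The names true_class_prob, class_sums and class_counts are only illustrative.

```python
import numpy as np

psx = np.array([[0.3, 0.7], [0.6, 0.4], [0.1, 0.9], [0.8, 0.2]])
y_test = np.array([1, 0, 0, 1])

# For each row, pick the predicted probability of that row's true class.
true_class_prob = psx[np.arange(len(y_test)), y_test]

# Sum the picked probabilities per class, then divide by the number of
# rows in each class (assumes every class occurs at least once).
class_sums = np.bincount(y_test, weights=true_class_prob)
class_counts = np.bincount(y_test)
th = class_sums / class_counts

print(th)  # approximately [0.35, 0.45]
```

For two classes the list comprehension is just as readable. The bincount form mainly pays off when there are many classes or many rows.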

If you want a percentile instead of the mean (the 95th in this example), you can use np.percentile in the same way:

>>> np.asarray([np.percentile(psx[:, k][y_test == k], 95, axis=0) for k in range(K)])
array([0.575, 0.675])
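
Since the mean and the percentile follow the same pattern, the two snippets can be folded into one small helper. This is only a sketch: per_class_stat is a hypothetical name, not a NumPy function, and it assumes that every label in y is a valid column index of psx.

```python
import numpy as np

def per_class_stat(psx, y, stat=np.mean, **kwargs):
    """Apply `stat` to column k of psx, restricted to rows where y == k.

    Hypothetical helper: assumes each label in y is a valid column index.
    """
    classes = np.unique(y)
    # psx[y == k, k] selects column k for the rows whose true label is k.
    return np.asarray([stat(psx[y == k, k], **kwargs) for k in classes])

psx = np.array([[0.3, 0.7], [0.6, 0.4], [0.1, 0.9], [0.8, 0.2]])
y_test = np.array([1, 0, 0, 1])

means = per_class_stat(psx, y_test)                       # per-class mean
p95 = per_class_stat(psx, y_test, np.percentile, q=95)    # per-class 95th percentile
```

Here means should match th above and p95 should match the percentile output.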

