# Python: Find the mean of rows in a given column of a Numpy array based on some criteria

I have the predicted probabilities of class 0 and class 1 for my test data in a 2D NumPy array, and the true labels of the test data in a 1D NumPy array. For the records whose true label is 0, I want the mean of the class 0 probabilities, and for the records whose true label is 1, the mean of the class 1 probabilities. I can do this with a "for" loop over the records. Is there a more Pythonic way to do this calculation?

E.g.

psx = np.array([[0.3, 0.7],
                [0.6, 0.4],
                [0.1, 0.9],
                [0.8, 0.2]])

y_test=np.array([1,0,0,1])

The answer should be: array([0.35, 0.45])


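You can use NumPy's mean() function to calculate the mean. Instead of looping over the records, use a boolean mask: the expression y_test == k selects the rows whose true label is k, and psx[:, k] picks the column of class k probabilities.

Here is an example that shows how to use the mask. It assumes the labels are the integers 0 to K-1, so that each label is also a column index of psx: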

>>> import numpy as np
>>> psx=np.array([[0.3,0.7],[0.6,0.4],[0.1,0.9],[0.8,0.2]])
>>> psx
array([[0.3, 0.7],
       [0.6, 0.4],
       [0.1, 0.9],
       [0.8, 0.2]])
>>> y_test=np.array([1,0,0,1])
>>> y_test
array([1, 0, 0, 1])
>>> K = len(np.unique(y_test))
>>> K
2
>>> th = np.asarray([np.mean(psx[:, k][y_test == k]) for k in range(K)])
>>> th
array([0.35, 0.45])
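If you want to avoid the list comprehension as well, one possible sketch uses integer indexing together with np.bincount. It is not part of the answer above. It assumes the labels are non-negative integers that index the columns of psx and that every class occurs at least once in y_test (otherwise the division yields nan for the missing class). The names own, sums and counts are only illustrative.

```python
import numpy as np

psx = np.array([[0.3, 0.7], [0.6, 0.4], [0.1, 0.9], [0.8, 0.2]])
y_test = np.array([1, 0, 0, 1])

n_classes = psx.shape[1]

# For each row, pick the probability of that row's own true class.
own = psx[np.arange(len(y_test)), y_test]

# Sum those probabilities per class, then divide by the class counts.
sums = np.bincount(y_test, weights=own, minlength=n_classes)
counts = np.bincount(y_test, minlength=n_classes)
th = sums / counts
print(th)
```

For the example data this gives the same two means as the list comprehension. It only works for statistics that can be built from sums and counts, so it does not carry over to percentiles.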

If you want a percentile instead of the mean (here the 95th), replace np.mean with np.percentile:

>>> np.asarray([np.percentile(psx[:, k][y_test == k], 95, axis=0) for k in range(K)])
array([0.575, 0.675])
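Since the mean and the percentile differ only in the function applied to each class, the two snippets can be folded into one helper. The following is a minimal sketch under the same assumption that labels are column indices of psx. The function name per_class_stat and its reducer parameter are hypothetical, and returning nan for a class with no test rows is a design choice of this sketch, not something the answer above specifies.

```python
import numpy as np

def per_class_stat(psx, y_true, reducer=np.mean):
    """Apply `reducer` to column k of `psx` over the rows whose true label is k."""
    psx = np.asarray(psx, dtype=float)
    y_true = np.asarray(y_true)
    # Start with NaN so that a class with no test rows stays marked as missing.
    out = np.full(psx.shape[1], np.nan)
    for k in range(psx.shape[1]):
        mask = y_true == k
        if mask.any():
            out[k] = reducer(psx[mask, k])
    return out

psx = np.array([[0.3, 0.7], [0.6, 0.4], [0.1, 0.9], [0.8, 0.2]])
y_test = np.array([1, 0, 0, 1])

means = per_class_stat(psx, y_test)
p95 = per_class_stat(psx, y_test, lambda v: np.percentile(v, 95))
print(means, p95)
```

The loop runs over the classes rather than over the records, so it stays cheap even for a large test set.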