
I have some data that looks like this:

data.head()

    stock  date  binNum    volume
0  stock0  d120       2  249500.0
1  stock0  d120       3   81500.0
2  stock0  d120       4   79000.0
3  stock0  d120       5  244000.0
4  stock0  d120       6  175000.0

I can get the average volume for each bin across all days for a particular stock with the following code:

import numpy as np

stock0 = data[data['stock'] == 'stock0']
binGroups = stock0[['binNum', 'volume']].groupby('binNum', sort=False)
stock0vol = binGroups.aggregate({'volume': np.mean}).reset_index()
stock0vol.head()

   binNum         volume
0       2  174095.238095
1       3  100428.571429
2       4   79880.952381
3       5   73642.857143
4       6   69761.904762

I would like to apply this to all stocks. The result will be a table with a stock column but no date column (since it is an aggregation across all days), something like this:

    stock  binNum         volume
0  stock0       2  174095.238095
1  stock0       3  100428.571429
2  stock0       4   79880.952381
3  stock0       5   73642.857143
4  stock0       6   69761.904762

I can do this by putting the above code in a loop and appending the rows one by one, but I am sure there is a more elegant way to do it via grouping and aggregation. Can anyone shed some light, please?

Steztric

1 Answer

Sorry, I realised all I needed to do was add another level of grouping; the aggregation stays the same:

stockBinGroups = data.groupby(['stock', 'binNum'], sort=False)
binAveVol = stockBinGroups.aggregate({'volume': np.mean}).reset_index()
binAveVol.head()

    stock  binNum         volume
0  stock0       2  174095.238095
1  stock0       3  100428.571429
2  stock0       4   79880.952381
3  stock0       5   73642.857143
4  stock0       6   69761.904762
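
As a side note, the same result can be written a little more compactly by selecting the column before aggregating and using the string alias `'mean'` (recent pandas versions deprecate passing `np.mean` to `aggregate`). A minimal sketch with a small made-up frame (the values are illustrative, not the asker's data):

```python
import pandas as pd

# Synthetic data with the same columns as in the question.
data = pd.DataFrame({
    'stock':  ['stock0', 'stock0', 'stock0', 'stock1', 'stock1', 'stock1'],
    'date':   ['d120', 'd121', 'd120', 'd120', 'd121', 'd120'],
    'binNum': [2, 2, 3, 2, 2, 3],
    'volume': [100.0, 200.0, 50.0, 10.0, 30.0, 40.0],
})

# Group by (stock, binNum), average volume within each group, and
# flatten the index back into ordinary columns.
binAveVol = (data.groupby(['stock', 'binNum'], sort=False)['volume']
                 .mean()
                 .reset_index())
print(binAveVol)
```

This produces one row per `(stock, binNum)` pair, with the date dimension averaged away, exactly like the `aggregate` call above.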