3

I'm struggling with a Pandas problem. I have the following data.

+--------+------+---------+---------+-------------+-------------+--------------+------------+-------------+------------+----------+
| symbol | side | status  | origQty | executedQty |     qty     | availableQty |   price    | boughtValue | soldValue  | dcaLevel |
+--------+------+---------+---------+-------------+-------------+--------------+------------+-------------+------------+----------+
| DGDBTC | BUY  | FILLED  |   0.125 |  0.12500000 |  0.12500000 |   0.12500000 | 0.02000700 |  0.00250088 |            |          |
| DGDBTC | BUY  | FILLED  |   0.125 |  0.12500000 |  0.12500000 |   0.25000000 | 0.01960100 |  0.00245013 |            |          |
| DGDBTC | SELL | FILLED  |    0.25 |  0.25000000 | -0.25000000 |   0.00000000 | 0.02005900 |             | 0.00501475 |          |
| DGDBTC | BUY  | FILLED  |   0.113 |  0.11300000 |  0.11300000 |   0.11300000 | 0.02203000 |  0.00248939 |            |          |
| DGDBTC | BUY  | FILLED  |   0.113 |  0.11300000 |  0.11300000 |   0.22600000 | 0.02160300 |  0.00244114 |            |          |
| DGDBTC | BUY  | EXPIRED |   0.226 |  0.00000000 |  0.00000000 |              | 0.02125500 |             |            |          |
| DGDBTC | BUY  | PARTIAL |   0.226 |  0.15800000 |  0.15800000 |   0.38400000 | 0.02126100 |             |            |          |
| DGDBTC | SELL | EXPIRED |   0.384 |  0.00000000 |  0.00000000 |              | 0.02196600 |             |            |          |
| DGDBTC | SELL | EXPIRED |   0.384 |  0.00000000 |  0.00000000 |              | 0.02214300 |             |            |          |
| DGDBTC | SELL | EXPIRED |   0.384 |  0.00000000 |  0.00000000 |              | 0.02189900 |             |            |          |
| DGDBTC | BUY  | FILLED  |   0.384 |  0.38400000 |  0.38400000 |   0.76800000 | 0.02082900 |  0.00799834 |            |          |
| DGDBTC | BUY  | FILLED  |   0.768 |  0.76800000 |  0.76800000 |   1.53600000 | 0.01984300 |  0.01523942 |            |          |
| DGDBTC | SELL | EXPIRED |   1.536 |  0.00000000 |  0.00000000 |              | 0.02074400 |             |            |          |
| DGDBTC | SELL | EXPIRED |   1.536 |  0.00000000 |  0.00000000 |              | 0.02094100 |             |            |          |
| DGDBTC | SELL | EXPIRED |   1.536 |  0.00000000 |  0.00000000 |              | 0.02076800 |             |            |          |
| DGDBTC | SELL | PARTIAL |   1.536 |  0.30300000 | -0.30300000 |   1.23300000 | 0.02065000 |             |            |          |
| DGDBTC | SELL | FILLED  |   1.233 |  1.23300000 | -1.23300000 |   0.00000000 | 0.02070000 |             | 0.02552310 |          |
+--------+------+---------+---------+-------------+-------------+--------------+------------+-------------+------------+----------+

This a subset of the data group by symbol. For each symbol I want to populate the last column with a value that complies with the following rules:

  1. Each buy order (side=BUY) in a series has the value zero (0).
  2. For each consecutive buy order the value is increased by one (1).
  3. When a sell order (side=SELL) is reached it marks a new buy order serie.
  4. Rows with status EXPIRED are skipped.

Example:

+--------+------+---------+---------+-------------+-------------+--------------+------------+-------------+------------+----------+
| symbol | side | status  | origQty | executedQty |     qty     | availableQty |   price    | boughtValue | soldValue  | dcaLevel |
+--------+------+---------+---------+-------------+-------------+--------------+------------+-------------+------------+----------+
| DGDBTC | BUY  | FILLED  |   0.125 |  0.12500000 |  0.12500000 |   0.12500000 | 0.02000700 |  0.00250088 |            |        0 |
| DGDBTC | BUY  | FILLED  |   0.125 |  0.12500000 |  0.12500000 |   0.25000000 | 0.01960100 |  0.00245013 |            |        1 |
| DGDBTC | SELL | FILLED  |    0.25 |  0.25000000 | -0.25000000 |   0.00000000 | 0.02005900 |             | 0.00501475 |          |
| DGDBTC | BUY  | FILLED  |   0.113 |  0.11300000 |  0.11300000 |   0.11300000 | 0.02203000 |  0.00248939 |            |        0 |
| DGDBTC | BUY  | FILLED  |   0.113 |  0.11300000 |  0.11300000 |   0.22600000 | 0.02160300 |  0.00244114 |            |        1 |
| DGDBTC | BUY  | EXPIRED |   0.226 |  0.00000000 |  0.00000000 |              | 0.02125500 |             |            |          |
| DGDBTC | BUY  | PARTIAL |   0.226 |  0.15800000 |  0.15800000 |   0.38400000 | 0.02126100 |             |            |        2 |
| DGDBTC | SELL | EXPIRED |   0.384 |  0.00000000 |  0.00000000 |              | 0.02196600 |             |            |          |
| DGDBTC | SELL | EXPIRED |   0.384 |  0.00000000 |  0.00000000 |              | 0.02214300 |             |            |          |
| DGDBTC | SELL | EXPIRED |   0.384 |  0.00000000 |  0.00000000 |              | 0.02189900 |             |            |          |
| DGDBTC | BUY  | FILLED  |   0.384 |  0.38400000 |  0.38400000 |   0.76800000 | 0.02082900 |  0.00799834 |            |        3 |
| DGDBTC | BUY  | FILLED  |   0.768 |  0.76800000 |  0.76800000 |   1.53600000 | 0.01984300 |  0.01523942 |            |        4 |
| DGDBTC | SELL | EXPIRED |   1.536 |  0.00000000 |  0.00000000 |              | 0.02074400 |             |            |          |
| DGDBTC | SELL | EXPIRED |   1.536 |  0.00000000 |  0.00000000 |              | 0.02094100 |             |            |          |
| DGDBTC | SELL | EXPIRED |   1.536 |  0.00000000 |  0.00000000 |              | 0.02076800 |             |            |          |
| DGDBTC | SELL | PARTIAL |   1.536 |  0.30300000 | -0.30300000 |   1.23300000 | 0.02065000 |             |            |          |
| DGDBTC | SELL | FILLED  |   1.233 |  1.23300000 | -1.23300000 |   0.00000000 | 0.02070000 |             | 0.02552310 |          |
+--------+------+---------+---------+-------------+-------------+--------------+------------+-------------+------------+----------+

I have tried the following two ways.

merged_df['dcaLevel'] = merged_df[(merged_df['side'] == 'BUY') & (merged_df['status'].isin(['FILLED', 'PARTIAL']))].groupby(['symbol'])['dcaLevel'].apply(lambda x: x.shift(1) + 1)

This way throws an error.

merged_df['dcaLevel'] = merged_df[(merged_df['side'] == 'BUY') & (merged_df['status'].isin(['FILLED', 'PARTIAL']))].groupby(['symbol'])['dcaLevel'].apply(lambda x: 0 if x.shift(1) else x.shift(1) + 1)

I tried the following alternative approach.

symbol_df = merged_df.loc[merged_df['symbol'] == 'DGDBTC']
tmp_df = symbol_df[(symbol_df['side'] == 'BUY') & (symbol_df['status'].isin(['FILLED', 'PARTIAL']))]
tmp_df['dcaLevel'] = np.where(tmp_df['availableQty'] < tmp_df['availableQty'].shift(1), 0, tmp_df['dcaLevel'].shift(1) + 1)

It works on some rows and not on others and the first buy order in a serie remains NaN.

I have coded the following which does work, but I'm sure there is an easier way to do this with Pandas.

merged_df['dcaLevel'] = np.NaN
grouped = merged_df[merged_df['status'].isin(['FILLED', 'PARTIAL'])].groupby(['symbol'])
col_idx = merged_df.columns.get_loc('dcaLevel')
for name, group in grouped:
    first = True
    for index, row in group.iterrows():
        if row['side'] == 'SELL':
            first = True
            dca_level = np.NaN
        else:
            if first:
                first = False
                dca_level = 0
            else:
                dca_level = dca_level + 1
            merged_df.iloc[index, col_idx] = dca_level
merged_df[merged_df['symbol'] == 'DGDBTC']

I hope someone can help solve this problem.

nidkil
  • 131
  • 1
  • 1
  • 3

1 Answers1

2

A simple approach could be the following:

import numpy as np
import pandas as pd

counter = 0

def conditional_cumulative_sum(x):
    global counter

    if x['status'] == 'EXPIRED':
        return np.nan
    elif x['side'] == 'BUY':
        temporal = counter
        counter += 1
        return temporal
    elif x['side'] == 'SELL':
        counter = 0
        return np.nan

frame = pd.read_csv('sample.csv')
frame['dcaLevel'] = frame.apply(conditional_cumulative_sum, axis=1)

Assuming sample.csv contains the data provided in the examples and that the empty values in the column dcaLevel correspond to NaN values.

Dani Mesejo
  • 2,206
  • 11
  • 19
  • Hi, thx for the quick respons. I have tried the solution and it is not working for me. The reason being it should be applied to a symbol - groupby('symbol') - but when I do this it throws an error ("TypeError: conditional_cumulative_sum() got an unexpected keyword argument 'axis'"). When I add 'axis' to conditional_cumulative_sum I get a different error ("ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()."). Any suggestions how to fix this? – nidkil Feb 15 '18 at 10:27
  • So what you want is a function like the one you described in the steps (1-4) but only applied to the row where the symbol is DGDBTC? – Dani Mesejo Feb 15 '18 at 18:55
  • Yes a function that can be applied to the rows of a group. Please see the code example at the end of my question for an example of the code. – nidkil Feb 16 '18 at 12:53