
I aim to train a neural network to predict 2 distributions (10 quantiles each, i.e. deciles) at 5 time points. So my y is of shape:

(batch size, time points, distribution values) => (batch size, 5, 20)

The distributions are standard quantiles: they sum to 1 but will not be normally distributed - more like a negative binomial distribution, so a single distribution will hold values like: [0.0, 0.2, 0.55, 0.1, 0.1, 0.05, 0.0, 0.0, 0.0, 0.0]
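For concreteness, here is a minimal numpy sketch (values illustrative, not real data) of how a full y target of this shape could be assembled from such decile vectors:

import numpy as np

batch_size = 4
# one decile distribution (10 values summing to 1), skewed like a negative binomial
dist = np.array([0.0, 0.2, 0.55, 0.1, 0.1, 0.05, 0.0, 0.0, 0.0, 0.0])
assert np.isclose(dist.sum(), 1.0)

# two such distributions at each of 5 time points, concatenated on the last axis
y = np.tile(np.concatenate([dist, dist]), (batch_size, 5, 1))
print(y.shape)  # (4, 5, 20)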

I believe the best way to do this is to use a softmax activation on the output layer for each distribution separately and then concatenate, as in the following TensorFlow model code:

import tensorflow as tf

# x is of shape (batch size, 5, 100) -> 5 is the 5 time points, 100 here is arbitrary
x = tf.keras.Input(shape=(5, 100))
output_dists = 2
n_quants = 10
all_quant = []
for i in range(1, output_dists + 1):
    # use a dense layer of size equal to the number of quantiles for each
    # output channel, with a softmax activation so each distribution sums to 1
    chan_quant = tf.keras.layers.Dense(n_quants, activation='softmax',
                                       name=f"quant_out_{i}")(x)
    # append channel
    all_quant.append(chan_quant)

# concatenate on axis 2 to go from two (batch size, 5, 10) tensors to (batch size, 5, 20)
output_quant = tf.keras.layers.concatenate(all_quant, axis=2, name="quant")
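For reference, a minimal sketch of wiring the heads above into a trainable model (the optimizer and loss choices here are assumptions, not settled):

model = tf.keras.Model(inputs=x, outputs=output_quant)
# categorical cross-entropy in Keras reduces over the last axis; see the
# caveat below about what that means when two distributions share that axis
model.compile(optimizer='adam', loss='categorical_crossentropy')
model.summary()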

I then use cross-entropy loss for backprop, since the quantiles sum to 1.
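To make the reduction explicit, here is a hedged sketch of what Keras categorical cross-entropy computes on soft labels of this shape (the placeholder tensors and epsilon are illustrative):

# placeholder tensors standing in for true and predicted quantiles
y_true = tf.random.uniform((4, 5, 20))
y_pred = tf.random.uniform((4, 5, 20))

cce = tf.keras.losses.CategoricalCrossentropy()
loss = cce(y_true, y_pred)

# roughly equivalent by hand (Keras also rescales and clips predictions):
# the class axis is the LAST axis (size 20 here), and the batch and time
# axes are averaged
per_step = -tf.reduce_sum(y_true * tf.math.log(y_pred + 1e-7), axis=-1)
loss_manual = tf.reduce_mean(per_step)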

My question is how best to structure the shape of the data and train a neural network to predict multiple distributions at multiple time points. For example, should I concatenate the two distributions this way, so there are 20 values on axis 2, or should I instead shape the output as (batch size, 5, 2, 10) for the loss function? Also, is the extra axis for the 5 time points valid input for the loss function? A sketch of the alternative shape follows below.
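To illustrate that alternative, here is a sketch of reshaping the output so the last axis holds exactly one distribution (it reuses x and output_quant from the snippet above; this shows one option, not a recommendation):

# reshape (batch size, 5, 20) -> (batch size, 5, 2, 10) so the last axis
# holds a single distribution that sums to 1
reshaped = tf.keras.layers.Reshape((5, 2, 10))(output_quant)

# categorical cross-entropy still reduces over the last axis (now 10) and
# averages all remaining axes, so each distribution is scored separately
model_4d = tf.keras.Model(inputs=x, outputs=reshaped)
model_4d.compile(optimizer='adam', loss='categorical_crossentropy')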

