I'd like to get an average embedding to use as an input.
Without feature_column, it can be done this way
(from Tensorflow: how to look up and average a different amount of embedding vectors per training instance, with multiple training instances per minibatch?):
import numpy as np
import tensorflow as tf

with tf.Graph().as_default():
    # Row 0 of the embedding matrix is all zeros, so id 0 serves as padding.
    embedding = tf.placeholder(shape=[10, 3], dtype=tf.float32)
    user = tf.placeholder(shape=[None, None], dtype=tf.int32)
    selected = tf.gather(embedding, user)
    # Divide by the number of real (non-zero) ids per row, not the padded length.
    non_zero_count = tf.cast(tf.count_nonzero(user, axis=1), tf.float32)
    embedding_sum = tf.reduce_sum(selected, axis=1)
    average = embedding_sum / tf.expand_dims(non_zero_count, axis=1)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        embedding_ = np.concatenate([np.zeros((1, 3)), np.random.randn(9, 3)], axis=0)
        user_ = [[3, 5, 7, 0], [1, 2, 0, 0]]
        print(sess.run(average, feed_dict={embedding: embedding_, user: user_}))
        # NumPy check of the same computation:
        print(np.sum([embedding_[i] for i in user_], axis=1)
              / np.atleast_2d(np.count_nonzero(user_, axis=1)).T)
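As an aside, the same average can be computed without the zero-padding trick. This is my own sketch (not from the linked answer, assuming TF 1.x): tf.nn.embedding_lookup_sparse with sp_weights=None and combiner='mean' sums the looked-up embeddings per row and divides by the number of ids in that row.

import numpy as np
import tensorflow as tf

with tf.Graph().as_default():
    embedding = tf.placeholder(shape=[10, 3], dtype=tf.float32)
    # Same two rows as above, but as a SparseTensor: [3, 5, 7] and [1, 2].
    user = tf.SparseTensor(
        indices=[[0, 0], [0, 1], [0, 2], [1, 0], [1, 1]],
        values=tf.constant([3, 5, 7, 1, 2], dtype=tf.int64),
        dense_shape=[2, 3])
    # sp_weights=None means unit weights; combiner='mean' divides by the count.
    average = tf.nn.embedding_lookup_sparse(
        embedding, user, sp_weights=None, combiner='mean')
    with tf.Session() as sess:
        embedding_ = np.concatenate([np.zeros((1, 3)), np.random.randn(9, 3)], axis=0)
        print(sess.run(average, feed_dict={embedding: embedding_}))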
If I could convert the feature_column into a tensor, I could do a similar thing, but I don't know how to do the conversion:
user_fc = feature_column.categorical_column_with_vocabulary_list(
    'user', [3, 5, 7, 1, 2, 0])
user_embedding_column = feature_column.embedding_column(user_fc, dimension=3)
embedding = user_embedding_column.to_tensor()  # if I could do this, I could replace the embedding in the code above with this tensor
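For what it's worth, here is a minimal sketch (assuming TF 1.x; the SparseTensor of ids and the 'user' feature key are my own example inputs) suggesting the conversion may not be needed: tf.feature_column.input_layer already performs the lookup-and-combine step, and embedding_column's combiner argument defaults to 'mean', so the per-instance averaging comes for free.

import tensorflow as tf

user_fc = tf.feature_column.categorical_column_with_vocabulary_list(
    'user', [3, 5, 7, 1, 2, 0])
# combiner='mean' (the default) averages the embeddings of all ids in a row.
user_embedding_column = tf.feature_column.embedding_column(
    user_fc, dimension=3, combiner='mean')

# Variable-length id lists are expressed as a SparseTensor, so no zero-padding
# is needed: row 0 has ids [3, 5, 7], row 1 has ids [1, 2].
users = tf.SparseTensor(
    indices=[[0, 0], [0, 1], [0, 2], [1, 0], [1, 1]],
    values=tf.constant([3, 5, 7, 1, 2], dtype=tf.int64),
    dense_shape=[2, 3])

# input_layer looks up each id's embedding and combines them per row.
average = tf.feature_column.input_layer({'user': users}, [user_embedding_column])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(tf.tables_initializer())  # vocabulary_list needs its lookup table
    print(sess.run(average))           # shape (2, 3): one averaged embedding per row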