I'm trying to follow the Udacity tutorial on TensorFlow, where I came across the following two lines in the word embedding models:
    # Look up embeddings for inputs.
    embed = tf.nn.embedding_lookup(embeddings, train_dataset)
    # Compute the softmax loss, using a sample of the negative labels each time.
    loss = tf.reduce_mean(tf.nn.sampled_softmax_loss(softmax_weights, softmax_biases,
                                                     embed, train_labels, num_sampled,
                                                     vocabulary_size))
Now I understand that the second statement is for sampling negative labels. But my question is: how does it know what the negative labels are? I am providing the second function with the current input and its corresponding labels, along with the number of labels I want to (negatively) sample from. Isn't there the risk of sampling from the input set itself?
This is the full example: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/udacity/5_word2vec.ipynb
You can find the documentation for tf.nn.sampled_softmax_loss() here. There is also a good explanation of candidate sampling provided by TensorFlow here (pdf).
How does it know what the negative labels are?
TensorFlow will randomly select negative classes among all the possible classes (for you, all the possible words).
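As a rough sketch of what that sampling step looks like (this is an assumption about the internals: by default the sampled loss draws its candidates from a log-uniform candidate sampler, shown here via the TF1-style tf.nn.log_uniform_candidate_sampler; the toy sizes and label ids are made up):

    import tensorflow as tf

    # Toy sizes, purely for illustration.
    vocabulary_size = 50000   # total number of classes (words)
    num_sampled = 64          # how many negative classes to draw per batch

    # True word ids for a tiny batch, shape [batch_size, num_true].
    train_labels = tf.constant([[17], [4203], [98]], dtype=tf.int64)

    # Draw num_sampled candidate "negative" word ids from the whole vocabulary.
    # The log-uniform distribution favors low ids, i.e. frequent words in the
    # notebook's frequency-ordered vocabulary.
    sampled_ids, true_expected, sampled_expected = tf.nn.log_uniform_candidate_sampler(
        true_classes=train_labels,
        num_true=1,
        num_sampled=num_sampled,
        unique=True,
        range_max=vocabulary_size)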
Isn't there the risk of sampling from the input set itself?
When you want to compute the softmax probability for your true label, you compute: logits[true_label] / sum(logits[negative_sampled_labels]). Since the number of classes is huge (the vocabulary size), there is very little probability of sampling the true_label as a negative label.
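To make that concrete, here is a tiny NumPy sketch of the subset softmax (the logit values are made up; the standard sampled softmax exponentiates the logits and includes the true logit in the denominator, which the formula above abbreviates):

    import numpy as np

    # Made-up logits: one for the true label, a few for the sampled negatives.
    logit_true = 2.3
    logits_negatives = np.array([0.1, -1.2, 0.7, 0.4])

    # Sampled softmax normalizes over the true label plus the sampled negatives
    # only, instead of over logits for the full vocabulary.
    denom = np.exp(logit_true) + np.sum(np.exp(logits_negatives))
    p_true = np.exp(logit_true) / denom

    # With a huge vocabulary, the chance that any of the (say) 64 sampled ids
    # equals the true label is tiny: roughly 64 / 50000 under uniform sampling.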
Anyway, I think TensorFlow removes this possibility altogether when it randomly samples. (EDIT: @Alex confirms that TensorFlow does this by default.)
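That behaviour corresponds to the remove_accidental_hits argument of tf.nn.sampled_softmax_loss, which defaults to True. A sketch of the notebook's loss with it spelled out explicitly, reusing the names from the question's snippet (the keyword names assume a TF 1.x-or-later signature; older versions took inputs before labels positionally):

    # Same loss as in the notebook, but with the accidental-hit removal explicit.
    loss = tf.reduce_mean(
        tf.nn.sampled_softmax_loss(
            weights=softmax_weights,
            biases=softmax_biases,
            labels=train_labels,
            inputs=embed,
            num_sampled=num_sampled,
            num_classes=vocabulary_size,
            remove_accidental_hits=True))  # drop sampled negatives that equal the true label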