DiCE:TheInfinitelyDifferentiableMonteCarloEstimator
The score function estimator is widely used for estimating gradients of stochastic objectives in Stochastic Computation Graphs (SCG), e.g., in reinforcement learning and meta-learning. While deriving the first order gradient estimators by differentiating a surrogate loss (SL) objective is computatio
用户评论