
Off-policy Maximum Entropy Reinforcement Learning: Soft Actor-Critic with Advantage Weighted Mixture Policy (SAC-AWMP)
The optimal policy of a reinforcement learning problem is often disconti...

Implicit Policy for Reinforcement Learning
We introduce Implicit Policy, a general class of expressive policies tha...

A series of maximum entropy upper bounds of the differential entropy
We present a series of closed-form maximum entropy upper bounds for the ...

CASA-B: A Unified Framework of Model-Free Reinforcement Learning
Building on the breakthrough of reinforcement learning, this paper intro...

Contextual Policy Reuse using Deep Mixture Models
Reinforcement learning methods that consider the context, or current sta...

GALILEO: A Generalized Low-Entropy Mixture Model
We present a new method of generating mixture models for data with categ...

Mixture-of-Parents Maximum Entropy Markov Models
We present the mixture-of-parents maximum entropy Markov model (MoP-MEMM...
Maximum Entropy Reinforcement Learning with Mixture Policies
Mixture models are an expressive hypothesis class that can approximate a rich set of policies. However, using mixture policies in the Maximum Entropy (MaxEnt) framework is not straightforward. The entropy of a mixture model is not equal to the weighted sum of its components' entropies, nor does it have a closed-form expression in most cases. Using such policies in MaxEnt algorithms therefore requires constructing a tractable approximation of the mixture entropy. In this paper, we derive a simple, low-variance mixture-entropy estimator. We show that it is closely related to the sum of marginal entropies. Equipped with our entropy estimator, we derive an algorithmic variant of Soft Actor-Critic (SAC) for the mixture policy case and evaluate it on a series of continuous control tasks.
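To see why the mixture entropy needs an estimator rather than a formula, the sketch below (illustrative only, not the paper's estimator; the mixture parameters are arbitrary) computes a Monte Carlo estimate of a Gaussian-mixture entropy and compares it with the classical bounds built from the components' entropies: the weighted sum of component entropies is a lower bound, and adding the entropy of the mixing weights gives an upper bound.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 1-D Gaussian mixture: weights, means, std-devs (not from the paper)
w = np.array([0.5, 0.5])
mu = np.array([-2.0, 2.0])
sigma = np.array([1.0, 1.0])

def mixture_logpdf(x):
    # log p(x) = log sum_k w_k N(x; mu_k, sigma_k), evaluated per sample
    comp = -0.5 * ((x[:, None] - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))
    return np.log(np.sum(w * np.exp(comp), axis=1))

# Monte Carlo entropy estimate: H(p) = -E_{x~p}[log p(x)]
k = rng.choice(len(w), size=100_000, p=w)   # sample a component index
x = rng.normal(mu[k], sigma[k])             # sample within that component
h_mc = -mixture_logpdf(x).mean()

# Weighted sum of component entropies (lower bound); adding the entropy of
# the mixing weights H(w) yields an upper bound on the mixture entropy.
h_comp = np.sum(w * 0.5 * np.log(2 * np.pi * np.e * sigma**2))
h_w = -np.sum(w * np.log(w))

print(f"Monte Carlo estimate: {h_mc:.3f}")
print(f"bounds: [{h_comp:.3f}, {h_comp + h_w:.3f}]")
```

With well-separated modes the true entropy sits near the upper bound, so the simple weighted-sum surrogate can be loose; this is the gap a dedicated mixture-entropy estimator is meant to close.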