Transformers documentation
Decision Transformer
Overview
The Decision Transformer model was proposed in Decision Transformer: Reinforcement Learning via Sequence Modeling by Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas and Igor Mordatch.
The abstract from the paper is the following:
We introduce a framework that abstracts Reinforcement Learning (RL) as a sequence modeling problem. This allows us to draw upon the simplicity and scalability of the Transformer architecture, and associated advances in language modeling such as GPT-x and BERT. In particular, we present Decision Transformer, an architecture that casts the problem of RL as conditional sequence modeling. Unlike prior approaches to RL that fit value functions or compute policy gradients, Decision Transformer simply outputs the optimal actions by leveraging a causally masked Transformer. By conditioning an autoregressive model on the desired return (reward), past states, and actions, our Decision Transformer model can generate future actions that achieve the desired return. Despite its simplicity, Decision Transformer matches or exceeds the performance of state-of-the-art model-free offline RL baselines on Atari, OpenAI Gym, and Key-to-Door tasks.
This version of the model is for tasks where the state is a vector.
This model was contributed by edbeeching. The original code can be found here.
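To make the conditioning described above concrete, here is a minimal sketch (plain Python, no library dependencies) of how a trajectory is turned into the interleaved (return-to-go, state, action) sequence that Decision Transformer models autoregressively. The reward values and token names are made up for illustration.

```python
def returns_to_go(rewards):
    """Suffix sums: the return still to be collected from each timestep on."""
    rtg = []
    running = 0.0
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    return list(reversed(rtg))

rewards = [1.0, 0.0, 2.0, 1.0]
rtg = returns_to_go(rewards)
# rtg == [4.0, 3.0, 3.0, 1.0]

# Tokens are interleaved as (R_1, s_1, a_1, R_2, s_2, a_2, ...), so at
# inference time you condition on a desired return and past states/actions,
# and the model generates actions aimed at achieving that return.
states = ["s1", "s2", "s3", "s4"]
actions = ["a1", "a2", "a3", "a4"]
sequence = [tok for triple in zip(rtg, states, actions) for tok in triple]
```

Because the return-to-go at the first timestep is the total episode return, lowering or raising that initial conditioning value is how you ask the model for weaker or stronger behavior.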
DecisionTransformerConfig
class transformers.DecisionTransformerConfig
< source >( state_dim = 17 act_dim = 4 hidden_size = 128 max_ep_len = 4096 action_tanh = True vocab_size = 1 n_positions = 1024 n_layer = 3 n_head = 1 n_inner = None activation_function = 'relu' resid_pdrop = 0.1 embd_pdrop = 0.1 attn_pdrop = 0.1 layer_norm_epsilon = 1e-05 initializer_range = 0.02 scale_attn_weights = True use_cache = True bos_token_id = 50256 eos_token_id = 50256 scale_attn_by_inverse_layer_idx = False reorder_and_upcast_attn = False add_cross_attention = False **kwargs )
Parameters
- state_dim (int, optional, defaults to 17) — The state size for the RL environment.
- act_dim (int, optional, defaults to 4) — The size of the output action space.
- hidden_size (int, optional, defaults to 128) — Dimension of the hidden representations.
- max_ep_len (int, optional, defaults to 4096) — The maximum length of an episode in the environment.
- action_tanh (bool, optional, defaults to True) — Whether to use a tanh activation on the action prediction.
- vocab_size (int, optional, defaults to 1) — Vocabulary size of the model. Defines the number of different tokens that can be represented by the input_ids.
- n_positions (int, optional, defaults to 1024) — The maximum sequence length that this model might ever be used with.
- n_layer (int, optional, defaults to 3) — Number of hidden layers in the Transformer decoder.
- n_head (int, optional, defaults to 1) — Number of attention heads for each attention layer in the Transformer decoder.
- n_inner (int, optional) — Dimension of the MLP representations.
- activation_function (str, optional, defaults to "relu") — The non-linear activation function (function or string) in the decoder. For example, "gelu", "relu", "silu", etc.
- resid_pdrop (float, optional, defaults to 0.1) — The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
- embd_pdrop (float, optional, defaults to 0.1) — The dropout ratio for the embeddings.
- attn_pdrop (float, optional, defaults to 0.1) — The dropout ratio for the attention probabilities.
- layer_norm_epsilon (float, optional, defaults to 1e-05) — The epsilon used by the layer normalization layers.
- initializer_range (float, optional, defaults to 0.02) — The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
- scale_attn_weights (bool, optional, defaults to True) — Scale attention weights by dividing by sqrt(hidden_size).
- use_cache (bool, optional, defaults to True) — Whether or not the model should return the last key/values attentions (not used by all models). Only relevant if config.is_decoder=True or when the model is a decoder-only generative model.
- bos_token_id (int, optional, defaults to 50256) — Token id used for beginning-of-stream in the vocabulary.
- eos_token_id (int, optional, defaults to 50256) — Token id used for end-of-stream in the vocabulary.
- scale_attn_by_inverse_layer_idx (bool, optional, defaults to False) — Whether to additionally scale attention weights by 1 / (layer_idx + 1).
- reorder_and_upcast_attn (bool, optional, defaults to False) — Whether to scale keys (K) prior to computing attention (dot-product) and upcast the attention dot-product/softmax to float() when training with mixed precision.
- add_cross_attention (bool, optional, defaults to False) — Whether cross-attention layers should be added to the model.
This is the configuration class to store the configuration of a DecisionTransformerModel. It is used to instantiate a Decision Transformer model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of the standard DecisionTransformer architecture. Many of the config options are used to instantiate the GPT2 model that is used as part of the architecture.
Configuration objects inherit from [PretrainedConfig] and can be used to control the model outputs. Read the documentation from [PretrainedConfig] for more information.
Example:
>>> from transformers import DecisionTransformerConfig, DecisionTransformerModel
>>> # Initializing a DecisionTransformer configuration
>>> configuration = DecisionTransformerConfig()
>>> # Initializing a model (with random weights) from the configuration
>>> model = DecisionTransformerModel(configuration)
>>> # Accessing the model configuration
>>> configuration = model.config
DecisionTransformerGPT2Model
[[autodoc]] DecisionTransformerGPT2Model - forward
DecisionTransformerModel
[[autodoc]] DecisionTransformerModel - forward
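As a complement to the API reference above, here is a hedged sketch of a single forward pass through a randomly initialized DecisionTransformerModel (requires torch and transformers). The batch size, sequence length, and random tensors are stand-ins for a real environment rollout; input shapes follow the config defaults.

```python
import torch
from transformers import DecisionTransformerConfig, DecisionTransformerModel

config = DecisionTransformerConfig(state_dim=17, act_dim=4)
model = DecisionTransformerModel(config)
model.eval()

batch, seq_len = 1, 20
states = torch.randn(batch, seq_len, config.state_dim)
actions = torch.randn(batch, seq_len, config.act_dim)
rewards = torch.randn(batch, seq_len, 1)
returns_to_go = torch.randn(batch, seq_len, 1)
timesteps = torch.arange(seq_len).reshape(batch, seq_len)  # long tensor of timestep indices
attention_mask = torch.ones(batch, seq_len, dtype=torch.long)

with torch.no_grad():
    out = model(
        states=states,
        actions=actions,
        rewards=rewards,
        returns_to_go=returns_to_go,
        timesteps=timesteps,
        attention_mask=attention_mask,
    )

# The model predicts a state, action, and return at every timestep; for
# control, the action prediction at the last timestep is the one acted on.
state_preds, action_preds, return_preds = out.state_preds, out.action_preds, out.return_preds
```

Note that this model is untrained; for a usable policy you would load pretrained weights (e.g. via DecisionTransformerModel.from_pretrained) or train on offline trajectories.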