Configuration
The IPUConfig class enables PopArt and PopTorch configuration, allowing you to control the behavior of the IPUs. It is JSON-serializable and can be loaded from and saved to a local directory or file, as well as the 🤗 Hub.
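A minimal sketch of round-tripping a configuration, assuming the from_pretrained / save_pretrained API shared with other 🤗 configuration classes (the Hub identifier is illustrative):

```python
from optimum.graphcore import IPUConfig

# Load from the 🤗 Hub (identifier is illustrative), save locally, reload.
ipu_config = IPUConfig.from_pretrained("Graphcore/bert-base-ipu")
ipu_config.save_pretrained("./my_ipu_config")
ipu_config = IPUConfig.from_pretrained("./my_ipu_config")
```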
IPUConfig
class optimum.graphcore.IPUConfig( **kwargs )
Parameters
- seed (int, optional) — Sets the seed for the random number generator on the IPU.
- auto_loss_scaling (bool, optional, defaults to False) — Whether automatic loss scaling is enabled on the IPU. When float16/half values are used for activations, gradients and weights, the loss value needs to be scaled by a constant factor to avoid underflow/overflow. This adjustment is known as loss scaling. This setting automatically sets a global loss scaling factor during training. Note: this is an experimental feature and may not behave as expected.
- executable_cache_dir (str, optional, defaults to "") — Enables caching of compiled executables to a directory.
Parameters for controlling the batch size
- replication_factor (int, optional, defaults to 1) — The number of replicas for data parallelism during training. It depends on the size of the pipeline as well as the number of IPUs available. For example: on a Pod16, with a 4-IPU pipeline, replication_factor must be between 1 and 4.
- inference_replication_factor (int, optional, defaults to 1) — Same as replication_factor, but for inference.
- gradient_accumulation_steps (int, optional, defaults to 1) — Number of micro-batches to accumulate for the gradient calculation. The gradient is accumulated gradient_accumulation_steps times before the model is updated with it.
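As an illustration, a configuration combining these batch-size controls might be built as follows (the values are hypothetical):

```python
from optimum.graphcore import IPUConfig

# Hypothetical values: replicate the pipeline twice and accumulate
# gradients over 16 micro-batches before each weight update.
ipu_config = IPUConfig(
    replication_factor=2,
    gradient_accumulation_steps=16,
)
```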
Parameters related to parallelism
- layers_per_ipu (List[int]) — Specifies the number of layers that will be put on each IPU for pipelined execution. For instance: [2, 3, 4, 2] specifies a 4-IPU pipeline, where the first two layers will be put on IPU0, the following three on IPU1, the next four on IPU2 and the last two on IPU3.
- sharded_execution_for_inference (bool, optional, defaults to False) — Whether to use a sharded execution strategy for inference instead of a pipelined one. To learn more, read the PopTorch documentation.
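For instance, the 4-IPU pipeline described above can be expressed directly (a minimal sketch):

```python
from optimum.graphcore import IPUConfig

# A 4-IPU pipeline: 2 layers on IPU0, 3 on IPU1, 4 on IPU2, 2 on IPU3.
ipu_config = IPUConfig(layers_per_ipu=[2, 3, 4, 2])
```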
Parameters for memory management
- optimizer_state_offchip (bool, optional, defaults to True) — Whether to store the optimizer state in off-chip memory rather than in on-chip memory.
- replicated_tensor_sharding (bool, optional, defaults to False) — Shards the optimizer state between replicas with zero redundancy.
- matmul_proportion (List[float] or float, optional, defaults to 0.6) — Sets the amount of temporary memory made available on a per-IPU basis. Use this setting to control the amount of temporary memory available to operations such as:
  - convolution
  - matrix multiplication
  - embedding lookups
  - indexing operations
- enable_half_partials (bool, optional, defaults to True) — Whether the data type of partial results for matrix multiplication and convolution operators should be float16.
- embedding_serialization_factor (int, optional, defaults to 1) — The factor to use to serialize embeddings. Nothing happens if embedding_serialization_factor = 1; for embedding_serialization_factor > 1, the torch.nn.Embedding layer is replaced by an optimum.graphcore.modeling_utils.SerializedEmbedding layer.
- recompute_checkpoint_every_layer (bool, optional, defaults to False) — Whether to use gradient checkpointing at the end of every layer. It can help reduce the memory impact.
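A hedged sketch combining several of these memory controls; the values are illustrative, not recommendations:

```python
from optimum.graphcore import IPUConfig

ipu_config = IPUConfig(
    layers_per_ipu=[2, 3, 4, 2],
    optimizer_state_offchip=True,               # keep optimizer state off chip
    enable_half_partials=True,                  # float16 partials for matmuls/convs
    matmul_proportion=[0.08, 0.2, 0.25, 0.25],  # per-IPU temporary memory budget
    embedding_serialization_factor=4,           # serialize the embedding in 4 chunks
    recompute_checkpoint_every_layer=True,      # gradient checkpointing per layer
)
```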
Parameters related to host / device synchronization
- device_iterations (int, optional, defaults to 1) — Number of iterations the device should run over the data before returning to the user during training. This is equivalent to running the IPU in a loop over the specified number of iterations, with a new batch of data each time. However, increasing device_iterations is more efficient because the loop runs on the IPU directly.
- inference_device_iterations (int, optional, defaults to 1) — Same as device_iterations, but for inference.
- output_mode (str, optional, defaults to "final") — Specifies which data to return from a model. Allowed values:
  - all: returns a result for each batch.
  - sum: returns the sum of all batches.
  - final: returns the last batch.
  - default: all for inference, final for training.
Class for PopArt and PopTorch configuration. Handles the conversion to poptorch.Options as well as configuration specialization for a POD type.
batch_size_factor
( for_inference: bool = False, pod_type: typing.Optional[str] = None ) → int
Computes the factor to apply to the micro batch size to get the combined batch size.
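A sketch of the arithmetic, assuming the training factor is the product of replication_factor, gradient_accumulation_steps and device_iterations (the inference factor would use the inference_* counterparts):

```python
from optimum.graphcore import IPUConfig

ipu_config = IPUConfig(
    replication_factor=2,
    gradient_accumulation_steps=16,
    device_iterations=4,
)

# Assumed: 2 * 16 * 4 = 128, so a micro batch size of 4 would yield a
# combined batch size of 4 * 128 = 512 samples per host interaction.
factor = ipu_config.batch_size_factor()
```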
for_pod_type
( pod_type: typing.Optional[str] = None ) → IPUConfig
Creates an IPUConfig specialized for a POD type.
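A minimal sketch; "pod16" is an illustrative POD type string, check the values supported by your setup:

```python
from optimum.graphcore import IPUConfig

ipu_config = IPUConfig(layers_per_ipu=[2, 3, 4, 2])
pod16_config = ipu_config.for_pod_type("pod16")  # "pod16" is illustrative
```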
to_options
( for_inference: bool = False, compile_only: bool = False, pod_type: typing.Optional[str] = None ) → poptorch.Options
Parameters
- for_inference (bool, defaults to False) — If True, the resulting poptorch.Options will be adapted for inference; otherwise they will be adapted for training.
- compile_only (bool, defaults to False) — If True, compilation will be performed offline, and no IPUs are required.
- pod_type (str, optional) — The POD type to specialize the poptorch.Options for.
Returns
poptorch.Options
The options representing the IPUConfig.
Creates a poptorch.Options from the IPUConfig.
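A minimal sketch of wiring the resulting options into PopTorch via poptorch.inferenceModel; the torch.nn.Linear module is a stand-in for a real pipelined model:

```python
import torch
import poptorch
from optimum.graphcore import IPUConfig

ipu_config = IPUConfig(inference_device_iterations=4)
opts = ipu_config.to_options(for_inference=True)

model = torch.nn.Linear(10, 10)  # stand-in for a real pipelined model
inference_model = poptorch.inferenceModel(model, options=opts)
```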
update_from_string
( update_str: str )
Updates attributes of this class with attributes from update_str.
The expected format is: ints, floats and strings as-is; for booleans, use true or false; for lists, use [a b c d]. For example: "n_embd=10,resid_pdrop=0.2,scale_attn_weights=false,summary_type=cls_index,matmul_proportion=[0.08 0.2 0.25 0.25]".
The keys to change must already exist in the config object.
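For instance, following the format above:

```python
from optimum.graphcore import IPUConfig

ipu_config = IPUConfig()
# Booleans use true/false, lists use the [a b c] form; every key must
# already exist on the config.
ipu_config.update_from_string(
    "replication_factor=2,enable_half_partials=true,matmul_proportion=[0.08 0.2 0.25 0.25]"
)
```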