trainer.optimization package

Submodules

trainer.optimization.adafactor module

class trainer.optimization.adafactor.Adafactor(params, lr=None, eps=(1e-30, 0.001), clip_threshold=1.0, decay_rate=-0.8, beta1=None, weight_decay=0.0, scale_parameter=True, relative_step=True, warmup_init=False)[source]

Bases: torch.optim.optimizer.Optimizer

PyTorch implementation of AdaFactor, as introduced in Adafactor: Adaptive Learning Rates with Sublinear Memory Cost (https://arxiv.org/abs/1804.04235).

Parameters
  • params (Iterable[nn.parameter.Parameter]) – Iterable of parameters to optimize or dictionaries defining parameter groups.

  • lr (float, optional) – The external learning rate.

  • eps (Tuple[float, float], optional) – Regularization constants for square gradient and parameter scale respectively.

  • clip_threshold (float, optional) – Threshold of root mean square of final gradient update.

  • decay_rate (float, optional) – Coefficient used to compute running averages of the squared gradient.

  • beta1 (float, optional) – Coefficient used for computing running averages of the gradient.

  • weight_decay (float, optional) – Weight decay (L2 penalty).

  • scale_parameter (bool, optional) – If True, the learning rate is scaled by the root mean square of the parameter.

  • relative_step (bool, optional) – If True, a time-dependent learning rate is computed instead of using the external learning rate.

  • warmup_init (bool, optional) – If True, warm-up initialization is used when computing the time-dependent learning rate (only relevant when relative_step=True).

step(closure=None)[source]

Performs a single optimization step.

Parameters

closure (callable, optional) – A closure that reevaluates the model and returns the loss.
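
A minimal usage sketch, assuming the class is importable from this module path and follows the documented signature; the model and data are placeholders:

    import torch
    from torch import nn

    from trainer.optimization.adafactor import Adafactor  # assumed import path

    model = nn.Linear(10, 2)  # placeholder model

    # Paper-style configuration: relative, time-dependent step sizes and
    # parameter scaling, so no external learning rate is supplied.
    optimizer = Adafactor(
        model.parameters(),
        lr=None,
        scale_parameter=True,
        relative_step=True,
        warmup_init=False,
    )

    loss = model(torch.randn(4, 10)).sum()  # dummy forward pass
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()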

trainer.optimization.adamw module

class trainer.optimization.adamw.AdamW(params: Iterable[torch.nn.parameter.Parameter], lr: float = 0.001, betas: Tuple[float, float] = (0.9, 0.999), eps: float = 1e-06, weight_decay: float = 0.0, correct_bias: bool = True)[source]

Bases: torch.optim.optimizer.Optimizer

Implements the Adam algorithm with the weight decay fix introduced in Decoupled Weight Decay Regularization (https://arxiv.org/abs/1711.05101).

Parameters
  • params (Iterable[nn.parameter.Parameter]) – Iterable of parameters to optimize or dictionaries defining parameter groups.

  • lr (float, optional) – The learning rate to use.

  • betas (Tuple[float, float], optional) – Adam's betas parameters (b1, b2).

  • eps (float, optional) – Adam's epsilon for numerical stability.

  • weight_decay (float, optional) – Decoupled weight decay to apply.

  • correct_bias (bool, optional) – Whether or not to correct bias in Adam.

step(closure: Optional[Callable] = None)[source]

Performs a single optimization step.

Parameters

closure (Callable, optional) – A closure that reevaluates the model and returns the loss.
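
A short sketch showing the documented constructor with parameter groups (assumed import path; splitting bias parameters out of weight decay is a common convention, not something this module mandates):

    from torch import nn

    from trainer.optimization.adamw import AdamW  # assumed import path

    model = nn.Linear(10, 2)  # placeholder model

    # Apply decoupled weight decay to weights only.
    decay, no_decay = [], []
    for name, param in model.named_parameters():
        (no_decay if name.endswith("bias") else decay).append(param)

    optimizer = AdamW(
        [
            {"params": decay, "weight_decay": 0.01},
            {"params": no_decay, "weight_decay": 0.0},
        ],
        lr=1e-3,
        betas=(0.9, 0.999),
        eps=1e-6,
        correct_bias=True,
    )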

trainer.optimization.optimization module

PyTorch optimization for BERT model.

trainer.optimization.optimization.get_constant_schedule(optimizer: torch.optim.optimizer.Optimizer, last_epoch: int = -1)[source]

Create a schedule with a constant learning rate, using the learning rate set in optimizer.

Parameters
  • optimizer (Optimizer) – The optimizer for which to schedule the learning rate.

  • last_epoch (int, optional, defaults to -1) – The index of the last epoch when resuming training.

Returns

torch.optim.lr_scheduler.LambdaLR with the appropriate schedule.
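
A minimal wiring sketch (assumed import path); the returned LambdaLR is stepped once per optimizer update:

    import torch
    from torch import nn

    from trainer.optimization.optimization import get_constant_schedule  # assumed path

    model = nn.Linear(10, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    # The learning rate stays at 0.1 for the whole run.
    scheduler = get_constant_schedule(optimizer)

    optimizer.step()
    scheduler.step()
    print(scheduler.get_last_lr())  # [0.1]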

trainer.optimization.optimization.get_constant_schedule_with_warmup(optimizer: torch.optim.optimizer.Optimizer, num_warmup_steps: int, last_epoch: int = -1)[source]

Create a schedule with a constant learning rate preceded by a warmup period during which the learning rate increases linearly between 0 and the initial lr set in the optimizer.

Parameters
  • optimizer (Optimizer) – The optimizer for which to schedule the learning rate.

  • num_warmup_steps (int) – The number of steps for the warmup phase.

  • last_epoch (int, optional, defaults to -1) – The index of the last epoch when resuming training.

Returns

torch.optim.lr_scheduler.LambdaLR with the appropriate schedule.
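
A sketch of the warmup ramp (assumed import path); the learning rate grows linearly over the first num_warmup_steps updates and then stays constant:

    import torch
    from torch import nn

    from trainer.optimization.optimization import get_constant_schedule_with_warmup  # assumed path

    model = nn.Linear(10, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    scheduler = get_constant_schedule_with_warmup(optimizer, num_warmup_steps=4)

    for _ in range(6):
        optimizer.step()   # update parameters first
        scheduler.step()   # then advance the schedule
        print(scheduler.get_last_lr())
    # The LR ramps from 0 toward 0.1 over the first 4 steps, then stays at 0.1.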

trainer.optimization.optimization.get_cosine_schedule_with_warmup(optimizer: torch.optim.optimizer.Optimizer, num_warmup_steps: int, num_training_steps: int, num_cycles: float = 0.5, last_epoch: int = -1)[source]

Create a schedule with a learning rate that decreases following the values of the cosine function between the initial lr set in the optimizer to 0, after a warmup period during which it increases linearly between 0 and the initial lr set in the optimizer.

Parameters
  • optimizer (Optimizer) – The optimizer for which to schedule the learning rate.

  • num_warmup_steps (int) – The number of steps for the warmup phase.

  • num_training_steps (int) – The total number of training steps.

  • num_cycles (float, optional, defaults to 0.5) – The number of waves in the cosine schedule (the default is to decrease from the max value to 0 following a half-cosine).

  • last_epoch (int, optional, defaults to -1) – The index of the last epoch when resuming training.

Returns

torch.optim.lr_scheduler.LambdaLR with the appropriate schedule.
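
A sketch that traces the resulting learning-rate curve (assumed import path); with the default num_cycles=0.5 the rate rises during warmup and then follows half a cosine down to 0:

    import torch
    from torch import nn

    from trainer.optimization.optimization import get_cosine_schedule_with_warmup  # assumed path

    model = nn.Linear(10, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    scheduler = get_cosine_schedule_with_warmup(
        optimizer, num_warmup_steps=10, num_training_steps=100
    )

    lrs = []
    for _ in range(100):
        optimizer.step()
        scheduler.step()
        lrs.append(scheduler.get_last_lr()[0])
    # lrs rises linearly for 10 steps, then decays to ~0 along a half-cosine.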

trainer.optimization.optimization.get_cosine_with_hard_restarts_schedule_with_warmup(optimizer: torch.optim.optimizer.Optimizer, num_warmup_steps: int, num_training_steps: int, num_cycles: int = 1, last_epoch: int = -1)[source]

Create a schedule with a learning rate that decreases following the values of the cosine function between the initial lr set in the optimizer to 0, with several hard restarts, after a warmup period during which it increases linearly between 0 and the initial lr set in the optimizer.

Parameters
  • optimizer (Optimizer) – The optimizer for which to schedule the learning rate.

  • num_warmup_steps (int) – The number of steps for the warmup phase.

  • num_training_steps (int) – The total number of training steps.

  • num_cycles (int, optional, defaults to 1) – The number of hard restarts to use.

  • last_epoch (int, optional, defaults to -1) – The index of the last epoch when resuming training.

Returns

torch.optim.lr_scheduler.LambdaLR with the appropriate schedule.
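
The same wiring with hard restarts (assumed import path); here num_cycles is an integer count of cosine cycles rather than a fraction of a wave:

    import torch
    from torch import nn

    from trainer.optimization.optimization import (  # assumed path
        get_cosine_with_hard_restarts_schedule_with_warmup,
    )

    model = nn.Linear(10, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    # With num_cycles=2 the LR decays to 0, jumps back to the peak once,
    # and decays again before training ends.
    scheduler = get_cosine_with_hard_restarts_schedule_with_warmup(
        optimizer, num_warmup_steps=10, num_training_steps=100, num_cycles=2
    )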

trainer.optimization.optimization.get_linear_schedule_with_warmup(optimizer, num_warmup_steps, num_training_steps, last_epoch=-1)[source]

Create a schedule with a learning rate that decreases linearly from the initial lr set in the optimizer to 0, after a warmup period during which it increases linearly from 0 to the initial lr set in the optimizer.

Parameters
  • optimizer (Optimizer) – The optimizer for which to schedule the learning rate.

  • num_warmup_steps (int) – The number of steps for the warmup phase.

  • num_training_steps (int) – The total number of training steps.

  • last_epoch (int, optional, defaults to -1) – The index of the last epoch when resuming training.

Returns

torch.optim.lr_scheduler.LambdaLR with the appropriate schedule.
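
A typical training-loop sketch around this schedule (assumed import path; the optimizer choice, step counts, and data are placeholders):

    import torch
    from torch import nn

    from trainer.optimization.optimization import get_linear_schedule_with_warmup  # assumed path

    model = nn.Linear(10, 2)
    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

    num_training_steps = 1000
    scheduler = get_linear_schedule_with_warmup(
        optimizer, num_warmup_steps=100, num_training_steps=num_training_steps
    )

    for step in range(num_training_steps):
        loss = model(torch.randn(8, 10)).sum()  # placeholder loss
        loss.backward()
        optimizer.step()
        scheduler.step()        # advance the LR schedule once per update
        optimizer.zero_grad()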

trainer.optimization.optimization.get_polynomial_decay_schedule_with_warmup(optimizer, num_warmup_steps, num_training_steps, lr_end=1e-07, power=1.0, last_epoch=-1)[source]

Create a schedule with a learning rate that decreases as a polynomial decay from the initial lr set in the optimizer to end lr defined by lr_end, after a warmup period during which it increases linearly from 0 to the initial lr set in the optimizer.

Parameters
  • optimizer (Optimizer) – The optimizer for which to schedule the learning rate.

  • num_warmup_steps (int) – The number of steps for the warmup phase.

  • num_training_steps (int) – The total number of training steps.

  • lr_end (float, optional, defaults to 1e-7) – The final learning rate at the end of the decay.

  • power (float, optional, defaults to 1.0) – The power of the polynomial decay (1.0 corresponds to linear decay).

  • last_epoch (int, optional, defaults to -1) – The index of the last epoch when resuming training.

Note: power defaults to 1.0 as in the fairseq implementation, which in turn is based on the original BERT implementation at https://github.com/google-research/bert/blob/f39e881b169b9d53bea03d2d341b31707a6c052b/optimization.py#L37

Returns

torch.optim.lr_scheduler.LambdaLR with the appropriate schedule.
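
A sketch of the two extra knobs (assumed import path): lr_end is the floor the rate decays to, and power shapes the decay curve:

    import torch
    from torch import nn

    from trainer.optimization.optimization import get_polynomial_decay_schedule_with_warmup  # assumed path

    model = nn.Linear(10, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    # Quadratic decay from 0.1 down to 1e-6 after a 10-step linear warmup.
    scheduler = get_polynomial_decay_schedule_with_warmup(
        optimizer,
        num_warmup_steps=10,
        num_training_steps=100,
        lr_end=1e-6,
        power=2.0,
    )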

trainer.optimization.optimization.get_scheduler(name: Union[str, towhee.trainer.utils.trainer_utils.SchedulerType], optimizer: torch.optim.optimizer.Optimizer, num_warmup_steps: Optional[int] = None, num_training_steps: Optional[int] = None)[source]

Unified API to get any scheduler from its name.

Parameters
  • name (str or SchedulerType) – The name of the scheduler to use.

  • optimizer (torch.optim.Optimizer) – The optimizer that will be used during training.

  • num_warmup_steps (int, optional) – The number of warmup steps to perform. Not all schedulers require this (hence the argument is optional); the function will raise an error if it is unset but the scheduler type requires it.

  • num_training_steps (int, optional) – The number of training steps to perform. Not all schedulers require this (hence the argument is optional); the function will raise an error if it is unset but the scheduler type requires it.
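
A sketch of this unified entry point (assumed import path; the string name "linear" is a guess at a valid SchedulerType value and may differ in this library):

    import torch
    from torch import nn

    from trainer.optimization.optimization import get_scheduler  # assumed path

    model = nn.Linear(10, 2)
    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

    # "linear" is assumed to be a valid scheduler name; pass a SchedulerType
    # member instead if the string form differs.
    scheduler = get_scheduler(
        "linear",
        optimizer=optimizer,
        num_warmup_steps=100,
        num_training_steps=1000,
    )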

trainer.optimization.optimization.get_warmup_steps(num_training_steps: int, warmup_steps: int, warmup_ratio: float) → int[source]

Get number of steps used for a linear warmup.

Parameters
  • num_training_steps (int) – The total number of training steps.

  • warmup_steps (int) – The number of warmup steps. If greater than 0, it takes precedence and warmup_ratio is ignored.

  • warmup_ratio (float) – The fraction of total training steps to use as warmup (used when warmup_steps is 0).

Returns

The number of warmup steps.
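
A sketch of the expected behaviour (assumed import path; the exact rounding of the ratio-based value is an implementation detail):

    from trainer.optimization.optimization import get_warmup_steps  # assumed path

    # An explicit warmup_steps value wins when it is greater than 0.
    print(get_warmup_steps(num_training_steps=1000, warmup_steps=50, warmup_ratio=0.1))  # 50

    # Otherwise the warmup length is derived from the ratio, roughly 1000 * 0.1 = 100.
    print(get_warmup_steps(num_training_steps=1000, warmup_steps=0, warmup_ratio=0.1))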

Module contents