trainer package

Subpackages

Submodules

trainer.callback module

class trainer.callback.Callback[source]

Bases: object

Callback defines a set of functions which will be called in the training process. A customized callback can inherit from the base Callback and override its methods to control the training process or handle training information.
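
Example

A minimal sketch of a customized callback (illustrative only; the 'epoch_loss' key is an assumption, the actual log keys depend on the trainer that fills the logs dict):

>>> from towhee.trainer.callback import Callback
>>> class LossRecorderCallback(Callback):
...     def __init__(self):
...         super().__init__()
...         self.losses = []
...     def on_epoch_end(self, epochs, logs):
...         # 'epoch_loss' is an assumed key written by the training loop.
...         if logs and 'epoch_loss' in logs:
...             self.losses.append(logs['epoch_loss'])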

on_batch_begin(batch: Tuple, logs: Dict) None[source]

Hook function invoked before every batch calculation.

Parameters
  • batch (Tuple) – The data batch to calculate.

  • logs (Dict) – Kv store to save and load info.

on_batch_end(batch: Tuple, logs: Dict) None[source]

Hook function invoked after every batch calculation.

Parameters
  • batch (Tuple) – The data batch to calculate.

  • logs (Dict) – Kv store to save and load info.

on_epoch_begin(epochs: int, logs: Dict) None[source]

Hook function invoked before each epoch.

Parameters
  • epochs (int) – Epoch index.

  • logs (Dict) – Kv store to save and load info.

on_epoch_end(epochs: int, logs: Dict) None[source]

Hook function invoked after each epoch.

Parameters
  • epochs (int) – Epoch index.

  • logs (Dict) – Kv store to save and load info.

on_eval_batch_begin(batch: Tuple, logs: Dict) None[source]

Hook function invoked before every batch calculation in evaluate stage.

Parameters
  • batch (Tuple) – The data batch to calculate.

  • logs (Dict) – Kv store to save and load info.

on_eval_batch_end(batch: Tuple, logs: Dict) None[source]

Hook function invoked after every batch calculation in evaluate stage.

Parameters
  • batch (Tuple) – The data batch to calculate.

  • logs (Dict) – Kv store to save and load info.

on_eval_begin(logs: Dict) None[source]

Hook function invoked before evaluate stage.

Parameters

logs (Dict) – Kv store to save and load info.

on_eval_end(logs: Dict) None[source]

Hook function invoked after evaluate stage.

Parameters

logs (Dict) – Kv store to save and load info.

on_train_batch_begin(batch: Tuple, logs: Dict) None[source]

Hook function invoked before every batch calculation in train stage.

Parameters
  • batch (Tuple) – The data batch to calculate.

  • logs (Dict) – Kv store to save and load info.

on_train_batch_end(batch: Tuple, logs: Dict) None[source]

Hook function invoked after every batch calculation in train stage.

Parameters
  • batch (Tuple) – The data batch to calculate.

  • logs (Dict) – Kv store to save and load info.

on_train_begin(logs: Dict) None[source]

Hook function invoked before train stage.

Parameters

logs (Dict) – Kv store to save and load info.

on_train_end(logs: Dict) None[source]

Hook function invoked after train stage.

Parameters

logs (Dict) – Kv store to save and load info.

set_model(model: torch.nn.modules.module.Module) None[source]

Set the model to callback.

Parameters

model (torch.nn.Module) – The model which callback can operate.

set_optimizer(optimizer: torch.optim.optimizer.Optimizer) None[source]

Set the optimizer to callback.

Parameters

optimizer (torch.optim.Optimizer) – The optimizer which callback can operate.

set_trainercontrol(trainercontrol: trainer.callback.TrainerControl) None[source]

Set the trainercontrol to callback.

Parameters

trainercontrol (towhee.trainer.callback.TrainerControl) – The trainercontrol which callback can operate.

class trainer.callback.CallbackList(callbacks: Optional[List[trainer.callback.Callback]] = None)[source]

Bases: object

CallbackList aggregates multiple Callback instances in the same object. Invoking a hook on the CallbackList will invoke the corresponding hook of every contained Callback, in FIFO sequential order.

Parameters

callbacks (List[towhee.trainer.callback.Callback]) – A list of callbacks whose methods will be called in sequence.

Example:
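
A minimal sketch of composing callbacks (illustrative only):

>>> from towhee.trainer.callback import Callback, CallbackList
>>> class MyCallback(Callback):
...     def on_epoch_end(self, epochs, logs):
...         print('epoch %d finished' % epochs)
>>> cb_list = CallbackList([MyCallback()])
>>> cb_list.on_epoch_end(1, {})
epoch 1 finished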

add_callback(callback: trainer.callback.Callback, singleton: bool = True)[source]

Add a callback to the callback list.

Parameters
  • callback (towhee.trainer.callback.Callback) – The callback to be added.

  • singleton (bool) – If True, only one instance of the same Callback class will remain in the callback list.

on_batch_begin(batch: Tuple, logs: Dict) None[source]

Hook function invoked before every batch calculation.

Parameters
  • batch (Tuple) – The data batch to calculate.

  • logs (Dict) – Kv store to save and load info.

on_batch_end(batch: Tuple, logs: Dict) None[source]

Hook function invoked after every batch calculation.

Parameters
  • batch (Tuple) – The data batch to calculate.

  • logs (Dict) – Kv store to save and load info.

on_epoch_begin(epochs: int, logs: Dict) None[source]

Hook function invoked before each epoch.

Parameters
  • epochs (int) – Epoch index.

  • logs (Dict) – Kv store to save and load info.

on_epoch_end(epochs: int, logs: Dict) None[source]

Hook function invoked after each epoch.

Parameters
  • epochs (int) – Epoch index.

  • logs (Dict) – Kv store to save and load info.

on_eval_batch_begin(batch: Tuple, logs: Dict) None[source]

Hook function invoked before every batch calculation in evaluate stage.

Parameters
  • batch (Tuple) – The data batch to calculate.

  • logs (Dict) – Kv store to save and load info.

on_eval_batch_end(batch: Tuple, logs: Dict) None[source]

Hook function invoked after every batch calculation in evaluate stage.

Parameters
  • batch (Tuple) – The data batch to calculate.

  • logs (Dict) – Kv store to save and load info.

on_eval_begin(logs: Dict) None[source]

Hook function invoked before evaluate stage.

Parameters

logs (Dict) – Kv store to save and load info.

on_eval_end(logs: Dict) None[source]

Hook function invoked after evaluate stage.

Parameters

logs (Dict) – Kv store to save and load info.

on_train_batch_begin(batch: Tuple, logs: Dict) None[source]

Hook function invoked before every batch calculation in train stage.

Parameters
  • batch (Tuple) – The data batch to calculate.

  • logs (Dict) – Kv store to save and load info.

on_train_batch_end(batch: Tuple, logs: Dict) None[source]

Hook function invoked after every batch calculation in train stage.

Parameters
  • batch (Tuple) – The data batch to calculate.

  • logs (Dict) – Kv store to save and load info.

on_train_begin(logs: Dict) None[source]

Hook function invoked before train stage.

Parameters

logs (Dict) – Kv store to save and load info.

on_train_end(logs: Dict) None[source]

Hook function invoked after train stage.

Parameters

logs (Dict) – Kv store to save and load info.

pop_callback(callback: trainer.callback.Callback)[source]

Remove a callback from the callback list.

Parameters

callback (towhee.trainer.callback.Callback) – The callback to be removed from the callback list.

set_model(model: torch.nn.modules.module.Module)[source]

Set the model to callback.

Parameters

model (torch.nn.Module) – The model which callback can operate.

set_optimizer(optimizer: torch.optim.optimizer.Optimizer)[source]

Set the optimizer to callback.

Parameters

optimizer (torch.optim.Optimizer) – The optimizer which callback can operate.

set_trainercontrol(trainercontrol: trainer.callback.TrainerControl)[source]

Set the trainercontrol to callback.

Parameters

trainercontrol (towhee.trainer.callback.TrainerControl) – The trainercontrol which callback can operate.

class trainer.callback.EarlyStoppingCallback(trainercontrol: trainer.callback.TrainerControl, monitor: str, min_delta: float = 0, patience: int = 0, mode: str = 'max', baseline: Optional[float] = None)[source]

Bases: trainer.callback.Callback

Assume the goal of training is to minimize the loss; then the metric to be monitored would be ‘loss’, and mode would be ‘min’. At the end of every epoch, the training loop checks whether the loss has stopped decreasing, taking min_delta and patience into account if applicable. Once the loss is found to be no longer decreasing, trainercontrol.should_training_stop is marked True.

Parameters
  • trainercontrol (towhee.trainer.callback.TrainerControl) – The trainercontrol which callback can operate.

  • monitor (str) – Quantity to be monitored.

  • min_delta (float) – Minimum change in the monitored quantity to qualify as an improvement, i.e. an absolute change of less than min_delta, will count as no improvement.

  • patience (int) – Number of epochs with no improvement after which training will be stopped.

  • mode (str) – One of {“min”, “max”}. In min mode, training will stop when the quantity monitored has stopped decreasing; in “max” mode it will stop when the quantity monitored has stopped increasing.

  • baseline (float) – Baseline value for the monitored quantity. Training will stop if the model doesn’t show improvement over the baseline.
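
Example

A minimal construction sketch (illustrative only; the monitored key must match a metric name that the training loop writes into logs):

>>> from towhee.trainer.callback import EarlyStoppingCallback, TrainerControl
>>> control = TrainerControl()
>>> early_stop = EarlyStoppingCallback(control, monitor='epoch_metric', patience=3, mode='max')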

get_monitor_value(logs: Dict)[source]

Get the value of the monitored quantity from logs.

on_epoch_end(epochs: int, logs: Optional[Dict] = None)[source]

Hook function invoked after each epoch.

Parameters
  • epochs (int) – Epoch index.

  • logs (Dict) – Kv store to save and load info.

on_train_begin(logs: Optional[Dict] = None)[source]

Hook function invoked before train stage.

Parameters

logs (Dict) – Kv store to save and load info.

on_train_end(logs: Optional[Dict] = None)[source]

Hook function invoked after train stage.

Parameters

logs (Dict) – Kv store to save and load info.

class trainer.callback.ModelCheckpointCallback(trainercontrol: trainer.callback.TrainerControl, filepath: str = './', every_n_epoch: int = - 1, every_n_iteration: int = - 1)[source]

Bases: trainer.callback.Callback

ModelCheckpointCallback is intended to save the model at some interval. It can be set in epoch mode or iteration mode. Only one of every_n_epoch and every_n_iteration can be set to a positive value, and trainercontrol.should_save will be set to True when the condition is met.

Parameters
  • trainercontrol (TrainerControl) – The trainercontrol which callback can operate.

  • filepath (str) – Filepath to save the model.

  • every_n_epoch (int) – Save the model after n epochs.

  • every_n_iteration (int) – Save the model after n iterations.
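
Example

A minimal construction sketch (the file path and interval are illustrative):

>>> from towhee.trainer.callback import ModelCheckpointCallback, TrainerControl
>>> control = TrainerControl()
>>> # Save after every epoch; leave every_n_iteration at its default (-1).
>>> checkpoint_cb = ModelCheckpointCallback(control, filepath='./checkpoints/', every_n_epoch=1)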

on_batch_end(batch: Tuple, logs: Optional[Dict] = None)[source]

Hook function invoked after every batch calculation.

Parameters
  • batch (Tuple) – The data batch to calculate.

  • logs (Dict) – Kv store to save and load info.

on_epoch_end(epochs: int, logs: Optional[Dict] = None)[source]

Hook function invoked after each epoch.

Parameters
  • epochs (int) – Epoch index.

  • logs (Dict) – Kv store to save and load info.

class trainer.callback.PrintCallBack(total_epoch_num: int, step_frequency: int = 16)[source]

Bases: trainer.callback.Callback

PrintCallBack is intended to print logs on the screen.

Parameters
  • total_epoch_num (int) – Epoch numbers expected to run.

  • step_frequency (int) – Print information every n steps.

on_eval_batch_end(batch: Tuple, logs: Dict) None[source]

Hook function invoked after every batch calculation in evaluate stage.

Parameters
  • batch (Tuple) – The data batch to calculate.

  • logs (Dict) – Kv store to save and load info.

on_train_batch_end(batch: Tuple, logs: Dict) None[source]

Hook function invoked after every batch calculation in train stage.

Parameters
  • batch (Tuple) – The data batch to calculate.

  • logs (Dict) – Kv store to save and load info.

class trainer.callback.ProgressBarCallBack(total_epoch_num: int, train_dataloader: torch.utils.data.dataloader.DataLoader)[source]

Bases: trainer.callback.Callback

ProgressBarCallBack is intended to print a progress bar to visualize the current training progress. tqdm is used as the progress bar backend.

Parameters
  • total_epoch_num (int) – Epoch numbers expected to run.

  • train_dataloader (torch.utils.data.DataLoader) – The training dataloader for tqdm to wrap.

on_epoch_begin(epochs: int, logs: Dict) None[source]

Hook function invoked before each epoch.

Parameters
  • epochs (int) – Epoch index.

  • logs (Dict) – Kv store to save and load info.

on_eval_batch_end(batch: Tuple, logs: Dict) None[source]

Hook function invoked after every batch calculation in evaluate stage.

Parameters
  • batch (Tuple) – The data batch to calculate.

  • logs (Dict) – Kv store to save and load info.

on_train_batch_end(batch: Tuple, logs: Dict) None[source]

Hook function invoked after every batch calculation in train stage.

Parameters
  • batch (Tuple) – The data batch to calculate.

  • logs (Dict) – Kv store to save and load info.

class trainer.callback.TensorBoardCallBack(summary_writer_constructor: Callable, log_dir: Optional[str] = None, comment: str = '')[source]

Bases: trainer.callback.Callback

TensorBoardCallBack is intended to record essential values (e.g. epoch_loss) to TensorBoard after each iteration. If TensorBoard is available, you can view it at localhost:6006.

Parameters
  • summary_writer_constructor (Callable) – Function which constructs the TensorBoard summary writer.

  • log_dir (str) – Save directory location.

  • comment (str) – Comment suffix appended to the default log_dir.
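
Example

A minimal construction sketch, assuming the tensorboard package is installed so that torch.utils.tensorboard.SummaryWriter is importable:

>>> from torch.utils.tensorboard import SummaryWriter
>>> from towhee.trainer.callback import TensorBoardCallBack
>>> tb_cb = TensorBoardCallBack(SummaryWriter, log_dir='./runs', comment='exp1')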

on_eval_batch_end(batch: Tuple, logs: Dict) None[source]

Hook function invoked after every batch calculation in evaluate stage.

Parameters
  • batch (Tuple) – The data batch to calculate.

  • logs (Dict) – Kv store to save and load info.

on_train_batch_end(batch: Tuple, logs: Dict) None[source]

Hook function invoked after every batch calculation in train stage.

Parameters
  • batch (Tuple) – The data batch to calculate.

  • logs (Dict) – Kv store to save and load info.

class trainer.callback.TrainerControl(should_training_stop: bool = False, should_epoch_stop: bool = False, should_save: bool = False, should_evaluate=False, should_log=False)[source]

Bases: object

TrainerControl defines a set of control flags describing the current status, which the trainer reads in order to take the corresponding action. It can be used by a customized Callback to influence the trainer.

Parameters
  • should_training_stop – (bool) whether or not training should be interrupted.

  • should_epoch_stop – (bool) whether or not current training epoch should be interrupted.

  • should_save – (bool) whether or not trainer should save current model.

  • should_evaluate – (bool) whether or not trainer should evaluate current model.

  • should_log – (bool) whether or not trainer should report the log.
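
Example

A minimal sketch of a customized callback driving the trainer through TrainerControl (illustrative only):

>>> from towhee.trainer.callback import Callback, TrainerControl
>>> class StopEarlyCallback(Callback):
...     def __init__(self, trainercontrol):
...         super().__init__()
...         self.trainercontrol = trainercontrol
...     def on_epoch_end(self, epochs, logs):
...         # Ask the trainer to stop once the second epoch has finished.
...         if epochs >= 2:
...             self.trainercontrol.should_training_stop = True
>>> control = TrainerControl()
>>> callback = StopEarlyCallback(control)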

trainer.metrics module

trainer.modelcard module

class trainer.modelcard.ModelCard(model_name: Optional[str] = None, model_architecture: Optional[str] = None, model_overview: Optional[str] = None, language: Optional[Union[str, List[str]]] = None, tags: Optional[Union[str, List[str]]] = None, tasks: Optional[Union[str, List[str]]] = None, datasets: Optional[Union[str, List[str]]] = None, datasets_tags: Optional[Union[str, List[str]]] = None, dataset_args: Optional[Union[str, List[str]]] = None, eval_results: Optional[Dict[str, float]] = None, eval_lines: Optional[List[str]] = None, training_summary: Optional[Dict[str, Any]] = None, training_config: Optional[towhee.trainer.training_config.TrainingConfig] = None, source: Optional[str] = 'trainer')[source]

Bases: object

Utilities to generate and save a model card. Recommended attributes follow https://arxiv.org/abs/1810.03993 (Model Cards for Model Reporting); see also https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/models/actionrecognitionnet

Parameters
  • model_name (Optional[str]) – model name

  • model_architecture (Optional[str]) – model structure

  • model_overview (Optional[str]) – model overview

  • language (Optional[Union[str, List[str]]]) – language

  • tags (Optional[Union[str, List[str]]]) – tags

  • tasks (Optional[Union[str, List[str]]]) – model tasks (e.g. classification, prediction, etc.)

  • datasets (Optional[Union[str, List[str]]]) – datasets used to train/test the model

  • datasets_tags (Optional[Union[str, List[str]]]) – tags of datasets

  • dataset_args (Optional[Union[str, List[str]]]) – arguments of dataset

  • eval_results (Optional[Dict[str, float]]) – evaluation results recorded

  • eval_lines (Optional[List[str]]) – evaluation baselines

  • training_summary (Optional[Dict[str, Any]]) – training summary including training information

  • training_config (Optional[TrainingConfig]) – training configurations

  • source (Optional[str]) – source of model card (default = “trainer”)

Example

>>> from towhee.trainer.modelcard import ModelCard
>>> model_card = ModelCard(model_name='test')
>>> # Print out model name stored in model card
>>> model_card.model_name
'test'
>>> # Save model card to "path/to/my_dir" as README.md
>>> model_card.save_model_card('/path/to/my_dir')
>>> # Save model card as "/path/to/my_dir/model_card.md"
>>> model_card.save_model_card('/path/to/my_dir/model_card.md')
save_model_card(save_directory_or_file)[source]

Write model card to the given filepath or directory

Parameters

save_directory_or_file (str) – file path or directory to write and save model card.

to_dict()[source]

Serializes this instance to a Python dictionary.

trainer.scheduler module

Scheduler utilities for PyTorch optimization.

trainer.scheduler.check_scheduler(scheduler_type: str) bool[source]

Check if the scheduler type is supported.

Parameters

scheduler_type (str) – the type of the scheduler.

Return (bool):

Whether the scheduler type is supported.

Example

>>> from towhee.trainer.scheduler import check_scheduler
>>> check_scheduler('constant')
True
trainer.scheduler.configure_constant_scheduler(optimizer: torch.optim.optimizer.Optimizer, last_epoch: int = - 1)[source]

Return a scheduler with a constant learning rate, using the learning rate set in optimizer.

Parameters
  • optimizer (Optimizer) – The optimizer for which to schedule the learning rate.

  • last_epoch (int) – The last epoch when resuming training.

Return (LambdaLR):

A constant scheduler

Example

>>> from towhee.trainer.scheduler import configure_constant_scheduler
>>> from towhee.trainer.optimization.adamw import AdamW
>>> from torch import nn
>>> def unwrap_scheduler(scheduler, num_steps=10):
...     lr_sch = []
...     for _ in range(num_steps):
...         lr_sch.append(scheduler.get_lr()[0])
...         scheduler.step()
...     return lr_sch
>>> mdl = nn.Linear(50, 50)
>>> optimizer = AdamW(mdl.parameters(), lr=10.0)
>>> num_steps = 2
>>> scheduler = configure_constant_scheduler(optimizer)
>>> unwrap_scheduler(scheduler, num_steps)
[10.0, 10.0]
trainer.scheduler.configure_constant_scheduler_with_warmup(optimizer: torch.optim.optimizer.Optimizer, num_warmup_steps: int, last_epoch: int = - 1)[source]

Return a schedule with a constant learning rate preceded by a warmup period during which the learning rate increases linearly between 0 and the initial lr set in the optimizer.

Parameters
  • optimizer (Optimizer) – The optimizer to be scheduled.

  • num_warmup_steps (int) – Warmup steps.

  • last_epoch (int) – The last epoch when training is resumed.

Return (LambdaLR):

A constant scheduler with warmup.

Example

>>> from towhee.trainer.scheduler import configure_constant_scheduler_with_warmup
>>> from towhee.trainer.optimization.adamw import AdamW
>>> from torch import nn
>>> def unwrap_scheduler(scheduler, num_steps=10):
...     lr_sch = []
...     for _ in range(num_steps):
...         lr_sch.append(scheduler.get_lr()[0])
...         scheduler.step()
...     return lr_sch
>>> mdl = nn.Linear(50, 50)
>>> optimizer = AdamW(mdl.parameters(), lr=10.0)
>>> num_steps = 10
>>> num_warmup_steps = 4
>>> scheduler = configure_constant_scheduler_with_warmup(optimizer, num_warmup_steps)
>>> unwrap_scheduler(scheduler, num_steps)
[0.0, 2.5, 5.0, 7.5, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0]
trainer.scheduler.configure_cosine_scheduler_with_warmup(optimizer: torch.optim.optimizer.Optimizer, num_warmup_steps: int, num_training_steps: int, num_cycles: float = 0.5, last_epoch: int = - 1)[source]

Return a scheduler with a learning rate that decreases following the values of the cosine function between the initial lr set in the optimizer to 0, after a warmup period during which it increases linearly between 0 and the initial lr set in the optimizer.

Parameters
  • optimizer (Optimizer) – The optimizer to be scheduled.

  • num_warmup_steps (int) – The steps for the warmup phase.

  • num_training_steps (int) – The number of training steps.

  • num_cycles (float) – The number of periods in the cosine scheduler.

  • last_epoch (int) – The last epoch when training is resumed.

Return (LambdaLR):

A cosine scheduler with warmup.

Example

>>> from towhee.trainer.scheduler import configure_cosine_scheduler_with_warmup
>>> from towhee.trainer.optimization.adamw import AdamW
>>> from torch import nn
>>> def unwrap_scheduler(scheduler, num_steps=10):
...     lr_sch = []
...     for _ in range(num_steps):
...         lr_sch.append(scheduler.get_lr()[0])
...         scheduler.step()
...     return lr_sch
>>> mdl = nn.Linear(50, 50)
>>> optimizer = AdamW(mdl.parameters(), lr=10.0)
>>> num_steps = 10
>>> num_warmup_steps = 2
>>> num_training_steps = 10
>>> scheduler = configure_cosine_scheduler_with_warmup(optimizer, num_warmup_steps, num_training_steps)
>>> unwrap_scheduler(scheduler, num_steps)
[0.0, 5.0, 10.0, 9.61, 8.53, 6.91, 5.0, 3.08, 1.46, 0.38]
trainer.scheduler.configure_cosine_with_hard_restarts_scheduler_with_warmup(optimizer: torch.optim.optimizer.Optimizer, num_warmup_steps: int, num_training_steps: int, num_cycles: int = 1, last_epoch: int = - 1)[source]

Return a scheduler with a learning rate that decreases following the values of the cosine function between the initial lr set in the optimizer to 0, with several hard restarts, after a warmup period during which it increases linearly between 0 and the initial lr set in the optimizer.

Parameters
  • optimizer (Optimizer) – The optimizer to be scheduled.

  • num_warmup_steps (int) – The steps for the warmup phase.

  • num_training_steps (int) – The number of training steps.

  • num_cycles (int) – The number of hard restarts to be used.

  • last_epoch (int) – The index of the last epoch when training is resumed.

Return (LambdaLR):

A cosine with hard restarts scheduler with warmup.

Example

>>> from towhee.trainer.scheduler import configure_cosine_with_hard_restarts_scheduler_with_warmup
>>> from towhee.trainer.optimization.adamw import AdamW
>>> from torch import nn
>>> def unwrap_scheduler(scheduler, num_steps=10):
...     lr_sch = []
...     for _ in range(num_steps):
...         lr_sch.append(scheduler.get_lr()[0])
...         scheduler.step()
...     return lr_sch
>>> mdl = nn.Linear(50, 50)
>>> optimizer = AdamW(mdl.parameters(), lr=10.0)
>>> num_steps = 10
>>> num_warmup_steps = 2
>>> num_training_steps = 10
>>> num_cycles = 2
>>> scheduler = configure_cosine_with_hard_restarts_scheduler_with_warmup(optimizer, num_warmup_steps, num_training_steps, num_cycles)
>>> unwrap_scheduler(scheduler, num_steps)
[0.0, 5.0, 10.0, 8.53, 5.0, 1.46, 10.0, 8.53, 5.0, 1.46]
trainer.scheduler.configure_linear_scheduler_with_warmup(optimizer, num_warmup_steps, num_training_steps, last_epoch=- 1)[source]

Return a scheduler with a learning rate that decreases linearly from the initial lr set in the optimizer to 0, after a warmup period during which it increases linearly from 0 to the initial lr set in the optimizer.

Parameters
  • optimizer (Optimizer) – The optimizer to be scheduled.

  • num_warmup_steps (int) – Warmup steps.

  • num_training_steps (int) – Training steps.

  • last_epoch (int) – The last epoch when training is resumed.

Return (LambdaLR):

A linear scheduler with warmup.

Example

>>> from towhee.trainer.scheduler import configure_linear_scheduler_with_warmup
>>> from towhee.trainer.optimization.adamw import AdamW
>>> from torch import nn
>>> def unwrap_scheduler(scheduler, num_steps=10):
...     lr_sch = []
...     for _ in range(num_steps):
...         lr_sch.append(scheduler.get_lr()[0])
...         scheduler.step()
...     return lr_sch
>>> mdl = nn.Linear(50, 50)
>>> optimizer = AdamW(mdl.parameters(), lr=10.0)
>>> num_steps = 10
>>> num_warmup_steps = 2
>>> num_training_steps = 10
>>> scheduler = configure_linear_scheduler_with_warmup(optimizer, num_warmup_steps, num_training_steps)
>>> unwrap_scheduler(scheduler, num_steps)
[0.0, 5.0, 10.0, 8.75, 7.5, 6.25, 5.0, 3.75, 2.5, 1.25]
trainer.scheduler.configure_polynomial_decay_scheduler_with_warmup(optimizer, num_warmup_steps, num_training_steps, lr_end=1e-07, power=1.0, last_epoch=- 1)[source]

Return a scheduler with a learning rate that decreases as a polynomial decay from the initial lr set in the optimizer to end lr defined by lr_end, after a warmup period during which it increases linearly from 0 to the initial lr set in the optimizer.

Parameters
  • optimizer (Optimizer) – The optimizer to be scheduled.

  • num_warmup_steps (int) – The steps for the warmup phase.

  • num_training_steps (int) – The number of training steps

  • lr_end (float) – The end LR.

  • power (float) – Power factor.

  • last_epoch (int) – The index of the last epoch when training is resumed.

Return (LambdaLR):

A polynomial decay scheduler with warmup.

Example

>>> from towhee.trainer.scheduler import configure_polynomial_decay_scheduler_with_warmup
>>> from towhee.trainer.optimization.adamw import AdamW
>>> from torch import nn
>>> def unwrap_scheduler(scheduler, num_steps=10):
...     lr_sch = []
...     for _ in range(num_steps):
...         lr_sch.append(scheduler.get_lr()[0])
...         scheduler.step()
...     return lr_sch
>>> mdl = nn.Linear(50, 50)
>>> optimizer = AdamW(mdl.parameters(), lr=10.0)
>>> num_steps = 10
>>> num_warmup_steps = 2
>>> num_training_steps = 10
>>> power = 2.0
>>> lr_end = 1e-7
>>> scheduler = configure_polynomial_decay_scheduler_with_warmup(optimizer, num_warmup_steps, num_training_steps, lr_end, power)
>>> unwrap_scheduler(scheduler, num_steps)
[0.0, 5.0, 10.0, 7.656, 5.625, 3.906, 2.5, 1.406, 0.625, 0.156]

trainer.trainer module

trainer.training_config module

class trainer.training_config.TrainingConfig(output_dir: str = './output_dir', overwrite_output_dir: bool = True, eval_strategy: str = 'epoch', eval_steps: typing.Optional[int] = None, batch_size: typing.Optional[int] = 8, val_batch_size: typing.Optional[int] = -1, seed: int = 42, epoch_num: int = 2, dataloader_pin_memory: bool = True, dataloader_drop_last: bool = True, dataloader_num_workers: int = 0, lr: float = 5e-05, metric: typing.Optional[str] = 'Accuracy', print_steps: typing.Optional[int] = None, load_best_model_at_end: typing.Optional[bool] = False, early_stopping: typing.Union[dict, str] = <factory>, model_checkpoint: typing.Union[dict, str] = <factory>, tensorboard: typing.Optional[typing.Union[dict, str]] = <factory>, loss: typing.Union[str, typing.Dict[str, typing.Any]] = 'CrossEntropyLoss', optimizer: typing.Union[str, typing.Dict[str, typing.Any]] = 'Adam', lr_scheduler_type: str = 'linear', warmup_ratio: float = 0.0, warmup_steps: int = 0, device_str: typing.Optional[str] = None, sync_bn: bool = False, freeze_bn: bool = False)[source]

Bases: object

The training config. It can be defined in a yaml file.

Parameters
  • output_dir (str) – The output directory where the model predictions and checkpoints will be written.

  • overwrite_output_dir (bool) – Overwrite the content of the output directory.

  • eval_strategy (str) – The evaluation strategy.

  • eval_steps (int) – Run an evaluation every X steps.

  • batch_size (int) – Batch size for training.

  • val_batch_size (int) – Batch size for evaluation.

  • seed (int) – Random seed that will be set at the beginning of training.

  • epoch_num (int) – Total number of training epochs to perform.

  • dataloader_pin_memory (bool) – Whether to pin memory in the dataloader.

  • dataloader_drop_last (bool) – Drop the last incomplete batch if it is not divisible by the batch size.

  • dataloader_num_workers (int) – Number of subprocesses to use for data loading.

  • lr (float) – The initial learning rate for AdamW.

  • metric (str) – The metric to use to compare two different models.

  • print_steps (int) – If None, use the tqdm progress bar, otherwise it will print the logs on the screen every print_steps.

  • load_best_model_at_end (bool) – Whether or not to load the best model found during training at the end of training.

  • early_stopping (Union[dict, str]) – Early stopping.

  • model_checkpoint (Union[dict, str]) – Model checkpoint.

  • tensorboard (Union[dict, str]) – Tensorboard.

  • loss (Union[str, Dict[str, Any]]) – Pytorch loss in torch.nn package.

  • optimizer (Union[str, Dict[str, Any]]) – Pytorch optimizer Class name in torch.optim package.

  • lr_scheduler_type (str) – The scheduler type to use.

  • warmup_ratio (float) – Linear warmup over warmup_ratio fraction of total steps.

  • warmup_steps (int) – Linear warmup steps.

  • device_str (str) – Device string.

  • sync_bn (bool) – Only takes effect when device_str is cuda. Setting sync_bn to True makes training slower but can improve accuracy.

  • freeze_bn (bool) – It will completely freeze all BatchNorm layers during training.
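
Example

A minimal construction sketch (field names come from the signature above; the values are illustrative):

>>> from towhee.trainer.training_config import TrainingConfig
>>> conf = TrainingConfig(
...     output_dir='./my_output',
...     batch_size=32,
...     epoch_num=5,
...     lr=1e-4,
...     optimizer='AdamW',
... )
>>> conf.batch_size
32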

load_from_yaml(path2yaml: Optional[str] = None) trainer.training_config.TrainingConfig[source]

Load training configuration from yaml.

Parameters

path2yaml (str) – The path to yaml.

Return (TrainingConfig):

The TrainingConfig instance itself (self).

Example

>>> from towhee.trainer.training_config import TrainingConfig
>>> from pathlib import Path
>>> conf = Path(__file__).parent / "config.yaml"
>>> ta = TrainingConfig()
>>> ta.save_to_yaml(conf)
>>> ta.load_from_yaml(conf)
>>> ta.epoch_num
2
save_to_yaml(path2yaml: Optional[str] = None)[source]

Save training configuration to yaml.

Parameters

path2yaml (str) – The path to yaml.

Example

>>> from towhee.trainer.training_config import TrainingConfig
>>> from pathlib import Path
>>> conf = Path(__file__).parent / 'config.yaml'
>>> ta = TrainingConfig()
>>> ta.save_to_yaml(conf)
>>> ta.load_from_yaml(conf)
>>> ta.epoch_num
2
trainer.training_config.dump_default_yaml(yaml_path)[source]

Dump a default yaml, which can be overridden by the custom operator.
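
Example

A minimal usage sketch (the file name is illustrative):

>>> from towhee.trainer.training_config import dump_default_yaml, TrainingConfig
>>> dump_default_yaml('default_config.yaml')
>>> conf = TrainingConfig().load_from_yaml('default_config.yaml')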

trainer.training_config.get_config_help()[source]

Get config setting info.

Return (dict):

The help dict.

Module contents