PyTorch Lightning Documentation
Release 1.1.7
William Falcon et al.
Feb 03, 2021

GETTING STARTED

1 Lightning in 2 steps
2 How to organize PyTorch into Lightning
3 Rapid prototyping templates
4 Style guide
5 Fast performance tips
6 Benchmark with vanilla PyTorch
7 LightningModule
8 Trainer
9 Accelerators
10 Callback
11 LightningDataModule
12 Logging
13 Metrics
14 Plugins
15 Step-by-step walk-through
16 API References
17 Bolts
18 Pytorch Ecosystem Examples
19 Community Examples
20 AWS/GCP training
21 16-bit training
22 Computing cluster (SLURM)
23 Child Modules
24 Debugging
25 Loggers
26 Early stopping
27 Fast Training
28 Hyperparameters
29 Learning Rate Finder
30 Multi-GPU training
31 Multiple Datasets
32 Saving and loading weights
33 Optimization
34 Performance and Bottleneck Profiler
35 Single GPU Training
36 Sequential Data
37 Training Tricks
38 Transfer Learning
39 TPU support
40 Test set
41 Inference in Production
42 Conversational AI
43 Contributor Covenant Code of Conduct
44 Contributing
45 How to become a core contributor
46 PyTorch Lightning Governance | Persons of interest
47 Changelog
48 Indices and tables
Python Module Index
Index

CHAPTER ONE
LIGHTNING IN 2 STEPS

In this guide we'll show you how to organize your PyTorch code into Lightning in 2 steps.
Organizing your code with PyTorch Lightning:
• Keeps all the flexibility (this is all pure PyTorch), but removes a ton of boilerplate
• Makes it more readable by decoupling the research code from the engineering
• Makes it easier to reproduce
• Makes it less error-prone by automating most of the training loop and tricky engineering
• Makes it scalable to any hardware without changing your model

Here's a 3 minute conversion guide for PyTorch projects:

1.1 Step 0: Install PyTorch Lightning

You can install using pip:

    pip install pytorch-lightning

Or with conda (see the conda documentation for how to install conda):

    conda install pytorch-lightning -c conda-forge

You could also use conda environments:

    conda activate my_env
    pip install pytorch-lightning

Import the following:

    import os
    import torch
    from torch import nn
    import torch.nn.functional as F
    from torchvision import transforms
    from torchvision.datasets import MNIST
    from torch.utils.data import DataLoader, random_split
    import pytorch_lightning as pl

1.2 Step 1: Define LightningModule

    class LitAutoEncoder(pl.LightningModule):

        def __init__(self):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Linear(28 * 28, 64),
                nn.ReLU(),
                nn.Linear(64, 3)
            )
            self.decoder = nn.Sequential(
                nn.Linear(3, 64),
                nn.ReLU(),
                nn.Linear(64, 28 * 28)
            )

        def forward(self, x):
            # in lightning, forward defines the prediction/inference actions
            embedding = self.encoder(x)
            return embedding

        def training_step(self, batch, batch_idx):
            # training_step defines the train loop.
            # It is independent of forward
            x, y = batch
            x = x.view(x.size(0), -1)
            z = self.encoder(x)
            x_hat = self.decoder(z)
            loss = F.mse_loss(x_hat, x)
            # Logging to TensorBoard by default
            self.log('train_loss', loss)
            return loss

        def configure_optimizers(self):
            optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
            return optimizer

SYSTEM VS MODEL

A LightningModule defines a system, not a model.
Examples of systems are:
• Autoencoder
• BERT
• DQN
• GAN
• Image classifier
• Seq2seq
• SimCLR
• VAE

Under the hood a LightningModule is still just a torch.nn.Module that groups all research code into a single file to make it self-contained:
• The Train loop
• The Validation loop
• The Test loop
• The Model or system of Models
• The Optimizer

You can customize any part of training (such as the backward pass) by overriding any of the 20+ hooks found in Available Callback hooks:

    class LitAutoEncoder(pl.LightningModule):

        def backward(self, loss, optimizer, optimizer_idx):
            loss.backward()

FORWARD vs TRAINING_STEP

In Lightning we separate training from inference. The training_step defines the full training loop. We encourage users to use forward to define inference actions.

For example, in this case we could define the autoencoder to act as an embedding extractor:

    def forward(self, x):
        embeddings = self.encoder(x)
        return embeddings

Of course, nothing is stopping you from using forward from within the training_step:

    def training_step(self, batch, batch_idx):
        ...
        z = self(x)

It really comes down to your application. We do, however, recommend that you keep both intents separate.
• Use forward for inference (predicting).
• Use training_step for training.

More details in LightningModule docs.

1.3 Step 2: Fit with Lightning Trainer

First, define the data however you want. Lightning just needs a DataLoader for the train/val/test splits.

    dataset = MNIST(os.getcwd(), download=True, transform=transforms.ToTensor())
    train_loader = DataLoader(dataset)

Next, init the LightningModule and the PyTorch Lightning Trainer, then call fit with both the data and model.
    # init model
    autoencoder = LitAutoEncoder()

    # most basic trainer, uses good defaults (auto-tensorboard, checkpoints, logs, and more)
    # trainer = pl.Trainer(gpus=8) (if you have GPUs)
    trainer = pl.Trainer()
    trainer.fit(autoencoder, train_loader)

The Trainer automates:
• Epoch and batch iteration
• Calling of optimizer.step(), backward, zero_grad()
• Calling of .eval(), enabling/disabling grads
• Saving and loading weights
• Tensorboard (see Loggers options)
• Multi-GPU training support
• TPU support
• 16-bit training support

Tip: If you prefer to manually manage optimizers you can use the Manual optimization mode (e.g. RL, GANs, etc.).

That's it! These are the main 2 concepts you need to know in Lightning. All the other features of Lightning are either features of the Trainer or LightningModule.

1.4 Basic features

1.4.1 Manual vs automatic optimization

Automatic optimization

With Lightning, you don't need to worry about when to enable/disable grads, do a backward pass, or update optimizers. As long as you return a loss with an attached graph from the training_step, Lightning will automate the optimization.

    def training_step(self, batch, batch_idx):
        loss = self.encoder(batch[0])
        return loss

Manual optimization

However, for certain research like GANs, reinforcement learning, or something with multiple optimizers or an inner loop, you can turn off automatic optimization and fully control the training loop yourself.

First, turn off automatic optimization:

    trainer = Trainer(automatic_optimization=False)

Now you own the train loop!

    def training_step(self, batch, batch_idx, optimizer_idx):
        # access your optimizers with use_pl_optimizer=False. Default is True
        (opt_a, opt_b, opt_c) = self.optimizers(use_pl_optimizer=True)

        loss_a = self.generator(batch[0])

        # use this instead of loss.backward so we can automate half precision, etc...
        self.manual_backward(loss_a, opt_a, retain_graph=True)
        self.manual_backward(loss_a, opt_a)
        opt_a.step()
        opt_a.zero_grad()

        loss_b = self.discriminator(batch[0])
        self.manual_backward(loss_b, opt_b)
        ...

1.4.2 Predict or Deploy

When you're done training, you have 3 options to use your LightningModule for predictions.

Option 1: Sub-models

Pull out any model inside your system for predictions.

    # ----------------------------------
    # to use as embedding extractor
    # ----------------------------------
    autoencoder = LitAutoEncoder.load_from_checkpoint('path/to/checkpoint_file.ckpt')
    encoder_model = autoencoder.encoder
    encoder_model.eval()

    # ----------------------------------
    # to use as image generator
    # ----------------------------------
    decoder_model = autoencoder.decoder
    decoder_model.eval()

Option 2: Forward

You can also add a forward method to do predictions however you want.

    # ----------------------------------
    # using the AE to extract embeddings
    # ----------------------------------
    class LitAutoEncoder(pl.LightningModule):
        def forward(self, x):
            embedding = self.encoder(x)
            return embedding

    autoencoder = LitAutoEncoder()
    embedding = autoencoder(torch.rand(1, 28 * 28))

    # ----------------------------------
    # or using the AE to generate images
    # ----------------------------------
    class LitAutoEncoder(pl.LightningModule):
        def forward(self):
            z = torch.rand(1, 3)
            image = self.decoder(z)
            image = image.view(1, 1, 28, 28)
            return image

    autoencoder = LitAutoEncoder()
    image_sample = autoencoder()

Option 3: Production

For production systems, ONNX or TorchScript are much faster. Make sure you have added a forward method or trace only the sub-models you need.
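As one way to trace only a sub-model, here is a minimal sketch that uses plain PyTorch rather than any Lightning helper. It assumes an encoder with the same layer sizes as LitAutoEncoder.encoder; the standalone `encoder` below is a randomly initialized stand-in rather than a trained checkpoint, and the file name `encoder.pt` is hypothetical.

```python
import os
import tempfile

import torch
from torch import nn

# Stand-in for `autoencoder.encoder` (assumption: same layer sizes as
# LitAutoEncoder; the weights here are random, not trained).
encoder = nn.Sequential(
    nn.Linear(28 * 28, 64),
    nn.ReLU(),
    nn.Linear(64, 3),
)
encoder.eval()

# Trace only the sub-model, using a representative input.
example_input = torch.rand(1, 28 * 28)
traced_encoder = torch.jit.trace(encoder, example_input)

# Save the traced module, reload it, and compare it with the eager module.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "encoder.pt")  # hypothetical file name
    torch.jit.save(traced_encoder, path)
    loaded_encoder = torch.jit.load(path)

with torch.no_grad():
    expected = encoder(example_input)
    actual = loaded_encoder(example_input)

outputs_match = torch.allclose(expected, actual, atol=1e-6)
embedding_shape = tuple(actual.shape)
```

With a trained model, the same steps would presumably apply to `autoencoder.encoder` after loading a checkpoint. The examples that follow export the whole LightningModule instead.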
    import tempfile

    # ----------------------------------
    # torchscript
    # ----------------------------------
    autoencoder = LitAutoEncoder()
    torch.jit.save(autoencoder.to_torchscript(), "model.pt")
    os.path.isfile("model.pt")

    # ----------------------------------
    # onnx
    # ----------------------------------
    with tempfile.NamedTemporaryFile(suffix='.onnx', delete=False) as tmpfile:
        autoencoder = LitAutoEncoder()
        input_sample = torch.randn((1, 28 * 28))
        autoencoder.to_onnx(tmpfile.name, input_sample, export_params=True)
        os.path.isfile(tmpfile.name)

1.4.3 Using CPUs/GPUs/TPUs