
Loss functions

How good is your neural network?
The network maps Inputs to Outputs; the Outputs are compared against the Targets to compute the Loss.

Inputs, e.g.
● Image (pixel values)
● Sentence (encoded)
● State (for RL)

Targets, e.g.
● 'Cat'/'Dog' (encoded)
● Next word (encoded)
● TD(0) target (for RL)

Some pedantry
"Loss function" and "cost function" are often used interchangeably, but they are different:
● The loss function measures the network's performance on a single datapoint.
● The cost function is the average of the losses over the entire training dataset.
Our goal is to minimize the cost function. In reality, we actually use batch losses as a proxy for the cost function.

Loss/Cost functions: some things to remember
● The cost function distils the performance of the network down to a single scalar number.
● Generally (although not necessarily) loss ≥ 0, so cost ≥ 0 as well.
● cost = 0 ⇒ perfect performance (on the given dataset).
● The lower the value of the cost function, the better the performance of the network; hence gradient descent.
● A cost function must faithfully represent the purpose of the network.

[Example image: a photo of a cat classified as cat 0.85 / dog 0.15; accuracy: 1, loss (CCE): 0.07]

Loss functions overview
● Classification
  ● Maximum likelihood
  ● Binary cross-entropy (aka log loss)
  ● Categorical cross-entropy
● Regression (i.e. function approximation)
  ● Mean Squared Error
  ● Mean Absolute Error
  ● Huber Loss

Classification loss functions

Maximum likelihood
Classes: Cat, Dog, Bear, Hitler, Putin
Target: (1, 1, 0, 0, 0)
Network output: (.5, .9, .2, .6, .1)
Loss = −1/5 [ (1·0.5 + 0·0.5) + (1·0.9 + 0·0.1) + (0·0.2 + 1·0.8) + (0·0.6 + 1·0.4) + (0·0.1 + 1·0.9) ] = −0.7
i.e. the negative of the average probability the network assigns to the correct answer for each class.

Binary cross-entropy (aka log loss): torch.nn.BCELoss
Classes: Cat, Dog, Bear, Hitler, Putin
Target: (0, 0, 1, 0, 1)
Network output: (.2, .4, .8, .3, .9)
Loss = −1/5 Σ [ t·log(p) + (1 − t)·log(1 − p) ]
     = −1/5 [ 1·log(0.8) + 1·log(0.6) + 1·log(0.8) + 1·log(0.7) + 1·log(0.9) ] = 0.28
Each term is the log of the probability assigned to the correct answer for that class: log(1 − p) when the target is 0, and log(p) when the target is 1.

[Figure: y = log(p) plotted alongside y = p.]
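As a quick sanity check (not part of the original slides), the worked binary cross-entropy number above can be reproduced with the torch.nn.BCELoss module the slide names. This is a minimal sketch assuming PyTorch is available; the target and output vectors are the ones from the example.

import torch
import torch.nn as nn

# Values from the worked example above (Cat, Dog, Bear, Hitler, Putin).
target = torch.tensor([0., 0., 1., 0., 1.])
output = torch.tensor([0.2, 0.4, 0.8, 0.3, 0.9])   # network probabilities

bce = nn.BCELoss()                 # averages over the 5 classes by default
print(bce(output, target).item())  # ~0.2838, the 0.28 on the slide

# The same value by hand: -(1/5) * sum of t*log(p) + (1-t)*log(1-p)
manual = -(target * output.log() + (1 - target) * (1 - output).log()).mean()
print(manual.item())               # matches BCELoss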
Categorical cross-entropy: torch.nn.CrossEntropyLoss (or torch.nn.NLLLoss)
Classes: Cat, Dog, Bear, Hitler, Putin
Target: (1, 0, 0, 0, 0)
Network output: (.8, 0, 0, .2, 0)
Loss = −Σ t·log(p) = −[ 1·log(0.8) ] = 0.22
● torch.nn.NLLLoss: the network output must already be a probability distribution (in PyTorch, NLLLoss in fact expects log-probabilities, e.g. the output of LogSoftmax).
● torch.nn.CrossEntropyLoss: this one does the softmax for you.
● This loss function can also be used when you have a single binary (e.g. yes/no) output.

Regression loss functions
● Mean Squared Error
● Mean Absolute Error
● Huber Loss

Mean squared error (aka L2 loss): torch.nn.MSELoss
The loss compares the network's output for the Input against the Target.
RL example:
● Input: Q(St, At), the value of state St & action At.
● Target: the reward Rt+1 plus the value Q(St+1, At+1) of state St+1 & the [best] action At+1.

(Mean) absolute error (aka L1 loss): torch.nn.L1Loss

Huber loss: torch.nn.SmoothL1Loss
Huber loss combines the best features of MSE and absolute error (AE):
● like MSE for small errors,
● like AE otherwise.
This avoids over-shooting:
● MSE has a very steep gradient when the error is large (i.e. for outliers) because of its quadratic form.
● AE has a steep gradient even when the error is small because it doesn't have a 'flat' bottom.

[Figure: loss plotted against error for MSE, absolute error and Huber loss.]
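To make the MSE / absolute error / Huber comparison concrete, here is a small illustrative sketch (not from the slides) using the PyTorch modules named above. The prediction and target values are made up, and SmoothL1Loss is used with its default threshold beta = 1.0.

import torch
import torch.nn as nn

mse, mae, huber = nn.MSELoss(), nn.L1Loss(), nn.SmoothL1Loss()
target = torch.tensor([1.0])

# One small error and one large, outlier-like error.
for pred in (torch.tensor([1.5]), torch.tensor([11.0])):
    err = (pred - target).item()
    print(f"error={err:5.1f}  "
          f"MSE={mse(pred, target).item():7.3f}  "
          f"MAE={mae(pred, target).item():6.3f}  "
          f"Huber={huber(pred, target).item():6.3f}")

# error=  0.5 -> MSE=0.250, MAE=0.500, Huber=0.125  (quadratic, like MSE)
# error= 10.0 -> MSE=100.0, MAE=10.00, Huber=9.500  (linear, like AE)

The point of the comparison: for the outlier-sized error the Huber loss grows linearly, so its gradient stays bounded, whereas the MSE gradient keeps growing with the error.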