On Github Since Jan 2016 Under MIT License • Python Support Since Oct 2016 (Beta), Rebranded As “Cognitive Toolkit” • External Contributions E.G

Deep learning at Microsoft • Microsoft Cognitive Services • Skype Translator • Cortana • Bing • HoloLens • Microsoft Research Microsoft Cognitive Toolkit Microsoft Cognitive Services Microsoft Cognitive Toolkit Microsoft Cognitive Toolkit ImageNet: Microsoft 2015 ResNet 28.2 ImageNet Classification top-5 error (%) 25.8 16.4 11.7 7.3 6.7 3.5 ILSVRC ILSVRC ILSVRC ILSVRC ILSVRC ILSVRC ILSVRC 2010 NEC 2011 Xerox 2012 2013 Clarifi 2014 VGG 2014 2015 ResNet America AlexNet GoogleNet Microsoft had all 5 entries being the 1-st places this year: ImageNet classification, ImageNet localization, ImageNet detection, COCO detection, and COCO segmentation Microsoft Cognitive Toolkit Microsoft Cognitive Toolkit Image Similarity Goal: given query image, find similar images. • Customer: Anonymous ISV (Azure Partner) • Task: given a retail image, find same product on competitor websites (to compare price). • Existing solution: solely based on mining text information from the websites of Target, Macy, etc. • Customer asked for individual similarity measure (e.g. texture, neck style, etc). Microsoft Cognitive Toolkit Bing / Bing Ads Microsoft Cognitive Toolkit Microsoft Translator http://translate.it Power point-plug in for translating speech to subtitles Microsoft Cognitive Toolkit Microsoft’s historic speech breakthrough • Microsoft 2016 research system for conversational speech recognition • 5.9% word-error rate • enabled by CNTK’s multi-server scalability [W. Xiong, J. Droppo, X. Huang, F. Seide, M. Seltzer, A. Stolcke, D. Yu, G. Zweig: “Achieving Human Parity in Conversational Speech Recognition,” https://arxiv.org/abs/1610.05256] Microsoft Cognitive Toolkit Microsoft Customer Support Agent Microsoft Cognitive Toolkit (CNTK) • Microsoft’s open-source deep-learning toolkit • https://github.com/Microsoft/CNTK • Created by Microsoft Speech researchers (Dong Yu et al.) in 2012, “Computational Network Toolkit” • On GitHub since Jan 2016 under MIT license • Python support since Oct 2016 (beta), rebranded as “Cognitive Toolkit” • External contributions e.g. from MIT, Stanford and NVidia Microsoft Cognitive Toolkit Microsoft Cognitive Toolkit (CNTK) • Over 80% Microsoft internal DL workload runs CNTK • 1st-class on Linux and Windows, docker support • Python, C++, C#, Java • Internal == External Microsoft Cognitive Toolkit CTNK – The Fastest Toolkit Caffe: 1.0rc5(39f28e4) http://dlbench.comp.hkbu.edu.hk/ CNTK: 2.0 Beta10(1ae666d) Benchmarking by HKBU, Version 8 MXNet: 0.93(32dc3a2) Single Tesla K80 GPU, CUDA: 8.0 CUDNN: v5.1 TensorFlow: 1.0(4ac9c09) Torch: 7(748f5e3) Caffe CNTK MxNet TensorFlow Torch FCN5 (1024) 55.329ms 51.038ms 60.448ms 62.044ms 52.154ms AlexNet (256) 36.815ms 27.215ms 28.994ms 103.960ms 37.462ms ResNet (32) 143.987ms 81.470ms 84.545ms 181.404ms 90.935ms LSTM (256) - 43.581ms 288.142ms - 1130.606ms (v7 benchmark) (44.917ms) (284.898ms) (223.547ms) (906.958ms) Microsoft Cognitive Toolkit “CNTK is production-ready: State-of-the-art accuracy, efficient, and scales to multi-GPU/multi-server.” speed comparison (samples/second), higher = better [note: December 2015] 80000 70000 60000 Achieved with 1-bit gradient quantization algorithm 50000 40000 30000 20000 Theano only supports 1 GPU 10000 0 CNTK Theano TensorFlow Torch 7 Caffe 1 GPU 1 x 4 GPUs 2 x 4 GPUs (8 GPUs) Microsoft Cognitive Toolkit Superior performance Scalability What is new in CNTK 2.0? Microsoft has now released a major upgrade of the software and rebranded it as part of the Microsoft Cognitive Toolkit. This release is a major improvement over the initial release. There are two major changes from the first release that you will see when you begin to look at the new release. First is that CNTK now has a very nice Python API and, second, the documentation and examples are excellent. Installing the software from the binary builds is very easy on both Ubuntu Linux and Windows. https://esciencegroup.com/2016/11/10/cntk-revisited-a-new-deep-learning-toolkit-release-from-microsoft/ CNTK Other Advantages • Python and C++ API • Mostly implemented in C++ • Low level + high level Python API • Extensibility • User functions and learners in pure Python • Readers • Distributed, highly efficient built-in data readers • Internal == External Microsoft Cognitive Toolkit The Microsoft Cognitive Toolkit (CNTK) • CNTK expresses (nearly) arbitrary neural networks by composing simple building blocks into complex computational networks, supporting relevant network types and applications. • CNTK is production-ready: State-of-the-art accuracy, efficient, and scales to multi-GPU/multi-server. Microsoft Cognitive Toolkit MNIST Handwritten Digits (OCR) 1 5 4 3 Handwritten Corresponding Digits 5 3 5 3 Labels 5 9 0 6 • Data set of hand written digits with 60,000 training images 10,000 test images • Each image is: 28 x 28 pixels Multi-layer perceptron https://github.com/Microsoft/CNTK/tree/master/Tutorials Deep 784 pixels (x) Model . Weights i = 784 O= 400 400 + 400 bias 28 pix 28 D a = relu 784 i = 400 D O= 200 200 + 200 bias a = relu 400 28 pix 10 nodes i = 200 D O= 10 10 + 10 bias a = None 200 z0 z1 z2 z3 z4 z5 z6 z7 z8 z9 푒푧i softmax 0.08 0.08 0.10 0.17 0.11 0.09 0.08 0.08 0.13 0.01 9 푧j σ푗=0 푒 Loss function Label One-hot encoded (Y) 1 5 4 3 0 0 0 0 5 3 5 3 0 0 0 1 0 0 5 9 0 6 28 x 28 pix (p) . Loss = − σ9 푦 푙표푔 푝 Cross entropy 28 pix 28 ce function 푗=0 푗 푗 error 28 pix Predicted Probabilities (p) Model 0.08 0.08 0.10 0.17 0.11 0.09 0.08 0.08 0.13 0.01 (w, b) CNTK Model Example: 2-hidden layer feed-forward NN h1 = s(W1 x + b1) h1 = sigmoid (x @ W1 + b1) h2 = s(W2 h1 + b2) h2 = sigmoid (h1 @ W2 + b2) P = softmax(Wout h2 + bout) P = softmax (h2 @ Wout + bout) with input x RM and one-hot label y RJ and cross-entropy training criterion ce = yT log P ce = cross_entropy (L, P) Microsoft Cognitive Toolkit CNTK Model example: 2-hidden layer feed-forward NN h1 = s(W1 x + b1) h1 = sigmoid (x @ W1 + b1) h2 = s(W2 h1 + b2) h2 = sigmoid (h1 @ W2 + b2) P = softmax(Wout h2 + bout) P = softmax (h2 @ Wout + bout) with input x RM and one-hot label y RJ and cross-entropy training criterion ce = yT log P ce = cross_entropy (P, y) Microsoft Cognitive Toolkit CNTK Model ce cross_entropy P softmax bout + h1 = sigmoid (x @ W1 + b1) h2 = sigmoid (h1 @ W2 + b2) Wout • h 2 P = softmax (h2 @ Wout + bout) s ce = cross_entropy (P, y) b2 + W2 • h1 s b1 + W1 • x y Microsoft Cognitive Toolkit CNTK Model ce cross_entropy • Nodes: functions (primitives) P • Can be composed into reusable composites softmax • Edges: values bout + • Incl. tensors, sparse Wout • h 2 • Automatic differentiation s • ∂F / ∂in = ∂F / ∂out ∙ ∂out / ∂in b2 + • Deferred computation execution engine W2 • h 1 • Editable, clonable s b1 + W 1 • LEGO-like composability allows CNTK to support x y wide range of networks & applications Microsoft Cognitive Toolkit Authoring networks as functions ce • “model function” cross_entropy • features predictions P softmax • defines the model structure & parameter initialization bout + • holds parameters that will be learned by training Wout • h2 s • “criterion function” b2 + • (features, labels) (training loss, additional metrics) W2 • h1 • defines training and evaluation criteria on top of the model function s • provides gradients w.r.t. training criteria b1 + W1 • x y Microsoft Cognitive Toolkit Authoring networks as functions • CNTK model: neural networks are functions • pure functions • with “special powers”: • can compute a gradient w.r.t. any of its nodes • external deity can update model parameters • user specifies network as function objects: • formula as a Python function (low level, e.g. LSTM) • function composition of smaller sub-networks (layering) • higher-order functions (equiv. of scan, fold, unfold) • model parameters held by function objects • “compiled” into the static execution graph under the hood Microsoft Cognitive Toolkit Layers lib: full list of layers/blocks • layers/blocks.py: • LSTM(), GRU(), RNNUnit() • Stabilizer(), identity • ForwardDeclaration(), Tensor[], SparseTensor[], Sequence[], SequenceOver[] • layers/layers.py: • Dense(), Embedding() • Convolution(), Convolution1D(), Convolution2D(), Convolution3D(), Deconvolution() • MaxPooling(), AveragePooling(), GlobalMaxPooling(), GlobalAveragePooling(), MaxUnpooling() • BatchNormalization(), LayerNormalization() • Dropout(), Activation() • Label() • layers/higher_order_layers.py: • Sequential(), For(), operator >>, (function tuples) • ResNetBlock(), SequentialClique() • layers/sequence.py: • Delay(), PastValueWindow() • Recurrence(), RecurrenceFrom(), Fold(), UnfoldFrom() • models/models.py: • AttentionModel() Microsoft Cognitive Toolkit CNTK workflow Script configure and executes through CNTK Python APIs… reader network trainer • minibatch source • model function • SGD • task-specific • criterion function (momentum, deserializer • CPU/GPU Adam, …) corpus • automatic execution engine • minibatching model randomization • packing, padding • distributed reading Microsoft Cognitive Toolkit As easy as 1-2-3 from cntk import * # reader def create_reader(path, is_training): ... # network def create_model_function(): ... def create_criterion_function(model): ... # trainer (and evaluator) def train(reader, model): ... def evaluate(reader, model): ... # main function model = create_model_function() reader = create_reader(..., is_training=True) train(reader, model) reader = create_reader(..., is_training=False) evaluate(reader, model) Microsoft Cognitive Toolkit Workflow • prepare data • configure reader, network, learner (Python) • train: mpiexec --np

Load more