PAWEL CISLO
MSc DATA SCIENTIST

Data Science Resources

PAWELCISLO.COM

FOREWORD

First of all, I would like to thank you very much for deciding to become one of my newsletter subscribers.

I will keep this short, as this e-book consists of many resources sorted alphabetically (tutorials, links, tools and datasets) that I definitely recommend checking out in your free time.

All of the links you are about to see were collected during my long internet journey (since 2016). Without a doubt, there are tons of other sites to visit; however, these are the ones that stood out to me (I wish I had known them when I was starting).

As I browse the web, I might add more links to this e-book. You can find the most up-to-date version at the address given in your e-mail.

Enjoy!

TUTORIALS

Side Note: If you are looking for other tutorials, I recommend using courseroot.com, which lets you browse all the online courses; however, you will usually be redirected to Udemy, Coursera or edX. Moreover, on my website, I will try to post my own notes from the courses that I have completed.

• 39 Machine Learning Resources that will help you in every essential step
• A Data Science Framework: To Achieve 99% Accuracy <--- very good tutorial for beginners in Python
• Another Book on Data Science <--- learn R & Python in parallel
• A short & practical HOW-TO guide to scrape data from a website using Python
• Awesome Learn Datascience <--- list of tutorials & resources for beginners
• AWS Machine Learning Course <--- free (30 lessons, 45 hours)
• D3 Graph Theory <--- learn graph theory visually
• Data Science Essentials (edX) <--- one of the courses I have finished
• Deep Learning for Self-Driving Cars (MIT 6.S094)
• Deep Learning Ocean <--- kick-starter into deep learning
• Deep Learning World (repo) <--- resources for Deep Learning Researchers and Developers
• Easy-TensorFlow <--- comprehensive tutorials
• Guide to deep learning
• Immersive Linear Algebra <--- world's first linear algebra book with fully interactive figures
• Learn <--- set of courses to go through
• Lecture Collection | Convolutional Neural Networks for Visual Recognition (Spring 2017) <--- very good - YouTube based, recommended by "About Data - Krzysztof Sopyła"
• Learn Data Science
• Learn Machine Learning in 3 Months <--- list of resources by Siraj Raval
• Machine Learning & Deep Learning Tutorials
• Machine Learning Crash Course <--- free, by Google; afterwards, continue with the paid TensorFlow course
• Machine Learning Course by Andrew Ng <--- the most recommended course, by Stanford University, which I have finished as well
• Machine Learning Course with Python (repo)
• Machine Learning Crash Course (HN) <--- with TensorFlow APIs (by Google)
• Machine Learning for Everyone <--- explained in simple words
• Machine Learning Guides <--- by Google
• Machine Learning with Python <--- small scale machine learning projects
• March 2019 Machine Learning Study Path <--- complete ML study path
• Microsoft Professional Program for Big Data
• MIT Deep Learning <--- MIT Deep Learning related courses
• mlcourse.ai <--- open Machine Learning course, both in English and Russian
• MVA - Introduction to Data Science
• Notes from Coursera Deep Learning courses by Andrew Ng
• Pandas Cookbook <--- online e-book (534 pages)
• PracticalAI <--- practical approach to learning machine learning
• Principles and Techniques of Data Science <--- online book
• Production Data Science (Reddit) <--- bridge the gap between exploration in data science and productionisation in software development
• Project Based Learning <--- curated list of project-based tutorials
• Python Data Science Handbook
• PyTorch Tutorial
• Quantee Tutorial <--- Data Science tutorial by Dawid Kopczyk
• R and Python <--- how to integrate both into your workflow
• Seeing Theory <--- visualization of data
• Spinning Up <--- learn deep reinforcement learning
• Stawiamy własny serwer (Setting Up Our Own Server) <--- for programming in R and Python
• TensorFlow Course <--- simple and ready-to-use tutorials for TensorFlow
• The most comprehensive Data Science learning plan for 2017
• The neural network zoo <--- explanation of all the neural architectures
• The Open Source Data Science Masters

LINKS

Side Note: Here you can find links to interesting articles and things that did not fit into any other category.

• 5 Career Paths in Big Data and Data Science, Explained
• 7 myths in machine learning research
• 8 ways to perform simple linear regression and measure their speed using Python
• 10+2 Data Science Methods that Every Data Scientist Should Know in 2016
• 10 must-know algorithms and data structures for a software engineer
• 11 most read Deep Learning Articles from Analytics Vidhya in 2017
• 12 essential command line tools for data scientists
• 16 Useful Advices for Aspiring Data Scientists
• 20 Big Data Repositories You Should Check Out
• 21 Must-Know Data Science Interview Questions and Answers
• 100 Days of ML Coding <--- great repository with lots of ML terms
• A Concise Handbook of TensorFlow <--- well-explained PDF
• AI & Architecture <--- use of AI to generate floor designs and their styles
• Algorithm_Interview_Notes <--- in Chinese
• All the best big data tools and how to use them
• Amazon Mechanical Turk <--- access a global, on-demand, 24x7 workforce
• Artificial Neural Networks Explained
• As a data scientist, what tips would you have for a younger version of yourself?
• Awesome (better graphical form)
  ○ Awesome AI
  ○ Awesome Big Data
  ○ Awesome Computer Vision
  ○ Awesome Data Science
  ○ Awesome Data Science Interview Questions
  ○ Awesome Deep Learning
  ○ Awesome Deep Vision
  ○ Awesome Information Retrieval
  ○ Awesome Machine Learning
  ○ Awesome Pytorch List
  ○ Awesome Speech and NLP
• Awful AI <--- curated list to track current scary usages of AI
• Big Data Landscape 2017 (Source)
• -182 Song Similarity <--- style progression of a band over time
• Bringing the best out of Jupyter Notebooks for Data Science
• Building a Deep Neural Net In
• Build Handwriting Recognizer & Ship It To App Store
• Cheat Sheets
  ○ Cheat Sheets for AI, Neural Networks, Machine Learning, Deep Learning
  ○ Cheat Sheets for Machine Learning and Deep Learning Engineers
  ○ Cheat Sheets - Data Science Resources <--- R/Python/Numpy/Pandas
  ○ DataCamp Cheat Sheets
  ○ Data Science Cheatsheets <--- list of cheatsheets on GitHub
  ○ Data-Science--Cheat-Sheet
  ○ Deep Learning cheatsheets for Stanford's CS 230
  ○ Machine Learning cheatsheets for Stanford's CS 229 <--- repo
  ○ Most general cheat sheet
• Checklist for debugging neural networks
• Choosing the right metric for evaluating Machine Learning models
• Classification datasets results (MNIST, CIFAR, STL-10, SVHN, ILSVRC2012)
• Convolutional Neural Networks <--- very good explanation
• Cybercrimes Investigation and Intrusion Detection in Internet of Thing
• Data Science Blogs <--- list of all blogs
• Data Science Interview Guide
• Data Science map
• DataScienceWeekly <--- weekly newsletter
• Deep Learning 500 questions <--- with answers (in Chinese)
• Deep Learning Papers Reading Roadmap
• DeepForge <--- modern development environment for deep learning
• DeepLearn <--- implementation of research papers on Deep Learning
• deep learning object detection <--- paper list
• Deep Neural Network implemented in pure SQL over BigQuery
• Descriptive Statistics
• Docker for Data Science
• Encoding data in dubstep songs (HN)
• End-to-End Deep Learning for Self-Driving Cars
• Essentials of Machine Learning Algorithms <--- with Python & R code
• Feature Selection algorithms <--- in Python
• Generate Quick and Accurate Time Series Forecasts using Facebook’s Prophet
• Genetic algorithm in machine learning
• Getting Spark, Python, and Jupyter Notebook running on Amazon EC2
• Global Heatmap
• Google AI Research <--- repository with code released by Google AI Research
• Homemade Machine Learning <--- Python examples of popular ML algorithms
• How Docker Can Help You Become A More Effective Data Scientist
• How far can you travel in one hour by car?
• How Much Hotter Is Your Hometown Than When You Were Born?
• How to Build an End-to-End Conversational AI System using Behavior Trees
• How to build your own AlphaZero AI using Python and Keras
• How to Deploy Machine Learning Models: The Ultimate Guide
• How to easily Detect Objects with Deep Learning on Raspberry Pi (HN)
• How to extract data from MS Word Documents using Python
• How to Develop a Word-Level Neural Language Model and Use it to Generate Text
• How to recognize fake AI-generated images
• How to setup a data science blog
• How to solve 90% of NLP problems <--- step-by-step guide
• How to train Keras model x20 times faster with TPU for free
• How you can train an AI to convert your design mockups into HTML and CSS (HN)
• Illustrated Word2vec
• Inside - AI <--- subscribe to daily news about AI
• Interactive Machine Learning List (Repository) <--- I have also contributed ;)
• Introduction to Matplotlib <--- good summary of my Udemy notes
• Introduction to NumPy and Pandas <--- simple tutorial
• Japanese scientists just used AI to read minds and it's amazing
• Kaggle <--- home of data science & machine learning
• Kaggle’s State of Data Science & Machine Learning Report, 2017 (conclusions)
• Kto wygra finał mistrzostw świata w piłce nożnej 2018? (Who Will Win the 2018 Football World Cup Final?) <--- good tutorial on cleaning and merging two datasets by Mateusz Grzyb
• Learn Linear Algebra
• Machine Learning 101 - Jason Mayes (Google) <--- presentation of ~100 slides
• Machine learning basics <--- GitHub repo
• Machine Learning Flashcards <--- bought it for $12!
• Machine Learning for Beginners: An Introduction to Neural Networks (HN)
• Machine Learning from scratch <--- bare bones Python implementations
• Machine Learning Project on Imbalanced Data Can Add Value to Your Resume
• Machine Learning with Matlab <--- e-book
• math-as-code <--- cheat-sheet for mathematical notation in code form
• Migracje Polaków w 2016 roku (część pierwsza) (Migrations of Poles in 2016, part one)
• Mistakes, we’ve drawn a few <--- learn how to make better visualisations
• MIT Autonomous Vehicle Technology Study
• ModelDepot <--- list of free pretrained models and tutorials
• My curated list of AI and ML from around the web <--- pages to follow
• Notes on Machine Learning & Artificial Intelligence <--- by Chris Albon
• NVIDIA Drive Networks <--- deep neural network (DNN) solutions
• Pandas for SQL Users
• Pantheon - Visualizations <--- lots of cool data visualizations
• Pokemon or Big Data? <--- funny game :)
• Practical Machine Learning with R and Python
• Probability and Statistics Cookbook
• Querying Wikidata with Python and SPARQL
• Random Forests for Complete Beginners
• Reading List for Data Scientists
• Recipe for Training Neural Networks
• Sentiment Analysis using LSTM (Step-by-Step Tutorial) <--- using PyTorch
• Simple audio classification with Keras <--- using R
• Spotify: Analyzing and Predicting Songs
• Spotify: Creating a Music Playlist using ML
• Stock Market Predictions with LSTM in Python
• Stock Prices Prediction Using Machine Learning and Deep Learning Techniques
• Stop Installing Tensorflow using pip for performance sake!
• Sound Classification using Deep Learning
• The AI Programmer's Bookshelf <--- list of useful books
• The best data science videos
• The best JavaScript chart libraries for 2019
• The key to building a data science portfolio that will get you a job
• Top 5 Best Jupyter Notebook Extensions
• Top 10 Coding Mistakes Made by Data Scientists
• Twitter sentiment analysis with Python (CNN + Word2Vec)
• Understanding Learning Rates and How It Improves Performance
• Understanding LSTM and its diagrams
• Visual introduction to machine learning
• Weapons of Micro Destruction: How Our ‘Likes’ Hijacked Democracy
• What to consider when choosing colors for data visualization
• You probably don't need AI/ML. You can do with well written SQL scripts

TOOLS

Side Note: Treat this as your arsenal for dealing with any kind of problem you approach. The majority of these tools use Python, and some use R.

• AdaBound <--- optimizer that trains as fast as Adam and as good as SGD
• adanet <--- lightweight and scalable TensorFlow AutoML framework
• AI Writer <--- writes an article based on a queried word
• Amazon Scraper <--- get some info about products sold on Amazon
• ANN visualizer <--- Python library for visualizing ANNs
• Anomalize <--- detect anomalies in data (in R)
• Apache Spark <--- fast and general engine for large-scale data processing
• aspider <--- async web scraping micro-framework based on asyncio
• Augmentor <--- image augmentation library
• Auto-Keras <--- automated ML
• BIRADS_classifier <--- high-resolution breast cancer screening
• Caffe <--- deep learning framework made with expression, speed...
• Caffe Demo <--- classify an image in the browser
• Celeb Match <--- which celebrity face matches your values
• Chartify <--- create charts easily
• Chatistics <--- convert Messenger and Hangouts chat logs into Data Frames
• CleverHans <--- benchmark ML systems' vulnerability to adversarial examples
• Cloud Datalab <--- interactive tool for data exploration
• ColorBrewer <--- color advice for cartography
• csvkit <--- suite of utilities for converting to and working with CSV
• D3 <--- JavaScript library for manipulating documents (example usage)
• Darts <--- differentiable architecture search
• Databolt Flow <--- makes building complex data science workflows easy
• Datashader <--- turn even the largest data into images, accurately
• DeepCreamPy <--- decensoring Hentai with Deep Neural Networks :)
• Deepfakes <--- generate your own Deepfakes (SaaS)
• Deep Graph Library (DGL) <--- interface between existing tensor libraries
• DeepMind Lab (Siraj's example) <--- platform for agent-based AI research
• Deep Recurrent Nets character generation <--- generate some text
• DeepTraffic <--- online self-driving car simulator
• DensePose <--- real-time approach for mapping all human pixels
• DeOldify <--- colorize and restore old images
• Describe <--- transcribe audio to text
• Detectron <--- image recognition from Facebook
• Distiller <--- neural network distiller by Intel AI Lab
• Dopamine <--- research framework for fast prototyping
• EdgeDB <--- next generation object-relational database
• Face Recognition <--- world's simplest facial recognition API
• FaceSwap <--- recognize and swap faces in pictures and videos
• Fairseq <--- sequence modeling toolkit
• fastai <--- deep learning library, plus lessons and tutorials
• fast_progress <--- flexible progress bar for Jupyter Notebook and console
• Faster R-CNN and Mask R-CNN in PyTorch 1.0
• Fatkun Batch Download Image <--- download many images at once (Chrome)
• ferret <--- web scraping system aiming to simplify data
• <--- beautiful data visualization
• Google Images Download <--- Python script
• Google Ngram Viewer <--- find how often a word appeared in books
• graph-cli <--- command line tool to create graphs from CSV data
• Graph Nets <--- build graph nets in TensorFlow
• HiddenLayer <--- neural network graphs and training metrics
• Horizon <--- platform for Applied Reinforcement Learning (Applied RL)
• HTML Parser
• HyperTools <--- toolbox for gaining geometric insights
• ImageAI <--- build applications and systems
• Image-to-Image Demo <--- funny one :)
• janitor <--- simple tools for data cleaning (in R)
• JAX <--- Autograd and XLA,
• Keras <--- deep learning for Python (API)
• Kibana <--- visualization of data
• Labelbox <--- labeling tool for ML
• lazynlp <--- crawl, clean up, and deduplicate webpages
• Leaflet <--- create mobile-friendly interactive maps
• Learning to See in the Dark <--- optimise dark images
• Lore <--- make ML approachable for Engineers
• Loss Landscape <--- code for visualizing the loss landscape of neural nets
• Ludwig <--- toolbox built on top of TensorFlow
• m2cgen <--- transform ML models into native code
• MACE <--- mobile AI compute engine
• Mercury <--- extract the bits that humans care about from any URL you give it
• ml5.js <--- friendly ML library for the web
• MLflow <--- open source platform for the complete ML lifecycle
• MLJAR <--- platform for machine learning (review by Mateusz Grzyb)
• ModelDepot <--- list of free, pretrained ML models and tutorials
• Modin <--- speed up your Pandas workflows by changing a single line of code
• Music Maker <--- with the use of AI
• NanoNets <--- ML with less data
• neptune.ml (Medium article) <--- collaboration for data science projects
• Neuron, a new VS Code extension for data science (HN)
• Nevergrad <--- gradient-free optimization
• NLP Architect <--- library for exploring techniques of NLP
• NLP-progress <--- repository to track the progress in NLP
• NLTK <--- Natural Language Toolkit
• NSFW Data Scrapper <--- collection of scripts to aggregate image data
• Omni Draw <--- how do the learned models perceive what you draw
• Optuna <--- hyperparameter optimisation framework
• OriginLab <--- like Excel but for scientific stuff
• pandas-datareader <--- extract data from a wide range of Internet sources
• Papermill <--- tool for parameterizing Jupyter Notebooks
• Parts-of-speech <--- POS tagging (adjective, adverb, noun…)
• Pathfinding.js <--- visualisation of algorithms trying to find a way from A to B
• Person Blocker <--- block up to 80 different types of objects using neural nets
• Petastorm <--- enables single machine or distributed training
• Photon <--- incredibly fast crawler
• physt <--- easily create histograms
• PocketFlow <--- Automatic Model Compression (AutoMC)
• PySyft <--- library for encrypted, privacy preserving deep learning
• PyText <--- NLP modelling framework based on PyTorch
• Pythia <--- software suite for Visual Question Answering
• PyToune <--- Keras-like framework for PyTorch
• QuickChart <--- generates images of charts from a URL
• QuickDraw <--- implementation of Quickdraw game
• Redash <--- connect and query your data sources
• Remastering Star Trek: Deep Space Nine With Machine Learning
• SC-FEGAN <--- Face Editing Generative Adversarial Network
• ScrapedIn <--- scrape LinkedIn search results without API restrictions
• SegNet <--- Deep Convolutional Encoder-Decoder Architecture
• Setosa <--- algorithms explained visually
• sg2im <--- image generation from scene graphs
• Show Facebook Computer Vision Tags <--- Chrome/Firefox extension
• smart_open <--- utils for streaming large files in Python
• SNIPER <--- efficient multi-scale object detection algorithm
• SPADE <--- semantic image synthesis
• Spektral <--- deep learning on graphs with Keras
• T2F <--- text to face generation using Deep Learning
• TensorFlow Playground <--- neural network playground
• TensorForce <--- TensorFlow library for applied reinforcement learning
• TensorSpace.js <--- neural network 3D visualization framework
• TensorRec <--- TensorFlow recommendation algorithm and framework
• Texar <--- toolkit for Text Generation and Beyond
• TextBlob <--- text processing, sentiment analysis
• TextDistance <--- compute distance between sequences
• The Chartmaker Directory <--- search how to make graphs
• This resume does not exist <--- generates a new resume every 5 seconds
• tidypredict <--- predict score of models (lm, glm, randomForest)
• TL-GAN: transparent latent-space GAN
• TOAST UI <--- beautiful data visualization on your web service
• TRFL <--- TensorFlow Reinforcement Learning
• tweets analyzer <--- tweets metadata scraper & activity analyzer
• Vectordash <--- rent GPUs for deep learning (cheaper than AWS)
• VergeML <--- command line based environment for exploring ML models
• vid2vid <--- high-resolution video-to-video translation
• Visual Spark Studio <--- Apache Spark IDE
• WarriorJS <--- game of programming and AI
• Why building your own Deep Learning Computer is 10x cheaper than AWS
• Wikipedia_trend <--- package for getting Wikipedia article access statistics
• Yellowbrick <--- combine scikit-learn with matplotlib in the best tradition

DATASETS

Side Note: Now, having all the knowledge and tools, let's apply them to one of the datasets. :)

• 50 Best Free Datasets for ML (HN)
• Academic Torrents
• Apollo Scape <--- RGB videos with high resolution image sequences
• Awesome Public Datasets
• AWS Public Datasets
• Berkeley DeepDrive BDD100k <--- 100,000 video sequences of car recordings
• Caffe2
• Common Voice <--- dataset of voices that everyone can use to train speech-enabled applications
• CORGIS <--- datasets for beginners
• DataHub <--- easiest way to find, share and publish datasets online
• Datasets for machine learning <--- huge list (CV/NLP/Audio)
• Datasets for mind reading
• FiveThirtyEight <--- economics, sports, politics
• Goodbooks-10k <--- new dataset for book recommendations
• Google BigQuery <--- public datasets from Google
• Google Dataset Search <--- search engine of datasets from Google
• Grouplens datasets
• imgaug <--- image augmentation for machine learning experiments
• How readers browse Wikipedia
• Kaggle
• List of lists with datasets
• Mapillary Vistas Dataset <--- street-level imagery dataset
• Mathematics Dataset <--- generates mathematical question and answer pairs
• Million Song Dataset
• nuScenes <--- large-scale autonomous driving dataset
• Open Images Dataset V5 <--- ~9M images annotated with image-level labels
• Profile Engine <--- dataset of public Facebook data (2007-2010)
• Quandl <--- financial data directly into Python
• Quantopian Datasets
• Tencent ML-Images <--- largest multi-label image database
• World Bank Open Data <--- economic data

ADDENDUM

I hope that you liked all the resources. If you feel like thanking me, please share my blog with your friends. :)

I do have more data science resources, but there are too many of them to include in this tiny e-book. Some of them are also in graphical form and take up more space than is available on an A4 page.

Please follow my blog, and I will try to explain more data science topics as we go along together!

See you soon!
