Rubens Zimbres

Senior Data Scientist, PhD

+55 (11) 96750-6962

Summary

I am a strategist with over 23 years of experience in customer service, management and financial planning. I have worked in crisis management, and as CEO and Data Scientist in strategic planning and restructuring, physical and digital marketing, social networks, personnel management and customer database analysis. I have 13 years of Market Research expertise and 10 years of data analysis experience.

I have more than 10 years of experience with the statistical software SPSS and the full Office suite, and I am an expert in , Python and (a software similar to MatLab). I also work with Spark, SQL, RapidMiner (a software similar to Microsoft Azure ML), HTML, Tableau and PowerBI.

I hold Master's and Doctorate degrees in Business Administration from Mackenzie. I took an Artificial Intelligence course from at Brown University and postgraduate courses from the University of Washington. I hold an INPI patent for a computer program in ++ / Mathematica covering data analysis, simulation and social networks.

I develop financial and econometric models that emulate human choice behavior, using Artificial Intelligence tools such as Rational Choice Theory, Game Theory, Nash Equilibrium and Reinforcement Learning (uncertainty, risk, reward and decision-making).

I also work in Data Science fields such as , Artificial Intelligence, Machine Learning, Deep Learning, Computer Vision and Natural Language Processing (Neural Networks, Word2Vec, Doc2Vec, sentiment analysis). I specialize in storytelling and enhanced visual presentation of results.

I am also skilled in Big Data analysis (structured data, pattern recognition, sentiment analysis based on facial recognition, geo-spatial mapping and social network analysis) and artificial intelligence applied to business and strategy.

My LinkedIn profile includes a portfolio with examples of my work in Machine Learning, Deep Learning and Natural Language Processing: www.linkedin.com/in/rubens-zimbres

My R and Python code is available at: http://github.com/RubensZimbres/repo.raz

Experience

Data Scientist at DR3 Consultoria

October 2009 - Present (7 years)

I work with data analysis, market research, service quality research, digital marketing, database analysis and Machine Learning optimization. I use the following Machine Learning methods to increase profit, efficiency and value, monitor KPIs, define investment allocation, improve ROI and identify sources of competitive advantage:

- Econometric models using market indexes and utility function to forecast sales and demand

- Conjoint analysis for market research

- Relational algebra with feature hashing

- Statistical analysis (descriptive and multivariate: Linear, Nonlinear, Polynomial, Logistic and Multinomial Logistic Regression, Linear Discriminant Analysis, Factor Analysis, Principal Components Analysis): identify causes, predict outcomes, reduce dimensionality and segment markets

- Classification algorithms in predictive analysis (Decision Trees, Naive Bayes, Random Forests, Support Vector Machines): market segmentation, price elasticity and facial recognition

- Optimization (Monte Carlo Tree Search with fuzzy string matching, genetic algorithms, gradient descent): determine employee turnover, optimize portfolios and logistics

- Clustering and nearest-neighbor algorithms (K Means, K Nearest Neighbors): identify niches, handle Big Data and sampling

- Autoencoders, Restricted Boltzmann Machines, Neural Networks, Convolutional Neural Networks, Recurrent Networks (LSTM) and Deep Learning: pattern recognition, face recognition, process optimization, time series prediction, Computer Vision and Natural Language Processing

- Survival analysis associated with decision trees: determine product life cycle and obsolescence of marketing strategies

- Simulation (Markov Chain Monte Carlo, Markov Decision Process, Cellular Automata, Agent-Based Models): simulate contagion in social networks, herd behavior and logistics problems; these methods can also be applied to self-driving cars

- Time Series analysis combined with Discrete Event Simulation: predict the outcomes of strategies

- Machine Learning: supervised, unsupervised and ensemble Machine Learning, used for optimization and to extract features from Big Data

- Symbolic Artificial Intelligence with forward and backward chaining inference for social network modeling and simulation of human interactions with intelligent agents

I have processed more than 57 million entries in a Machine Learning task (MNIST), reaching 99.60% accuracy with Convolutional Neural Networks. I developed all my machine learning algorithms, including face recognition, neural networks and deep learning, in R and with customized , and Lasagne models in Python.

I work with structured, unstructured and batched data, and I specialize in analytics and enhanced graphic visualization.

CEO and Business Intelligence Analyst at Doux Dermatologia

October 2010 - July 2013 (2 years 10 months)

Doux Dermatologia has 15 employees and a yearly revenue of 2.5 million reais. In 2010 the business had a 1 million reais debt. I renegotiated the debt and repositioned the clinic's image, focusing on quality and efficiency, and introduced a customer-oriented strategy tied to monthly goals and variable remuneration. Because the business could not finance broader-scope strategies, we adopted growth hacking strategies: Google AdWords customization, SEO, word-of-mouth advertising, email marketing, social media presence, customer research and database analysis. I did a comprehensive analysis of the database (3,500 patients, 750,000 records) and a satisfaction survey to learn about our clients' quality perception and behavioral intentions.

Data was used to define the Google AdWords and social media strategy and SEO, to find the most profitable procedures, client demographics, market segments and demand size, and to guide strategic planning focused on quality, customer retention and profit. I monitored financial indicators such as net worth, gross margin, net profit, assets, liabilities, ROI, ROA, ROE, EBITDA, financial leverage, current liquidity ratio, inventory turnover and customer churn rate.

In the first year, profit doubled, and our Google AdWords ROI reached 600%. After 2.5 years, net profit had increased 239%, financial leverage decreased 36%, inventory turnover increased 15%, ROI increased 206%, ROE 95% and EBITDA 98.5%. A price-increase strategy reduced the number of appointments while increasing revenue (greater efficiency). By 2013, 700 thousand reais of the debt had been paid, and after a Net Present Value analysis we decided to buy a 600 thousand reais skin treatment laser. Employee satisfaction increased, and this was reflected in the business's total revenue.

Market Researcher at DR3 Consultoria

February 2005 – December 2009 (4 years 10 months)

I specialize in designing and analyzing scientific market research tailored to business needs, to identify customers' perception of quality, involvement and (behavioral) purchase intentions. The work includes the following steps:

- Comprehensive review of the relevant conceptual background and formulation of hypotheses

- Qualitative approach with structured and semi-structured interviews and content analysis

- Conjoint analysis (quantitative approach): pre-test questionnaire development, including Likert (ordinal) scale customization and construct-oriented questions designed to achieve internal consistency and internal and external validity

- Data treatment (outliers, missing values), transformations to adjust skewness and kurtosis, normality tests (Kolmogorov-Smirnov), sampling adequacy tests (Kaiser-Meyer-Olkin) and ANOVA, including checks of its homoscedasticity assumption

- Statistical analysis (descriptives, correlations, linear regression, factor analysis) of the pre-test questionnaire, and filtering of indicators to prevent multicollinearity, adjust the number of relevant variables and enable feature engineering

- Questionnaire administration and further statistical analysis to establish internal, external, conceptual, statistical and convergent validity

This methodology identifies which factors lead to a given customer perception of quality and to future purchase intentions, so that businesses can adjust their strategy to increase revenue and profits, customer satisfaction and customer involvement with the product or service.

Operations Manager at Cultura Online Bookstore

March 2009 - August 2010 (1 year 6 months)

Cultura Online is a small company that uses e-commerce and physical structures to sell used books, e-books and DVDs. I was responsible for the project inception, coordination of operations, business intelligence, online advertising strategies, e-commerce configuration and logistics management.

The business had 5 employees. Books were obtained from personal collections and sold online to all states of Brazil through our own website and Mercado Livre. We offered clients various payment options and focused on national literature and delivery speed. After a donation of 600 DVDs from a personal collection, we added DVDs to our portfolio.

As a way to diversify, I developed public domain e-books using Natata, which were burned onto mini CDs and sold at newsstands. At that time, tablets were not popular in Brazil. I left the project to become CEO of Doux Dermatologia.

Lecturer at National Meetings of Management in the Health Sector

August 2006 - March 2010 (3 years 7 months)

I lectured on strategic planning, competitive advantage and sales techniques to students, health care and marketing professionals and health insurance companies at universities, national meetings of Management in the Health Sector and DR3 Consultoria.

Logistics Assistant at Conway International (USA)

February 2000 - April 2000 (3 months)

I was responsible for organizing and dispatching orders. It was a temporary job in Los Angeles, USA.

Entrepreneur at Dental Office

July 1997 - November 1999 (2 years 4 months)

I was a successful entrepreneur in the Health Sector. After leaving the Brazilian Air Force and investing profitably in the Brazilian stock exchange (BOVESPA), I invested in my own business. As Brazil was going through an economic crisis, my initial strategy combined competitive prices with high service quality. I built spreadsheets and ran a sensitivity analysis to focus on the most profitable services. I also fostered word-of-mouth advertising among clients through a discount policy. The Internet was in its infancy, and I invested in online prospecting of clients and partnerships. The break-even point was reached in 9 months. Successive price increases followed, and profit increased 200% in 2.5 years.

1st Lieutenant at Brazilian Air Force

February 1993 - July 1997 (4 years 6 months)

I was head of the Health Sector, responsible for the dental care of military personnel, supply purchases for the Health Sector, military personnel training and PAMA security.

Education

Universidade Presbiteriana Mackenzie

Doctor in Business Administration (2006 – 2009)

I continued my Master's research, using Artificial Intelligence, Agent-Based Modeling and Markov Decision Processes in Management to simulate human interactions in social networks and find a rule that could predict and mimic customer purchase behavior in aesthetic services. The research was successful: the model correctly predicted 73.8% of future behaviors. A qualitative-quantitative longitudinal study was conducted with 100 clients and service providers. The results allow businesses to identify the future behaviors of clients and influencers, so that firms can be more efficient in their marketing and management efforts. Mackenzie sponsored the tuition fee on merit, and Wolfram Research, Illinois, sponsored additional training.

Articles were published in Electronic Notes in Theoretical Computer Science (Elsevier, 2009) [ISSN: 1571-0661], and an article was accepted at the Academy of Management Annual Meeting, Montreal, Canada, 2010, but not presented. The doctoral thesis resulted in a patent registered at INPI.

University of Toronto – Coursera Online Training (2016)

Neural Networks for Machine Learning (10 weeks)

- Perceptrons, Rectified Linear Neurons, Sigmoid Neurons, Binary Threshold Neurons

- Feed Forward Neural Networks, Recurrent Neural Networks, Convolutional Neural Networks

- Learning weights for logistic output, Learning with Hidden Units, Backpropagation

- Avoiding overfitting, facilitating convergence and increasing generalization power in Neural Networks

- L1 and L2 regularization, Restricted Boltzmann Machines

I developed Neural Network models (Deep Learning, Convolutional, LSTM and Recurrent) in R using H2O and in Python using the Keras, Theano and Lasagne libraries.

University of Washington - Coursera Online Training (2015)

Data Science (4 weeks)

- Big Data: Volume, Velocity, Variety, Processing Issues and Difficulties in Analysis

- Relational Databases and Relational Algebra

- Machine Learning, Visualizations and Graph Analytics

- MapReduce: Scalability, Relational Join, Large Scale Parallel Data Processing, Cluster Computing, Fault Tolerance, Parallel and Distributed Query on Multiple Nodes, Hashing Algorithms, SQL, Hive and Pig Basics.

Machine Learning (10 weeks)

- Statistical Learning: Naive Bayes, Bayesian Networks

- Decision Trees, Entropy, Information Gain, Pruning Trees and Rules

- K Nearest Neighbor, Variable selection, Overfitting avoidance, Recommender Systems (Collaborative Filtering)

- Cost Function, Gradient Descent and Constrained Optimization

- Perceptrons, Kernels, Lagrange Multiplier, Support Vector Machines

- Neural Networks, Multilayer Perceptron, Backpropagation, Hidden Layers

Data Manipulation at Scale: Systems and Algorithms (4 weeks)

- Big Data, Data Science, Tools, Abstractions, Statistics, MapReduce

Getting and Cleaning Data (4 weeks)

- Subsetting, Sorting, Ordering, Feature Engineering, Reshaping and Merging Data

Stanford University - Coursera Online Training (2015 – 2016)

Machine Learning (11 weeks)

- Supervised and Unsupervised Learning

- Linear and Logistic Regression, Cost Function and Gradient Descent

- Neural Networks, Backpropagation and Optimization

- Bias, Variance

- Support Vector Machines and Kernels

- Clustering algorithms, K Means, Dimensionality Reduction and Principal Components Analysis

- Anomaly detection using K Nearest Neighbor and Gaussian Distribution

- Recommender Systems: Content Based Recommendation and Collaborative Filtering

- Large Scale Machine Learning

- Validation Sets, Training Sets, Test Sets

I applied this learning to the MNIST pattern recognition task in R, Python and Mathematica, reaching an accuracy greater than 99.9% with Neural Networks.

Brown University and Google Instructor – Udacity Online Learning (2016)

Deep Learning - Neural Networks (3 weeks)

- Logistic classification

- Stochastic optimization

- Learning rate decay and tuning

- Softmax normalization and cross entropy

- Data and parameter tuning: Backpropagation and Chain Rule

- Cross validation

- Regularization: L1, L2, Elastic Net

- Activation functions: Sigmoid, Hyperbolic Tangent, ReLU (rectified linear units)

- Autoencoders

- Convolutional Networks: convolution, patch, stride, pooling, inception

- Word2Vec: vectorization of words to make semantic understanding possible

- Recurrent Networks: exploding and vanishing gradients

I applied this learning to the Iris classification task, reaching an accuracy greater than 97% with Deep Learning Neural Networks in R and Mathematica.

Machine Learning (11 weeks)

- Decision Trees: ID3 and C4.5

- Linear and Polynomial Regression

- Neural Networks: Perceptron training, Gradient Descent, Backpropagation, Sigmoid

- K Nearest Neighbors, K Means, Support Vector Machines, Perceptrons, EM algorithm

- Bayesian Learning, Naive Bayes

- Randomized Optimization: Hill Climbing, Simulated Annealing, Genetic Algorithms

- Feature Selection and Transformation

- Information Theory

- Reinforcement Learning and Decision Making: Markov Decision Process, States, Actions, Policies, Risk, Uncertainty, Rewards, Expected Utility, Rational Choice Theory.

- Game Theory, Nash Equilibrium, Prisoner’s Dilemma, Iterated Games, Zero Sum Games, Tit-for-tat, Pavlov strategies

Coursera Online Learning and Udemy Online Learning (2015)

R Programming, Statistics in R and Python

- Handling strings, vectors, lists and matrices, str(), importing files, if-else statements, subsetting lists and matrices, for/while loops, functions, dates and times, loop functions, simulation, random numbers, random data sampling (bagging) and code optimization

I developed the following models in R and Python: Linear and Logistic Regression, ANOVA, ROC curve, Multiple Regression, Linear Discriminant Analysis, Principal Components Analysis, Decision Trees, Random Forests, Support Vector Machines, Naive Bayes, K Nearest Neighbor, Bootstrap Aggregating, Markov Chain Monte Carlo, Neural Networks, Gradient Boosting, Social Network modeling, Geo Location and Face Recognition.

MIT and Ohio State University - Coursera Online Training (2016)

Calculus

Limits, Derivatives, Local and Global Minima and Maxima, Optimization, Chain Rule, Lagrange multiplier and .

Wharton School of Business – Coursera Online Learning (2016)

Business and Financial Modeling (4 weeks)

- Linear models and optimization

- Discrete event simulation

- What-if scenarios, sensitivity analysis and dynamic models

- Price and demand elasticity

- Net Present Value: lifetime customer value and investments

- Optimization

- Probabilistic models: Monte Carlo simulation, Markov Chain models, Probability trees

- Forecast in the presence of uncertainty and risk

- Multiple and logistic regression applied to business

Universidade de São Paulo

Auditing Course at FUNDECTO (2007 – 2007)

Brown University

Wolfram Research, Cellular Automata, Artificial Intelligence and Discrete Event Simulation (2006)

Wolfram Mathematica is software used in 90% of technical courses in the United States. Over the years I developed models of word-of-mouth advertising dynamics, predictive analysis for business, population dynamics, iterative social networks, structured and unstructured data analysis, cellular automata to simulate human interactions, and pattern recognition and sentiment analysis associated with facial recognition. All models were developed from scratch and based on combinations of Machine Learning methods, sometimes involving artificial intelligence and complex behavior. Wolfram Research sponsored my tuition for the course.

Universidade de São Paulo

FEA - Faculdade de Economia e Administração

Economics and Complexity (2005 – 2005)

Universidade Presbiteriana Mackenzie

Master in Business Administration – Stricto Sensu (2004 – 2005)

I applied Electrical Engineering concepts (Cellular Automata and Artificial Intelligence) to Management in an innovative way. Mackenzie sponsored me on merit, because I was one of the best students in my class and the idea was innovative.

Universidade de São Paulo

Bachelor's degree in Dentistry (1989 – 1992)

Languages

English (full proficiency)

Portuguese (native)

Spanish (full proficiency)

French (intermediate)

Italian (intermediate)

Patents

Cellular Automata Code to Simulate Artificial Societies and Social Networks.

Brazil Patent RS 10950-1 Issued February 1, 2011

Honors and Awards

Military Rescue Operation - Skyjet Brasil September 1995

I volunteered in an Air Force rescue operation. The CEO of Skyjet Brasil personally thanked Air Force Colonel Bioza for our military presence in the rescue operation in St. Maarten after Hurricane Luis, where Brazilian tourists were stranded.

Doctorate Sponsorship – Universidade Presbiteriana Mackenzie - 2006

Wolfram Research Sponsorship – Wolfram Research, USA – 2006

Master’s Degree Sponsorship – Universidade Presbiteriana Mackenzie - 2004

Publications

R vs Python? No, R and Python. Data Science Central, October 7, 2016. Author: Rubens Zimbres. Available at: http://www.datasciencecentral.com/profiles/blogs/r-vs-python-r-and-python-and-something-else

Dynamics of quality perception in a social network: A cellular automaton based model in aesthetics services. Electronic Notes in Theoretical Computer Science, Elsevier, October 1, 2009 [ISSN: 1571-0661]. Authors: Rubens Zimbres, Pedro P.B. Oliveira

Cellular automata based modeling of the formation and evolution of social networks: A case in Dentistry. Artificial Intelligence and Decision Support Systems, pp. 333-339, 2008. [ISBN: 978-989-8111-37-1] June 16, 2008, Spain. Authors: Rubens Zimbres, Pedro P.B. Oliveira

Agent-based Modeling: A Third Way of Doing Science? EnANPAD (Encontro Nacional dos Programas de Pós-Graduação em Administração de Empresas) September 9, 2006

Effects of changes in the neighborhood and initial state in the flow of information in social networks. Wolfram Research, Illinois July 18, 2006. Authors: Rubens Zimbres, Jason Cawley, , USA

Simulation of interactions in social networks using cellular automata as a complementary method of quantitative analysis. IBOPE - 4 Congresso Brasileiro de Pesquisa March 1, 2010.

Negotiations in business networks based on Prospect Theory. SEMEAD - USP Authors: Rubens Zimbres

Influence of Rationality and Quality in Service Purchase in an Artificial Society. SEMEAD - USP Authors: Rubens Zimbres

Game Theory and Transactions in Dentistry. CATI - FGV 2005 Authors: Rubens Zimbres