SDS PODCAST EPISODE 433: DATA SCIENCE TRENDS FOR 2021

Show Notes: http://www.superdatascience.com/433

Jon Krohn: 00:00:00 This is episode number 433, with Ben Taylor of DataRobot.

Jon Krohn: 00:00:15 Welcome to the SuperDataScience Podcast. My name is Jon Krohn, chief data scientist and bestselling author on deep learning. Each week, we bring you inspiring people and ideas to help you build a successful career in data science. Thanks for being here today. And now, let's make the complex, simple.

Jon Krohn: 00:00:45 Welcome to this SuperDataScience episode. I'm your host, Dr. Jon Krohn. And I'm joined today by our very special guest, Ben Taylor. In this episode, Ben joins us on the podcast for a fourth time, this time to cover the data science trends that are set to take off in 2021. This episode is well suited to anyone with an interest in data science, no matter whether you're a beginner or advanced practitioner, and no matter whether you're hands-on or an executive primarily keen on the business of data science.

Jon Krohn: 00:01:19 Today, we've got plenty for everyone. Some of the key trends we focus on in this episode are the prevalence of racial and gender biases in artificial intelligence, tools for understanding black-box algorithms, training models without compromising private user data, delivering business value with AI, putting machine learning models into production, and the data science software libraries set to take off in 2021.

Jon Krohn: 00:01:47 Ben Taylor is a brilliant data scientist with an exceptionally broad understanding of machine learning models deployed into real-world systems. So, I always learn a lot from him. I'm confident you will too. We touch on some serious topics, but Ben's got a sharp wit, and so we also share a lot of laughs. I can't wait to share this episode with you. So, let's go.

Jon Krohn: 00:02:16 Ben, welcome to the program. This is such a perfect moment, because it was many months ago, the beginning of 2020, when I was launching my own podcast, and you were the first guest that I ever had on my podcast. Now, I'm hosting the SuperDataScience Podcast, and boom, you're here again, just like clockwork.

Ben Taylor: 00:02:40 And that's a funny story because I was supposed to be on your podcast in person in New York.

Jon Krohn: 00:02:46 Yeah, that's right. So, we had actually agreed, we had a time and a date set for you to be in New York to record. At that time, the idea for my podcast was like a live news show, all of us there in person. And I couldn't believe that the Ben Taylor was willing to be the first guest on my podcast. And what happened is the virus got in the way. I didn't think that this virus was going to be a big deal, but obviously I was wrong.

Ben Taylor: 00:03:14 Well, it wasn't public yet, but we had just joined the DataRobot family through a recent acquisition. So, I was at DataRobot headquarters in Boston. And Thursday, right? Yeah, I remember the day, it was Thursday, I think the second week in March, when things were blowing up, and I was so busy in meetings, I wasn't following the news. And so, my wife was trying to get ahold of me because people were practically stealing toilet paper from the supermarket. It was just a show at home. So, luckily my co-founder came in and said, "Ben, you have to call your wife immediately." And she said, "Get your a** on the plane today. You're not going to New York." So, that's how that happened.

Jon Krohn: 00:03:55 Yeah. It was that week. Everything went from like, we knew that there was an issue overseas. We knew that Wuhan, China had closed down. I can't remember if things were kicking off already at that point in Italy.

Ben Taylor: 00:04:09 Yeah. Well, they were kicking off in New York. Because I remember the National Guard had come out that week and-

Jon Krohn: 00:04:16 Oh, already at that point.

Ben Taylor: 00:04:18 Yeah. Or maybe it was the following day. There was a lot that happened a day or two later that showed I made the right decision.

Jon Krohn: 00:04:28 Yeah. Things were escalating quickly. It was the Thursday that you decided, or it was decided for you, that there was no choice. And it was a very reasonable thing to go home. Let me phrase that the right way. And it was the next day that all of a sudden, nobody showed up to work with me.

Ben Taylor: 00:04:51 Yeah.

Jon Krohn: 00:04:51 It was on a Friday.

Ben Taylor: 00:04:52 Yeah.

Jon Krohn: 00:04:54 And then the Monday was the day I was in the gym when the governor of New York announced that gyms were closing that evening. And everything closed. That was the beginning of the end. But now, everything's different. We've got vaccines coming in, probably quickly. And it would have been too soon for us to plan to do this one in person. We learned from our mistakes. Don't try to get Ben Taylor in person; there will always be a pandemic. But soon you will be able to come to New York, and we can meet in person.

Ben Taylor: 00:05:31 Yeah. I'm excited. And I think a lot of people are missing just the peer-to-peer interactions you get from data conferences. So, I'm hoping the fall season will be lots of travel, lots of great conferences, face-to-face meetings.

Jon Krohn: 00:05:46 I'm hoping so. I've already committed to speak at the Open Data Science Conference in November 2021, which is in San Francisco. It's going to be in-person apparently.

Ben Taylor: 00:05:55 Okay. That's exciting.

Jon Krohn: 00:05:57 Yeah. I don't think we've mentioned this yet for the listeners who don't know. Where did you fly home to? And I guess, where have you been since that flight? Have you been on any flights since?

Ben Taylor: 00:06:12 I don't think I've been on any flights since. I'd have to check. So, home for me is Lehi, Utah. I tell people Salt Lake City, because not everyone knows where Lehi is. But Lehi is really the center of tech for Utah. So, you have the Adobe campus. You have Pluralsight. You have Qualtrics, the $8 billion acquisition by SAP, just south of us. So, you have a lot of these unicorn tech companies that are spinning up out of Utah. And it's a great place to live because you have good skiing, good tech. Everything is close together. So, this is where I've been.

Jon Krohn: 00:06:52 I bet having access to green space during COVID is nice. I don't know, I seem to be the only person that stayed in New York.

Ben Taylor: 00:06:59 Yeah. I feel spoiled, because I have a Can-Am side-by-side, so I can go out in the desert or I can go back country skiing.

Jon Krohn: 00:07:08 Wow.

Ben Taylor: 00:07:09 And lately I've been running outside, which feels great to go running in the snow in the mountains. So, I can definitely put my lungs to work and get some vitamin D.

Jon Krohn: 00:07:19 Nice. It sounds great. I didn't know much about the Utah tech scene. And so, even just that little bit is great to hear.

Jon Krohn: 00:07:32 Eliminating unnecessary distractions is one of the central principles of my lifestyle. As such, I only subscribe to a handful of email newsletters, those that provide a massive signal-to-noise ratio. One of the very few that meet my strict criteria is the Data Science Insider. If you weren't aware of it already, the Data Science Insider is a 100% free newsletter that the SuperDataScience team creates and sends out every Friday.

Jon Krohn: 00:08:01 We pore over all of the news and identify the most important breakthroughs in the fields of data science, machine learning and artificial intelligence. The top five news items, simply five items, are handpicked, the items that we're confident will be most relevant to your personal and professional growth. Each of the five articles is summarized into a standardized, easy to read format, and then packed gently into a single email. This means that you don't have to go and read the whole article. You can read our summary and be up to speed on the latest and greatest data innovations in no time at all.

Jon Krohn: 00:08:37 That said, if any items do particularly tickle your fancy, then you can click through and read the full article. This is what I do, I skim the Data Science Insider newsletter every week. Those items that are relevant to me, I read the summary in full. And if that signals to me that I should be digging into the full original piece, for example to pore over figures, equations, code, or experimental methodology, I click through and dig deep.

Jon Krohn: 00:09:02 So, if you'd like to get the best signal-to-noise ratio out there in data science, machine learning and AI news, subscribe to the Data Science Insider, which is completely free, no strings attached, at superdatascience.com/dsi. That's superdatascience.com/dsi. And now, let's return to our amazing episode.

Jon Krohn: 00:09:27 So, we've got you on the program to give us some insight into the trends that are unfolding in 2021. We thought that there would be no better person than you because of your passion for the data science field. I actually read recently that you were so excited about AI on stage, at a data conference, obviously pre-COVID, that you cried on stage with your wife and your father-in-law in the audience. Do you want to tell us a bit about that? What were you talking about?

Ben Taylor: 00:09:58 Yeah. So, I get really fired up about the future, just all the things that will come, stuff like a smart toilet. I promise I didn't cry over the smart toilet idea. I get really excited for all of these amazing sensors that I don't think the data science team has really embraced yet, because they're not widely available. So, some of those things are localized radar. And then there's the idea of, I had a Ring camera setup at my home with seven cameras. But in the future, you could have a hundred cameras in your home and localized radar, which would be very annoying today, just handling that load. But with AI, it becomes more of a router. So, everything's actionable, things get escalated.

Ben Taylor: 00:10:38 So, you can imagine a scenario in the future where your kid is in harm's way, they're choking, or, we had someone in our neighborhood, a three-year-old girl, who was strangled by a blind cord, very, very sad. That scenario would never happen again. The parents aren't home, this is happening, and they're not aware of it, so they can't do anything, but an AI system could.

Ben Taylor: 00:10:57 So, me talking about this hypothetical story in the future about how AI will escalate, and if you aren't responding to your phone, because you're chatting with a neighbor, they'll send the AI EMTs to your house to break down your front door and save your kid. And they'll do it all in record time.

Ben Taylor: 00:11:17 And so, me telling the story, I got choked up, talking about a futuristic tech story to save kids' lives. It was pretty funny that I was that motivated. But I remember the emotions inside me were, "The AI will knock down your damn door and save your kid."

Jon Krohn: 00:11:36 Yeah. Well, I hope we can come up with something on today's program that makes one of us cry. That would really make for a good episode. My first guest episode with SuperDataScience. Somebody should be upset. Somebody should be bawling.

Ben Taylor: 00:11:48 Do you want a volunteer? Do you want me to volunteer?

Jon Krohn: 00:11:55 All right, we've got a lot of topics lined up for today. We'll try to move through them as quickly as possible because I think with either you or me, we're prone to speaking at length in incredible detail about things that excite us in the data science world. So, I'm going to try to move us along as best as I can, because there are so many things that we want to talk about. I want to talk about delivering results with value and urgency. We want to talk about transparent storytelling with data. We want to talk about federated learning, machine learning productionization, there's a mouthful of a word, and machine learning bias or ethics in AI. If we have time at the end, I'd also like to talk about software languages or packages that we think might continue to become more widespread or really take off in 2021.

Jon Krohn: 00:12:42 All right. So, let's start off with that first one, delivering results with value and urgency. So, getting some kind of return on your investment with AI. So, with COVID, there was a lot more pressure than usual to find short-term ROI. And that is absolutely something that I experienced in my company. And so, in your notes to me, you mentioned that OKRs, so objectives and key results, have become the new KPI, the key performance indicator. So, maybe tell us about KPIs and how they vary from OKRs, and then this short-term focus on AI ROI.

Ben Taylor: 00:13:26 Yeah. So, I don't think I can take credit for that statement. I think it came from someone who was at Goldman, who said that. And I really liked it. And so, OKRs are typically tied to quarterly goals. So, if you're in leadership or you're an executive, you're going to get bonuses paid out quarterly, depending on how well you hit those.

Ben Taylor: 00:13:45 And KPIs, I think being data scientists, we've seen this natural evolution where maybe at the naive end of the spectrum, you talk AUC, F1 score, just pure stats. And we know those don't work well when you're trying to communicate value. And so, someone who's more sophisticated, they'll talk about KPIs or even use a utility function to map it to money, which is the ultimate KPI. You're actually talking dollars flowing to the business. But OKR, it really screams urgency, that you have a good enough sense of the business. You're willing to work on priorities that will deliver value this quarter. And I feel like that's a mindset that we don't see in the data science field very often, because a lot of the stuff we do is hard and it takes a long time.
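
To make the utility-function idea concrete, here is a minimal sketch in Python of mapping a classifier's confusion-matrix counts to money. Every dollar figure is hypothetical, purely for illustration, not anything discussed in the episode.

```python
# A minimal sketch: translate raw classification counts into dollars.
# All dollar figures are hypothetical, purely for illustration.

def expected_value(tp, fp, fn, tn,
                   value_tp=50.0,    # hypothetical profit per correctly flagged case
                   cost_fp=5.0,      # hypothetical cost of a false alarm
                   cost_fn=100.0):   # hypothetical cost of a missed case
    """Map confusion-matrix counts to a single dollar figure."""
    return tp * value_tp - fp * cost_fp - fn * cost_fn

# Two models with similar headline accuracy can differ in dollars.
print(expected_value(tp=400, fp=900, fn=100, tn=8600))   # 5500.0
print(expected_value(tp=450, fp=2500, fn=50, tn=7000))   # 5000.0
```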

Jon Krohn: 00:14:28 Yeah, that all makes perfect sense to me. I'm actually glad to have that explanation, because up until a moment ago, I thought that OKRs and KPIs were synonyms. So, I guess KPIs can be any kind of data that you're tracking related to business performance. But OKRs are more related to quarterly results.

Ben Taylor: 00:14:53 And those can change quarter by quarter. So, think of OKRs as putting special attention on the KPIs that matter most in the short term. And sometimes you make a lot of progress if you do have laser focus on one thing versus another as you go quarter by quarter.

Jon Krohn: 00:15:11 So, with a long-term research project, which is going to end up needing to happen if you're really going to push the boundaries of what's possible in AI, is it possible to have OKRs even for long-term research objectives?

Ben Taylor: 00:15:27 I think it is. It's a complicated answer. So, to back up, the recommendation for anyone who's starting green is you need a crawl, walk, run. So, if you're a budding data scientist, or if you have a new department, you need to get a quick win. And so, I think you and I would probably agree that they need to think about the quick win, what's the crawl scenario. But then as they start getting some momentum under their belt, there are these bigger moonshot objectives that are very important for the business. And those aren't things you can expect to be done this quarter.

Ben Taylor: 00:16:00 And so, I have had pushback on this OKR-over-KPI idea, where people say that's great, but you still need to have the moonshot initiatives. You still need to have the strategic work to make sure that you're prepared for one, two, three, four or five years out. Five years out is really hard to think about from a data science perspective because everything's changing every six months. So, a lot of times we're just focused on the year.

Jon Krohn: 00:16:24 In five years, I plan on having an AGI. That's my-

Ben Taylor: 00:16:27 That's right.

Jon Krohn: 00:16:28 ... five years goal.

Ben Taylor: 00:16:29 I will upload this body and get new knees.

Jon Krohn: 00:16:32 Exactly.

Ben Taylor: 00:16:33 Or I will upload this [inaudible 00:16:34] to a new body.

Jon Krohn: 00:16:37 So, okay. This all makes perfect sense to me. It's definitely the same kind of thing at my company: when COVID hit, my long-term R&D roadmap went on pause, so that we could focus on delivering value. And so, I totally understand that. It sounds like the kind of thing that you described with the green data scientist having a crawl, walk, run mentality. I think the way that I end up breaking it up with my team in typical circumstances, so not when COVID immediately hit, but generally speaking, is I have projects that I know are delivering business value. And those constitute, maybe not the majority of our time, but the plurality of our time.

Jon Krohn: 00:17:30 And then a second track where it's like, "Okay, I can see that there's probably going to be something here that maybe this quarter, certainly this year, it will bring some value to the company." And then a minority of our time ends up being spent on really fun things, trying out applications of new technologies into spaces that I don't even know if there is going to be value. But by playing around with things, you often stumble across great opportunities.

Ben Taylor: 00:17:59 Yeah. And I think for those minority projects, some business leader might see that as a waste of time, but I see it as value, where I've really celebrated the... If you practice learning to learn, you'll be better at learning. And I really have to get to that point because otherwise, I would feel very sad about my college experience, because I studied chemical engineering and I'm not using chemical engineering, so should I have not gone to school? But the takeaway is, in college I learned to learn, and now I can apply that to new problems.

Ben Taylor: 00:18:29 And then passion is so important in data science, to keep sharpening the saw. So, by exposing yourself to passion and these minority projects, when things matter most, you're much more likely to be creative than someone who has not had that exposure.

Jon Krohn: 00:18:46 Yeah. That all makes perfect sense to me. You're preaching to the choir. So, the idea here is to have trends in 2021. So, was the idea here that because of COVID, OKRs and delivering value on AI all of a sudden came to the forefront? And I guess the idea is that even with vaccines unwinding restrictions, you still think that delivering value is something that's going to be big in 2021 and beyond.

Ben Taylor: 00:19:14 I think time to value will remain a key point of focus. And we see that in the data science community, it's becoming easier to do things very, very quickly using open source software or platforms like DataRobot. And the thing I like to say is, the principal consultant today is the free intern tomorrow. And I'm just talking about the work they do. So, you and I could geek out about some work we're doing, but if it falls in line with a common business use case, there's a good chance that one, two, three, four years from now, an intern could do that work. But today, it takes PhD-level understanding to really dive deep on reinforcement learning or deep learning or some federated learning, or something that's overwhelming for a principal consultant, but expected of a free intern in the future, which is great. It's great for society. Maybe not great for the principal consultants. They have to keep running. They have to keep finding new work.

Jon Krohn: 00:20:15 All right. So, you mentioned DataRobot there. And I deliberately, so I learned something. So, audience members, you're going to love hearing this. I learned something about you collectively today, which is that, on the whole, based on research that Ben has done, if I was to ask Ben for his background at the beginning of the program, a proportion of you would've switched off. So, I deliberately didn't do it. But now you've mentioned DataRobot in a sentence, and you may not know that Ben works at DataRobot. Ben, do you mind telling us a little bit about DataRobot?

Ben Taylor: 00:20:47 Yeah. So, DataRobot, they've been around for a while now, eight years. They've raised over $700 million. They just closed a new round of over $320 million, I think a few weeks ago, from Snowflake, if you [inaudible 00:20:59] notable investors. So, they started with AutoML. And that quickly graduated to an end-to-end pipeline, so data ingest, for which they've done a bunch of acquisitions. I think they're on their seventh or eighth acquisition now. They purchased Paxata, and a bunch of other startups, including ours.

Jon Krohn: 00:21:19 And Zeff. You did actually mention that earlier, now that I think of it. Yeah. Yeah.

Ben Taylor: 00:21:25 Yeah. So, our startup, Zeff, was focused on AutoML for deep learning. And we were going after text, images, video. But the thing that was more interesting was combining those data sets together. So, we had a single model that took multiple different types of data, images or audio and video together in a single model.

Ben Taylor: 00:21:42 So, DataRobot, this will sound biased because obviously I work there, I should have some bias or I should go work somewhere else, but I do see them as the leader when it comes to applied AI in general. So, if you look at the number of industries they're in, the number of customers they have, and you compare that to their competitors, they just have more. They have more applied use cases. They have customers with hundreds and thousands of machine learning models deployed. And being a data scientist, that's kind of the high bar.

Ben Taylor: 00:22:14 It's one thing to build a model and to deliver a BI report and say, "Hey, good news. This is accurate." It's another thing to deploy that model and actually sleep at night. What is that model doing? What is it doing a week from now? What is it doing six months from now? So, at HireVue, where I used to work, I was personally involved with model excursions: you train the model, it's static, everything should be great. And then three months later, a customer is complaining. It turns out you had feature drift. Where did the feature drift come from? Came from a vendor. Vendor didn't tell you they changed a threshold.

Ben Taylor: 00:22:44 And so, you can't anticipate all the ways your models will go wrong, but you need to have alarms in place. And so, that's getting ahead of ourselves, but this is in the spirit of productionizing AI.

Jon Krohn: 00:22:57 Nice. Yeah. All those kinds of concerns are the kinds of things that I deal with daily. So, maybe I'm going to come out of this podcast experience asking for a demo. I also think DataRobot is a great name. Whoever did that initially and got that locked down, I think it's such a brilliant name for what you do.

Jon Krohn: 00:23:18 One quick thing that I wanted to make sure of, in case an audience member wasn't aware of it. So, AutoML, automatic machine learning, automated machine learning. The idea there is aspects of the model get configured automatically that historically would be done by a person rather than by a robot.

Ben Taylor: 00:23:39 Yeah. Which is really funny because, I don't know if you remember this, but some of the early data scientists, maybe the more academic types, they really fought AutoML. They didn't think it was possible. They thought you had to sit down and decide, "Should I use a logistic regression right now, or a gradient boosted classifier." They thought that would always be more of an art than a science. It turns out it's just a science. So, when it comes to interpreting the model and making sure it's doing what you want, you really want to involve the subject matter expert. That's still going to be a dialogue. But when it comes to how to best hyperparameter-tune a variety of models, I don't think humans were ever well equipped to do that.
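
As a rough sketch of what that automated search looks like at small scale, the snippet below uses scikit-learn to pick between model families and hyperparameters; a full AutoML platform does far more, but the principle is the same. The data here is synthetic.

```python
# A hedged sketch of automated model selection and hyperparameter tuning,
# standing in for what an AutoML platform does at much larger scale.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=1000, random_state=0)  # synthetic data

candidates = [
    (LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}),
    (GradientBoostingClassifier(), {"n_estimators": [50, 200], "max_depth": [2, 3]}),
]

# Search every candidate family and keep whichever cross-validates best.
best = max(
    (GridSearchCV(est, grid, cv=5).fit(X, y) for est, grid in candidates),
    key=lambda search: search.best_score_,
)
print(best.best_estimator_, round(best.best_score_, 3))
```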

Jon Krohn: 00:24:25 Well, that segues perfectly into the next topic. So, talking about black boxes and understanding them, that was another trend that you identified for 2021: transparent storytelling. So, there are tools out there. You've listed some of them for me, like SHAP, the Grad-CAM visualization method, topic discovery. These allow us to glimpse into the way that a black-box model is working. So, probably many audience members are aware of this.

Jon Krohn: 00:24:59 But if you have a relatively simple machine learning model, like a logistic regression model, you can look at every feature that you have. Maybe you have 10 inputs into your model, and you can see how each one of those is weighted, and how that directly impacts your outcome. But if you have a neural network, you might have millions or billions, or even in some cases in the last year, trillions of model weights. And there's no way that you can look at the model and say, "I understand what's going on here." So, how is transparent storytelling going to help?
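
To illustrate the contrast Jon is drawing, here is a minimal sketch of inspecting a simple model directly; the feature names and data are invented for the example.

```python
# A minimal sketch of why a simple model is easy to inspect: each input
# feature gets one weight you can read off directly.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                         # three invented features
y = (X @ np.array([2.0, -1.0, 0.0]) > 0).astype(int)  # synthetic labels

model = LogisticRegression(max_iter=1000).fit(X, y)
for name, weight in zip(["tenure", "usage", "region_code"], model.coef_[0]):
    print(f"{name}: {weight:+.2f}")  # sign and size show each feature's pull
```

A neural network with millions of weights offers no such direct reading, which is where tools like SHAP and Grad-CAM come in.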

Ben Taylor: 00:25:34 So, I've been able to watch this unfold, and you have as well, in the last five or ten years. The need for storytelling has always been there because of the accusation of a black-box model. And anyone who's doing anything important, they never want to depend on a black-box model, because that just sounds really, really scary.

Ben Taylor: 00:25:53 So, I think the black-box modeling with deep learning, that was in the academic space, where they don't have to be as defensible. But then when you get into applied settings where you can have bias amplification, or you can have a model that's just not working, it's doing things you didn't expect, the importance of being able to tell a story, not to a data scientist, but to a CEO or to a subject matter expert, began to be enabled with the Grad-CAM visualizations, where you can look at activation maps. That became very helpful.

Ben Taylor: 00:26:25 The classic example of activation maps is a famous example online that you can search for, where a machine learning model is classifying a dog as being a wolf. And if you look at where it's activating, it's activating on the snow. So, it doesn't care at all about the dog. And that's where AI can be tricked. And subject matter experts don't like that. So, if you're showing that to a subject matter expert, well, there goes your model confidence. They don't really care what the accuracy is. They're super concerned about what that model is actually learning.

Ben Taylor: 00:26:55 The other thing I really like, this one is a little bit more clumsy. But any deep learning model you train, you can take that final encoding layer at the bottom, and you can cluster it and just look at a t-SNE plot or your favorite cluster plot. And those aggregate groups are fascinating, because it's essentially telling you a story around topics.

Ben Taylor: 00:27:17 And so, what we found, we were working with a client, and it began to cluster out different topics that were unknown to something like ImageNet. So, an example would be a sleeping baby. ImageNet doesn't know what a sleeping baby is, but by building these types of classifiers, we found that the concept of a sleeping baby was very important for an image ending up in a scrapbook. And so, the joke is, we all like our kids, but we like them more when they're sleeping and we can take a picture of them.
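
A rough sketch of the clustering idea Ben describes: take the final embedding layer's outputs for a batch of inputs, project them with t-SNE, and look for topic-like groups. The embeddings below are random stand-ins for real model activations.

```python
# Sketch: cluster a deep model's final-layer embeddings to surface topics.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE

# Stand-in for activations pulled from a trained network's encoding layer.
embeddings = np.random.default_rng(0).normal(size=(1000, 128))

coords = TSNE(n_components=2, random_state=0).fit_transform(embeddings)  # 2D map
labels = KMeans(n_clusters=8, random_state=0, n_init=10).fit_predict(embeddings)

# Plotting coords colored by labels reveals aggregate groups; with real
# activations, a group might correspond to a concept like "sleeping baby"
# that the original label set never named.
```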

Ben Taylor: 00:27:46 Yeah. So, I think to wrap up that thought, it's so important for us to partner with subject matter experts. And I've gotten to the point now, I don't want to say I give up, but I want to invite the human that's been working the process the longest to the meeting immediately. Because I've been in meetings where the person you should have had in the meeting was a technician; they're not a VP, they don't have the seniority to be at the meeting. And then later when they get brought in, you realize all the mistakes you've been making. So, I'm a huge fan of the underwriter, the technician. Find me a human that's worked that process for decades, and on the machine learning side of things, we'll get a lot of insight.

Jon Krohn: 00:28:24 So, when you talk about transparent storytelling as something that takes off in 2021, you're not just talking about the tools, though there are more and more of these tools available for explainable AI, or XAI. What you're also suggesting is that, if it's not a trend, it should be a trend, that the people who are actually down on the front line building these models should be involved even in high-level discussions with management.

Ben Taylor: 00:28:52 Yeah. Yeah, definitely. I think that's mission critical. So, even some of these tools that we've mentioned, they might be a little technical. They're definitely much more technical than a technician would comprehend. But it's our job as data scientists to get it into a format that is understandable. And I think that's really how I see the world. I just see the world and the businesses that operate in it as having a lot of processes. They have a lot of business processes with data flowing around, and they are all opportunities for machine learning applications. And if you're not inviting the most senior human who understands that process through experience, you're missing out.

Ben Taylor: 00:29:31 It reminds me of the gray-haired technician, like in a manufacturing plant. Why does everyone go to them for advice on new problems? Because they have a lot of experience. Experience came through time. They've seen a lot of edge cases, a lot of outliers. That person can be really valuable in an AI discussion.

Jon Krohn: 00:29:47 I agree a hundred percent. And I'd like to think, personally, and anyone who's listening to this may have experienced me doing this, that I'm pretty good at bringing those people into the meeting. And mostly just because I don't want to end up in a situation where down the road, we've committed to something and we could have done a better job. It makes me really nervous.

Jon Krohn: 00:30:10 So, in situations where I realize, okay, we're talking about something here that, because I wasn't on the keyboard actually coding this up, because I haven't actually looked at samples of the data, I make a note and make sure I talk to those people and loop them in, before any serious decisions are made. Good advice.

Ben Taylor: 00:30:32 They're also the best people to hold you accountable on what is the KPI, because the KPIs could differ. So, if you're talking to a VP, they might have something in mind, but if you're talking to the technician, they've got daily metrics they deal with there, which may not be top of mind to an executive.

Jon Krohn: 00:30:48 The AUC that you mentioned earlier.

Ben Taylor: 00:30:52 Yeah. I'm embarrassed to say, I remember having sales meetings, trying to educate prospects on the value of an AUC chart. These were like HR prospects, not very technical.

Jon Krohn: 00:31:06 Yeah. I bring up the AUC in pitches, probably more often than I should, but I try to just say, "A hundred is the best, and look at how close we're getting."

Ben Taylor: 00:31:18 I've regressed all the way down to the two- or three-bar chart, where you can just say the third bar is higher than the left bar or the first one.

Jon Krohn: 00:31:27 Yeah. I mean, that makes a lot more sense. All right. So, I think that covers the transparent storytelling as another trend.

Ben Taylor: 00:31:35 Yeah.

Jon Krohn: 00:31:36 So, the third one coming up here, the third trend for 2021 is federated learning. So, I think I understand what federated learning is. You can correct me here. But it's a situation where you try to learn off of people's data without actually getting any access to their individual data points. Is it something like that?

Ben Taylor: 00:32:02 Yeah. That's spot on. And I've got a COVID story related to this that hopefully is very upsetting for all the people listening; it's upsetting for me. So, I was talking to a senior health informaticist in Utah. So, that's the data scientist equivalent, but they actually work with patient-level data. And they were saying that Utah has not had enough COVID deaths to understand the disease. And I think at the time New York had over a thousand.

Ben Taylor: 00:32:27 And so, the issue was patient-level data in New York could not be shared with other hospital networks. So, hospitals would share with hospitals, but they won't share across hospital networks because of HIPAA regulation and the process to get that approved. This should be super upsetting for people that are listening: I still think, to date, the US does not have a national database of COVID patient data.

Ben Taylor: 00:32:52 And so, a lot of these studies are coming out of single hospital networks, but how nice would it be to have all the data in a single place? And so, there are a few ways to approach that. So, one way, you could just say, "Well, let's anonymize it all." Take out patient-identifying information and throw it into a central repository. And there have been some efforts in the past where they've done that.

Ben Taylor: 00:33:12 Federated learning allows you to just come up with a standard, and the machine learning models are actually learning at the individual hospital networks. And then you can imagine the dumbest approach is they would just average their weights together, a weighted average, in that [inaudible 00:33:25] the final linear model.

Ben Taylor: 00:33:29 Yeah. So, that's the idea of federated learning, is you can get around privacy law. And for companies, this also becomes a problem when you get really big data sets. You don't really want them to be going around everywhere. Or like edge applications, you don't want all your data streaming up to Amazon to train something. You'd rather have it train locally and then send weights up at the end of the day.
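
Here is a minimal sketch of the "dumbest approach" Ben mentions: each site trains locally, only weight vectors leave the building, and a central server averages them, here weighted by each site's sample count. The numbers are invented.

```python
# Sketch of naive federated averaging: raw patient rows never move,
# only model weights do. All numbers below are invented.
import numpy as np

def federated_average(site_weights, site_sizes):
    """Average weight vectors across sites, weighted by each site's data volume."""
    sizes = np.asarray(site_sizes, dtype=float)
    stacked = np.stack(site_weights)                  # shape: (n_sites, n_params)
    return (stacked * sizes[:, None]).sum(axis=0) / sizes.sum()

# Three hospitals' locally trained linear-model weights.
global_weights = federated_average(
    [np.array([0.9, -0.2]), np.array([1.1, -0.1]), np.array([1.0, -0.3])],
    site_sizes=[1200, 300, 500],
)
print(global_weights)  # a central model, built without sharing patient data
```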

Jon Krohn: 00:33:50 Right. That makes sense. And I guess another potential advantage to this is, even when you remove personally identifiable information and you put that into a central database, there are a lot of circumstances where you might be able to figure out who that person was anyway. And we're collecting more and more data, from your Apple Watch and all kinds of sources, like the example you gave earlier of having a hundred cameras in your home so that you can't have any accidents at home anymore.

Jon Krohn: 00:34:22 As we have more and more and more data, stripping PII out of that, you're like, "Okay, well, we'll take the person's name and birthday out of the year of footage that we have from their house." There's identifiable information in data that we have today, even when the personally identifiable information is stripped out. And as we have more and more, and more data from more and more sources, that's going to become more and more of a problem.

Ben Taylor: 00:34:51 And there are some very famous data challenges that have caused problems. I think in the first Netflix Prize, they were able to reverse engineer someone's identity. I think it ended up in a lawsuit. I just know the first Netflix Prize was a problem for that reason. So, thinking about all the different ways that you can anonymize data, if you get enough features, it gets a little tricky. So, that's the benefit of federated learning. And I know some groups are working on that now. But the COVID example, I think, should be very upsetting for anyone. I'm amazed that individual wasn't mad saying that out loud, that we haven't had enough people in Utah die. How can you say that and not be angry?

Jon Krohn: 00:35:38 Yeah. So, I guess the trend here would be that these kinds of situations, it's similar to the way that delivering results, so OKRs around AI ROI... So many abbreviations, I can't even speak. The way OKRs around AI ROI became more prominent because of COVID, we'll see the same kind of thing with federated learning, where COVID shows how, if we'd had these federated systems in place before, then people in Utah could have received better treatment.

Ben Taylor: 00:36:17 Yeah. And especially on a global scale, as long as you have data standards in place, the whole globe could have benefited much, much faster from the COVID data, the patient-level data. If you have the same patient electronic health record with the same format and features, that becomes a lot more straightforward, and federated learning really nails it. Then the privacy concerns will be less of a concern. It's more about where the federated model is running and where the central server is that they can share information with.

Jon Krohn: 00:36:51 I love it. That sounds great. Do you guys do federated learning stuff at DataRobot?

Ben Taylor: 00:36:56 We don't right now to my knowledge.

Jon Krohn: 00:37:01 All right. Well, this next topic, this has got to be one that DataRobot is involved with, ML productionization. It sounds like that's one of the core things that DataRobot is doing, end-to-end type work with models. So, tell us about ML productionization and why you think this is something that is taking off right now?

Ben Taylor: 00:37:23 Yeah. So, AI has felt very experimental. And I've definitely been part of those experimental cycles where you're working in a notebook, trying to build a model, and now it's time to deploy it. And we hear horror stories about throwing Python code over to engineering, where they're not familiar with it, and you want them to productionize that code, or like an R script or something. And there are open source tools that are making that easier. It's really easy to throw a Flask wrapper or something around your model call. And there are even open source packages that will do a lot of that work for you. But when it comes to productionizing models, there's so much more to talk about.

Ben Taylor: 00:38:01 So, one of the things we talk about is continuous learning. So, you're going to get more data. You're going to have things happen. You're going to have a regime change. You might have a COVID hit. Do you have a process in place for you to rapidly deploy new models? A lot of AI projects feel more like one and done. They start as one-off research efforts. And then they celebrate the milestone of getting one thing into production many, many months later.

Ben Taylor: 00:38:25 And so, the appetite to retrain the model isn't there. And even when they do, it's like they're starting all over again. They don't have a process for that. And they probably lost their initial training set. That's never happened to me before. Just kidding.

Ben Taylor: 00:38:42 I just laugh because I give all this advice, and normally I'm just throwing myself under the bus for bad past behavior, where someone wants the model retrained a year later, and you're like, "Well, crap, you should have told me that, because that first training set is gone or can't be found."

Jon Krohn: 00:38:55 Yeah. Or the way that the raw data were transformed, that could be lost. So, people, maybe not you, but people other than Ben, are often good at keeping the raw data somewhere in cold storage. But when you're creating a model, you have all these different ways that you pre-process it, and you might not have the notebooks that that happened in anymore. You might not have any saved versions of the data. And so, you might have some amazing results from a previous model, and you have the charts, maybe printed out at the bottom of a Jupyter notebook or put in that presentation to management last year. And then you're like, "We need to do this again. We need to retrain the model." And you can't even get the same quality result as you had a year earlier. So, I totally understand.
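
One hedged sketch of avoiding the lost-preprocessing problem Jon describes: bundle the transforms and the model into a single pipeline object and version that artifact, so a retrain a year later starts from the exact same steps. The file name and data are invented.

```python
# Sketch: persist preprocessing and model together as one versioned artifact.
import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, random_state=0)  # stand-in data

pipeline = Pipeline([
    ("scale", StandardScaler()),                # preprocessing travels with the model
    ("clf", LogisticRegression(max_iter=1000)),
]).fit(X, y)

joblib.dump(pipeline, "model_v1.joblib")        # transforms + weights in one file
reloaded = joblib.load("model_v1.joblib")       # a year later, same steps apply
```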

Ben Taylor: 00:39:52 Yeah. So, continuous learning is one, deployment is one. The other one that I think needs to be a higher priority for people is defending prediction-level insights. So, if I'm giving you a prediction, I shouldn't just give you a confidence score. I should give you a feature breakdown. I should tell the subject matter expert a story about why this prediction is predicting high or low, because you're giving them a little bit more intuition for them to raise an alarm and say, "Yikes, this is really scary. We need to escalate to the data science team." Whereas there are so many models deployed in production that are black boxes to the user. It's just, "Here's a confidence score. Live with it or deal with it."
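
A rough sketch of attaching a feature breakdown to each prediction, using the SHAP library Jon mentioned earlier; the feature names and data are invented, and a real deployment would serve this breakdown alongside the score.

```python
# Sketch: a per-prediction feature breakdown to ship with each score.
import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=500, n_features=4, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
row = X[:1]                                    # one incoming prediction
contribution = explainer.shap_values(row)[0]   # one attribution per feature
for name, c in zip(["tenure", "usage", "spend", "tickets"], contribution):
    print(f"{name}: {c:+.2f}")  # the story behind a high or low prediction
```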

Ben Taylor: 00:40:30 Then the other thing we were already talking about is feature drift. So, there are so many issues you can have with a model. And I think a naive data scientist might say, "Well, the model is static. So, why do I need to worry?" But your features aren't. And it doesn't matter what data type you're using. You can get sensor drift. You could be doing deep learning, and someone installs a new camera system thinking that's going to be helpful for you. And the data scientist is smart enough to know that that's concerning. But what if they're not the boots on the ground?

Jon Krohn: 00:41:05 Yeah. And behavior is constantly changing over time. So, it's probably even hard to find applications where the human behavior isn't gradually changing over time. Okay, so maybe face detection, maybe faces don't change that much. But with anything else... We work a ton with job descriptions and resumes in my field. And so, if I train a model and then a new JavaScript library comes out, the model won't know what to do with that. If I don't retrain, it won't have any sense of what to do with that.

Ben Taylor: 00:41:38 Yeah. So, with our old startup, we had an experience, and this really surprised me. So, with deep learning, I think sometimes bigger always sounds better. I'm not talking about the weights, I'm just talking about the training set. So, if I told you I had trained on a data set that was a million, they're like, "Oh, that sounds pretty good." But if I tell you I trained on 20 million, you think, "Well, that sounds really good."

Ben Taylor: 00:41:57 So, I remember we trained a model. It was huge. We trained on, I think, 20 million images. And it was a not-safe-for-work model. And we go to deploy this model, thinking this model has been trained on all sorts of images. And it had really high accuracy, by the way. But we deployed it into the wild, and it starts misclassifying babies in diapers as being not safe for work.

Ben Taylor: 00:42:23 And the data scientists would just kind of laugh and say, "Oh, you just have to include more babies in your training set." But the thing that's more upsetting is we trained on 20 million images. So, we trained on a huge dataset and it still misclassified this baby. I like bringing up this example because I feel like it's an example where people can laugh about it, but pretend that this model was life and death. There are machine learning models that are life and death. How could you catch the first baby?

Ben Taylor: 00:42:50 So, we can have a thousand babies come through and say, "Oops, oopsies, we're going to react to this." But how do you catch the first baby? You get the machine learning prediction and you don't act on it; you actually raise an alarm. And that's why it's so important to have feature drift detection, to actually say, "This feature set coming through is unique enough compared to the training set. We need human eyes on it." And so, I feel like that's a level of maturity that has been foreign to the experimental side of AI, but not for people who deal in production. Yeah.
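
A minimal sketch of the alarm Ben describes, assuming an outlier detector fitted on the model's training features: score each incoming feature vector against that distribution and escalate anything unlike the training set to a human. The data is synthetic.

```python
# Sketch: flag incoming feature sets that look nothing like the training data.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X_train = rng.normal(size=(5000, 10))      # stand-in for the training features

detector = IsolationForest(random_state=0).fit(X_train)

incoming = rng.normal(size=10) + 6.0       # a "first baby": far from training data
if detector.predict(incoming.reshape(1, -1))[0] == -1:  # -1 means outlier
    print("Feature set is unlike the training data; route to human review.")
```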

Jon Krohn: 00:43:23 Yeah. I mean, I feel like a really bad data scientist right now, because I don't have a feature drift detector of any kind. And I wasn't really aware that that was a thing at all. That's something that your company provides?

Show Notes: http://www.superdatascience.com/433 25

Ben Taylor: 00:43:35 Yeah. They provide it for their pipeline. And I've built that stuff in the past when I was at HireVue. Because a lot of companies that have feature drift detection, it's because they've had to react. They've had an issue, a mess that they've had to react to, and they've had to build systems in place. But for people that are out of the gate deploying, it's pretty easy to expose a lot of liability with what they don't know, when it comes to bias amplification, feature drift detection, continuous learning, model deployment, or even bake-offs, where you deploy two models at once and see if one is beginning to outperform the other.

Ben Taylor: 00:44:14 So, imagine if I did like a weekly retraining, but maybe I have rules in place that the new model doesn't replace the old model, unless the performance is higher, because that might not always be the case. So, that type of business logic is stuff that should just happen. From a business perspective, that should just happen. But from an experimental side, that is a lot of new work that people have to sign up for.
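
Here is a hedged sketch of that business logic: retrain on fresh data, but only promote the challenger if it beats the incumbent on a held-out set. The data and split are invented.

```python
# Sketch of a champion/challenger gate for weekly retraining.
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)  # stand-in data
X_new, X_holdout, y_new, y_holdout = train_test_split(X, y, random_state=0)

champion = LogisticRegression(max_iter=1000).fit(X_new, y_new)

def weekly_retrain(champion, X_new, y_new, X_holdout, y_holdout):
    """Train a challenger on fresh data; promote it only if it scores higher."""
    challenger = clone(champion).fit(X_new, y_new)
    if challenger.score(X_holdout, y_holdout) > champion.score(X_holdout, y_holdout):
        return challenger        # promote the new model
    return champion              # keep the incumbent

champion = weekly_retrain(champion, X_new, y_new, X_holdout, y_holdout)
```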

Jon Krohn: 00:44:41 All right. Well, that makes a lot of sense. I agree with you wholeheartedly. I think that we are at this transition point and this is something that's accelerating. I know that there's a lot of demand for learning about ML productionization. Because O'Reilly, who up until COVID were one of the preeminent conference organizers in the data world, now run online-only conferences they call Superstreams. And the next one they have coming up in March is on ML models in production.

Ben Taylor: 00:45:12 That's awesome.

Jon Krohn: 00:45:14 Big topic. And yeah, I mean, it seems, with all of the considerations that a largely experimental person like me worries about when I hand off a model to the engineers to put into production, like there are maybe orders of magnitude more things to worry about in production. And I need to be more concerned about it, and probably a lot of listeners do too.

Ben Taylor: 00:45:38 I think definitely, because engineers are very task oriented. They're going to just get something done in the sprint, get it put into production. And the idea of being proactive and thinking about the possible issues, the data science team has to own that. The things we've talked about, you can't expect the engineers to say, "I knew we were going to have feature drift." No, the data science team needs to be proactive, and they need to have those conversations. And this probably leads naturally into the bias discussions because it's a [inaudible 00:46:07].

Jon Krohn: 00:46:07 Exactly.

Ben Taylor: 00:46:09 Every bad headline with AI companies doing something bad when it comes to bias, racism and sexism, we don't have to name names, but with each one of those, they were reactive. They weren't proactive. So, if you can be proactive and think, how can this model amplify racism, sexism, ageism, or other types of bias? Some industries make it easy because you have compliance. So, if you're dealing with insurance or HR or banking, they'll have compliance. But I think everyone should just be proactive; that'll help a lot.

Jon Krohn: 00:46:46 I totally agree. I work in the human resources space, so we have to be able to demonstrate how our models do not make decisions differently based on gender or ethnicity. It's a huge part of what we do. But it is interesting how we've seen a lot of headlines in 2020 with big tech companies and models that they deployed having surprising results, for sure. If they had anticipated that it would have this kind of result, they probably wouldn't have deployed it, but they weren't looking out for feature drift, like you're describing.

Jon Krohn: 00:47:26 So, in 2020, yeah, as I said, with the time that we have left, and probably just trying to avoid getting ourselves in trouble by speaking about specific cases, there has been a huge splash in the past year around AI ethics. So, do you think that 2021 will be a turning point where companies, including the big tech companies, move away from talking about dealing with these issues to really getting at the heart of them?

Ben Taylor: 00:48:02 Yeah. That's a great question. I love this topic because I think when you see examples of big companies failing, people want to throw their hands up in the air and say, "AI ethics is too hard. It's too difficult." And I've definitely trolled people intentionally where I've posted things on LinkedIn before, where I've said... What have I said that has pissed people off? I've essentially said that solving AI ethics issues when it comes to racism and sexism is easier than rocket science. I essentially said something like that because I felt like it was fifth grade math. I felt like it was pretty straightforward to get ahead of this. And that really upset people. So, there's definitely an academic thread where they feel like this is the impossible problem, we can never fix this.

Ben Taylor: 00:48:50 And I want to make it really clear to people that are listening, I have a strong opinion that we're already making progress. So, I know there've been things in the news where a lot of people feel discouraged. We've had big setbacks. But we already have the work you're working on, the work being done at HireVue, Pymetrics, where every AI model that goes into production has an adverse impact report when it comes to protected classes. So, any model in production, go look it up and say, "Okay, they're held accountable for this."

Ben Taylor: 00:49:18 And then the other thing I like to bring up is, if you want to accuse me of black mirror technologies, applying AI in HR, all I have to do is point out the human black mirror. And if you look at what the humans are doing in these HR processes with our [crosstalk 00:49:33], there are so many skeletons in those closets of biases with attractiveness, ageism. It's a very long rabbit hole of bias.

Jon Krohn: 00:49:47 Yeah. There's a story I know where I absolutely would not name anybody related to this, but I heard a story where a recruiter was told by a hiring manager, "I'm looking for people who played hockey at the same school that I graduated from."

Ben Taylor: 00:50:07 Oh, wow. That's very specific.

Jon Krohn: 00:50:11 Yeah. You're like, "Wow. You can't do that. You can't have that be what you're looking for in this role." It wasn't a hockey player role. It was a financial services role.

Ben Taylor: 00:50:25 Wow. Yeah. The problem with humans is they can't escape their unconscious bias. So, having a name on a resume, first and last name, it's impossible to not look at it. And that's why you're better off having systems scrub out the name. So, is the name really going to play a role in your decision? And if it is, you need to talk about that. Why is that so?

Ben Taylor: 00:50:51 The other fascinating thing about AI ethics when it comes to HR analytics is there actually are examples where bias is allowed. And bias is allowed if it's mapped to performance. So, people don't really talk about this that often, maybe because it makes them feel uncomfortable. But if I have a call center, and all my metrics are attached to first call resolution, customer feedback, a list of business metrics we come up with, and I'm building models mapped to performance, and if English as a second language, a strong accent, is negatively impacting my bottom line, in the US I can legally discriminate against people.

Ben Taylor: 00:51:31 I can't use their race to say, "Oh, I think you have this problem." I can put them through an assessment. So, if I'm putting them through an assessment, you might find, looking at the data, that I do have some type of bias. Bias is fine if it's backed by performance. Most companies can't back it by performance. So, you think of like a banking teller or a flight attendant, they can't back it. So, the vast majority of companies, they can't back it with performance. But I think that's a very interesting thing that people don't talk about, that most of the time bias is not okay. But there are times when, think of the beauty industry, you're going after a certain demographic, and you have a lipstick model you're trying to hire, it's probably not going to be you. It's not going to be me.

Jon Krohn: 00:52:09 I have great lips.

Ben Taylor: 00:52:13 You do. [inaudible 00:52:14].

Jon Krohn: 00:52:14 There's a reason why I'm a podcast host.

Ben Taylor: 00:52:23 In AI, we are making progress. So, despite the negative news, we are making progress in general, in the AI community, when it comes to the research that's being done, and the progress with biased models going into production. It still happens, but not for the people that are really dedicating their careers to this topic.

Jon Krohn: 00:52:43 Yeah. I also think that there is real progress being made. And you even see it in a lot of the big academic conferences, like NeurIPS, they have tracks for AI ethics. And if you're doing research on AI ethics, I think you have a better shot than maybe in a lot of other categories to have your research be featured at the conference because the conference appreciates how important the topic is.

Ben Taylor: 00:53:09 Yeah. And real quick, I was going to add for your listeners, there's a famous dataset called the First Impression dataset. It's a YouTube dataset, seven seconds of someone talking into a camera. Someone went through and paid Mechanical Turk workers on Amazon to score people on Big Five personality, which is interesting. But then my favorite one is they asked whether or not you would give this person a job. What's the first impression? Hence, the name of the dataset.

Ben Taylor: 00:53:35 And so, I was one of the first people that analyzed that entire dataset for race, gender, and attractiveness. And I presented my results in Chicago, and showed that it was racist. It was racist against black women. And it was even more upsetting to show that there's a very strong attractiveness bias. So, if you're in the top 10% for attractiveness, men and women, the bias was bigger for women. And this really shows you the human behavior. And it was interesting because I was showing this at SIOP, a psychology conference, and the people in the room, they always knew this, but this was like the first time they'd seen a lot of data to back it up. But if anyone's interested in that, the blog is called Racism Under Every Rock. It's on LinkedIn. It's pretty easy to find on Google, Racism Under Every Rock, and you'll find that analysis.

Jon Krohn: 00:54:24 Nice. That sounds very interesting. So, we are rounding through now the topics that we wanted to cover today. I just have one last question for you. Because at least I have an answer prepared for this. Do you think that there are any particular software languages or packages, or tools that are going to take off in 2021 that data scientists really need to be on top of?

Ben Taylor: 00:54:52 That's a great question. It's interesting, because I've lived and breathed deep learning steadily for the last four years. Right? I've presented to the MXNet group at Amazon. When I started deep learning, I started with MXNet because they had the best performance at the time. But since then, I think TensorFlow and PyTorch have caught up. Amazon uses MXNet internally at scale. It's interesting because everything's always moving.

Ben Taylor: 00:55:18 So, I'm an outspoken critic of TensorFlow. I hate TensorFlow with a passion. And really, people who have criticized me say I'm leaning too much on TensorFlow 1. I've talked to Google employees who have said that TensorFlow version 1 was terrible, just a technical-debt nightmare. And I think I just can't forgive TensorFlow for that. I know people are saying TensorFlow 2 is better.

Ben Taylor: 00:55:44 The problem for me is I've come up to speed on MXNet, where I feel like I'm an expert. I know [inaudible 00:55:52] and MXNet inside out. If I were starting over, maybe I'd focus my attention on PyTorch, because I think it's getting traction. There's a lot of interesting things happening there.

Ben Taylor: 00:56:02 I'm sad, because Keras was so intuitive before it got mixed up in TensorFlow. The original Keras package was beautifully written. You jumped into the code, and the code was really easy to understand. The example I give is their image processing: you went into pre-processing, and there's an image.py file with all these complicated image transforms. You'd go read the functions, and they were beautifully written. So, François Chollet, the original author, was a coding genius. But Keras never hit the performance milestones, like throughput, for production-grade work.
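
That intuitiveness is easy to illustrate. Here's a minimal sketch of the kind of readable model definition Ben is praising, written against the Keras API bundled with TensorFlow (the original standalone keras package read essentially the same way); the layer sizes here are arbitrary illustrations:

    from tensorflow import keras

    # Define a small feedforward network in a few readable lines.
    model = keras.Sequential([
        keras.layers.Dense(64, activation="relu", input_shape=(20,)),
        keras.layers.Dense(1, activation="sigmoid"),
    ])

    # Configure training with one call...
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])

    # ...and training itself would be one more:
    # model.fit(X_train, y_train, epochs=10)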

Jon Krohn: 00:56:37 Yeah.

Ben Taylor: 00:56:37 So, what do you think are the emerging packages for 2021? Because you're deep in this space too.

Jon Krohn: 00:56:44 Yeah. So, I've never used MXNet. I've looked at some MXNet code and convinced myself that if I had to use it, I could probably figure out how to do a lot of the high-level stuff. I got started with TensorFlow 1 and Keras, and I'm now adept at TensorFlow 2. I've done lots of instruction in TensorFlow 2; I probably teach in TensorFlow 2 the most. But I love PyTorch. I use PyTorch now wherever I can. Currently, the production models at my company use TensorFlow, but I would not be surprised if in the future we were using PyTorch. In 2021, I wouldn't be surprised if we switched over.

Jon Krohn: 00:57:35 Some stats that I was looking at earlier this year: in terms of Google search popularity, PyTorch has pretty much caught up to both the TensorFlow and Keras names. And in terms of job postings, in the US at least, for every three job postings that mention TensorFlow, there are two that mention PyTorch. So, it is really catching up. And I completely understand why.

Jon Krohn: 00:58:01 I think that TensorFlow 1, yes, that was especially clunky to work with. There was this multi-step process, a three-step process, for you to do even the simplest thing, like "I want to add variable X to variable Y." It was a three-step process of allocating those variables to memory before you could actually flow data into them and add X plus Y. So, I think that's probably what people are talking about when they talk about TensorFlow 1 being difficult. But I still think TensorFlow 2 is a lot more difficult than PyTorch.
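
For example, that three-step pattern looks like this minimal sketch, assuming the legacy 1.x API (accessed here through tf.compat.v1 so it runs under TensorFlow 2):

    import tensorflow.compat.v1 as tf
    tf.disable_eager_execution()

    # Step 1: declare symbolic inputs; no data flows yet.
    x = tf.placeholder(tf.float32)
    y = tf.placeholder(tf.float32)

    # Step 2: build the graph node for the addition.
    z = x + y

    # Step 3: open a session and actually feed data through the graph.
    with tf.Session() as sess:
        print(sess.run(z, feed_dict={x: 2.0, y: 3.0}))  # 5.0

In PyTorch, or in TensorFlow 2's eager mode, the same addition is a single line that executes immediately.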

Jon Krohn: 00:58:40 If you're very, very comfortable in Python, you're used to code being Pythonic, you're used to working with NumPy, SciPy, and scikit-learn, PyTorch feels exactly the same as those. In fact, the vast majority of the time, the way that you would do something in NumPy is exactly the same in PyTorch, or very similar. And so, it's so easy. And in a way that's hard to describe, I have fun using PyTorch. Even though I've used PyTorch for way fewer hours of my life than TensorFlow, I enjoy using it.
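
A quick sketch of that similarity, using an arbitrary 2x2 array as the example: the same operations read almost identically in the two libraries.

    import numpy as np
    import torch

    a_np = np.array([[1.0, 2.0], [3.0, 4.0]])
    a_pt = torch.tensor([[1.0, 2.0], [3.0, 4.0]])

    # Reductions use nearly identical syntax (axis= vs. dim=).
    print(a_np.mean(axis=0))  # [2. 3.]
    print(a_pt.mean(dim=0))   # tensor([2., 3.])

    # Transpose and matrix multiplication are written the same way.
    print(a_np.T @ a_np)
    print(a_pt.T @ a_pt)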

Jon Krohn: 00:59:15 And now, when I'm presented with anything from simple problems, like calculating the partial derivative of one quantity with respect to another, all the way up to the most complicated things, like building a deep neural network, I want to be using PyTorch instead of TensorFlow. So, I will not be surprised if it overtakes TensorFlow in Google search popularity in 2021.
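
As a minimal sketch of that partial-derivative case, PyTorch's autograd makes it a few lines. The function z = x²y here is just an illustrative example:

    import torch

    x = torch.tensor(3.0, requires_grad=True)
    y = torch.tensor(2.0, requires_grad=True)

    z = x ** 2 * y   # z = 18
    z.backward()     # populates .grad with the partial derivatives

    print(x.grad)    # dz/dx = 2xy = 12.0
    print(y.grad)    # dz/dy = x^2 = 9.0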

Ben Taylor: 00:59:41 I think the DeepMind folks at Google use PyTorch, so they're not even all on their own TensorFlow. I think I heard that somewhere; I'd love to throw in a reference. It's interesting, because TensorFlow does make me mad, and I've had all these arguments about it. The thing that made me mad was people would say, "Well, it's the most popular, so it's obviously the best." But if it didn't have the Google backing, would it be popular, or would it be abandoned? That's just my opinion. But then there was the Bazel compiler nonsense. And then GPU drivers: TensorFlow had a really hard time keeping up with the latest GPU drivers, where MXNet and other platforms were always ready to go, precompiled so you could pip install them. With TensorFlow, so many people were stuck having to compile their own TensorFlow to use the latest drivers. It was just really, really poor software support from Google's side.

Jon Krohn: 01:00:39 The thing that TensorFlow still has today is so many side libraries: for data pre-processing, input/output, serving, and having your model packaged up smaller for running on an embedded device or in someone's browser. So, there's all of these cool-

Ben Taylor: 01:00:57 Oh, yeah. You have TensorFlow JS, right? Or JS TensorFlow.

Jon Krohn: 01:01:01 Yeah, exactly. TensorFlow.js allows you to have a model run in somebody's browser, and you could have trained that model on your local machine or across a number of servers. TensorFlow Lite allows you to package it up and put it on a phone that doesn't have the same kind of compute as you'd expect on a server or a laptop. And so, all that extra ecosystem allows you to potentially have more deployment flexibility.
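
For instance, converting a trained Keras model to TensorFlow Lite for on-device use is a short sketch like this; the tiny model and the output file name are hypothetical placeholders:

    import tensorflow as tf

    # A stand-in for a model you have already trained.
    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])

    # Convert the model to the compact TFLite format.
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    tflite_model = converter.convert()

    # Write the flatbuffer that a phone app would load.
    with open("model.tflite", "wb") as f:
        f.write(tflite_model)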

Jon Krohn: 01:01:29 But something that I think people don't talk about enough is the Open Neural Network Exchange, ONNX. If you Google "GitHub ONNX," it brings you to a GitHub page for a tool that allows you to port your models between different deep learning libraries, including between PyTorch and TensorFlow. And so, as I've already described, I enjoy making my models in PyTorch. I could have a great day working in PyTorch all day, and then at the end of the day use ONNX to port the model over to TensorFlow and deploy it in a serious production system.
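
The PyTorch end of that workflow is a minimal sketch like this, where the model architecture and the output path are hypothetical illustrations:

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
    model.eval()

    # An example input tensor defines the shapes traced into the graph.
    dummy_input = torch.randn(1, 10)

    # Export to ONNX so another runtime or framework can serve the model.
    torch.onnx.export(
        model,
        dummy_input,
        "model.onnx",
        input_names=["features"],
        output_names=["score"],
    )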

Ben Taylor: 01:02:11 We did the same thing with MXNet, because MXNet is really good at multi-GPU. I think TensorFlow, for a very long time, struggled with multi-GPU utilization, or just the way they handled it. In MXNet, multi-GPU is literally a Python list. In TensorFlow, I don't know what 2.0 is like now, but version 1 was this graph object; it was so complicated. So, we would train on MXNet, and then we would use the Open Neural Network Exchange to deploy to Core ML for iPhone, or to TensorFlow Lite.
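
That "Python list" point looks roughly like this sketch in MXNet's Gluon API, assuming a machine with two GPUs available; the one-layer network and random batch are just illustrations:

    import mxnet as mx
    from mxnet import gluon

    # Multi-GPU in MXNet: the device context is literally a Python list.
    ctx = [mx.gpu(0), mx.gpu(1)]

    net = gluon.nn.Dense(1)
    net.initialize(ctx=ctx)  # parameters are replicated across both GPUs

    # Each batch is split across the contexts in the list.
    data = mx.nd.random.uniform(shape=(8, 10))
    shards = gluon.utils.split_and_load(data, ctx)
    outputs = [net(shard) for shard in shards]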

Jon Krohn: 01:02:43 There you go.

Ben Taylor: 01:02:44 Yeah. One thing I forgot to add to that list, just real quick, is [inaudible 01:02:49]. It's so funny to me, because before COVID, I remember a data scientist I'm friends with in Gunnison, it took him six months to get a remote-work job in data science. And today that's laughable, because if you're good, you can be in Gunnison; people don't care where you are. Gunnison is in Colorado, so it's an outdoor paradise. If you're an outdoor junkie and you want to ice climb, go live in Gunnison; there's zero tech there. Before COVID, people in data science typically needed to be near some type of tech center, or at least somewhere that would have a meetup for you to go and learn. And now, with COVID, the opportunities for remote work are endless, as long as you have good internet.

Jon Krohn: 01:03:35 Yeah. And I guess that is something that we can anticipate is going to change the workplace forever. I was blown away: we did a survey at my company last week, just in a town hall, asking people, when we can return to the office safely, "Would you like to be in the office full time? Split between office and remote, but primarily office? Split, but primarily remote? Or totally remote?" There are over a hundred people in my company, and not one person said full-time in the office. I didn't expect that.

Jon Krohn: 01:04:21 I love being in the office. But even for me, having the flexibility to spend at least a couple of days a week working from home, I think there are a lot of advantages to that. I don't have kids, but I can only imagine how much more flexibility that provides if one parent can work from home Monday and Wednesday, and the other one Tuesday and Thursday. I can imagine that makes life a lot easier.

Ben Taylor: 01:04:43 Yeah. The funny thing about working from home is that a lot of these homes haven't been built with working from home in mind. You walk into the classic home, and right through the front door it's like, oh, French doors, front office. That is the dumbest design ever, because where is your noise buffering?

Ben Taylor: 01:05:03 And so, we're building a house right now where my office is on the other side of the master bedroom, so there's a huge sound buffer. The kids can be playing out there, and I don't care if I'm on a podcast or something. I obviously care, but I can't hear them, whereas with the standard design of a house, with the office right next to the front door, you can. I don't know why that is. Are you actually going to invite someone into your office, like, "Here, come sit down"? I don't know.

Jon Krohn: 01:05:30 Yeah. I mean, I don't really know anything about that, because living in New York, the idea of being able to be far away from someone else in the apartment, I can only imagine. So, I didn't know that people were building their offices behind French doors, right inside the front hallway. Yeah, that doesn't seem to make a lot of sense to me. We actually had the experience, when you were on [inaudible 01:05:54] early in 2020, of your kids... Yeah, as we were recording the podcast, we didn't video it at all, so I didn't know what was going on, but we could hear thumping, [inaudible 01:06:04].

Ben Taylor: 01:06:04 Yeah. They were banging on the door. It's so funny, all the emergencies they have during a meeting. I haven't had this during a keynote, but I was moderating a panel and there was some emergency, and I had to hurry and ask the question to the panelists, then put myself on mute. The emergency was, "My Xbox controllers aren't working," or something where [inaudible 01:06:25] emergency. So, anyway, yeah. I can't wait for COVID to be over. I'm sure, like you, I want to get back to traveling. I miss traveling around the world and meeting different people.

Jon Krohn: 01:06:41 Yeah. So, yeah. Definitely. I mean, I haven't been able to see my family since COVID hit. So yeah, very much looking forward to being able to see them again, all kinds of friends all over the place, speaking at conferences, meeting people at conferences. But yeah, I guess, it is interesting that that wasn't on the list. I guess, it's just so obvious and you don't even really think about it being specific to machine learning or data science, or anything. But absolutely, I think it's safe to say that in 2021, there will continue to be remote work even after vaccines are widespread.

Ben Taylor: 01:07:17 Yeah. Which will be interesting. Because I really miss whiteboard collaboration. And I'm sure [crosstalk 01:07:22] say that there is a remote work alternative, but I haven't seen it.

Jon Krohn: 01:07:27 Not the same. It's just never the same. That's the thing I miss the most: for my data science R&D with my team, being able to say, "Okay, look, you've been banging your head against the wall with this particular problem for three stand-up meetings in a row now. I don't fully understand what problem you're running into, but let's book a conference room for two hours this afternoon. All of us are going to go in there, and you're going to explain from scratch the problem you're working on." And nine times out of 10, we come out of there with the solution that ends up working, or at least a path that gets us there.

Jon Krohn: 01:08:08 And I mean, no matter what, people aren't paying attention when they're looking at their computer and the million other things. Even if you genuinely don't have anything else on your screen, which probably almost

Show Notes: http://www.superdatascience.com/433 38

never happens, you're still, your mind because you're used to this device being a device where you have access to all of these other kinds of applications, you're not there and present with the problems in a way that you are in a conference room with a notepad and a whiteboard.

Ben Taylor: 01:08:43 Yeah. Yeah, definitely. Hopefully with the Oculus or some VR office setup, we can feel like we're in the same room, whiteboarding.

Jon Krohn: 01:08:51 It's possible.

Ben Taylor: 01:08:53 Yeah. Nothing in the short term; that'll be a ways out.

Jon Krohn: 01:08:58 Yep. All right. So, we're wrapping up here Ben, but we always end the program with asking for a book recommendation. What are you reading these days?

Ben Taylor: 01:09:06 I've got two books here. I'll show you real quick. I wasn't planning on-

Jon Krohn: 01:09:09 Oh man. I guess since it's your fourth time on the podcast, you're like, "I know the drill. I've got to make sure I have my books right here."

Ben Taylor: 01:09:17 So, Hooked. It's about building addictive products that are engaging. It talks about different gamifications; they talk about rewards of the hunt, the tribe, and the self. It gives you insight into why Facebook works and LinkedIn works, these different products, why you keep coming back every day.

Ben Taylor: 01:09:35 And I just got this one today, Deep Reinforcement Learning With Python. It uses TensorFlow 2 and the OpenAI Gym toolkit. I just got it in the mail; I haven't gone through it yet.

Ben Taylor: 01:09:48 But the book that is blowing my mind the most is the book called Immortality Key. The author is Joe Rogan.

Show Notes: http://www.superdatascience.com/433 39

Book came out three or four months ago. And essentially, this guy spends a decade doing research on the origins of Greek witchcraft and their possible influence on Christianity. But the whole like evil witch throwing a toad in her potion, I guess that [inaudible 01:10:17] happened. And those potions actually worked because they were psychedelic.

Jon Krohn: 01:10:23 Sure.

Ben Taylor: 01:10:24 There were these long traditions of Greek families having psychedelic wines, and they'd find lizard bones in these wines in Pompeii. The book goes through this whole history, where you realize they actually had witches, and the witches were vilified and burned at the stake because they had a sacrament on which you would meet God.

Jon Krohn: 01:10:49 Right.

Ben Taylor: 01:10:49 So yeah, it's been super interesting. Not your typical book recommendation, maybe.

Jon Krohn: 01:10:56 No. I mean, that's great. It was Immortality?

Ben Taylor: 01:11:00 Immortality Key. It's on Audible.

Jon Krohn: 01:11:02 Immortality Key.

Ben Taylor: 01:11:02 Yeah. I recommend the Audible version, because the author reads some of the original Greek passages, and he sounds fluent, so it's pretty engaging.

Jon Krohn: 01:11:14 Nice. And I understand that you have a podcast that's about to overtake Joe Rogan's in popularity in 2021. That's my big trend prediction. So, it's the More Intelligent Tomorrow Podcast.

Ben Taylor: 01:11:26 Yeah. We've had a lot of fun with it. We've got DataRobot customers that go on there, but I'd say more than half of our guests are not DataRobot customers. We've had Congressman Will Hurd on; he led AI strategy efforts in the US Congress. And we've had a bunch of CEOs, CDOs, and different people talking about AI ethics.

Ben Taylor: 01:11:48 So, you know this being a podcast host that everyone you interview, you feel like it changes your own perception. You get a little bit smarter. You see the world a different way than you saw it before. And so, for me, that's the biggest joy of my current job. As you interview... Hannah Fry, she was one of our guests. She wrote Hello World. So, these people are so smart. And to be able to interview them in a freeform way and ask them whatever question comes into your mind is super helpful. At least for me, hopefully it's helpful for the audience, right?

Jon Krohn: 01:12:20 Yeah. I mean, that's what we're trying to do here. And I have little doubt that the audience loved having you on the program today. I'm pretty sure it's a record for SuperDataScience to have someone on four times. And so, yeah, we'll have to bring you on again soon. It's been an absolute pleasure having you here. Yeah. Is there anything else that you'd like to leave the audience with today?

Ben Taylor: 01:12:48 Just follow [inaudible 01:12:49].

Jon Krohn: 01:12:49 [crosstalk 01:12:49] stay in touch with you.

Ben Taylor: 01:12:51 Oh yes. You can reach me at Ben Taylor Data on Twitter, but I check Twitter maybe once a month. You can find me on LinkedIn; that's where I'm the most engaged, and it's the best way to get a hold of me. The last thing I'd leave with the audience is: the AI and machine learning space is so fascinating. Fall in love with it. Be selfish. Find projects that wake you up early on the weekend.

Ben Taylor: 01:13:18 We live in one of the most exciting times, who knows what we'll be talking about 20 years from now. And I'm sure many of your audience members will be involved in these unbelievable breakthroughs with AI. So very, very exciting time to be alive.

Jon Krohn: 01:13:32 Yeah, that's great. That's a powerful message. And since this is my first podcast episode with a guest, I guess I should also mention that LinkedIn is definitely the best way to get in touch with me too. I can confirm that Ben is very active on LinkedIn. And also, if you're a misogynist in private messages on LinkedIn, you are not safe, because Ben will post it. So be on your best behavior. If your mom wouldn't be happy with your behavior, whether it's in private messages or a public forum, then maybe you shouldn't do it.

Ben Taylor: 01:14:10 That guy actually deleted his LinkedIn, by the way, because that post [inaudible 01:14:15] like 70,000 views and people were tagging him. But yeah, it's just the tip of the iceberg, the private messages that women get on LinkedIn.

Jon Krohn: 01:14:27 I can't imagine. Yeah. So, just a bit of backstory there; it probably doesn't make any sense if you don't know what I'm talking about. Somebody wrote a really inappropriate message to a woman he didn't know on LinkedIn, and Ben shared those one-way messages from that guy on LinkedIn. And I'm not surprised it took off. We need more of that. I celebrated it.

Ben Taylor: 01:14:58 I didn't edit out his name, so some people gave me a hard time for that. They said I should have reached out to him: "Why did you call this girl that? What was going through your mind?" It's like, I think we all know what was going through his mind.

Jon Krohn: 01:15:10 Yeah. There are some situations where, if it had been some borderline behavior, you'd think, "Maybe we're misunderstanding what's going on here. Maybe this guy deserves a second chance." But absolutely not in the scenario where you called him out; that was completely inappropriate and inexcusable. I think you did the right thing.

Ben Taylor: 01:15:29 Well, that's good. Sometimes you throw something out there, and then you get pushback and you're left wondering. But now I have a PR group at DataRobot that I have to answer to sometimes. When I was self-employed, I could say whatever the hell I wanted. I have to use my filter a little bit more these days.

Jon Krohn: 01:15:51 All right. Well, wonderful having you on the program, Ben. And we will speak to you on the program again soon.

Ben Taylor: 01:15:59 Awesome.

Jon Krohn: 01:16:05 Wow, what an episode! Thanks to our fabulous guest, Ben Taylor. We covered all of the essential trends coming up in 2021, ranging from tackling biased models, to federated learning, to our love of the PyTorch deep learning library, to how to deliver business value with data science.

Jon Krohn: 01:16:25 As always, you can get all the show notes, including the transcript for this episode, the video recording, any materials mentioned on the show, as well as a URL to both Ben Taylor's LinkedIn profile and my own, at www.superdatascience.com/433. That's superdatascience.com/433.

Jon Krohn: 01:16:46 If you enjoyed this episode, make sure you leave a review on your favorite podcasting app or on YouTube, where you can enjoy a high fidelity video version of today's

Show Notes: http://www.superdatascience.com/433 43

program. Since this is the first full length SuperDataScience Podcast episode that I'm hosting, I'd particularly love to hear your feedback. Please comment on YouTube or make a post tagging me on LinkedIn or Twitter. I'd be delighted to hear your thoughts, especially if they're constructive.

Jon Krohn: 01:17:16 All right. It's been fun. Thank you. Looking forward to enjoying another round of the SuperDataScience Podcast with you soon.
