Brought to you by Michael and Brian - take a Talk Python course or get Brian's pytest book


Transcript #25: Could we have more in-database machine learning please?

Return to episode page view on github
Recorded on Wednesday, May 10, 2017.

00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.

00:06 This is Episode 25, recorded May 10th, 2017.

00:11 I'm Michael Kennedy.

00:12 >> I'm Brian Ocot.

00:13 >> We've gathered up a bunch of cool Python things to share with you this week.

00:17 So Brian, I want to start with some news coming out of Microsoft's biggest developer conference this year.

00:23 There's some actually Python news, which is cool. You wouldn't expect that, right?

00:27 >> Right.

00:28 Yeah, they actually did a whole section on machine learning and AI and some very cool things.

00:33 But what I want to talk about is one of the biggest databases in the corporate space, most popular ones is Microsoft SQL Server.

00:41 So the thing that I want to point out or talk about is they've just announced a very interesting feature and I'm kind of hoping other database providers copy this like straight away.

00:52 So what they've announced is in database machine learning.

00:57 - Wow.

00:57 - So, yeah, it's crazy, right?

00:59 Like, wait a minute, what does in database machine learning even mean?

01:02 So here's the idea.

01:03 - Yeah, exactly.

01:03 - Like, if you are gonna transfer a lot of data, machine learning or otherwise, and you've got one server over here with your data and another server that's executing it, then you've got the network latency, you've got the crossing process boundaries, you've got all sorts of latency work in there.

01:21 So especially if you have a chatty API, this can be problematic.

01:24 But what this new feature is, they have now built the ability to run CPython 3.5 in process in SQL Server.

01:34 And you can install external packages.

01:36 It comes built in with some of the machine learning packages already there.

01:40 It runs a subset of the Anaconda distribution included right there.

01:46 So inside your database, you can basically install Python scripts and do full-on machine learning like with zero latency to your data.

01:53 - That's pretty cool.

01:54 - I think it's really cool.

01:55 You might have to go back to teaching people about Microsoft products.

01:59 Yeah, I'm not so sure I'm going that far.

02:03 In fact, what I would really like to see is other databases, other database providers take this on and go, "This is a cool idea.

02:12 Can we put this in other places?" Like on MySQL, on MongoDB, I think it would be super cool to see it there.

02:18 I mean, you kind of have that with SQLite in reverse, and your database runs in your machine learning process rather than your machine learning runs in your database process but if you're already for some reason using SQL Server like you know check and you want to do machine learning check this out this is a pretty cool feature yeah that's pretty neat all right awesome okay that's really neat I want to talk about some real fake stuff and actually a tool called faker so there's an article to introduce fakers been around for a while but there's a new article on the semaphore blog called generating fake data for Python unit tests with Faker. I had heard of it and I hadn't played with it before so the article is pretty neat but I played around with it this this afternoon and what Faker is a way to you know basically generate data for you just random stuff but in the right format and the list of stuff that that Faker can handle to generate is definitely can do like the lorem ipsum type things just some random text. But you can also do credit card numbers and phone numbers and URLs and a lot of stuff that you would want to do to to be able to fill out a set of data to make it look real and without to test the system. That's cool. I haven't to. Yeah, I see two major uses for this. I agree that Faker is awesome. It's no joke. So they're basically you install Faker and you can go to it and say, give me some words, give me a name. And if you save Faker dot name, it'll be like Joshua Wheeler, give me a month for give me a sentence, and I'll give you a sentence. Give me a state Michigan, give me a random number like you can ask for all these different things. One of the really good uses for this is if you're doing web development, and you don't have any data yet, it is super hard to even write the code to process the sequences, but also very hard to do the design of like, "Well, how is this supposed to look?" And having real-ish data makes that process so much easier.

04:15 And it's really easy to go, "Give me a month, "give me a year, give me a state," things like that, and generate fake data with this.

04:22 The other one, obviously, is with testing, right?

04:24 Like, instead of having all the trouble of coming up with these things for the fake pieces of data you're gonna pass, and you don't necessarily wanna hard-code it, maybe that's gonna put some dependency on that hard-coded value in your test, like, just run Faker across your objects and fill them up.

04:38 it has some in it some things that you don't really think about like i ran the phone number a few times and it listed phone numbers with extensions phone numbers with dashes phone numbers without so phone numbers with parentheses and stuff that you probably should deal with but might not come up with on your own i was looking through and one of the neat things is it has pi structures too it has uh...

05:02 under the py section you can generate a pi dictionary or basically get a dictionary or a tuple or set.

05:11 And it just comes up with random tuples and random dictionaries.

05:14 It's pretty cool.

05:15 Wow, how cool.

05:16 I didn't even know about the Pi section.

05:17 You can also switch it to multilingual, so US English, Japanese, Italian, Russian, and so on.

05:24 Oh, really?

05:24 Yeah, and so if you were doing localization, what would it be like if I got a Russian name in here?

05:29 Would my system still work?

05:31 Well, try it.

05:33 So that's pretty cool.

05:34 I like it.

05:35 Yeah.

05:35 you need fake data, check out faker.

05:37 Seems funny to say, but you know, yeah, indeed, it does. So I, Brian, I totally skipped over your first one with Stack Overflow trends. That's pretty exciting.

05:47 Oh, yeah. So let's, let's go ahead and talk about it. So Stack Overflow trends, Stack Overflow came out with a tool called Stack Overflow trends. And the article that they have to introduce it, the first example that they show is Python overtaking PHP for questions asked per programming language. Of course, they only compared to PHP, Perl, and Python. And Perl, apparently nobody asked questions about Perl.

06:19 Yeah, Perl is not a growing area of study. I think the closest analogy to this would be like, what does Google Trends do? This is like, that does that for searches. This is like the same type of tool but for stack overflow popularity.

06:33 Yeah, I think it's neat to look at like what kind of questions people are asking and how that grows.

06:40 And there was definitely a steep so there was Python was fairly around a fairly flat from like 2008 through 2012.

06:52 And then a sharp curve up just starts taking off.

06:56 So yeah, it's really, it's really great.

06:58 But yeah, it's just like somebody flipped a switch in 2012 and like, you know, the Python is growing.

07:04 It's awesome.

07:05 Yeah.

07:06 So if you want to study things, definitely this is a place to go do it.

07:09 You know, maybe you're looking like, what should we base our next project on?

07:13 What are the future trends in programming technologies?

07:15 This is a good tool for that.

07:16 And it's great to see that highlighting Python, the growth and popularity.

07:21 Yeah.

07:22 So normally we would have like a sponsor spot right now.

07:25 Yeah.

07:26 But there's like this quiet period, right?

07:28 Yeah, so no sponsor this week you guys what you know we have upcoming sponsors they're kind of playing stuff out in sort of sparsely.

07:35 But if you're out there and you're like hey my company wants to get the word out to Python developers send us a message just go to the contact page on Python bytes out of them and we'd love to talk to you.

07:44 All right.

07:45 I get questions all the time from people who are learning to code.

07:48 And one of the guys on Twitter Alan Jones sent us a message about a pretty cool medium article that really is very data-driven about people learning to code. So this article is called "We Asked 20,000 People Who They Are and How They're Learning to Code." So that's a lot of people.

08:08 Yeah.

08:09 Now, they said, all right.

08:10 They probably did Skype or something because that would be a really big phone bill.

08:13 Yeah, I'm going to mail you a letter. No. So they said, all right, who participated?

08:17 Well, there's 20,000 people who did this survey, and most of them have been coding for less than five years. 62% live outside the US. This is interesting, their average age of people learning to code is 28. So I get messages all the time like, hey, I'm 30. There's no way I can learn to code like, you're with these 20,000 other people, right? It's not that uncommon. That's actually the average age. And if you're over it, right, you know, it's still a lot of definitely an age range. And they have an interesting there's many interesting pictures in this article and graphs. It's like data analysis type thing. And they've You've got average age to learn to code by country.

08:55 You look at France and the UK, and those guys are in the 30s on average.

09:02 You look at India, and they're in their teens on average, which is, I don't know what that means, but that's interesting.

09:08 Another interesting stat that I thought we could pull out is 19% are women.

09:13 While obviously that is super low compared to where it should be, that should be 50%, But still 19% is, I guess it's higher than I expected.

09:22 And it kind of made me happy 'cause I feel like it's a positive trend, even if it's not where it should be.

09:26 The average person coding, learning code has been coding for 21 months and 25% of them already have the first job.

09:33 So there's a bunch of cool stats like this that you can go and pull out.

09:38 So check out that article.

09:39 We asked 20,000 people who they are and how they're learning to code.

09:43 - And almost 60% wanted, 59% wanted to become full stack web developers.

09:48 - Yeah, it's interesting, right?

09:49 Like the web definitely factors heavy with data science being number two.

09:53 So you can imagine, this is not a Python only study, right?

09:57 There's just people learning the code, but you can imagine Python is playing a heavy role in those two areas.

10:02 They also have a podcast section, which is kind of cool.

10:04 - What do you mean?

10:05 - They have a section of what podcasts people who are learning codes or code listen to.

10:09 - Okay, are we on there?

10:10 - Talk Python is, Talk Python is, but I didn't find Python Bytes, unfortunately.

10:14 but that's because we're still letting them know.

10:17 - Yeah.

10:19 Well, I'm glad that talk Python on there.

10:20 That's pretty cool.

10:21 Congratulations. - Yeah, it's pretty cool.

10:22 Thank you.

10:23 Would you say that it's an anomaly that Python bytes wasn't on there?

10:27 - I think it's just 'cause we're new.

10:29 We don't really teach people how to code though.

10:30 - No, no, no.

10:31 I think this is--

10:32 - Oh, you were trying to do a transition.

10:33 Oh, that's so cool.

10:34 - I was trying to.

10:35 So our next item is about anomaly detection.

10:39 - Yeah, anomaly detection.

10:41 You have to forgive me.

10:42 It's almost midnight here in Munich.

10:43 - That's right, you're still on your German tour.

10:46 - Yeah, two more days.

10:47 There was a really great article and I should have written the person's name down called "Introduction to Anomaly Detection." And it's kind of a link to Emmanuel Ruff, Emmanuel, I can pronounce that part.

11:01 But using Python, but using it for an interesting piece of a need for data analysis is anomaly detection.

11:11 basically looking at a whole bunch of data from something and finding the ones that you don't really know what the trend is going to be but the ones that don't fit whatever the trend is for everything else and It's a actually just a fascinating couple of Pages on here and there's there's code exam code samples. I'm not doing it justice talking about it, but it's a talking definitely a well thought out well studied article from datascience.com.

11:42 Yeah, they have like a couple of areas that they focus on. They've got like the types of categories of anomalies and like the ones you might think of which are they call point anomalies.

11:51 So detecting credit card fraud based on amount spent like I live in the US, somebody tried to buy $1,000 worth of lumber in Mexico with my card. No, that's probably not okay. Real story. So Then they have contextual anomalies.

12:10 They say sometimes these things make sense, but only within a context.

12:14 For example, spending $100 on food every day is totally reasonable on a vacation, but it's odd if you're not on vacation.

12:21 Can you determine are they on vacation?

12:23 Or collectively, copying tons of data off network servers might look like you're trying to steal data if it knows that you're doing this all over the place, but copying one big file would mean nothing.

12:34 - Right. - Yeah.

12:34 - Yeah, so they basically break it down by those three categories.

12:37 It's pretty interesting, all the machine learning based approaches and stuff.

12:41 - Yeah, and the math behind it, like the moving averages and the K nearest neighbor and K means algorithms.

12:48 - Oh, nice. - Things like that.

12:49 - Yeah, absolutely.

12:50 Very, very cool.

12:51 - I think I'm gonna use the K nearest neighbor just in random conversation tomorrow just to make me sound smarter.

12:57 - Where should we go to eat?

12:58 I don't know, but we're gonna have to apply the K nearest neighbor to these restaurant choices and figure it out.

13:02 - Yes, definitely.

13:03 I want to close this out with a message from the Beware guys.

13:10 Beware is a cool project that really does a bunch of fairly unique things.

13:16 It supports running Python apps on things like iOS and Android, macOS apps that are native .app files in Python, two alternate Python implementations, some cross-platform widgets and a couple of other things.

13:33 So it's done by Russell Keith McGee, and it's been going on for about four years.

13:37 So really great.

13:38 And he posted a thing that said a request for your help.

13:42 So basically, he's been working for a company that's largely funded the development or the furthering of these projects, right?

13:50 So they've got like, extensive improvements for like this cross code compiler, an Android back-end, Django back-end for these Toga apps that can be run as web apps or local, Windows Forms .NET UI for Toga, so you can have a Windows app that has a modern, natural appearance on Windows, all sorts of cool stuff.

14:10 Cool project.

14:11 And so obviously, with the request for help, what is up?

14:16 Well, his contract ended, so now he's like, "I don't have all this time and energy I can put in here.

14:22 I've got to go back to work." The reason I'm bringing it up is we've got a lot of projects that are looking at different funding models to allow people to work on it.

14:32 There's the pretty standard, I'll create a project and then try to sell consulting on top of the project.

14:37 There's more interesting platform as a service type things that people are doing.

14:42 The Redash guys that I talked about last week on Talk Python have hosted versions of their open source thing.

14:49 You've got the Scraping Hub guys with Scrapy doing their infrastructure as a service or platform.

14:55 Well, web scraping as a service, right?

14:58 All these are very interesting.

14:59 So basically Russell says, "Hey, could you sponsor my project?" And one, check out his page.

15:06 You can become a member and give him 10 bucks and help keep this moving because he's doing a bunch of cool stuff.

15:10 But also if you have a project and it needs funding, think about what he's up to.

15:16 Does it make sense for your project?

15:17 Things like that.

15:18 Yeah.

15:19 - I do too.

15:20 So I definitely think the B-Ware project has huge possibilities for where it could help people.

15:27 And certainly if you just want to work on an open source project, people ask me all the time, like, "Hey, can you recommend a project I could work on?

15:33 'Cause I just want to get started on something and I don't really know enough to pick something myself." I think B-Ware's a really good one.

15:40 They have a very welcoming, explicit way of onboarding people who are new to open source.

15:45 So that's also a way to help them out.

15:46 - Definitely.

15:47 - All right.

15:48 - Yeah, good luck.

15:49 That would be cool to see that keep growing because it's doing cool stuff over there in that project.

15:54 All right, well I have one shout out for us out of my own personal news, Brian.

15:59 - Okay.

16:00 - There's a brand new PyCon, not major, main PyCon, but a regional PyCon, and it's in a pretty sweet place, so I'm starting to think I might have to attend this.

16:10 So if you check out PyCascades.com, It's in Vancouver BC in January this year the next January So if you want to go up to the Pacific Northwest One of the more beautiful cities around here. They have things like a pike on hike as well as all the talks and stuff You know, you can check it out by cascades calm. That sounds like a lot of fun Yeah, actually, you know being in the Northwest you'd think I'd have been to Vancouver. I haven't ever been there Here's your chance. So maybe I might have to go up there Yeah, we might have to just jump on the train and go up there. Yeah, sounds good Well, I've been the book is very close. So I've been working late evenings here getting it ready working with my editor that right now it's supposed to the beta is supposed to be available right before PyCon or awesome technically at the beginning at on the 17th. So next Wednesday.

17:04 Yeah. So come check us out at our booth. Meet Brian, talk to him about his book.

17:08 And it should be ready by then you'll be looking probably tired.

17:13 >> Yeah, and my family's going to be a little irritated.

17:15 >> I haven't slept for three days.

17:17 >> So.

17:18 >> Yeah, perhaps. All right.

17:20 Well, Brian, thanks for chatting with me and sharing all this news with everyone.

17:24 >> Yeah, thank you.

17:25 >> You bet. Bye.

17:26 >> Bye.

17:27 >> Thank you for listening to Python Bytes.

17:30 Follow the show on Twitter via @pythonbytes, that's Python Bytes as in B-Y-T-E-S, and get the full show notes at pythonbytes.fm.

17:39 If you have a news item you want featured, just visit pythonbytes.fm and send it our way.

17:43 We're always on the lookout for sharing something cool.

17:46 On behalf of myself and Brian Auchin, this is Michael Kennedy.

17:49 Thank you for listening and sharing this podcast with your friends and colleagues.

Back to show page