Brought to you by Michael and Brian - take a Talk Python course or get Brian's pytest book


Transcript #94: Why don't you like notebooks?

Return to episode page view on github
Recorded on Wednesday, Sep 5, 2018.

00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.

00:04 This is episode 94, recorded September 5th, 2018. I'm Michael Kennedy.

00:09 And I'm Brian Okken.

00:10 Hey, Brian. How are you doing?

00:11 I'm doing really good. Yeah.

00:13 Excellent. Doing very well. The sun is shining. Summer has not left us yet.

00:17 It's not that great for productivity, but it's definitely good for keeping the spirits up.

00:21 You know what else is keeping my spirits up is my DigitalOcean servers, the ones running this site and many others.

00:28 They've been working perfectly. So they've been going really, really strong.

00:31 And we'll tell you more about them later.

00:33 But in the meantime, if you want to check them out, pythonbytes.fm/DigitalOcean, get $100 credit for new users.

00:38 Brian, when I was in the C++ world, the C# world, design patterns were like this massive thing.

00:47 And you had to know all the design patterns.

00:49 And there was like dependency injection and IOC containers and all this stuff.

00:53 And I feel like Python doesn't have as much rigor around it because you don't have to jump through so many hoops to make certain things happen, I guess.

01:02 What do you think?

01:03 Yeah, I think so.

01:04 And it's actually something that's interesting because I came from the C++ world.

01:08 So patterns were a thing in C# also?

01:10 Oh, yeah.

01:11 Okay.

01:11 Well, I don't even know who the gang of four are.

01:15 But there were four authors that wrote the design patterns book.

01:21 Let's see.

01:21 Eric Gamma, Richard Helm, Ralph Johnson, and I'm not going to try to pronounce that last one.

01:27 John something.

01:28 Anyway, gosh, in the 90s, if you were in C++ or C#, apparently you read this book or others around design patterns.

01:37 And then when I got into Python, I was a little curious whether that was a thing in Python or not.

01:43 But I haven't really heard much other than I haven't really needed it.

01:48 A lot of this stuff isn't really needed.

01:49 What I think is interesting is there's those patterns that you see from the gang of four and the sort of derivative ones, derivative books and thinking.

01:58 And a lot of it, like you say, is not needed.

02:00 But there are other patterns that are really useful and come in.

02:03 Like, for example, meta classes, for example, or decorators.

02:06 Or there's other stuff, right?

02:08 Generator methods.

02:10 All sorts of stuff that is in here that don't appear in the gang of four because C++ or Smalltalk just didn't do that.

02:17 They were highly based on Smalltalk, actually.

02:19 Their patterns.

02:20 Right.

02:20 Well, one of the things that caught my attention today was a tweet by, gosh, who's this guy?

02:27 Brandon Rhodes.

02:28 Brandon Rhodes, yep.

02:29 And he's doing, he's got a site called pythonpatterns.guide.

02:34 And it has, he's sort of going through a lot of different, I think he's going through the gang of four book.

02:42 But he might be also doing other, pulling together other design pattern things that he's talked.

02:50 Yeah, he's pulling together information from talks and writing.

02:53 And I think he's creating more information, too.

02:55 But there are a whole bunch of these, trying to apply some of these patterns to Python.

03:00 And kind of sometimes different ways to do it.

03:03 So you can do things in different ways.

03:05 And so far, he's got abstract factory pattern, the builder pattern, factory method, composite, decorator.

03:13 Yeah, we definitely have decorators.

03:15 And then things like monkey patches and iterators, things like that.

03:20 And how that applies, I'm glad that somebody that knows what they're talking about is trying to figure out, how does this all apply in Python?

03:28 And I haven't really dug too much into this.

03:31 I just think it's a neat resource to try to read about some of these.

03:34 Yeah, I definitely think it's a really neat resource.

03:36 And Brandon has some interesting thinking on design patterns and architectures.

03:41 He gave a super counterintuitive talk called Clean Architecture.

03:47 I think it was at Pi Ohio a couple years ago.

03:49 And when I first started watching it, I was like, I just disagree with everything you're saying.

03:53 This just seems so wrong.

03:55 And then after 10 minutes, I'm like, but wait a minute.

03:58 I think it's right.

03:59 I think I've been thinking about this all wrong.

04:02 And it really caught my attention because I didn't agree with it so much.

04:07 But then I'm like, wow, this is really compelling what you're telling me.

04:11 So maybe I need to rethink what I'm thinking.

04:12 And whenever I have that feeling, I'm like, whoa, I need to pay attention because I might learn something really good here.

04:18 Yeah, so that's a good point.

04:20 I'm not necessarily saying, since I haven't really dug through this too much, I'm not sure.

04:25 I mean, I respect Brandon as a smart guy.

04:28 I expect that there's some really great stuff in here, but you may not agree with all of it.

04:34 So we'll try to dig up a link to that Clean Architecture too because that sounds interesting.

04:38 It's super interesting.

04:39 Yeah, yeah.

04:39 It's definitely a good one.

04:40 Cool.

04:41 Well, thanks for bringing this up.

04:42 I love these Python patterns.

04:44 And I love sort of the how would these traditional, more formalized patterns actually look in our language.

04:51 And there's a lot of interesting examples there.

04:53 Yeah.

04:53 What do we got next?

04:54 Well, we got this thing called Arctic.

04:56 Arctic.

04:57 Arctic is an API framework over top of MongoDB and Pandas.

05:04 And the idea is this is a thing that's been around since around 2012.

05:10 And its sole purpose is analyzing time series data super fast.

05:16 So one of their headline is basically Arctic millions of rows per second of time data in Python.

05:25 So that is really quite impressive.

05:27 I can tell you a lot of the ODMs and ORMs and stuff.

05:29 They don't do millions of records per second.

05:32 So the idea is that it basically bakes in pandas and numpies and all those kinds of things.

05:38 And it has an underlying data store that's backed by MongoDB.

05:41 And it actually uses the binary low-level communication.

05:46 So instead of trying to store all the data and then bringing it back and deserializing each row,

05:52 I think what it does is it actually just stores the binary data of pandas.

05:57 And it'll pickle numpy arrays and stuff like that.

06:00 And just exchanges the memory structure and just pulls it straight back and go,

06:04 yep, here it is.

06:05 Let's look at it.

06:06 And it's pretty cool.

06:07 Yeah.

06:08 Yeah, definitely.

06:09 I mean, there's a lot of applications that use just huge amounts of time series data.

06:13 Yeah.

06:14 So they say the two big areas they think it's useful is IoT, little tiny IoT devices that maybe Python is running on,

06:22 and financial analysis.

06:24 So it's sort of been extracted out of the work that this financial company called MAN, AHL,

06:32 I've never heard of them, but I think they're mostly an Asian company, but also in the U.S.,

06:36 around investment and so on.

06:38 So they've been working on this.

06:40 And they actually have some numbers on how this thing performs relative to other types of projects that they pursued or other things that were available.

06:50 So they talk about the different kinds of data that they store and analyze for stock trading and analysis.

06:54 And they say, look, we have this sort of data that's for one day, a whole bunch of it, maybe 10,000 rows.

07:01 And they can work with those 10,000 rows in four milliseconds.

07:05 And they say, compare that to what we were getting out of SQL Server, which was 2.2 seconds.

07:11 So, you know, 500 times slower, which is pretty incredible.

07:14 And they have this other tick data, like, you know, the stock ticker type of data.

07:19 They can say in one second they can process 3.5 megs worth of that data in Python or 15 megs in Java.

07:26 And there was some other project that they were trying to improve over called OtherTick, which took like 40 seconds versus one.

07:33 So really, really interesting, high-performance, database-backed time series.

07:40 Neat.

07:41 Yeah.

07:41 So if you're into Pandas, NumPy, and you've got to store and query a bunch of time series, whatever the reason, this is probably worth checking out.

07:49 And it's also tested with pytest, which is pretty cool, right?

07:52 Well, of course.

07:52 Any real project is tested with pytest.

07:55 That's right.

07:56 Of course.

07:57 So one of the things I really like about the Python community is the fact that there's so much sharing of information out of conferences and meetups and things like that.

08:08 So we have another thing you found here for PyCon, right?

08:13 Yeah.

08:13 So the PyCon – I don't remember when it was, but PyCon Australia wasn't too long ago.

08:18 And they've already got all the videos up.

08:22 And we have a link to the PyCon Australia videos.

08:25 And I've got quite a few of them queued up that I'd like to listen to and watch.

08:31 I'm kind of bad about videos, actually.

08:33 I often just listen to them and then go back and look at the slide parts of information that I wanted to capture.

08:40 But I like listening to talks as well.

08:43 But there's one from Mark Smith, which he always amuses me because his Twitter handle is Judy2K.

08:48 And he won't tell me why.

08:50 But his talk is how to publish a package on PyPI.

08:55 And that's the one I've watched so far.

08:57 There's a lot of great talks there, though.

08:58 But I think this one's a great one that it – the end punchline is use cookie cutter.

09:04 But he blasts through not using cookie cutter, all this sort of stuff you have to do to get up.

09:09 And, you know, it's – every little piece makes sense and it's not difficult.

09:12 But there are a lot of different little pieces.

09:14 But he goes through this entire thing in like less than half an hour.

09:17 And so that's pretty impressive to watch him talk about all the different pieces and why they're there and what they're used for.

09:24 So that's a good one to sort of understand what's going on in the packaging world in a very short amount of time.

09:31 Oh, that's really cool.

09:32 Yeah, there's a bunch of cool ones here.

09:33 A couple in MicroPython, actually.

09:36 So one writing fast and efficient MicroPython code.

09:39 And the other is async.io in MicroPython.

09:42 Both of those are pretty cool.

09:43 Kind of tie into what we were just talking about previously.

09:46 Yeah, and then there's like – gosh, there's solid APIs and there's – it looks like a lot of good stuff.

09:53 And I know that Australia is – since it's a big travel burden to other places, other PyCons, you'll see some speakers there that you're not going to see other places.

10:03 So that's cool.

10:04 Yeah, absolutely.

10:04 And they have 88 videos.

10:06 So that's pretty solid.

10:07 Yeah.

10:07 Quite cool.

10:08 That's a good one.

10:08 So before we move on, I'll tell you about another cool thing, DigitalOcean.

10:12 All right, so I'm a big fan.

10:14 And so one of the things that they've released – we talked about this just a couple of times, not very much – is this idea of projects.

10:22 So when you go into your name, your cloud provider, you might have a bunch of servers, a bunch of ESP storage type things, virtual storage blocks, load balancers, all sorts of stuff.

10:35 And it's really hard to know what goes with what.

10:38 Do you have a staging environment, a production environment, all that kind of stuff, right?

10:41 So how do you organize that?

10:43 So DigitalOcean has come up with this feature called projects that lets you group things like your droplets, that's virtual machines, and floating IPs, and back storage-like spaces into these different use cases.

10:55 So you know, yeah, actually, we're done with this project.

10:59 So we can turn that server off and destroy it and not like the fear of, I don't think we're using this one, but I'm not going to destroy it.

11:06 I'm not going to delete it because what if I'm wrong, right?

11:08 So a very cool feature you can take advantage of for all of their stuff.

11:12 Check them out at pythonbytes.fm/DigitalOcean, and they'll give you $100 credit for new users.

11:18 That's awesome.

11:19 Hey, let's talk about another cloud provider.

11:20 I'll write it on the back of that.

11:23 So one of the ways that you can run your code on the internet is like I just described with DigitalOcean, like I do for our stuff, is to create some virtual machines and various other pieces and sort of use it as so-called infrastructure as a service, right?

11:37 IaaS.

11:38 But you might also use platform as a service, like here's my code, run it.

11:41 So Google App Engine, Heroku, those types of things.

11:45 Okay.

11:45 So Google App Engine has a pretty interesting announcement, and it's interesting for both it's good now and like, oh my gosh, I can't believe it was like that.

11:54 So the announcement is that Google App Engine has released their second generation runtimes, which the Python one is now based on Python 3.7.

12:03 That's pretty awesome, right?

12:04 It is.

12:05 You want to run some code.

12:06 Boom.

12:06 Here's my Python 3.7.

12:08 So that's really good.

12:09 You might think, oh, Michael, what was the previous one?

12:11 3.6?

12:12 3.5?

12:13 No, I believe the previous one was 2.7.

12:15 Oh, no.

12:16 Until now.

12:17 Like if you were using Google App Engine, I believe you had to use a legacy Python, period.

12:22 Until that was like mid-2018 that I just said that.

12:26 That wasn't like a statement around 2012 or something.

12:28 That was just now.

12:30 But let bygones be bygones.

12:33 And now it's Python 3.7, which is pretty awesome.

12:36 So apparently it's a pretty big upgrade.

12:38 You get a bunch of new things.

12:39 Like, for example, it's based on their new sandbox container, sort of Docker-like things.

12:46 It removes a bunch of restrictions.

12:48 Like, in addition to only running on the old Python, legacy Python, you could only use a white-labeled set of packages.

12:56 And now in the new Google App Engine, you can use arbitrary packages.

13:00 Just put them in a requirements file, which is pretty sweet.

13:03 That's a big change.

13:04 It is a pretty big change.

13:05 Yeah.

13:06 So a lot of cool things like autoscaling and things that are a little bit easier as well.

13:09 So anyway, if you're interested in Google App Engine's platform as a service for Python, it just got many, many times better.

13:16 Yeah.

13:16 Neat.

13:17 Yeah.

13:17 Yeah.

13:18 So, Brian, I typically write my code in Python files, not really in notebooks per se.

13:24 How about you?

13:25 Yeah, mostly in files.

13:26 But I'm trying to learn Jupyter Notebooks some and utilize them.

13:31 They're kind of fun, especially in data science-y realms or looking at plotting data and stuff.

13:37 Notebooks are fun.

13:38 Yeah.

13:39 But there was a person named Joel Gruse that says he does not like notebooks.

13:46 And Joel is notable because he's not like a random dude on the internet.

13:49 But Joel Gruse has written a book called Data Science from Scratch.

13:54 He's done a lot of work in data science, things like that.

13:57 I've even had him on Talk Python many moons ago.

13:59 Yeah.

14:00 And this wasn't just like a one-off comment.

14:03 He gave this talk at JupyterCon.

14:07 And that's kind of hilarious.

14:09 But the video for that is not available yet as far as I couldn't find it.

14:14 So because that was just recently or still going on, I'm not sure.

14:17 But the slides are up.

14:19 He put the slides up.

14:20 And for one, it puts me to shame.

14:23 You know, this presentation has got so many animations and pictures and stuff.

14:28 Plus, it's like I haven't even got through it yet.

14:30 It's like 100 pages long or more.

14:32 But it's really good.

14:33 But it's a serious discussion about some of the issues with the problems with notebooks that people new to notebooks don't quite get.

14:44 And people old to notebooks just sort of know it and don't really think about it anymore.

14:48 And one of the big ones is that there's hidden state.

14:52 And so, essentially, we think of files as, like you said, we normally work in files.

14:59 So they get run from top to bottom, except for, you know, functions don't get run.

15:04 They get interpreted as functions.

15:05 And then when they are run, they run top to bottom, essentially.

15:08 And notebooks are not like that.

15:10 You can jump around and execute different bits of code in different orders if you feel like it.

15:15 And that stateness can lead to weird, confusing things.

15:21 So it's just like a gotcha to know about.

15:24 And then he goes on to talk about some of the issues where if you start learning how to code with notebooks, you may end up, you know, developing some bad habits like importing notebooks instead of just trying to.

15:36 I mean, like, that's a thing apparently you can do is you can define some functions in a notebook and then import them into another notebook.

15:43 Well, I mean, wouldn't it be better to just put them in a different library, in a package or a library?

15:49 Use the package, use the library.

15:50 Exactly.

15:51 Yeah.

15:51 So some of those, and, you know, I'm highlighting this not because I think notebooks are evil, but because I think it's important to start to listen to people saying, you know, listen to a voice that says they aren't a silver bullet.

16:05 They have their issues also.

16:07 And we just need to be careful and talk and make sure you don't fall into those traps.

16:11 Yeah, these are really interesting.

16:13 And these are certainly issues to look out for.

16:16 And while this is a funny presentation, I cannot wait to watch this video.

16:19 Joel, if you're listening, please let us know when it's out or if someone else sees it come out, shoot us a note, either email or Twitter, because this is fantastic.

16:26 Yeah.

16:26 Plus, also, like, I can't even imagine how long it took to put together this presentation because it's, yeah, there's a lot of animations in there and it's quite a riot.

16:36 It is quite a riot.

16:37 Yeah.

16:37 Anyway, there's that.

16:38 Just the other side of maybe notebooks aren't awesome.

16:42 Yeah.

16:42 It's pretty, pretty, pretty interesting.

16:44 So we've had a couple of conversations around the various peps and stuff that have been maybe causing some kerfuffle in the community.

16:56 Obviously, the biggest one was a PEP 572 about the in-place assignments.

17:03 And that was the thing with all the stress around it that Guido said, hey, after this, I'm, this is my last one.

17:08 I've given my all.

17:08 I'm out of here.

17:09 You guys, it's up to you.

17:11 We actually had Brett Cannon and Carol Willing on episode 87 to talk all about that.

17:16 Right.

17:16 And one of the things that we talked about was what comes next.

17:21 Right.

17:22 If, if it's not down to Guido to make the final decisions, which is how it has worked, how will the Python community decide, you know, what it's up to?

17:31 So, yeah.

17:33 So Barry Warsaw has published five peps at least around this.

17:38 And I don't think this is a decision.

17:40 It's sort of a structure to further the conversation and make a decision.

17:46 So he just published not too long ago, PEP 8000, which is Python language government proposal overview.

17:53 And I don't know if this is common in peps.

17:55 I haven't seen it that much, but it's like a gathering of other peps that are specific details.

18:01 So there's PEP 8,001, 8,002, 8,10, and 8,11.

18:05 The first two are about voting and ways in which this government might work.

18:10 And then the higher ones, the 8,10s are actual proposed models.

18:16 And there's a third one in 8,12 that I forgot to put in the notes.

18:19 And so there's for the government styles, government styles, we have the BDFL governance model is one of the proposed options,

18:28 which is to elect a new person who is the final decider.

18:33 Right.

18:34 Basically, GEDO step down, who is going to take that place to now participate in that way?

18:40 We also have the council governance model, which we talked about interesting things like should there be an even or odd number of people in the council?

18:46 And then the last one, I think, let me pull that up.

18:50 I think it is community.

18:52 Yeah, the community governance model.

18:54 And that one's a little more free form.

18:57 So these are all different ways of possibly arranging and solving that problem.

19:02 And there's a lot of examples like, let's see how Rust did it.

19:05 Let's see how OpenStack manages their organization and so on.

19:09 So there's a lot of concrete stuff there.

19:12 Anyway, that's pretty cool.

19:13 Yeah, that's interesting.

19:13 If you have a strong thought on this and you want to participate, get in there, make comments, let people know what you're thinking.

19:21 Because it's still open.

19:23 It's not anything decided.

19:24 Right?

19:25 It's still up in the air.

19:26 So if you want to have a say, now is the time to make statements.

19:29 Wow.

19:30 It's like government working in our own community.

19:34 What?

19:34 Incredible.

19:36 Incredible.

19:37 Yeah.

19:38 Yeah.

19:38 So this is pretty cool.

19:39 I don't know where it's going to go, but I like that it's all laid out like this.

19:43 My guess is it's going to go down the council model, maybe with, I don't know.

19:49 I think it's going to go down the council model, but we'll see.

19:51 Yeah.

19:51 I think that whatever they do, they need, they should, if there's a council, they should have to meet together to make decisions and pass around like a talking stick or something.

20:04 Yes.

20:05 I love it.

20:08 Oh, we could come up with something weird that they have to follow.

20:11 How about the Python staff of power that you were carrying around?

20:14 Yeah.

20:15 But then, you know, should it be the blue and yellow one or should it be the green and yellow one?

20:21 I don't know.

20:22 That is a big question.

20:23 Yeah.

20:23 Yeah.

20:24 Yeah.

20:24 I don't know.

20:24 So, sorry, green and gold.

20:28 So people in Australia say it's gold, not yellow, but it looks yellow to me.

20:32 Yeah.

20:32 I definitely, I thought that stick was a big hit.

20:34 I don't know if people don't know what you're talking about.

20:35 What should they Google to find the stick?

20:37 I think it's Pythonic staff of enlightenment.

20:40 I don't know.

20:40 Is that right?

20:41 That's got to do it.

20:42 How many hits on Google can there be for that?

20:45 I don't know.

20:46 Awesome.

20:46 So, yeah, they should have to pass that thing around.

20:48 All right.

20:49 Well, that's it for our items this week.

20:51 You got anything extra you want to share with folks?

20:53 I don't, actually.

20:54 Just trudging along.

20:56 We got a couple more testing codes out.

20:58 Yeah.

20:59 Very nice.

20:59 How about you?

21:00 I've got, of course, some TalkPilot stuff queued up to be released shortly.

21:03 I have been recording some courses, which are going to be awesome.

21:07 And I'm very excited about them, doing a bunch of stuff in parallel.

21:10 So I'll let you know when that's sort of further along.

21:13 But I do have two things I want to talk about this week really quickly.

21:17 One is we got a message on Twitter.

21:19 And I don't have the name of who sent us.

21:22 This was John, actually.

21:23 Thanks, John.

21:24 Who sent us this heads up that Brian Granger, one of the guys behind iPython and Jupyter and

21:31 all that stuff from the very early days, is giving a free webcast.

21:35 And it's an ACM-sponsored thing.

21:38 So it's Project Jupyter from computational notebooks to large-scale data science with sensitive data.

21:43 So if that sounds interesting to you, I put the link in there.

21:46 It's this Friday.

21:47 This episode probably will come out on Thursday.

21:50 So you've got to take action right away if you're listening.

21:52 There's probably a recording or something afterwards.

21:54 You can check that out.

21:55 The other thing is, you know, we talk sometimes about the popularity of Python.

21:59 Yeah.

21:59 So I don't want to beat this one to death too much.

22:02 It's not really worth its own item.

22:03 But Python continues to climb yet another ranking.

22:07 So the TOB index is one of the more well-respected, more long-running ways of ranking programming languages.

22:13 And I think when we started this podcast, Python was either fifth or sixth.

22:18 I think it was sixth on this list.

22:20 It is now third.

22:22 Probably because of the podcast.

22:24 Certainly partly because of it.

22:25 Yeah.

22:27 But that may be a very small part or more maybe it's meaningful.

22:30 But what's really interesting is it's now above C++, C#, JavaScript.

22:36 It's way above JavaScript.

22:37 And JavaScript's going down.

22:39 It's above Ruby.

22:41 It's above many, many things.

22:42 What it's not above is it is not above Java or C.

22:47 And not only is it not above them, but it's like half.

22:50 So it's like 7.6% to C is 15.4%.

22:55 It's going to be a long time, if ever, until it gets to a 2 or a 1.

22:59 But it's definitely doing quite well.

23:02 Yeah.

23:03 So, yeah.

23:04 What is the TOB index?

23:06 Anyway.

23:06 I'll have to look into that.

23:07 Yeah.

23:08 If you look into it, they talk about their philosophy and where they measure stuff from and so on.

23:12 It's been a long time since I read it, so I don't remember the details.

23:15 But they do lay out where the ranking comes from.

23:17 Okay.

23:18 Cool.

23:18 Yeah.

23:18 All right.

23:18 Well, that's it for this week.

23:20 Thanks for chatting with me, Brian.

23:21 Thank you.

23:21 Bye.

23:21 You bet.

23:22 Bye.

23:22 Thank you for listening to Python Bytes.

23:25 Follow the show on Twitter via at Python Bytes.

23:27 That's Python Bytes as in B-Y-T-E-S.

23:30 And get the full show notes at pythonbytes.fm.

23:34 If you have a news item you want featured, just visit pythonbytes.fm and send it our way.

23:38 We're always on the lookout for sharing something cool.

23:41 On behalf of myself and Brian Okken, this is Michael Kennedy.

23:44 Thank you for listening and sharing this podcast with your friends and colleagues.

Back to show page