Transcript #68: Python notebooks galore!

Recorded on Sunday, Mar 4, 2018.

00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.

00:04 It's episode 68, recorded February 28th, 2018.

00:09 I'm Michael Kennedy.

00:10 And I'm Brian Okken.

00:11 And we have yet another bundle of amazing stuff to share with you.

00:15 I'm super excited about the ones I got. How about you, Brian?

00:18 I'm really excited. I had to kick some out because I had too many things to cover.

00:21 I think I changed my list four times this week because I'm like, oh, this is a great list.

00:25 Oh, no, this one's more important. This is even better. It's awesome.

00:27 Yeah.

00:28 Yeah. So before we get to it, I just want to say thanks to DigitalOcean for sponsoring this episode.

00:32 Check them out at do.co slash Python.

00:34 Right now, I want to hear about PyPI, but there's something wrong with it.

00:39 What's up here?

00:40 Well, so I've had this on the list for a long time.

00:44 A project called dumb-pypi.

00:47 So Dumb PyPI or PyPI. I don't know.

00:50 Anyway, it's not really that dumb, though.

00:52 So there's a lot of local.

00:55 So you can have your own repository.

00:57 So there's a bunch of different ways you can set up your own server so that you can serve your own packages.

01:02 Like if you've got a team or something that you've got proprietary code that you don't want to share with others on normal PyPI, you can have your own.

01:12 But you have to have a server running.

01:14 And there's a lot of the generation of the server code is tied to it.

01:19 So there's like a Flask version and there's various versions.

01:22 This one is just a flat file creator.

01:25 So this package, Dumb PyPI, will just take a directory full of wheels or zipped packages and create a directory that you can just stick on any server and have it be served up for an index.

01:43 And for instance, I've got a...

01:44 So it doesn't do any caching.

01:47 It doesn't go through to PyPI and grab things that it's missing.

01:50 So you have to manually do that yourself.

01:52 But if we combine this with what we learned in episode 24 that you can just do pip download easily and download your own files somewhere.

02:01 This combined, I'm using this at work now to create a really simple PyPI server behind our firewall that doesn't have...

02:11 I don't have to give it permission to talk to the outside world.

02:14 It's just a bunch of files.

02:15 So...

02:15 It's actually really cool.

02:17 So you could even put it up on like Amazon S3 or somewhere like that, right?

02:22 Right.

02:22 And actually, there is an example on...

02:26 I think that is the example on the website or the package website.

02:31 GitHub site does have an S3 example.

02:33 It's like super fast and slick and it doesn't do anything like updates or anything.

02:39 You have to rebuild everything yourself.

02:41 But if you're going to...

02:42 You can set up a cron job or something to do some of this.

02:45 Exactly.

02:45 Just do it at night when nobody's around.

02:47 Yeah.

02:49 Just update it daily.

02:50 How often do these packages change, right?

02:52 But like, for instance, I've got like all of our test code that we're creating virtual environments to...

02:57 And then pulling in test packages and different packages.

03:00 That stuff just...

03:01 I don't want it to update all the time.

03:03 I want it to...

03:04 I want it to grab certain versions that I know are there.

03:07 So something like this is perfect.
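
To make the "it's just a bunch of files" idea concrete, here is a minimal sketch that turns a directory of wheels and source archives into a static index using only the Python standard library. It is not dumb-pypi's actual implementation or command-line interface. The function names, the `simple/` output layout (loosely following the PEP 503 "simple" repository convention), and the example URL are all assumptions made for illustration. The input directory could be filled beforehand with `pip download`.

```python
import html
import re
from pathlib import Path

# File types treated as packages in this sketch.
ARCHIVE_SUFFIXES = (".whl", ".tar.gz", ".zip")


def normalize(name: str) -> str:
    """Normalize a project name the way PEP 503 describes."""
    return re.sub(r"[-_.]+", "-", name).lower()


def project_name(filename: str) -> str:
    """Guess the project name from a wheel or sdist filename (simplified)."""
    if filename.endswith(".whl"):
        # Wheel names look like {name}-{version}-{python}-{abi}-{platform}.whl
        return filename.split("-")[0]
    # Source archives look like {name}-{version}.tar.gz or .zip
    stem = re.sub(r"\.(tar\.gz|zip)$", "", filename)
    return stem.rsplit("-", 1)[0]


def write_page(path: Path, links: list[str]) -> None:
    """Write a bare-bones HTML page containing the given anchor tags."""
    body = "\n".join(links)
    path.write_text(
        f"<!DOCTYPE html>\n<html><body>\n{body}\n</body></html>\n",
        encoding="utf-8",
    )


def build_index(package_dir, output_dir, packages_url):
    """Create output_dir/simple/ with one index page per project.

    packages_url is where the package files themselves are served from
    (hypothetical; any static host would do).
    """
    projects: dict[str, list[str]] = {}
    for path in sorted(Path(package_dir).iterdir()):
        if path.name.endswith(ARCHIVE_SUFFIXES):
            key = normalize(project_name(path.name))
            projects.setdefault(key, []).append(path.name)

    simple = Path(output_dir) / "simple"
    simple.mkdir(parents=True, exist_ok=True)
    write_page(
        simple / "index.html",
        [f'<a href="{name}/">{name}</a><br>' for name in sorted(projects)],
    )

    base = packages_url.rstrip("/")
    for name, files in projects.items():
        project_dir = simple / name
        project_dir.mkdir(exist_ok=True)
        write_page(
            project_dir / "index.html",
            [
                f'<a href="{base}/{html.escape(f)}">{html.escape(f)}</a><br>'
                for f in files
            ],
        )
    return projects
```

Once the generated directory is copied to any static host, pip could be pointed at it with something like `pip install --index-url https://example.internal/simple/ somepackage`, where the host name is a placeholder.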

03:09 Yeah.

03:09 It looks really cool.

03:10 I think it needs a better name than dumb PyPI.

03:12 Yeah.

03:13 Yeah, it does.

03:15 Clever, but doesn't do anything PyPI.

03:17 How about that?

03:18 No server.

03:19 Server.

03:20 Serverless PyPI.

03:22 How about this?

03:22 Come on.

03:23 Yeah.

03:23 Awesome.

03:24 Okay.

03:25 So the next thing I want to talk about is something for humans.

03:28 And if I said it was for humans, who would that mean?

03:31 Kenneth.

03:32 That's right.

03:32 Kenneth Reitz.

03:33 So he's got all of his things for humans.

03:35 He's got Maya, datetimes for humans, Records, SQL for humans, obviously Requests.

03:40 So he's out with a new human thing.

03:42 And this time for web scraping.

03:44 So he created this thing called requests-html, HTML parsing for humans.

03:51 So when I looked at this, I thought, oh, is this maybe like a replacement for Beautiful Soup or something like that?

03:57 Like in some kind of extension to requests.

03:58 But in fact, it actually depends upon Beautiful Soup.

04:02 Right.

04:03 So what it is, it's a library that puts a different API on top of a combination of Requests plus Beautiful Soup, plus something called PyQuery, which lets you run jQuery-style CSS selectors.

04:17 So it does a bunch of cool stuff.

04:18 Some of the notable features are it has full JavaScript support, which I'm taking to mean that it will parse and execute the JavaScript necessary.

04:29 So if I hit like an AngularJS page, instead of just seeing curly brackets everywhere, there's data that would have gone in there, which is a big deal in web scraping.

04:37 Because if you just use straight up Requests plus Beautiful Soup, you just get the raw markup, without the bits the JavaScript would fill in when it executes, right?

04:45 Yeah.

04:45 There's CSS selectors, XPath selectors, mocked user agents.

04:51 So it pretends to be a real browser.

04:52 So people don't know that you're trying to scrape their sites, which is kind of interesting.

04:55 It uses connection pooling and cookie persistence.

04:59 So you can like log in and then go do a bunch of stuff at a site.

05:02 And you can do it without reconnecting every time.

05:05 So that's pretty cool.

05:06 Yeah, and it keeps the session open. And tying it to Requests, I mean, that's what people often did anyway, Requests plus Beautiful Soup.

05:15 And tying it in with one API is great.

05:19 And actually, I like the idea anyway of somebody saying, hey, these tools are great, but I wish the API was different.

05:25 So just write another package that uses others and write a better API then.

05:30 Yeah, it's a little like Flask, what Flask did, but for requests and parsing.

05:34 Kenneth is a great one for that, he's got a good eye for APIs.

05:38 Yeah, that's for sure.

05:39 People definitely seem to love his APIs.

05:40 So I'll leave you with the final sort of tagline here from their website.

05:44 The Requests experience you know and love, but with magical parsing abilities.

05:48 That's nice.

05:49 Yeah, not bad, right?

05:51 Cool.
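
For a rough idea of what that combined API looks like, here is a small sketch using requests-html. The function name `scrape_links` and its return shape are made up for this example. It assumes the third-party package is installed (`pip install requests-html`), and it leaves the JavaScript rendering step commented out because `render()` downloads a headless browser the first time it runs.

```python
def scrape_links(url: str, selector: str = "a"):
    """Fetch a page and return (text, absolute links) for matching elements.

    Hypothetical helper for illustration, not part of requests-html.
    """
    # Imported here so this module still loads when the third-party
    # package is not installed.
    from requests_html import HTMLSession

    session = HTMLSession()  # keeps cookies and pooled connections
    response = session.get(url)

    # Uncomment to execute the page's JavaScript before parsing.
    # response.html.render()

    # CSS selector lookup on the parsed document.
    elements = response.html.find(selector)
    return [(el.text, sorted(el.absolute_links)) for el in elements]


# Example call (needs network access):
# for text, links in scrape_links("https://www.python.org/"):
#     print(text, links)
```

Because the session object persists between calls, logging in once and then fetching several pages through the same session would reuse the cookies and connections mentioned above.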

05:51 So what's up with this phony number thing?

05:54 You got some like prank calls to make?

05:56 This was awesome.

05:58 So Twilio does their Twilio blog where people can write for them.

06:02 And I think we've talked about it before.

06:05 They do a pretty cool program where they give you an editor even to help you out with it.

06:10 But this article is basically a, and you don't have to do a Twilio project, but this is a

06:15 Twilio project.

06:15 This is a phone number proxy.

06:17 So the idea is you imagine a situation like, for instance, you've got a, I don't know, a

06:23 meetup or some temporary event.

06:26 And you want people to be able to text you because you're not going to be around your

06:30 computer all the time.

06:31 You want people to be able to text you and you want to text back,

06:35 but you don't want to give out your phone number.

06:37 Well, this project gives you a little proxy so that you can set it up with Flask and set

06:41 up a server with Twilio and give out a temporary phone number and have it be attached

06:47 to your phone.

06:48 And I'm going to definitely have to try this out because it looks fun.

06:51 Yeah, that looks really, really cool.

06:53 And I think that program they have is awesome.

06:55 One of the challenges of getting started blogging is nobody knows about you.

07:00 Nobody, like you'll put all this effort into writing this thing and you'll put it out there

07:03 and your 10 friends who are willing to follow your tech stuff off of Facebook glance at it,

07:09 right?

07:10 And so here's a way to like appear on a major, major blog and highlight what you're doing

07:16 and maybe jumpstart your other tech stuff, right?

07:19 Like you could link back to your blog or something like this.

07:21 Having somebody work with you to polish it up a little bit.

07:25 Is a good idea.

07:25 Often when you just tap your friends for that sort of help, they'll just tell you, oh, it

07:31 looks great.

07:31 Go ahead and put it up.

07:32 Yeah, yeah.

07:32 Very cool.

07:33 Very cool.

07:33 But this project is also pretty neat.

07:35 It does encourage you to do some of the paid part of Twilio.

07:39 But I think for something like this, it's a good idea.

07:42 Yeah, very nice.

07:43 Good article.

07:44 All right.

07:44 Before we get to the next, let me just tell you about DigitalOcean.

07:48 They're doing some really amazing stuff.

07:50 So the thing I'd like to highlight is they just upgraded all of their things and left

07:56 the price the same.

07:56 And they, by upgraded, I mean doubled all the stuff at least.

08:00 So for example, you go to DigitalOcean and get a Linux server with all variety of Linux machines,

08:07 Linux distributions, with four gigs of RAM, two CPUs, 80 gigs of SSD for $20 a month.

08:15 Like that's insane.

08:16 Right?

08:17 That is a crazy thing.

08:19 And that used to cost $40.

08:20 And they just said, nope, that's now $20.

08:23 And it comes with four terabytes of free traffic.

08:26 If I were to just transfer that over S3, which is $0.09 a gigabyte, just that bandwidth would

08:33 be $368 at S3.

08:35 That's included in your $20 server.
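
The bandwidth figure can be checked with a couple of lines of arithmetic. This sketch assumes 1 TB is counted as 1024 GB and uses the $0.09 per gigabyte rate quoted above; the function name is just for illustration.

```python
GB_PER_TB = 1024  # assumption: binary terabytes


def transfer_cost(terabytes: float, price_per_gb: float) -> float:
    """Cost in dollars of transferring the given volume at a flat per-GB rate."""
    return terabytes * GB_PER_TB * price_per_gb


cost = transfer_cost(4, 0.09)
print(f"${cost:.2f}")  # 4 * 1024 * 0.09 = 368.64
```

That lands at roughly the $368 mentioned for four terabytes.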

08:37 So really, really awesome stuff.

08:38 Check them out over at do.co slash Python.

08:41 And you know, check out what they're doing.

08:44 Help support the show.

08:45 Everybody's getting good stuff.

08:46 So thanks to DigitalOcean for that.

08:48 All right.

08:49 I kind of want to just go on a Jupyter-like notebook rant for a while, Brian, because the

08:57 news around this stuff is just coming in fast and furious.

09:00 So there are so many things going on with notebooks right now.

09:04 And like, this is a world I don't really live in.

09:06 I'm much more a create a Python project and have like 10 related files and run stuff on the

09:12 command line or my editor and not put it in these cells because that's just not my world.

09:17 But I see how powerful it is for people who are exploring data and being more iterative

09:22 with their code.

09:23 And in the last couple of weeks, they've got a lot more options.

09:27 They've been in the news a lot right now.

09:28 So I'll start with one for this one.

09:30 And then we'll do another one in the final segment.

09:32 So for this one, I want to talk about something that's brand new called Datalore.

09:37 Have you heard of Datalore?

09:39 I have not.

09:39 You've heard of PyCharm, right?

09:41 So this is like PyCharm in a notebook, online, hosted.

09:47 So it's from the JetBrains guys.

09:49 It's just in the cloud.

09:50 You just go sign up.

09:51 It has this intelligent editor, just like JetBrains has.

09:55 Like, you know, IntelliJ plus PyCharm has with all of the like the cool autocomplete and IntelliSense.

10:01 It comes like pre-installed with a bunch of stuff that you need, like Matplotlib and so on.

10:06 It has collaboration.

10:07 So you can log in and kind of like do Google Docs style, work on it together.

10:12 I don't know how real-time it is.

10:14 Like, do you actually see every character going in?

10:16 Or do you, you know, do you have to refresh it?

10:19 Does it automatically refresh?

10:20 I'm not entirely sure the level of collaboration, but there's some real-time multiple people working on the same notebook type of collaboration.

10:27 I got to check that out.

10:29 It has integrated version control.

10:31 So you don't have to be, like if you're a student or say you're an engineer, but you're not like git-push-on-the-command-line type of competent, right?

10:40 You go there and just say, create me a save point.

10:42 It basically saves it and tags it so you can get it back.

10:45 Things like that.

10:46 Oh, that's great.

10:47 Pretty cool.

10:47 The JetBrains, like the diff viewer for version control is really great.

10:51 So that, building that in here is cool.

10:53 Yeah, they've got some really cool stuff.

10:55 And finally, this might be pretty big for some folks, depending on what you're doing.

10:59 They have incremental calculations.

11:01 So you can like, if you're doing like machine learning and training and all sorts of analysis,

11:05 and there's a bunch of cells that work together to generate that data, they actually have figured out how to track the dependencies between where that data comes from.

11:14 And you don't have to rerun the entire thing.

11:15 If you're changing your model, it only reruns the parts that have changed, that depend upon something you've changed.

11:22 Oh, that's awesome.

11:23 Yeah, it's pretty cool, right?

11:24 So if your computation takes two minutes, but this little part's really quick because it uses mostly finished data,

11:29 that's a really big deal, I think.
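
One way to picture incremental calculation is as a dependency graph between cells: when a cell changes, only that cell and everything downstream of it need to run again. The sketch below is a generic illustration of that idea, not how Datalore is actually implemented; the cell names and the `cells_to_rerun` function are hypothetical.

```python
from collections import deque


def cells_to_rerun(dependencies, changed):
    """Return the set of cells that must be recomputed.

    dependencies maps each cell to the set of cells it reads from.
    changed is an iterable of cells whose code was edited.
    """
    # Invert the graph: for each cell, which cells consume its output?
    dependents = {}
    for cell, inputs in dependencies.items():
        for source in inputs:
            dependents.setdefault(source, set()).add(cell)

    # Breadth-first walk downstream from every changed cell.
    stale = set(changed)
    queue = deque(stale)
    while queue:
        cell = queue.popleft()
        for downstream in dependents.get(cell, ()):
            if downstream not in stale:
                stale.add(downstream)
                queue.append(downstream)
    return stale


notebook = {
    "load": set(),
    "clean": {"load"},
    "train": {"clean"},
    "plot": {"clean"},
    "report": {"train", "plot"},
}

# Editing only the training cell leaves load, clean, and plot untouched.
print(sorted(cells_to_rerun(notebook, ["train"])))  # ['report', 'train']
```

In this toy notebook, changing the model means rerunning two cells instead of five, which is where the time savings would come from.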

11:30 Yeah.

11:31 So anyway, Datalore, it seems like it's in beta.

11:33 I don't know what it costs, if there's a free thing or whatever.

11:36 But it's a Jupyter Notebook-like hosted service from JetBrains, which I thought was pretty cool and worth talking about.

11:43 Yeah.

11:44 Neat.

11:44 Nice.

11:45 I have no idea how to get started on this next one.

11:49 I'm just going to say the name, Bellybutton.

11:51 Bellybutton, yes.

11:52 For personal lint.

11:54 What's up with this?

11:56 So, yeah, I think it's a play on words around, like, linters and where lint usually shows up.

12:03 So we have things like Pylint and Flake8, which includes pycodestyle, which used to be called pep8, that I use all the time and love.

12:12 But there's times where you have, like, extra requirements for your own team or for your own project.

12:19 And it'd be cool to have, like, something like pylint, but just with your own rules in it.

12:25 And that's where Bellybutton comes in.

12:27 So it's a way to create rules for static analysis or style.

12:34 And one of the examples that I thought was great was, let's say you've got a library with some functions that you decide that your team uses, but you decided some of them are dumb and deprecate them.

12:46 Yeah, or maybe there's a better way to do things.

12:47 You can add some of these rules to Bellybutton to say, hey, this code here, you need to change it this way.

12:55 And actually give exact examples of how somebody should change it.

12:59 And I think that's a really cool idea.

13:01 Yeah, awesome.

13:02 Bellybutton.

13:02 I wanted to bring that up.

13:03 Yeah, it sounds really cool.

13:04 These linters are really great.

13:05 And I typically think of them in the context of, like, continuous integration and sort of team-wide things.

13:10 But, yeah, here's a cool way to sort of make your own overrides and whatnot.

13:14 Yeah, and any time where you've got, like, a coding style within your team, if you can automate it and take the person out of it and take that out of your code reviews, it helps with team dynamics to just have the computer say, hey, change this code instead of having your coworkers keep telling you to change your code.

13:32 Yeah, that's a really interesting dynamic, isn't it?

13:34 Like, people are willing to take petty, nitpicky criticism from robots and automated systems way more than from your manager.

13:44 Or whoever.

13:45 Yeah, and you can just, like, we've already had the discussion about what our style is.

13:50 This is what it is.

13:51 I don't want to keep opening up the discussion.

13:53 So, just, you know, do it.

13:55 Nice.

13:56 Manager speak.

13:57 That's right.

13:58 Cool.

13:59 All right.

13:59 You ready for Notebooks Galore Part 2?

14:01 Oh, more notebook news.

14:03 Yay.

14:04 Yes.

14:04 So, our friend, our friend of the show, Daniel Shorstein, posted something on Reddit, some news that has to do with free hosted notebooks in Azure, right?

14:16 This would be, like, pretty much a direct competitor to Datalore, right?

14:20 So, they are now supporting Python 3.6 Jupyter Notebooks in Azure.

14:26 And there's a nice conversation over on Reddit about that.

14:29 And you go over and read more about it and so on.

14:33 So, they have, basically, if you just drop in on notebooks.azure.com, then off you go.

14:40 You can go work with it right there.

14:42 And that's, like, straight up Jupyter Notebooks, I believe.

14:44 That's pretty cool, right?

14:46 Free, in the cloud, powered by Jupyter.

14:48 Like, I'm telling you, this is, like, a space that is just, like, so blowing up right now.

14:51 Yeah.

14:52 We better pay attention to it more if people are fighting over it.

14:55 Exactly.

14:55 There's big companies fighting over it.

14:57 So, speaking of big companies that want to fight over it, have you heard of Colaboratory?

15:00 No.

15:00 A great word, though.

15:01 It is.

15:02 So, this comes from a research, the research group at Google, colab.research.google.com.

15:08 And people, this has been around for a little while, and people have been kind of dissing on it a little bit because it had been just Python 2.

15:15 However, it is now supporting not just legacy Python, but modern Python.

15:21 So, that's really cool.

15:23 And since the time that I took this note to talk to you about it today, and today, they now have also launched GPU support.

15:32 So, you go to your notebook, and you say, I want to do some machine learning.

15:36 Oh, yeah.

15:37 Run this TensorFlow, this training process on a GPU.

15:42 And you can basically hit Command-Shift-P to make it run on a GPU.

15:47 Like, how insane is that?

15:48 That's cool.

15:49 Okay.

15:49 So, that was pretty cool.

15:50 You ready for some more notebook news?

15:52 Yes.

15:53 JupyterLab is ready for users.

15:56 It's now open.

15:57 What is JupyterLab?

15:58 So, JupyterLab is something based on Jupyter Notebooks, but it's more than just that. So, we're going to have to take this with a grain of salt.

16:07 Probably a lot of people out there know better than I do.

16:10 But so, it's like a hosted Jupyter Notebooks, which is really cool.

16:15 But it also enables you to use text editors, terminals, data file viewers, and, like, all sorts of other stuff that's not just in the notebook.

16:24 So, you could, like, SSH in and do stuff behind the scenes or something to this effect, right?

16:31 So, they've got some cool pictures.

16:34 Like, they have – it's almost like this crazy web IDE.

16:38 So, you've got, like, your files on the left.

16:40 You've got your standard notebook with graphs in the middle.

16:42 And then on the right, you might have, like, a map, a couple of JSON files, and a CSV in, like, an Excel thing all in the same window.

16:49 Okay.

16:50 Well, that's neat.

16:50 Yeah.

16:51 And you can build, like, extensions and plugins.

16:53 So, like, that CSV thing, it's probably, like, a JupyterLab extension.

16:56 Nice.

16:57 So, yet another really cool thing going on there.

17:01 And I guess the final piece, a tip, maybe from the very first one from this segment is, Daniel said, one thing that can happen is when you log into, say, like, the Azure notebook, some of their dependencies are a little bit old, like Pandas or Matplotlib or something like that.

17:18 He shows you how to import pip and then execute pip inside your notebook to force it to upgrade the dependencies in your project.

17:26 Oh, okay.

17:27 And it's good that you put – you're going to put the snippet in our notes.

17:30 Yeah, the snippet is in there.

17:32 But you can basically – it shows you how to, from code, run pip to upgrade stuff, which I think is interesting and useful outside of just notebooks.

17:41 But it happens to be, like, if you don't get to remote into the servers, you still want to upgrade stuff.

17:45 It's pretty helpful.
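
The snippet from the show notes is not reproduced in this transcript, so what follows is a sketch of the same idea rather than Daniel's exact code. It assumes the approach of invoking pip as a subprocess through the interpreter that is running the notebook, instead of importing pip directly; the helper names are invented for this example.

```python
import subprocess
import sys


def pip_upgrade_command(packages):
    """Build the command line that upgrades the given packages.

    Using sys.executable targets the same environment the notebook
    kernel is running in.
    """
    return [sys.executable, "-m", "pip", "install", "--upgrade", *packages]


def upgrade(packages):
    """Run pip and raise CalledProcessError if the upgrade fails."""
    subprocess.check_call(pip_upgrade_command(packages))


# In a notebook cell (needs network access):
# upgrade(["pandas", "matplotlib"])
```

After an upgrade like this, the kernel usually has to be restarted before already-imported packages pick up the new version.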

17:46 Yeah, nice.

17:47 Cool.

17:48 All right.

17:48 Whew.

17:48 That's a lot of notebook news.

17:49 We'll probably have more next week.

17:50 Probably.

17:51 Probably.

17:52 It's really cool, though, to see so much innovation and creativity around this stuff.

17:57 So it's kind of a paradox of choice problem going on.

18:00 Like, if I wanted to get started, what the heck would I do?

18:02 But there's a bunch of good options here.

18:05 Definitely.

18:05 Awesome.

18:05 All right.

18:06 You got anything extra you want to let everyone know about this week?

18:08 Just that maybe I should spend more time paying attention to Jupyter.

18:11 But other than that, no.

18:13 Yeah, Jupyter is pretty cool.

18:15 JupyterLab is exciting.

18:16 Colaboratory is exciting.

18:17 Notebooks on Azure is exciting.

18:19 Datalore is exciting.

18:19 Yeah, I'll have to pay more attention as well.

18:22 Do you have any news?

18:23 No news.

18:24 Well, when this episode goes out, there's a very good chance that I'll be at PyCon Slovakia.

18:30 And if I am and you hear this, feel free to come say hi.

18:33 That'd be cool.

18:33 Neat.

18:34 Yeah.

18:34 So I think that's the right timing.

18:36 I'm pretty sure it will be.

18:37 I'll try to line it up that way.

18:38 All right.

18:39 Well, thanks for getting all this stuff together, Brian.

18:41 This is great stuff.

18:42 Yeah, thank you.

18:42 Thank you for listening to Python Bytes.

18:46 Follow the show on Twitter via at Python Bytes.

18:48 That's Python Bytes as in B-Y-T-E-S.

18:52 And get the full show notes at pythonbytes.fm.

18:55 If you have a news item you want featured, just visit pythonbytes.fm and send it our way.

18:59 We're always on the lookout for sharing something cool.

19:02 On behalf of myself and Brian Okken, this is Michael Kennedy.

19:05 Thank you for listening and sharing this podcast with your friends and colleagues.
