Transcript #68: Python notebooks galore!
Return to episode page view on github00:00 Hello and welcome to Python Bytes where we deliver Python news and headlines directly to your earbuds.
00:05 It's episode 68 recorded February 28th, 2018. I'm Michael Kennedy.
00:10 And I'm Brian Okken.
00:11 And we have yet another bundle of amazing stuff to share with you.
00:15 Super excited about the ones I got. How about you, Brian?
00:18 I'm really excited. I had to kick some out because I had too many things to cover.
00:22 I think I changed my list four times this week because I'm like, oh, this is a great list.
00:25 Oh, no, this one's more important. No, this is even better. It's awesome.
00:27 Yeah.
00:28 Yeah, so before we get to it, just want to say thanks to DigitalOcean for sponsoring this episode.
00:32 Check them out at do.co/python. Right now, I want to hear about PyPI, but there's something wrong with it. What's up here? Well, so I've had this on the list for a long time, a project called the dumb P-Y-P-I. So dumb PyPI or PyPI. I don't know. Anyway, it's not really that dumb, though. So There's a lot of local, so you can have your own repository.
00:57 So there's a bunch of different ways you can set up your own server so that you can serve your own packages.
01:03 Like if you've got a team or something that you've got proprietary code that you don't want to share with others on normal PyPI, you can have your own.
01:12 But you have to have a server running.
01:14 And there's a lot of the generation of the server code is tied to it.
01:19 So there's like a Flask version and there's various versions.
01:22 This one is just a flat file creator.
01:25 So this package, dumpypi will just take a directory full of wheels or zipped packages and create a directory that you can just stick on any server and have it be served up for an index.
01:43 For instance, I've got a, so it doesn't do any caching.
01:47 It doesn't go through to pypi and grab things that it's missing.
01:51 So you have to manually do that yourself.
01:53 But if we combine this with what we learned in episode 24, that you can just do pip download easily and download your own files somewhere.
02:02 This combined, I'm using this at work now to create a really simple PyPI server behind our firewall that doesn't have, I don't have to give it permission to talk to the outside world.
02:14 It's just a bunch of files.
02:15 It's actually really cool. So you could even put it up on like Amazon S3 or somewhere like that, right? Right and actually the there is an example on I think that is the example on the the website or the the package website Yeah, hub site does have an S3 example It's like super fast and slick and it doesn't do anything like updates or anything You have to rebuild everything yourself But if you're gonna you can set up a crown job or something to do some of this exactly just do it at night When nobody's around Yeah, just update it daily how often do these packages change right but like for instance I've got like all of our test code that we're creating virtual environments to and then pulling in test packages and different packages That stuff just I don't want it to update all the time I wanted to I wanted to grab certain versions that I know were there. So something like this is perfect Yeah, it looks really cool. I think it needs a better name than dumb PyPI.
03:13 Yeah, it does.
03:15 Clever but doesn't do anything PyPI. How about that?
03:18 No server, server.
03:20 Serverless PyPI. How about this? Come on.
03:23 Yeah.
03:24 Awesome. Okay. So the next thing I want to talk about is something for humans.
03:29 And if I said it was for humans, who would that mean?
03:31 Kenneth.
03:32 That's right. Kenneth Wright. So he's got all of his things for humans.
03:35 He's got Maya, DateTime for humans, Records, SQL for humans, obviously requests.
03:41 So he's out with a new human thing and this time for web scraping.
03:45 So he created this thing called requests HTML, HTML parsing for humans.
03:50 So when I looked at this, I thought, oh, is this maybe like a replacement for beautiful soup or something like that, like in some kind of extension to request.
03:59 But in fact, it actually depends upon beautiful soup.
04:02 Right.
04:03 It's a library that like puts a different API on top of combining requests plus beautiful soup plus something called pie query, which lets you run jQuery style CSS selectors.
04:16 So it does a bunch of cool stuff.
04:18 Some of the notable features are it has full JavaScript support, which I'm taking to mean that it will parse and execute the JavaScript necessary.
04:30 So if I hit like an AngularJS page, instead of just seeing curly brackets everywhere, there's data that would have gone in there, which is a big deal in web scraping.
04:38 Because if you just use straight up request plus beautiful soup, you just get the the markup where those bits would execute when it does, right?
04:45 Yeah, the CSS selectors, XPath selectors, mocked user agents.
04:51 So it pretends to be a real browser.
04:52 So people don't know that you're trying to scrape their sites, which is kind of interesting, uses connection pooling and cookie persistence.
04:59 So you can like log in and then go do a bunch of stuff at a site.
05:03 And you can do it without reconnecting every time.
05:06 So that's pretty cool.
05:07 Yeah, and it keeps the session open and tying requests with, I mean, that's what people often need did anyways, request plus beautiful soup, and writing it in with one, one API is great.
05:20 And actually, I like the idea anyway, if somebody saying, hey, these tools are great, but I wish the API was different.
05:26 So just write another package that uses others and write a better API then.
05:30 - Yeah, it's a little like Flask, what Flask did, but for requests and parsing.
05:34 - Kenneth is a great one for, he's a good eye for APIs.
05:38 - Yeah, that's for sure.
05:39 People definitely seem to love his APIs.
05:41 So I'll leave you with the final sort of tagline here from their website.
05:45 The request experience you know and love, but with magical parsing abilities.
05:49 - That's nice.
05:50 - Yeah, not bad, right?
05:51 Cool, so what's up with this phony number thing?
05:54 You got some more like prank calls to make?
05:57 - This was awesome.
05:58 So Twilio does their Twilio blog where people can write for them.
06:03 I think we've talked about it before.
06:05 They do a pretty cool program where they give you an editor even to help you out with it.
06:09 But this article is basically a, and you don't have to do a Twilio project, but this is a Twilio project.
06:16 This is a phone number proxy.
06:18 So the idea is you imagine a situation like for instance you've got a, I don't know, a meetup or some temporary event, and you want people to be able to text you because you're not gonna be around your computer all the time, you want people to be able to text you and you wanna text back, but you don't wanna give out your phone number.
06:37 Well, this project gives you a little proxy so that you can set it up with Flask and set up a server with Twilio and give out a temporary phone number and have it be attached to your phone.
06:48 And I'm gonna definitely have to try this out because it looks fun.
06:52 - Yeah, that looks really, really cool.
06:53 And I think that program they have is awesome.
06:55 One of the challenges of getting started blogging is nobody knows about you.
07:00 Nobody, like, you put all this effort into writing this thing and you put it out there and your 10 friends who are willing to follow your tech stuff off of Facebook glanced at it, right?
07:10 And so here's a way to like appear on a major, major blog and highlight what you're doing and maybe jumpstart your other tech stuff, right?
07:19 Like you could link back to your blog or something like this.
07:21 Having somebody work with you to polish it up a little bit is a good idea.
07:26 Often when you just tap your friends for that sort of help, they'll just tell you, "Oh, it looks great.
07:31 "Go ahead and put it up." - Yeah, yeah, very cool, very cool.
07:33 - But this project's also pretty neat.
07:35 It does encourage you to do some of the paid part of Twilio, but I think for something like this, it's a good idea.
07:42 - Yeah, very nice.
07:43 Good article.
07:44 All right, before we get to the next, let me just tell you about DigitalOcean.
07:48 They're doing some really amazing stuff.
07:50 So the thing I'd like to highlight is they just upgraded all of their things and left the price the same.
07:56 And by upgraded, I mean doubled all the stuff at least.
08:01 So for example, you go to DigitalOcean and get a Linux server with all variety of Linux machines, Linux distributions, with four gigs of RAM, two CPUs, 80 gigs of SSD for $20 a month.
08:16 Like that's insane.
08:17 Right?
08:18 That is a crazy thing.
08:19 That used to cost 40 bucks and they just said, nope, that's now 20 bucks.
08:23 And it comes with four terabytes of free traffic.
08:26 If I were to just transfer that over S3, which is nine cents a gigabyte, just that bandwidth would be $368 at S3.
08:35 That's included in your $20 server.
08:37 So really, really awesome stuff.
08:39 Check them out over at do.co/python.
08:42 And you know, you check out what they're doing, help support the show.
08:45 Everybody's getting good stuff.
08:47 So thanks, thanks DigitalOcean for that.
08:49 All right, I kind of want to just go on a Jupyter-like notebook rant for a while, Brian, because the news around this stuff is just coming in fast and furious.
09:01 So there are so many things going on with notebooks right now.
09:04 And like, this is a world I don't really live in.
09:07 I'm much more creative Python project and have like 10 related files and run stuff on the command line or my editor and not put it in these cells because that's just not my world.
09:17 But I see how powerful it is for people who are exploring data and being more iterative with their code.
09:24 And in the last couple of weeks, they've got a lot more options.
09:27 They've been in the news a lot right now.
09:28 So I'll start with one for this one.
09:30 And then we'll do another one in the final segment.
09:33 So for this one, I want to talk about something that's brand new called data lore.
09:37 Have you heard of data lore?
09:39 I've not.
09:40 You've heard of PyCharm, right?
09:42 So this is like PyCharm in a notebook, online, hosted.
09:47 So it's from the JetBrains guys.
09:49 It's just in the cloud, you just go sign up.
09:52 It has this intelligent editor just like JetBrains has, like, you know, IntelliJ plus PyCharm has with all of the, like the cool autocomplete and IntelliSense.
10:02 It comes like pre-installed with a bunch of stuff that you need, like Matplotlib and so on.
10:06 It has collaboration.
10:07 So you can log in and kind of like do Google Docs style, work on it together.
10:12 I don't know how real time it is.
10:14 Like, do you actually see every character going in?
10:16 Or do you have to refresh it?
10:19 Does it automatically refresh?
10:20 I'm not entirely sure the level of collaboration, but there's some real-time multiple people working on the same notebook type of collaboration.
10:28 I gotta check that out.
10:30 It has integrated version control, so you don't have to be, like, if you're a student, or you say you're an engineer, but you don't like, you're not like, get pushed on the command line type of competent, right?
10:40 You go there and just say, create me a save point, basically saves it and tags it so you can get it back, things like that.
10:46 - Oh, that's great. - Pretty cool.
10:47 - The JetBrains, like the diff viewer for version control is really great, so building that in here is cool.
10:54 - Yeah, they've got some really cool stuff.
10:55 And finally, this might be pretty big for some folks, depending on what you're doing, they have incremental calculations.
11:02 So you can, if you're doing machine learning and training and all sorts of analysis, and there's a bunch of cells that work together to generate that data, they actually have figured out how to track the dependencies between where that data comes from and you have to rerun the entire thing.
11:16 If you're changing your model, it only reruns the parts that have changed, that depend upon something you've changed.
11:22 - Oh, that's awesome.
11:23 - Yeah, it's pretty cool, right?
11:24 So if your computation takes two minutes, but this little part's really quick 'cause it uses mostly finished data, and that's a really big deal, I think.
11:31 So anyway, Datalore, it seems like it's in beta.
11:33 I don't know what it costs, if there's a free thing or whatever, but it's a Jupyter Notebook-like hosted service from JetBrains, I thought was pretty cool and worth talking about.
11:43 Yeah.
11:44 Neat.
11:44 Nice. I have no idea how to get started on this next one.
11:49 I'm just going to say the name, Belly Button.
11:51 Belly Button, yes. For personal lint.
11:54 What's up with this?
11:54 So yeah, I think it's a play on words around linters and where lint usually shows up.
12:02 So we have things like PyLint and Flake8, which in PyCode style, which used to be called Pep8, that I use all the time and love.
12:13 But there's times where you have extra requirements for your own team or for your own project and it'd be cool to have something like Pilot but just with your own rules in it.
12:25 That's where our belly button comes in.
12:28 It's a way to create rules around for static analysis or style.
12:34 One of the examples that I thought was great was, let's say you've got a library with some functions that you decide that your team uses, but you decided some of them are dumb and deprecate them.
12:46 Yeah, or maybe there's a better way to do things.
12:48 You can add some of these rules to Belly Button to say, hey, this code here, you need to change it this way and actually give exact examples of how somebody should change it.
12:59 And I think that's a really cool idea.
13:01 - Yeah, awesome, Belly Button.
13:02 - Wanted to bring that up.
13:03 - Yeah, it sounds really cool.
13:04 These linters are really great.
13:05 And I typically think of them in the context of like continuous integration and sort of team-wide things.
13:10 But yeah, here's a cool way to sort of make your own overrides and whatnot.
13:14 - Yeah, and any time where you've got like a coding style within your team, if you can automate it and take the person out of it and take that out of your code reviews, it helps with team dynamics to just have the computer say, "Hey, change this code," instead of having your coworkers keep telling you to change your code.
13:32 - Yeah, that's a really interesting dynamic, isn't it?
13:34 Like people are willing to take petty nitpicky criticism from robots and automated systems way more than from your manager or whoever.
13:46 - Yeah, and you can just, like we've already had the discussion about what our style is.
13:50 This is what it is.
13:52 I don't wanna keep opening up the discussion.
13:54 So just do it.
13:56 - Nice.
13:57 - Manager speak.
13:58 - That's right, cool.
13:59 All right, you ready for Notebooks Galore Part Two?
14:02 - Oh, more notebook news, yay.
14:04 Yes. So our friend, a friend of the show, Daniel Shorstein posted something on Reddit, some news that has to do with free hosted notebooks in Azure, right? This would be like pretty much a direct competitor to Datalore, right? So they are now supporting Python 3.6 Jupyter notebooks in Azure. And there's a nice conversation over on Reddit about that. And You go over and read more about it and so on.
14:33 So they had basically if you just drop in on notebooks.azure.com, then off you go.
14:40 You can go work with it right there.
14:42 And that's like straight up Jupyter Notebooks, I believe.
14:45 That's pretty cool, right?
14:46 Free, in the cloud, powered by Jupyter.
14:48 Like I'm telling you, this is like a space that is just like so blowing up right now.
14:52 - Yeah, we better pay attention to it more if people are fighting over it.
14:55 - Exactly, there's big companies fighting over it.
14:57 So speaking of big companies who wanna fight over it, Have you heard of CoLaboratory?
15:00 - No, a great word though.
15:01 - It is.
15:02 So this comes from a research, the research group at Google, colab.research.google.com.
15:08 And people, this has been around for a little while, and people have been kind of dissing on it a little bit because it had been just Python 2.
15:15 However, it is now Python supporting, not legacy Python, but modern Python.
15:21 So that's really cool.
15:24 And since the time that I took this note to talk to you about it today, And today, they now have also launched GPU support.
15:33 So you go to your notebook and you say, "I want to do some machine learning." Oh yeah, run this TensorFlow, this training process on a GPU and you can basically hit Command + Shift + P to make it run on a GPU.
15:47 Like how insane is that?
15:48 - That's cool.
15:49 - Okay, so that was pretty cool.
15:50 You ready for some more notebook news?
15:52 - Yes.
15:53 - JupyterLab is ready for users, is now open.
15:57 So what is JupyterLab?
15:58 So Jupyter is something based on Jupyter Notebooks, but it's more than just--
16:04 so we're going to have to put this with a grain of salt.
16:07 A lot of people out there know better than I do.
16:11 So it's like a hosted Jupyter Notebooks, which is really cool.
16:16 But it also enables you to use text editors, terminals, data file viewers, and all sorts of other stuff that's not just in the notebook.
16:25 So you could like SSH in and do stuff behind the scenes or something to this effect, right?
16:32 So they've got some cool pictures, like they have, it's almost like this crazy web IDE.
16:38 So you got like your files on the left, you got your standard notebook with graphs in the middle.
16:42 And then on the right, you might have like a map, a couple of JSON files and a CSV in like an Excel thing, all in the same window.
16:49 - Okay, well, that's neat.
16:51 - Yeah, and you can build like extensions and plugins.
16:53 So like that CSV thing is probably like a JupyterLab extension.
16:57 So yet another really cool thing going on there.
17:02 And I guess the final piece, a tip, maybe from the very first one from this segment is Daniel said, one thing that can happen is when you log into, say, like the Azure notebook, some of their dependencies are a little bit old, like pandas or matplotlib or something like that.
17:18 He shows you how to import pip and then execute pip inside your notebook to force it to upgrade the dependencies in your project.
17:26 Oh, okay.
17:27 And it's good that you're going to put the snippet in our notes.
17:30 Yeah, the snippet is in there, but you can basically, it shows you how to, from code, run pip to upgrade stuff, which I think is interesting and useful outside of just notebooks, but it happens to be, like if you don't get a remote into them, to the servers, you still want to upgrade stuff.
17:46 It's pretty helpful.
17:47 Yeah, nice.
17:48 - All right, whew, that's a lot of notebook news.
17:50 Probably have more next week.
17:51 - Probably.
17:52 - Probably, it's really cool though to see so much innovation and creativity around this stuff.
17:57 So it's kind of a paradox of choice problem going on.
18:00 Like if I wanted to get started, what the heck would I do?
18:03 But there's a bunch of good options here.
18:05 - Definitely.
18:06 - Awesome, all right, you got anything extra you want to let everyone know about this week?
18:08 - Just that maybe I should spend more time paying attention to Jupyter, but other than that, no.
18:13 - Yeah, Jupyter's pretty cool.
18:15 JupyterLab's exciting, Collaboratory's exciting, Notebooks on Azure is exciting, data lore is exciting.
18:20 Yeah, I'll have to pay more attention as well.
18:22 - Do you have any news?
18:23 - No news.
18:24 Well, when this episode goes out, there's a very good chance that I'll be at PyCon Slovakia.
18:31 And if I am, and you hear this, feel free to come say hi, that'd be cool.
18:33 - Neat.
18:34 - Yeah, so I think that's the right timing.
18:36 I'm pretty sure it will be.
18:37 I'll try to line it up that way.
18:39 All right, well, thanks for getting all this stuff together, Brian, this is great stuff.
18:42 - Yeah, thank you.
18:43 - Thank you for listening to Python Bytes.
18:46 Follow the show on Twitter via @pythonbytes, that's Python Bytes as in B-Y-T-E-S.
18:52 And get the full show notes at pythonbytes.fm.
18:55 If you have a news item you want featured, just visit pythonbytes.fm and send it our way.
19:00 We're always on the lookout for sharing something cool.
19:02 On behalf of myself and Brian Auchin, this is Michael Kennedy.
19:06 Thank you for listening and sharing this podcast with your friends and colleagues.