#68: Python notebooks galore!

Published Tue, Mar 6, 2018, recorded Sun, Mar 4, 2018

Sponsored by DigitalOcean! http://do.co/python

Brian #1: dumb-pypi

Getting this set up takes some fiddling and trial and error. I definitely need to write up my experiences with this as a blog post.
Combined with pip download (covered in episode 24), this makes it super easy to create a static, locally hosted PyPI server, either for all of your packages or just for your proprietary packages.

Roughly:

    pip download -d my-packages-dir <package name>
    ls my-packages-dir > package-list.txt
    dumb-pypi --package-list package-list.txt \
              --packages-url <url of my server> \
              --output-dir my-pypi

Now add something like this to requirements.txt or pip commands:

    --trusted-host <my server name> -i http://<my server>/my-pypi/simple
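
The `ls my-packages-dir > package-list.txt` step can also be done from Python, which may be handy if the index is rebuilt from a script or cron job. This is only a minimal sketch: it assumes the downloaded archives sit directly in one directory, and the function name `write_package_list` and the suffix filter are made up for illustration, not something dumb-pypi requires.

```python
from pathlib import Path

# Archive suffixes that pip download typically produces (assumed filter).
PACKAGE_SUFFIXES = (".whl", ".tar.gz", ".zip")


def write_package_list(packages_dir, list_path):
    """Write one package filename per line, sorted, and return the names."""
    names = sorted(
        p.name
        for p in Path(packages_dir).iterdir()
        if p.is_file() and p.name.endswith(PACKAGE_SUFFIXES)
    )
    Path(list_path).write_text("".join(name + "\n" for name in names))
    return names
```

Calling `write_package_list("my-packages-dir", "package-list.txt")` would produce the file that `--package-list` reads.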

Michael #2: Requests-HTML: HTML Parsing for Humans

This library intends to make parsing HTML (e.g. scraping the web) as simple and intuitive as possible.
When using this library you automatically get:
- Full JavaScript support!
- CSS Selectors (a.k.a jQuery-style, thanks to PyQuery).
- XPath Selectors, for the faint at heart.
- Mocked user-agent (like a real web browser).
- Automatic following of redirects.
- Connection pooling and cookie persistence.
- The Requests experience you know and love, with magical parsing abilities.

Brian #3: A phone number proxy

By Naomi Pentrel, @naomi_pen, on the Twilio blog
Set up a phone number that you can share for temporary events to send and receive texts that get forwarded to your actual number.
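
The forwarding decision behind a proxy like this can be sketched without Flask or Twilio. This is an illustration under assumptions, not the code from the article: the `route_sms` function, the `<number>: <message>` reply convention, and the idea of a single owner number are all hypothetical.

```python
def route_sms(sender, body, owner_number):
    """Return (destination, text) for an incoming SMS on the proxy number.

    Hypothetical convention: texts from anyone else are forwarded to the
    owner, prefixed with the sender's number; the owner replies by writing
    "<number>: <message>", which is relayed to <number>.
    """
    if sender != owner_number:
        # Incoming text from someone else: forward it to the owner.
        return owner_number, f"{sender}: {body}"

    # Text from the owner: split off the recipient before the first colon.
    recipient, sep, message = body.partition(":")
    if not sep or not recipient.strip():
        # No recipient given, so tell the owner how to address a reply.
        return owner_number, "Format replies as '<number>: <message>'"
    return recipient.strip(), message.strip()
```

In a real webhook handler, `sender` and `body` would come from the incoming request (Twilio sends them as the `From` and `Body` parameters), and the returned pair would be handed to whatever sends the outgoing message.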

Michael #4: Notebooks galore part 1: Datalore

In cloud and ready to go
Intelligent code editor
Out-of-the-box Python tools
Collaboration
Integrated version control
Incremental calculations: Improve and adjust models without hustling with additional recalculations. Datalore follows dependencies between multiple computations and automatically applies relevant recalculations.
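
To make the idea of incremental calculations concrete, here is a toy sketch of dependency tracking between cells. It is not how Datalore is implemented; the `Notebook` class, its method names, and the rule that cells must be added after the cells they read are all assumptions made for this example.

```python
class Notebook:
    """Toy cell graph: each cell is a function of other cells' values."""

    def __init__(self):
        self.funcs = {}    # cell name -> function computing its value
        self.deps = {}     # cell name -> names of the cells it reads
        self.values = {}   # cached results
        self.runs = []     # log of cells executed, in order

    def add_cell(self, name, func, deps=()):
        # Cells must be added after the cells they depend on.
        self.funcs[name] = func
        self.deps[name] = tuple(deps)
        self._run(name)

    def _run(self, name):
        args = [self.values[d] for d in self.deps[name]]
        self.values[name] = self.funcs[name](*args)
        self.runs.append(name)

    def update_cell(self, name, func):
        """Replace a cell's code and rerun only it and its dependents."""
        self.funcs[name] = func
        stale = {name}
        # Insertion order is a valid topological order here, so one pass suffices.
        for cell in self.funcs:
            if cell in stale or stale.intersection(self.deps[cell]):
                stale.add(cell)
                self._run(cell)


nb = Notebook()
nb.add_cell("raw", lambda: [1, 2, 3])
nb.add_cell("scaled", lambda raw: [x * 10 for x in raw], deps=["raw"])
nb.add_cell("label", lambda: "demo")
nb.add_cell("total", lambda scaled: sum(scaled), deps=["scaled"])

nb.runs.clear()
nb.update_cell("scaled", lambda raw: [x * 2 for x in raw])
# Only "scaled" and "total" are rerun; "raw" and "label" keep their cached values.
```

Changing `scaled` leaves the unrelated cells alone, which is the payoff when an upstream cell is expensive to recompute.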

Brian #5: bellybutton

by Chase Stevens, @hchasestevens
Tool for creating personal static analysis/style tools like pycodestyle, pylint, and flake8
Teams often have some of their own style requirements that can’t be expressed as flake8 flags and exceptions.
Example: deprecating internal library functions and catching that by the linter.
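
The deprecation example can be illustrated with a small check built on the standard library's `ast` module. This is a sketch of the general idea of a custom lint rule, not bellybutton's own rule format or API; the function names `old_fetch` and `fetch_v2` and the `DEPRECATED` table are hypothetical.

```python
import ast

# Hypothetical deprecation table: old function name -> suggested replacement.
DEPRECATED = {"old_fetch": "fetch_v2"}


def find_deprecated_calls(source, deprecated=DEPRECATED):
    """Return sorted (line, message) pairs for calls to deprecated functions."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Handle both plain calls (old_fetch()) and attribute calls (lib.old_fetch()).
        name = getattr(func, "id", None) or getattr(func, "attr", None)
        if name in deprecated:
            problems.append(
                (node.lineno, f"{name}() is deprecated; use {deprecated[name]}()")
            )
    return sorted(problems)
```

Run over each file in CI, a check like this lets the tooling flag the old call and suggest the replacement, instead of a reviewer doing it by hand.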

Michael #6: Notebooks galore part 2

Python 3.6 Jupyter Notebook on Azure
Google Colaboratory
JupyterLab is Ready for Users
- JupyterLab is an interactive development environment for working with notebooks, code and data. Most importantly, JupyterLab has full support for Jupyter notebooks. Additionally, JupyterLab enables you to use text editors, terminals, data file viewers, and other custom components side by side with notebooks in a tabbed work area.
Tip: you can pip install Python packages from within Python code itself.
- Super useful when you need a package that's not included but you don't have access to the shell.
- Also handy if you need to upgrade a package. For example, the pandas version is a little old on Azure, so you can upgrade by simply running:
```
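    import subprocess
    import sys
    # pip.main() was removed in pip 10; run pip through the current interpreter instead.
    subprocess.check_call([sys.executable, '-m', 'pip', 'install', '--upgrade', 'pandas'])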
```
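
For repeated use in a notebook, the upgrade can be wrapped in small helpers. This is a hedged sketch: the helper names are made up for illustration, and it relies on running pip as a subprocess through `sys.executable`, which is the approach pip's documentation recommends for calling pip from a program. `importlib.metadata` needs Python 3.8 or newer.

```python
import subprocess
import sys
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def pip_upgrade_command(package):
    """Build the command that upgrades `package` for the running interpreter."""
    return [sys.executable, "-m", "pip", "install", "--upgrade", package]


def upgrade(package):
    """Upgrade `package`, returning the version that was installed beforehand."""
    before = installed_version(package)
    subprocess.check_call(pip_upgrade_command(package))
    # A kernel that already imported the package needs a restart
    # before the new version is picked up.
    return before
```

For example, `upgrade("pandas")` would run the upgrade and return the version that was installed beforehand.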

Episode Transcript


00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.

00:04 It's episode 68, recorded February 28th, 2018.

00:09 I'm Michael Kennedy.

00:10 And I'm Brian Okken.

00:11 And we have yet another bundle of amazing stuff to share with you.

00:15 I'm super excited about the ones I got. How about you, Brian?

00:18 I'm really excited. I had to kick some out because I had too many things to cover.

00:21 I think I changed my list four times this week because I'm like, oh, this is a great list.

00:25 Oh, no, this one's more important. This is even better. It's awesome.

00:27 Yeah.

00:28 Yeah. So before we get to it, I just want to say thanks to DigitalOcean for sponsoring this episode.

00:32 Check them out at do.co slash Python.

00:34 Right now, I want to hear about PyPI, but there's something wrong with it.

00:39 What's up here?

00:40 Well, so I've had this on the list for a long time.

00:44 A project called dumb-pypi.

00:47 So Dumb PyPI or PyPI. I don't know.

00:50 Anyway, it's not really that dumb, though.

00:52 So there's a lot of local.

00:55 So you can have your own repository.

00:57 So there's a bunch of different ways you can set up your own server so that you can serve your own packages.

01:02 Like if you've got a team or something that you've got proprietary code that you don't want to share with others on normal PyPI, you can have your own.

01:12 But you have to have a server running.

01:14 And there's a lot of the generation of the server code is tied to it.

01:19 So there's like a flask version and there's various versions.

01:22 This one is just a flat file creator.

01:25 So this package, Dumb PyPI, will just take a directory full of wheels or zipped packages and create a directory that you can just stick on any server and have it be served up for an index.

01:43 And for instance, I've got a...

01:44 So it doesn't do any caching.

01:47 It doesn't go through to PyPI and grab things that it's missing.

01:50 So you have to manually do that yourself.

01:52 But if we combine this with what we learned in episode 24 that you can just do pip download easily and download your own files somewhere.

02:01 This combined, I'm using this at work now to create a really simple PyPI server behind our firewall that doesn't have...

02:11 I don't have to give it permission to talk to the outside world.

02:14 It's just a bunch of files.

02:15 So...

02:15 It's actually really cool.

02:17 So you could even put it up on like Amazon S3 or somewhere like that, right?

02:22 Right.

02:22 And actually, there is an example on...

02:26 I think that is the example on the website or the package website.

02:31 GitHub site does have an S3 example.

02:33 It's like super fast and slick and it doesn't do anything like updates or anything.

02:39 You have to rebuild everything yourself.

02:41 But if you're going to...

02:42 You can set up a cron job or something to do some of this.

02:45 Exactly.

02:45 Just do it at night when nobody's around.

02:47 Yeah.

02:49 Just update it daily.

02:50 How often do these packages change, right?

02:52 But like, for instance, I've got like all of our test code that we're creating virtual environments to...

02:57 And then pulling in test packages and different packages.

03:00 That stuff just...

03:01 I don't want it to update all the time.

03:03 I want it to...

03:04 I want it to grab certain versions that I know are there.

03:07 So something like this is perfect.

03:09 Yeah.

03:09 It looks really cool.

03:10 I think it needs a better name than dumb PyPI.

03:12 Yeah.

03:13 Yeah, it does.

03:15 Clever, but doesn't do anything PyPI.

03:17 How about that?

03:18 No server.

03:19 Server.

03:20 Serverless PyPI.

03:22 How about this?

03:22 Come on.

03:23 Yeah.

03:23 Awesome.

03:24 Okay.

03:25 So the next thing I want to talk about is something for humans.

03:28 And if I said it was for humans, who would that mean?

03:31 Kenneth.

03:32 That's right.

03:32 Kenneth Reitz.

03:33 So he's got all of his things for humans.

03:35 He's got Maya, datetimes for humans; Records, SQL for humans; obviously Requests.

03:40 So he's out with a new human thing.

03:42 And this time for web scraping.

03:44 So he created this thing called Requests-HTML, HTML parsing for humans.

03:51 So when I looked at this, I thought, oh, is this maybe like a replacement for Beautiful Soup or something like that?

03:57 Like in some kind of extension to Requests.

03:58 But in fact, it actually depends upon Beautiful Soup.

04:02 Right.

04:03 So what it is, it's a library that puts a different API on top of combining Requests plus Beautiful Soup, plus something called PyQuery, which lets you run jQuery-style CSS selectors.

04:17 So it does a bunch of cool stuff.

04:18 Some of the notable features are it has full JavaScript support, which I'm taking to mean that it will parse and execute the JavaScript necessary.

04:29 So if I hit like an AngularJS page, instead of just seeing curly brackets everywhere, there's data that would have gone in there, which is a big deal in web scraping.

04:37 Because if you just use straight-up Requests plus Beautiful Soup, you just get the raw markup, without the bits that would fill in when the JavaScript executes, right?

04:45 Yeah.

04:45 It has CSS selectors, XPath selectors, mocked user agents.

04:51 So it pretends to be a real browser.

04:52 So people don't know that you're trying to scrape their sites, which is kind of interesting.

04:55 It uses connection pooling and cookie persistence.

04:59 So you can like log in and then go do a bunch of stuff at a site.

05:02 And you can do it without reconnecting every time.

05:05 So that's pretty cool.

05:06 Yeah, and it keeps the session open. And tying Requests with, I mean, that's what people often did anyway, Requests plus Beautiful Soup.

05:15 And tying it in with one API is great.

05:19 And actually, I like the idea anyway of somebody saying, hey, these tools are great, but I wish the API was different.

05:25 So just write another package that uses others and write a better API then.

05:30 Yeah, it's a little like Flask, what Flask did, but for requests and parsing.

05:34 Kenneth is a great one for that, he's got a good eye for APIs.

05:38 Yeah, that's for sure.

05:39 People definitely seem to love his APIs.

05:40 So I'll leave you with the final sort of tagline here from their website.

05:44 The Requests experience you know and love, but with magical parsing abilities.

05:48 That's nice.

05:49 Yeah, not bad, right?

05:51 Cool.

05:51 So what's up with this phony number thing?

05:54 You got some like prank calls to make?

05:56 This was awesome.

05:58 So Twilio does their Twilio blog where people can write for them.

06:02 And I think we've talked about it before.

06:05 They do a pretty cool program where they give you an editor even to help you out with it.

06:10 But this article is basically a, and you don't have to do a Twilio project, but this is a

06:15 Twilio project.

06:15 This is a phone number proxy.

06:17 So the idea is you imagine a situation like, for instance, you've got a, I don't know, a

06:23 meetup or some temporary event.

06:26 And you want people to be able to text you because you're not going to be around your

06:30 computer all the time.

06:31 You want people to be able to text you and you want to text back,

06:35 but you don't want to give out your phone number.

06:37 Well, this project gives you a little proxy so that you can set it up with Flask and set

06:41 up a server with Twilio and give out a temporary phone number and have it be attached

06:47 to your phone.

06:48 And I'm going to definitely have to try this out because it looks fun.

06:51 Yeah, that looks really, really cool.

06:53 And I think that program they have is awesome.

06:55 One of the challenges of getting started blogging is nobody knows about you.

07:00 Nobody, like you'll put all this effort into writing this thing and you'll put it out there

07:03 and your 10 friends who are willing to follow your tech stuff off of Facebook glanced at it,

07:09 right?

07:10 And so here's a way to like appear on a major, major blog and highlight what you're doing

07:16 and maybe jumpstart your other tech stuff, right?

07:19 Like you could link back to your blog or something like this.

07:21 Having somebody work with you to polish it up a little bit.

07:25 Is a good idea.

07:25 Often when you just tap your friends for that sort of help, they'll just tell you, oh, it

07:31 looks great.

07:31 Go ahead and put it up.

07:32 Yeah, yeah.

07:32 Very cool.

07:33 Very cool.

07:33 But this project is also pretty neat.

07:35 It does encourage you to do some of the paid part of Twilio.

07:39 But I think for something like this, it's a good idea.

07:42 Yeah, very nice.

07:43 Good article.

07:44 All right.

07:44 Before we get to the next, let me just tell you about DigitalOcean.

07:48 They're doing some really amazing stuff.

07:50 So the thing I'd like to highlight is they just upgraded all of their things and left

07:56 the price the same.

07:56 And they, by upgraded, I mean doubled all the stuff at least.

08:00 So for example, you go to DigitalOcean and get a Linux server with all variety of Linux machines,

08:07 Linux distributions, with four gigs of RAM, two CPUs, 80 gigs of SSD for $20 a month.

08:15 Like that's insane.

08:16 Right?

08:17 That is a crazy thing.

08:19 And that used to cost $40.

08:20 And they just said, nope, that's now $20.

08:23 And it comes with four terabytes of free traffic.

08:26 If I were to just transfer that over S3, which is $0.09 a gigabyte, just that bandwidth would

08:33 be $368 at S3.

08:35 That's included in your $20 server.

08:37 So really, really awesome stuff.

08:38 Check them out over at do.co slash Python.

08:41 And you know, check out what they're doing.

08:44 Help support the show.

08:45 Everybody's getting good stuff.

08:46 So thanks to DigitalOcean for that.

08:48 All right.

08:49 I kind of want to just go on a Jupyter-like notebook rant for a while, Brian, because the

08:57 news around this stuff is just coming in fast and furious.

09:00 So there are so many things going on with notebooks right now.

09:04 And like, this is a world I don't really live in.

09:06 I'm much more a create a Python project and have like 10 related files and run stuff on the

09:12 command line or my editor and not put it in these cells because that's just not my world.

09:17 But I see how powerful it is for people who are exploring data and being more iterative

09:22 with their code.

09:23 And in the last couple of weeks, they've got a lot more options.

09:27 They've been in the news a lot right now.

09:28 So I'll start with one for this one.

09:30 And then we'll do another one in the final segment.

09:32 So for this one, I want to talk about something that's brand new called Datalore.

09:37 Have you heard of Datalore?

09:39 I have not.

09:39 You've heard of PyCharm, right?

09:41 So this is like PyCharm in a notebook, online, hosted.

09:47 So it's from the JetBrains guys.

09:49 It's just in the cloud.

09:50 You just go sign up.

09:51 It has this intelligent editor, just like JetBrains has.

09:55 Like, you know, IntelliJ plus PyCharm has with all of the like the cool autocomplete and IntelliSense.

10:01 It comes like pre-installed with a bunch of stuff that you need, like Matplotlib and so on.

10:06 It has collaboration.

10:07 So you can log in and kind of like do Google Docs style, work on it together.

10:12 I don't know how real-time it is.

10:14 Like, do you actually see every character going in?

10:16 Or do you, you know, do you have to refresh it?

10:19 Does it automatically refresh?

10:20 I'm not entirely sure the level of collaboration, but there's some real-time multiple people working on the same notebook type of collaboration.

10:27 I got to check that out.

10:29 It has integrated version control.

10:31 So say you're a student or you're an engineer, but you're not like git push on the command line type of competent, right?

10:40 You go there and just say, create me a save point.

10:42 It basically saves it and tags it so you can get it back.

10:45 Things like that.

10:46 Oh, that's great.

10:47 Pretty cool.

10:47 The JetBrains, like the diff viewer for version control is really great.

10:51 So that, building that in here is cool.

10:53 Yeah, they've got some really cool stuff.

10:55 And finally, this might be pretty big for some folks, depending on what you're doing.

10:59 They have incremental calculations.

11:01 So you can like, if you're doing like machine learning and training and all sorts of analysis,

11:05 and there's a bunch of cells that work together to generate that data, they actually have figured out how to track the dependencies between where that data comes from.

11:14 And you don't have to rerun the entire thing.

11:15 If you're changing your model, it only reruns the parts that have changed, that depend upon something you've changed.

11:22 Oh, that's awesome.

11:23 Yeah, it's pretty cool, right?

11:24 So if your computation takes two minutes, but this little part's really quick because it uses mostly finished data,

11:29 that's a really big deal, I think.

11:30 Yeah.

11:31 So anyway, Datalore, it seems like it's in beta.

11:33 I don't know what it costs, if there's a free thing or whatever.

11:36 But it's a Jupyter Notebook-like hosted service from JetBrains, which I thought was pretty cool and worth talking about.

11:43 Yeah.

11:44 Neat.

11:44 Nice.

11:45 I have no idea how to get started on this next one.

11:49 I'm just going to say the name, bellybutton.

11:51 Bellybutton, yes.

11:52 For personal lint.

11:54 What's up with this?

11:56 So, yeah, I think it's a play on words around, like, linters and where lint usually shows up.

12:03 So we have things like Pylint and Flake8, and pycodestyle, which used to be called pep8, that I use all the time and love.

12:12 But there's times where you have, like, extra requirements for your own team or for your own project.

12:19 And it'd be cool to have, like, something like pylint, but just with your own rules in it.

12:25 And that's where bellybutton comes in.

12:27 So it's a way to create your own rules for static analysis or style.

12:34 And one of the examples that I thought was great was, let's say you've got a library with some functions that your team uses, but you've decided some of them are dumb and you deprecate them.

12:46 Yeah, or maybe there's a better way to do things.

12:47 You can add some of these rules to bellybutton to say, hey, this code here, you need to change it this way.

12:55 And actually give exact examples of how somebody should change it.

12:59 And I think that's a really cool idea.

13:01 Yeah, awesome.

13:02 Bellybutton.

13:02 I wanted to bring that up.

13:03 Yeah, it sounds really cool.

13:04 These linters are really great.

13:05 And I typically think of them in the context of, like, continuous integration and sort of team-wide things.

13:10 But, yeah, here's a cool way to sort of make your own overrides and whatnot.

13:14 Yeah, and any time where you've got, like, a coding style within your team, if you can automate it and take the person out of it and take that out of your code reviews, it helps with team dynamics to just have the computer say, hey, change this code instead of having your coworkers keep telling you to change your code.

13:32 Yeah, that's a really interesting dynamic, isn't it?

13:34 Like, people are willing to take petty, nitpicky criticism from robots and automated systems way more than from your manager.

13:44 Or whoever.

13:45 Yeah, and you can just, like, we've already had the discussion about what our style is.

13:50 This is what it is.

13:51 I don't want to keep opening up the discussion.

13:53 So, just, you know, do it.

13:55 Nice.

13:56 Manager speak.

13:57 That's right.

13:58 Cool.

13:59 All right.

13:59 You ready for Notebooks Galore Part 2?

14:01 Oh, more notebook news.

14:03 Yay.

14:04 Yes.

14:04 So, our friend, our friend of the show, Daniel Schorstein, posted something on Reddit, some news that has to do with free hosted notebooks in Azure, right?

14:16 This would be, like, pretty much a direct competitor to Datalore, right?

14:20 So, they are now supporting Python 3.6 Jupyter Notebooks in Azure.

14:26 And there's a nice conversation over on Reddit about that.

14:29 And you go over and read more about it and so on.

14:33 So, they have, basically, if you just drop in on notebooks.azure.com, then off you go.

14:40 You can go work with it right there.

14:42 And that's, like, straight up Jupyter Notebooks, I believe.

14:44 That's pretty cool, right?

14:46 Free, in the cloud, powered by Jupyter.

14:48 Like, I'm telling you, this is, like, a space that is just, like, so blowing up right now.

14:51 Yeah.

14:52 We better pay attention to it more if people are fighting over it.

14:55 Exactly.

14:55 There's big companies fighting over it.

14:57 So, speaking of big companies that want to fight over it, have you heard of Colaboratory?

15:00 No.

15:00 A great word, though.

15:01 It is.

15:02 So, this comes from a research, the research group at Google, colab.research.google.com.

15:08 And people, this has been around for a little while, and people have been kind of dissing on it a little bit because it had been just Python 2.

15:15 However, it is now supporting not just legacy Python, but modern Python.

15:21 So, that's really cool.

15:23 And since the time that I took this note to talk to you about it today, and today, they now have also launched GPU support.

15:32 So, you go to your notebook, and you say, I want to do some machine learning.

15:36 Oh, yeah.

15:37 Run this TensorFlow, this training process on a GPU.

15:42 And you can basically hit Command-Shift-P to make it run on a GPU.

15:47 Like, how insane is that?

15:48 That's cool.

15:49 Okay.

15:49 So, that was pretty cool.

15:50 You ready for some more notebook news?

15:52 Yes.

15:53 JupyterLab is ready for users.

15:56 It's now open.

15:57 What is JupyterLab?

15:58 So, JupyterLab is something based on Jupyter Notebooks, but it's more than just that. So, you're going to have to take this with a grain of salt.

16:07 Probably a lot of people out there know better than I do.

16:10 But so, it's like hosted Jupyter Notebooks, which is really cool.

16:15 But it also enables you to use text editors, terminals, data file viewers, and, like, all sorts of other stuff that's not just in the notebook.

16:24 So, you could, like, SSH in and do stuff behind the scenes or something to this effect, right?

16:31 So, they've got some cool pictures.

16:34 Like, they have – it's almost like this crazy web IDE.

16:38 So, you've got, like, your files on the left.

16:40 You've got your standard notebook with graphs in the middle.

16:42 And then on the right, you might have, like, a map, a couple of JSON files, and a CSV in, like, an Excel thing all in the same window.

16:49 Okay.

16:50 Well, that's neat.

16:50 Yeah.

16:51 And you can build, like, extensions and plugins.

16:53 So, like, that CSV thing, it's probably, like, a JupyterLab extension.

16:56 Nice.

16:57 So, yet another really cool thing going on there.

17:01 And I guess the final piece, a tip, maybe from the very first one from this segment is, Daniel said, one thing that can happen is when you log into, say, like, the Azure notebook, some of their dependencies are a little bit old, like Pandas or Matplotlib or something like that.

17:18 He shows you how to import pip and then execute pip inside your notebook to force it to upgrade the dependencies in your project.

17:26 Oh, okay.

17:27 And it's good that you put – you're going to put the snippet in our notes.

17:30 Yeah, the snippet is in there.

17:32 But you can basically – it shows you how to, from code, run pip to upgrade stuff, which I think is interesting and useful outside of just notebooks.

17:41 But it happens to be, like, if you don't get to remote into the servers and you still want to upgrade stuff.

17:45 It's pretty helpful.

17:46 Yeah, nice.

17:47 Cool.

17:48 All right.

17:48 Whew.

17:48 That's a lot of notebook news.

17:49 We'll probably have more next week.

17:50 Probably.

17:51 Probably.

17:52 It's really cool, though, to see so much innovation and creativity around this stuff.

17:57 So it's kind of a paradox of choice problem going on.