Transcript #53: Getting started with devpi and Git Virtual FS
00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.
00:05 This is episode 53, recorded November 21st, 2017.
00:10 I'm Michael Kennedy.
00:11 And I'm Brian Okken.
00:11 And we've got a ton of cool things we picked for you today.
00:14 Well, six plus a few more, but it feels like a lot of good stuff to share with you guys.
00:18 So I'm looking forward to it.
00:20 How about you, Brian?
00:20 I'm really looking forward to it, yeah.
00:22 Yeah, definitely.
00:22 So before we get into that, let's say thank you, Rollbar.
00:25 If you think there are errors lurking in your app and you want to get notified right away, go to pythonbytes.fm/rollbar and check it out.
00:33 Tell you more about that right now.
00:34 I want to know your philosophy on line length.
00:37 Are you a strictly 79 or less sort of person, Brian?
00:40 I'm trying to do the 79 thing, but it's really short.
00:43 So we do like 120 in my work group at work.
00:47 You guys, what, use like 44-inch HD TVs for your monitors or what?
00:52 It's still pretty good to have like, I like a little bit shorter so that you can put a whole bunch of, you can do side-by-side diffs easier and stuff.
01:00 Yeah, for sure.
01:00 But 79 is really tight.
01:02 It is tight.
01:03 How about you?
01:03 What do you use?
01:04 I guess I do stick to 79 pretty much.
01:07 You know, if the editor says, hey, this one's too long, you should reformat it according to PEP 8.
01:11 I guess I do.
01:12 But I feel like it has a tendency to put pressure on you to make bad decisions.
01:17 For example, if you have like an expression involving like, say, five variables and like a string, like you're, say, formatting a string.
01:24 And it would encourage you to have those variables super short and nondescriptive so they fit within 79.
01:30 But if they're long and descriptive, that might be 100, right?
01:33 And so I feel like there's this pressure, but I guess I succumb to it anyway.
01:35 Things that I share on GitHub or something, I try to keep it to 79, but I don't know if it's a good idea or not.
01:43 Mostly because I do testing and stuff, people will run flake8 over my code and say, hey, dude, how come your code doesn't clean?
01:49 Exactly.
01:50 You failed the build.
01:51 Isn't clean.
01:51 Yeah.
01:51 So there's an article from Jake VanderPlas.
01:56 He's the astronomer guy that did the PyCon talk, did the keynote.
02:01 Yeah, he did a keynote there, and I think he also did another talk.
02:03 But yeah, it was great.
02:04 He's up at University of Washington doing all sorts of cool astronomy stuff.
02:08 So what do you have to say about line lengths?
02:09 Because of the switch of Twitter between 140 and 280 that they've done, he was intrigued by looking at the statistics and did an exploration of line lengths in Python packages.
02:23 And he did it as a Jupyter Notebook article so that you can kind of follow through all of his stuff, mostly looking at NumPy, SciPy, pandas, scikit-learn, Matplotlib, and Astropy.
02:35 So I didn't know about Astropy, but that makes sense because I'm not an astronomer.
02:40 Yeah.
02:41 How often do you analyze telescopic images with machine learning?
02:46 So far, zero times.
02:47 It sounds fun, though, doesn't it?
02:49 Yeah.
02:49 But it's kind of a neat look at basically, I wouldn't know how to do this right off the bat.
02:53 But it's pretty simple to write a little bit of code to import a bunch of modules and check out the line lengths and examine that and graph it and plot it and clean it up.
03:04 It's a pretty cool article.
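As a rough illustration of the "little bit of code" idea, here is a minimal sketch that counts line lengths across a package's source files. It is not the code from the article; the function names `line_length_counts` and `share_longer_than` are hypothetical, and it assumes you already know the directory where the package is installed.

```python
import collections
import pathlib


def line_length_counts(package_dir):
    """Count how many non-blank lines of each length appear in .py files."""
    counts = collections.Counter()
    for path in pathlib.Path(package_dir).rglob("*.py"):
        # Drop undecodable bytes instead of failing the whole scan.
        text = path.read_text(encoding="utf-8", errors="ignore")
        for line in text.splitlines():
            if line.strip():  # ignore blank lines
                counts[len(line)] += 1
    return counts


def share_longer_than(counts, limit=79):
    """Fraction of counted lines that are longer than `limit` characters."""
    total = sum(counts.values())
    over = sum(n for length, n in counts.items() if length > limit)
    return over / total if total else 0.0
```

Pointing `line_length_counts` at a package directory gives a histogram you could plot, and `share_longer_than` gives a quick sense of how much of the code goes past the 79-character limit.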
03:06 And then also just sort of looking at it, it looks like most of them, they follow a distribution, a...
03:13 Normal distribution?
03:14 It's not exactly normal, but it's...
03:16 An abnormal distribution?
03:17 An abnormal.
03:18 A log-normal distribution.
03:20 That's it.
03:20 Oh, wow.
03:21 Okay.
03:21 That's a little bit more statistics than I understand, but it's sort of normal, I guess.
03:26 But it follows a log-normal distribution, except that there's an artificial bump near the right side, the 80 character side, because many of these packages are trying to hit 80 or less.
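For listeners who, like us, are hazy on what a log-normal distribution is: it means the logarithm of the line length is roughly normally distributed. The sketch below estimates the two parameters from a list of line lengths using only the standard library. This is one simple way to fit it, assumed here for illustration; the article's own fitting approach may differ, and the function names are hypothetical.

```python
import math
import statistics


def fit_lognormal(lengths):
    """Estimate (mu, sigma) of a log-normal from positive line lengths."""
    logs = [math.log(n) for n in lengths if n > 0]
    # mu and sigma are the mean and standard deviation of log(length).
    return statistics.mean(logs), statistics.pstdev(logs)


def lognormal_pdf(x, mu, sigma):
    """Density of the log-normal distribution at x > 0."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2 * math.pi))
```

Comparing the fitted density with the observed histogram near 80 characters is one way to see the artificial bump that a line-length limit creates.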
03:41 But there's an argument there for you don't really need it, because code naturally fits anyway.
03:48 It's a cool look at it.
03:50 I was thinking about using the code within this to take a look at our code at work to see where our line lengths are at work.
03:58 Yeah.
03:59 That'd be an interesting analysis to run some PEP 8-style metrics across your organization.
04:05 Yeah.
04:05 You know what I think people should do?
04:06 Someone, some enterprising listener out there should build, like, a little package we can all drop in and do cool stuff like that with.
04:12 Yeah.
04:12 And at the end of the article, he does ask, he's curious about where different popular packages fit into the line length distribution.
04:21 So that'd be neat.
04:22 Right.
04:22 In other languages, like, how does this compare to, say, JavaScript versus C++ versus Python?
04:28 Things like that.
04:28 Also interesting to know, but I don't have those answers, so they're open questions for now.
04:32 So it's a good day.
04:34 Yet another good day for modern Python.
04:37 And, you know, sort of the sun continues to set on legacy Python.
04:40 This time around, very, you know, you mentioned this package just previously, NumPy.
04:45 Yeah, there's some interesting news with NumPy.
04:47 Yeah.
04:47 So NumPy is dropping support for legacy Python.
04:50 And they say, you know, we know that the Python core developers are dropping support for Python 2 in 2020.
04:58 It's still an open question on the day.
05:01 I like that guy who voted for the keynote of PyCon 2020 as the official end date.
05:07 But who knows what day it is?
05:09 It hasn't been officially announced.
05:10 But they say, basically, this requirement to continue supporting Python 2 makes it harder and harder to advance NumPy.
05:18 And so they're going to drop it.
05:20 I think that's great.
05:20 I can see that.
05:21 It's such an important library.
05:23 And, you know, data science is definitely moving towards Python 3.
05:26 And so their plans are December 31st, 2018.
05:30 Up until then, they're going to support Python 2 and Python 3 100%.
05:35 And that's not very far away.
05:36 What is that, like 41 days?
05:38 No, that's 41 days and a year.
05:39 So a little bit of time on that one.
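The countdown is easy to check with the standard library. This small sketch uses the recording date given at the top of the episode and the December 31st, 2018 date mentioned for NumPy; the variable names are just for illustration.

```python
from datetime import date

recorded = date(2017, 11, 21)        # recording date of this episode
end_of_support = date(2018, 12, 31)  # NumPy date mentioned in the episode

remaining = end_of_support - recorded
years, days = divmod(remaining.days, 365)
print(remaining.days, years, days)  # 405 1 40
```

So the gap is one year and 40 days, or 41 if you count both the first and the last day.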
05:42 And then January 2019, all new features will be Python 3 only.
05:49 And then the year after that, I guess, when Python 2 support goes out, it probably goes out of here as well.
05:54 It isn't just a spiteful thing.
05:56 You've got real reasons to do it because the increased burden of trying to be Python 2 compatible is unreasonable.
06:04 Yeah, it definitely means it's like there's features that are not in NumPy because it works on Python 2.
06:09 Right?
06:10 So it's time to say thank you, but goodbye to Python 2, they say, which is, I think, great.
06:15 Speaking of data science, one thing I've tried to learn a lot but haven't done a great job of is pandas.
06:21 Actually, pandas and kind of the whole data science tool chain, it's something I'm curious about, but I'm not sure how to go about it.
06:28 So I really liked this article from Ted Petrou about how to learn pandas and how to go about it.
06:35 His opinion, of course, but it seems like a really pretty reasonable thing.
06:39 He was recommending some of the learning, reading the documentation and reading about pandas and how it works.
06:48 But then also kind of jumping back and forth and using it for small projects.
06:52 And I guess with any tool, that makes sense.
06:57 But there is some, he gives a little bit more, I guess, more details of how to do that so that you can jump back and forth and know what to learn first.
07:06 Yeah, I think one of the challenges that I have learning pandas, like I can sort of do a few things with it, but not a lot, is I don't really have a project to use it on.
07:14 Like I just kind of poke at it and go, oh, okay, it does this cool stuff.
07:17 But, you know, like I just haven't done like data science-y things or financial analysis things.
07:22 So he talks about things like here's some Jupyter Notebooks, here's some Kaggle kernels and data sets in the form of, these are data sets in the form of Jupyter Notebooks.
07:31 So some concrete ways to play with it, not just, you know, fire it up and poke at the API.
07:36 Yeah, or maybe go back to that Jake article and examine your line lengths.
07:42 Exactly.
07:42 There's an example.
07:44 And then one of the things I thought was a nice ending is when you think you have it fairly well, go a little bit further and then start answering some questions on Stack Overflow and kind of measure yourself against the other things that people are running into problems with.
07:59 I think that's a cool idea.
08:00 That is a cool idea.
08:01 And the people on Stack Overflow will let you know if you're wrong.
08:03 Yeah, definitely.
08:05 It's one of the nice and not nice things about the internet is the best way to find out whether you're right about something is to post the wrong answer.
08:13 Yeah, people don't really hold back on you too often, do they?
08:16 Yeah, no, no, you get that right away.
08:18 Yeah, if you have a thick skin or if you're willing to grow a thick skin, then that's actually a great way to do it.
08:23 Yeah.
08:23 Reddit would probably also work too.
08:25 Also, I'm sure the data science people are similar, but the Python community as a whole is fairly gentle with people.
08:33 They'll tell you you're wrong, but they'll be nice about it and probably use more words than you've written to explain something to explain why you're wrong about it.
08:42 Yeah, maybe they'll have a good explanation of your misunderstanding and you can connect some more dots, right?
08:47 I depend on that a lot.
08:48 Nice.
08:50 All right, before we get to the next one, which is some more social coding stuff, I just want to say thank you to Rollbar.
08:56 If you have a web application and it's running on the internet, it's probably crashing at some point.
09:02 And it would be great to know about that.
09:04 Like, how often do you go back and read logs?
09:06 Like, do you go and read logs at your work very often, Brian?
09:09 Actually, more than I want to.
09:10 Yes.
09:11 I'm in a manager role, so I get to tell other people to do it.
09:14 Here's a problem in the log.
09:15 Go fix that.
09:16 Yeah.
09:16 But you don't want to have to depend on reading that, right?
09:19 If you could avoid it and just get the notifications right away, that'd be awesome.
09:21 So Rollbar actually, I normally talk about it in the context of Python, and that's totally true, but it actually supports 26 languages and frameworks.
09:29 So Python, obviously, Flask, Django, Pyramid, et cetera.
09:32 But also Node, .NET; it even has a Flash plugin and client-side JavaScript.
09:37 So totally cool.
09:39 Like, whatever you're using, you can use Rollbar.
09:40 It's awesome.
09:40 And they have this thing called people tracking.
09:42 So, for example, on, like, my training site, people are logged in.
09:46 And if there's a crash, I can emit a little thing that will tell Rollbar, this is the user that had this error.
09:53 So not only do I know what the error was, I can actually go back and send that person a message, say, I saw you run into a crash, and here's how I fixed it.
09:59 Like, whoa, I didn't even tell you what happened.
10:01 That's kind of creepy but awesome.
10:02 So anyway, if you want to be creepy and awesome, check out pythonbytes.fm/rollbar and solve the problems before your users even tell you about them.
10:10 All right.
10:10 So one of the things that came out recently was an announcement from Microsoft and GitHub.
10:17 I'm not sure of the order in which this came out.
10:21 But it started, I think it started at Microsoft.
10:24 And they want to use Git.
10:26 Okay.
10:27 So everybody wants to use Git because Git is awesome.
10:29 But the problem is they actually have some pretty large projects.
10:34 And it turns out they tried to use Git, and it was basically unusable for some of their projects at Microsoft.
10:40 So, Brian, you're probably thinking Git was built for Linux, and Linux is a huge project, right?
10:45 Yeah.
10:46 Yeah.
10:46 So what's up with these Microsoft people?
10:48 They must be doing it wrong.
10:49 And I kind of actually thought that when I read this first as well.
10:51 But it turns out if you look at the Linux kernel, it's like 640 megs of data in the source code repository in Git.
11:00 That's big, right?
11:02 That's quite big.
11:03 But it turns out that if you look at the Visual Studio tools, those are 3 gigabytes, which is five times bigger than Linux.
11:10 And they're trying to use it for that.
11:12 And that was kind of a little sketchy.
11:14 But then they wanted to use it for Windows.
11:16 And apparently the repository for Windows is 270 gigabytes or 421 times larger than Linux.
11:24 Wow.
11:24 No wonder it's slower.
11:25 That's a little bit bigger.
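The ratios quoted here work out if you treat the sizes as decimal units (1 GB = 1000 MB), which is an assumption on our part. A quick check:

```python
MB = 1000 ** 2
GB = 1000 ** 3

linux = 640 * MB         # Linux kernel repository, as quoted
visual_studio = 3 * GB   # Visual Studio tools, as quoted
windows = 270 * GB       # Windows repository, as quoted

print(visual_studio / linux)  # 4.6875
print(windows / linux)        # 421.875
```

So "five times bigger" is a rounded 4.7, and "421 times" matches the Windows figure.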
11:26 And there's 4,000 people committing to it all day as their job, right?
11:30 So it's got a lot of contention as well.
11:32 And so what they've done in the announcement is Microsoft and GitHub team up to create a Git virtual file system.
11:41 And the GitHub part is mostly to make this work on other platforms, macOS and Linux and things like that.
11:47 So what they did is they said, look, the problem is we literally have like, I don't know how many million, thousands, maybe millions of files when we do a checkout.
11:58 So when they did, like, a regular Git checkout, it would take 12 hours to clone the repository, three hours to do just a straight checkout of a branch, eight minutes to ask Git status, and 30 minutes to commit like one file.
12:12 So it was pretty broken.
12:13 And they said the reason it's broken primarily is there's like all these files.
12:17 And generally, you're only working with like a little sub part of them.
12:21 So what they did is they created a virtual file system that understands Git repositories.
12:25 And it only checks out like a metadata list, like a directory listing.
12:29 Wow, cool.
12:29 And then if you interact with it, it basically will create those files by getting them from the server on demand.
12:35 And it doesn't have to be like some plugin.
12:37 It's like at the file system level.
12:39 So if I open up like command prompt or I open up some editor, I just type like GCC and it has to touch like 10 files, like that will automatically get them from Git if they weren't there.
12:49 Isn't that crazy?
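To make the on-demand idea concrete, here is a toy Python model of it. This is not how the Git virtual file system is implemented (that works at the file system level); it only illustrates the pattern of holding a cheap listing up front and fetching contents on first access. The class `LazyCheckout` and the `fetch` callable are hypothetical.

```python
class LazyCheckout:
    """Toy model: know every path up front, fetch contents on first read."""

    def __init__(self, listing, fetch):
        self._listing = set(listing)  # cheap metadata: paths only
        self._fetch = fetch           # hypothetical callable: path -> bytes
        self._local = {}              # contents materialized so far

    def listdir(self):
        """Return all known paths without downloading any contents."""
        return sorted(self._listing)

    def read(self, path):
        """Return file contents, fetching from the server only once."""
        if path not in self._listing:
            raise FileNotFoundError(path)
        if path not in self._local:
            # First access: pull just this one file from the server.
            self._local[path] = self._fetch(path)
        return self._local[path]

    def materialized(self):
        """Return the paths whose contents exist locally."""
        return sorted(self._local)
```

A tool that touches 10 files out of millions would trigger only 10 fetches, which is the intuition behind the faster clone and status times.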
12:50 It sounds a lot like ClearCase before ClearCase started to suck.
12:54 Yeah, exactly.
12:55 So they built this for Windows and they got really good success.
12:59 They said instead of 12 hours to clone it, it takes 90 seconds.
13:01 Instead of eight minutes to do a Git status, it takes three seconds.
13:05 Instead of 30 minutes to do Git commit, it takes eight seconds.
13:07 And so they've actually been pushing about half of these changes back upstream into Git.
13:14 And they've been working with the Git developers to make this a general thing, not a Microsoft thing,
13:19 which I think that's pretty noble.
13:20 That's definitely like a new Microsoft, not the old Steve Ballmer Microsoft.
13:24 Is it just for GitHub or can we use it with other Git?
13:27 This is just purely for Git.
13:29 So they're pushing this back to the Git developers, not for GitHub.
13:34 But where GitHub comes into this is GitHub, maybe they have this problem for projects hosted on GitHub, but people are already using those projects on GitHub.
13:43 So it's probably okay.
13:44 But they're trying to sell enterprise GitHub, which is like a box you put in your company to run those things.
13:51 And these enterprise projects can be like huge, like this Windows problem.
13:55 And so GitHub is trying to basically expand this to Linux and macOS so that they can make that part of their enterprise story.
14:02 That'd be cool.
14:03 I'd like to have it be part of the GitLab experience as well.
14:07 That'd be good.
14:07 Yeah, absolutely.
14:08 Yeah.
14:09 So hopefully this makes it back into Git proper.
14:11 And then the OS support can come from Microsoft and GitHub.
14:15 That'd be awesome.
14:15 Yeah, this is pretty cool, actually.
14:17 I'll keep an eye on this.
14:18 Yeah, yeah.
14:18 We'll see where it goes.
14:19 But they've already got demos and stuff working for Microsoft Windows.
14:22 And there's actually a 10-minute little video, like as they work through this stuff, you can check it out.
14:26 It's really short.
14:27 I'd link that as well.
14:28 Speaking of downloading stuff from servers and getting your libraries all put together.
14:32 I don't know if I'm just dense or what.
14:34 But multiple times I've tried to set up a devpi server for caching PyPI stuff locally, and mostly I need to do this partly because of setting up, you know, if you want to do a laptop setup for while you're on the plane or something, but also behind a firewall so I can have my build server not have to go outside the firewall and stuff like that.
14:57 I'd like to have a local one.
14:59 And I ran across this article.
15:01 I haven't actually gone through it.
15:02 I was going to do that this morning.
15:03 But it looks pretty good, from Stefan Scherfke, and it's Getting Started with devpi.
15:09 And it walks through.
15:10 Basically, he had the same thing.
15:13 He needed to set up a local server again.
15:16 Couldn't remember how to do it.
15:18 The documentation is okay, but still has some issues.
15:22 And so he just sort of walks through the whole thing and shows you how to do it in at least one use case, which is pretty close to what I think most people need, which is mostly mirroring.
15:33 Mirroring the packages from PyPI that your company actually uses.
15:37 Not everything, just the stuff you're using.
15:39 And then also being able to store your own local things there.
15:43 Yeah, that's a great combination.
15:44 I think the caching bit is really nice.
15:47 Like you can just point at this thing and it'll just pass through and get the ones from the full PyPI, right?
15:53 And then you can tell it to refresh occasionally and stuff.
15:55 And then you can also just push up your own local ones so that you can share your own stuff around.
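As a sketch of what "pointing at this thing" can look like, the helper below builds a pip command that installs through a local devpi server instead of the public PyPI. It assumes devpi-server's usual defaults (port 3141 and the `root/pypi` mirror index); your setup may use a different host, port, or index name, and the function name `pip_install_command` is hypothetical.

```python
def pip_install_command(package, host="localhost", port=3141, index="root/pypi"):
    """Build a pip command that installs `package` via a devpi index."""
    # devpi serves a pip-compatible listing under <index>/+simple/.
    index_url = f"http://{host}:{port}/{index}/+simple/"
    return ["pip", "install", "--index-url", index_url, package]
```

The returned list could be passed to `subprocess.run`, and swapping `index` for a private index name is how a team would pick up its own uploaded packages.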
16:02 I think that's a really great thing that probably not too many organizations are doing.
16:06 If you have different teams working on different packages, like you can actually publish it to like your company through these things, which is pretty awesome.
16:14 We also have a PyPI whitelist.
16:17 So that might be really positive given some of the recent security scares we've had there, right?
16:24 Depending on how paranoid you are.
16:25 Part of the article is talking about user management.
16:28 For me, I'd probably set up things for all my local dev team plus the build to be able to get things.
16:35 But he was having it locked down to just the build server being able to do it, which is an interesting idea as well.
16:42 Nice.
16:42 So the last thing I want to cover this week is what I think a lot of people who are developers or work for a company building a product that are kind of new to it, sort of a technical company, maybe miss, which is the whole marketing side of software.
17:01 Right?
17:01 Like the hardest thing about making something successful, if it's a web app or it's a regular app or it's a SaaS thing or whatever, is not building it.
17:10 Building it may be challenging, but that is not the hardest thing.
17:12 The hardest thing is getting people to notice it in a busy world and getting the word out.
17:17 The whole marketing side of stuff that most of us developers are not super good at.
17:21 So there's this GitHub repository called Marketing for Engineers.
17:25 And it's a curated collection of marketing articles and tools to grow your product.
17:30 That's nice.
17:31 Yeah.
17:31 Isn't that cool?
17:31 So these guys, they created some kind of iOS app and they're like, it took us almost two years to learn how to market our project.
17:38 It was painful.
17:38 So we're trying to help that.
17:40 So they said, look, we're going to come up with a bunch of resources that help you solve practical marketing tasks, such as finding beta users, growing your first user base, advertising your product without a budget, all those different things.
17:54 So they have a whole bunch of different areas that if you're new to this, you can really learn a lot from.
18:00 Like how to market on social media, where are the right places, how to leverage Quora, how to leverage product hunt and business models, all kinds of stuff.
18:08 So I thought that might be useful.
18:10 There's about 4,000 people who have starred it on GitHub.
18:13 They probably also thought it was useful.
18:14 It's a huge list.
18:16 Yeah.
18:16 It's massive.
18:17 One of the things on there that I saw near the top is doing things that don't scale, which I love that advice.
18:24 Yeah, I do.
18:25 I like that as well.
18:26 Yeah, definitely.
18:27 Do things that don't scale.
18:28 As I was writing the pytest book, I tried to help out as many people as possible on the Slack channel.
18:34 And even if it meant a couple of times, I just asked people, hey, are you available?
18:40 Can I just call you on the phone?
18:41 I just talked to people about their issues with pytest and with testing.
18:45 Now, clearly, you can't do that on a huge scale.
18:47 But when you don't have any users at all yet, it's pretty easy.
18:51 Yeah, for sure.
18:51 And the behavior creates super advocates for you.
18:54 And it also lets you realize some of the challenges.
18:57 So maybe in the final version of your book, it reflects some of those challenges that that one person had.
19:03 But maybe there's 1,000 or more people who actually have it.
19:06 They didn't call you because they just read your book because you already got it, right?
19:09 I love this because a lot of us nerds didn't become nerds because we really like talking with people.
19:15 I used to laugh at the people in business school.
19:16 Now I'm kind of like, huh, they probably know something, don't they?
19:18 Yeah.
19:19 Well, those guys don't know calculus like nothing.
19:23 Oh, I see how it's going for them.
19:25 All right.
19:25 Anyway.
19:25 Awesome.
19:26 So that's it for this week.
19:27 Those are tons of fun things.
19:29 Thanks for sharing them, Brian.
19:31 You have one more bit of crazy sort of American flavored shopping madness around Python for us, right?
19:38 Yeah, I guess I forget that.
19:39 Yeah, there's plenty of listeners outside of America.
19:42 But one of the traditions we have is a Black Friday sale, which has spilled over into online things as well.
19:50 So starting the day after Thanksgiving usually.
19:52 But we're doing it, I think, a little early here.
19:55 Maybe not.
19:56 If anybody doesn't know, I wrote a book.
19:59 I've been talking about it for a year, so you probably do.
20:01 But Python Testing with pytest is through Pragmatic.
20:05 And Pragmatic has a book sale going on the 22nd through December 1st.
20:11 And you get 40% off all e-books.
20:13 That is awesome.
20:14 Yeah.
20:14 So get in there and get it.
20:15 The reviews are awesome for that book.
20:16 Is this a global thing?
20:18 Even though it's the sort of terminology and date is U.S. inspired, can people all over the world come and get it for 40% off, whatever it is?
20:25 Yeah.
20:25 To get the discount, just use coupon code TURKEYSALE2017.
20:31 Awesome.
20:32 All right.
20:32 Well, go get that book.
20:34 You've been on the shelf.
20:35 On the fence, if you've been on the shelf.
20:37 One more thing that just came up.
20:39 I had somebody actually from the Testing Slack channel again ask me if I could mention PyCon Colombia.
20:47 So tickets are available.
20:49 They're going to have their first PyCon Colombia in Medellin on February 9, 10, and 11 of 2018.
20:56 So we'll put a link in, but it's pretty easy to find.
20:59 So that'll be fun.
21:01 Yeah.
21:01 Awesome.
21:01 Check it out.
21:02 If you're down in South America, it could be a good time.
21:05 Or if you want to go visit there, right?
21:06 How about you?
21:06 Do you have any news to share with us?
21:08 I have no news.
21:09 There's no news for me.
21:10 I'm actually working on some stuff.
21:13 I don't want to announce it yet, but absolutely got some cool things I'm working on.
21:17 Always trying to juggle too much, which is kind of the curse of my personality, but it's fun.
21:22 You're doing a lot of cool stuff, though.
21:23 I can't wait to see.
21:23 Oh.
21:24 Yeah.
21:24 Thanks.
21:25 Back on the PyCon Colombia thing, they have a really cool logo.
21:29 So if anybody's going to that, if you could snag me a t-shirt, that would be cool.
21:33 Yeah.
21:33 Order the t-shirt.
21:34 They come with a logo.
21:35 Well, thanks for talking to me this week.
21:38 You bet.
21:38 Great to chat with you, Brian.
21:39 And everyone as well.
21:42 Thank you for listening.
21:42 See you later.
21:45 Thank you for listening to Python Bytes.
21:47 Follow the show on Twitter via @pythonbytes.
21:49 That's Python Bytes as in B-Y-T-E-S.
21:52 And get the full show notes at pythonbytes.fm.
21:56 If you have a news item you want featured, just visit pythonbytes.fm and send it our way.
22:00 We're always on the lookout for sharing something cool.
22:03 On behalf of myself and Brian Okken, this is Michael Kennedy.
22:06 Thank you for listening and sharing this podcast with your friends and colleagues.