Transcript #158: There's a bounty on your open-source bugs!
00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly
00:04 to your earbuds. This is episode 158, recorded November 20th, 2019. I'm Michael Kennedy.
00:11 And I'm Brian Okken.
00:12 And this episode is brought to you by DigitalOcean. DigitalOcean's awesome. Check them out at
00:16 pythonbytes.fm/DigitalOcean. Tell you more about that later. But Brian, I find that Python
00:22 is making its way into all these different areas, not just traditional computer science or maybe
00:27 data science. Right. There's an article that I saw that's kind of interesting. I mean,
00:33 there's not a lot of details, but essentially it's saying that Python is replacing Excel in
00:40 banking and investing. The real title is Python already replaced Excel in banking. But we've got
00:48 some interesting quotes from here. So I'm just going to read it out. This is from the article.
00:52 If you wanted to prove your mettle as an entry level banker or trader, it used to be the case that you
00:57 had to know all about financial modeling in Excel. Not anymore. These days, it's all about Python,
01:03 especially on the trading floor. And it goes on to talk about how a lot of different modeling that
01:09 used to be done in smaller cases in Excel, but it would take like a few minutes to run the Excel
01:16 modifications and analysis. Now they can do even like way more data and have it done in like a second
01:24 or two. So it does make sense: in cases where split-second decisions change
01:30 how you react to the market, you'd want to have speed and ease. So Python makes sense to me.
01:36 Yeah, that's really interesting. I'm sure it's using a lot of the data science stuff like NumPy and whatnot
01:40 to make that fast deep down below. The whole trading, the algorithmic trading, high-speed trading,
01:46 all that kind of stuff. The latency that those folks care about is crazy, right? Like if you could get it from
01:51 four milliseconds to three milliseconds, we'd really appreciate that, right? And they'll actually like rent
01:56 servers that are nearly co-located to the stock market to reduce the actual latency or set up alternate
02:02 direct connections over microwaves. There's all kinds of crazy stuff. And so if you can go from minutes to seconds,
02:07 that already seems like it would make a big difference to these folks.
02:10 Yeah. And also being able to go from minutes to seconds while incorporating more data.
02:16 Yeah. Super, super cool.
02:17 I'm imagining like walking through the trading floor and seeing some, some guy in a hoodie sitting with
02:23 a laptop on the floor. I mean, like, I don't understand this, but yeah, whatever.
02:28 Five years ago, that person would have been arrested. Now people are like, Hey, I need some help,
02:32 man. Can you give me some advice on this trade?
02:34 Yeah.
02:35 I have a little personal experience with this Python replacing Excel and banking and trading.
02:39 Can't talk about the details, but I did teach a class to a bunch of folks working on the
02:44 European stock market and they actually couldn't even take the class during the day because they
02:50 had to be there while the market was open. So we had the class in the evening for a week over
02:55 there and they were all really into learning Python because they had been trying to analyze how their
03:00 day went and do this kind of analysis that you're talking about in Excel. And they're just like,
03:03 we can't do this anymore. We have to get like better tools. And Python was the answer for them as well.
03:07 Pretty cool.
03:08 Oh, that's great. Interesting.
03:10 Yeah. Another thing that I think is really, really good news is something that GitHub just announced.
03:16 GitHub has announced a ton of things. While you were not with us last week when we recorded in Florida,
03:21 we talked about how GitHub has added code navigation to all the source code there, much of the source code.
03:29 You go in there and, like, click on functions and classes and say, go to definition in Python. And
03:34 that's pretty awesome. So give it a week and GitHub launches security lab to help secure the open source
03:41 ecosystem. Wow.
03:42 So you've probably heard about bug bounties and like these bounties paid out to security
03:47 researchers before, I would guess. Yeah. Yeah. So it's pretty much like that is my understanding of
03:53 it. So it's like a bug bounty program to go and find bugs in open source libraries. But what's kind
04:00 of cool is it seems like the folks like paying out that money are not the open source projects,
04:06 right? Like Apple might pay out a huge amount of money, like a hundred thousand dollars for finding
04:11 a big vulnerability in iOS or Microsoft might or whoever, but who's going to pay to find that
04:19 security bug in Flask or wherever it is. Right.
04:22 All right.
04:23 It seems like that this is to pay for those types of things. So it says organizations as well as
04:30 individual security researchers can join a bug bounty program with rewards of up to $3,000 is available
04:35 to compensate bug hunters for the time they put into searching for vulnerabilities in open source
04:40 projects. Oh, that's neat. Cool, right? Yeah. Yeah. So apparently this has been in beta
04:44 for a little while. When was it exactly? A little while, not very long. Anyway, the founding members
04:50 who were part of it have already found, reported, and helped fix more than a hundred security flaws
04:55 already across the open source ecosystem. That's pretty cool. Another thing that's interesting is the bug
05:01 report in order to count must contain a CodeQL query, like SQL, but CodeQL or something. I don't know.
05:11 CodeQL, which is an open source tool that GitHub released at the same time. Remember we talked about
05:18 their semantic code analysis engine. And what it does is basically this is a query that runs against
05:23 source code that will uncover the vulnerabilities in dependent projects.
05:30 Okay. So if I find a bug in Flask, I don't know there is one, but let's just say I just pick a
05:34 random project. I find a bug in Flask and I submit this, I submit a query to GitHub so that they can go
05:39 find all the projects that depend on Flask that have out of date versions of Flask that need to also
05:45 subsequently receive warnings to get their stuff updated.
05:48 So do they then notify all these, the other maintainers or?
05:51 Yes. So if you look at that article, there's like some screenshots of what it gets. So they will get,
05:57 the actual project will get an automated pull request that fixes the security vulnerability.
06:03 Maybe it bumps the requirements pinned version to something where it's fixed or something, right? It gets the
06:09 PR to automatically fix it. And then there's also a button where they can publish an advisory
06:16 out to, from that repository to dependent repositories. And they could also request a CVE,
06:22 which is like a vulnerability official number to be recognized as an actual issue. So GitHub became,
06:30 what was the term they used? A CVE numbering authority, a CNA, of course, so that they can actually
06:38 issue these vulnerability numbers to be understood and like referenced as unique IDs across the security
06:45 landscape. Interesting.
06:47 Yeah. So all this stuff is integrated into GitHub. So GitHub, the researchers find the issue in the
06:51 main project. The main project gets a PR. The main project can then also push out these warnings to
06:58 other folks and request CVEs for their projects. That's pretty cool, right?
07:02 Yeah. Open source is growing up.
07:03 Yeah, it totally is. And it seems like it's, it's pretty solid for, for all the folks working on it.
07:10 It doesn't seem like it requires much of the maintainers. It's more like there's this bug bounty program,
07:14 from what I can tell. And also they threw in there right at the end of this. GitHub also updated the
07:20 token scanning, an in-house service that scans for, like, API keys, like AWS access keys or whatever
07:29 that have been accidentally left inside of source code.
07:31 Oh, that's good. Yeah.
07:32 That's really good.
07:33 Yeah. It'd be pretty nice to like, you probably didn't mean this. Click this button to make this go.
07:38 Anyway, I think this is really cool. I think this is like, this is just plumbing to make open source
07:44 more secure. And I like that.
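To make the token scanning idea concrete, here is a minimal sketch of what scanning source text for leaked keys can look like. It is not GitHub's implementation, which the episode does not describe. The function name `find_leaked_keys` is made up for illustration, and the only pattern checked is the commonly documented shape of an AWS access key ID ("AKIA" followed by 16 uppercase letters or digits). The sample key is a well-known documentation placeholder, not a real credential.

```python
import re

# Assumed pattern: AWS access key IDs are 20 characters, starting with "AKIA".
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def find_leaked_keys(source):
    """Return (line_number, match) pairs for anything shaped like an AWS key ID."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in AWS_KEY_ID.finditer(line):
            hits.append((lineno, match.group()))
    return hits


# Hypothetical file contents with a placeholder key left in the code.
sample = """\
import boto3
client = boto3.client("s3", aws_access_key_id="AKIAIOSFODNN7EXAMPLE")
"""

print(find_leaked_keys(sample))
```

A real scanner would cover many token formats and would likely check candidates with the issuing service; this sketch only shows the pattern-matching step.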
07:45 Yeah. And also just to, to be able to say, to have companies put money at open source projects to keep
07:51 them fixed. And it's not necessarily trying to get the, maintain the official maintainer to do it,
07:57 but to have some incentive for, for everybody else to watch these things. So that's great.
08:04 Absolutely. Yeah. These bug bounty programs have been working really well for the industry and it's cool to see
08:10 GitHub putting that in there. Also cool is DigitalOcean, not just for sponsoring the show, but because
08:15 they have awesome infrastructure and awesome product and we use them for our stuff. So let me tell you
08:19 about a new thing that they have generally available: memory-optimized droplets. And if you have a memory
08:26 heavy workload, basically this is the best way to get tons of memory in a droplet or a virtual machine.
08:33 So you can get eight gigs of RAM for each dedicated CPU. And then it goes from two CPUs all the way up to
08:41 enough to get you 256 gigs of RAM, whatever that math works out to be. And it's really good for like
08:47 high memory applications, like high performance SQL or NoSQL databases and memory caches like Redis or
08:52 indexes, some kind of large data analysis runtime, something like that. So check those out at
08:58 pythonbytes.fm/digitalocean, really good stuff over there. Lots of cool things coming.
09:03 Brian, what you got next for us?
09:05 Well, we have a couple of friends of ours, Bob Belderbos and Julian Sequeira. They run a thing
09:12 called PyBites and PyBites challenges, not affiliated with Python Bytes, just sounds similar.
09:18 It's the I versus the Y. It's not even close to the same thing.
09:21 It's P-Y-B-I-T, dot dot dot. Yes. Anyway, I enjoy it. It's a challenges platform where you can just
09:31 sort of, there's a few of them for free, but it is a paid service that they give. It's
09:36 one of those things where you, they give you an, like kind of a written assignment and some test code
09:41 already there. And it checks to see, and then you have to fill in like the body of a function to make
09:47 all the tests pass. It's a kind of a brain teaser sort of thing. It's a fun way to keep up, make sure
09:52 that you're practicing out of the box Python stuff that you don't normally do. That's what I use it
09:58 for. But the news is they just added test coverage. So, or tests testing. So in the past you were,
10:05 you didn't write the tests. They wrote them to evaluate your code, but they've added, a few test
10:11 challenges where they write the code and you have to write the test code to check that code.
10:15 And it's kind of cool, but they were, they actually talked to me about this as well as to
10:20 try to pick my ideas, but they came up with it on their own. How do you evaluate if the test code is
10:26 good? So if you, you evaluate if your source code is good by running tests, but the other way around
10:31 is a little difficult. Yeah. How do you test the tests? Yeah. So they did it a couple of ways.
10:36 They're using coverage.py to make sure that you're hitting a hundred percent coverage and,
10:41 you know, yes, it's debatable as for a large project of whether you should get a hundred percent
10:45 coverage, but for a small function or some small bit of code, it should, you should be able to hit
10:50 a hundred percent coverage. That's a nice thing. The other one is mutation testing. So there's a couple
10:56 projects we've heard of, mutmut and MutPy, M-U-T-P-Y. And I think we talked about this earlier,
11:05 but Ned Batchelder did write an article about his experience with mutmut, but
11:12 PyBites is using MutPy. And what it does is it takes the source code and changes something
11:18 about it. And MutPy works at the level of the abstract syntax tree. And it changes, like,
11:25 for instance, a division operator to a multiplication or, or changes a string to some other string or
11:31 something. And then it runs the tests again. And the idea is you want your test
11:35 to be able to, it makes a whole bunch of mutants of the code and you want the tests to be able to
11:40 kill off all the mutants, except for the original. That's how they're testing. It's kind of a neat
11:46 idea, but it's fun to play with. It is an interesting question to ask. How do you test the test? And I
11:52 think this is pretty creative. Well done, Bob and Julian. I haven't used mutation testing a lot.
11:57 I've tried it out, but I haven't used it like for projects. The idea of using it in a training
12:02 situation is a novel thing I haven't heard of before. And I think that's a cool idea to be able
12:08 to, to try to test somebody's, test code. Yeah, I agree. And like you said, a hundred percent code
12:14 coverage for a project that's real is challenging. I think also maybe a mutation testing for a project
12:19 that's real tricky because maybe it changes like, you know, the print statement that shows what the title
12:24 the app is and who cares? Like no one's going to check for that. Right. Right. But in this case,
12:28 where pretty much it's a very small focus bit of code and you're supposed to test it, like
12:32 presumably any changes to that are going to appear in the couple of tests. You're right. Yep. Nice.
12:38 Now, speaking of tests, I feel like I stole this one from you, Brian, just out of the universe. I mean,
12:42 so I want to talk about pyhttptest. So this one comes from Florian Dahlitz. And,
12:52 he actually sent in two things for this week, which they were both excellent. So I'm going to cover
12:57 them. And this is a command line tool for HTTP tests against restful APIs. Okay. All right. So
13:03 the idea is basically I want to test some restful endpoint and instead of going over and say, okay,
13:10 I'm going to create, I'm going to get requests. I'm going to do a get, I'm going to get the dictionary.
13:13 I'm going to verify like this thing is in the dictionary and so on. What you basically do is you just
13:18 write a simple little JSON document for each test that you want to run. Oh, cool. Yeah. So then it
13:25 has things like, what is the name of the test? What HTTP verb do you want to use? What is the URL
13:30 combination between host and endpoint? The headers you need to pass, a query string you need to pass,
13:35 and then you get back a report. It actually gives you a cool report in a like column or style validation
13:41 that lets you assert things about it. Yeah. There's a handful of these types of things. And I think it's kind of
13:46 neat way to describe API testing. Yeah. It seems really cool. There's a bunch of neat little libraries
13:52 that are used as well, like tabulate, which is a cool way to print the tabular data that they're
13:57 showing there and things like that. But yeah, I like this project. If your job is to test a bunch of
14:03 HTTP endpoints, you know, this is pretty cool. Yeah. Neat. Nice. All right. What else? What's next?
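As a rough illustration of describing an HTTP test as a JSON document, the sketch below turns such a description into a request object using only the standard library. The field names (`name`, `verb`, `host`, `endpoint`, `headers`, `query_string`) follow the items listed in the episode but are assumptions, not pyhttptest's documented schema. The host is a placeholder, and the request is built but never sent.

```python
import json
from urllib.parse import urlencode, urljoin
from urllib.request import Request

# Hypothetical test description; the field names are illustrative only.
TEST_JSON = """
{
  "name": "list users",
  "verb": "GET",
  "host": "https://api.example.com/",
  "endpoint": "users",
  "headers": {"Accept": "application/json"},
  "query_string": {"limit": 5}
}
"""


def build_request(spec):
    """Combine host, endpoint, and query string into a urllib Request."""
    url = urljoin(spec["host"], spec["endpoint"])
    if spec.get("query_string"):
        url += "?" + urlencode(spec["query_string"])
    return Request(url, method=spec["verb"], headers=spec.get("headers", {}))


spec = json.loads(TEST_JSON)
request = build_request(spec)
print(spec["name"], request.get_method(), request.full_url)
```

A declarative tool would go on to send the request and compare the response against expectations; keeping the test as data is what lets it print a uniform report for every test.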
14:08 Oh, next. xarray. This was suggested by a listener. I think it's Guido Imperiale.
14:16 Yep. I agree. Thanks, Guido. Sent it in. We haven't covered it before. And actually,
14:21 I didn't know about it before. People in the data science community probably do because it seems like
14:25 pretty powerful. But the gist of it is it's built, it uses and builds on top of NumPy and pandas and
14:33 Dask to offer N-dimensional arrays. You can do N-dimensional arrays in pandas already,
14:40 I believe. But one of the neat things about these is that they've got labels on them. So
14:47 they're self-describing and they've got indexes. There's a few data types within it. There's a data,
14:52 so there's xarray DataArray. The DataArray is the N-dimensional array, but it has metadata,
14:59 like names and labels for the dimensions. And you can also have coordinates and attributes. And
15:06 coordinates are essentially like the tick elements for the different axes. And then attributes,
15:13 the DataArray doesn't really do anything with the attributes, but it's a way to consistently keep data
15:20 with data. So if you have to keep track of some extra things like, you know, where was this data
15:26 collected or really anything, you can add them as an attribute. And then a Dataset is a
15:33 dictionary-like collection of DataArray elements. I was playing with this and it's, it's pretty darn
15:39 cool. The, one of the things, nice things about using it is just keeping all of that, the dimension
15:45 names together. So if you have a multi-dimensional array, even just like a three-dimensional array,
15:51 it's sometimes hard to keep track of, you know, which axis is which, and this is all together. But it's not
15:58 just packaged together. You can also do things like use the label names and the axis names, and even axis
16:06 elements, the coordinates, they don't actually need to be numbers. For instance, you could have like the
16:12 months of the, months of the year or, or the letters of the alphabet be coordinates. You can use those as
16:18 selectors to be able to select rows and columns and those return different DataArray elements. The DataArray
16:25 elements also can be used in algorithms. They can just be passed directly to pandas algorithms.
16:29 So these are pretty cool. Yeah. It looks a little bit like it's taken some of the features from NumPy,
16:34 some of the features from pandas, some of the features from Dask, and sort of brings them together
16:39 into one package. So when I was going through some of the tutorials, I was to get somebody to talk about
16:45 this. It was like a three-dimensional array in, I think it's in pandas, used to be considered
16:52 a Panel. But when I went to look at the Panel information, it looks like Panels are being
16:57 deprecated for something else. So even in the pandas documentation, it was pointing to this xarray
17:03 project. So... Oh, interesting. I think the people in the pandas community are definitely familiar with it.
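For readers who have not seen xarray, here is a small sketch of the pieces described above: named dimensions, non-numeric coordinates, free-form attributes, label-based selection, and a Dataset holding a DataArray. It assumes `numpy` and `xarray` are installed. The city names, months, and temperature values are made-up sample data.

```python
import numpy as np
import xarray as xr

# A 2-D DataArray with named dimensions, string coordinates, and attributes.
temps = xr.DataArray(
    np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]),
    dims=("city", "month"),
    coords={"city": ["Portland", "Berlin"], "month": ["Jan", "Feb", "Mar"]},
    attrs={"units": "degC", "source": "made-up sample data"},
)

# Select by label instead of by integer position.
feb = temps.sel(month="Feb")

# Reduce along a dimension by name rather than by axis number.
berlin_mean = temps.sel(city="Berlin").mean(dim="month")

# A Dataset is a dictionary-like collection of DataArrays.
ds = xr.Dataset({"temperature": temps})

print(feb.values)
print(float(berlin_mean))
print(ds["temperature"].attrs["units"])
```

Selecting with `month="Feb"` rather than a column index is the self-describing property mentioned in the episode: the code states which axis it means.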
17:08 But if you're using pandas kind of on the side, and you're not really in it all the time,
17:12 this might be helpful. Now, previously you spoke about Bob Belderbos, and I said we got this item
17:17 from Florian Dahlitz. I'm going to bring those two things together in this next one. So...
17:22 Okay.
17:22 Bob had introduced us to Carbon. Remember that?
17:26 Yeah.
17:26 Carbon is like screen, sort of beautiful screenshots for colored code, right? Code, it's like a mock
17:32 faux little shell or whatever editor. Like, you don't use screenshots of real editors. You just
17:39 create that with carbon at carbon.now.sh. And that's cool, but those are generally static.
17:44 So Florian sent in this thing called termtosvg.
17:49 And it's a cool way to create animated terminal GIFs. So instead of going all the way to create, like, full-on screencasts of your
17:58 screen, you can run this in your terminal, and then you just do whatever you want to do in the terminal, and it captures it
18:05 perfectly into SVG, and then you get... convert that out to some kind of animated thing. Like, I guess the SVG itself is animated,
18:14 so you just show that in the browser or wherever you want to put it. Isn't that cool?
18:18 Yeah.
18:19 Very cool.
18:20 You basically just type termtosvg once you have it installed and it starts recording. You do a bunch of
18:24 stuff, and then there's a way to get out of its recording status. So it's pretty cool. It produces, like,
18:30 lightweight, clean-looking animations, or you can even do still frames if you want for, like, a project page.
18:37 So instead of, like, carbon is cool because I can put in the text and the code I want to show up, but maybe it
18:42 doesn't have, here is what the progress bar, and then the install steps with the spinner look like.
18:49 Right? It doesn't naturally capture what actually happens when that code or those terminal commands
18:55 execute.
18:56 So this file, it has color themes, animation controls, all sorts of good stuff. And yeah, it's pretty cool.
19:03 So there's a... probably if you want to... if this sounds interesting, you want to check out the examples.
19:08 So there's a whole page of examples, and there's a bunch of different stuff happening. You can just look
19:12 through there. And I think there's also templates that configure how it records and stuff. So there's a
19:18 bunch of predefined templates that you can go play with to get started from.
19:20 That'd be really cool for, like, a tutorial site or something to...
19:23 Yes, exactly.
19:24 Yeah.
19:25 Or even... or if you have a project, right? Like, if you're the maintainer of pipx,
19:29 it'd be cool to use this to create a way to, like, show how awesome pipx is. Like, this step,
19:34 then this step, and then boom. Right? And just put that right in your GitHub readme.
19:37 Yeah, I love it when there's little animated things in the readme. So when you go to GitHub,
19:42 you just see that.
19:43 Yeah. You and I, we spend an inordinate amount of time jumping into new projects and going,
19:49 is it interesting? Yes or no? Why is it interesting, right?
19:53 And this kind of stuff is the thing that just goes, after 10 seconds, I knew I wanted to learn
19:58 about it, right? It really makes a difference, and it's easy.
20:00 Yeah. Yeah. Very cool. Definitely check this out.
20:03 Yeah, for sure. All right. Yeah. So that's a good one. People can check that out,
20:06 termtosvg. Pretty cool. All right. Well, that's it for our main items. What else you got?
20:11 I have one bit of extra news, is that pytest 5.3.0 was released the other day. And it is mostly,
20:20 there's some cool features. And if you, you know, pytest nerds, definitely check it out. But I wanted
20:25 to bring it up because I think a lot of people that just use pytest and are using it with continuous
20:29 integration systems should pay attention to this because the JUnit XML output, they've changed the
20:36 default. So the default format, there's an XML output has, there's an old version and a new version.
20:44 The new version has some more information, but they wanted to make sure that people know about
20:48 this. So if you run it, you'll get a warning and it's not really a warning. It just says,
20:52 it's just to make you aware that there's a particular format that's being deprecated.
20:57 So eventually in the 5.4 release, they won't support the old format. So if you see this,
21:03 encourage anybody using pytest and continuous integration to read the change log and understand
21:10 what's going on and make sure they're ready to either pin pytest or change their system.
21:15 Yeah. It's a good thing to put on people's radar for sure.
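As a sketch of what changing the system can look like, the JUnit XML format can be chosen explicitly in the pytest configuration file instead of relying on the default. The option name `junit_family` and its values come from the pytest documentation, not from the episode. Which value is right for a given CI system is an assumption to check against that system's XML parser.

```ini
# pytest.ini
[pytest]
# Opt in to the newer format explicitly; use xunit1 (legacy) to keep the old one.
junit_family = xunit2
```

Setting the option either way makes the choice explicit, so a future change of default should not alter the report format unexpectedly.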
21:17 Okay. How about you, Michael? Any extra bits?
21:21 Yeah, I got a bunch for you. Actually, a couple of things. PyCon. PyCon's awesome. We love that
21:27 each year. And this year it's going to be in Pittsburgh for the first of its two years in that
21:32 city. And PyCon registration is now open. You can go and register, get your ticket before it
21:38 sells out. Oh, cool. Yeah. That comes to us from Jacqueline Wilson. So thank you very much
21:42 for sending that in. And then also I saw, I can't remember where I saw this somewhere. Actually,
21:47 I think somewhere funky, like Flipboard or something. So Facebook has now decided that
21:53 Microsoft's Visual Studio Code is their default development platform. That's a little surprising
21:58 to me. Yeah. Interesting. Yeah. That's an article on ZDNet. And they're also helping Microsoft
22:03 improve the remote development experience in VS Code. Cats, dogs, all live in the same place.
22:10 Okay. Yeah. This is cool. I suspect that things like Vim and Emacs and stuff probably have a strong
22:17 representation there. But apparently it's all about Visual Studio Code over there now.
22:21 Anything else?
22:21 Yes. Two more things. Very exciting. So if the release schedule lines up correctly in the future,
22:28 extends as I expect it, this should be Wednesday before Thanksgiving, right? And that would mean
22:36 the day or two after that is going to be Black Friday. So I just want to point out that Talk Python
22:42 training is going to have a really awesome Black Friday sale. Get a whole bunch of stuff on buying
22:48 all of the courses, but also we're doing some special things to support the PSF and other stuff,
22:54 some surprises in there that I suspect people won't guess at. And there's no way people are going to
22:59 guess what is there. So check it out over at training.talkpython.fm. But you got to act right
23:04 away because it's only going to be there for like four days. It's a big deal. So check that out. And also,
23:08 we have a new course coming, Python for the .NET developer. So, so many people are coming from C#
23:15 and the .NET world over into the Python space. I thought it would be cool to create a course that
23:19 kind of gives them a big hug and holds their hand and helps them step over that divide. So it's like,
23:25 do you know about ASP.NET? Here's Flask. And here's how you use it in Python. Do you know about Entity
23:31 Framework? Here's SQLAlchemy. Here's how you use it in Python. Like all the things that they need or
23:36 they love from C# and .NET. Here's the Python equivalent and why it's awesome and how it works.
23:41 Is that one that you did or did somebody else do that?
23:44 No, no, I did that one.
23:45 Because you're like the perfect person for that.
23:46 Exactly. I spent so many years doing C# and now I'm all about Python. So exactly. I figured
23:50 like, why don't I try to think back to the way it was for me many years ago and like sort of extend
23:56 that experience back to other people. It's probably not going to be out yet. It may be out at the time
24:01 that people hear this, but it's coming really soon. So I'll just put it out there as that.
24:04 That's nice. Hey, speaking of Black Friday, I do not have any insider knowledge,
24:09 but Pragmatic Publishers often does a Black Friday sale too. It's usually fairly steep. So if you've
24:16 not picked up the pytest book yet, and really, if you're listening to this and you haven't read it yet,
24:22 what's going on?
24:22 Come on.
24:23 If you haven't, maybe check out pragprog.com and see if there's a sale.
24:29 Definitely. I'm sure there will be. It would be surprising.
24:31 Yep.
24:31 There weren't. Awesome.
24:32 How about a joke or two or three?
24:34 I like three jokes.
24:35 Okay.
24:35 It's a good number. So this one, first one is more of just a geeky STEM type of joke,
24:41 but I think people will like it. So I love soda drinks, you know, Coca-Cola, Dr. Pepper,
24:47 root beer, things like that. So this one, I try to not drink too much, but I do like it. But here's
24:53 how that world can clash together with math. What do you get when you put root beer into a square
24:58 glass?
24:59 I don't know. What?
25:00 Beer.
25:01 I don't even get it, but it's funny.
25:05 If you take root of beer and you square it.
25:07 Oh, okay.
25:08 Okay.
25:08 Right? Like the square root of beer and then you put it in a square glass.
25:12 Okay. That was bad.
25:13 What's your next one here?
25:14 Okay. What do you call an optimistic front end developer?
25:17 I don't know. What do you call it?
25:19 A stack half full developer.
25:21 That is awesome.
25:22 Okay. Now I also, I was going to tell a version control joke, but they're only funny if you
25:29 git them.
25:29 Git. G-I-T. Awesome. Those are both good. I like them. Yeah. Great.
25:34 Cool. Well, thanks again for having a nice conversation this week.
25:38 Yeah. You bet. Thanks as always. See you later, Brian.
25:40 Bye.
25:40 Thank you for listening to Python Bytes. Follow the show on Twitter via @pythonbytes. That's
25:45 Python Bytes as in B-Y-T-E-S. And get the full show notes at pythonbytes.fm. If you have a news
25:51 item you want featured, just visit pythonbytes.fm and send it our way. We're always on the lookout
25:56 for sharing something cool. On behalf of myself and Brian Okken, this is Michael Kennedy. Thank you
26:01 for listening and sharing this podcast with your friends and colleagues.