Brought to you by DigitalOcean - grab your $50 credit and deploy your first project for free


« Return to show page

Transcript for Episode #158:
There's a bounty on your open-source bugs!

Recorded on Wednesday, Nov 20, 2019.

0:00 KENNEDY: Hello, and welcome to Python Bytes where we deliver Python news and headlines directly to your earbuds. This is Episode 158, recorded November 20th, 2019. I'm Michael Kennedy.

0:11 OKKEN: And I'm Brian Okken.

0:12 KENNEDY: And this episode is brought to you by Digital Ocean, Digital Ocean's awesome. Check them out at pythonbytes.fm/digitalocean, tell you more about that later. But Brian, I find that Python is making it's way into all these different areas, not just traditional computer science or maybe data science?

0:28 OKKEN: Right, there's an article that I saw that's kind of interesting, I mean there's not a lot of details, but essentially it's saying that Python is replacing Excel in banking and investing. The real title is, "Python Already Replaced Excel In Banking," but we've got some interesting quotes from here, so I'm just going to read it out. This is from the article, "If you wanted to prove your mettle as an entry-level banker or trader, it used to be the case that you had to know all about financial modeling in Excel, not any more. These days it's all about Python, especially on the trading floor." And it goes on to talk about how a lot of different modeling that used to be done in smaller cases in Excel, but it would take like a few minutes to run the Excel modifications and analysis, now they can do even way more data and have it done in like a second or two. So it does make sense in cases where split-second decisions change how you react to the market, that you'd want to have speed and ease. So Python makes sense to me.

1:36 KENNEDY: Yeah, that's really interesting, I'm sure it's used in a lot of the data science stuff like NumPy and whatnot, to make that fast. Deep down below, the whole trading, algorithmic trading, high speed trading, all that kind of stuff, the latency that those folks care about is crazy, right? Like if you could get it from four milliseconds to three milliseconds, we'd really appreciate that, right? And they'll actually rent servers that are nearly co-located to the stock market to reduce the actual latency, or set up alternate direct connections over microwaves, there's all kinds of crazy stuff, and so if you can go from minutes to seconds, that already seems like it would make a big difference to these folks.

2:11 OKKEN: Yeah, and also being able to go from minutes to seconds and while incorporating more data.

2:16 KENNEDY: Yeah. Super cool.

2:17 OKKEN: I'm imagining like walking through the trading floor and seeing some guy in a hoodie sitting with a laptop on the floor, I mean like, I don't understand this but, you know. Whatever.

2:28 KENNEDY: Five years ago that person would have been arrested, now people are like, "Hey, I need some help man, can you give me some advice on this trade?"

2:34 OKKEN: Yeah!

2:35 KENNEDY: I have a little personal experience with this, Python replacing Excel in banking and trading. I can't talk about the details, but I did teach a class to a bunch of folks working on the European stock market, and they actually couldn't even take the class during the day, because they had to be there while the market was open. So we had the class in the evening for a week over there, and they were all really into learning Python because they had been trying to analyze how their day went and do this kind of analysis that you're talking about in Excel, and they were just like, "We can't do this anymore, we have to get better tools." And Python was the answer for them as well. Pretty cool.

3:09 OKKEN: Oh, that's great. Interesting.

3:10 KENNEDY: Yeah. Another thing that I think is really, really good news, is something that GitHub just announced. GitHub has announced a ton of things while you were not with us last week, when we recorded in Florida, we talked about how GitHub has added code navigation to all of the source code there, much of the source code you go in there and click on functions and classes, and say, "Go to definition in Python." and that's pretty awesome. So give it a week and GitHub launches Security Lab, to help secure the open source ecosystem.

3:42 OKKEN: Wow.

3:43 KENNEDY: So you've probably heard about bug bounties and these bounties paid out to security researchers before, I would guess? Yeah, so it's pretty much like that, is my understanding of it. So, it's like a bug bounty program to go and find bugs in open source libraries, but what's kind of cool is it seems like the folks paying out that money are not the open source projects. Like, Apple might pay out a huge amount of money, like $100,000 for finding a big vulnerability in iOS, or Microsoft might, or whoever, but who's going to pay to find that security bug in Flask, or wherever it is, right?

4:20 OKKEN: Oh right, yeah.

4:23 KENNEDY: So it seems like that this is to pay for those types of things. So it says, "Organizations as well as individual security researchers can join, a bug bounty program with rewards of up to $3,000 is available to compensate bug hunters for the time they put into searching for vulnerabilities in open source projects."

4:40 OKKEN: Oh, that's neat.

4:41 KENNEDY: Cool right? Yeah, so apparently this has been in beta since, for a little while, when was it exactly, a little while, not very long, anyway the founding members who were part of it have already found, reported, and helped fix more than 100 security flaws already across the open source ecosystem. That's pretty cool. Another thing that's interesting is the bug report, in order to count, must contain a codeQL, like SQL, but codeQL or something? I don't know. CodeQL, which is an open source tool that GitHub released at the same time. Remember we talked about it, their semantic code analysis engine? And what it does is basically this is a query that runs against source code, that will uncover the vulnerabilities in dependent projects.

5:30 OKKEN: Oh, okay.

5:30 KENNEDY: So if I find a bug in Flask, I don't know if there is one, but let's just say, I'm just picking a random project. I find a bug in Flask, and I submit this, I submit a query to GitHub so that they can go find all the projects that depend on Flask that have out-of-date versions of Flask that need to also subsequently receive warnings to get their stuff updated.

5:48 OKKEN: So do they then notify all the other maintainers?

5:51 KENNEDY: Yes. So if you look at that article, there's some screenshots of what it gets. So they will get, the actual project will get an automated pull request that fixes the security vulnerability, maybe it bumps the requirements, pinned version to something where it's fixed, or something, right? It gets the PR to automatically fix it, and then there's also a button where they can publish an advisory out to, from that repository to dependent repositories. And they can also request a CVE, which is like a vulnerability official number to be recognized as an actual issue. So GitHub became, what was the term they used, a CVE-numbering authority. A CNA, of course. So that they can actually issue these vulnerability numbers to be understood and referenced as unique IDs across the security landscape.

6:46 OKKEN: Interesting.

6:47 KENNEDY: Yeah, so all this stuff is integrated into GitHub, so GitHub researchers find the issue in the main project, the main project gets a PR, the main project can then also push out these warnings to other folks and request CVEs for their projects. That's pretty cool, right?

7:02 OKKEN: Yeah, open source is growing up.

7:03 KENNEDY: Yeah, it totally is, and it seems like it's pretty solid for all the folks working on it. It doesn't seem like it requires much of the maintainers, it's more like there's this bug bounty program from what I can tell. And also they threw in there right at the end of this, GitHub also updated the token scanning, an in-house service that scans for like, API keys, AWS access keys, or whatever, that have been accidentally left inside a source code.

7:31 OKKEN: Oh, that's good. That's really good.

7:33 KENNEDY: Yeah, it'd be pretty nice to like, "Ah, you probably didn't mean this. Click this button to make this go away." Anyway, I think this is really cool, I think this is just plumbing to make open source more secure, and I like that.

7:45 OKKEN: Yeah and also just to be able to say, to have companies put money at open source projects to keep them fixed. And it's not necessarily trying to get the official maintainer to do it, but to have some incentive for everybody else to watch these things, so that's great.

8:04 KENNEDY: Absolutely. These bug bounty programs have been working really well for the industry, and it's cool to see GitHub putting that in there. Also cool is Digital Ocean, not just for sponsoring the show, but because they have awesome infrastructure and awesome product, and we use them for our stuff. So let me tell you about a new thing that they have generally available, memory optimized droplets. And if you have a memory-heavy workload, basically this is the best way to get tons of memory in a droplet or a virtual machine. So you can get eight gigs of RAM for each dedicated CPU, and then it goes from two CPUs, all the way up to enough to get you 256 gigs of RAM, whatever that math works out to be. And it's really good for high-memory applications like high performance SQL, or NoSQL databases, in-memory caches like Redis, or indexes, some kind of large data analysis run-time, something like that. So check those out at pythonbytes.fm/digitalocean, really good stuff over there, lots of cool things come in. Brian, what you got next for us?

9:05 OKKEN: Well, we have a couple of friends of ours, Bob Belderbos and Julian Sequeira, they run a thing called PyBites and PyBites challenges. Not affiliated with Python Bytes, just sounds similar.

9:18 KENNEDY: It's the I versus the Y, it's not even close to the same thing.

9:22 OKKEN: It's P-Y-B-I-T, dot E-S. Anyway. I enjoy it, it's a challenges platform where you can just sort of, there's a few of them for free, but it is a paid service, it's one of those things where they give you kind of a written assignment and some test code already there, and it checks to see, and then you have to fill in the body of a function to make all the tests pass. It's kind of a brain teaser sort of thing. It's a fun way to keep up, make sure that you're practicing out of the box Python stuff that you don't normally do, that's what I use it for. But the news is they just added test coverage, or testing, so in the past you didn't write the tests, they wrote them to evaluate your code, but they've added a few test challenges where they write the code and you have to write the test code to check that code. And it's kind of cool, but they actually talked to me about this as well as to try and pick my ideas, but they came up with it on their own. How do you evaluate if the test code is good? So you evaluate if your source code is good by running tests but the other way around is a little difficult.

10:32 KENNEDY: Yeah, how do you test the tests?

10:33 OKKEN: Yeah, so they did it a couple of ways. They're using coverage.py to make sure that you're hitting 100% coverage, and yes it's debatable as for a large project of whether you should get 100% coverage, but for a small function or some small bit of code you should be able to hit 100% coverage, that's a nice thing. The other one is mutation testing. So there's a couple of projects we've heard of, MutMut and MutPy, M-U-T-P-Y, and I think we talked about this earlier but Ned Batchelder did write an article about his experience with MutMut. But PyBites is using MutPy, and what it does is it takes your source code and changes something about it, and MutPy works at the level of the abstract syntax tree and it changes like, for instance, a division operator to a multiplication. Or it changes a string to some other string, or something, and then it runs the tests again. And the idea is, you want your tests to be able to, it makes a whole bunch of mutants of the code, and you want the tests to be able to kill off all the mutants, except for the original. That's how they're testing it. It's kind of a neat idea, but it's fun to play with.

11:48 KENNEDY: It is an interesting question to ask, how do you test the test? And I think that's pretty creative. Well done Bob and Julian.

11:54 OKKEN: I haven't used mutation testing a lot, I've tried it out but I haven't used it for projects. The idea of using it in a training situation is a novel thing I haven't heard of before and I think that's a cool idea, to be able to try to test somebody's test code.

12:11 KENNEDY: Yeah I agree, and like you said, 100% code coverage for a project that's real, that's challenging, I think also maybe mutation testing for a project that's real tricky, because maybe it changes, I don't know, the print statement that shows what the title of the app is, and who cares? No one's going to check for that, right?

12:27 OKKEN: Right.

12:28 KENNEDY: But in this case, where pretty much it's a very small focused bit of code, and you're supposed to test it, presumably any changes to that are going to appear in the couple of tests you write.

12:37 OKKEN: Yep.

12:38 KENNEDY: Nice, now speaking of tests, I feel like I stole this one from you Brian, just out of the universe I mean. So I want to talk about pyhttptest, now this one comes from Florian Dahlitz, and he actually sent in two things for this week, which they were both excellent so I'm going to cover them. This is a command line tool for HTTP tests against RESTful APIs. Alright? So the idea is basically I want to test some RESTful endpoint, and instead of going over, say okay, I'm going to get requests, I'm going to do a GET, I'm going to get the dictionary, I'm going to verify this thing is in the dictionary, and so on, what you basically do is you just write a simple little JSON document for each test that you want to run.

13:23 OKKEN: Oh cool.

13:24 KENNEDY: Yeah, so then it has things like, "What is the name of the test? What HTTP verb do you want to use? What is the URL? Combination between host and endpoint, the headers you need to pass, and a query string you need to pass." and then you get back a report, it actually gives you a cool report in a columnar-style validation that lets you assert things about it.

13:43 OKKEN: Yeah, there's a handful of these types of things and I think it's kind of a neat way to describe API testing.

13:49 KENNEDY: Yeah, it seems really cool. There's a bunch of neat little libraries that are used as well, like Tabulate, which is a cool way to print the tabular data that they're showing there, and things like that. But yeah, I like this project. If your job is to test a bunch of HTTP endpoints, this is pretty cool.

14:06 OKKEN: Yeah, neat.

14:07 KENNEDY: Nice. Alright, what else? What's next?

14:09 OKKEN: Oh next, xarray. This was suggested by a listener, I think it's Guido Imperiale?

14:17 KENNEDY: Yep, I agree. Thanks Gido.

14:18 OKKEN: Yeah. Sent it in, we haven't covered it before and actually I didn't know about it before, people in the data science community probably do because it seems pretty powerful, but the gist of it is, it uses and builds on top of NumPy and pandas and Dask, to offer N-dimensional arrays. You can do N-dimensional arrays in pandas already, I believe, but one of the neat things about these is that they've got labels on them, so they're self-describing and they've got indexes. There's a few data types within it, there's xarray data array, the data array is the n-dimensional array, but it has metadata like names and labels for the dimensions. And you can also have coordinates and attributes, and coordinates are essentially like the tick elements for the different axes. And then the attributes, the data array doesn't really do anything with the attributes but it's a way to consistently keep data with data, so if you have to keep track of some extra things like, you know, where was this data collected or really anything, you can add them as an attribute. And then a data set is a dictionary-like collection of data array elements. I was playing with this and it's pretty darn cool, one of the nice things about using it is just keeping all of that, the dimension names together, so if you have a multi-dimensional array, even just like a three-dimensional array, it's sometimes hard to keep track of which axis is which, and this is all together. But it's not just packaged together, you can also do things like use the label names and the axis names, and even axis elements, the coordinates, they don't actually need to be numbers. For instance, you could have the months of the year or the letters of the alphabet be coordinates. You can use those as selectors to be able to select rows and columns, and those return different data array elements. The data array elements also can be used in algorithms, they can just be passed directly to pandas algorithms, so these are pretty cool.

16:31 KENNEDY: Yeah, it looks a little bit like it's taken some of the features from NumPy, some of the features from pandas, some of the features from Dask, and sort of brings them together into one package.

16:40 OKKEN: So when I was going through some of the tutorials, I was going to get somebody to talk about this, it was like a three-dimensional array, I think it's in pandas, used to be, is considered a panel, but when I went to look at the panel information it looks like panels are being deprecated for something else. So even in the pandas documentation, it was pointing to this xarray project.

17:04 KENNEDY: Oh, interesting.

17:05 OKKEN: So I think the people in the pandas community are definitely familiar with it, but if you're using pandas kind of on the side and you're not really in it all the time, this might be helpful.

17:13 KENNEDY: Now, previously you spoke about Bob Belderbos, and I said we got this item from Florian Dahlitz, I'm going to bring those two things together in this next one, so Bob had introduced us to Carbon, remember that?

17:26 OKKEN: Yeah!

17:26 KENNEDY: Carbon is sort of beautiful screen shots for colored code, right? It's like a mock, faux little shell or whatever editor, like you don't use screen shots of real editors, you just create that with Carbon at carbon.now.sh. And that's cool, but those are generally static, so Florian sent in this thing called termtosvg, and it's a cool way to create animated terminal gifs. So instead of going all the way to create full-on screencasts of your screen, you can run this in your terminal and then you just do whatever you want to do in the terminal and it captures it perfectly into SVG, and then you convert that out to some kind of animated thing. I guess the SVG itself is animated, so you just show that in the browser, or wherever you want to put it. Isn't that cool?

18:19 OKKEN: Yeah, very cool.

18:20 KENNEDY: You basically just type termtosvg, once you have it installed and it starts recording, you can do a bunch of stuff and then there's a way to get out of it's recording status. So that's pretty cool, it produces lightweight, clean-looking animations or you can even do still frames if you want, for a project page. So instead of, like Carbon is cool because I can put in the text and the code I want to show up, but maybe it doesn't have, "Here is what the progress bar, and then the install steps with the spinner look like." right? It doesn't naturally capture what actually happens when that code or those terminal commands execute. So this will, it has color themes, animation controls, all sorts of good stuff, and yeah it's pretty cool. So there's probably if you want, if this sounds interesting, you want to check out the examples. So there's a whole page of examples and there's a bunch of different stuff happening, you can just look through there. And I think there's also templates that configure how it records and stuff, so there's a bunch of predefined templates that you can go play with to get started from.

19:20 OKKEN: That'd be really cool for like a tutorial site or something.

19:23 KENNEDY: Yes, exactly. Or even if you have a project, like if you're the maintainer of pipx it would be cool to use this to create a way to show how awesome pipx is. Like this step, then this step, and then boom, right? Just put that right in your GitHub README.

19:37 OKKEN: Yeah, I love it when there's little animated things in the README. So when you go to GitHub you just see that.

19:43 KENNEDY: Yeah, you and I, we spend an inordinate amount of time jumping into new projects and going, "Is it interesting? Yes or no. Why is it interesting?" Right? And this kind of stuff is the thing that just goes, after ten seconds I knew I wanted to learn about it, right? It really makes a difference, and it's easy.

20:01 OKKEN: Yeah. Yeah, very cool, I'll definitely check this out.

20:03 KENNEDY: Yeah, for sure. All right yeah, so that's a good one, people can check that out at termtosvg. Cool, all right, well that's it for our main items, what else you got?

20:11 OKKEN: I have one bit of extra news, is that pytest 5.3.0 was released the other day and it is mostly, there's some cool features, and if you're pytest nerds definitely check it out, but I wanted to bring it up because I think a lot of people that just use pytest and are using it with continuous integration systems should pay attention to this, because the JUnit XML output, they've changed the default so the default format, there's an XML output, there's an old version and a new version, the new version has some more information. But they wanted to make sure that people know about this, so if you run it you'll get a warning, and it's not really a warning, it just says, it's just to make you aware that there's a particular format that's being deprecated. So eventually in the 5.4 release, they won't support the old format. So if you see this, encourage anybody using pytest and continued integration to read the change log and understand what's going on, and make sure they're ready to either pin pytest or change their system.

21:15 KENNEDY: Yeah, that's a good thing to put on people's radar for sure.

21:18 OKKEN: Okay, how about you, Michael? Any extra spits?

21:21 KENNEDY: Yeah, I got a bunch for you actually. Couple of things, PyCon, PyCon's awesome, we love that each year, and this year it's going to be in Pittsburgh for the first of it's two years, in that city, and PyCon registration is now open. You can go and register, get your ticket before it sells out.

21:38 OKKEN: Oh cool.

21:39 KENNEDY: Yeah. That comes to us from Jacqueline Wilson, so thank you very much for sending that in, and then also I saw, I can't even remember where I saw this, actually I think somewhere funky like Flipboard or something, So FaceBook has now decided that Microsoft's Visual Studio Code is their default development platform, that's a little surprising to me.

21:58 OKKEN: Yeah, interesting.

21:59 KENNEDY: Yeah, there's an article on zdnet, and they're also helping Microsoft improve their remote development experience in VS Code. Cats, dogs, all living in the same place. Yeah, this is cool, I suspect that things like Vim and Emacs and stuff probably have a strong representation there, but apparently it's all about Visual Studio Code over there now.

22:21 OKKEN: Anything else?

22:22 KENNEDY: Yes, two more things, very exciting. So if the release schedule lines up correctly and the future extends as I expect it, this should be Wednesday before Thanksgiving, right? And that would mean a day or two after that is going to be Black Friday. So I just wanted to point out that Talk Python Training is going to have a really awesome Black Friday sale, you get a whole bunch of stuff on buying all of the courses but also we're doing some special things to support the PSF and other stuff, some surprises in there that I suspect people won't guess at. There's no way people are going to guess what is there, so check it out over at training.talkpython.fm, but you got to act right away because it's only going to be there for like four days, it's a big deal. So, check that out, and also we have a new course coming, Python for the .NET Developer. So many people are coming from C# and the .NET world over into the Python space, I thought it would be cool to create a course that kind of gives them a big hug and holds their hand and helps them step over that divide, so it's like, "Do you know about ASP.NET? Here's Flask, and here's how you use it in Python. Do you know about Entity Framework? Here's SQLAlchemy, here's how you use it in Python."

23:34 OKKEN: Oh nice.

23:35 KENNEDY: Like all the things they think that they need or they love from C# and .NET, here's the Python equivalent and why it's awesome and how it works.

23:41 OKKEN: Is that one that you did, or did somebody else do that?

23:43 KENNEDY: Yeah. Nah, nah, I did that one.

23:45 OKKEN: Cause you're like the perfect person for that.

23:46 KENNEDY: Exactly, I spent so many years doing C#, and now I'm all about Python. So exactly, I figured like, why don't I try to think back to the way it was for me many years ago, and sort of extend that experience back to other people? It's probably not going to be out yet, it may be out at the time that people hear this, but it's coming really soon so I'll just put it out there as that.

24:04 OKKEN: That's nice. Hey speaking of Black Friday, I do not have any insider knowledge but Pragmatic Publishers often does a Black Friday sale too, it's usually fairly steep so if you've not picked up the pytest book yet, and really if you're listening to this and you haven't read it yet, what's going on?

24:22 KENNEDY: Come on.

24:23 OKKEN: If you haven't, maybe check out pragprog.com and see if there's a sale.

24:29 KENNEDY: Definitely, I'm sure there will be. It would be surprising if there weren't. Awesome!

24:33 OKKEN: How about a joke, or two, or three?

24:34 KENNEDY: I like three jokes. It's a good number. So this first one is more of just a geeky STEM type of joke, but I think people will like it. So I love soda drinks, you know, Coca-Cola, Dr. Pepper, root beer, things like that, so this one, I try to not drink too much but I do like it. But here's how that world can clash together with math. What do you get when you put root beer into a square glass?

24:59 OKKEN: I don't know, what?

25:00 KENNEDY: Beer.

25:01 OKKEN: Beer! I don't even get it, but it's funny.

25:05 KENNEDY: If you take the root of beer and you square it, right?

25:07 OKKEN: Oh, okay. Okay.

25:10 KENNEDY: Like the square root of beer, and then you put it in a square glass.

25:12 OKKEN: Okay, that was bad.

25:13 KENNEDY: What's your next one here?

25:14 OKKEN: Okay, what do you call an optimistic front-end developer?

25:18 KENNEDY: I don't know, what do you call him?

25:19 OKKEN: A stack-half-full developer.

25:21 KENNEDY: That is awesome.

25:23 OKKEN: Okay, now I also was going to tell a version control joke, but they're only funny if you git them.

25:30 KENNEDY: Git, G-I-T. Those were both good, I like them, yeah. Great.

25:35 OKKEN: Cool. Well thanks again for having a nice conversation this week.

25:38 KENNEDY: Yeah, you bet. Thanks as always. See you later, Brian.

25:40 OKKEN: Bye.

25:41 KENNEDY: Thank you for listening to Python Bytes. Follow the show on Twitter via @PythonBytes, that's Python Bytes as in B-Y-T-E-S, and get the full show notes at pythonbytes.fm. If you have a news item you want featured, just visit pythonbytes.fm and send it our way, we're always on the look-out for sharing something cool. On behalf of myself and Brian Okken, this is Michael Kennedy, thank you for listening and sharing this podcast with your friends and colleagues.

Back to show page