

Transcript #158: There's a bounty on your open-source bugs!

Recorded on Wednesday, Nov 20, 2019.

00:00 Hello and welcome to Python Bytes where we deliver Python news and headlines directly to your earbuds. This is episode 158 recorded November 20th, 2019. I'm Michael Kennedy. And I'm Brian Okken. And this episode is brought to you by DigitalOcean.

00:14 DigitalOcean is awesome. Check them out at pythonbytes.fm/digitalocean.

00:18 Tell you more about that later. But Brian, I find that Python is making its way into all these different areas, not just traditional computer science or maybe data science.

00:28 Right, there's this article that I saw that's kind of interesting. I mean, there's not a lot of details, but essentially it's saying that Python is replacing Excel in banking and investing.

00:42 The real title is "Python already replaced Excel in banking," but we've got some interesting quotes from here, so I'm just going to read it out. This is from the article: "If you wanted to prove your mettle as an entry-level banker or trader, it used to be the case that you had to know all about financial modeling in Excel. Not anymore. These days it's all about Python, especially on the trading floor." And it goes on to talk about how a lot of different modeling used to be done in smaller cases in Excel, but it would take like a few minutes to run the Excel modifications and analysis. Now they can do way more data and have it done in like a second or two. So it does make sense, in cases where split-second decisions change how you react to the market, that you'd want to have speed and ease.

01:35 So Python makes sense to me.

01:36 - Yeah, that's really interesting.

01:37 I'm sure it's using a lot of the data science stuff like NumPy and whatnot to make that fast deep down below.

01:43 The whole trading, the algorithmic trading, the high-speed trading, all that kind of stuff, the latency that those folks care about is crazy, right?

01:50 Like if you could get it from four milliseconds to three milliseconds, we'd really appreciate that, right?

01:55 And they'll actually rent servers that are nearly co-located to the stock market to reduce the actual latency or set up alternate direct connections over microwaves.

02:04 There's all kinds of crazy stuff.

02:05 And so if you can go from minutes to seconds, that already seems like it would make a big difference to these folks.

02:11 - Yeah, and also being able to go from minutes to seconds and while incorporating more data.

02:16 - Yeah, super cool.

02:17 - I'm imagining walking through the trading floor and seeing some guy in a hoodie sitting with a laptop on the floor.

02:24 I mean, I don't understand this, but whatever.

02:28 - Five years ago, that person would have been arrested.

02:30 Now people are like, hey, I need some help, man.

02:32 Can you give me some advice on this trade?

02:35 - Yeah.

02:36 - I have a little personal experience with this, Python replacing Excel in banking and trading.

02:40 Can't talk about the details, but I did teach a class to a bunch of folks working on the European stock market, and they actually couldn't even take the class during the day 'cause they had to be there while the market was open. So we had the class in the evening for a week over there, and they were all really into learning Python because they had been trying to analyze how their day went and do this kind of analysis that you're talking about in Excel, and they're just like, we can't do this anymore, we have to get better tools. And Python was the answer for them as well.

03:08 Pretty cool.

03:09 - Oh, that's great.

03:10 Interesting.

03:10 - Yeah.

03:11 Another thing that I think is really, really good news is something that GitHub just announced.

03:16 GitHub has announced a ton of things.

03:17 While you were not with us last week when we recorded in Florida, we talked about how GitHub has added code navigation to all the source code there, much of the source code.

03:29 You go in there and click on functions and classes and say go to definition in Python, and that's pretty awesome.

03:36 So give it a week, and GitHub launches Security Lab to help secure the open source ecosystem.

03:42 - Wow.

03:43 - So you've probably heard about bug bounties and like these bounties paid out to security researchers before, I would guess.

03:48 Yeah.

03:49 Yeah.

03:49 So it's pretty much like that is my understanding of it.

03:53 So it's like a bug bounty program to go and find bugs in open source libraries.

03:59 But what's kind of cool is it seems like the folks like paying out that money are not the open source projects, right?

04:07 Like Apple might pay out a huge amount of money, like $100,000 for finding a big vulnerability in iOS or Microsoft might or whoever, but who's going to pay to find that security bug in Flask or wherever it is, right?

04:22 All right.

04:23 It seems like that this is to pay for those types of things.

04:27 So it says organizations as well as individual security researchers can join, and a bug bounty program with rewards of up to $3,000 is available to compensate bug hunters for the time they put into searching for vulnerabilities in open source projects.

04:40 Oh, that's neat.

04:41 Cool, right?

04:42 Yeah.

04:43 Yeah.

04:44 So apparently this has been in beta for a little while.

04:45 When was it exactly?

04:46 A little while, not very long.

04:49 Anyway, the founding members who were part of it have already found, reported, and helped fix more than 100 security flaws already across the open source ecosystem.

04:58 That's pretty cool.

05:00 Another thing that's interesting is the bug report, in order to count, must contain a CodeQL query. Like SQL, but CodeQL or something.

05:10 I don't know, CodeQL, which is an open source tool that GitHub released at the same time.

05:17 Remember we talked about their semantic code analysis engine and what it does is basically this is a query that runs against source code that will uncover the vulnerabilities in dependent projects.

05:30 - Okay.

05:31 - So if I find a bug in Flask, I don't know if there is one, but let's just say I'm just picking a random project.

05:34 I find a bug in Flask and I submit this, I submit a query to GitHub so that they can go find all the projects that depend on Flask, that have out-of-date versions of Flask, that need to also subsequently receive warnings to get their stuff updated. So do they then notify all the other maintainers? Yes, so if you look at that article, there's some screenshots of what it gets. The actual project will get an automated pull request that fixes the security vulnerability; maybe it bumps the requirements pinned version to something where it's fixed or something, right? It gets the PR to automatically fix it, and then there's also a button where they can publish an advisory out from that repository to dependent repositories. And they could also request a CVE, which is like an official vulnerability number, to be recognized as an actual issue.
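
The "bump the pinned version" idea can be sketched in a few lines of Python. This is only an illustration of what such an automated fix might change in a requirements file, not how GitHub implements it; the function names, the package pins, and the version numbers below are all hypothetical and do not refer to a real advisory.

```python
import re


def parse_version(text: str) -> tuple:
    """Turn '1.0.2' into (1, 0, 2) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def bump_pin(requirements: str, package: str, fixed_version: str) -> str:
    """Rewrite 'package==old' to 'package==fixed_version' when old is older."""
    pattern = re.compile(
        rf"^{re.escape(package)}==([0-9]+(?:\.[0-9]+)*)$", re.IGNORECASE
    )
    fixed = parse_version(fixed_version)
    updated = []
    for line in requirements.splitlines():
        match = pattern.match(line.strip())
        if match and parse_version(match.group(1)) < fixed:
            line = f"{package}=={fixed_version}"
        updated.append(line)
    return "\n".join(updated)


# Hypothetical pins and a hypothetical "fixed" release.
old_requirements = "flask==1.0.2\nrequests==2.22.0"
new_requirements = bump_pin(old_requirements, "flask", "1.1.1")
print(new_requirements)
```

Only exact `==` pins are handled here; a real tool would also need to understand ranges, extras, and lock files.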

06:29 So GitHub became, what was the term they used?

06:32 A CVE numbering authority, a CNA of course, so that they can actually issue these vulnerability numbers to be understood and referenced as unique IDs across the security landscape.

06:46 - Interesting.

06:47 - Yeah, so all this stuff is integrated into GitHub.

06:49 So GitHub, the researchers find the issue in the main project, the main project gets a PR, the main project can then also push out these warnings to other folks and request CVEs for their projects.

07:01 That's pretty cool, right?

07:02 - Yeah, open source is growing up.

07:03 - Yeah, it totally is, and it seems like it's pretty solid for all the folks working on it.

07:10 It doesn't seem like it requires much of the maintainers.

07:13 It's more like there's this bug bounty program from what I can tell.

07:16 And also, they threw in there right at the end of this, GitHub also updated token scanning, an in-house service that scans for API keys, like AWS access keys or whatever, that have been accidentally left inside source code.
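
To make the token-scanning idea concrete, here is a minimal sketch that looks for strings shaped like AWS access key IDs in a blob of source text. It is not GitHub's scanner; the single regular expression and the `find_leaked_keys` helper are assumptions for illustration, and the key in the sample is the placeholder value AWS uses in its own documentation.

```python
import re

# AWS access key IDs commonly look like "AKIA" plus 16 uppercase letters or
# digits. A real scanner would know many more token formats than this one.
AWS_KEY_PATTERN = re.compile(r"\b(AKIA[0-9A-Z]{16})\b")


def find_leaked_keys(source: str) -> list:
    """Return (line_number, key) pairs for anything that looks like a key."""
    hits = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        for match in AWS_KEY_PATTERN.finditer(line):
            hits.append((line_number, match.group(1)))
    return hits


sample = '''
import boto3
client = boto3.client("s3", aws_access_key_id="AKIAIOSFODNN7EXAMPLE")
'''
print(find_leaked_keys(sample))
```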

07:31 - Oh, that's good.

07:32 That's really good.

07:33 - Yeah, it'd be pretty nice to like, you probably didn't mean this.

07:36 Click this button to make this go away.

07:38 Anyway, I think this is really cool.

07:40 I think this is just plumbing to make open source more secure, and I like that.

07:45 - Yeah, and also just to be able to have companies put money at open source projects to keep them fixed.

07:52 And it's not necessarily trying to get the official maintainer to do it, but to have some incentive for everybody else to watch these things.

08:03 So that's great.

08:04 - Absolutely.

08:05 Yeah, these bug bounty programs have been working really well for the industry, and it's cool to see GitHub putting that in there.

08:11 Also cool is DigitalOcean, not just for sponsoring the show, but because they have awesome infrastructure and awesome product, and we use them for our stuff.

08:18 So let me tell you about a new thing that they have generally available, memory optimized droplets.

08:25 And if you have a memory heavy workload, basically this is the best way to get tons of memory in a droplet or a virtual machine.

08:33 So you can get 8 gigs of RAM for each dedicated CPU.

08:37 And then it goes from 2 CPUs all the way up to enough to get you 256 gigs of RAM, whatever that math works out to be.

08:45 And it's really good for high memory applications like high performance SQL or NoSQL databases in memory caches like Redis or indexes, some kind of large data analysis runtime, something like that.

08:57 So check those out at pythonbytes.fm/digitalocean.

09:01 Really good stuff over there.

09:02 Lots of cool things coming.

09:03 Brian, what you got next for us?

09:05 - Well, we have a couple of friends of ours, Bob Belderbos and Julian Sequeira.

09:11 They run a thing called PyBites, and PyBites challenges.

09:15 Not affiliated with Python Bytes, just sounds similar.

09:18 - It's the I versus the Y. It's not even close to the same thing.

09:21 - It's P-Y-B-I-T dot E-S. Anyway, I enjoy it. It's a challenges platform where, there's a few of them for free, but it is a paid service. It's one of those things where they give you kind of a written assignment and some test code already there, and you have to fill in the body of a function to make all the tests pass.

09:48 It's a kind of a brain teaser sort of thing. It's a fun way to keep up, make sure that you're practicing out of the box Python stuff that you don't normally do. That's what I use it for.

09:59 But the news is they just added test coverage, or test testing. So in the past you didn't write the tests, they wrote them to evaluate your code. But they've added a few test challenges where they write the code and you have to write the test code to check that code.

10:16 And it's kind of cool, but they were, they actually talked to me about this as well as to try to pick my ideas, but they came up with it on their own. How do you evaluate if the test code is good? So if you evaluate if your source code is good by running tests, but the other way around is a little difficult. Yeah. How do you test the tests?

10:33 Yeah. So they did it a couple of ways. They're using coverage.py to make sure that you're hitting 100% coverage. And, you know, yes, it's debatable for a large project whether you should get 100% coverage, but for a small function or some small bit of code, you should be able to hit 100% coverage. That's a nice thing. The other one is mutation testing. So there's a couple of projects we've heard of, mutmut and MutPy, M-U-T-P-Y. And I think we talked about this earlier, but Ned Batchelder did write an article about his experience with mutmut. PyBites is using MutPy, and what it does is it takes the source code and changes something about it. MutPy works at the level of the abstract syntax tree, and it changes, for instance, a division operator to a multiplication, or changes a string to some other string or something, and then it runs the tests again. It makes a whole bunch of mutants of the code, and the idea is you want your tests to be able to kill off all the mutants except for the original. That's how they're testing it. It's kind of a neat idea, and it's fun to play with. It is an interesting question to ask, how do you test the tests? And I think this is pretty creative. Well done, Bob and Julian. I haven't used mutation testing a lot. I've tried it out, but I haven't used it for projects.

12:00 The idea of using it in a training situation is a novel thing I haven't heard of before and I think that's a cool idea to be able to try to test somebody's test code.

12:11 - Yeah, I agree and like you said, 100% code coverage for a project that's real is challenging.

12:17 I think also maybe mutation testing for a project that's real is tricky.

12:21 'Cause maybe it changes like the print statement that shows what the title of the app is.

12:25 who cares, no one's going to check for that, right?

12:27 - Right.

12:28 - But in this case, where pretty much it's a very small focused bit of code and you're supposed to test it, presumably any changes to that are going to appear in the couple of tests you write.

12:37 - Yep.

12:38 - Nice, now speaking of tests, I feel like I stole this one from you, Brian, just out of the universe, I mean.

12:43 So I want to talk about PyHttpTest.

12:47 Now this one comes from Florian Dahlitz, and he actually sent in two things for this week, which were both excellent, so I'm gonna cover them.

12:57 This is a command line tool for HTTP tests against RESTful APIs.

13:02 - Okay.

13:03 - All right, so the idea is, basically, I want to test some RESTful endpoint, and instead of going over and say, okay, I'm gonna create, I'm gonna get requests, I'm gonna do a get, I'm gonna get the dictionary, I'm gonna verify, like, this thing is in the dictionary, and so on, what you basically do is you just write a simple little JSON document for each test that you want to run.

13:24 - Oh, cool.

13:24 And then it has things like, what is the name of the test?

13:27 What HTTP verb do you want to use?

13:29 What is the URL?

13:30 A combination between host and endpoint.

13:32 The headers you need to pass.

13:34 A query string you need to pass.

13:35 And then you get back a report.

13:36 It actually gives you a cool report in a like columnar style validation that lets you assert things about it.
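
As a rough picture of what one of these JSON test documents might look like, here is a sketch that loads one and turns it into a standard library request object without sending it. The field names (`name`, `verb`, `host`, `endpoint`, `headers`, `query_string`) follow the items listed above, but the exact schema pyhttptest expects is an assumption here, and `build_request` is a hypothetical helper, not part of that tool.

```python
import json
from urllib.parse import urlencode, urljoin
from urllib.request import Request

# Field names mirror the description in the episode; treat the exact
# schema as an assumption and check the project's README before relying on it.
TEST_CASE = json.loads("""
{
  "name": "TEST: list users",
  "verb": "GET",
  "host": "https://example.com",
  "endpoint": "users",
  "headers": {"Accept": "application/json"},
  "query_string": {"limit": 5}
}
""")


def build_request(case: dict) -> Request:
    """Combine host, endpoint, and query string into a Request (not sent)."""
    url = urljoin(case["host"].rstrip("/") + "/", case["endpoint"].lstrip("/"))
    if case.get("query_string"):
        url += "?" + urlencode(case["query_string"])
    return Request(url, method=case["verb"].upper(), headers=case.get("headers", {}))


request = build_request(TEST_CASE)
print(request.get_method(), request.full_url)
```

Keeping each test as plain data like this is the appeal: adding a new endpoint check means adding a new JSON file, not writing more request code.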

13:43 Yeah, there's a handful of these types of things.

13:45 And I think it's kind of a neat way to describe API testing.

13:49 Yeah, it seems really cool.

13:51 There's a bunch of neat little libraries that are used as well, like Tabulate, which is a cool way to print the tabular data that they're showing there, and things like that.

13:59 But yeah, I like this project.

14:01 If your job is to test a bunch of HTTP endpoints, this is pretty cool.

14:06 Yeah, neat.

14:07 Nice.

14:08 All right, what else?

14:09 What's next?

14:10 Oh, next.

14:11 xarray.

14:12 This was suggested by a listener.

14:14 I think it's Guido Imperiale.

14:16 Yep, I agree.

14:17 Thanks, Guido.

14:18 Sent it in.

14:19 Haven't covered it before, and actually I didn't know about it before. People in the data science community probably do, because it seems pretty powerful. The gist of it is it uses and builds on top of NumPy and pandas and Dask to offer N-dimensional arrays. You can do N-dimensional arrays in pandas already, I believe, but one of the neat things about these is that they've got labels on them, so they're self-describing, and they've got indexes. There's a few data types within it. So there's the xarray DataArray.

14:54 The DataArray is the N-dimensional array, but it has metadata like names and labels for the dimensions. And you can also have coordinates and attributes. Coordinates are essentially like the tick elements for the different axes. And then attributes, the DataArray doesn't really do anything with the attributes, but it's a way to consistently keep data with data.

15:21 So if you have to keep track of some extra things like, you know, where was this data collected, or really anything, you can add them as an attribute. And then a Dataset is a dictionary-like collection of DataArray elements. I was playing with this and it's pretty darn cool. One of the nice things about using it is just keeping all of the dimension names together. So if you have a multi-dimensional array, even just a three-dimensional array, it's sometimes hard to keep track of which axis is which, and this is all together. But it's not just packaged together. You can also do things like use the label names and the axis names, and even the axis elements, the coordinates, don't actually need to be numbers.

16:09 For instance, you could have the months of the year or the letters of the alphabet be coordinates.

16:17 You can use those as selectors to be able to select rows and columns and those return different data array elements.

16:24 The data array elements also can be used in algorithms.

16:27 They can just be passed directly to Pandas algorithms.
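
Here is a small sketch of the pieces just described: a `DataArray` with named dimensions, non-numeric coordinates, and attributes, plus label-based selection and a `Dataset` wrapping it. It assumes `numpy` and `xarray` are installed; the temperatures, city names, and attribute values are made-up sample data.

```python
import numpy as np
import xarray as xr

# Rows are months, columns are cities; the labels travel with the data.
temps = xr.DataArray(
    np.array([[3.0, 12.0], [5.0, 14.0], [9.0, 17.0]]),
    dims=("month", "city"),
    coords={"month": ["Jan", "Feb", "Mar"], "city": ["Oslo", "Rome"]},
    attrs={"units": "degC", "source": "made-up sample data"},
)

# Select by label instead of by integer position.
feb = temps.sel(month="Feb")                          # a 1-D DataArray over "city"
rome_feb = temps.sel(month="Feb", city="Rome").item()

# Reduce over a dimension by name rather than by axis number.
city_means = temps.mean(dim="month")

# A Dataset is a dictionary-like collection of DataArrays.
weather = xr.Dataset({"temperature": temps, "doubled": temps * 2})

print(feb.dims, rome_feb)
print(city_means.values)
print(list(weather.data_vars))
```

The win is that `dim="month"` and `month="Feb"` keep reading correctly even if the array's axis order changes later.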

16:30 These are pretty cool.

16:31 >> Yeah, it looks a little bit like it's taken some of the features from NumPy, some of the features from Pandas, some of the features from Dask and sort of brings them together into one package.

16:40 So when I was going through some of the tutorials to get ready to talk about this, it was like, a three-dimensional array in pandas used to be considered a Panel. But when I went to look at the Panel information, it looks like Panels are being deprecated for something else. So even in the pandas documentation, it was pointing to this xarray project. - Oh, interesting.

17:05 - I think the people in the pandas community are definitely familiar with it, but if you're using pandas kind of on the side and you're not really in it all the time, this might be helpful.

17:13 - Now, previously you spoke about Bob Belderbos, and I said we got this item from Florian Dahlitz.

17:19 I'm gonna bring those two things together in this next one.

17:22 So Bob had introduced us to Carbon, remember that?

17:26 - Yeah.

17:27 - Carbon is like sort of beautiful screenshots for colored code, right?

17:31 It's like a mock-up, a little shell or whatever editor.

17:36 Like you don't use screenshots of real editors.

17:38 You just create that with Carbon at carbon.now.sh.

17:42 And that's cool.

17:42 But those are generally static.

17:44 So Florian sent in this thing called termtosvg.

17:48 And it's a cool way to create animated terminal.

17:53 Yes.

17:54 So instead of going all the way to create full-on screencasts, you can run this in your terminal, and then you just do whatever you want to do in the terminal, and it captures it perfectly into SVG. And then you can convert that out to some kind of animated thing. Like, I guess the SVG itself is animated, so you just show that in the browser or wherever you want to put it. Isn't that cool? Yeah, very cool. You basically just type termtosvg once you have it installed and it starts recording. You do a bunch of stuff, and then there's a way to get out of its recording status.

18:28 So it's pretty cool.

18:29 It produces like lightweight, clean looking animations, or you can even do still frames if you want for like a project page.

18:37 So instead of, like carbon is cool 'cause I can put in the text and the code I wanna show up, but maybe it doesn't have, here is what the progress bar and then the install steps with the spinner look like, right?

18:49 It doesn't naturally capture what actually happens when that code or those terminal commands execute.

18:56 So this one, it has color themes, animation controls, all sorts of good stuff.

19:02 And yeah, it's pretty cool.

19:04 So there's a, probably if you want to, this sounds interesting, you want to check out the examples.

19:09 So there's a whole page of examples and there's a bunch of different stuff happening.

19:12 You can just look through there.

19:14 And I think there's also templates that configure how it records and stuff.

19:17 So there's a bunch of predefined templates that you can go play with to get started from.

19:20 - That'd be really cool for like a tutorial site or something to-- - Yes, exactly.

19:25 - Yeah. - Or even, or if you have a project, right?

19:28 Like if you're the maintainer of pipx, it'd be cool to use this to create a way to show how awesome pipx is.

19:33 Like this step, then this step, and then boom, right?

19:35 Just put that right in your GitHub readme.

19:37 - Yeah, I love it when there's little animated things in the readme.

19:41 So when you go to GitHub, you just see that.

19:43 - Yeah, you and I, we spend an inordinate amount of time jumping into new projects and going, is it interesting?

19:51 Yes or no?

19:52 Why is it interesting, right?

19:53 And this kind of stuff is the thing that just goes, after 10 seconds I knew I wanted to learn about it, right?

19:59 It really makes a difference and it's easy.

20:01 - Yeah, very cool.

20:02 Definitely check this out.

20:03 - Yeah, for sure.

20:04 All right, yeah, so that's a good one.

20:05 People can check that out, termtosvg.

20:08 Pretty cool.

20:09 All right, well that's it for our main items.

20:10 What else you got?

20:11 - I have one bit of extra news, is that pytest 5.3.0 was released the other day.

20:18 And it is mostly, there's some cool features.

20:21 And if you're, you know, pytest nerds, definitely check it out.

20:25 But I wanted to bring it up because I think a lot of people that just use pytest and are using it with continuous integration systems should pay attention to this, because for the JUnit XML output, they've changed the default.

20:37 So for the XML output, there's an old format and a new format.

20:44 The new version has some more information, but they wanted to make sure that people know about this.

20:48 So if you run it, you'll get a warning and it's not really a warning, it just says, it's just to make you aware that there's a particular format that's being deprecated.

20:57 So eventually in the 5.4 release, they won't support the old format.

21:03 So if you see this, I encourage anybody using pytest and continuous integration to read the change log and understand what's going on and make sure they're ready to either pin pytest or change their system.

21:15 - Yeah, it's a good thing to put on people's radar for sure.

21:18 - Okay, how about you, Michael?

21:20 Any extra bits?

21:21 - Yeah, I got a bunch for you.

21:22 Actually, a couple of things.

21:24 PyCon, PyCon's awesome, we love that each year.

21:28 And this year it's going to be in Pittsburgh for the first of its two years in that city.

21:33 And PyCon registration is now open.

21:36 You can go and register, get your ticket before it sells out.

21:38 - Oh, cool.

21:39 - Yeah, that comes to us from Jacqueline Wilson, so thank you very much for sending that in.

21:43 And then also I saw, I can't remember where I saw this, somewhere, actually I think somewhere funky like Flipboard or something.

21:50 So Facebook has now decided that Microsoft's Visual Studio Code is their default development platform.

21:57 That's a little surprising to me.

21:58 - Yeah, interesting.

21:59 - Yeah, that's an article on ZDNet.

22:02 And they're also helping Microsoft improve the remote development experience in VS Code.

22:08 Cats, dogs, all live in the same place.

22:11 (laughing)

22:11 - Okay.

22:12 - Yeah, this is cool.

22:13 I suspect that things like Vim and Emacs and stuff probably have a strong representation there, but apparently it's all about Visual Studio Code over there now.

22:21 Anything else?

22:21 Yes, two more things. Very exciting.

22:23 So if the release schedule lines up correctly and the future extends as I expected, this should be Wednesday before Thanksgiving, right?

22:35 And that would mean the day or two after that is going to be Black Friday.

22:40 So I just want to point out that Talk Python Training is going to have a really awesome Black Friday sale.

22:46 get a whole bunch of stuff on buying all the courses, but also we're doing some special things to support the PSF and other stuff, some surprises in there that I suspect people won't guess at and there's no way people are gonna guess what is there.

23:00 So check it out over at training.talkpython.fm, but you gotta act right away 'cause it's only gonna be there for like four days.

23:06 It's a big deal.

23:07 So check that out.

23:08 And also we have a new course coming, Python for the .NET Developer.

23:12 So, so many people are coming from C# in the .NET world over into the Python space, I thought it would be cool to create a course that kind of gives them a big hug and holds their hand and helps them step over that divide.

23:24 So it's like, do you know about ASP.NET?

23:28 Here's Flask, and here's how you use it in Python.

23:30 Do you know about Entity Framework?

23:32 Here's SQLAlchemy, here's how you use it in Python.

23:34 Like all the things that they need or they love from C# and .NET, here's the Python equivalent and why it's awesome and how it works.

23:42 - Is that one that you did or did somebody else do that?

23:44 - No, no, I did that one.

23:45 - 'Cause you're like the perfect person for that.

23:46 - Exactly, I spent so many years doing C# and now I'm all about Python.

23:50 So exactly, I figured like why don't I try to think back to the way it was for me many years ago and like sort of extend that experience back to other people.

23:58 It's probably not gonna be out yet.

24:00 It may be out at the time that people hear this, but it's coming really soon, so I'll just put it out there as that.

24:05 - That's nice.

24:05 Hey, speaking of Black Friday, I do not have any insider knowledge, but Pragmatic Publishers often does a Black Friday sale. It's usually fairly steep.

24:15 So if you've not picked up the pytest book yet, and really if you're listening to this and you haven't read it yet, what's going on?

24:22 - Come on.

24:23 - If you haven't, maybe check out pragprog.com and see if there's a sale.

24:29 - Definitely, I'm sure there will be.

24:30 It would be surprising if there weren't.

24:32 Awesome.

24:33 - How about a joke or two or three?

24:34 - I like three jokes.

24:35 - Okay.

24:36 - It's a good number.

24:37 So this first one is more of just a geeky STEM type of joke, but I think people will like it.

24:43 So I love soda drinks, you know, Coca-Cola, Dr. Pepper, root beer, things like that.

24:49 So this one, I try to not drink too much, but I do like it.

24:53 But here's how that world can clash together with math.

24:56 What do you get when you put root beer into a square glass?

24:59 - I don't know, what?

25:00 - Beer.

25:01 (laughing)

25:03 - I don't even get it, but it's funny.

25:05 - If you take root of beer and you square it.

25:07 - Oh, okay.

25:08 - Right?

25:10 Like the square root of beer, and then you put it in a square glass.

25:12 - Okay, that was bad.

25:13 What's your next one here?

25:14 - Okay, what do you call an optimistic front-end developer?

25:17 - I don't know, what do you call it?

25:19 - A stack half full developer.

25:21 - That is awesome.

25:22 - Okay, now also I was gonna tell a version control joke, but they're only funny if you get them.

25:30 - Git, G-I-T, awesome.

25:32 Those are both good, I like them, yeah, great.

25:35 - Cool, well thanks again for having a nice conversation this week.

25:38 - Yeah, you bet, and thanks as always.

25:39 See you later, Brian. - Bye.

25:40 - Thank you for listening to Python Bytes.

25:42 Follow the show on Twitter via @pythonbytes, that's Python Bytes as in B-Y-T-E-S.

25:47 And get the full show notes at pythonbytes.fm.

25:51 If you have a news item you want featured, just visit pythonbytes.fm and send it our way.

25:55 We're always on the lookout for sharing something cool.

25:58 On behalf of myself and Brian Okken, this is Michael Kennedy.

26:01 Thank you for listening and sharing this podcast with your friends and colleagues.
