Brought to you by Michael and Brian - take a Talk Python course or get Brian's pytest book

Transcript #211: Will a black hole devour this episode?

Return to episode page view on github
Recorded on Wednesday, Dec 2, 2020.

00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.

00:04 This is episode 211, recorded December 2nd, 2020.

00:09 I'm Michael Kennedy.

00:10 And I'm Brian Okken.

00:11 And we have a special guest, Matthew Feickert. Welcome.

00:14 Yeah, thanks so much for having me on.

00:15 Yeah, it's great to have you here. You've been over on Talk Python before, right?

00:20 Yeah.

00:20 Talking about some cool high-energy physics and all that kind of stuff.

00:25 Yeah, I looked that up last night just to try and remind myself that was Episode 144.

00:30 I was on with my colleagues, Michela Paganini and Michael Kagan to talk with you about machine learning applications at the LHC.

00:38 Yeah, and you do stuff over with CERN at the Large Hadron Collider and things like that?

00:42 Yeah, yeah. So I'm a postdoctoral researcher at the University of Illinois at Urbana-Champaign.

00:47 And so there I split my time between working on the ATLAS experiment and working as a software researcher at the Institute for Research and Innovation in Software for High Energy Physics, IRIS-HEP. And so on ATLAS, ATLAS is this huge five-story-tall particle detector that lives 100 meters underground at CERN's Large Hadron Collider, just outside beautiful Geneva, Switzerland. And so there I work with a few thousand of my closest colleagues and friends to try and look for evidence of new physics and make precision measurements of physics we do know about. And then my IRIS-HEP work is kind of focused on working in an interdisciplinary and inter-experimental team to try and improve the necessary cyberinfrastructure and software for us to be able to use in upcoming runs of the Large Hadron Collider, and in what we call a high-luminosity run, which is going to be way more collisions than normal. Have you guys ever turned it up to full power?

01:43 Have you turned it to full power yet? No, so yeah, the design luminosity, or the design energy, of the LHC is something called 14 teraelectronvolts, 14 TeV. And we've been running intentionally at a lower operating energy for the last couple of years, just a little bit below that. But in the late 2020s, we're going to- >> So you don't create a black hole and suck the entire earth into it and that kind of stuff? >> No experimental evidence of black hole creation yet. But the cool thing is that even if we did make a black hole at the LHC, due to something called Hawking radiation, it would evaporate well before it could ever actually do anything interesting gravitationally. But yeah.

02:19 - Yeah, that's a good deal. - It's a really exciting, really exciting time. - No, I'm joking, but it's such a cool place, such cool technology.

02:25 I mean, that's right out of the edge of physics these days.

02:28 And the technology side is neat, too.

02:29 - Yeah, no, it's super fun.

02:31 - Cool, well, welcome over to Python Bytes.

02:34 - Yeah, it's great to be here.

02:35 - Yeah, it's great to have you.

02:37 Thanks for coming.

02:37 And Brian, I think, let's start with another one of my favorite topics.

02:41 - Farms?

02:43 - I love farming.

02:43 You know, you see the bumper sticker, no farms, no food.

02:46 I like food a lot, so I love farms.

No, no, but the FARM stack.

02:51 We've heard of the LAMP stack, other stacks.

02:54 Like LAMP is not as useful as FARM, right?

02:57 FARM sounds more useful.

02:58 So tell us about FARM.

02:59 - So Aaron Bassett, he's, I'm not sure, I think he's one of the spokespeople for Mongo or something, like advocate or something like that.

03:09 Anyway, he wrote this article, and I think there have also been some talks given, but this is a nice article.

03:20 It's called Introducing FARM Stack, which is FastAPI, React, and MongoDB.

03:25 So I really actually appreciated the article and the code with it, because there's a little GitHub to-do CRUD app that they've put together.

03:38 And the article describes basically all of the pieces of the application using a little to-do app.

03:46 But with FastAPI, you've got this interactive documentation mode where you can interact with the application just almost immediately.

03:57 You don't have to really do much to put it all together.

04:00 Then for all your endpoints, you can actually interact with them, send data, do queries.

04:06 There's a little animated GIF to show how that's done.

04:10 But the article then goes through and says basically how the endpoints and routes get hooked up, and then uses Uvicorn to set up an async event loop and get that going, shows how easy it is to connect to a database, and then defines models and shows how easy it is to set up a schema.

It goes through and then talks through the code discussion.

04:38 You do have to write code for the endpoints, and it shows really how easy those are with all of these pieces.

04:45 The React application is kind of a minimal React application.

04:49 I'm not sure why they kind of included that, but it's kind of a neat addition.

04:53 There's a React application that's running that just sort of shows some of the interaction with the CRUD app, and it gets updated while you're changing things through the interactive API.

05:07 And I just, I liked the demonstration of working through, working with an API and working through changing things and seeing it show up having a, like a React app at the other end.

05:21 It's kind of a fun way to kind of experiment with an API.

05:24 - This is a really neat thing.

And one of the other major stacks that's been used around Mongo is the MEAN stack.

05:29 And the FARM stack is way nicer than the MEAN stack, not just because it uses Python and not JavaScript, but there's some interesting things here.

05:36 One of the examples is actually kind of blowing my mind in that it's an if statement using the walrus operator awaiting an asynchronous call in an API method.

05:48 Like the walrus operator, the await keyword, I've never seen those together and it's kind of like, it's inspiring.
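A dependency-free sketch of that walrus-plus-await idiom (the article's version uses FastAPI and Motor; the names and data here are made up for illustration):

```python
import asyncio

# Hypothetical stand-in for an async database lookup (the article uses
# Motor against MongoDB; this fake avoids needing a running database).
FAKE_DB = {"1": {"id": "1", "title": "write shownotes"}}

async def find_one(task_id):
    await asyncio.sleep(0)  # simulate async I/O
    return FAKE_DB.get(task_id)

async def get_task(task_id):
    # The pattern mentioned on the show: walrus operator plus await
    # inside a single `if` in an API handler.
    if (task := await find_one(task_id)) is not None:
        return task
    return {"error": f"Task {task_id} not found"}

result = asyncio.run(get_task("1"))
```

In the real FARM-stack endpoint, the fake lookup would be a Motor query and the not-found branch would raise an HTTP 404.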

05:55 - It's nice, it's good.

05:56 - Yeah, yeah.

05:57 - It's such succinct code as well.

05:59 - It's super nice.

06:00 I mean, it uses FastAPI, which is fantastic.

06:02 It's using Motor, which is MongoDB's officially supported Python async library, 'cause you need an async-capable driver in order to do async things against MongoDB.

06:14 You know, this actually comes from the developer blog at MongoDB.

06:18 There also are some ORM-like things, some ODMs, object document mapper stuff, that also supports async and await from MongoDB, so if you're more in the ORM style, you might check that out.

06:28 But other than that, this looks pretty neat to me, yeah.

06:30 - Yeah, and I do know that a lot of people use the ORMs, but I appreciated the example without an ORM for people, because you throw an ORM example in there and then people that don't use that particular one get lost, so.

06:44 - Yeah, Matthew, do you guys do anything with MongoDB?

06:47 Any of these kind of things, FastAPI?

06:49 - Yeah, I have some friends that do.

06:50 I personally myself am not too versed in Mongo, but I've heard it on the show and many times elsewhere.

06:58 So this is, I think also just kind of paging through the article as Brian was talking about, it is pretty impressive.

07:05 So.

07:06 - It's really concise.

07:06 It's like, here's your four lines to completely implement the API.

07:10 - Yeah.

07:11 - Type of things, right?

07:12 Asynchronous, fast, like all the cool stuff.

07:13 Yeah.

07:14 Yeah, there was an example, a case study of MongoDB being used at the Large Hadron Collider, but that was many years ago and I don't know if it still is.

07:22 So it's, I've completely forgotten where that is.

07:25 - Yeah.

07:26 - But yeah, yeah.

07:27 Cool, cool.

07:28 So next thing I want to talk about another programming language.

07:31 Last time, Brian, I went on and on, maybe the time before, two times ago, about .NET and C# because Anthony Shaw had done that work on Pyjion to get Python to run on .NET and we're like, well, why are we talking about C# on this project, right?

07:45 On this podcast?

07:46 Well, I want to talk about something even more advanced, AppleScript.

07:49 Wow.

07:52 Cutting edge.

07:52 Yes.

07:53 It's like the CMD, shell script of Apple.

07:57 It's, I don't know, have you ever programmed in AppleScript?

07:59 It's painful.

08:00 No, I've not.

08:01 It's like, you say, like, tell this application to, like, make a command.

08:06 Oh, it's, it's bad news bears.

08:08 Let me tell you.

08:08 So, so what I've come across is this thing called py-applescript.

08:16 Now this is not brand new, but it's brand new to me.

08:18 And there's a lot of talk about Macs and people, people may be getting new Macs.

08:23 So I thought I would say, hey, look, here's a cool way to automate your Mac, or Macs within your company or whatever, with Python instead of this dreaded NSAppleScript.

08:33 - Okay.

08:34 - So basically it's a Python wrapper around NSAppleScript, allowing Python scripts or applications to communicate with AppleScript and AppleScriptable applications.

08:45 So apps for which they basically implement AppleScript and let you do that.

08:49 So scripts get compiled either from source or they can be loaded from disk.

They have these, some of these ideas are from AppleScript:

08:56 The standard run handler and user-defined handlers can be invoked with or without arguments.

They're automatically converted.

09:02 The values passed to and from AppleScript are automatically converted between AppleScript and Python types, like a Python string versus an AppleScript one, or vice versa, right?

09:12 So you don't have to do the type coercion, which is cool.

09:15 And they're persistent.

09:16 So you can call your handler multiple times and it retains its state like AppleScript would.

09:21 And it also has no dependency on the legacy appscript library or the so-called flawed Scripting Bridge framework, which is limited to OSA Script executables.

09:33 So that's pretty cool.

09:33 If you wanna automate things on your Mac, you obviously could use Bash, but if you're talking to some kind of application that implements one of these scripting interfaces, like, for example, you wanna tell this other application to grab something out of the clipboard and then tell it to do something, you couldn't reasonably do that with Bash, right? Once it starts up, you kind of want to go back and forth with it.
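As a rough sketch of what that looks like with py-applescript (the package imports as `applescript`; this only does anything on macOS with the package installed, so it's guarded accordingly, and the `greet` handler is just an illustration):

```python
import sys

greeting = None
if sys.platform == "darwin":
    try:
        import applescript  # pip install py-applescript (macOS only)
    except ImportError:
        applescript = None
    if applescript is not None:
        # Compile an AppleScript from source; its handlers keep state
        # between calls, and arguments and results are converted to and
        # from Python types automatically.
        scpt = applescript.AppleScript('''
            on greet(name)
                return "Hello, " & name
            end greet
        ''')
        greeting = scpt.call("greet", "world")
```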

09:56 So it sounds like AppleScript might be the thing to do.

09:58 Pretty cool, huh?

09:59 - Yeah.

10:00 - Yeah, I mean, not a whole lot to it.

10:02 Like if you've got a script, your Apple macOS stuff, do it with Python.

10:08 You don't have to do it with that AppleScript stuff.

10:10 - No, it's neat.

10:11 - Yeah, so Matthew, you probably brought some, something to do with physics, data science, I'm guessing.

10:16 What's your question here?

10:17 - Yeah, a bit.

10:18 So we currently live in this like really nice age of having awesome CI services and all these really nice metrics for all your GitHub projects and everything.

10:27 So, you know, if you're, I'm thinking of like coverage.

10:31 So if you're, you know, using pytest and, you know, making sure that you're reporting your coverage, you have all these really great services to also track your coverage and report that in a shiny badge.

10:41 But let's say you're developing some tool or some library and you have some sort of performance metric that you care about.

10:48 Let's say like how fast some, the speed of evaluation for certain expensive functions.

10:54 And you actually want to try and track that through the entire history of your code base.

10:57 And that's not something that's traditionally very super easy to do.

11:01 So recently I was really happy to find something called--

11:03 - So if you're making changes, so if you're going to be adding some feature or whatever, you are refactoring it so it's easier to write, but you're not sure if that makes it faster or slower, this would sort of give you that information from week to week or something like that?

11:16 - Exactly, yeah.

11:17 So you might go ahead and say, "Okay, well, I have some tests that make sure that this function evaluates in under some period of time, if it's an expensive function for your test." But let's say you actually wanted to track, across different parameterizations, how that function is actually performing across your whole code base.

11:39 So I've recently found this super cool tool written in Python called AirSpeed Velocity.

11:46 And so from the docs, ASV, Air Speed Velocity, is a tool for benchmarking Python packages over their lifetime.

11:54 So it deals with runtime, memory consumption, and even custom compute values.

11:59 And the results are then displayed in a super nice web frontend that's interactive and basically just requires like static web page hosting.

12:07 So it's pretty impressive. And just if you click on the docs, you can see that's developed by a community of people.

but led by Michael Droettboom.

12:19 I'm probably getting your name wrong, very sorry.

12:21 And Pauli Virtanen.

12:24 But if you look at some--

12:25 - He's the guy who was behind PyOxidizer at Mozilla.

12:29 - Oh, really?

12:30 Oh, okay.

12:31 - Yeah, I think so.

12:32 - That's a super cool project.

12:33 - Yeah, for sure.

12:34 - Yeah, and so I mean, if you look at the other people that are on the contributor list, you can spot a lot of names that are common in the SciPy and Jupyter ecosystem.

12:44 So you already know that this is a nice community built tool.

12:49 And then also, as some example cases, they give current projects that are using it, like NumPy and SciPy and Astropy.

12:56 So pretty well established projects.

12:59 And just as an example, if you click on the SciPy project and go to the interpolate function there, you can just look at a very nice visualization of the actual evaluation time on the vertical axis, across a whole bunch of parameterizations, such as CPython version and number of samples being run.

13:20 You can see this for the entire lifetime of the code base, and you can zoom in on any section just with the mouse.

13:26 Something I think that is super cool is if you're looking at the visualization of the plot and you see that, there's one commit where all of a sudden things go funky and the evaluation time just jumps up, you can just click on that node and it immediately opens up to that commit in GitHub, which is I think super awesome that you don't have to go and search through your commit history to figure out where that corresponds to.

13:48 It's just boom right there.

13:49 >> I'm looking, it shows the SHA from GitHub.

13:52 >> Yeah.

13:53 >> The unique identifier of the commit.

13:56 That's crazy.

13:57 >> Yeah.

13:58 >> Wow.

13:59 >> Yeah. A project that I'm working on, we've been interested in trying to have the metric tracking for some of our work.

14:07 This is something that I'm actively looking at how we might be able to deploy this for one of my projects with my co-authors.

14:14 But it's openly developed on GitHub. It's up on PyPI as well, so just pip install asv.

14:20 And then I think something that's kind of very cute and very kind of Pythonic is that when you go to the reporting dashboard for the different libraries that you're actually benchmarking, it will say up at the top the airspeed velocity of an unladen X. So the airspeed velocity of an unladen NumPy or an unladen SciPy. So, you know, keeping very true to Python's roots there. >> There's some Monty Python, the show, zen in there for sure. >> Exactly, yeah.
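For a flavor of what writing an asv benchmark looks like, here's a minimal sketch following asv's naming conventions (`time_*` methods, `params`); the sorting workload is just an illustration, not something from the show:

```python
# Would live in e.g. benchmarks/bench_sort.py in an asv-configured repo.
import random

class TimeSorting:
    # asv runs each time_* method once per parameter value, which is
    # the kind of parameterization visible on the SciPy dashboard.
    params = [100, 1000, 10000]
    param_names = ["n"]

    def setup(self, n):
        # setup() runs before timing, so data generation isn't measured.
        random.seed(0)
        self.data = [random.random() for _ in range(n)]

    def time_sorted(self, n):
        sorted(self.data)
```

Running `asv run` then executes these benchmarks across commits and builds the interactive dashboard discussed above.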

14:50 >> This is impressive. I mean, Brian, how do you see this fitting into like testing and stuff?

14:54 >> I actually love this. I could use this right away. There's lots of, well, a lot of times, yeah, performance is always something we care about in benchmarking systems and testing.

15:09 It's something you forget about sometimes, like running stuff and it still works, but like over time things slow down and it's good to know that.

15:21 - Yeah, and if this could just be automatic and just part of your CI, you just go back and see the updates, that'd be very cool.

15:27 - Definitely.

15:28 >> I don't think that this is something that at the moment and I'm happy to be corrected about this.

15:33 I don't think at the moment there is some way that this is currently being given as a CI service, but I think that this is something that you could set up and run for yourself pretty easily.

15:43 >> Yeah, you could probably plug it in.

15:45 >> Yeah.

15:45 >> Yeah, exactly. But you could probably do some web hook when there's a check-in, automatically kick it off and then save a result.

15:54 You could just hook into the GitHub actions and then have it just call you back.

15:58 and start your-- let's take a record of this or whatever.

16:02 Yeah, very cool.

16:03 This is a great idea.

16:04 Yeah, something else that I haven't really investigated yet but that I'm looking into is if this can also be used to do GPU benchmarking.

16:12 So let's say you have a library that also is going to be--

16:16 you can use the APIs to transparently move from CPU to GPU.

16:21 Like you have something like Jax or TensorFlow or PyTorch, then this might be a nice way if it's based on those to be able to like benchmark your GPU performance as well.

16:31 - Yeah, well, and that's one of the things you might not test, right?

16:34 If it could run either way, you might just run it on your machine, whichever one of those it is and forget to try the other style, right?

16:41 - Exactly, yeah.

16:42 And I don't think there's too many CI services that are gonna generously give you some like really nice GPUs to be doing benchmarking on.

16:49 - Yeah, that's for sure, for sure.

16:51 All right, now for the next item, let me tell you about our sponsor.

16:54 This episode is brought to you by TechMeme, the Tech Meme Ride Home podcast.

17:05 For two years, they've been recording episodes every single day.

17:05 So they're Silicon Valley's favorite tech news podcast and you can get them daily, 15 to 20 minutes, exactly by 5 p.m. Eastern, all the tech news you want.

17:15 But it's not just headlines, much like Python Bytes, actually, it's a very similar show, but for the broader tech industry.

17:20 You could have a robot read the headlines or just flip through them, but it has the context and the analysis all around it.

17:26 So it's like tech news as a service, if you will.

17:29 So the folks over at TechMeme, they're online all day reading to catch you up, so just search your podcast app for Ride Home and subscribe to the Tech Meme Ride Home podcast, or just visit to subscribe.

17:45 I have a theory, a hypothesis about this.

17:48 I think it would probably actually be a ton of work to put together a show daily on a timeline like that, but it's great that they're doing it.

17:55 Do you have any other hypotheses, Brian?

17:57 Yes. My hypothesis is that there's not enough examples out in the world of how people are using hypothesis in the field in real world applications.

18:07 So, I'm excited that Parsec put it together. So, Parsec...

18:13 Let's take a real quick step back just for people who don't know.

18:15 What is hypothesis?

18:17 Oh, okay, right. Hypothesis is a testing framework. Well, it's not really...

18:21 It attaches to other testing frameworks.

you can use it with unittest or pytest.

18:26 You probably should use it with pytest.

18:28 But it's a way, instead of writing a declarative single test or test case, it's property-based testing.

18:37 So you describe kind of, it's not like, I expect one plus two equals three.

18:43 I expect if I add two integers, and they're both positive, that the result is going to be greater than both of them.

18:52 You know, you have these properties that describe what the answer is.
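That add-two-positive-integers property, written as an actual Hypothesis test, looks something like this (a minimal sketch; assumes the hypothesis package is installed):

```python
from hypothesis import given, strategies as st

@given(st.integers(min_value=1), st.integers(min_value=1))
def test_sum_exceeds_both(a, b):
    # Hypothesis generates many (a, b) pairs and checks the property
    # holds for every one, shrinking any failure to a minimal example.
    assert a + b > a and a + b > b
```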

18:56 And the examples that Hypothesis and other, you know, tutorials on how to use Hypothesis have given are more of these, like, A-plus-B sort of things. They're simplistic things.

19:09 And I do see a lot of value in Hypothesis, and I know a lot of people are using it, but there haven't been a lot of good descriptions of really how it's being used,

19:19 like a real world example of how it's being used because I'm probably not going to, I don't have those little tiny algorithm things, I've got big chunks of stuff.

19:30 And Hypothesis does have to run the test many times.

19:34 So how do you do this effectively on a large project?

19:37 So I love seeing this article.

19:39 So Parsec is a client-side encrypted file sharing service.

19:45 I'd never heard of them before this blog, but it sounds cool.

19:49 - Cool, they describe themselves as the zero-trust file sharing service, like Dropbox, but with end-to-end encryption.

19:56 - Yeah.

19:57 - You could share the files, but it only matters if you actually have the key, right?

20:00 - Right.

20:01 Actually, I have no idea.

20:02 (laughing)

20:03 Sure.

20:04 - I suspect so, yeah.

20:05 It sounds like a cool service, actually.

20:07 - It sounds pretty neat.

20:08 But so they describe kind of what they're doing there and some of the problems.

20:13 It's a large four year old asynchronous Python project.

20:18 And then they describe this RAID redundancy algorithm that they need; it's fairly complex, with a bunch of servers and a bunch of data stores going on.

20:31 And what they need to test is they need to check things like if the blocks can be split into chunks and if the blocks can be rebuilt from the chunks that were split up before.

20:41 And then if you can rebuild them if you've got missing chunks.

20:44 And so this all sounds fairly, you know, yeah, I can understand how you could try to test that, but there's a lot of variables in there.

20:52 How big is the chunk size?

20:54 How many chunks?

20:55 How much stuff should be missing?

20:57 And all that sort of stuff.

20:58 And then they're thinking, yeah, Hypothesis would be good for that.

21:05 The normal tutorials talk about a stateless way to test with Hypothesis, but they're saying that for them, the stateful method that is supported is very useful because theirs is an asynchronous system, and they describe how to do that.

21:23 It's actually a fairly complex description and it's kind of a lot to get through, but it's neat that the power's there.

21:30 So it does, you know, walks through how they, exactly how they set up a test like this.

21:37 And this is something I think the testing community of considering hypothesis has been missing.

21:43 So this is great.

They end with some recommendations, which is great.

21:51 So the recommendations are about parts of your system: which parts should you throw Hypothesis at?

21:57 >> That's a really good question because you don't want to throw it at everything.

22:00 >> Right. Because there is some expense to set it up and also to run everything.

So they describe it as: if the piece you're testing is an encoder-decoder thing, like theirs is, you're splitting things into chunks and then rebuilding them,

22:15 Hypothesis is a no-brainer for that, because you can compare: is my input the same as the encoded-then-decoded output?
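A toy version of that round-trip property (the chunking scheme here is made up for illustration, not Parsec's actual RAID code):

```python
from hypothesis import given, strategies as st

def split(data, size):
    # Break data into fixed-size chunks (the "encode" direction).
    return [data[i:i + size] for i in range(0, len(data), size)]

def rebuild(chunks):
    # Reassemble the chunks (the "decode" direction).
    return b"".join(chunks)

@given(st.binary(), st.integers(min_value=1, max_value=64))
def test_split_rebuild_roundtrip(data, size):
    # The encoder-decoder property: decode(encode(x)) == x, for any
    # byte string and any chunk size Hypothesis throws at it.
    assert rebuild(split(data, size)) == data
```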

The other case is if you have a simple oracle; a simple oracle is where it's simple to check the answer, but it's complex to come up with the answer.

22:35 I'm not sure what that is, but in the case, some of the cases are, I've got a complex system and there's properties about the output that are easy to describe.

22:47 The other one is, I guess similar, is if it's hard to compute but easy to check.

22:53 - Well, one example that just jumps out at me right away is anytime you have a file format, I'm going to save this thing, be able to save and load these files.

Because all you've got to do is load up a whole bunch of random data, save, load: is it the same?

23:07 If it's not, that's a problem.

23:09 - Yeah. - Yeah.

23:10 Yeah, and actually I have talked with some people that have thrown this at some of the standard library modules just on the side to test because there's a lot of standard library stuff that's like kind of encoding, decoding sort of thing or two-way conversions.

23:29 - Yeah, cool. - Yeah.

23:30 This is super nice.

23:32 I'm gonna have to really dig into this article in more detail. I remember the first time I learned about Hypothesis was when one of the core devs gave a talk at SciPy 2019, and it just blew my mind then. And so this is so cool to see this very, very interesting application here.

23:49 Yeah. Yeah. It seems like there's a lot of uses in data science. Data science seems tough to test, like that scientific computation side, because slight variations, you might not get perfect equality.

24:00 >> Exactly.

24:01 >> Close enough. It's like, well, it's off, but it's like 10 to the negative 10th or something off.

24:08 That doesn't actually matter, but the equality fails.

>> Yeah. You end up using NumPy's approximate comparison schemes quite a bit in your pytest tests.
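A quick illustration of that exact-versus-approximate comparison with NumPy (the values are illustrative):

```python
import numpy as np

a = np.array([0.1 + 0.2, 1.0])
b = np.array([0.3, 1.0 + 1e-12])

# Exact equality fails because 0.1 + 0.2 carries a ~1e-17-scale error...
exactly_equal = np.array_equal(a, b)
# ...but an approximate comparison with a relative tolerance passes.
approximately_equal = np.allclose(a, b, rtol=1e-9)
```

The same idea is behind `pytest.approx` and `np.testing.assert_allclose` in test suites.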

24:20 >> I can imagine. Very cool.

Next one, Brian. I told you about this last time: I'm still waiting on my Mac mini, right?

24:31 I ordered the Apple, the M1 Mac mini maxed out, and I'm a little bit jealous.

My daughter is getting a new MacBook Air she doesn't know about, but it's supposed to show up tomorrow, and mine's still weeks away, and I don't think that's very fair.

24:47 But if you are an organization that depends on cloud computing, and you know what organizations don't these days, right?

They almost all do.

It was just announced at re:Invent that AWS is gonna be offering Mac instances as a type of VM.

25:03 So until now, you've been able to get Windows, Linux.

25:06 That's it.

So for all those people out there who are offering some kind of iOS app, even if they're not like a Mac shop, they still have to have Macs around 'cause you can't compile and sign your IPA, or whatever the iPhone app format is.

25:20 You can't create those without a Mac.

So there's all these Macs that are around for, like, continuous CI/CD, checking those things and whatnot.

25:27 So now you can go to AWS and say, I'll take a Mac Mini, please.

25:32 >> That's pretty cool. That's cool.

>> Yeah. You can do your tests up there, and they don't have M1 yet.

25:36 Those are the Intel ones, but the M1 chips are coming later.

25:40 So you'll be able to do it.

25:41 What's interesting about this offering from AWS is basically any Cloud service, you would imagine it's a VM, right?

25:47 But when you say I want one of these, you actually get a dedicated Mac Mini.

25:52 you get pure hardware.

25:54 - Well, that's why you can't get yours, 'cause Amazon bought them all.

25:58 - They did.

25:59 They had a huge truck full of them.

26:00 Well, they bought the Intel ones, so those were on sale, I bet, anyway.

26:03 But no, they have some interesting, what do they call it, Nitro?

I think they call it their Nitro system or something like that, which allows them to virtualize actual real hardware.

26:15 So this is pretty neat.

26:17 You can sign up.

26:18 The billing is interesting.

26:19 You have to pay for at least one day's worth if you get it, which I think is like $24.

26:24 If you're gonna run it continuously all the time, this is one pricey sucker.

26:29 Like the Mac Mini you can get now is $700.

26:33 This is $770 a month.

26:37 - Oh, okay.

- So if what you need is like a couple Mac Minis, and you need them on all the time, you're probably better off just buying a few and sticking them in a closet, especially the M1s.

26:48 But if you just need one on demand every now and then, or you need to burst into them or something like that that could be interesting.

26:53 - Yeah, yeah, if you're back old school and you only release like once every three months.

26:57 - Well, there was some conversations like, well, if your data is already stored in S3 and you have like huge quantity of data and what you need to run is actually running like some video processing on the Mac, you could do it by the data instead of transferring that kind of stuff.

27:12 Things like that might be interesting, I don't know.

27:15 I would go ahead and throw out there also that this is all interesting.

27:18 I have links to this kind of stuff and whatnot.

27:21 Like the blog post announcing it and so on.

27:23 But there's also this thing called Mac Stadium.

27:26 And if you look at Mac Stadium, it's pretty interesting.

27:28 You go over there and say, give me a dedicated bare mini, a bare metal Mac mini in their data center, $60 a month.

27:36 So you can actually get like a decent one for a decent price over there.

27:41 So if you just want one running all the time, it might be good.

27:44 But the thing is, if you're already like deeply integrated to AWS, maybe this is a good thing.

27:48 Yeah, yeah, is there anything you?

27:51 Yeah, good.

27:51 I was just going to say this seems pretty interesting.

I mean, I know one of the reasons that I love using GitHub Actions and Azure Pipelines is the ability to get access to Mac VMs for builds.

But I could also see this being really interesting and useful if you have some very huge application or some very large stack that you want to be able to do CI or tests on.

this could be really, really nice, especially if you don't just want to be, you know, pounding and destroying one Mac over and over and over again. This is nice, especially if you have a distributed team. Yeah, which every team basically is now, distributed. Yeah, welcome to 2020. One thing that's interesting about this is you can literally press a button, or even just through AWS, probably the Boto API, you can just make a new Mac instantly.

28:43 Like within seconds, you can have a clean, pre-configured Mac.

You can create AMIs, Amazon Machine Images, which are like: install a bunch of stuff, get it set up, and then save it so you can respawn new machines from it.
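A hedged boto3 sketch of what programmatically requesting a Mac instance might look like (not executed here; EC2 Mac instances need a dedicated host allocated first, the AMI ID is a placeholder, and real use needs AWS credentials):

```python
def launch_mac(region="us-east-1", ami_id="ami-placeholder"):
    """Allocate a dedicated mac1.metal host, then launch an instance on it."""
    import boto3  # deferred import so the sketch reads without AWS set up

    ec2 = boto3.client("ec2", region_name=region)
    # EC2 Mac instances run on dedicated hosts, so allocate one first.
    host_id = ec2.allocate_hosts(
        InstanceType="mac1.metal",
        AvailabilityZone=f"{region}a",
        Quantity=1,
    )["HostIds"][0]
    # Then launch the macOS AMI onto that dedicated host.
    return ec2.run_instances(
        ImageId=ami_id,
        InstanceType="mac1.metal",
        MinCount=1,
        MaxCount=1,
        Placement={"Tenancy": "host", "HostId": host_id},
    )
```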

Those are pretty interesting options versus just having a Mac mini in the closet.

29:00 Push a button, make a brand new one, try this, throw it away, make it a different way, throw it away.

29:05 There are some use cases here that could be interesting.

29:07 That said, I won't be using it.

29:08 I'm just going to buy a Mac mini if I can ever get it.

29:12 All right, Matthew, what's this last one you got for us?

Yeah, I don't have any clever transition, but all right. So maybe, I don't know about you, but I end up having to deal with a lot of JSON serializations of different statistical models, and sometimes also getting CSVs of different data sets that I want to be doing analysis on. And your first instinct might just be to say, okay, I'm just going to open this up in pandas and start to get to work on it.

But if you're kind of used to and comfortable working in the Linux command-line ecosystem of data tools, you might be itching a little bit and want to kind of just, you know, peek inside at the command-line level and kind of get to work there.

29:55 And so in that case, you might be really interested in this tool called VisiData.

30:00 So VisiData is written in--

30:02 - This is blowing my mind actually.

30:03 - Yeah, it's like when I saw this, my jaw was kind of on the floor.

30:08 So we'll make sure that this is linked in the show notes 'cause it has some really cool videos.

30:12 But so, from the docs, VisiData is described as data science without the drudgery.

30:18 So it's an interactive multi-tool for tabular data that combines the clarity of spreadsheets with the efficiencies of being at the terminal, and also the power of Python 3, in a really lightweight utility that can handle millions of rows with ease.

30:33 I can attest to that personally.

30:35 I've opened up like four gigabyte CSV files before, and it just drops right in and starts asynchronously loading like a champ.
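Handling a multi-gigabyte CSV without choking is mostly about streaming it row by row rather than loading it all at once; here's a minimal sketch of that idea using only the standard library (the file path and column names are made up for illustration, and this obviously isn't VisiData's own implementation):

```python
import csv
from collections import Counter

def count_by_column(path, column):
    """Stream a CSV one row at a time and tally the values in one
    column, without ever holding the whole file in memory."""
    counts = Counter()
    with open(path, newline="") as f:
        # DictReader yields one row at a time, so memory use stays
        # flat no matter how large the file is.
        for row in csv.DictReader(f):
            counts[row[column]] += 1
    return counts
```

Calling `count_by_column("complaints.csv", "borough")` would give you a frequency table of boroughs from a file of any size, which is the kind of summary VisiData computes interactively.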

30:43 In addition to that, it supports kind of a really astounding number of file formats.

30:48 Currently on the website, it says it supports 42 different file formats.

30:52 So it supports things that you would expect, like CSV and JSON, but then it also supports things like Jira, I guess like whatever Jira uses for their sort of tabular stuff.

31:05 It also can read MySQL, and I guess it can even deal with PNG, the image file format, which I was, you know, impressed by. So this is all openly developed. The output is a terminal, right? Yeah, like text. Yeah. Yeah. So this is all openly developed on GitHub by a guy named Saul Pwanson, I think. And if you go to the VisiData website, it also has plenty of links to live demos of him doing kind of interactive examples of visualizations. There's one lightning talk that he's given at, I think, PyCascades 2018 or something like that, where he's able to just call up a CSV file of 311 complaints in New York City, and then through using VisiData just kind of hone down onto certain boroughs, and then be able to filter on different complaint types to basically find complaints about rodents, and then filter on rat complaints, and then plot that inside VisiData, still on the terminal, to basically make a visualization of rodent distribution in the New York City boroughs.
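For a sense of what that drill-down looks like as code, here's the same shape of analysis in pandas, which was mentioned earlier as the usual first instinct. The DataFrame is a toy stand-in for the real NYC 311 dataset, so treat the column names and values as hypothetical:

```python
import pandas as pd

# A tiny made-up stand-in for the NYC 311 complaints dataset;
# the real file has different columns and millions of rows.
df = pd.DataFrame({
    "borough": ["BROOKLYN", "MANHATTAN", "BROOKLYN", "QUEENS"],
    "complaint_type": ["Rodent", "Noise", "Rodent", "Rodent"],
})

# Hone down onto one borough, then filter on a complaint type --
# the same drill-down VisiData does interactively with keystrokes.
brooklyn = df[df["borough"] == "BROOKLYN"]
rodents = brooklyn[brooklyn["complaint_type"] == "Rodent"]
print(len(rodents))  # 2
```

The point of VisiData is that each of those filtering lines is a single keystroke on a live view of the data, with the plot rendered right in the terminal at the end.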

32:17 So I thought that was quite amusing and really cool.

32:21 It's also, this is a Python application.

32:24 So you might not wanna continuously install this in every single virtual environment you make.

32:31 So, I mean, it is up on PyPI, so you can just do pip install visidata.

32:35 But since it's an application, you probably might also want it just kind of as a generic tool on your machine.

32:40 So it's distributed through a lot of, you know, nice common package managers.

32:45 So if you're on Linux, they've got it on apt, as well as things like Nix and Guix.

32:52 But I didn't see it on yum.

32:54 So if you're on Fedora or CentOS, you might be a little bit out of luck, you may have to do it manually.

32:59 It's of course on Homebrew and even Conda Forge.

33:02 And it's not listed there, but you can also use a very, very cool tool that's been featured on the show before, which is pipx by Chad Smith.

33:10 - Yeah, PipX is awesome.

33:11 - It's so good, I love it.

33:12 I tested this last night.

33:14 I just fired up a Python 3.8 Docker container and went ahead and installed pipx, and then used pipx to install VisiData, and was able to drop right into VisiData as expected.

33:25 So it's very, very cool.

33:28 And just the power that you can have with it, I think, is worth checking out for anybody who is doing data analysis with tabular data.

33:36 - This is super cool.

33:37 I love when people build these tools that are kind of, you don't really expect them to be so powerful.

33:41 And you talked about how you just dropped in and grabbed some random data and started answering questions.

33:45 And that's super neat.

33:46 - Yeah.

- Yeah, the number of inputs, and because it's open source and because of all the examples of other data types, I think even if you have a different data type, it shouldn't be too hard to modify this to handle something different.

34:01 I do notice, and I'm excited about it, it does have PCAP files for packet capture; these are for communication packets.

34:09 - Talking to all your devices and all your hardware at your company, right?

34:12 - Well, like even the Wi-Fi packets and cellular packets, that's how we debug those.

34:18 So it's very cool.

34:20 - Yeah, very cool.

34:21 And PipX is great.

34:22 I install a bunch of apps that way, like Glances, which is a fantastic visualize-the-state-of-your-machine tool, you know, like top, but way, way better.

And HTTPie, which is great, it's a much, much better curl.

But the most important thing I install that way is pyjokes.

34:38 So now I can type pyjoke on my command line and the jokes are always right there.

34:43 So speaking of which, move on to our extras.

34:47 That's all of our main topics.

34:49 Brian, you got anything this week?

34:50 >> No, I did. I haven't dropped them in.

34:53 Where'd my extras go?

34:54 >> Well, you got it.

>> I just wanted to bring up that PyCon 2021 is going to be virtual, and there's a website up, it's

35:08 There's not a lot there yet, but you can check out what's going to happen.

35:13 It's not surprising that they have to start planning it, and there may as well plan it as a virtual event.

35:20 Just hoping that we would have live, but I understand.

>> Yeah. PyCon is my geek holiday.

I love it, it's both work, but it's also just such a nice getaway to connect with everybody.

35:33 You, everyone else we know from the community, listeners.

35:37 I'm going to miss not having it.

35:38 >> Yeah.

35:38 >> Matt, do you attend? Sorry, Brian.

>> No, it's good. I always check whenever they announce the date to make sure it doesn't overlap Mother's Day. >> Oh, yeah. That's not good.

>> Yeah. So I have unfortunately not attended PyCon yet in person, and, well, it was canceled this year. So maybe I'll attend this year remote. But I'm a regular attendee of the SciPy conference, and this past year, SciPy 2020 was moved online. And I thought that the organizers did a fantastic job of actually running it online while still keeping kind of that SciPy community feel.

36:19 So that was helped a lot also by plenty of bad puns.

36:23 So I think that might be something that still comes through for PyCon 2021 maybe.

36:29 - Yeah, absolutely.

36:30 One of the live listeners, Mohammed, asked if it's gonna cost money or if it's gonna be free this year to attend.

36:38 Did you notice anything, Brian?

36:40 - I haven't looked.

36:42 I'm looking around and I don't know that it costs anything.

From what I can tell, I don't see any pricing.

36:48 What I saw was sponsor information to get sponsors to sign up to be part of whatever they're doing there.

36:54 But I can't tell.

36:55 - Yeah, not sure.

- If somebody knows, throw it in the chat, or, you know, visit pythonbytes.fm/211 and put it in the comments down there.

37:02 All right, I got a couple here.

37:04 First of all, we're trying out live streaming here and I think it's going pretty well.

37:08 Seems like it's working out.

37:10 There's a bunch of people watching.

37:11 So if you want to get notified and we happen to keep doing this, just visit

37:19 And it should have like the scheduled upcoming live stream.

37:21 You can like get notified there.

37:22 So maybe we'll keep doing this.

37:24 It's been fun.

37:25 Thanks for everyone out there who's watching right now.

37:27 And in addition to PyCon, which you just mentioned the announcement of, that is the main way that the PSF is funded, but they're also doing a dedicated sort of fundraiser thing with six companies to help raise some money for the PSF, and Talk Python Training is a part of that. 50% of the revenue from a certain set of our courses that are sold during the month of December goes directly to the PSF.

37:56 And people who buy those courses through the PSF fundraiser also get like a 20% discount.

38:01 So there's a link in the show notes for people to take some of our courses and donate to the PSF.

38:06 If you'd rather just directly donate, that's fine.

38:08 But if you're looking to get some of our courses anyway, you can do it this way and support the PSF.

38:13 They're hoping to raise $60,000.

38:14 You know, hopefully we can do that for them and we'll see.

38:17 And Brian, you announced Big PyCon.

38:21 Another thing that got announced is Small PyCon, PyCascades, Cascades being the mountain range that connects Portland, Seattle, and Vancouver.

38:29 And traditionally this conference is cycled between those three cities.

38:32 I don't even remember anymore where it's supposed to be this year.

38:35 I think it's supposed to go back to Vancouver, but it's not going to Vancouver 'cause nobody's going anywhere.

38:39 So, PyCascades is online, and those do cost money.

38:42 It's $10 for students, $20 for individuals, and $50 for professionals to support that conference.

38:48 But I'll link to that one since that's one of our local conferences, if you will.

- Yeah, they often push what's going on, they kind of try new things.

38:57 So, it's a neat conference.

38:59 - Yeah, yeah, I enjoy my time there as well.

39:01 All right, Matthew, what have you got for us?

39:02 Anything else you wanna give a shout out to?

39:04 - Yeah, just a few items.

39:06 So Advent of Code 2020 has started now.

39:09 It's day two, but there's still plenty of time to get involved with that if you want to.

39:13 And for those of you who might not know, Advent of Code is just an annual kind of coding challenge that takes place every December.

39:21 And it's just basically 25 days of like fun and interesting programming challenges.

39:26 So it's always a great opportunity to try and brush up on your Python and maybe learn about some interesting, you know, collections that you might not have known about in the standard library.

39:37 So that's going on right now, worth checking out, I think.
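The standard-library collections Matthew mentions really do carry a lot of Advent of Code puzzles; here's a toy puzzle in that spirit, solved with `collections.Counter` (the puzzle itself is invented for illustration, not one from the actual 2020 calendar):

```python
from collections import Counter

def most_common_answer(group_answers):
    """Toy AoC-style task: given each person's yes-answers in a
    group (one string of question letters per person), find the
    question that was answered yes most often."""
    counts = Counter()
    for answers in group_answers:
        # Counter.update with a string counts each character.
        counts.update(answers)
    question, votes = counts.most_common(1)[0]
    return question, votes

print(most_common_answer(["abc", "abd", "ac"]))  # ('a', 3)
```

Puzzles like this are exactly where `Counter`, `defaultdict`, and `deque` shine, which is the "brush up on your standard library" angle he's getting at.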

39:41 And then I'm going to sneak in some very small physics-related follow-up to Python Bytes episode 205, in which Awkward Arrays were talked about.

39:52 So the lead developer of Awkward Array is my friend and colleague, Jim Pivarski, who is one of my Scikit-HEP co-collaborators, as well as also a member of IRIS-HEP.

40:04 And as of today, which is, recording, December 2nd, an Awkward v1.0 release candidate is up on PyPI.

40:13 So by the time that this goes live, if you just do pip install awkward, you should get the Awkward 1.0 release instead of having to do the--

40:21 - No more awkward one.

40:22 - Exactly, no more awkward one, no more awkward zero.

40:25 All that jazz.

40:26 - So good to have the actual install statement be awkward itself.

40:29 - Exactly.

40:30 So that's a nice little tidbit.

40:32 And I think there's some nice links in episode 205 if people want to learn more about Awkward.

40:38 But that's kind of a backbone of the Pythonic ecosystem for physics right now.

40:44 And then finally, I just want to give some kudos to Python Bytes as well, specifically for making full transcripts of the shows available to view on

40:54 Not only is this, I think, a cool idea in general, but I think this also makes the show more inclusive to the deaf Python community, which is definitely out there.

41:03 And one of my good friends and coauthors is deaf.

41:07 And I know that he definitely appreciates this.

41:10 So good job, you guys, for being more inclusive of the wider community.

41:14 - Oh, that's so cool.

41:15 I didn't know anybody was utilizing it.

41:17 - Yeah, that's awesome.

41:18 Thank you.

I think it's absolutely critical for that, 'cause the format is only audio, but a lot of folks have reached out and said they also appreciate it if English is a second language for them and they're not as strong with English.

41:31 So that also helps I think, right?

41:34 They're like, what was I saying again?

41:35 What a weird word.

41:37 Awkward array, why would they talk about that?

41:39 It doesn't make sense.

41:40 - Yeah, transcripts and closed captioning are just more inclusive for everyone.

41:44 So that's awesome.

41:46 - Yeah, thanks.

41:47 All right, well, let's wrap it up with a joke, right?

41:51 - Yeah.

41:52 - All right, so you guys, I'm gonna need your help here.

41:54 I'm gonna let Matthew, I'm gonna let you pick.

41:57 Do you wanna be Windows or Apple?

41:59 - I'll be Windows.

42:00 - All right, Brian, you'd be Apple.

42:02 So the idea is like, the title here is how to fix a computer, any computer.

42:08 So instructions for Windows, go ahead, Matthew.

42:11 - So step one, reboot.

42:12 And then the flowchart goes to, did that fix it?

42:15 If no, proceed to step two.

42:17 Step two, format your hard drive and then reinstall Windows.

42:21 Lose all of your files and quietly weep.

42:24 - Brian, Apple doesn't have that problem.

42:26 There's some totally different solution there.

42:28 - Okay, for Apple, it's step one, take it to an Apple store.

42:31 Did that fix it?

42:32 If no, proceed to step two.

42:34 Step two is buy a new Mac, overdraw your account, and quietly weep.

42:39 (laughing)

42:40 - That's me right now.

42:41 All right, I got the Linux fix, it's so easy.

42:43 It's totally, like, you don't need those things.

42:45 So you learn to code in C++, you recompile the kernel, you build your own microprocessor out of spare silicon you have laying around.

42:53 You recompile the kernel again.

42:55 You switch distros.

42:56 You recompile the kernel again, but this time using a CPU powered by the refracted light from Saturn.

43:03 You grow a giant beard.

43:04 You blame Sun Microsystems.

43:06 You turn your bedroom into a server closet and spend 10 years falling asleep to the sound of whirring fans.

43:10 You switch distros again.

43:12 You abandon all hygiene.

43:13 You write a regular expression that would make any other programmer cry blood.

43:18 You learn to code in Java.

43:19 You recompile again, but this time while wearing your lucky socks.

43:22 Did that fix it?

43:23 No, proceed to step two: revert back to using Windows or Mac, and quietly weep.

43:28 (laughing)

43:30 There's really no good outcome here.

43:32 They all end in quietly weep.

43:33 - As a Linux user for the better part of a decade, I can neither confirm nor deny how accurate that last part is.

43:39 (laughing)

43:42 - Yeah, they all have their own special angle.

It just takes longer with Linux to get to your destination, I guess.

43:49 - Yeah.

43:50 - All right, well, that's fun as always, And everyone watching on YouTube, thanks for being here live and everyone listening, just thank you for listening.

43:56 Matthew, thanks for joining us.

43:57 - Hey, thanks so much for having me.

43:58 This was really fun.

44:00 - Yeah, yeah.

Thanks for the items you brought.

44:02 Enjoy them.

44:03 And Brian, thanks as always, man.

44:04 - Thank you.

44:05 It's been fun. - Yep, yep.

44:06 See ya. - Bye.

44:07 - Thank you for listening to Python Bytes.

44:09 Follow the show on Twitter via @PythonBytes.

44:11 That's Python Bytes as in B-Y-T-E-S.

44:15 And get the full show notes at

44:18 If you have a news item you want featured, just visit and send it our way.

44:22 We're always on the lookout for sharing something cool.

44:25 On behalf of myself and Brian Okken, this is Michael Kennedy.

44:28 Thank you for listening and sharing this podcast with your friends and colleagues.
