

Transcript #261: Please re-enable spacebar heating

Recorded on Thursday, Dec 2, 2021.

00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is episode 261, recorded December 2nd, 2021, and I am Brian Okken. I'm Michael Kennedy. I'm Chelle Gentemann. Welcome, Chelle. Could you let us know a little bit about yourself? Yeah, I'm a research oceanographer. So I study the sea from space, and I've been doing oceanographic research for NASA for a little over 20 years. I do almost everything using satellite data, so I never have to leave the comfort of what used to be my office, but is now my office at home.

00:35 That sounds so fascinating.

00:37 Is it fun?

00:38 Super fun.

00:39 Cool.

00:40 It's like math and physics and computers all mushed together.

00:43 It's like all my favorite things.

00:45 And oceans.

00:46 Yeah, it's fantastic.

00:47 And oceans.

00:48 Yeah.

00:49 So that sounds like such a cool job.

00:51 Welcome to the show.

00:52 Well, Michael, what do you got for us to start?

00:54 Well, let's talk about Rclone.

00:58 This one was sent in to us by Mark Pender.

01:01 Now, Rclone itself, I believe it's written in Rust or something.

01:06 It's not Python.

01:07 So the story here is not, oh, here's a cool thing created with Python, but it is a cool library that I think will be useful for Python developers.

01:16 Okay, so this Rclone thing syncs your files to cloud storage.

01:21 Let me basically see if I can summarize it.

01:23 So imagine you wanted to put some files in AWS S3 or you wanted to store something in Azure Blob Storage, or there's actually 40 different places where this can go.

01:34 So like Backblaze B2, Box, Citrix ShareFile, Dropbox, Google Drive, let's see, some stuff with OpenStack, KeyCloud, all these different places and formats, even just WebDAV and whatnot.

01:50 So if you want to either read or write files to that location, what you can do with Rclone here is it will basically mount those different locations as just something on your hard drive, right?

02:02 So if you wanna write to S3, you can just write to a file, like a with open slash S3 slash wherever it goes, and then write to it with Python or set up some kind of cron job that moves stuff.

02:13 So if you're trying to move like large data for data analysis up to the cloud, so then you can connect it to a notebook or you're trying to move files that are the backend of your website or your API through S3 or somewhere, then you can just copy files over, sync different locations, like I said, mount it as a drive.

02:34 And it has a lot of cool support for things like if the file transfer gets interrupted, it'll fall back to the last one that was working and then continue uploading.

02:44 So it can be kind of interrupted and unstable and whatnot.
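
The "just write to a file" idea can be sketched in Python. This is a minimal illustration, not Rclone's own API: it assumes a bucket has already been mounted at some local directory, and the function name `save_results` and the file paths are hypothetical. A temporary directory stands in for the mount point so the snippet runs anywhere.

```python
import tempfile
from pathlib import Path


def save_results(mount_root: Path, relative_name: str, text: str) -> Path:
    """Write text below mount_root using ordinary file operations.

    mount_root is assumed to be a directory where cloud storage has been
    mounted; nothing in this function knows or cares which provider it is.
    """
    target = mount_root / relative_name
    # Create intermediate directories the same way you would on a local disk.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text, encoding="utf-8")
    return target


# A temporary directory stands in for a hypothetical mount point.
with tempfile.TemporaryDirectory() as fake_mount:
    written = save_results(Path(fake_mount), "experiments/run1/results.csv", "lat,lon,sst\n")
    print(written.read_text(encoding="utf-8"))
```

The point of the design is that the calling code stays identical whether the directory is a local folder or mounted cloud storage.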

02:47 - This is so cool.

02:49 This is like, when I first moved to the cloud, it was so frustrating having to figure out whether I was using S3 or the, you know, the Google commands or the Amazon commands.

03:00 And all I wanted to do was get my data to where I could use it.

03:05 - I'm so with you, and sometimes it's like, well, how do I copy files here?

03:08 Well, here's our API.

03:09 Like, I don't want an API.

03:10 I want to go to the Finder or to the Windows Explorer and draggy droppy the file.

03:14 Can I do that?

03:15 They're like, no, no, no, you can't do that.

03:17 No way.

03:17 You can run our app maybe.

03:19 Yeah, so this is.

03:20 - This is so cool.

03:22 - Yeah, I'm glad you like it.

03:23 I think it'll allow people to move data around. It seems especially relevant to scientists who need to put a bunch of data in the cloud and run it, but then they might want that data locally and keep it in sync and stuff.

03:35 - And it's really frustrating when your expertise is in something else.

03:38 It's not computer science.

03:40 And like everything I pick up is because I'm only forced to learn it.

03:43 And I don't want to learn the Amazon API and I don't want to learn the Google API.

03:47 This like gives me maybe one tool so that I can just be cloud agnostic and move my data around in a way that I'm already comfortable with.

03:54 Yeah, I agree. Yeah, here's the thing I was looking at.

03:57 Yeah, so the virtual backends wrap local and cloud file systems and apply encryption, compression, chunking, hashing, and joining.

04:04 And it looks after your data, preserves the timestamps, verifies checksums all the time, transferred over limited bandwidth, intermittent connections.

04:14 It can be restarted, checks the integrity of your files, all those kind of things.

04:18 So, you know, I know you don't leave the house anymore, but if you're out doing research, like on a boat, and you had this rickety connection, you know, maybe you could get stuff uploaded well this way.

04:30 So I think it's neat.

04:31 - How do you like configure it?

04:33 You have to put in all your cloud stuff?

04:35 - Yeah, I suspect you, when you set it up, you have to give it like, let's see, it's for your Amazon, that's Amazon Drive.

04:41 I forgot that that existed, okay.

04:43 Let's see.

04:48 Yeah, you've got to give it like your AWS keys and stuff of course.

04:52 But yeah, they have a whole configuration section on what you give it here to set it up.

04:56 It looks like you create a config file for it, I think.
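
As a rough picture of what such a config file might contain, here is a sketch that renders an INI-style section for an S3 remote. The key names (`type`, `provider`, `access_key_id`, `secret_access_key`, `region`) are written from memory of Rclone's S3 backend options and should be checked against Rclone's own configuration docs; the remote name `my-s3`, the helper `render_rclone_config`, and the placeholder credentials are all hypothetical.

```python
import configparser
import io


def render_rclone_config(remote_name: str, access_key_id: str,
                         secret_access_key: str, region: str) -> str:
    """Render one INI section describing an S3 remote (assumed key names)."""
    config = configparser.ConfigParser()
    config[remote_name] = {
        "type": "s3",          # which backend the remote uses
        "provider": "AWS",     # which S3-compatible provider
        "access_key_id": access_key_id,
        "secret_access_key": secret_access_key,
        "region": region,
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# Placeholder values only; never commit real keys to a file like this.
print(render_rclone_config("my-s3", "ACCESS-KEY-PLACEHOLDER", "SECRET-PLACEHOLDER", "us-west-2"))
```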

04:58 But yeah, pretty neat.

05:00 Brian, what do you think?

05:02 - Well, so I'm trying to figure out, like even for something, for a mental model, is this like a Dropbox without version control or is it a completely different space?

05:13 - Well, I mean, it does have some tie-ins to there, right?

05:16 It's got like Backblaze and things like that, which is just a pure backup system.

05:22 I think it's just trying to match, like how do I move files around to the cloud?

05:26 And you can also, you can move it between the cloud, right?

05:29 You can mount two places and copy from one to the other.

05:31 Like I can copy from Citrix ShareFile over to Box, neither of which I really know how to do.

05:37 - Oh, it even has Dropbox as one of the configs.

05:39 So, but different, this is actually pretty cool.

05:43 I like it.

05:44 - Yeah, very cool.

05:46 Let's see, Kim out in the live audience says, I like this very few people really need to know or care that S3 doesn't have real files and directories, for example.

05:54 And Sam says, it's funny, my group was just talking about how to transfer a huge amount of training data to our compute resources earlier today.

06:02 I'm guessing that's machine learning training. Very cool.

06:04 Well, you still have to go to Amazon or Google and set up the bucket, right?

06:09 So you're not spared that particular pain.

06:12 Just like try to click public until it's public, but not too public.

06:16 That's my approach.

06:18 (laughing)

06:19 You still have to do that, but this seems like a really nice solution.

06:23 - Yeah, for sure it does.

06:25 I guess, over to you, Brian.

06:27 - Yeah, so this has been suggested several times by several listeners, so thank you everyone that sent this in.

06:34 Oh, I'm on the wrong thing, aren't I?

06:37 I wanted to talk about Check Wheels.

06:39 So Check Wheels is a, or rather, check-wheel-contents.

06:44 So the idea around it is that, so I'm often using Flit and it kind of does all this for me.

06:51 But there's other backends that you can use for building wheels.

06:55 And if you configure something wrong, it might get the wrong stuff in there.

07:03 So by wrong stuff, you might have like a __pycache__ in there or you might deliver your tests with your wheels and that's just extra space.

07:12 You don't necessarily need that.

07:14 Maybe your documentation should be there, but maybe it shouldn't be, depending.

07:18 Actually, I went off on a tangent with the documentation.

07:21 I don't think this checks for that.

07:23 It's just a pip installable tool, and then you can run check-wheel-contents.

07:28 You can give it a wheel, but wheels are often long.

07:31 When I've been trying it out, I've been just giving it my dist directory and it just looks at all the wheels in there and checks things.

07:38 What does it check for though?

07:39 It's checking for things like making sure that you don't have any .pyc or .pyo files in there because you shouldn't have those in your wheels.

07:47 Checks for duplicate files, 'cause maybe you've got, I don't know, copies of your directories or something.

07:52 And there's actually, I don't know, 10, 12, 13, 14, 15 checks or something like that.

07:58 I'm counting really quickly.

08:00 But there's, what I really love about, one of the things I like about this is there's a lot of things that you, like if you configured it totally wrong and your wheel's empty, it'll check for things like that.
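
To make those checks concrete, here is a small sketch of three of them: bytecode files, duplicate files, and an empty wheel. It is not check-wheel-contents' actual implementation, only an illustration of the idea using the standard library. The function names and the demo package are hypothetical, and this sketch chooses to ignore empty files when looking for duplicates.

```python
import hashlib
import io
import zipfile
from collections import defaultdict


def inspect_wheel(wheel_file):
    """Return a list of human-readable problems found in a wheel archive."""
    problems = []
    digests = defaultdict(list)
    with zipfile.ZipFile(wheel_file) as zf:
        names = [n for n in zf.namelist() if not n.endswith("/")]
        # Everything outside the *.dist-info/ metadata directory is payload.
        payload = [n for n in names if ".dist-info/" not in n]
        if not payload:
            problems.append("wheel contains no files besides metadata")
        for name in payload:
            if name.endswith((".pyc", ".pyo")):
                problems.append(f"compiled bytecode file: {name}")
            data = zf.read(name)
            if data:  # skip empty files such as a bare __init__.py
                digests[hashlib.sha256(data).hexdigest()].append(name)
    for same in digests.values():
        if len(same) > 1:
            problems.append("duplicate files: " + ", ".join(sorted(same)))
    return problems


def build_demo_wheel():
    """Build an in-memory archive laid out like a wheel, with deliberate mistakes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("demo/__init__.py", "")
        zf.writestr("demo/core.py", "VALUE = 1\n")
        zf.writestr("demo/core_copy.py", "VALUE = 1\n")
        zf.writestr("demo/__pycache__/core.cpython-310.pyc", b"\x00fake")
        zf.writestr("demo-0.1.dist-info/METADATA", "Name: demo\n")
    buf.seek(0)
    return buf


for problem in inspect_wheel(build_demo_wheel()):
    print(problem)
```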

08:11 And yeah, you probably could test this and try it, but it'd be nice to actually have something in your pipeline to automatically check for these things.

08:20 And it's really fast.

08:21 The other thing I like is the README for this project has a very good description of all the checks and why something like that could go wrong.

08:32 So if for instance, you happen to have your tests in there but you don't want them in there, well, how do you fix that?

08:40 Or it also says, if you actually do want your tests in there how to go about putting it in there so the check passes.

08:46 So, interesting project.

08:48 - Yeah, this looks really neat.

08:49 I think if you're going to be creating a package, you definitely don't want to be releasing things that are not intended to be in there.

08:58 I was looking through it.

08:59 I wonder if it's possible to say, check for certain files, make sure that they don't get in there.

09:05 Like I'm thinking like a settings file that has some sort of key, like an AWS key like we were talking about or something.

09:12 - Can you, so I don't make lots of packages.

09:16 So what's a wheel?

09:17 When you're using that term, like what does that mean?

09:20 - It's the thing that you pip install.

09:23 It's the, like they used to be just tarballs.

09:26 They used to be tar.gz files and whatever.

09:29 But what we do now for the most part is, or hopefully, is wheels are not just, it's for, if it's just pure Python, it'll be the same for everything and hopefully it will be.

09:42 but it can also specify that it runs on Python two or three and that some of those sorts of things can be built into the name and what operating system because if you're building on like, say, just simplifying the world, a couple versions of Unix and Linux and maybe Windows and Mac, and then also the new Mac with a different architecture, those will all be different wheels.

10:11 But when you, so when you pip install it, PyPI and pip will download the correct wheel for your operating system.

10:18 And that makes it so that when you're installing something, none of, you don't have to compile anything.

10:23 It just brings it all down.

10:25 So it's a cool format.
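
The "built into the name" part can be shown with a short sketch that splits a wheel file name into its tags. It assumes the standard layout of name, version, optional build tag, Python tag, ABI tag, and platform tag separated by hyphens. The helper `parse_wheel_filename` here is a simplified, hypothetical function written for illustration, and the example file names are illustrative too; real tooling should use an established parser.

```python
from typing import NamedTuple, Optional


class WheelTags(NamedTuple):
    name: str
    version: str
    build: Optional[str]
    python: str
    abi: str
    platform: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel file name into its hyphen-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python, abi, platform = parts
        build = None
    elif len(parts) == 6:
        name, version, build, python, abi, platform = parts
    else:
        raise ValueError(f"unexpected number of components: {filename}")
    return WheelTags(name, version, build, python, abi, platform)


# A pure-Python wheel works everywhere: any Python 2 or 3, no ABI, any platform.
print(parse_wheel_filename("requests-2.26.0-py2.py3-none-any.whl"))
# A compiled wheel is tied to one interpreter, ABI, and platform.
print(parse_wheel_filename("numpy-1.21.4-cp39-cp39-macosx_11_0_arm64.whl"))
```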

10:27 - Yeah, it's especially important for the scientific community because there's so many weird libraries that have to get compiled with things like Fortran as we were joking about.

10:36 And so wheels will basically contain the pre-compiled versions.

10:39 so you don't have to have like a Fortran compiler on your machine to pip install it or whatever.

10:44 It just downloads and unzips really quickly without all those steps.

10:47 - I was told a simple mental model of the difference of old and new is the old style with setuptools and stuff would often have a whole bunch of stuff that you download and then you run setup to like build some things and redo things.

11:03 Whereas a wheel is closer to mostly just a zip file that just unpacks things and throws it in your packages.

11:11 - Nice.

11:12 - And Sam also adds, you can also package extension modules in wheels, which is their greatest strength.

11:18 Very cool.

11:19 - Cool.

11:20 - All right, Brian, is that it for the check wheel contents?

11:23 - Yeah, I'm done there.

11:24 - Right on.

11:25 All right, Chelle, take it away.

11:26 - All right, so I thought we would talk a little bit about weather and climate data in Python.

11:34 And we're really trying to get more Python programmers in weather and climate research.

11:39 And the data, I think, it used to be really hard to get weather and climate data.

11:44 It was in these really weird, obscure formats that only scientists knew how to read and they only wrote Fortran routines to read them.

11:52 But now with Python, it's becoming really, really easy to get these data.

11:56 So the first thing is like, where do you get the data?

11:58 So I'm just gonna show the open data at Amazon, at AWS, but really, Google has the equivalent in the Earth Engine, and Google has all sorts of open data sets.

12:08 And that means that they're free egress.

12:11 So most of these you can get, you know, you can access data for free.

12:14 And Microsoft has the planetary computer and they're building up the same thing.

12:18 And like, you can see lots of people are putting data on here.

12:22 Like NASA has a space act agreement.

12:24 There's the NOAA, which is our weather agency, the big data program.

12:27 And so like you can look for data.

12:31 And one of the biggest data sets that I work with is ERA-5.

12:36 And if you just sort of type in here and it brings up the data set and you can click on that and see they have it in these two different formats.

12:43 So one is Zarr and one is NetCDF.

12:46 And most people in sort of data science work with, you know, SQL databases or maybe they're doing CSV files or tabular data.

12:56 So weather and climate data is a little different because it's three dimensional.

13:00 And so there's these different data formats, and really almost all of the weather and climate data now is currently in this NetCDF format.

13:07 The goal is, let's just write a Python library and make it so you don't care about the format, right?

13:14 The data formats, the people who produce the data should care about it, but as a user, what we want is we want anybody to be able to use it and do anything they can think of.

13:22 And so that's the sort of Xarray.

13:24 So Xarray is a Python library that is designed for sort of three-dimensional structured data.

13:32 And all the data has labels and it has these things called datasets so that it organizes your data for you.

13:39 And to read it, you just sort of say open_dataset.

13:42 Nice.

13:43 And it understands these formats?

13:44 Yeah.

13:45 And like, I'm going to bring up a little example here, but this ERA-5, I mean, this is like, I think it's 35 terabytes of data.

13:54 So I took this off of AWS.

13:57 Why did I take it off?

13:58 I ran it on AWS and I sub-sampled it.

14:01 Where are you going to put it?

14:01 Right.

14:02 Like, I mean, it used to be that like to get this data set, you had to write a script and then you would download it for like three months and now it's just on AWS, which is like mind blowing, right?

14:14 Like I log on and a few minutes later, I'm actually have access to all this data, which is so cool.

14:20 So like with Xarray, I'm going to run this cell and basically I just import xarray as xr.

14:26 To read the data, I just say like open_dataset.

14:29 That's it. And it figures it out.

14:31 And it'll read almost, it'll read a lot of different formats.

14:36 And then it just has your data.

14:38 And so this is like a really big data set and it tells you all about it.

14:42 And you can look at the different data that it has.

14:46 And, you know, sort of the goal with this is to make it really, really easy for anybody.

14:51 Like, let's say you want to look at, you know, sales patterns in San Francisco, or you want to look at ship traffic, or you want to look at how weather is evolving at your location.

15:01 Like you don't need to know about the data anymore.

15:03 Yeah.

15:04 Fantastic.

15:04 Just, just know how to work with NumPy-like stuff in your notebook and that's all you got to know.

15:11 Yeah.

15:11 Yeah.

15:12 It's all built around pandas and NumPy.

15:15 And like if you want to like let me find a really easy example.

15:22 Like what if I want to plot the data set.

15:24 You know I just type dot plot right.

15:27 Oh wow.

15:27 And then it like labels everything and you understand what you're looking at and what day it is.

15:32 And you can use sel and isel, just sort of like pandas.

15:36 It almost looks like an ocean right there.

15:38 It's latitude longitude and then I guess temperature right.

15:42 Yeah, and so this is like you just typed plot and it actually tells you exactly what you're doing and what it's plotting and what the color bar is, so what these different colors mean. And you know, you could do a spatial plot like this, or you do it in time,

15:59 or let's just pick a particular latitude and longitude.

16:03 The nice thing is that you can actually just tell it your latitude and longitude, and you can use Google Maps to look up your latitude and longitude and then plot it and it says, "Oh, I'll make a time series." >> That's pretty cool.
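
Here is a minimal sketch of that workflow with Xarray. Instead of opening the real ERA5 archive, it builds a tiny synthetic dataset in memory so it runs without any data files; the variable name `air_temperature`, the coordinate values, and the chosen location are all made up for illustration. With a real file you would start from `xr.open_dataset(path)` instead, and calling `.plot()` on either selection (which needs matplotlib) would draw the labeled map or time series described above.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for a gridded weather dataset: 3 days on a 3 x 3 grid.
times = np.arange("2021-12-01", "2021-12-04", dtype="datetime64[D]").astype("datetime64[ns]")
lats = np.array([40.0, 45.0, 50.0])
lons = np.array([230.0, 235.0, 240.0])
temps = 280.0 + np.arange(27, dtype=float).reshape(3, 3, 3)

ds = xr.Dataset(
    {"air_temperature": (("time", "lat", "lon"), temps)},
    coords={"time": times, "lat": lats, "lon": lons},
)

# isel selects by integer position: the first time step gives a lat/lon map.
first_map = ds["air_temperature"].isel(time=0)

# sel selects by coordinate label; method="nearest" snaps to the closest
# grid point, so you can pass the latitude and longitude you looked up.
series = ds["air_temperature"].sel(lat=45.5, lon=237.0, method="nearest")

print(first_map.shape)
print(series.values)
```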

16:14 >> Wow. I remember just struggling so much getting into programming and having to work with custom file formats.

16:21 Out of research projects, you're like, "What do you mean I have to read this binary file?

16:25 This is going to be so hard. Okay, here we go." - Yeah, and then like you wanted to read a different binary file, like start from scratch, write all that code again.

16:35 And like Xarray sort of took all of the backend work that all the people at the data archives did with like getting everything in the same format and labeling all the data nicely.

16:44 It sort of took all that work and just said, well, we'll write one library that builds on all of that and can read anything.

16:50 - Yeah, awesome, great recommendation.

16:52 A couple of pieces of real-time follow-up.

16:54 Sam Morley out in the live stream says, "Xarray is great.

16:58 I did an example of using it to open a NetCDF file in my book," and I'm learning about his book, "Applying Math with Python: Practical recipes for solving computational math problems using Python programming and its libraries." That's awesome.

17:11 - That looks like fun, actually.

17:13 - Yeah, it does.

17:14 - Yeah, and Xarray links to like SciPy and it has a lot of statistics and math built into it.

17:20 So you can actually compute trends in one line and all of that.

17:23 - Yeah, nice.

17:25 Also, I have one other piece of follow-up here, Brian.

17:27 I don't wanna panic you all, but right here in Portland, we have Panic, the software company.

17:33 And I just wanna give a quick shout-out to this thing called Transmit here.

17:37 This is what I actually use to get stuff up into and out of S3.

17:41 And it also will let you talk to Backblaze, Box, Dropbox, Azure, Google Drive, all these places as well.

17:48 And it's basically like an old-school FTP program where like on one half it has your computer and the other half it has whatever cloud storage is that you're working with there.

17:58 And maybe you could even put the other half not just your computer, but somewhere else as well.

18:01 So if you want just like a UI, not something like Rclone, but just a UI, I'd strongly recommend this thing.

18:07 They don't sponsor the show or anything, but I definitely love it.

18:10 I use it all the time.

18:11 - Neat. - Neat.

18:12 All right.

18:13 Am I up next actually?

18:15 I guess I am. - I think so.

18:16 - Yeah, I think so.

18:17 I am, I am.

18:18 Number four would be, I want to talk about this announcement from JetBrains, being one of the bigger tool builders for the Python world.

18:28 They came up with this thing called JetBrains Remote Development, and buried at the end of this is actually what I think is the lede, which got quite buried here, but we'll see.

18:36 So they introduced something that I was not aware of called remote development.

18:42 So the whole idea of this is basically what if instead of running like PyCharm (I think this works for any of the IntelliJ stuff, but let's say PyCharm), instead of running PyCharm locally on your machine, you could just give it an SSH destination, let's say, and it will go over there and run PyCharm the server or the sort of logic bits over there, but just have a light front end on your computer here.

19:08 So like a lightweight, if you're on some really wimpy laptop and you wanted to access like a better server at work or in the cloud, or in like Chelle's example, near some massive dataset instead of far away from some massive dataset so you could just directly talk to it and so on.

19:26 So yeah, it's super cool.

19:27 You just basically give it some SSH thing.

19:31 They also say it's good for things like if your laptop gets stolen, what data goes with it?

19:36 You know, if you just keep the data somewhere else, right?

19:39 then like just revoke the SSH key and nothing's bad.

19:43 You can also set it up so that it'll create pre-configured environments.

19:47 Like when you connect to it, it'll automatically give you something with like, let's say, Conda set up and all the right libraries pre-installed, and that one weird C thing.

19:57 The one you got to apt install to make sure it works.

19:59 Like it starts with that just all configured from different things.

20:02 So anyway, that seems all pretty cool to me.

20:04 I thought it was pretty neat.

20:05 - That does look neat.

20:06 I think it's free if you set up your own server, but then I think it costs money if they provide you the server, right?

20:13 So kind of just like firing up a VM for you on your behalf.

20:16 All right, you ready for the buried lead?

20:17 Scroll, scroll.

20:18 So here you can see as an example, just like connect over SSH, or you can go to JetBrains Space and they'll create one for you, right?

20:25 But here's the buried lead.

20:27 They announced this thing called JetBrains Fleet, which is, as far as I can tell, unrelated.

20:31 I think it'll connect to one of these things, but is another thing.

20:35 So if you click down at the bottom, or is there something about learn more?

20:38 And if you go to this, it is a complete rewrite of the whole IDE story over at JetBrains.

20:45 And basically think VS Code, but from JetBrains.

20:48 - Yeah, I'm interested in watching this.

20:51 I just heard about this last week.

20:53 And they're doing it an invite only sort of a, not invite only, but you have to like-

20:58 - Early access get approved sort of thing.

21:00 - Yeah, get approved sort of thing.

21:01 They're trying to limit, basically limit the feedback so that they can deal with the feedback.

21:07 - Yeah, so it's like super fast to open.

21:09 It doesn't have a project structure in the same sense that like PyCharm or IntelliJ would.

21:14 It just opens files and it doesn't even have the IDE features unless you click this little like make it smarter button and then it'll like fire up all the high end stuff that takes five seconds to start.

21:26 The other thing that's cool is you can see on the screen right here is there's like three people typing all at the same time.

21:30 Actually, no, there's five people typing.

21:32 So it's like Google Docs where you can all like collaborate on it in parallel, like right within it.

21:38 So I think those are all super neat developments in the whole editor space, which we all write a lot of code and kind of deal with these tools.

21:46 - Editor as a service is something that is happening and it is a hard thing for me to wrap my head around because my brain thinks I want all my editor stuff locally, but there's a lot of times where you don't.

21:58 - Yeah.

21:58 - You'd just like the group coding.

22:00 - Yeah, I know.

22:00 I think that's really neat as well.

22:02 I think that would be really valuable to some people on teams instead of, you know, we've all been in those screen share meetings.

22:08 Like, no, could you go over there?

22:09 Could you type this?

22:10 No, no, no, no, not after that, inside the parentheses.

22:12 Like, please, no.

22:13 - That's exactly what's going to happen.

22:17 No, no, no, to the left.

22:18 No, a little more to the left.

22:20 - Exactly.

22:21 And so I think this is great.

22:21 - Wait, not a friend.

22:22 (laughing)

22:25 - Yeah, let's see.

22:27 Bunch of people out there really like this.

22:30 RJL and Sam and so on.

22:32 But Kim has an interesting comment.

22:34 We've come full circle-ish back to talking to the one mighty mainframe over a lightweight terminal circa 1985 or, you know, for me, like '95 and like X11, X Windows.

22:46 Like, is your X Windows set up so you can talk to the server?

22:49 Yeah.

22:50 - Yep, I was thinking the same thing.

22:52 - Yeah, definitely.

22:53 - But these are interesting ideas.

22:55 You know, for me personally, I love to use PyCharm for working on projects.

22:59 But if I've got just a JSON file or even a Python file and I just want to look at the file, I probably won't open it in PyCharm 'cause it's gonna create all this project goo that's gonna be stuck in that folder and it's gonna expect, gonna complain, there's no interpreter.

23:12 I just want to look at it, you know?

23:12 And so tools like this, I think are gonna be really neat.

23:15 - Yeah.

23:16 - Yeah, and Brandon's suggesting something crazy out there, like mobs might run in. No, mob programming, where you're like working as a group.

23:24 I think it's fun.

23:25 - Yeah, and I'll be, we should play with this though.

23:29 I think it'd be fun to see what all the interactions feel like and stuff.

23:34 I totally agree.

23:35 All right, over to you.

23:36 I'm trying to remember how I came across the XY problem.

23:42 I was doing some research last week and I think I was down some rabbit hole of link, follow link, follow link sort of thing.

23:50 I ran across this problem, the XY problem, and probably everybody else knows about this already, but the concept was new to me.

24:01 - I don't know the XY problem.

24:02 - Okay.

24:03 - And I studied math, come on.

24:04 (laughing)

24:05 - Well, so it isn't really that mathy, but so the XY problem is essentially, you're trying to solve problem X, and you think of a solution Y that would help work to solve that.

24:23 and you get down to trying to solve all the details of Y and you get stuck.

24:28 So you ask about Y, what you're really trying to do is X.

24:32 And that's sort of nebulous.

24:33 An example kind of highlights it.

24:35 So, and we've got this example in the show notes that I pulled out of one of the links, is how do I, if somebody asks, how do I get the last three characters of a file name?

24:46 And somebody says, oh, you just like do, and this is a shell command.

24:50 You just do like, if it's in the variable foo, you just do dollar curly bracket foo, and then a colon and then negative three, and that gives you the last three characters.

25:00 But also, why do you want the last three characters?

25:04 Is it because you are trying to do, trying to pull off the extension?

25:09 Somebody goes, yeah, that's what I'm trying to do.

25:10 And I'm like, oh, well, then you don't want the last three characters 'cause it might be a two character or a four character extension.
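
The same example translates directly to Python, shown here as a sketch rather than the shell command from the show notes. The literal answer to "Y" (the last three characters) only happens to solve "X" (the extension) when the extension is exactly two letters plus the dot; the two helper function names are hypothetical.

```python
from pathlib import Path


def last_three_characters(filename: str) -> str:
    """Answer to the question as asked (Y): slice off the last three characters."""
    return filename[-3:]


def extension(filename: str) -> str:
    """Answer to the real problem (X): let the library find the extension."""
    return Path(filename).suffix


for name in ["photo.jpg", "notes.py", "data.json", "README"]:
    print(f"{name!r}: last three = {last_three_characters(name)!r}, "
          f"extension = {extension(name)!r}")
```

Only `notes.py` gives the same result both ways; `photo.jpg` loses its dot, `data.json` comes back as `son`, and `README` has no extension at all.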

25:16 So teach them how to do the real problem.

25:18 And I'm gonna link to a couple of forum answers and stuff in there, because I think it's interesting: there's a lot of verbiage around the XY problem that sort of blames the asker for asking a stupid question.

25:35 And I think it's important to not do that because we do this all the time.

25:40 We break problems in software.

25:41 We break problems down.

25:43 If I want to do A, then I need to do B and C.

25:46 But to do B, I got to do D and E.

25:48 And then, and then also F and G.

25:50 And then way down into the rabbit hole, I get to get into the X and Y problem.

25:56 But how far back do you back up to give enough context to somebody else?

26:01 So it's hard to avoid, you'll run into it.

26:05 And then I really like, there was one forum that had some great advice, both on asking questions and on answering questions.

26:12 So when asking questions, state the problem that you're trying to solve, but also state the higher level thing that you're trying to achieve, if appropriate.

26:22 And then also how that fits into the wider design.

26:25 And then it also brought up, if you've thought of other solutions that you've eliminated for some reason or another, go ahead and list those because somebody might give you one of those as an answer and you've already eliminated that.

26:41 So give the reason why.

26:43 And then I think what's most important is giving answers to XY problems, or giving answers to problems.

26:49 Because although I think everyone that's on this podcast and also listening is probably an expert in some fields and a novice in other fields.

26:59 So we're gonna be on both sides of the fence.

27:01 So when answering questions and you think, oh, somebody is just trying to get the extension.

27:06 I'll just tell them how to do that.

27:08 That's not necessarily helpful.

27:10 So there's a great three-part thing to do.

27:13 and our example follows those, is go ahead and answer the question directly, but also ask some questions about the problem.

27:22 Say, just curious, why are you trying to do this?

27:25 Is it because you're trying to do this other thing?

27:28 If so, the thing I just told you might not be appropriate.

27:31 And then once you've figured out really what the real problem is, then you can help and give the final answer.

27:39 So it isn't helpful to just say, oh, you're probably getting the extension.

27:43 go ahead and just do that.

27:45 Anyway, I thought this was an interesting thought process around answering and asking questions.

27:51 - Yeah, absolutely.

27:52 It seems to be very relevant to Stack Overflow type places.

27:55 'Cause you're gonna post, you're looking for help.

27:57 You say, I'm trying to do this, but a lot of times people will, and it'll give you very specific answers.

28:02 And the answer could be, well, why don't you just do this library that already understands that format?

28:07 Like Chelle mentioned earlier, like why don't you use Xarray instead of trying to understand how to parse this thing, just use that.

28:14 Oh, well, that's way better, thank you.

28:16 - I see that a lot on Stack Overflow, that exact.

28:20 It reminds me also of my, like when I went to school and you're trying to ask a question to your professor or to get help on anything, right?

28:27 You're like, this is my problem.

28:28 They're like, what really is your problem?

28:31 Please tell me about it.

28:32 And like, that's what you're asking, right?

28:34 Like, tell me what the actual problem is.

28:37 And if you can do that clearly, you're gonna get a much better answer.

28:41 - Yeah, absolutely.

28:42 - And a lot of people just don't, I mean, it's also just a different perspective thing.

28:47 They know that they have the toolbox of things they know how to solve and ways they've solved them.

28:52 And if a new problem, and this is a related thing, is people sometimes don't even think that there's a really simple solution out there.

29:00 Like, oh, that tool you're using, it already has a flag that does exactly what you want, but you didn't know the flag was there, so.

29:07 - It took me, when I started learning Python, and I was so used to Fortran 77 where there was never any help.

29:13 They just don't even try.

29:16 That when I started learning Python, it took three or four months before I finally just said anything I want to do, someone has done better.

29:24 Yes.

29:25 And they are out there.

29:26 I just have to find out how to ask the question correctly to find them.

29:30 Because it's true.

29:31 Like everyone is working.

29:32 You know, most people have tried to solve the same problem.

29:35 There's someone out there who's worked on the same problem in all likelihood.

29:38 - Yeah, there's so many libraries with pip or Conda that you can, if you knew it existed, it would do the thing you want.

29:45 - Now we knew it existed.

29:46 - Exactly. - Yeah, exactly.

29:47 - All right.

29:49 - Okay, so I guess I'm, am I next?

29:52 - You are next.

29:53 - Okay, so what I wanted to show is this library that is called Kerchunk.

29:59 - It's a great name.

30:01 - Yeah.

30:02 - Brand new, so can you see my snail screen?

30:04 - Yeah, yeah, we see the snail.

30:05 So we had this problem where, as NOAA and NASA and everyone are starting to throw all these NetCDF files or all these different files onto the cloud.

30:16 And then it turned out that access in S3 was really, really slow.

30:21 And so people got really frustrated 'cause like the cloud's supposed to be fast, right?

30:26 This is gonna transform science.

30:27 We're gonna do it better now.

30:29 - That's the promise, yeah.

30:30 - That's the promise.

30:32 But the grass isn't always greener.

30:35 So this is this library that I think has really maybe some broad applications.

30:40 It's being developed right now.

30:42 And the idea behind it is like we have all these data formats that we're sort of stuck with.

30:48 There's lots of data, but sometimes it's slow on S3.

30:51 So is there a way that we can fix this?

30:53 And the idea is that you create a reference file system.

30:58 And so you do this by going to each of your files and just taking the data that you need for that file, like just the metadata, so like what size is it, what its dimensions and coordinates are, what variables does it contain?

31:11 So you just take those little bits and you pull them out into a JSON file.

31:15 And so then you have this reference file that just contains the important information, but it's really small.

31:21 And so that makes it faster to access.

31:23 And then you take these JSON files (I have some benchmark tests in here) and construct one mega JSON file that virtually aggregates all of your data, so that in one call you can get access to everything.
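
[Editor's sketch] The reference-file idea described above can be illustrated with the standard library alone. This is not Kerchunk's API or its actual JSON layout; the function names (`make_reference`, `aggregate`, `files_overlapping`), the field names, the bucket URLs, and the byte offsets are all hypothetical, chosen only to show small per-file records being merged into one index that can answer "which files do I need?" without touching the data.

```python
import json


def make_reference(url, variable, shape, time_range, offset, length):
    """Describe one file without copying its data: metadata plus a byte range."""
    return {
        "url": url,
        "variable": variable,
        "shape": shape,
        "time_range": time_range,    # [start, end] as ISO dates
        "chunk": [offset, length],   # where the bytes live inside the file
    }


def aggregate(references):
    """Merge the small per-file references into one combined index."""
    return {"version": 1, "files": {ref["url"]: ref for ref in references}}


def files_overlapping(combined, start, end):
    """Use only the metadata to decide which files would need to be read."""
    return sorted(
        url
        for url, ref in combined["files"].items()
        if ref["time_range"][0] <= end and ref["time_range"][1] >= start
    )


# Hypothetical monthly files; URLs, shapes, and byte ranges are made up.
refs = [
    make_reference("s3://example-bucket/sst_2021_01.nc", "sst",
                   [31, 180, 360], ["2021-01-01", "2021-01-31"], 4096, 16070400),
    make_reference("s3://example-bucket/sst_2021_02.nc", "sst",
                   [28, 180, 360], ["2021-02-01", "2021-02-28"], 4096, 14515200),
    make_reference("s3://example-bucket/sst_2021_03.nc", "sst",
                   [31, 180, 360], ["2021-03-01", "2021-03-31"], 4096, 16070400),
]

combined = aggregate(refs)
combined_json = json.dumps(combined)  # the one "mega JSON file" to publish

# A reader loads only the small JSON and filters by time range.
wanted = files_overlapping(json.loads(combined_json), "2021-02-10", "2021-03-05")
print(wanted)
```

The point of the design is that the combined index is tiny compared with the data it describes, so a single request for it replaces many slow metadata reads against the original files.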

31:42 And because you might not need actually the data, you might need to know, well, what timeframe is this?

31:48 So do I need to read in that file or not?

31:50 Right.

31:51 Yeah.

31:52 And in some ways, one of the things with xarray, back to that other library, is it does the lazy loading.

32:00 So like this is a 16 terabyte data set that I'm loading here, but I'm just loading the data about the file.

32:06 I'm not actually loading any data until I need to touch it.

32:09 And so I can load this giant data set in less than two minutes by doing this virtual aggregation with Kerchunk.

32:18 And so all it's doing is it's reading these aggregated JSON files.
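
[Editor's sketch] Lazy loading, as described above, means opening a data set reads only its description, and bytes are fetched when a value is actually touched. The toy below assumes a made-up binary file (an 8-byte header followed by little-endian float64 values) and a hypothetical `LazyVariable` class; it is not how xarray or Kerchunk implement this, only the general pattern.

```python
import os
import struct
import tempfile


class LazyVariable:
    """Knows where the values live, but reads them only on access."""

    def __init__(self, path, offset, count):
        self.path = path
        self.offset = offset   # byte position of the first value
        self.count = count     # number of float64 values
        self.reads = 0         # how many times the file was actually read

    @property
    def shape(self):
        # Answered from metadata alone; no file access needed.
        return (self.count,)

    def __getitem__(self, index):
        if not 0 <= index < self.count:
            raise IndexError(index)
        with open(self.path, "rb") as f:
            f.seek(self.offset + 8 * index)  # jump straight to the value
            self.reads += 1
            return struct.unpack("<d", f.read(8))[0]


# Build a toy file: 8-byte header, then 1000 float64 values.
values = [i * 0.5 for i in range(1000)]
header = b"TOYHDR00"
path = os.path.join(tempfile.mkdtemp(), "toy.bin")
with open(path, "wb") as f:
    f.write(header)
    f.write(struct.pack("<1000d", *values))

var = LazyVariable(path, offset=len(header), count=len(values))
print(var.shape, var.reads)   # shape is known before any data is read
print(var[10], var.reads)     # the first real read happens here
```

Creating `var` costs nothing beyond storing three numbers, which is why a very large collection can be "opened" quickly when only its description is loaded up front.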

32:23 And right now it works for three or four different types of data sets.

32:27 So if you have big collections of data that are going on to S3, they have lots of different little files.

32:35 This is a way to sort of virtually aggregate them into one big data set that you can then subset.

32:41 Oh, that's really cool.

32:43 It seems like this is one of those that comes as part of the fsspec project, which we talked about pretty recently as well.

32:50 Yeah.

32:51 - So this is part of fsspec, and it's Kerchunk. It was just released, and it's a unified way to represent compressed data formats, and it creates this virtual dataset.

33:02 So that's where it's located.

33:04 - Yeah, super cool.

33:06 See, Kim has a question.

33:07 Do you keep the individual JSON files with the data?

33:10 - You can, so the nice thing about this, the data can be anywhere.

33:14 And again, this is the idea to make data invisible and easy to access so that you don't have to care what format it's in or where it's at.

33:21 You can, as long as they make the little, you can either create them yourself and just keep the little JSON files public.

33:28 And then you just make the one aggregated JSON file public.

33:32 And then anybody could actually use that JSON file to access the data this way.

33:37 - Yeah, fantastic.

33:38 This looks really helpful for working with large data.

33:41 Yeah. - Yeah, I think it's cool.

33:42 - Yeah, it looks awesome.

33:43 All right, Brian, does that bring us to the extras?

33:46 - Yeah, I guess it does. - I guess it does.

33:48 How many extras you got today?

33:50 I just have one entertaining extra, I thought.

33:53 As some people have amusingly noticed, I am attempting to grow my hair out.

33:59 And I went to Florida last week, and it's very humid in Florida, and I looked like a cotton swab.

34:07 It just like poofed.

34:09 Anyway, it was amusing to me.

34:11 - You should have sent us some pictures or something.

34:14 - Yeah.

34:15 - Maybe those are the pictures you don't really want out there, but yeah.

34:17 - Yeah, so I wish I could have seen like, 'cause I was at Disney World and we were doing like rides and stuff.

34:22 And I really wish I could have seen like the flowing hair on the roller coaster or something like that.

34:30 - Perfect.

34:31 - How about you? - I love the hair, nice.

34:33 Let's see what Chelle's got first.

34:36 - Okay.

34:37 - So what are extras?

34:37 Just something that we did last week?

34:39 - Well, just whatever you want.

34:40 Also just give a shout out to while we're here before we call it.

34:43 - I think I'm pretty good.

34:45 I'm really excited.

34:46 Like, NASA is starting a big transformation to open science, which is exciting.

34:51 They started a new, they announced just last month, a new $40 million initiative to try and help scientists move to open practices.

34:59 And Python's a big part of that.

35:01 'Cause a lot of this was the open community that Python helped develop over the last decade and all of the tools that now is making, it's not just science easier, it's making it easier for more people to participate in science.

35:14 I think there's a lot of synergies and similarities between the scientific goal of spreading knowledge and publishing your work and so on and open source.

35:23 Yeah, because it used to be like scientists, like you would share your knowledge, right?

35:27 You'd publish a paper.

35:28 Yeah.

35:29 And that was it.

35:30 And if you like, that's what graduate, like I remember in graduate school, you would go through and they'd be like, okay, derive the equations in this paper because they wouldn't show you all the steps.

35:39 And you would do that.

35:40 And then if you wanted to code it up, you would just open up a new window and start coding.

35:45 And now, you know, people are starting to publish their code so that you can actually reproduce their results and then build on them and move faster.

35:53 - The whole reproducible science thing as well.

35:56 Fantastic.

35:57 - Yeah.

35:58 - Yeah, awesome.

35:59 Sam in the audience says, "Yes, more open reproducible science is great for everyone." - Yeah.

36:04 - All right, I got some extras as well, as you can imagine.

36:07 - Surprise.

36:09 I don't remember where I was going on about this, maybe it was actually on Talk Python, but I was going on and on that Visual Basic 6, just the "I want to drag a few things on the screen and write a little bit of code," made it so much easier for people to build apps.

36:21 Robert Livingston out there said, you know what, Xojo (X-O-J-O, maybe pronounced "Zojo," I don't know) is this replacement thing.

36:30 So if you're trying to build some desktop apps and you wanna do a bunch of draggy droppy stuff, boy, if it worked with Python or somebody could build a Python integrated thing behind those events there.

36:40 I would love to try to work on some integration between those things, but currently no.

36:45 There's a little demo where in like six minutes, seven minutes, they build a web browser, which is kind of neat.

36:50 So very Visual Basic feeling.

36:53 - So is it Python?

36:54 It's not Python.

36:55 - No, it's not Python.

36:56 It's more VB6 feeling.

36:58 I don't know if it's actually VB6, which is even worse.

37:00 It's sort of, kind of, but not exactly.

37:02 I just did a webcast, "10 Reasons You Love PyCharm Even More in 2021," with JetBrains and Paul Everitt, we just did five reasons.

37:11 So I'll link to that, if people care about that.

37:13 And then who doesn't love a little good tech shock and awe and being, I don't know, outrage, I guess is the word I'm looking for.

37:23 So Microsoft Edge is this browser that's sort of Chrome based, and they just announced like a Linux version and it runs on macOS, which all these things surprised me.

37:33 And there was getting a lot of traction and there's this whole thing where Microsoft, the team at Edge just added a like a buy now pay later thing built into the browser from some third party company not as an extension but like integrated into the browser that you can't not get when you go shopping it says would you like to use this like for payment program.

37:57 It's almost like adding like payday loans like baked into the browser.

38:01 It's insane.

38:02 It's terrible.

38:03 I know it's such a bad idea.

38:05 So there's an Ars Technica article that says, "Users revolt as Microsoft bolts on short-term financing App into Edge as like 30% borrowing." And one of the quotes is, "This all feels extremely unnecessary for a browsing experience." And the comments are, you go to the comments, they are really, there's 256 comments, which is an awesome number of comments for the moment.

38:29 But there's just almost nothing but like, why is it?

38:34 This is unbelievable to me.

38:35 I can't believe this is so, it just makes it feel so shady and trashy, right?

38:40 Like the next thing you're gonna do is get like bail bonds offerings inside your browser if you get, it's your browser, just weird stuff.

38:47 So anyway, I thought people might enjoy just reading through this and taking a little bit of that in.

38:53 - It must work, right?

38:55 'Cause we all have this experience, I mean, this has been going on for 20 years, like with their browser, where they used to install all this stuff on your machine and you'd have to delete it all.

39:06 And then that was ruled illegal.

39:07 So they had to take it, they had to separate them out.

39:10 And they just keep finding ways to get back in.

39:14 - Yeah, there's some really interesting stuff.

39:16 You know, they're now sort of putting ads in the start menu and stuff.

39:21 And then the ads are forced to open in Edge, not your default browser.

39:24 It's just like, there's layers of like, really, like why are you doing it?

39:27 It makes me happy that I'm not using Windows 11 at the moment.

39:30 Whereas I've been actually looking forward to using, say, the new terminal and Oh My Posh on Windows and stuff, which looks amazing.

39:37 So I think there's this sort of like different groups.

39:39 So this is definitely a different group than say the VS Code group of people.

39:43 - This is again gonna take us back to 1995 and we're just gonna be using a terminal window to access anything so we don't get annoyed by all of this.

39:50 - There's no ads in the Linux browser.

39:53 - No ads in the Linux browser.

39:55 - Yeah, exactly.

39:56 - Now, if they could just get the ad companies to be able to just collect your credit card information and then instead of showing you the ad, just buy it for you and stick you up on a payment plan.

40:07 - Wouldn't it be great if that was just shared?

40:09 Like we already know who you are, just click here if you want it.

40:12 Okay, great.

40:13 - Or just send it to you anyway and just charge you later.

40:15 (laughing)

40:16 - Free returns.

40:17 - Exactly, so I feel like this almost could be the joke, but I've got a different joke for you.

40:22 - Oh, okay.

40:23 - All right, so the joke for this week comes from a solid source, XKCD, as you may know.

40:28 This is about workflows and changing software.

40:31 So here's the one that says workflow, and it's just in the change log or some sort of conversation flow, maybe a GitHub release or something.

40:38 It says, "Changes in version 10.17.

40:42 The CPU no longer overheats when you hold down the space bar." And then there's a frustrated user comment.

40:48 It says, LongtimeUser4 writes, "This update broke my workflow.

40:51 My control key is hard to reach, so I hold the space bar instead, and I've configured Emacs to interpret a rapid temperature rise as pressing control." (laughing)

41:01 The admin writes, "That's horrifying." The user writes, "Look, my setup works for me.

41:05 Just add an option to re-enable spacebar heating." (laughing)

41:10 - Oh, I remember like enabling all the weird Emacs things that only you would know about.

41:14 (laughing)

41:15 - Exactly, exactly.

41:17 And the subtitle is, "Every change breaks someone's workflow." I love it.

41:23 - Yeah, actually, and it's interesting because Python's even like more so like that because of the introspection and everything's really open unless you really work hard to make it.

41:35 I mean, you can't really hide too much stuff with Python.

41:38 So even if you have a comment around a function or an access point to say, this is not part of the API, this is subject to change, you can change it and it will break somebody because somebody has reached inside and used the thing you told them not to use.

41:55 - Yep, those double underscores and single underscores, they're just there to slow you down, but they don't--

42:00 - That's just there so you notice what you're not supposed to do.

42:02 Those are where the interesting parts are.
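
[Editor's sketch] A small illustration of the point about underscores: in Python, a leading single underscore is only a naming convention, and a leading double underscore triggers name mangling rather than real privacy. The `Heater` class and its attributes are hypothetical names invented for this example.

```python
class Heater:
    def __init__(self):
        self._level = 1          # single underscore: "internal" by convention only
        self.__secret = "warm"   # double underscore: stored as _Heater__secret

    def status(self):
        return f"level={self._level}, secret={self.__secret}"


h = Heater()

# Nothing stops outside code from reaching the "internal" attribute.
h._level = 11

# The double-underscore name is hidden under its plain spelling...
try:
    h.__secret
    blocked = False
except AttributeError:
    blocked = True

# ...but it is still reachable through the mangled name.
mangled = h._Heater__secret

print(h.status(), blocked, mangled)
```

Any code that relies on `_level` or `_Heater__secret` will break if the class author renames them, which is exactly the "every change breaks someone's workflow" situation from the comic.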

42:04 - Exactly.

42:05 They wouldn't give me the feature, but I can just do it right here.

42:09 All right, well, I think that's it, Brian.

42:10 - Yeah, it was a good episode.

42:12 So thanks everybody for showing up.

42:14 - Yeah. - Thanks everyone.

42:15 - Yeah, thanks, Chelle, for being here.

42:17 Great to have you on the show.

42:18 - Thanks, Michael.

42:19 Thanks, Brian.

42:20 Take care. - Bye, everyone.
