#152: You have 35 million lines of Python 2, now what?

Published Tue, Oct 15, 2019, recorded Wed, Oct 9, 2019

Sponsored by DigitalOcean: pythonbytes.fm/digitalocean

Michael #1: JPMorgan’s Athena Has 35 Million Lines of Python 2 Code, and Won’t Be Updated to Python 3 in Time

With 35 million lines of Python code, the Athena trading platform is at the core of JPMorgan's business operations. A late start to migrating to Python 3 could create a security risk.
The Athena platform is used internally at JPMorgan for pricing, trading, risk management, and analytics, with tools for data science and machine learning.
This extensive feature set utilizes over 150,000 Python modules, over 500 open source packages, and 35 million lines of Python code contributed by over 1,500 developers, according to data presented by Misha Tselman, executive director at J.P. Morgan Chase in a talk at PyData 2017.
JPMorgan is going to miss the Python 2 end-of-life deadline (January 1, 2020).
Roadmap puts "most strategic components" compatible with Python 3 by the end of Q1 2020
JPMorgan uses Continuous Delivery, with 10,000 to 15,000 production changes per week
"If you maintain a library that other developers depend on," the post states, "you may be preventing them from updating to 3. By holding other developers back, you are indirectly and likely unintentionally increasing the security risks of others," adding that developers who do not publish code publicly should "consider your colleagues who may also be using your code internally."

Brian #2: organize

suggested by Ariel Barkan
a Python based file management automation tool
configuration is via a YAML file
command line tool to organize your file system
examples:
- move all of your screenshots off of your desktop into a screenshots folder
- move old incomplete downloads into trash
- remove empty files from certain folders
- organize receipts and invoices into date based folders
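
As a rough illustration of the first example, here is what such a rule does when written as plain Python. This is not organize's API or configuration format; the `move_screenshots` helper, the folder names, and the `Screen Shot*.png` pattern are all assumptions made for this sketch.

```python
import shutil
from pathlib import Path


def move_screenshots(desktop: Path, target: Path, pattern: str = "Screen Shot*.png") -> list:
    """Move files matching `pattern` from `desktop` into `target`.

    Hypothetical helper: it only illustrates the idea behind a rule
    and has nothing to do with how organize is implemented.
    """
    target.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(desktop.glob(pattern)):
        if path.is_file():
            destination = target / path.name
            shutil.move(str(path), str(destination))
            moved.append(destination)
    return moved
```

Calling `move_screenshots(Path.home() / "Desktop", Path.home() / "Desktop" / "Screenshots")` from a scheduled job would approximate the rule; organize expresses the same thing declaratively in its config file.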

Michael #3: PEP 589 – TypedDict: Type Hints for Dictionaries With a Fixed Set of Keys

Author: Jukka Lehtosalo
Sponsor: Guido van Rossum
Status: Accepted
Version: 3.8
PEP 484 defines the type Dict[K, V] for uniform dictionaries, where each value has the same type, and arbitrary key values are supported.
It doesn't properly support the common pattern where the type of a dictionary value depends on the string value of the key.
Core idea: Consider creating a type to validate an arbitrary JSON document with a fixed schema
Proposed syntax:

    from typing import TypedDict

    class Movie(TypedDict):
        name: str
        year: int

    movie: Movie = {'name': 'Blade Runner',
                    'year': 1982}

Operations on movie can be checked by a static type checker

    movie['director'] = 'Ridley Scott'  # Error: invalid key 'director'
    movie['year'] = '1982'  # Error: invalid value type ("int" expected)
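
These errors come from a static type checker only; at runtime the value is still an ordinary dict. A small sketch of that, assuming Python 3.8 or later (where `typing.TypedDict` is available). The `CastMember` type and the `cast` key are made up here to show a nested document and do not come from the PEP example above.

```python
from typing import List, TypedDict


class CastMember(TypedDict):
    name: str
    role: str


class Movie(TypedDict):
    name: str
    year: int
    cast: List[CastMember]  # nested structure, as in a JSON document


movie: Movie = {
    "name": "Blade Runner",
    "year": 1982,
    "cast": [{"name": "Harrison Ford", "role": "Rick Deckard"}],
}

# At runtime a TypedDict value is a plain dict; nothing is validated.
print(type(movie) is dict)

# A checker such as mypy would flag the next assignment, but Python itself
# runs it without complaint because no runtime check exists.
movie["year"] = "1982"  # type: ignore
print(movie["year"])
```

So a `TypedDict` describes the schema you expect; validating untrusted input at runtime still needs separate code.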

Brian #4: gazpacho

gazpacho is a web scraping library
“It replaces requests and BeautifulSoup for most projects.”
“gazpacho is small, simple, fast, and consistent.”
example of using gazpacho to scrape hockey data for fantasy sports.
simple interface, short scripts, really beginner friendly
retrieve with get, parse with Soup.
I don’t think it will completely replace the other tools, but for simple get/parse/find operations, it may make for slimmer code.
Note: I needed to update certificates to get this to work. On macOS, running the Install Certificates.command file that ships with the Python installer fixed it.
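
The get/parse/find shape can be illustrated without any third-party package. The sketch below uses only the standard library on an inline HTML string, so it is not gazpacho's implementation or API; the `find_text` helper and the sample markup are invented for the example. With gazpacho, the download step would be its `get` function and the parsing step its `Soup` class.

```python
from html.parser import HTMLParser


class _TagTextCollector(HTMLParser):
    """Collect the text inside every <tag> whose attributes include `attrs`."""

    def __init__(self, tag, attrs):
        super().__init__()
        self.tag = tag
        self.attrs = attrs
        self.capturing = False
        self.results = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == self.tag and all(attr_map.get(k) == v for k, v in self.attrs.items()):
            self.capturing = True
            self.results.append("")

    def handle_endtag(self, tag):
        if tag == self.tag:
            self.capturing = False

    def handle_data(self, data):
        if self.capturing:
            self.results[-1] += data


def find_text(page, tag, attrs=None):
    """Hypothetical helper: return the text of each matching element."""
    parser = _TagTextCollector(tag, attrs or {})
    parser.feed(page)
    parser.close()
    return [text.strip() for text in parser.results]


# Stand-in for a downloaded page (made-up data).
page = """
<table>
  <tr><td class="player">A. Skater</td><td class="points">12</td></tr>
  <tr><td class="player">B. Goalie</td><td class="points">3</td></tr>
</table>
"""
players = find_text(page, "td", {"class": "player"})
points = [int(p) for p in find_text(page, "td", {"class": "points"})]
print(dict(zip(players, points)))
```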

Michael #5: How pip install Works

via PyDist
What happens when you run pip install [somepackage]?
First pip needs to decide which distribution of the package to install.
- This is more complex for Python than many other languages
There are 7 different kinds of distributions, but the most common these days are source distributions and binary wheels.
A binary wheel is a more complex archive format, which can contain compiled C extension code.
Compiling, say, numpy from source takes a long time (~4 minutes on my desktop), and it is hard for package authors to ensure that their source code will compile on other people's machines.
Most packages with C extensions will build multiple wheel distributions, and pip needs to decide which if any are suitable for your computer.
To find the distributions available, pip requests https://pypi.org/simple/[somepackage], which is a simple HTML page full of links, where the text of the link is the filename of the distribution.
To select a distribution, pip first determines which distributions are compatible with your system and implementation of python.
- For binary wheels, it parses the filenames according to PEP 425, extracting the Python implementation, application binary interface, and platform.
- All source distributions are assumed to be compatible, at least at this step in the process
Once pip has a list of compatible distributions, it sorts them by version, chooses the most recent version, and then chooses the "best" distribution for that version
It prefers binary wheels if there are any
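
The selection steps above can be sketched roughly as follows. This is a simplified illustration, not pip's code: the `example_pkg` filenames are invented, version comparison handles only plain numeric versions, and compressed tag sets such as `py2.py3-none-any` are not expanded.

```python
from typing import NamedTuple, Optional


class WheelTags(NamedTuple):
    name: str
    version: str
    python: str
    abi: str
    platform: str


def parse_wheel_filename(filename: str) -> Optional[WheelTags]:
    """Split a wheel filename into name, version, and compatibility tags."""
    if not filename.endswith(".whl"):
        return None
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):  # 6 parts when an optional build tag is present
        return None
    python, abi, platform = parts[-3:]
    return WheelTags(parts[0], parts[1], python, abi, platform)


def version_key(version: str):
    # Simplification: only purely numeric versions such as "1.17.2".
    return tuple(int(piece) for piece in version.split("."))


def choose(filenames, supported):
    """Pick the newest version, preferring a compatible wheel over an sdist.

    `supported` is a set of (python, abi, platform) tuples for this machine.
    """
    candidates = []
    for filename in filenames:
        wheel = parse_wheel_filename(filename)
        if wheel is not None:
            if (wheel.python, wheel.abi, wheel.platform) in supported:
                candidates.append((version_key(wheel.version), 1, filename))
        elif filename.endswith(".tar.gz"):
            # Source distributions are assumed compatible at this stage.
            version = filename[: -len(".tar.gz")].rsplit("-", 1)[1]
            candidates.append((version_key(version), 0, filename))
    return max(candidates)[2] if candidates else None


links = [
    "example_pkg-1.0.0.tar.gz",
    "example_pkg-1.1.0.tar.gz",
    "example_pkg-1.1.0-cp37-cp37m-manylinux1_x86_64.whl",
    "example_pkg-1.1.0-cp37-cp37m-win_amd64.whl",
    "example_pkg-1.2.0-cp38-cp38-win_amd64.whl",
]
print(choose(links, {("cp37", "cp37m", "manylinux1_x86_64")}))
```

Sorting on `(version, is_wheel)` captures the two rules in one step: newest version first, then wheel over source for that version.
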
Determining the dependencies for this distribution is not simple either.
For binary wheels, the dependencies are listed in a file called METADATA. But for source distributions the dependencies are effectively whatever gets installed when you execute their setup.py script with the install command.
What happens, though, if one of the distributions pip finds violates the requirements of another? In the article's example, pip ignores the requirement and installs idna anyway!
Next pip has to actually build and install the package.
First it needs to determine which library directory to install the package in: the system's, the user's, or a virtualenv's.
Controlled by sys.prefix, which in turn is controlled by pip's executable path and the PYTHONPATH and PYTHONHOME environment variables.
Finally, it moves the wheel files into the appropriate library directory, and compiles the python source files into bytecode for faster execution.
Now your package is installed!
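
To see which library directory applies to the interpreter you are running, something like the following can help. It is a small diagnostic sketch, not what pip does internally; the `describe_install_target` name is made up for this example.

```python
import sys
import sysconfig


def describe_install_target() -> dict:
    """Report where packages for the running interpreter would be installed."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the interpreter it was created from.
    in_virtual_env = sys.prefix != getattr(sys, "base_prefix", sys.prefix)
    return {
        "prefix": sys.prefix,
        "in_virtual_env": in_virtual_env,
        "site_packages": sysconfig.get_paths()["purelib"],
    }


for key, value in describe_install_target().items():
    print(f"{key}: {value}")
```

Running it once with the system Python and once inside an activated virtual environment shows the prefix and library directory changing.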

Brian #6: daily pandas tricks

Kevin Markham is sending out one pandas tip or trick per day via twitter.
It’s been fun to watch and learn new bits.
The link is a sampling of a bunch of them.
Here’s just one example:

    Need to rename all of your columns in the same way? Use a string method:

    Replace spaces with _:
    df.columns = df.columns.str.replace(' ', '_')

    Make lowercase & remove trailing whitespace:
    df.columns = df.columns.str.lower().str.rstrip()

Extras

Michael:

Switched to Adobe Audition
Azure Databricks drops Python 2
Better Jupyter in VS Code
macOS Catalina (so far so good)

Jokes:

Episode Transcript

00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.

00:04 This is episode 152, recorded October 9th, 2019. I'm Michael Kennedy.

00:09 And I'm Brian Okken.

00:10 And this episode is brought to you by DigitalOcean. Check them out at pythonbytes.fm/digitalocean.

00:16 Get $50 credit for new users.

00:18 Now, we may have touched on this concept of legacy Python before, Brian. Have we covered it?

00:23 Yeah, I think we have.

00:24 We definitely have.

00:25 So we know that there are companies out there that say it's really tricky for us to upgrade to Python 3 because.

00:32 And sometimes that's because, I don't know, just they don't put the resources into it, right?

00:36 Like they would rather work on features rather than going back and rewriting old code to do the same thing.

00:42 But so it's not so old. Things like that.

00:44 Other times, it's because they have a ton of Python code.

00:48 And we're hearing more and more stories of these companies that have been like head in the sand, waiting until the very, very last minute to make those migrations.

00:57 And they're just like, all right, finally, somebody has raised it to the level that like it has to be dealt with, right?

01:03 Yeah.

01:03 Well, it turns out that banks use a lot of Python code, as we know.

01:07 And I've heard of Bank of America using a ton of code and having a lot of people working on some Python projects.

01:13 But JPMorgan, JPMorgan Chase, they use maybe even more.

01:18 They use a ton of Python.

01:19 So there's an article that's based on this presentation by Misha Tselman, who is the executive director at JPMorgan Chase about this.

01:28 It was given at PyData 2017.

01:30 So they've been working on it.

01:31 But the problem is they have 35 million lines of Python 2 code.

01:35 Oh, that's a lot.

01:36 In terms of Python code, that's kind of ridiculous, right?

01:40 That's an insane amount.

01:41 And so they've got a lot of Python code that has to be converted to Python 3.

01:45 And this is from their Athena trading platform, which is at the core of their business operations, right?

01:51 So they got a late start to migrating to Python 3.

01:54 And people are pointing out this could be a security risk for them, right?

01:57 Like we saw what happened with Equifax and some outdated things there.

02:02 Like who knows what the risks are?

02:04 I think it's probably less than something like the web frameworks that were out of date at other places.

02:09 But yeah, they have a lot of stuff that has to be migrated.

02:13 And internally, they use Python for pricing, trading, risk management, analytics, and even machine learning.

02:19 So just to look at some stats from this project, the feature set utilizes 150,000 Python modules, over 500 open source packages, 35 million lines of Python code contributed by 1,500 developers.

02:35 Okay.

02:36 So they got a big team.

02:37 That's a huge scale.

02:38 And by the way, I wonder how much JPMorgan Chase is contributing back to those 500 open source projects.

02:44 Hopefully some.

02:45 All right.

02:45 Now, it says they're going to miss the deadline, right?

02:48 That most of the strategic elements are going to be in place by Q1 2020.

02:53 But they can't do it all.

02:55 And I know it's probably like a good roadmap for folks, right?

02:58 They don't have to upgrade it all and then release that new thing, right?

03:02 They can upgrade elements at a time.

03:03 Yeah.

03:04 And there's a lot of great stories on how folks have done that.

03:06 I think probably the Instagram project was the most awesome one I've seen where they didn't even branch, right?

03:13 They just found a way to seamlessly move from Python 2 to 3 while still running on 2 and then finally flipping the switch.

03:18 Here's another one I thought you would find interesting, though.

03:21 They have some other stats.

03:23 You know, on your projects, how often do you commit code?

03:26 It's like once a week, once a day, once an hour.

03:29 Yeah, several times a day.

03:30 Yeah.

03:30 I'm kind of the same.

03:31 I do that.

03:32 And you guys don't really release stuff.

03:34 But like say the Python Bytes website or the Talk Python training site, you know, those probably do some form of website release every other day.

03:44 Some sort of deploy, restart, like run through the whole deployment process.

03:48 So JP Morgan Chase uses continuous delivery with continuous integration, continuous delivery with 10,000 to 15,000 production changes a week.

03:57 That's amazing.

03:59 It's like mind blowing, isn't it?

04:00 Yeah.

04:01 Yeah.

04:01 So they're on it, I guess.

04:03 But it's just such a project of massive scale that it's hard to get your mind around and hard to find analogies.

04:10 So I'm sure there's a few other projects like this in the world, but it can't be many.

04:13 No.

04:14 Well, that's like one a second or faster.

04:17 It's constantly deployed.

04:20 It's got to be microservices and other stuff, right?

04:22 Otherwise, just like how would you go to the website?

04:24 How would you use the services?

04:24 Anyway, quite incredible.

04:27 All right.

04:27 Well, what you got for our next one?

04:28 This is just kind of a cool little tool called Organize.

04:31 And it was suggested from Ariel Barkan on Twitter.

04:35 And I took a look at this and I'm going to start using it right away.

04:39 So it's a Python-based file management automation tool.

04:43 And the idea is people are lazy with how they save files and download files and whatever.

04:50 And on my Mac, for example, all the screenshots just show up on the desktop.

04:54 And then, you know, occasionally I'll just take everything and lump them into a clutter folder or something.

05:00 But this is a tool where you can give it rules.

05:02 It's a YAML file.

05:04 And say, have it do things like move all your screenshots from the desktop into a screenshots folder.

05:10 Or look through all your downloads to look at the incomplete downloads that you canceled or something.

05:16 They're still sitting there.

05:17 And just trash those if they're older than, you know, a few days old or something.

05:21 Like doing things like removing empty files from certain folders like your download or desktop or other places.

05:28 One of the examples is to organize your receipts and invoices into date-based folders.

05:33 Which is pretty cool because there's macros involved that you can look at the file touch time and figure out what date and extrapolate the dates and stuff.

05:42 And yeah, I always, when I'm paying bills or something, I save the receipt to just wherever in the downloads folder or something.

05:49 And having this, just running this every once in a while could clean it up and put everything in its place.

05:56 It's pretty cool.

05:56 It's super cool.

05:57 You could just put it on like a cron job that runs every five minutes or every minute or something, right?

06:01 It just goes, boop.

06:02 It's got to be super quick.

06:03 Just looks at the files and a few folders and then does some text matching.

06:08 It's one of those like, you know, automate the boring stuff sort of things that somebody thought, you know, everybody has this problem.

06:13 So, yeah, it's nice.

06:14 Yeah, I like it.

06:15 I have the same problem with receipts and stuff.

06:17 I'll get them an email or as a PDF attachment or actually it's just an email that I'll print it to PDF so that I can save it for taxes.

06:25 And they just like clutter up.

06:27 Yeah, it's, I could totally see just using that.

06:30 The rules seem like they're rich enough to do that.

06:32 So, yeah, it looks really good.

06:33 Yeah.

06:33 Super cool.

06:33 All right.

06:34 Speaking of cool, let me just tell you about DigitalOcean.

06:37 So all of our services run on DigitalOcean.

06:39 Audio you're listening to now somehow flowed through the DigitalOcean servers to get to you.

06:44 And they've got all sorts of great options out there.

06:47 They're simple but powerful.

06:49 There's not knobs to run absolutely every little edge case, right?

06:54 You set up the main server that you want to work with.

06:57 You have spaces.

06:58 You have hosted databases in MySQL and Postgres.

07:01 And you even have caching like Redis and things like that.

07:05 So super nice.

07:06 Check them out at pythonbytes.fm/digitalocean and get $50 credit for new users.

07:12 Highly recommended.

07:13 Now, this next one is a fun one.

07:16 And it took me a minute to realize what this was about, Brian.

07:19 So I realized there's this new PEP, PEP 589.

07:24 And it allows you to define typed dictionaries.

07:28 Like define a type that represents a dictionary.

07:30 Well, it turns out there was already a way to do that, which is why I was confused.

07:35 Because there's PEP 484, which has been around for a while, which lets you create a dict of K, V.

07:41 Which is like, here's a dictionary of arbitrary keys.

07:44 And it has maybe integers.

07:46 Or it has user objects.

07:47 Or whatever, right?

07:48 So you can define these uniform dictionaries, which is kind of interesting.

07:54 But this new PEP, it lets you go much farther.

07:59 It's proposed by Jukka Lehtosalo.

08:02 And it's actually sponsored by Guido van Rossum.

08:06 So remember recently we spoke about Guido.

08:08 And we had this philosophical debate of like, well, he's all about typing these days.

08:13 But originally typing was like explicitly left out of the language.

08:16 What's the story?

08:18 So here's another typing thing that he's participating in, which I think is interesting.

08:21 So this is accepted.

08:22 It's scheduled for 3.8.

08:24 So all sorts of interesting stuff.

08:26 And it's coming down the line, right?

08:28 Soon, actually.

08:29 So what it lets you do is imagine you have an arbitrary JSON document or an arbitrary Python

08:34 dictionary, really.

08:35 But it's super easy to think of like, well, somebody sends me a JSON request and I want

08:39 to treat it as if I know what's happening here.

08:42 It lets you actually specify the shape of those things, both the keys as well as the values and

08:51 potentially nested documents, right?

08:53 So you might have a JSON object that's got like some values.

08:56 One of those values might be a list of other JSON documents.

08:58 You can describe that with this TypedDict thing.

09:01 So the way it works kind of caught me off guard at first, but I think I like it.

09:05 So what you do instead of just saying, you know, there's a dictionary of like string comma

09:10 user, you actually create a class which derives from TypedDict, okay?

09:15 Okay.

09:16 And then it has fields.

09:17 It looks a lot like data classes a little bit.

09:19 So you might have like a name colon str and a year colon int in this thing that is not

09:25 actually the dictionary, but it is the type that validates the dictionary.

09:29 All right.

09:30 Oh, okay.

09:31 And then you can say it is one of those, right?

09:33 So I say the example they give is there's a movie.

09:35 So you say movie colon capital M movie is the name of the class.

09:38 And then it's just a dictionary, but the dictionary has the name, which is a string value and a

09:43 year, which is an integer value and so on.

09:45 And then you can actually validate it.

09:47 And the valid, the like static type checker, like mypy and so on.

09:51 Well, if you say movie of director, it'll say, no, no, no, you can't set this value into this

09:58 dictionary because it doesn't have a key called director.

10:00 Or if you try to set the year to the string 1982 in quotes, it'll say, no, no, then this

10:06 is a string, it expected an integer.

10:08 But the errors come at the type checking time, right?

10:10 This is a type checking time.

10:12 Although, you know, it's totally reasonable that things like PyCharm and VS Code would add

10:17 edit time checking for this as well.

10:19 Cause they do for all the other type stuff.

10:20 Yeah, but it's not a runtime.

10:21 It's not a runtime thing.

10:22 Yeah.

10:23 All the typing stuff.

10:24 Okay.

10:24 And this is definitely that way.

10:26 So you're not like re-implementing the dictionary.

10:28 You're not creating a dictionary type that is like different.

10:32 You create a type, which then talks about just a plain dictionary.

10:37 So quite interesting actually.

10:39 Yeah.

10:40 It does take a little while to look at it and go, does this make sense?

10:42 But yeah, it does.

10:43 Right.

10:43 Imagine you're getting, you're writing an API and somebody's submitting like a JSON post to

10:48 you and you want to know, is it valid?

10:50 Right.

10:51 You could use this basically to validate your schema or at least describe the schema you

10:54 expect.

10:55 Yeah.

10:55 Neat.

10:56 It is neat.

10:56 Speaking of APIs and new web things, your next one is one of those, right?

11:00 I got carried down that rabbit hole.

11:03 No, that's cool.

11:04 The next one, I was just enticed by the name.

11:07 So there's a package called gazpacho.

11:09 It's just great.

11:11 It's fun to say.

11:12 It's fun to eat.

11:13 But anyway, gazpacho is a web scraping library.

11:17 And the goal of it is to replace requests and Beautiful Soup for most web scraping projects.

11:25 And I got to tell you, I was going to do, I have some web scraping projects that I wanted

11:29 to do.

11:29 And I know that requests and Beautiful Soup are easy to use and are super powerful.

11:33 But that one use case where you're just grabbing, like you're just doing a get, then you parse

11:41 it and then you find some stuff in it and separate it out.

11:44 That's so common that this is basically it's optimizing for that.

11:49 There's an example article that I'll link to also that uses gazpacho to scrape hockey data

11:54 for the use of fantasy sport use.

11:57 But it's just a really simple interface.

11:59 You import from gazpacho, you import get and Soup as a class.

12:04 And you can use those to grab some HTML and parse it, find some stuff in there.

12:10 It's just a handful of lines of code and you've got a web scraper on your hands.

12:15 So I like it.

12:16 I think I'll give it a shot.

12:18 But I tried it out and I wanted to bring this up because I tried it out and I ran into a

12:22 problem that I was getting these certificate errors.

12:26 Have you ever gotten certificate errors when you're trying to parse things or pull things

12:29 down?

12:30 Yeah, just once or twice.

12:31 And it's the kind of thing where you bounce off the walls of Stack Overflow until you get

12:36 it fixed and you forget how to fix them.

12:38 But yeah, so what did you do?

12:40 I did the same thing, went to Stack Overflow.

12:42 And apparently within the, and I don't know if this is just a Mac thing or not,

12:48 but on Macs at least, when you install Python, you also, in the install directory in applications,

12:53 Python 3, whatever, there's a file called Install Certificates.command.

12:58 And you just have to run that.

13:01 And then it has the list of certificates or something.

13:04 I don't know how certificates work, but it makes it so that you can access SSL stuff from

13:10 Python.

13:11 So yeah.

13:12 Ran into that today.

13:12 That's right.

13:13 I'm glad you're linking to it.

13:14 So now we'll have it for forever.

13:16 Yeah.

13:17 Yeah.

13:17 That's cool.

13:18 It's nice.

13:19 Gazpacho is like two to three times faster than Beautiful Soup, which is pretty sweet.

13:24 I like that.

13:25 It also does a lot less.

13:26 So that makes sense.

13:27 Yeah, for sure.

13:28 It's a more focused thing.

13:29 Yeah.

13:30 That's like the 80% case though, right?

13:31 You just need to go do simple things.

13:33 That's what I'm going to use it for.

13:34 So the last thing I want to cover for our main items is pip.

13:38 So remember, actually, we spoke about PyDist, P-Y-D-I-S-T?

13:42 Yeah.

13:42 This is like a private PyPI as a service, I guess.

13:46 It's kind of the way I would describe it.

13:49 So right now, I think they, before we had talked about this and we're like, well, it just

13:56 in beta doesn't seem to have any pricing or anything like that.

13:59 So they have pricing and a little bit more details.

14:02 They've more or less launched at this point.

14:04 And so this article is not about this, but it was written by the folks who run that.

14:09 Just that's the connection back to the previous thing.

14:11 And it talks about how pip install works.

14:15 So for this section, I just want to talk to you real quick about when you say pip install

14:21 certifi, like it did in that previous article you just mentioned to fix your certificates.

14:25 What do you do?

14:26 How does it work?

14:26 All right.

14:27 So it walks you through all the steps and all the decisions and whatnot that pip has to make

14:32 when you say pip install some package.

14:34 So the first thing it has to decide, well, first, I guess it does the package exist, right?

14:39 And then it needs to figure out which distribution of the package to install.

14:44 Because we have eggs, we have wheels, we have source.

14:49 We have all these different types of distributions.

14:52 There are seven different kinds of distributions.

14:54 But the most common are either source distributions or binary wheels.

15:00 So focus on those, right?

15:01 So source distribution is just, here's your Python code and maybe the C code that comes with it.

15:07 And as part of the setup, we're going to run a compiler against the C code to make sure that

15:11 that's compiled in your machine, right?

15:13 Super easy to write, not so easy to make sure it works everywhere, not just works on my machine,

15:19 right?

15:19 Because you've got to have compilers and all the platforms.

15:22 And oh, yeah, what about that old version of Windows that was a minimal install and doesn't

15:26 have GCC or Visual Studio or whatever?

15:29 So wheels are a little bit more safe and also faster.

15:33 But that means they have compiled C code, which has to be, you have to have multiple ones for

15:39 different platforms, right?

15:40 So Windows versus macOS or something.

15:43 Yeah.

15:43 The benefit is stuff installs fast, right?

15:45 So like NumPy takes about four minutes to compile from source.

15:50 So if you did a source dist of NumPy, pip install might be slower than you would otherwise

15:55 expect, right?

15:56 So anyway.

15:57 Yeah.

15:59 The four minute pip install, yes.

16:00 Yeah, that's before you even hit the dependencies, right?

16:03 That's just the primary thing.

16:05 Yeah.

16:05 Okay.

16:05 So it has to figure out which one of those are.

16:07 And there's actually a known URL.

16:10 So like pypi.org slash simple slash package name is where you would go.

16:15 So you could go to that slash requests, for example.

16:19 And there is a huge just flat, it's like an, it's a weird API.

16:24 It's like HTML list of like a bunch of wheels with platform names and tarballs and all

16:30 sorts of stuff.

16:31 So it starts out by going there to figure out what is here.

16:34 What can I find?

16:36 So first it determines what system you're on and what's compatible with things.

16:43 So like if you have a binary wheel, there's actually a PEP that talks about how you figure

16:47 out which one that is.

16:48 And then if it's a source dist, like, well, you just assume it works.

16:52 So once it has that, then it'll try to get the best and it prefers wheels.

16:56 And then it has to figure out the dependencies.

16:58 So for binary wheels, there's a file called METADATA that has a list of those.

17:02 So that's cool.

17:03 You can just look at that.

17:04 If it's a source distribution, it figures it out by running the setup.py.

17:09 So that's interesting.

17:11 So to run setup.py to actually figure out what dependencies it has to install, you know,

17:15 go do that.

17:15 And then you might have two dependencies.

17:18 You might have a thing and you might depend on, let's say, BeautifulSoup.

17:22 But you also have some other library that also depends on BeautifulSoup if you follow the

17:26 dependency tree.

17:27 And they might even specify versions.

17:29 So you might wonder, well, what happens if one depends on one version and the other depends

17:33 on the other?

17:34 Turns out it just installs it anyway.

17:36 Let's take the latest.

17:37 That's going to be fine, right?

17:38 It's different than like a requirements file that has like different dependency, like pinned

17:43 versions.

17:44 Like there's a slight difference there.

17:45 So finally gets it, builds it, installs it.

17:49 And then it has to figure out where's the path?

17:51 Am I going to install it to a virtual environment?

17:53 Am I going to install it into the system or the user path?

17:57 Things like that.

17:58 So you can look at sys.prefix to figure out which one of those are.

18:02 And there's some environment variables.

18:03 And finally, it copies it over in the right place.

18:07 And your package installed.

18:08 Oh, before it considers your package installed.

18:09 Also converts the source files into PYC bytecode files.

18:13 So they don't have to get parsed again.

18:15 Then your package is installed.

18:16 Okay.

18:17 Yeah.

18:18 So anyway.

18:18 Simple.

18:19 Yeah.

18:19 So if you're wondering like what happens as part of the pip install stuff, there's a

18:22 lot of details.

18:23 And I didn't cover all of it.

18:24 But like, you know, as much as I thought made sense.

18:27 I was just curious.

18:28 I was going to try to find one of those complicated packages.

18:31 Yeah.

18:32 That I knew had to be compiled.

18:33 Because I went to a couple of mine.

18:35 And they're just Python codes.

18:37 So there's just one per version.

18:39 One wheel.

18:40 But like NumPy, for instance.

18:43 I know it's got some compiled code in it.

18:45 It's got like, I lost count.

18:47 It's like 15, 16, 17 different wheels for each version.

18:51 Yeah.

18:51 Requests has got a ton as well.

18:53 Yeah.

18:53 It's interesting.

18:54 It is interesting how that works.

18:55 I'm glad it all works.

18:56 I don't have to think about it too much.

18:58 I don't have to think about it either.

18:58 But it turns out there's like a lot of conversation in there about some stuff that is not totally

19:04 solved even today.

19:05 Right.

19:06 about trying to resolve the dependencies in a totally predictable way before you start

19:11 installing anything and stuff like that.

19:13 So it's worth checking out.

19:14 It's a hard problem.

19:15 Yep.

19:15 For sure.

19:16 But want to finish up with a cool trick?

19:17 Like a zoo trick?

19:18 A zoo animal trick?

19:19 Oh, yeah.

19:20 I'm just zoning today.

19:22 So Kevin Markham, he runs, what's the thing he runs?

19:26 Data School.

19:27 Data School.

19:28 Plus a super nice guy.

19:30 Well, he's doing something neat that's called Daily Pandas Tricks or Tricks and Tips or

19:35 something like that.

19:36 But anyway, we've got a link to it.

19:38 He's sending out a little tip or trick about pandas every day on Twitter.

19:44 And the page we're linking to has a whole bunch of them already built in.

19:48 And I like the notion of just trying to fit something.

19:52 Often they're little screenshots, but they're still pretty small.

19:55 A little lesson of how to do something cool.

19:58 I just picked out one, which is like, let's say you wanted to rename all of the columns

20:03 in a data frame the same way, like to replace all the spaces with underscores or something.

20:08 And he just shows you how to do that in a little thing.

20:11 I think that's neat, especially for something, for a package like pandas, there's a whole bunch

20:16 of stuff you can do with it to have a way to just see a little extra new thing every

20:22 day to say, that's something I might use.

20:24 I'll keep looking at that later or something.

20:26 So I don't think we've talked about it before, and I think it's a cool thing he's doing.

20:30 So I wanted to highlight it.

20:31 Yeah, it's definitely a cool thing he's doing.

20:33 And pandas is one of those things where it's not always obvious.

20:37 All the little magic that you can do, right?

20:39 Like if you want to go to the columns and do string operations, just dataframe.columns.str.apply

20:47 your operation, right?

20:48 Like that's, after you use it for a while, it's obvious, but maybe not, not right away.

20:52 It definitely isn't to me.

20:53 Pandas feels a little like magic to me.

20:55 I'm looking at this going, I would not have guessed that.

20:57 Exactly.

20:57 It's not obvious.

20:59 But once you know it, it's like, well, of course that's better than like, there's this saying

21:03 that if you find yourself looping over things in like NumPy or pandas, you're probably doing

21:08 it wrong.

21:08 One of the nice fun things I think is if you get really good at something, you'll start

21:14 learning the things that you shouldn't do, but that are fun.

21:16 And some of Kevin's tips are, you can do this.

21:21 It's sort of fun, but don't because it's confusing to other people.

21:25 But anyway, here's the trick.

21:26 Nice.

21:27 It's neat that he's including those.

21:29 It's clever, but too clever sometimes.

21:30 Cool.

21:31 All right.

21:31 So do you have any extras to share?

21:33 Oh, only that we just got finished with our first Python West meetup.

21:38 It was last night, and it was both exhausting and really fun.

21:42 So thanks for helping out with that.

21:44 Yeah, you bet.

21:45 Good job putting it together.

21:46 It came out really well.

21:47 Everyone seemed to have a great time.

21:49 There was a totally good turnout.

21:51 I was blown away that it was actually, you know, basically sold out, not sold out, but booked

21:55 out on its very first run, which is crazy.

21:58 And people out there listening, if you want to come and give a talk at the meetup and you're

22:04 willing to find your way to Portland, shoot a message to Brian or me.

22:08 Yeah.

22:08 Let us know.

22:09 That'd be cool.

22:09 Yeah.

22:09 Would be cool.

22:10 And then before anybody asks, it was not recorded.

22:12 So yeah, you have to be here.

22:15 Yeah.

22:15 How about you?

22:16 You got some news to share?

22:17 I got all sorts of stuff.

22:18 A few really quick things.

22:20 One, I upgraded to macOS Catalina yesterday.

22:23 And so far, so good.

22:25 No major problems.

22:26 All the Python things seem to be working.

22:28 So if you're wondering, I did hear that someone out there was having trouble with Miniconda.

22:32 I don't use Miniconda.

22:34 So I have no idea about that.

22:35 Maybe do a Google search if that matters to you.

22:37 Also, Brian, I switched to working with Adobe Audition.

22:40 I've been using Audacity and GarageBand.

22:44 Finally broke down and paid the $30 a month for Adobe Audition.

22:48 And wow, is it worth it.

22:50 It is so good.

22:51 What has been wrong with me to not do that?

22:53 I just didn't want to learn new software.

22:54 It's not so much about the money.

22:55 It's just like, I don't want to learn new hotkeys.

22:57 I already know the hotkeys.

22:59 But it's so super good.

23:00 The reason I bring it up on the show instead of after is if you hear like weird artifacts

23:04 or something odd in the audio, call our attention to it.

23:08 Because there's all these dials and knobs that can like do things like chop off the S's

23:11 at the end of words if you turn them too far and stuff like that.

23:14 So hopefully things sound better.

23:16 If they don't, let us know.

23:18 And then the two Python related things.

23:20 Really quick, Azure Databricks also is dropping support for Python 2.

23:25 So just one more brick to fall for Legacy Python.

23:28 A Python death clock continues to toll for those who hang on to their Python 2.

23:34 And the folks over on the VS Code team, Rong Lu in particular, just announced that at PyCon

23:43 China, they just revealed a cool new Jupyter UI, variable explorer, and IntelliSense stuff for basically

23:50 running Jupyter notebooks inside of VS Code.

23:54 So if you're a VS Code user and you care about Jupyter, check that out.

23:57 Very cool.

23:57 Yeah, absolutely.

23:58 Absolutely.

23:58 Well, that's it for the stuff.

24:00 I got a story for you, a joke maybe.

24:02 Yes, please.

24:03 This one comes to us from maybe an unexpected space.

24:06 It comes from a person on Twitter who goes by the Sarcastic Pharmacist, who sent us this actually

24:10 really good joke and a nice comment.

24:12 And the theme is that it's hard to distinguish between what is like super easy in programming

24:19 and what is like nearly impossible for people who are not doing the programming themselves.

24:25 So this is actually XKCD comic 1425.

24:28 It's got a programmer, a woman sitting there working at her desk and there's like a manager

24:34 type who comes up and is issuing feature requests.

24:37 Okay.

24:38 The manager, I'm going to think of one of the people from Office Space maybe.

24:41 And he comes over and says, when a user takes a photo with the app, it should check whether

24:47 they're in a national park.

24:48 And the woman says, sure, easy.

24:50 Easy GIS lookup, give me a few hours.

24:52 Oh yeah.

24:52 And it should also check whether the photo is a bird.

24:54 She says, I'll need a research team and five years.

24:58 The subtitle is, in CS it can be hard to explain the difference between the easy and the virtually

25:03 impossible.

25:03 Yeah.

25:04 So there you go.

25:07 Yeah.

25:08 I don't know.

25:08 That resonates a lot with me at least.

25:10 Yeah.

25:10 We'll probably get a bunch of the image people telling us that it's like five minutes now with

25:15 all the new image libraries to do a bird.

25:17 Yeah.

25:17 But that's now, right?

25:19 Like we probably should, I should see if there's a date for this just to be fair.

25:23 They don't have dates on these.

25:24 That's kind of funky.

25:25 All right.

25:26 Anyway.

25:26 Well, there's probably some algorithm that figures out the number of the XKCD and maps

25:30 it back to a date.
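One way to do that mapping, sketched under some assumptions: xkcd publishes JSON metadata for each comic at a URL of the form used below, with "year", "month", and "day" fields given as strings. The helper names here are hypothetical, and the fetch step needs network access.

```python
import json
from datetime import date
from urllib.request import urlopen


def comic_url(number: int) -> str:
    """Build the JSON metadata URL for a given comic number."""
    return f"https://xkcd.com/{number}/info.0.json"


def comic_date(metadata: dict) -> date:
    """Turn the 'year', 'month', and 'day' string fields into a date."""
    return date(
        int(metadata["year"]), int(metadata["month"]), int(metadata["day"])
    )


def fetch_comic_date(number: int) -> date:
    """Download the metadata for one comic and return its publication date."""
    with urlopen(comic_url(number)) as response:  # requires network access
        return comic_date(json.load(response))
```

Calling fetch_comic_date with a comic number would then return the date that comic was published, assuming the site is reachable.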

25:30 But yeah.

25:31 Yeah.

25:32 But that's funny.

25:32 Cool.

25:33 All right.

25:33 Well, great to chat with you as always.

25:35 Thank you.

25:36 Yep.

25:36 Bye.

25:37 Thank you for listening to Python Bytes.

25:39 Follow the show on Twitter via @pythonbytes.

25:41 That's Python Bytes as in B-Y-T-E-S.

25:44 And get the full show notes at pythonbytes.fm.

25:47 If you have a news item you want featured, just visit pythonbytes.fm and send it our way.

25:51 We're always on the lookout for sharing something cool.

25:54 On behalf of myself and Brian Okken, this is Michael Kennedy.

25:57 Thank you for listening and sharing this podcast with your friends and colleagues.
