WEBVTT

00:00:00.620 --> 00:00:05.240
This is Python Bytes, Python headlines and news delivered directly to your earbuds.

00:00:05.240 --> 00:00:09.420
It's episode 10, recorded Monday, January 23rd, 2016.

00:00:09.420 --> 00:00:12.400
This is Michael Kennedy. I'm here with Brian Okken.

00:00:12.400 --> 00:00:14.760
Hey, Brian. You ready to talk about Python for the week?

00:00:14.760 --> 00:00:15.960
I'm very ready, yes.

00:00:15.960 --> 00:00:17.960
Yeah, we got some really cool stuff.

00:00:17.960 --> 00:00:21.360
And we have a pretty interesting data story to wrap things up

00:00:21.360 --> 00:00:24.560
where people are out rescuing data with Python, I'm pretty sure.

00:00:24.560 --> 00:00:28.400
So let's save that for the end and let's get started with time.

00:00:29.120 --> 00:00:35.680
We had mentioned in episode 7 that Matplotlib 2.0 was coming out, and it is.

00:00:35.680 --> 00:00:38.060
It's official now as of January 17th.

00:00:38.060 --> 00:00:46.060
And there's an article from a website called Black Arbs called Advanced Time Series Plots in Python.

00:00:46.060 --> 00:00:50.500
And I thought it was a nice focused tutorial.

00:00:50.500 --> 00:00:55.920
Matplotlib does a whole bunch of stuff, and it's a little bit overwhelming if you only use it once in a while.

00:00:55.920 --> 00:00:58.120
So this is a focused tutorial.

00:00:58.240 --> 00:00:59.280
Tutorial looks good.

00:00:59.280 --> 00:01:06.620
It uses pandas, numpy, Matplotlib, of course, and a library called Seaborn that I had never heard of before.

00:01:06.620 --> 00:01:11.740
It's a statistical data visualization add-on, I think, for Matplotlib.

00:01:11.740 --> 00:01:12.440
Looks good.

00:01:12.780 --> 00:01:22.780
But the article starts with an empty XY chart and gets some financial data from Yahoo and does some manipulation of it to plot it out.

00:01:22.880 --> 00:01:26.800
But then, now, the defaults aren't ever exactly what you want.

00:01:26.800 --> 00:01:38.940
So this tutorial goes through how to add, like, shaded areas for, in this example, it was for times where there were recessions.

00:01:39.300 --> 00:01:43.360
If there's something special about part of your data you want to highlight, there's a way to do that.

00:01:43.360 --> 00:01:48.080
Adding chart titles and axis labels and styling the legend.

00:01:48.080 --> 00:01:49.440
I never knew you could do this.

00:01:49.440 --> 00:01:52.360
The different formatting for the XY tick labels.

00:01:52.360 --> 00:01:57.120
And if, like, this chart had both positive and negative numbers.

00:01:57.620 --> 00:02:05.420
And if you want to draw a zero line or a line anywhere that's a horizontal line, it shows how to do that.

00:02:05.420 --> 00:02:06.560
Turning on data points.

00:02:06.560 --> 00:02:15.160
And one of the things I didn't even know you could do, a couple things, were to put chart annotations to describe specific plot points.

00:02:15.160 --> 00:02:19.820
And then put arrows pointing to that piece of the data.

00:02:19.820 --> 00:02:23.340
And then adding logos and watermarks over the top.

00:02:23.340 --> 00:02:26.580
And it's a nice, short, concise tutorial.

00:02:26.580 --> 00:02:27.220
I like it.

00:02:27.220 --> 00:02:28.140
Yeah, it's really nice.

00:02:28.140 --> 00:02:30.220
And it's super approachable.

00:02:30.220 --> 00:02:33.200
It just shows you, like, here's the code steps from here to there to there.

00:02:33.200 --> 00:02:36.140
Get some financial data from Yahoo and do some graphs.

00:02:36.140 --> 00:02:40.540
And, yeah, if you need to graph stuff, right, Python is getting better and better.

00:02:40.540 --> 00:02:41.800
So very, very nice.

00:02:41.800 --> 00:02:42.320
Yeah.

00:02:42.320 --> 00:02:45.420
Speaking of pictures, Instagram has a lot of pictures.

00:02:45.420 --> 00:02:46.320
Yeah.

00:02:46.320 --> 00:02:47.020
Yeah.

00:02:47.020 --> 00:02:54.040
And they actually run a significant amount of traffic through Django and through Python.

00:02:54.420 --> 00:02:59.200
They use a pretty interesting way of configuring their servers.

00:02:59.200 --> 00:03:03.040
It's not at all unlike the way I do for my Talk Python websites as well.

00:03:03.040 --> 00:03:04.340
In fact, Python bytes as well.

00:03:04.340 --> 00:03:07.180
And so let me just quote from their little article here.

00:03:07.180 --> 00:03:09.520
And it says, how we run our web server.

00:03:10.140 --> 00:03:19.180
Instagram's web servers run Django in a multi-process mode with a master that process that forks itself a dozens of times to create worker processes that actually handle the requests.

00:03:19.180 --> 00:03:30.940
And they're doing that with MicroWSGI in a pre-fork mode so they can leverage shared memory across the master and the 20, 50, whatever worker processes that it controls.

00:03:31.060 --> 00:03:33.680
Because they're all kind of running the same code for the most part.

00:03:33.680 --> 00:03:38.980
So why do you need, you know, 20 copies of Python loaded into memory and things like that, right?

00:03:38.980 --> 00:03:42.620
So they're thinking about how can they get their code to run faster?

00:03:42.620 --> 00:03:49.360
How can they run more of these forked processes and basically add more concurrency per server?

00:03:50.040 --> 00:03:52.540
And so they came up with a pretty radical idea.

00:03:52.540 --> 00:03:55.840
They said, what if we just turn off garbage collection in Python?

00:03:55.840 --> 00:03:58.600
That seems bad, right?

00:03:58.600 --> 00:03:59.900
It does seem bad, yeah.

00:03:59.900 --> 00:04:04.680
So they wrote an article called Dismissing Python Garbage Collection at Instagram.

00:04:04.680 --> 00:04:12.420
What they did is they had a thing where they would monitor the memory of the worker processes and I believe restart them if the memory got too out of control.

00:04:12.420 --> 00:04:15.540
I don't know why they're doing that, but for whatever reason they did.

00:04:15.540 --> 00:04:21.760
And I think that they found that the shared memory was much, much less shared than they expected.

00:04:21.760 --> 00:04:30.600
So they said that each worker process was like 225 megs of memory usage and they're only sharing like 140 megs, even though they're basically all doing the same thing.

00:04:30.600 --> 00:04:39.000
So they went to look and sort of do some profiling and they thought that there was some Linux copy on write behaviors happening due to reference counting.

00:04:39.000 --> 00:04:40.820
So they said, all right, let's turn off reference counting.

00:04:40.980 --> 00:04:48.960
That seems like really bad because for those of you guys who don't know, Python's GC or sort of memory management works in two modes.

00:04:48.960 --> 00:04:57.120
There's a direct reference counting deterministic thing that when the last reference to an object goes away, it's typically deleted instantly.

00:04:57.120 --> 00:05:03.420
But the problem with reference counting for garbage collection and memory management is you can get cycles.

00:05:03.760 --> 00:05:05.900
So there's a GC that manages cycles.

00:05:05.900 --> 00:05:06.420
Okay.

00:05:06.420 --> 00:05:09.480
And so what happened is they tried turning off reference counting.

00:05:09.480 --> 00:05:10.100
That didn't help.

00:05:10.100 --> 00:05:11.720
They said, all right, well, let's turn off GC.

00:05:11.720 --> 00:05:19.960
And they said when they turned off garbage collection, they were still able to run about the same, you know, only like accepting sort of memory leaks from cycles, I guess.

00:05:19.960 --> 00:05:25.020
And they said they successfully raised the shared memory from 140 megs to 225.

00:05:25.020 --> 00:05:35.060
I guess that was a little bit the wrong number, but they dramatically increased the shared memory and they were able to drop the memory usage per server by eight gigs.

00:05:35.060 --> 00:05:38.840
And they saved 25% RAM on their entire Jenga fleet.

00:05:38.840 --> 00:05:49.820
And in fact, they actually got it to run 10% faster due to the way the rearranged memory done by garbage collection actually lines up with CPU caches and things.

00:05:50.020 --> 00:05:56.420
So they wrote this really detailed article with like lots of stats and the command you can run to do these things.

00:05:56.420 --> 00:06:12.300
And they said, actually, for us, for our weird ultra high performance, ultra heavy load servers, it makes sense to turn off Python's garbage collection and basically set up a custom shutdown thing that doesn't collect too much memory on the way out the door.

00:06:12.300 --> 00:06:13.440
Isn't that interesting?

00:06:13.440 --> 00:06:15.000
It's very interesting.

00:06:15.000 --> 00:06:17.620
I didn't even know you could turn off the garbage collection.

00:06:18.500 --> 00:06:23.100
There's apparently a way that you can do it, but it's not simple.

00:06:23.100 --> 00:06:23.320
Yeah.

00:06:23.320 --> 00:06:29.700
And they ran into problems like they turned it off and other libraries would ensure that it was on and turn it back on and things like that.

00:06:29.700 --> 00:06:30.660
They're like, no, I want it off.

00:06:30.660 --> 00:06:33.760
And so they had, I think they had to like recompile some things.

00:06:33.760 --> 00:06:36.080
You know, it's not like a flag.

00:06:36.080 --> 00:06:37.840
But anyway, it's pretty interesting.

00:06:37.840 --> 00:06:39.120
It's definitely an interesting story.

00:06:40.220 --> 00:06:41.520
Yeah, it's a thing.

00:06:41.520 --> 00:06:41.520
It's a thing.

00:06:41.520 --> 00:06:44.640
It questions some assumptions about how things work.

00:06:44.640 --> 00:06:48.520
And it's sometimes, it definitely does not sound pretty.

00:06:48.520 --> 00:06:53.620
But, you know, maybe you're optimizing that last 10, 20%.

00:06:53.620 --> 00:06:55.160
It's not going to be pretty.

00:06:55.160 --> 00:06:58.100
But this is a pretty interesting look at what Instagram is doing.

00:06:58.200 --> 00:07:03.960
I just like to look at these large scale deployments and people have handling tons of requests, what they're doing.

00:07:03.960 --> 00:07:04.760
It's always interesting.

00:07:04.760 --> 00:07:05.520
Yeah.

00:07:05.520 --> 00:07:12.060
And also somebody else out there might be thinking, hey, maybe we could turn it off and try to learn from Instagram.

00:07:12.060 --> 00:07:13.020
Yeah, absolutely.

00:07:13.020 --> 00:07:14.140
They detailed it out there.

00:07:14.140 --> 00:07:15.400
They have a nice engineering blog.

00:07:15.400 --> 00:07:17.140
They put interesting stuff up there frequently.

00:07:17.640 --> 00:07:20.960
But Brett Cannon, no, I forget where Brett Cannon works, but.

00:07:20.960 --> 00:07:24.940
Brett Cannon works with Dino Veland over at Microsoft.

00:07:24.940 --> 00:07:31.620
I think the Azure Data Science team and their Brett Cannon is a Python core developer.

00:07:31.620 --> 00:07:32.620
Yeah.

00:07:32.620 --> 00:07:36.180
And he's, we've learned a lot of interesting things from Brett.

00:07:36.180 --> 00:07:39.640
But he's got a blog called snarky.ca.

00:07:39.640 --> 00:07:43.880
And he put up my experience with type hints in mypy.

00:07:44.060 --> 00:07:52.280
I was attracted to this article because I have sort of been watching this type hint thing with a little skepticism.

00:07:52.280 --> 00:07:55.860
So I'm not really sure if it's going to help, how it can help me.

00:07:55.860 --> 00:07:58.380
So I was interested in reading this.

00:07:58.380 --> 00:08:12.180
He took a project that he supports and then used both mypy and type hints to try to, who's hoping to improve the quality of it.

00:08:12.820 --> 00:08:14.740
And it's kind of an interesting project.

00:08:14.740 --> 00:08:16.720
It's a, I didn't even know what CLA was.

00:08:16.720 --> 00:08:20.460
So it's a CLA enforcement bot for the Python Software Foundation.

00:08:20.460 --> 00:08:23.780
And that's the contributor license agreement.

00:08:23.780 --> 00:08:32.680
So the idea is if somebody pushes or submits a pull request into one of the Python projects,

00:08:32.680 --> 00:08:41.100
this robot goes out and makes sure that the person that is pushing or requesting the pull request has assigned CLA.

00:08:41.340 --> 00:08:43.480
It sounds like it's obvious why this is needed.

00:08:43.480 --> 00:08:49.760
But it's a small enough project to run this experiment on of adding type hints.

00:08:49.760 --> 00:09:00.500
And it's not a very long article, but he points to his pull request for this, the changes he made, and the lessons learned.

00:09:00.820 --> 00:09:02.660
I didn't know what mypy was before.

00:09:02.660 --> 00:09:12.480
So mypy is a static analysis tool that makes sure you're, I think it just makes sure your type hints are correct or uses the type hints for linting.

00:09:12.480 --> 00:09:13.820
Yeah, I think so.

00:09:13.980 --> 00:09:19.940
And, you know, the PEP that you talked about, originally, I think that was all Python 3 stuff.

00:09:19.940 --> 00:09:27.740
I'm pretty sure that that all works on Python 2 now, at least with some of the tooling, that the type hint ideas are being backported,

00:09:27.740 --> 00:09:34.800
with the idea being that if we can take a lot of this legacy Python code, give it some static type definitions,

00:09:34.800 --> 00:09:41.880
and then use tooling more intelligently with more information to check the upgrade to Python 3.

00:09:41.880 --> 00:09:47.380
So actually, it can provide some structure to the older code, actually.

00:09:47.380 --> 00:09:51.860
So it's, you know, I always thought of type hints as being this thing that only Python 3 gets,

00:09:51.860 --> 00:09:53.540
and it's kind of this new deal.

00:09:53.720 --> 00:10:01.140
But, you know, it may turn out to be most important in crossing that Python 2 to Python 3 chasm, which is pretty interesting.

00:10:01.140 --> 00:10:02.580
It'd be an interesting benefit.

00:10:02.580 --> 00:10:04.060
You know what else is interesting?

00:10:04.060 --> 00:10:11.060
The second largest, the second most active contributor to this mypy project, Guido Von Rossum.

00:10:11.060 --> 00:10:11.820
Oh, really?

00:10:11.820 --> 00:10:12.560
Yeah.

00:10:12.560 --> 00:10:14.660
He's really doing a lot with this.

00:10:14.660 --> 00:10:17.700
There was a couple hiccups that he ran into.

00:10:17.700 --> 00:10:24.440
Apparently, there's some of the 3.6 isn't fully supported, but I'm sure that'll get supported fairly quickly.

00:10:24.440 --> 00:10:30.300
What I really wanted to highlight was, I'm going to quote from his conclusion.

00:10:30.300 --> 00:10:38.460
After this experiment, he asked himself, would he bother using typing and static typing in new Python 3 code?

00:10:38.460 --> 00:10:40.000
And essentially, he said yes.

00:10:40.000 --> 00:10:45.280
Once it supports f-strings, apparently, at the time, it doesn't.

00:10:45.280 --> 00:10:47.040
I don't know if it's fixed yet.

00:10:47.040 --> 00:10:52.140
I think it only supported Python 3.5 and f-strings, of course, 3.6, which is shiny new.

00:10:52.140 --> 00:10:59.980
But he says, when I design an API, I already have to think about what type of objects would be acceptable.

00:10:59.980 --> 00:11:03.280
So quickly writing down my assumptions doesn't hurt anything.

00:11:03.280 --> 00:11:07.520
It's relatively quick, and it benefits anyone having to work with my code.

00:11:07.520 --> 00:11:12.700
But I also wouldn't contort my code to fit within the confines of type hints.

00:11:13.000 --> 00:11:17.220
For example, if type hints force me to write cleaner code, then that's great.

00:11:17.220 --> 00:11:26.400
But if something is so dynamic that it can't have type hints, then that's fine, and I'll happily use typing.any as an escape hatch.

00:11:27.100 --> 00:11:35.800
In the end, I view type hints as enhanced documentation that has tooling to help verify that the documentation about types is accurate.

00:11:35.800 --> 00:11:40.340
And for that use case, I see type hints worth doing and not at all a burden.

00:11:40.340 --> 00:11:44.520
And he also mentions that it didn't make his code less readable.

00:11:44.520 --> 00:11:46.560
It only enhanced the readability.

00:11:46.840 --> 00:11:54.800
And since I'm all for making your code more readable, this article actually gives me a more positive light on using types.

00:11:54.800 --> 00:11:55.920
Yeah, that's cool.

00:11:55.920 --> 00:11:59.380
I do periodically put type hints in my Python 3 code.

00:11:59.380 --> 00:12:02.460
And I find that I usually do it when I'm looking at a function.

00:12:02.460 --> 00:12:09.220
I'm like, you know, I don't really remember the context in which this is called and what is being passed in here.

00:12:09.280 --> 00:12:12.580
So let me make it really explicit at the function signature level.

00:12:12.580 --> 00:12:16.240
And then the tooling can, of course, tell me if we're doing it wrong somewhere else.

00:12:16.240 --> 00:12:21.040
But it's kind of a reminder to me if it's not obvious what's going on.

00:12:21.040 --> 00:12:23.280
But I don't do it, like, universally.

00:12:23.280 --> 00:12:26.660
It's more like in those places where I'm knowing the type is going to really help.

00:12:26.660 --> 00:12:37.360
Yeah, and I'm still spending, I guess, too much time straddling the legacy Python 3 world that I'm still having trouble with that chasm.

00:12:37.360 --> 00:12:38.100
Yeah, sure.

00:12:38.620 --> 00:12:40.700
Well, the mypy works on Python, too.

00:12:40.700 --> 00:12:41.280
So there you go.

00:12:41.280 --> 00:12:45.080
You know what else is quite pervasive throughout Python?

00:12:45.080 --> 00:12:46.100
Underscores.

00:12:46.100 --> 00:12:47.980
The underscore is special in Python.

00:12:47.980 --> 00:12:48.920
Is it special?

00:12:48.920 --> 00:12:50.160
It is special.

00:12:50.160 --> 00:12:56.900
You know, there's a lot of things that are special or, you know, maybe a more common way to say it would be idiomatic, right?

00:12:56.900 --> 00:12:57.700
Or Pythonic.

00:12:57.700 --> 00:13:03.040
I think the use of the underscore is one of these symbols that shows up a lot.

00:13:03.040 --> 00:13:04.340
And they actually have meaning.

00:13:04.820 --> 00:13:11.620
But people coming from other languages don't necessarily know all the nuanced behavior of the underscore.

00:13:11.620 --> 00:13:13.980
Or not behavior, meaning of the underscore.

00:13:14.720 --> 00:13:21.280
And so there's this article called Understanding the Underscore of Python, which is kind of a cool title.

00:13:21.280 --> 00:13:22.520
It gives you some examples.

00:13:22.520 --> 00:13:31.140
It says if you write, if you come across some Python code, you might see something like for underscore in range 10 or dunder in itself.

00:13:31.380 --> 00:13:32.180
Things like that.

00:13:32.180 --> 00:13:32.920
You see it all the time.

00:13:32.920 --> 00:13:34.820
But these underscores have meanings.

00:13:34.820 --> 00:13:43.000
And they said, look, there's basically five cases with some sub cases thrown in about when you might have special meaning with this underscore.

00:13:43.000 --> 00:13:45.160
The first one is if you're in the REPL.

00:13:45.160 --> 00:13:48.600
If you're in just the, you know, just type Python, go into the REPL.

00:13:48.600 --> 00:13:52.380
Whatever you type, the last value is always underscore.

00:13:53.520 --> 00:13:53.880
Right.

00:13:53.880 --> 00:13:57.840
So if you call a function and you forget to store its value, you can just say underscore.

00:13:57.840 --> 00:14:01.040
And that's whatever that function returned, for example.

00:14:01.040 --> 00:14:03.100
I didn't know that before reading this article.

00:14:03.100 --> 00:14:03.720
That's pretty cool.

00:14:03.720 --> 00:14:04.340
Yeah.

00:14:04.340 --> 00:14:04.520
Yeah.

00:14:04.520 --> 00:14:05.140
It's really handy.

00:14:05.140 --> 00:14:09.540
If you did something that took like 20 seconds, you're like, oh, I forgot to put that in a variable.

00:14:09.540 --> 00:14:10.780
You can just, you still have it.

00:14:10.780 --> 00:14:11.420
It's called underscore.

00:14:11.420 --> 00:14:12.380
That's cool.

00:14:12.380 --> 00:14:13.840
But that's not really in code, right?

00:14:13.960 --> 00:14:18.340
The first example I gave you, four underscore in range of 10.

00:14:18.340 --> 00:14:20.360
That's kind of the, I don't care.

00:14:20.360 --> 00:14:25.700
I have to put something here, but I want to explicitly make the point that I don't care about this variable.

00:14:25.700 --> 00:14:31.060
So there's interesting places where that might come up, right?

00:14:31.060 --> 00:14:32.980
This four underscore is an example.

00:14:32.980 --> 00:14:34.660
Maybe I just want to do a loop 10 times.

00:14:34.660 --> 00:14:35.900
I don't actually care about the number.

00:14:35.900 --> 00:14:36.880
I just want to do it 10 times.

00:14:36.880 --> 00:14:39.460
So that's one way to do it.

00:14:39.460 --> 00:14:40.440
If you have a...

00:14:40.440 --> 00:14:42.600
So that's just a convention, right?

00:14:42.860 --> 00:14:49.080
So what's happening there is the underscore is just a character, which is a valid variable name?

00:14:49.080 --> 00:14:50.100
Yes, exactly.

00:14:50.100 --> 00:14:50.560
Okay.

00:14:50.560 --> 00:14:54.200
Except for the linters, the linting tools know about it.

00:14:54.200 --> 00:14:55.380
Oh, okay.

00:14:55.380 --> 00:15:02.780
So for example, in like in PyCharm, if you have a function that takes a parameter and you're not using that parameter,

00:15:02.780 --> 00:15:07.460
then it'll like make it gray and give it a little squiggly saying, here's an unused argument.

00:15:07.460 --> 00:15:11.980
Examples come to mind in the __init__() in Pyramid.

00:15:12.080 --> 00:15:15.840
There's like a sort of startup thing that passes a configuration settings thing.

00:15:15.840 --> 00:15:19.660
And if you don't care about the configuration settings thing, well, what are you going to do?

00:15:19.660 --> 00:15:26.580
Well, you can just put an underscore and then you don't get warnings that that variable's unused.

00:15:26.580 --> 00:15:27.780
Now, let's see.

00:15:27.780 --> 00:15:32.880
If you're doing tuple unpacking and you've got a tuple of five things, you want the first and the fourth or whatever.

00:15:32.880 --> 00:15:36.620
You can put some underscores in there to make it unpack correctly all over the place.

00:15:36.620 --> 00:15:38.340
So you have this, I don't care variable.

00:15:38.340 --> 00:15:42.420
And the linters actually will not warn you that underscore is not used.

00:15:42.420 --> 00:15:43.320
Okay, cool.

00:15:43.320 --> 00:15:44.200
That's cool.

00:15:44.360 --> 00:15:46.440
And then you have like giving meaning to variables.

00:15:46.440 --> 00:15:50.880
Underscore variable is like a, some people say private.

00:15:50.880 --> 00:15:53.400
I think of it more of as a protected variable.

00:15:53.400 --> 00:15:58.160
So if I've got a class and has a field or attribute like underscore name or whatever, right?

00:15:58.160 --> 00:16:01.040
It's hints to consumers of that class.

00:16:01.040 --> 00:16:01.900
They should leave it alone.

00:16:02.060 --> 00:16:03.480
But technically it's still accessible.

00:16:03.480 --> 00:16:05.260
I think of that more as protected, right?

00:16:05.260 --> 00:16:08.880
It still leaves it accessible to derived classes, but it does tell people stay away.

00:16:08.880 --> 00:16:09.540
Yeah.

00:16:09.540 --> 00:16:14.260
I also think of that as like, I don't make any promises that I might not change this behavior.

00:16:14.260 --> 00:16:14.900
Yes.

00:16:14.900 --> 00:16:15.500
Yeah.

00:16:15.500 --> 00:16:16.060
Yeah.

00:16:16.060 --> 00:16:16.760
Yeah, exactly.

00:16:16.760 --> 00:16:17.220
All right.

00:16:17.220 --> 00:16:20.580
And you can have double underscore in a class, which will actually rename it.

00:16:20.580 --> 00:16:23.060
So it is much harder to get a hold of like name mangling.

00:16:23.060 --> 00:16:24.180
So that's like real private.

00:16:24.180 --> 00:16:30.480
And if you have a conflicting thing, you can have, like if you have a variable, you want to call it in,

00:16:30.660 --> 00:16:33.080
well, there's a keyword called in, like key in dictionary.

00:16:33.080 --> 00:16:35.320
You can just say in underscore.

00:16:35.320 --> 00:16:39.400
SQLAlchemy has like and underscore or underscore in underscore.

00:16:39.400 --> 00:16:45.540
So it can have things that look very similar to restricted keywords, but, you know, avoids them.

00:16:45.540 --> 00:16:52.360
And then you have the whole Python data, data model with like __init__(), dunder new, dunder representation,

00:16:52.360 --> 00:16:54.360
string, so on, like all the magic methods.

00:16:54.360 --> 00:16:57.180
And then there's a few that I didn't really know about.

00:16:57.180 --> 00:17:00.620
Well, one, I guess, in the internationalization of strings.

00:17:00.620 --> 00:17:02.160
There's some special meaning there.

00:17:02.160 --> 00:17:02.940
It's kind of complex.

00:17:02.940 --> 00:17:03.840
I'm not going to go into it.

00:17:03.840 --> 00:17:09.300
And then in Python 3.6, you can use it as like where you would have commas for separating for digit grouping.

00:17:09.300 --> 00:17:10.960
You can have underscore.

00:17:10.960 --> 00:17:15.580
So like one underscore zero, zero, zero, underscore zero, zero is a million in Python 3.6.

00:17:15.580 --> 00:17:16.200
Yeah.

00:17:16.200 --> 00:17:20.740
And this is good also just for reading other people's code to understand how they're using it.

00:17:20.740 --> 00:17:21.140
Yeah.

00:17:21.140 --> 00:17:24.980
Because you might just think, oh, that's a weird naming convention that that person used here.

00:17:24.980 --> 00:17:29.920
But like they're trying to communicate something very specific by using underscore in all these circumstances.

00:17:29.920 --> 00:17:31.860
So I think it's good to like bring it up.

00:17:31.860 --> 00:17:33.340
People maybe come from other languages.

00:17:33.340 --> 00:17:33.880
Yeah.

00:17:34.220 --> 00:17:38.760
Especially people that come from Perl because Perl has a very special meaning for underscore.

00:17:38.760 --> 00:17:41.140
So it's good to know that it isn't that.

00:17:41.140 --> 00:17:41.700
Yeah.

00:17:41.700 --> 00:17:46.840
You know, one of the things that I think is frustrating is if there's like a super old PyPI package.

00:17:46.840 --> 00:17:50.200
It hasn't been updated in five years, but it's got like that perfect name.

00:17:50.200 --> 00:17:53.420
There should be some active project that could sort of inherit its name.

00:17:53.420 --> 00:17:57.500
There was a PEP that came out on the 12th by Donald Stuffed.

00:17:57.500 --> 00:18:07.440
And it's in draft status currently, but I'm sure that with probably a few tweaks by people that are really in the Python core people, it'll probably go through.

00:18:07.440 --> 00:18:09.440
But it's called package.

00:18:09.440 --> 00:18:13.500
It's PEP 541 package index name retention.

00:18:14.400 --> 00:18:22.360
And it's basically just like what you said is given a package name in the index, it's a flat namespace.

00:18:22.360 --> 00:18:24.560
The unique names are a finite resource.

00:18:24.560 --> 00:18:34.060
So as the package index is growing and growing older, we need to have a way to deal with things like old packages that aren't updated anymore.

00:18:34.060 --> 00:18:38.260
Somebody that just like squatted on it, but hasn't done anything with it.

00:18:38.760 --> 00:18:45.460
Or, you know, there's other various things or conflict of two people that legitimately want to use it.

00:18:45.460 --> 00:18:47.880
How do we resolve those conflicts?

00:18:47.880 --> 00:18:55.680
Yeah, like an example comes to mind for me is there's a cool package for kind of a crummy problem called Suds.

00:18:55.680 --> 00:19:02.340
So consuming soap web services in Python is not pretty because they have all their namespaces.

00:19:02.340 --> 00:19:13.900
So it's just like soap is like a big gnarly XML format for web services that's like preceded or was sometimes more popular within enterprises than JSON HTTP services.

00:19:13.900 --> 00:19:19.140
So there's a cool package called Suds that has like a really great API for consuming those on the client side.

00:19:19.140 --> 00:19:21.240
But it's kind of old and outdated.

00:19:21.240 --> 00:19:23.280
It only supports Python 2.

00:19:24.080 --> 00:19:35.360
So a guy whose username is Jerko created, no reflection on him personally, created a project called Suds-Jerko.

00:19:35.360 --> 00:19:42.640
So if you wanted to have in the package name and code is just Suds, but it's like basically that thing upgraded for Python 3.

00:19:42.640 --> 00:19:49.980
And he couldn't use Suds because even though it was not being maintained, it still owned it on PyPI.

00:19:49.980 --> 00:19:51.680
Yeah, that's a tough one though.

00:19:51.680 --> 00:19:56.160
And even if something's not updated, it's hard to know if people are using it or not.

00:19:56.160 --> 00:20:09.960
Some of the stuff that was addressed in this proposal is things like checking out other uses of the word like on CPAN or npm or GitHub.

00:20:09.960 --> 00:20:18.200
And so like if it's a very popular GitHub repository, maybe we should let it go through or something.

00:20:18.200 --> 00:20:24.000
I actually wanted to put in a package name to expect in.

00:20:24.000 --> 00:20:26.020
It was already used by somebody else.

00:20:26.020 --> 00:20:28.040
But then I went out and looked and it was legitimate.

00:20:28.040 --> 00:20:33.560
There's a lot of other tools that were more like that tool than what I wanted to use.

00:20:33.560 --> 00:20:38.080
So it would have been bad for me to try to throw a hissy fit and grab that.

00:20:38.080 --> 00:20:39.480
Yeah, it's interesting.

00:20:39.480 --> 00:20:43.700
Yeah, it's cool that they're proposing a way to look at prior art and existing things.

00:20:43.700 --> 00:20:47.000
I'm actually a little surprised it didn't already exist, but I'm glad it's happening.

00:20:47.000 --> 00:20:48.180
Yep.

00:20:48.180 --> 00:20:49.900
It's definitely a good thing.

00:20:49.900 --> 00:20:52.620
I mean, we're about to come up on 100,000 PyPI packages.

00:20:52.620 --> 00:20:56.720
Like the naming is starting to become something scarish, right?

00:20:56.720 --> 00:20:57.680
All right.

00:20:57.680 --> 00:21:05.460
So did you ever read William Gibson books like Neuromancer and some of these crazy futuristic cyberpunk type books?

00:21:05.460 --> 00:21:06.120
No.

00:21:06.180 --> 00:21:06.620
Yeah.

00:21:06.620 --> 00:21:12.880
I kind of feel like we're living in this sort of sci-fi future in a weird way.

00:21:12.880 --> 00:21:16.660
There was the craziest article that I saw I've seen in a while come out.

00:21:16.660 --> 00:21:18.960
And I'll just read you the title.

00:21:18.960 --> 00:21:26.640
Hackers downloaded U.S. government climate data and stored it on European servers as Trump was being inaugurated.

00:21:27.220 --> 00:21:28.360
So here's the deal.

00:21:28.360 --> 00:21:33.280
I'm going to try to make this as politically neutral as possible.

00:21:33.280 --> 00:21:38.160
I just want to talk about the data side of this in a pretty interesting way.

00:21:38.160 --> 00:21:41.660
Like who owns and controls data, right?

00:21:41.660 --> 00:21:45.980
Does the president, does the country, does the world own this data?

00:21:46.140 --> 00:21:55.200
Like the U.S. has an incredible amount of data about the world at the Environmental Protection Agency, at NASA, Department of Energy, and so on.

00:21:55.200 --> 00:22:06.260
And there had been some rumors that when Trump took office, he was going to reduce, cut back, or even fire some people that worked on climate change.

00:22:06.780 --> 00:22:18.620
People knew this a few months before, maybe a month before there, where people are looking around saying, you know, these websites, like the EPA, for example, has a bunch of data on it.

00:22:18.620 --> 00:22:20.340
And they're like, what if that gets deleted?

00:22:20.340 --> 00:22:29.440
There was actually a meeting of 60 programmers and scientists at the Department of Information Studies at UCLA.

00:22:29.840 --> 00:22:41.140
And their goal was to find as many of these websites with government data on it and start to scrape it and download it and exfiltrate it from the country.

00:22:41.140 --> 00:22:43.320
Isn't that crazy?

00:22:43.320 --> 00:22:51.720
So what they did is they all got together and they were working like 22 hours a day for like the four or five days leading up to the inauguration.

00:22:51.720 --> 00:22:53.220
And they were just downloading.

00:22:53.220 --> 00:22:59.380
They had like a huge spreadsheet of hundreds of sites where they'd go in and they'd try to get the data and they would pull it off.

00:22:59.380 --> 00:23:13.580
So examples include like web pages dedicated to Department of Energy's solar project initiatives or the Energy Information Administration stuff about fossil fuels compared to renewable energy and fuel cell research and those kinds of things.

00:23:13.580 --> 00:23:26.900
And it turns out there's actually these sort of what they call volunteer data rescue events in Toronto and Philly and Chicago and Indianapolis and Michigan over the last weeks trying to scrape hundreds of thousands of pages off the Internet.

00:23:27.700 --> 00:23:29.660
So you might say like people are being crazy.

00:23:29.660 --> 00:23:30.220
Come on.

00:23:30.220 --> 00:23:37.780
But it turned out exactly at noon has Trump was sworn in and UCLA was actively working that that group was working.

00:23:37.780 --> 00:23:41.600
All the climate change related data on Whitehouse.gov disappeared.

00:23:41.600 --> 00:23:43.600
But they had gotten a lot of it.

00:23:43.600 --> 00:23:44.280
Okay.

00:23:44.680 --> 00:23:55.340
And so there's this company called Page Freezer and I think they're a Canadian company and they basically create snapshots of web pages a little bit like Internet Archive or the Wayback Machine.

00:23:55.340 --> 00:23:57.860
And it's actually a commercial service.

00:23:57.860 --> 00:24:03.360
So these guys start to participate and they're storing all of this data somewhere in Europe.

00:24:03.360 --> 00:24:12.020
The reason for storing it in Europe because there's no like there'd be not a way for the government to force it to shut down or something.

00:24:12.560 --> 00:24:13.160
Yeah, exactly.

00:24:13.160 --> 00:24:13.780
Exactly.

00:24:13.780 --> 00:24:20.460
So I just I thought that was like that's the craziest headline around programming I've seen this week.

00:24:20.460 --> 00:24:20.740
Yeah.

00:24:20.740 --> 00:24:22.480
And I wanted to include that.

00:24:22.480 --> 00:24:34.640
Now, I do know that there's there's so again to try to not be too political on this, but there's aren't there legacy versions of the old versions of the Whitehouse website?

00:24:34.640 --> 00:24:35.360
There there.

00:24:35.360 --> 00:24:35.960
There.

00:24:35.960 --> 00:24:43.700
I mean, probably on like on the Wayback Machine and stuff, but I it could not it could be that it doesn't necessarily store the data linked in it.

00:24:43.700 --> 00:24:44.260
You know what I mean?

00:24:44.260 --> 00:24:48.400
Like like the 100 meg CSV file or whatever.

00:24:48.400 --> 00:24:50.980
And that was the kind of stuff they were actually going after.

00:24:50.980 --> 00:24:51.420
Hmm.

00:24:51.420 --> 00:24:52.640
Well, it's good.

00:24:52.640 --> 00:24:53.540
Good to have it saved.

00:24:54.880 --> 00:24:56.340
It's it's good to have it saved.

00:24:56.340 --> 00:24:58.100
So we'll we'll see where it goes.

00:24:58.100 --> 00:25:07.920
But I just thought this was a really interesting and I'm sure Python played a pretty key role in a lot of screen scraping initiatives with beautiful soup and scrapey and things like that.

00:25:07.920 --> 00:25:08.240
Yeah.

00:25:08.240 --> 00:25:09.040
All right.

00:25:09.040 --> 00:25:11.140
So I think we're going to leave it there for this week.

00:25:11.140 --> 00:25:13.600
That's a lot of a lot of cool news for everyone, Brian.

00:25:13.600 --> 00:25:14.620
Definitely cool.

00:25:14.620 --> 00:25:15.180
All right.

00:25:15.180 --> 00:25:16.460
Well, thanks for talking to me this week.

00:25:16.720 --> 00:25:17.000
You bet.

00:25:17.000 --> 00:25:19.180
Thanks for sharing the stories and I'll catch you next time.

00:25:19.180 --> 00:25:22.220
Thank you for listening to Python Bytes.

00:25:22.220 --> 00:25:24.780
Follow the show on Twitter via at Python Bytes.

00:25:24.780 --> 00:25:27.680
That's Python Bytes as in B-Y-T-E-S.

00:25:27.680 --> 00:25:31.060
And get the full show notes at Python Bytes dot FM.

00:25:31.060 --> 00:25:35.440
If you have a news item you want featured, just visit Python Bytes dot FM and send it our way.

00:25:35.440 --> 00:25:38.160
We're always on the lookout for sharing something cool.

00:25:38.160 --> 00:25:41.540
On behalf of myself and Brian Okken, this is Michael Kennedy.

00:25:41.540 --> 00:25:45.160
Thank you for listening and sharing this podcast with your friends and colleagues.

