#145: The Python 3 “Y2K” problem

Published Sat, Aug 31, 2019, recorded Wed, Aug 28, 2019

Sponsored by Datadog: pythonbytes.fm/datadog

Special guests

- Matt Harrison
- Anthony Sottile

Michael #1: friendly-traceback

via Jose Carlos Garcia (I think 🙂 )
Aimed at Python beginners: replaces the standard traceback with something easier to understand
Shows help for exception type
Shows local variable values
Shows code in a cleaner form with more context
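Three ways to use it:
- As an exception hook, so every uncaught exception in the app is explained
- Explicitly, by calling its explain function inside a try/except block
- From the outside, when running an app with -m friendly_traceback followed by the script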
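The exception-hook mode builds on sys.excepthook, the standard-library callback Python calls for every uncaught exception. The sketch below is not friendly-traceback's code. It is a minimal stdlib-only illustration of the same idea: print a hint for the exception type plus the local variables of the frame where the error happened. The HINTS table and the explain_exception and friendly_excepthook names are hypothetical. get_last_item mirrors the example discussed in the episode.

```python
import sys
import types
from typing import Optional, Type

# Hypothetical, tiny hint table; friendly-traceback ships its own, much richer one.
# Lookup is by exact exception type, so subclasses get no hint in this sketch.
HINTS = {
    IndexError: "An IndexError occurs when you ask for an item at a position "
                "(index) that does not exist in a list, tuple, or string.",
    ZeroDivisionError: "A ZeroDivisionError occurs when you divide a number by zero.",
}


def explain_exception(exc_type: Type[BaseException],
                      exc_value: BaseException,
                      tb: Optional[types.TracebackType]) -> str:
    """Build a beginner-oriented report: the error, a hint for its type,
    and the local variables of the frame where it was raised."""
    lines = [f"{exc_type.__name__}: {exc_value}"]
    hint = HINTS.get(exc_type)
    if hint:
        lines.append(f"  Hint: {hint}")
    if tb is not None:
        # Walk to the innermost traceback entry: that is where the error happened.
        while tb.tb_next is not None:
            tb = tb.tb_next
        frame = tb.tb_frame
        lines.append(f"  In {frame.f_code.co_name}(), line {tb.tb_lineno}, "
                     "the local variables were:")
        for name, value in frame.f_locals.items():
            lines.append(f"    {name} = {value!r}")
    return "\n".join(lines)


def friendly_excepthook(exc_type, exc_value, tb):
    """Drop-in replacement for sys.excepthook: print the report to stderr."""
    print(explain_exception(exc_type, exc_value, tb), file=sys.stderr)


def get_last_item(a_list):
    index = len(a_list)  # Bug: the last valid index is len(a_list) - 1.
    return a_list[index]


if __name__ == "__main__":
    # Way 1: install globally; every *uncaught* exception now goes through the hook.
    sys.excepthook = friendly_excepthook
    # Way 2: explain on demand inside a try/except block.
    try:
        get_last_item([1, 2, 3])
    except IndexError:
        print(explain_exception(*sys.exc_info()))
```

For get_last_item([1, 2, 3]) the report names the IndexError, gives the hint, and lists a_list = [1, 2, 3] and index = 3, which is enough to spot the off-by-one without a debugger.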

Matt #2: Pandas Users Survey

Most respondents use it almost every day, but more than half have less than 2 years of experience with it
Linux 61%, Windows 60%, Mac 42% (these don't add up to 100%)
93% Python 3

Anthony #3: python3 “Y2K” problem (python3.10 / python4.0)

with python3.8 close to release and python3.9 right around the corner, what comes after?
both python3.10 and python4.0 present some problems
- sys.version[:3], which will suddenly report '3.1' in 3.10
- a lot of code (including six.PY3!) uses sys.version_info[0] == 3, which will suddenly be false in python4.0 (and start running python2 code!)
early-to-mid 2020 we should start seeing the next version in the wild as python3.9 reaches beta
easy ways to start testing this early:
- python3.10 - a build of CPython for Ubuntu (actually 3.8) with the version number changed to 3.10
- flake8-2020 - a flake8 plugin which checks for these common issues
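Both pitfalls can be reproduced without a real 3.10 or 4.0 interpreter by feeding hypothetical version strings and tuples to the checks. This is a sketch: the function names are illustrative and are not taken from flake8-2020. The robust spelling compares whole tuples, which Python compares element by element.

```python
import sys
from typing import Tuple

# The version strings/tuples below are hypothetical stand-ins for
# sys.version and sys.version_info on future interpreters.


def slice_check(version: str) -> bool:
    """Fragile: treats the first three characters of sys.version as the version."""
    return float(version[:3]) >= 3.5  # "3.10.0"[:3] == "3.1", and 3.1 < 3.5


def equality_check(version_info: Tuple[int, ...]) -> bool:
    """Fragile: 'is this Python 3?' spelled as equality, like six.PY3."""
    return version_info[0] == 3  # False on a hypothetical 4.0


def tuple_check(version_info: Tuple[int, ...]) -> bool:
    """Robust: tuples compare element by element, so (3, 10) > (3, 5)."""
    return tuple(version_info) >= (3, 5)


for version, info in [("3.9.0", (3, 9, 0)),
                      ("3.10.0", (3, 10, 0)),
                      ("4.0.0", (4, 0, 0))]:
    print(f"{version:>6}: slice={slice_check(version)} "
          f"equality={equality_check(info)} tuple={tuple_check(info)}")

# On the running interpreter, pass the real value.
print("this interpreter is >= 3.5:", tuple_check(sys.version_info))
```

3.10.0 fails the slice check and 4.0.0 fails the equality check, while the tuple comparison passes for all three.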

Michael #4: pypi research

via Adam (Codependent Codr)
Really interesting research paper on the current state of PyPI from a couple of authors at the University of Michigan: "An Empirical Analysis of the Python Package Index" - https://arxiv.org/pdf/1907.11073.pdf
Comprehensive empirical summary of the Python Package Repository, PyPI, including both package metadata and source code covering 178,592 packages, 1,745,744 releases, 76,997 contributors, and 156,816,750 import statements.
We provide counts and trends for packages, releases, dependencies, category classifications, licenses, and package imports, as well as authors, maintainers, and organizations.
Within PyPI, we find that the growth of the repository has been robust under all measures, with a compound annual growth rate of 47% for active packages, 39% for new authors, and 61% for new import statements over the last 15 years.
In 2005, there were 96 active packages, 96!
MIT is the most common license
(Matt) I saw this and was surprised at most commonly used libraries. What do you think the most common 3rd party library is?
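The growth figures above are compound annual growth rates (CAGR): the constant yearly rate that would take a starting count to an ending count. A small sketch of that arithmetic, using toy numbers rather than the paper's data; only the 47% rate comes from the notes above, and the function names are illustrative.

```python
import math


def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate: the constant yearly rate r such that
    start * (1 + r) ** years == end."""
    if start <= 0 or end <= 0 or years <= 0:
        raise ValueError("start, end, and years must be positive")
    return (end / start) ** (1 / years) - 1


def doubling_time(rate: float) -> float:
    """Years needed to double at a constant compound rate."""
    return math.log(2) / math.log(1 + rate)


# Toy numbers, not data from the paper: 100 -> 121 over 2 years is 10% a year.
print(f"{cagr(100, 121, 2):.2%}")  # 10.00%
# At a 47% CAGR, a count doubles roughly every:
print(f"{doubling_time(0.47):.1f} years")  # 1.8 years
```

Put another way, 47% compound growth means the number of active packages roughly doubles every 1.8 years.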

Matt #5: DaPy

“Pandas for humans” - Matt’s words
Has portions of pandas, scikit-learn, yellowbrick, and numpy
Designed for “data analysis, not for coders”

Anthony #6: python-remote-pdb

very small over-the-network remote debugger
thin wrapper around pdb in a single file (easy to drop the file on PYTHONPATH if you can’t pip install)
not as fully featured as other remote debuggers such as pudb / rpdb / pycharm’s debugger but very easy to drop in
fully supports [breakpoint()](https://www.python.org/dev/peps/pep-0553/) (python3.7+ or via future-breakpoint)
access pdb via telnet / nc / socat
I’m using it to debug a text editor I’m writing to learn curses!
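breakpoint() (PEP 553, Python 3.7+) simply forwards to sys.breakpointhook(). The default hook reads the PYTHONBREAKPOINT environment variable, which is how remote-pdb gets plugged in without code changes (for example PYTHONBREAKPOINT=remote_pdb.set_trace). The sketch below swaps the hook in-process for a hypothetical fake_debugger so the dispatch is visible without a terminal or network connection; it is not remote-pdb's code.

```python
import sys

calls = []


def fake_debugger(*args, **kwargs):
    """Hypothetical stand-in for a debugger entry point such as
    remote_pdb.set_trace; it records the call instead of opening a session."""
    calls.append((args, kwargs))


def double(x):
    breakpoint()  # Python 3.7+: forwards to sys.breakpointhook()
    return x * 2


# In-process equivalent of choosing a debugger via PYTHONBREAKPOINT.
sys.breakpointhook = fake_debugger
try:
    result = double(21)
finally:
    # Restore the default hook, which reads PYTHONBREAKPOINT and falls back to pdb.
    sys.breakpointhook = sys.__breakpointhook__

print(result, calls)  # 42 [((), {})]
```

With the default hook, setting PYTHONBREAKPOINT=0 turns every breakpoint() call into a no-op, so calls can be left in place while the debugger is switched off.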

Extras:

Michael:

Matt:

http://bit.ly/psxgb - My new course on Machine Learning with XGBoost

Anthony:

https://github.com/DRMacIver/hecate “like selenium webdriver for the terminal”

Jokes:

Michael: Two mathematicians are sitting at a table in a pub having an argument about the level of math education among the general public.

The one defending overall math knowledge gets up to go to the washroom. On the way back, he encounters their waitress and says, "I'll add an extra $10 to your tip, if you'll answer a question for me when I ask it. All you have to say is 'x-squared'." She agrees.

A few minutes later the populist mathematician says to his buddy, "I'll bet you $20 that even our waitress can tell us the integral of 2x." The cynic agrees to the bet.

So the schemer beckons the waitress to their table and asks the question, to which she replies "x-squared". As he begins to gloat and demand his winnings, the waitress continues, "Plus a constant."

Anthony: I had a golang joke prepared, but then I panic()d

Episode Transcript

00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.

00:04 This is episode 145, recorded August 28th, 2019.

00:09 I'm Michael Kennedy, and Brian is away on vacation.

00:13 Yeah, I was on vacation last week, now Brian's gone.

00:15 But don't despair, we have two special guests.

00:18 We have Matt Harrison.

00:19 Hello.

00:20 Welcome, Matt.

00:20 And Anthony Sottile.

00:23 Welcome.

00:23 First time here on the show.

00:25 Nice to have you here.

00:25 Nice to be on the show.

00:26 Yeah, it's great to have you.

00:28 I'm looking forward to talking about all these things with you.

00:30 Now, before we get on to our topics, let me just say real quickly, the show is brought to you by Datadog.

00:35 Check them out at pythonbytes.fm/Datadog.

00:38 More on that later.

00:39 But I want to focus first on something that's going to help people learning Python or teaching people who are learning Python.

00:50 And that's this project called Friendly Traceback.

00:54 Matt, you do a lot of teaching.

00:56 What is your experience with folks, you know, one of their first programming experiences running into like a traceback crash?

01:02 Like, is it really clear for them?

01:03 Most of my training is with experienced technologists who understand what a traceback is.

01:10 So I do do some with kids in elementary school, that sort of thing.

01:16 But my issue is I like the idea here.

01:19 But in my courses, I explicitly have them hit errors and teach them how to read the traceback and recover from them.

01:28 How to recover, yeah.

01:29 So I like the idea.

01:30 The other thing that I have sort of an issue with is that you have to install it, right?

01:36 So having someone who's a beginner install something, I don't know.

01:40 What do you think about that?

01:41 That's an interesting question because you – well, let me tell folks what it is real quick.

01:45 So this one comes to us from Jose Carlos Garcia, I think, because his Twitter name is in leetspeak.

01:50 So that's my attempt to translate it back into English.

01:54 So thank you, Carlos, for sending that in.

01:56 And the idea is it's really aimed at beginners, as you kind of hinted at, Matt.

02:01 And let me just give folks a sense of what it is.

02:04 So a normal traceback will have, like, in reverse order, the call stack.

02:07 And then maybe the line of code.

02:10 And then the error message, like the error type.

02:13 And then possibly the error message, right?

02:15 Well, what this does instead is it will catch the exception type, like an index error.

02:23 And then it has a little help message and an example, right?

02:26 So it says, oh, this is a Python exception.

02:27 You got an index error.

02:28 That means the list index is out of range in this case.

02:31 And this error occurs under these circumstances.

02:34 Here's usually what happened.

02:35 Here's usually the cause.

02:37 And then one thing I really like about it is it first shows the line of code where it happened.

02:41 And then it actually shows the local variables.

02:44 So like in the example there, it has a function called get last item, and you're passing a list.

02:49 And it says the list that was passed was 1, 2, 3.

02:53 And then here is actually like three or four lines of code where this error is happening and what the index value was and all that.

02:59 So you could actually diagnose this without even stepping into a debugger.

03:04 You could just say, oh, I see.

03:05 The index is 3, but the valid indexes are only 0, 1, 2, just by looking at the output on the screen here.

03:10 And that I think is pretty cool.

03:11 Yeah.

03:12 I think it's definitely useful.

03:14 I looked over the documentation.

03:16 I thought, this is really cool.

03:17 My issue with beginners is how to walk them through something.

03:22 And installing is always a pain when they're beginners.

03:27 I also, the other thing that's really cool about it is it looks like it has some hooks.

03:32 So you can define your own exceptions.

03:34 You can customize how you handle those.

03:38 So I think that that looks kind of cool as well.

03:40 I mean, I can imagine maybe if you're in an environment at work where you have support people.

03:49 And they might have to look at your code or look at your servers and take problems and resolve them.

03:56 This might be something that could aid them.

03:58 Right.

03:58 The fact that it catches the local variables in production would be kind of nice.

04:02 Yeah.

04:03 I think it really comes down to, are you using any external packages?

04:07 If there's a pip install or a conda install, right?

04:10 You could wrap this up to include friendly traceback, right?

04:13 But if you're using none of that, then all of a sudden, yeah, this is like another burden, right?

04:17 Yeah.

04:18 Yeah.

04:19 This actually looks a lot like the pytest tracebacks.

04:21 And I sometimes find that those are useful and sometimes not so useful.

04:25 Yeah, exactly.

04:26 I think it depends.

04:27 Like they said that there's a lot of tools to make exceptions better for advanced developers.

04:31 This is not that.

04:32 This is something else.

04:33 One of the things I thought that was cool about this is there's three ways to like integrate

04:37 it into your app.

04:38 You can install it as an exception hook.

04:41 So all exceptions in the application are caught, which is cool because I mean, I knew that that

04:46 was possible, but I didn't really, I've never really played.

04:49 I'm like, oh, look how easy it is.

04:50 You just take a function and assign it to this callback globally and any error will go through

04:56 it.

04:56 That's cool.

04:56 So you can do that at the top of like your startup or your app.

04:59 You can actually, in a try/except block, you know, call friendly traceback's explain,

05:05 and it'll do that on demand.

05:08 But both of those require you changing your code.

05:10 You can also use it from the outside.

05:11 When you run an app, you can say -m friendly_traceback, then the script file.

05:16 And it'll, that way you can run it on code.

05:19 That's not modified to be like friendly, I guess.

05:21 Unfriendly code.

05:22 I don't know.

05:23 Anthony, let's wrap this up.

05:25 What are your thoughts on this thing?

05:26 Looks good.

05:27 I'm actually teaching my brother how to program right now.

05:29 And he was pretty overwhelmed by the first stack trace that he saw.

05:32 And I think if he had seen something like this, it would have changed his perception

05:37 about errors and maybe taught him something more than I had to teach him after the fact.

05:41 Exactly.

05:42 I mean, yeah, I totally, I think it's awesome.

05:45 I really, really liked the idea.

05:47 I do take Matt's point as valid as like, it's now a step preceding actually writing code

05:53 that they've got to go through.

05:54 So depending on how much control you have over their environment, if you can like get this

05:58 in place for them, then maybe it's no big deal.

06:00 But yeah.

06:00 Anyway, worth considering.

06:02 Matt, what do you got next for us?

06:04 So recently the Pandas developers released the Pandas user survey from 2019.

06:10 So this came out last week.

06:11 They had a call on Twitter and they had about 1200 responses.

06:18 Yeah, that's cool.

06:19 I was surprised they got so many folks to participate.

06:21 That's really, those are really solid numbers for statistics.

06:24 Yeah.

06:24 I think we've got a link to the survey there, but some of the things that stood out to me

06:30 is that more than half of the people who responded have been using Pandas for less than two years.

06:36 Pandas has been out for quite a while now, but it seems to be one of, from what I see,

06:43 it's one of the key drivers of growth.

06:46 It's just sort of a central component of sort of the data analysis, data science space.

06:51 And it looks like they're getting a lot of new users there.

06:55 Yeah.

06:55 And so, you know, we talked about the incredible growth of Python before,

06:59 and a lot of that has to do with this big inflection point where Python was largely adopted,

07:04 moved to by data scientists.

07:06 Do you think this is like an indicator of that?

07:09 Like there's all these new folks coming into the Python ecosystem, and they're often coming into the data science space.

07:14 And so, hence, they haven't been here for many, many years?

07:17 Yeah, I think so.

07:18 I mean, that's the sexy thing right now, data science.

07:21 And it kind of didn't exist before.

07:24 So there's a big push for AI, ML, that sort of thing.

07:27 But if you look at, you know, the number one tool that data scientists use,

07:32 it's often Python.

07:33 And I would say that Pandas is probably the number one tool of those people who are using data science.

07:40 Cool.

07:41 What else is in the survey?

07:42 What was really interesting is they have the operating system numbers.

07:45 So I sort of geek out on that sort of thing.

07:47 But this isn't what I would have thought at all, because one definition of a data scientist is a statistician who uses a MacBook in San Francisco.

07:57 Yeah, you definitely do better data science if like the back of your laptop glows.

08:02 An Alienware, maybe, but certainly an Apple.

08:05 No, I'm just kidding.

08:07 Yeah, so the numbers, actually, they have about 60% of their users use Linux, 60% use Windows, and 42% use Mac.

08:17 So I wouldn't have thought that.

08:19 I mean, obviously, those don't add to 100.

08:21 So I imagine you've got a lot of deployments on Linux, that sort of thing.

08:25 But people tend to forget that Linux sort of rules the enterprise world.

08:30 And I think this might be, you know, further indication that Pandas adoption or Python's adoption

08:36 is not limited to just startups and people hacking around on their MacBook.

08:40 I'm always super surprised by the percentage of Windows users when you consider Python.

08:45 And I guess it makes sense because it's really easy to get started with Python on Windows.

08:49 But the number just blows me away every time.

08:51 Yeah, absolutely.

08:53 Steve Dower did a great talk at PyCon called Python is OK on Windows, actually, or something like that.

08:58 And he had some really interesting...

09:00 Yeah, it was really good.

09:00 And it had some interesting statistics.

09:02 I feel like the Windows Python developers are somewhat in this realm of like the dark matter developers

09:09 in that you know they're using it because it keeps showing up in these surveys.

09:12 But you go to PyCon and there's many more MacBooks toting around and whatnot.

09:17 But yeah, it's definitely a good thing to remember.

09:19 And, you know, honestly, Matt, the thing that surprised me most here is how high Linux is in this group.

09:24 Yeah, I imagine that's deployments.

09:26 But the other interesting number here is Python 3 percentage.

09:31 And Python 3 percentage is 93%.

09:34 So legacy Python goodbye there, I guess.

09:37 The data scientists move on to the latest and greatest.

09:40 Yeah, the data scientists are leading the way with ditching legacy Python.

09:45 I mean, the whole Python 3 statement came out of that space, which is pretty cool.

09:49 I think that also has to do with less legacy code, as well as the models and the technology are changing so fast.

09:56 You don't keep building on the same code.

09:58 You're like, forget that.

09:59 We're going to go to TensorFlow because this whole thing is slow and wrong, right?

10:02 Mm-hmm.

10:03 Yeah, like you said, I think they're leading the way.

10:04 Well, speaking of Python 3, I think Anthony has something around that as well.

10:09 Yeah.

10:10 I do indeed.

10:11 Yeah.

10:11 Let's talk about the Y2K problem that I kind of stumbled upon recently.

10:16 The YP3 problem?

10:18 Yeah.

10:20 Yeah, okay, what is this?

10:21 Python 3.8, close to release, and Python 3.9, right around the corner.

10:25 There's the question that comes up, which is, what is going to come after?

10:28 And we really have two main choices, which would be like Python 3.10 or Python 4.0.

10:34 But both of these present problems just because of their version number.

10:38 So there's a significant amount of code out there that's using the sys.version and sys.version_info variables in the sys module.

10:46 Right, and trying to just take that string and go, is there a Python 3 or 2 in here or something like that, right?

10:52 Yeah, there's a lot of slicing or checking the first character of that string, and it presents a number of different problems.

10:59 The most common one that I've seen is when you access sys.version and look at the first three characters, and that works fine.

11:07 It's like 2.7 or 3.6 or whatever.

11:10 But as soon as the minor version of Python becomes 10, you're suddenly reporting Python 3.1 again.

11:16 All right, sorry, this doesn't support Python 3.1.

11:18 You need at least version 3.5.

11:19 You're like, no, it is 3.5.

11:22 There's an even worse situation where if you check, even if you're doing it correctly and using sys.version_info,

11:29 if you only check the first number and only check equality, so like say sys.version_info[0] == 3,

11:35 that's a perfectly fine check if you're checking if it's exactly Python 3.

11:40 But as soon as Python 4 happens, that condition is going to be false.

11:44 And guess what?

11:45 You're going to run Python 2 again.

11:47 Yeah, some of the compatibility libraries like six have this in there, right?

11:51 Yep.

11:51 Yeah, six is broken if you change the version to 4, which is a little scary,

11:56 given it's one of the most installed libraries, as we'll see later.

12:00 Yeah, well, you know, I guess that makes sense because 6 is not divisible by 4, so it's probably fine.

12:05 No, actually, this is really tricky.

12:07 You know, it reminds me of Windows 10, right?

12:09 Like, if you look at the Windows operating system numbers, we had XP, then Vista, then Windows 7,

12:15 Windows 8, and then Windows 10.

12:18 And the reason they don't have a Windows 9 is exactly this.

12:21 Like, so many people were doing substring searches for "Windows 9", looking for 95 or 98.

12:28 And so if they had a 9 that was beyond, you know, Vista or whatever.

12:32 Yeah, even Oracle Java had that problem.

12:34 Yeah, so they just said, forget it, we're going to 10.

12:38 But it doesn't sound like skipping to 4 is going to make this better, probably worse.

12:42 I don't know, which do you think is the worse way to go?

12:44 I think skipping to 4 is going to be worse.

12:46 Most of the things with the 3.10 release will just be, like, slightly broken.

12:52 But trying to run Python 2 code in Python 3 is way more broken.

12:57 Yeah, for sure.

12:58 So I haven't followed this, but I recall, and maybe my memory is just failing me,

13:03 that there was talk that there would never be a Python 4.

13:07 So has that changed?

13:09 As far as I know, that hasn't changed.

13:12 I think the jury's still kind of out on that one.

13:15 Like, from what I understand, there was talks of, like, Python 4 just being the next version of Python 3.

13:22 But I don't think anyone has definitively chosen whether it'll be 3.10 or 4.0 next.

13:27 Yeah.

13:28 Yeah.

13:28 I mean, we're at this crossroads, right?

13:30 Guido has expressed a dislike of double-digit second version numbers.

13:34 But everyone is tired of this two versus three debate.

13:37 We don't want to kick it up a notch, right?

13:39 So where do you go from there, right?

13:41 We'll just release 3.9.9.9.9.9 forever.

13:44 That's right.

13:46 Yeah, it'll be fine.

13:47 But yeah, this actually is coming up pretty quickly.

13:49 So 3.9 will reach beta.

13:52 According to the PEP, we'll reach beta sometime in 2020.

13:55 And usually when the next version reaches beta, they start developing the version afterwards.

14:02 And so we'll start seeing 3.10 in the wild.

14:04 But I made a couple easy ways that you can start fixing these problems before they're a problem, I guess.

14:10 One of them, I pre-built a version of Python 3.10.

14:13 Well, it's actually 3.8, but with a fake version number.

14:16 And you can run that directly on Ubuntu today.

14:19 And I made a flake8 plugin which checks for these common issues called flake8-2020.

14:24 Nice.

14:25 Yeah, that's really cool.

14:26 Yeah, it makes it into your...

14:27 So does that suggest that you use version_info instead?

14:30 Yeah, it makes the proper suggestion when it detects which thing that you're using incorrectly.

14:35 Cool.

14:35 Yeah, super.

14:36 And that's great.

14:36 Now, before we get to the next one, let me just tell you all quickly about Datadog.

14:40 They're a long-term supporter of the show.

14:42 And Datadog is a modern cloud-scale monitoring platform that brings all your metrics and logs and distributed traces together.

14:49 So basically, it will auto-instrument all the popular frameworks, Django, Flask, Postgres, whatnot.

14:55 And you can actually trace your requests and your performance across different service boundaries.

15:01 So not just what is your Python code doing or what is your database doing, but like all together in one coherent thing, which is cool.

15:07 If you start a free trial with them, you'll get a cool Datadog t-shirt.

15:12 Just visit pythonbytes.fm/Datadog to get started.

15:16 Now, Anthony, you had hinted that we may come back to popular packages.

15:20 And some folks out of the – I think they're associated with the University of Michigan, but they also have their own consulting project, these two folks.

15:28 They did some interesting research on the current state of PyPI.

15:33 Now, sometimes people use BigQuery.

15:35 You can ask interesting questions like, well, what are the most common user agents downloading from PyPI?

15:40 Or what are the most common packages or whatever?

15:43 These folks went all in and they downloaded all of the packages from what I can tell.

15:48 Like all of them.

15:50 And then they started analyzing all sorts of stuff about them.

15:55 So they started saying, look, we downloaded 178,592 packages, which has roughly 1.7 million releases and 77,000 contributors.

16:09 And they also analyzed something that was pretty interesting is the connections or the interconnectivity or dependency graphs of these various things.

16:17 And they found there's 157 million import statements within these packages.

16:23 And then, yeah, they just did a bunch of analysis.

16:26 This is basically like an academic research paper.

16:29 So the thing I'm linking to is actually a PDF.

16:32 So, you know, expect a download rather than a website they set up.

16:36 But, yeah, it's pretty interesting what these guys put together.

16:39 What do you see that caught your attention going through this?

16:42 I read the paper, and then I saw they actually have a GitHub project.

16:47 And I wanted to actually pull the data from the GitHub.

16:49 But sadly, the data is not there.

16:51 It says coming soon because I wanted to do some analysis on it.

16:54 Ah, bummer.

16:54 So my question is, what do you think is the most common third-party library?

16:58 And it wasn't what I thought it would be.

17:01 I mean, my guess was just going to be like six because you see that everywhere.

17:04 Okay.

17:05 All sorts of projects.

17:06 Yeah, it's super, super low level.

17:08 Yeah, I would have guessed requests.

17:09 Yeah, that would have been my guess as well, I think.

17:11 But the most common was NumPy, which surprised me.

17:16 Those data scientists, they're really dominating.

17:19 I don't know.

17:20 They definitely are.

17:21 Wow.

17:22 How interesting.

17:22 So certainly, so many of these libraries that are in the data science space do seem to all focus in on NumPy as the foundation, don't they?

17:31 Yeah, sort of built around that as well.

17:33 Yeah, I wonder how much more commonality there is, like more shared foundation there is in the data science space rather than, say, the web space.

17:41 Where you've got Flask, Django, Pyramid, Bottle, Molten, whatever.

17:46 And they all kind of have their own foundation.

17:48 So that breaks up their potential high radius.

17:52 Yeah, yeah, yeah.

17:53 That was definitely interesting, NumPy.

17:55 I wouldn't have guessed that, but I guess it does make sense.

17:58 So some interesting things that I saw was within PyPI, they said they find that the growth of PyPI itself, all the packages, has been robust under all measures.

18:10 With a compound annual growth rate of 47% for the number of active packages.

18:17 And 39% for new authors.

18:20 And 61% for new import statements.

18:24 So I guess that means Python packages are becoming more dependent on each other.

18:28 Yeah, that does make sense.

18:31 Again, when I'm doing a training, I will go to PyPI, the Python package index.

18:37 And I point them at that number.

18:39 And I'm at it right now.

18:41 And it says 193,830 projects right now.

18:44 And I think that's pretty mind-blowing.

18:48 But also, like you said, you've got 47% growth in there.

18:52 39% for new authors, right?

18:55 Yeah, that's incredible.

18:56 Apparently, it's somewhat straightforward for someone to come into Python, make a package, and start contributing and sharing it with the community.

19:05 Yeah.

19:06 I think the new authors is the most impressive stat from there.

19:09 Like, it means that people are coming into the community and, like, building stuff for other people, which is great.

19:15 That's a really good point.

19:16 Yeah, absolutely.

19:16 That's a super positive number.

19:18 That's really high growth when you're talking about you already have 77,000 authors, right?

19:23 Yeah.

19:23 Some other real quick stats I thought were interesting.

19:26 They have the number of active packages, which is a much smaller number than the total packages.

19:31 But in 2005, you could go to PyPI, and you could literally just kind of browse all the active packages.

19:38 There were 96.

19:39 Yeah.

19:40 So in the early days, it was useful to have it, but it was not quite as amazing as almost 200,000 now.

19:46 Before PyPI, there was the Cheese Shop, which I think was the predecessor of that.

19:51 And it was sort of a single web page, and it had, like, here are the categories, right?

19:56 And so on that web page was the list of packages.

19:58 But yeah, this is crazy.

20:00 Yeah.

20:01 So the cheese shop, you're telling me, is kind of like Yahoo for packaging.

20:06 Yeah.

20:07 All right.

20:09 Final stat from this analysis.

20:10 The most popular license for packages in the Python space is MIT.

20:15 They've got all the lists, all of them listed there.

20:18 That's pretty cool.

20:18 All right, Matt, what's this next one?

20:20 Speaking of data scientists, you got another one for us.

20:23 Yeah.

20:23 So speaking of data scientists and sort of, I guess, the proliferation of, Michael mentioned proliferation of web frameworks,

20:31 I came across a new project that I hadn't seen before recently called DaPy, D-A-P-Y.

20:38 DaPy.

20:39 DaPy.

20:41 And it sort of labels itself as pandas for humans.

20:49 And so I just think this is interesting now.

20:54 We're going to be able to do that.

20:55 And I recall, you know, when Django came out, at that time, there was another popular web framework called TurboGears.

21:03 And there is sort of a faction in the Python community of like, are you a TurboGears person or a Django person, right?

21:10 And they both sort of had their pluses and minuses, right?

21:14 But I think, and TurboGears has sort of morphed into what we see as Pyramid these days.

21:19 But I see that there's been benefit.

21:21 I think in general competition is good.

21:23 And there's been benefit from that.

21:25 This is an interesting library.

21:27 It looks like it's sort of pandas-esque.

21:30 It's got portions of pandas in it, but it also has scikit-learn in it, Yellowbrick, which is a visualization tool for machine learning,

21:39 and NumPy as well.

21:40 And it says explicitly on there that it's designed for data analysis, not for coders,

21:45 which I think that's trying to say that maybe pandas is a little too complicated

21:51 and that data analysts maybe need something a little bit more simple than that.

21:56 But I think it's interesting that there's now a proliferation and people are using Python.

22:01 And maybe they're saying, oh, this, we want to use Python, but we want to use maybe something simpler.

22:06 And there's a proliferation there.

22:08 Yeah, I think it's super interesting.

22:09 One of the things it seems to do is also leverage the simpler startup idea, kind of like you talked about before.

22:16 Like a lot of folks say, well, you get started by setting up a Jupyter server and installing, you know, pandas and NumPy and all that stuff.

22:23 And one of the things you can do with this is you can have one of these data sheets and you can say show and it will like print out an ASCII representation of the table and stuff like that.

22:34 Yeah.

22:34 In general, with most software, like a good five minute out of the box experience is really good for bringing someone on.

22:41 Right.

22:42 It'll be interesting to see what happens to this moving forward, because what I'm also seeing is a lot of new projects are taking the interface from pandas and replicating that.

22:55 I mean, you've talked to people who are doing similar things, right?

22:57 But like Dask, for example, and stuff like that.

23:00 Like Dask.

23:01 I was just playing the other day with a library called cuDF.

23:06 I don't know how you pronounce that, but basically it's pandas on top of CUDA.

23:13 So you can leverage your GPU to do pandas like operations.

23:17 So it'll be interesting to see where that goes.

23:19 It looks like in general that the data science community is sort of homing in on, or adopting, the pandas interface as sort of a standard interface.

23:30 But, you know, is there room for improvement, room for something more for humans?

23:34 I guess that remains to be seen there.

23:37 Yeah.

23:37 It definitely seems like a lot of flowers are blooming.

23:39 Yeah.

23:40 Which I think in general is good.

23:41 Competition is good.

23:42 And, you know, if you only have one tool, you have to use that tool.

23:47 Right.

23:47 But if there are multiple tools and some are better at certain things, then I think it pushes everyone to be better.

23:52 So appreciate the competition there.

23:54 For sure.

23:54 How do you think a programming library that's not for coders works out?

23:59 Yeah.

23:59 I'm not sure how to interpret that.

24:02 You start by installing friendly traceback and then you go from there.

24:05 Yeah.

24:06 Good one.

24:07 I mean, I also consider Excel a programming environment.

24:11 Right.

24:11 I think Excel is the most common programming environment in the world.

24:15 And lots of people use it.

24:17 They won't admit that they're programming.

24:19 But, I mean, if you do a VLOOKUP or something like that, you're programming using Excel.

24:24 And Google Sheets is a great database.

24:26 Yeah.

24:27 At some point, I think you have to bite the bullet and learn some syntax.

24:31 And so I'm not quite sure how to interpret that statement there.

24:35 But as for a friendly interface, pandas has gotten some flak for things that might not be super intuitive or Pythonic.

24:43 So we'll see whether this is an improvement on that.

24:45 Maybe it's a way to graduate to Pandas.

24:47 Yeah.

24:48 Yeah.

24:48 These are your training wheels.

24:50 Yeah.

24:50 Potentially.

24:51 Potentially.

24:52 Now, I started off this whole conversation by saying we could use this friendly traceback to possibly gather information about crashes on the server because it captures locals.
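friendly-traceback's own API isn't shown here. As a minimal, hypothetical sketch of the underlying idea (grabbing the local variables at the point of a crash, for example to log them on a server), the standard library already exposes them through the frames on an exception's traceback. The helper `locals_at_crash` and the `parse_age` example are made up for illustration.

```python
def locals_at_crash(exc):
    """Return (function name, local variables) for the innermost Python frame
    in exc's traceback. Hypothetical helper, not friendly-traceback's API."""
    tb = exc.__traceback__
    while tb.tb_next is not None:    # walk down to the frame where the error was raised
        tb = tb.tb_next
    frame = tb.tb_frame
    return frame.f_code.co_name, dict(frame.f_locals)


def parse_age(text):
    cleaned = text.strip()
    return int(cleaned)              # raises ValueError for non-numeric input


try:
    parse_age(" forty ")
except ValueError as exc:
    crash_site, crash_locals = locals_at_crash(exc)

print(crash_site, crash_locals)  # parse_age {'text': ' forty ', 'cleaned': 'forty'}
```

On a server, a helper like this could be called from a custom `sys.excepthook` or a web framework's error handler. Be careful what you log, though: locals can hold passwords, tokens, or personal data.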

25:03 But, Anthony, this next one you've got might take it up a notch, right?

25:07 Yeah.

25:07 So I'm actually going to talk about python-remote-pdb.

25:10 This is intended to be a small over-the-network remote debugger.

25:13 It's a very, very thin wrapper around PDB that ships in a single file.

25:19 It's really easy to kind of drop into an existing environment and just, like, add it to your path using PYTHONPATH.

25:26 Or you can pip install if that's easier for you.

25:28 It doesn't have all that many features.

25:30 There's a bunch of other remote debuggers that are much more powerful.

25:33 Things like pudb or rpdb or PyCharm's debugger or Visual Studio Code's debugger or, like, any of the other things that are brought to the table.

25:43 But I found this.

25:44 It was pretty simple.

25:45 It solved my use case and worked pretty well.

25:48 That's cool.

25:48 So it integrates with Python's new breakpoint() feature, which lets you plug in a different debugger, right?

25:56 Yep.

25:56 Yeah.

25:57 You just set some environment variables.

25:58 And any time you call breakpoint in your code, the runtime knows how to import the right module and call the right stuff to call your debugger.

26:06 So if you wanted to use remote PDB, you would just set PYTHONBREAKPOINT=remote_pdb.set_trace, and it would just do the right thing.
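The `breakpoint()` dispatch described here (PEP 553, Python 3.7+) can be seen without any third-party package. From a shell you would run something like `PYTHONBREAKPOINT=remote_pdb.set_trace python app.py`; that variable is read by the default hook, `sys.__breakpointhook__`. This sketch instead swaps `sys.breakpointhook` directly for a hypothetical `recording_debugger` that records where it was called rather than stopping, so it runs unattended.

```python
import sys

captured = []


def recording_debugger(*args, **kwargs):
    """Hypothetical stand-in for a debugger entry point such as
    remote_pdb.set_trace: record the caller and its locals instead of stopping."""
    caller = sys._getframe(1)  # the frame that called breakpoint()
    captured.append((caller.f_code.co_name, dict(caller.f_locals)))


def average(values):
    total = sum(values)
    breakpoint()               # dispatched to sys.breakpointhook()
    return total / len(values)


sys.breakpointhook = recording_debugger
result = average([2, 4, 6])
sys.breakpointhook = sys.__breakpointhook__  # restore the default (env-var aware) hook

print(result)    # 4.0
print(captured)  # [('average', {'values': [2, 4, 6], 'total': 12})]
```

Setting `PYTHONBREAKPOINT=0` instead turns every `breakpoint()` call into a no-op, which is handy if one slips into code that runs in CI.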

26:15 Yeah.

26:15 That's pretty cool.

26:16 Yeah.

26:17 And to access this tool, you generally just use telnet or netcat or socat or anything that can talk to a socket, and it basically gives you a PDB session remotely.
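As far as we can tell, the core trick behind a socket-based debugger like this is that `pdb.Pdb` accepts any file-like `stdin` and `stdout`, so a remote debugger can hand it a network connection instead of the terminal. This is a sketch of that idea, not remote-pdb's actual code: it feeds pdb a scripted session from `io.StringIO` in place of a socket. Because the debugger's I/O never touches the terminal, the same approach suits a curses program that owns the screen.

```python
import io
import pdb


def compute(x):
    y = x * 2
    return y + 1


# The commands a user would type after connecting with telnet or netcat.
session_input = io.StringIO("p x\nnext\np y\ncontinue\n")
session_output = io.StringIO()

debugger = pdb.Pdb(stdin=session_input, stdout=session_output,
                   nosigint=True, readrc=False)
debugger.use_rawinput = False        # read commands from session_input, not input()
result = debugger.runcall(compute, 5)

print(result)                        # 11
print(session_output.getvalue())     # the "remote" transcript, showing x and y
```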

26:29 I'm actually working on creating my own text editor just so I can learn curses.

26:33 And it was really useful to be able to debug a curses application because you can't really enter PDB while it's trying to paint your screen.

26:40 Yeah.

26:41 That's interesting.

26:41 So, I mean, even though it sets up a little server, you don't have to have it be on a different machine, right?

26:47 Yep.

26:48 Yeah.

26:48 I was just developing it in one tab, and I had a debugger in my other tab.

26:51 Yeah.

26:52 Yeah.

26:52 So, in general, if you've got something that's printing to the screen or maybe reading input from the screen, that might be a case where this would be more appropriate than Python's built-in debugging tools.

27:04 Yeah.

27:04 Another use case might be if you're, like, using a web server, although, like, Flask has nice tools for using a debugger, and Pyramid does as well, and I'm sure the others do also.

27:13 But it's a potential tool.

27:15 Yeah, that's cool.

27:16 Yeah, more tools are good.

27:17 Tell folks really quick what curses is.

27:19 Curses is a library which allows you to paint kind of graphical user interfaces, but in a terminal.

27:25 It's kind of how text editors like Vim or Nano or Emacs draw out their UI.

27:31 Or if I wanted to create a game for, like, a BBS.

27:34 Oh, yeah.

27:35 You can make games with it, too.

27:36 I've seen some really good curses games.

27:37 Yeah, that's pretty cool.

27:38 All right.

27:39 Well, that's definitely a good one, and I hadn't heard of it.

27:41 So, yeah, thanks for sharing that.

27:42 And that's it for our main topics, but I do have a few quick extras I know we all do that I want to share.

27:48 I just want to share a story that just made me laugh.

27:51 I really love it.

27:52 So, there was this – the press calls them a hacker.

27:55 I don't really know what this person would be classified as because this is like a pretty low-level hack.

28:00 But a person was trying to avoid getting parking tickets, and they assumed that if what got entered into the license plate field was null,

28:13 the systems at, you know, the county or whatever that issue tickets would say, oh, there's no address or there's no license plate here, so we can't send them a ticket.

28:22 But in fact, quite the opposite is the case.

28:25 So, there's this person who got a custom license plate, which you can do in the U.S. and have like words on it, and they got the word null, N-U-L-L, all capital.

28:33 And then all of a sudden, all the other places where there actually were nulls in the database started directing to this license plate,

28:44 and they got $12,000 in parking tickets without even parking illegally because they started to receive all the failure cases of the database for the parking.
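Nobody outside the agencies involved knows how their systems actually store plates, so this is a purely hypothetical sketch of how the mix-up can happen. If "no plate" is written as the text NULL instead of a real null (Python's `None`), every citation with an unreadable plate matches the one car whose plate genuinely reads NULL. The `MISSING` sentinel and `bill_owners` function are invented for illustration.

```python
from collections import defaultdict

MISSING = "NULL"  # hypothetical sentinel a legacy system writes for "no plate"


def plate_key(plate):
    # Bug: a missing plate becomes the *string* "NULL", which is
    # indistinguishable from a real vanity plate that reads NULL.
    return MISSING if plate is None else plate.upper()


def bill_owners(citations, owners, *, skip_missing=False):
    """Total fines per owner. citations: (plate or None, fine) pairs;
    owners: plate string -> owner name."""
    bills = defaultdict(int)
    for plate, fine in citations:
        if skip_missing and plate is None:
            continue                 # fix: citations without a plate bill nobody
        owner = owners.get(plate_key(plate))
        if owner is not None:
            bills[owner] += fine
    return dict(bills)


owners = {"ABC123": "alice", "NULL": "vanity plate owner"}
citations = [("ABC123", 50), (None, 35), (None, 40)]

print(bill_owners(citations, owners))
# {'alice': 50, 'vanity plate owner': 75}
print(bill_owners(citations, owners, skip_missing=True))
# {'alice': 50}
```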

28:53 Ah, guess he hacked himself.

28:55 Exactly.

28:56 I don't know if hackers would take kindly to the naming there, but I don't think it worked out how he wanted it to.

29:03 It was definitely a backfire.

29:04 Anyway, I'll link to that.

29:05 That's pretty funny.

29:06 And then just really quick, PyCon 2020 has been sort of officially announced.

29:11 It's going to be earlier in the year than usual.

29:13 I'm trying to figure out exactly.

29:15 Yeah, yeah.

29:16 It's going to be April 15 to 23 in Pittsburgh, and the website is already up.

29:22 So go check it out.

29:24 Sign up.

29:24 Maybe you can submit a talk.

29:26 I'm not entirely sure, but the website at least is already up.

29:28 So I'll link to that, and people can check that out.

29:30 Matt, what do you got?

29:31 I recently released a course on Pluralsight on the XGBoost library.

29:37 So XGBoost, if you're not familiar with it, is a library that a lot of people are using with great success in like Kaggle competitions for analyzing structured data and making predictive models around that.

29:49 So if you're interested in an in-depth course on XGBoost, not only how to use it, but how to tune it, how to understand what the model is predicting when it comes out, check that out.

30:02 I've got a bit.ly link, bit.ly slash PSXGB, Pluralsight XGBoost.

30:08 So PSXGB if you're interested in that.

30:11 Nice.

30:11 Yeah, we'll put the links in the show notes.

30:12 Congrats on the new course.

30:14 You and I have both written a fair number of online courses, and that's a lot of work.

30:17 Yeah, thank you.

30:18 Yeah, I'm excited about it.

30:19 I think it's a great course for anyone interested in XGBoost.

30:23 Yeah, awesome.

30:23 Anthony?

30:25 Cool.

30:25 I've got one quick little library.

30:27 This is along the same lines as the curses work I mentioned earlier.

30:30 I found a tool called Hecate.

30:32 I don't know how to pronounce it.

30:34 It labels itself as a Selenium web driver, but for the terminal.

30:39 And it's kind of a cool library that allows you to control a process in the background and make assertions about it as a testing library.

30:46 So are you sending Curses commands, or is that like expect?

30:51 So the way it works is it runs a tmux server in the background and then uses tmux commands to interact with it.

30:58 So it'll send, like, up arrow or Ctrl-X.

31:02 Like sends keys, kind of.

31:03 Yep, pretty much.
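Hecate's own API isn't covered in the episode, so rather than guess at it, here is a hypothetical sketch of the tmux technique described: start a program in a detached tmux session, send it keystrokes with `send-keys` (tmux key names such as `Up` or `C-x`), and read the screen back with `capture-pane -p`. The `TmuxDriver` class and its `runner` parameter are made up for illustration; the demo records the commands instead of running them, so it works without tmux installed.

```python
import subprocess


class TmuxDriver:
    """Hypothetical minimal terminal driver built on tmux commands."""

    def __init__(self, session, runner=subprocess.run):
        self.session = session
        self.runner = runner          # swap in a fake to dry-run

    def _tmux(self, *args):
        return self.runner(["tmux", *args], check=True,
                           capture_output=True, text=True)

    def start(self, command, width=80, height=24):
        # Detached session with a fixed size so screen captures are stable.
        self._tmux("new-session", "-d", "-s", self.session,
                   "-x", str(width), "-y", str(height), command)

    def press(self, *keys):
        # tmux key names, e.g. "Up", "C-x", "Enter".
        self._tmux("send-keys", "-t", self.session, *keys)

    def screen(self):
        # -p prints the visible pane contents to stdout.
        return self._tmux("capture-pane", "-p", "-t", self.session).stdout

    def close(self):
        self._tmux("kill-session", "-t", self.session)


# Dry run: record the tmux commands instead of executing them.
recorded = []


def record(argv, **kwargs):
    recorded.append(argv)
    return subprocess.CompletedProcess(argv, 0, stdout="", stderr="")


demo = TmuxDriver("editor-test", runner=record)
demo.start("nano")
demo.press("Up")
demo.press("C-x")
demo.close()

for argv in recorded:
    print(" ".join(argv))
```

With tmux installed, leaving out the `runner` argument runs the commands for real. A test would then poll `screen()` until the expected text appears, since the program redraws on its own schedule.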

31:04 Ah, that's cool.

31:06 I can see utility in that for like driving demos and that sort of thing as well.

31:11 That's cool.

31:12 Oh, yeah, that would be cool.

31:13 Yeah, you sit there and it looks like you're typing and you're just flying through it.

31:16 And you just get up.

31:17 You say, let me show you some over here and it just keeps going.

31:19 Feel like.

31:19 Yeah.

31:21 Yeah, and you could make it understand Emacs.

31:24 It could even control Emacs.

31:25 Awesome.

31:26 Yeah.

31:26 Yeah, it works great for Emacs too.

31:28 Sweet.

31:29 All right.

31:30 We always end the show with a joke or two.

31:32 And this one is not like a laugh out loud sort of joke.

31:36 But Matt, you're here.

31:38 You do a lot of data science.

31:39 I thought I'd bring some sort of scientific-esque humor here.

31:43 And I just, this one just really is deeply satisfying to me.

31:46 So I'll just, it's a little story.

31:47 I'll get your reactions in a minute.

31:50 So there are two mathematicians sitting at a table in a pub having an argument about the

31:54 level of math education among the general public.

31:57 Like one of them is defending overall math knowledge.

32:00 And he gets up and goes to the restroom.

32:02 And on his way back, he wants to prove his point, right?

32:06 So he encounters the waitress and says, hey, I'll give you an extra $10 on your tip if you

32:11 can answer a question for me.

32:12 It doesn't matter what I ask.

32:14 Just say the words X squared.

32:17 X squared, okay?

32:17 She's like, yeah, sure.

32:18 No problem.

32:19 So a few minutes later, the guy sits back down with his buddy.

32:22 He says, I'll bet you $20 that even our waitress can tell us the integral of 2X.

32:26 And the cynic's like, yeah, I'll take that bet, buddy.

32:29 No problem.

32:29 So he beckons her over to the table, asks the question to which she replies, X squared.

32:34 And this mathematician begins to demand his winnings.

32:37 And she says, plus a constant.
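For anyone who wants the punchline spelled out: differentiation discards constants, so an antiderivative is only determined up to one.

$$\frac{d}{dx}\left(x^2 + C\right) = 2x \quad\Longrightarrow\quad \int 2x\,dx = x^2 + C$$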

32:39 I don't know why I like that one.

32:45 At least she knows.

32:47 Yeah, I don't know why I like that one, but it's just satisfying to me.

32:50 So yeah.

32:51 So on that, a brief aside: there was a poll on Twitter the other day about whether they should

32:57 teach statistics or calculus in high school.

33:00 That's an interesting question.

33:01 And I said, the only time I've used calculus since high school, even though I very much enjoyed

33:05 the class, was tutoring other people in calculus.

33:10 That's a good career path, by the way.

33:12 Just, you know, when you're young.

33:13 That's right.

33:15 No, I hear you.

33:16 But honestly, if you're going to throw out a math class in high school, the first sacrifice is

33:21 geometry.

33:22 It's got to be geometry.

33:24 Replace that with some computer programming.

33:26 Serves the same purpose.

33:27 Logical thinking.

33:28 Let's do it.

33:29 Awesome.

33:31 But yeah, I definitely take your point.

33:33 All right.

33:34 Anthony, do you have another joke or you want to, should we wrap it up?

33:37 I had a Golang joke prepared, but then I panicked.

33:40 Whoa.

33:41 Whoa.

33:42 I would cut that too.

33:44 No, no.

33:45 Don't worry about it.

33:46 It's all good, man.

33:48 Thank you, Matt Harrison, Anthony Sottile.

33:52 Thank you both for being here.

33:53 It's been really great to have you, Matt, back on the show and Anthony here for the first

33:56 time.

33:57 Yeah.

33:57 Thanks for having me.

33:58 Yeah.

33:58 Thanks guys.

33:58 Bye.

33:59 Bye.

33:59 Okay.

33:59 Bye.

34:00 Thank you for listening to Python Bytes.

34:02 Follow the show on Twitter via @pythonbytes.

34:04 That's Python Bytes as in B-Y-T-E-S.

34:07 And get the full show notes at pythonbytes.fm.

34:10 If you have a news item you want featured, just visit pythonbytes.fm and send it our way.

34:15 We're always on the lookout for sharing something cool.

34:17 On behalf of myself and Brian Okken, this is Michael Kennedy.

34:20 Thank you for listening and sharing this podcast with your friends and colleagues.
