Brought to you by Michael and Brian - take a Talk Python course or get Brian's pytest book



Transcript for Episode #145:
The Python 3 “Y2K” problem

Recorded on Wednesday, Aug 28, 2019.

00:00 KENNEDY: Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is Episode 145, recorded August 28th, 2019. I'm Michael Kennedy, and Brian is away on vacation. Yeah, I was on vacation last week, now Brian's gone, but don't despair, we have two special guests, we have Matt Harrison.

00:19 HARRISON: Hello!

00:20 KENNEDY: Welcome, Matt. And Anthony Sottile. Welcome, first time here on the show, nice to have you here.

00:26 SOTTILE: Nice to be on the show.

00:26 KENNEDY: Yeah, it's great to have you, I'm looking forward to talking about all these things with you. Now, before we get on to our topics, let me just say real quickly the show is brought to you by Datadog. Check them out at pythonbytes.fm/datadog, more on that later. But I want to focus first on something that's going to help people learning Python, or teaching people who are learning Python, and that's this project called friendly-traceback. Matt, you do a lot of teaching, what is your experience with folks, you know, what are their first programming experiences? Running into, like, a traceback crash, is it really clear for them?

01:03 HARRISON: Most of my training is with experienced technologists who understand what a traceback is, though I do do some with kids in elementary school, that sort of thing. I like the idea here, but in my courses I explicitly have them hit errors and teach them how to read the traceback and recover from them.

01:28 KENNEDY: How to recover...

01:29 HARRISON: So I like the idea, the other thing that's, I have sort of an issue with is that you have to install it, right? So having someone who's a beginner install something, I don't know, what do you think about that?

01:42 KENNEDY: That's an interesting question because you, well, let me tell folks what it is real quick. So this one comes to us from Jose Carlos Garcia I think, because his Twitter name is in leetspeak, so that's my attempt to understand it back into English. Thank you Carlos for sending that in, and the idea is really aimed at beginners, as you kind of hinted at, Matt. And let me just give folks a sense of what it is. So a normal traceback will have, in reverse order, the callstack, and then maybe the line of code, and then the error message, like the error type, and then possibly the error message, right? Well what this does instead is it will catch the exception type like an IndexError, and then it has a little help message and an example, right? So it says "Oh, this is a Python exception, you got an IndexError, that means a list index is out of range in this case, and this error occurs under these circumstances, here's usually what happens, here's usually the cause." And then one thing I really like about it is it first shows the line of code where it happened, and then it actually shows the local variables. So like in the example there it has a function called get_last_item where you're passing a list, and it says "The list that was passed was [1, 2, 3]", and then here's actually three or four lines of code where this error is happening and what the index value was and all that, so you could actually diagnose this without even stepping into a debugger, you could just say "Oh I see the index is 3, but it's actually 0, 1, 2" by just looking at the output on the screen here. And that I think is pretty cool.
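To make the description above concrete, here is a minimal sketch of the idea using only the standard library. It is not friendly-traceback's implementation or API: the `explain` helper and the body of `get_last_item` are assumptions made up for illustration. It shows how the exception type, the failing function, and the local variables can be pulled out of a traceback.

```python
def get_last_item(items):
    # Deliberate bug: valid indexes run from 0 to len(items) - 1.
    index = len(items)
    return items[index]


def explain(exc):
    """Hypothetical helper: summarize an exception with its local variables."""
    tb = exc.__traceback__
    # Walk to the innermost frame, where the exception was actually raised.
    while tb.tb_next is not None:
        tb = tb.tb_next
    frame = tb.tb_frame
    lines = [
        f"{type(exc).__name__}: {exc}",
        f"raised in {frame.f_code.co_name}(), line {tb.tb_lineno}",
        "local variables:",
    ]
    for name, value in frame.f_locals.items():
        lines.append(f"    {name} = {value!r}")
    return "\n".join(lines)


try:
    get_last_item([1, 2, 3])
except IndexError as exc:
    report = explain(exc)
    print(report)
```

With the locals printed next to the error, a reader can see that the index is 3 while the list only has positions 0, 1 and 2, without starting a debugger.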

03:11 HARRISON: Yeah, I think it's definitely useful, I looked over the documentation, I thought "This is really cool". My issue with beginners is how to walk them through something...

03:22 KENNEDY: Sure.

03:23 HARRISON: And installing is always a pain when they're beginners. I also, the other thing that's really cool about it is it looks like it has some hooks so you can define your own exceptions. You can customize how you handle those, so I think that looks kind of cool as well, I mean, I can imagine maybe if you're in an environment at work where you have support people, and they might have to look at your code, or look at your servers and take problems and resolve them, this might be something that could aid them.

03:58 KENNEDY: Right, the fact that it catches the local variables in production would be kind of nice. Yeah I think it really comes down to, "Are you using any external packages?" If there's a pip install, or a conda install, all right you could wrap this up to include friendly-traceback, right? But if you're using none of that, then all of a sudden, yeah, this is like another burden, right?

04:18 HARRISON: Mm-hmm.

04:19 SOTTILE: It actually looks a lot like the pytest traceback, I sometimes find that those are useful, and sometimes not so useful.

04:25 KENNEDY: Yeah exactly, I think it depends. I think they said there's a lot of tools to make exceptions better for advanced developers, this is not that. This is something else. One of the things I thought that was cool about this is there's three ways to, like, integrate into your app. You can install it as an exception hook, so all exceptions in the application are caught, which is cool 'cause I knew that that was possible but I didn't really, I've never really played with that and like, "Oh look how easy it is, you just take a function and assign it to this callback globally and any error will go through it". That's cool, so you can do it at the top of the start of your app, you can actually, in a try-except block, you'll say "explain friendly-traceback", explain, and it'll do that on demand, but both of those require you changing your code, you can also use it from the outside when you run an app, you can say "-m friendly-traceback" with the script file, and it'll, that way you can run it on code that's not modified to be, like, friendly I guess. Unfriendly code do they call it? I don't know. Anthony, let's wrap this up, what are your thoughts on this thing?
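The "assign a function to this callback globally" mechanism is the standard library's `sys.excepthook`. The sketch below shows that mechanism on its own, plus the on-demand use inside a try-except block. The names `friendly_hook`, `format_message` and `install` are hypothetical and are not friendly-traceback's API.

```python
import sys


def format_message(exc_type, exc_value):
    """Hypothetical formatter: one plain-language line per error."""
    return f"Something went wrong: {exc_type.__name__} ({exc_value})"


def friendly_hook(exc_type, exc_value, exc_tb):
    # Same signature that Python uses when it calls sys.excepthook.
    print(format_message(exc_type, exc_value), file=sys.stderr)


def install():
    # From now on, every uncaught exception goes through friendly_hook.
    sys.excepthook = friendly_hook


# On-demand use inside a try-except block, without installing anything:
try:
    int("not a number")
except ValueError:
    friendly_hook(*sys.exc_info())
```

Calling `install()` once at the start of a program covers the global case; restoring `sys.__excepthook__` puts the default behaviour back.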

05:26 SOTTILE: Looks good! I'm actually teaching my brother how to program right now and he was pretty overwhelmed by the first stacktrace that he saw, and I think if he would have saw something like this it would've changed his perception about errors and maybe taught him something more than I had to teach him after the fact.

05:42 KENNEDY: Exactly

05:43 SOTTILE: It looks pretty cool!

05:44 KENNEDY: Yeah, I totally agree, I think it's awesome, I really, really like the idea, I do take Matt's point is valid, it's like it's now a step preceding actually writing code that they've got to go through. So depending on how much control you have over their environment, if you can, like, get this in place for them then maybe it's no big deal, but, yeah. Anyway, we're just considering. Matt, what have you got next for us?

06:04 HARRISON: So recently the Pandas developers released the Pandas user survey from 2019 so this came out last week, they had a call on Twitter, and they had about 1200 responses...

06:18 KENNEDY: Yeah that's cool, I was surprised they got so many folks to participate, those are really solid numbers for statistics.

06:24 HARRISON: Yeah! I think we've got a link to the survey there but some of the things that stood out to me is that more than half of the people who responded have been using Pandas for less than two years. Pandas has been out for quite a while now but it seems to be, from what I see, one of the key drivers of growth, it's just sort of an essential component of, sort of the... Data analysis, data science space and...

06:52 KENNEDY: Absolutely.

06:53 HARRISON: And it looks like they're getting a lot of new users there.

06:55 KENNEDY: Yeah and so, you know, we talked about the incredible growth of Python before and a lot of that has to do with this big inflection point where Python was largely adopted, moved to by data scientists, do you think this is like an indicator of that? Like there's all these new folks coming into the Python ecosystem and they're often coming into the data science space, and so, hence, they haven't been here for many, many years?

07:17 HARRISON: Yeah I think so. I mean, that's the sexy thing right now, data science, and kind of didn't exist before so there's a big push for AI, ML, that sort of thing, but if you look at, you know, the number one tool that data scientists use, it's often called Python, and I would say that Pandas is probably the number one tool of those people who are using data science.

07:40 KENNEDY: Yeah. Cool. What else is in the survey?

07:43 HARRISON: What was really interesting is they have the numbers of the operating systems. I sort of geek out on that sort of thing but this isn't what I thought, what I would have thought at all, 'cause some definitions of data scientists are they're a statistician who uses a MacBook in San Francisco.

07:57 KENNEDY: Yeah you definitely do better data science if, like, the back of your laptop glows, like an Alienware maybe, but certainly an Apple. Nah, I'm just kidding.

08:07 HARRISON: Yeah, so the numbers, actually they have about 60% of their users use Linux, 60% use Windows, and 42% Mac. I wouldn't have thought that, I mean obviously those don't add up to 100 so I imagine we get a lot of deployments on Linux, that sort of thing, but... People tend to forget that Linux sort of rules the enterprise world and I think this might be further indication that Pandas adoption, or Python's adoption, is not limited to just start-ups and people hacking around on their MacBook.

08:41 SOTTILE: I'm always super surprised by the percentage of Windows users when you consider Python and, I guess it makes sense 'cause it's really easy to get started with Python on Windows, but it's, the number just blows me away every time.

08:51 KENNEDY: Yeah, absolutely. Steve Dower did a great talk at PyCon called, "Python is okay on Windows, actually", or something like that, and he has a great, interesting...

08:59 SOTTILE: That was a great talk.

09:00 KENNEDY: Yeah it was really good and it had some interesting statistics, I feel like the Windows Python developers are somewhat in this realm of the dark matter developers, in that you know they use it 'cause it keeps showing up in these surveys, but you go to PyCon and there's many more MacBooks toting around and whatnot, but, yeah, it's definitely a good thing to remember. You know, honestly, Matt, the thing that surprised me the most here is how high Linux is in this group.

09:25 HARRISON: Yeah, I imagine that's deployments but the other interesting number here is the Python 3 percentage. The Python 3 percentage is 93%, so legacy Python goodbye there I guess, the data scientists move on to the latest and greatest.

09:40 KENNEDY: Yeah the data scientists are leading the way with ditching legacy Python, I mean the whole Python 3 statement came out of that space, which is pretty cool. I think that also had to do with less legacy code as well as the models, and the technology are changing so fast you don't keep building on the same code, you're like "forget that, we're going to go to Tensorflow 'cause this whole thing is slow and wrong". Right?

10:03 HARRISON: Mm-hmm. Yeah, like you said, I think they're leading the way. Well, speaking of Python 3, I think Anthony has something around that as well.

10:10 SOTTILE: I do indeed! Yeah, let's talk about the Y2K problem that I kind of stumbled upon recently.

10:16 KENNEDY: The YP3 problem? Yeah, yeah, okay, what is this?

10:22 SOTTILE: Python 3.8, close to release and Python 3.9 right around the corner, there's the question that comes up which is, "What is going to come after?" And we really have two main choices which would be like Python 3.10 or Python 4.0, but both of these present problems just because of their version number. There's a significant amount of code out there that's using the sys.version and sys.version_info variables in the sys module.

10:46 KENNEDY: Right I'm trying to just take that string and go "Where is there a Python 3 or a 2 in here?" Or something like that, right?

10:52 SOTTILE: Yeah there's a lot of, like, splicing or, like, checking the first character of that string, and it presents a number of different problems. The most common one that I've seen is when you access sys.version and you look at the first three characters and that works fine in 2.7, 3.6, or whatever, but as soon as the second minor version of Python becomes 10, you're suddenly reporting Python 3.1 again.
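A short sketch of the slicing problem described here. The version strings are made-up stand-ins for `sys.version` (the 3.10 one is hypothetical), and the helper names are assumptions for illustration.

```python
import sys


def naive_version(version_string):
    # Common but fragile: assumes "major.minor" always fits in 3 characters.
    return version_string[:3]


def parsed_version(version_string):
    # Split on dots instead of counting characters.
    major, minor = version_string.split()[0].split(".")[:2]
    return int(major), int(minor)


samples = [
    "2.7.16 (default, illustrative build info)",
    "3.6.9 (default, illustrative build info)",
    "3.10.0 (default, hypothetical future build)",
]

for version_string in samples:
    print(naive_version(version_string), parsed_version(version_string))

# In real code there is no need to parse the string at all:
current = sys.version_info[:2]  # a tuple of ints, such as (3, 8)
```

The naive slice reports "3.1" for the 3.10 string, which is exactly the "Python 3.1 again" failure mentioned above.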

11:16 KENNEDY: Right, sorry, this doesn't support Python 3.1, you need at least version 3.5, you're like "Nooo".

11:23 SOTTILE: There's an even worse situation where, even if you're doing it correctly and you're using sys.version_info, if you only check the first number and only check equality, so like say, sys.version_info[0] == 3, that's a perfectly fine check if you're checking if it's exactly Python 3, but as soon as Python 4 happens that condition's going to be false, and guess what? You're going to run Python 2 again.
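The equality pitfall can be sketched with plain tuples standing in for `sys.version_info`. The (3, 10, 0) and (4, 0, 0) entries are hypothetical versions, and the function names are assumptions for illustration.

```python
def is_py3_fragile(version_info):
    # True only for major version 3; a future 4.x falls through
    # to whatever the "else" branch does, often the Python 2 path.
    return version_info[0] == 3


def is_py3_or_later(version_info):
    # Tuple comparison keeps working for 3.10 and for any later major version.
    return version_info >= (3,)


candidates = [(2, 7, 16), (3, 8, 0), (3, 10, 0), (4, 0, 0)]

for version in candidates:
    print(version, is_py3_fragile(version), is_py3_or_later(version))
```

In application code the same comparison would be written against the real object, for example `sys.version_info >= (3,)`.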

11:47 KENNEDY: Yeah, some of the maintainability libraries like Six have this in there, right?

11:51 SOTTILE: Yup! Yeah, Six is broken if you change the version to 4. Which is a little scary given it's one of the most installed libraries as we'll see later.

12:00 KENNEDY: Yeah, well, you know I guess that makes sense because 6 is not divisible by 4 so it's probably fine. No, actually this is really tricky, you know, it reminds me of Windows 10, right? If you look at the Windows operating system numbers, we had Windows 7 which was X-P, Windows 8 which was Vista, and then Windows 10. And the reason they don't have a Windows 9 is exactly this. Like, so many people were doing substring searches for Windows space 9 for looking for 95 or 98, and so if they had a 9 that was beyond, you know, Vista...

12:33 SOTTILE: Yeah even Oracle Java had that problem.

12:34 KENNEDY: Yeah it's so they just said "Forget it, we're going to 10." but it doesn't sound like skipping the 4 is going to make this better. Probably worse. Which do you think is the worse way to go?

12:44 SOTTILE: I think skipping the 4 is going to be worse, most of the things with the 3.10 release will just be just, like, slightly broken, but trying to run Python 2 code in Python 3 is way more broken.

12:57 KENNEDY: Yeah, for sure.

12:58 HARRISON: So I haven't followed this but I recall, and maybe my memory is just failing me, that there was talk there would never be a Python 4, so has that changed?

13:10 KENNEDY: As far as I know that hasn't changed.

13:12 SOTTILE: I think the jury is still kind of out... On that one. From what I understand there was talk of, like, Python 4 just being the next version of Python 3, but I don't think anyone has definitively chosen whether it'll be 3.10 or 4.0 next.

13:28 KENNEDY: Yeah, I mean we're at this crossroads, right, Guido has expressed a dislike of double digit second version numbers, but everyone is tired of this 2 versus 3 debate. We don't want to kick it up a notch, right? So where do you go from there, right?

13:41 SOTTILE: We'll just release 3.9.9.9.9 forever.

13:47 KENNEDY: That's right. Oh, it'll be fine.

13:48 SOTTILE: Yeah this is actually coming up pretty quickly so 3.9 will reach beta, according to the PEP, will reach beta sometime in 2020, and usually when the next version release is on beta they start developing the version afterward, and so we'll start seeing 3.10 in the wild. But I made a couple of easy ways that you can start fixing these problems before they're a problem, I guess? In one of them I prebuilt a version of Python 3.10, well it's actually 3.8 but with a fake version number, and you can run that directly on Ubuntu today, and I made a flake8 plugin which checks for these common issues called flake8-2020.

14:25 KENNEDY: Nice. Yeah, that's really cool. Yeah, it makes it into your...

14:28 HARRISON: So does that suggest that you use version info instead?

14:31 SOTTILE: Yeah it makes the proper suggestion when it detects which thing that you're using incorrectly.

14:35 HARRISON: Cool.

14:36 KENNEDY: Yeah, super. That's great. Now before we get to the next one let me just tell you all quickly about Datadog, they're a long term supporter of the show and Datadog is a modern cloud scale monitoring platform that brings all your metrics and logs and distributed traces together. So basically it will auto-instrument all the popular frameworks, Django, Flask, Postgres, whatnot, and you can actually trace your requests, and your performance, across different service boundaries, so not just "What is your Python code doing?", or "What is your database doing?" But, like, all together in one coherent thing, which is cool. If you start a free trial with them you'll get a cool Datadog t-shirt, just visit pythonbytes.fm/datadog to get started. Now Anthony, you had hinted that we may come back to popular packages and some folks out of the, I think they're associated with the University of Michigan, but they also have their own consulting project, these two folks, they did some interesting research on the current state of PyPI now. Sometimes people will use BigQuery, you can ask interesting questions like, "Well, what are the most common user agents downloading from PyPI?" Or "What are the most common packages?" or whatever. These folks went all in and they downloaded all of the packages from what I can tell. Like, all of them. And then they started analyzing all sorts of stuff about them. So they start by saying, "We downloaded 178,592 packages, which has roughly 1.7 million releases, and 77,000 contributors", and they also analyzed something that was pretty interesting, the connections, or the interconnectivity or dependency graphs of these various things, and they found there's 157 million import statements

16:21 SOTTILE: Wow!

16:22 KENNEDY: Within these packages. And then, yeah, they just did a bunch of analysis, this is basically like an academic research paper. So the thing I'm linking to is actually a pdf, so you know, look for a download you're going to get, rather than a website they set up. But yeah, it's pretty interesting what these guys put together. What do you see that caught your attention going through this?

16:43 SOTTILE: I went to their web, I read the paper and then I looked at, they actually have a GitHub project and I wanted to actually pull the data from the GitHub but sadly the data is not there, it says "coming soon", 'cause I wanted to do some analysis on...

16:54 KENNEDY: Ah, bummer.

16:55 SOTTILE: So my question is: what do you think is the most common third party library? And it wasn't what I thought it would be.

17:01 HARRISON: I mean my guess was just going to be like Six, 'cause you see that everywhere...

17:06 KENNEDY: Yeah it's super, super low level.

17:08 SOTTILE: I would have guessed requests...

17:10 KENNEDY: Yeah that would have been my guess as well, I think.

17:12 SOTTILE: The most common was NumPy.

17:14 HARRISON: Oh.

17:15 SOTTILE: Which surprised me.

17:16 HARRISON: These data scientists! They're really dominating.

17:20 SOTTILE: I don't know.

17:21 KENNEDY: They are. They definitely are, wow! How interesting. So, certainly, so many of these libraries that are in the data scientist space do seem to, like, all focus in on NumPy as the foundation, don't they?

17:31 SOTTILE: Yeah it's sort of built around that as well.

17:33 KENNEDY: Yeah, I wonder how much more commonality there is, like more shared foundation there is in the data science rather than, say, the web space? Where you've got Flask, Django, Pyramid, Bottle, Molten, whatever, and they all kind of have their own foundation so that, you know, breaks up their potential... That was definitely interesting, NumPy. I wouldn't have guessed that, but I guess it does make sense. So some interesting things that I saw was, within PyPI they said they find that the growth of PyPI itself, all the packages, has been robust under all measures, with an annual compound growth of 47% year over year for the number of active packages. And 39% for new authors, and 61% for new import statements, so I guess that means Python packages are becoming more dependent on each other.
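To put those compound growth rates in perspective, here is a small arithmetic sketch. The three rates are the ones quoted above; the helper names are made up for illustration, and the calculation assumes the rate stays constant from year to year.

```python
import math


def growth_factor(rate, years):
    # Compound growth: multiply by (1 + rate) once per year.
    return (1 + rate) ** years


def doubling_time(rate):
    # Solve (1 + rate) ** t == 2 for t.
    return math.log(2) / math.log(1 + rate)


rates = {
    "active packages": 0.47,
    "new authors": 0.39,
    "new import statements": 0.61,
}

for label, rate in rates.items():
    print(f"{label}: doubles about every {doubling_time(rate):.1f} years")
```

At a steady 47% per year, the number of active packages would more than double every two years.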

18:29 SOTTILE: Yeah that makes sense. Again, when I'm doing a training I will go to PyPI, the Python package index and I point them at that number and I'm at it right now and it says 193,830 projects right now, and I think that's pretty mind blowing, but also like you said, you've got 47% growth in there, 39% for new authors, right?

18:55 KENNEDY: Yeah, that's incredible.

18:56 SOTTILE: So apparently it's somewhat straightforward for someone to come into Python, make a package, and start contributing and sharing it with the community

19:06 HARRISON: I think that the new authors is the most impressive stat from there, it means that people are coming into the community and, like, building stuff for other people, which is great!

19:15 KENNEDY: That's a really good point, yeah, absolutely. That's a super positive number, that's really high growth when you're talking about you already have 77,000 authors, right? Some other real quick stats I thought was interesting, the number of active packages, which is a much smaller number than the total packages, but in 2005 you could go to PyPI and you could literally, you know, just kind of browse all the active packages, there were 96. And so in the early days it was useful to have it but it was not quite as amazing as almost 200,000 now.

19:47 HARRISON: Before PyPI there's this cheese shop which I think was the predecessor of that, and it was sort of a single web page and it had like "Here are the categories", right? And so on that web page was a list of packages but yeah, this is...

20:00 SOTTILE: Right

20:00 HARRISON: This is crazy.

20:01 KENNEDY: Yeah, so the cheese shop as you were telling me is kind of like Yahoo for packaging. Alright, our final stat from this analysis is the most popular license for packages in the Python space is MIT. They've got all of them listed, but that's pretty cool. Alright Matt, what's this next one? Speaking of data scientists, you got another one for us.

20:19 HARRISON: Yeah so speaking of data scientists and sort of, I guess, the proliferation of, Michael mentioned proliferation of web frameworks, I came across a new project that I hadn't seen before, recently called DaPy, D-A-P-Y.

20:39 KENNEDY: Da Pie!

20:40 HARRISON: DaPy, I'm not sure of the pronunciation there... And it sort of labels itself as Pandas for humans. Not precisely but it says "for humans" somewhere in its web page. And so I just think this is interesting now so in my mention of web frameworks, I recall, you know, when Django came out, at that time there was another popular framework called TurboGears, and there was sort of a faction in the Python community of like, "Are you a TurboGears person or a Django person?", right, and they both sort of had their pluses and minuses, right, but I think, and TurboGears has sort of morphed into what we see as Pyramid these days, but I see there's been benefit, I think in general competition is good, and there's been benefit from that. This is an interesting library, it looks like it's sort of Pandas-esque, it's got portions of Pandas in it but it also has Scikit-learn in it, YellowBrick which is a visualization tool for machine learning, and NumPy as well. And it says explicitly on there that it's designed for data analysis, not for coders, which I think is trying to say that maybe Pandas is a little too complicated and that data analysts maybe need something a little bit more simple than that. But I think it's interesting that there's now a proliferation and people are using Python, and maybe they're saying, "Oh we want to use Python but we want to maybe use something simpler", and there's a proliferation there.

22:08 KENNEDY: Yeah I think it's super interesting, one of the things it seems to do is also leverage the simpler start-up idea, kind of like you talked about before. A lot of folks say "well, you get started by setting up a Jupyter server and installing, you know, Pandas and NumPy and all that stuff", and one of the things you could do with this is you can have one of these data sheets and you can say "show", and it will, like, print out an ASCII representation of the table and stuff like that.

22:34 HARRISON: Yeah. In general with most software, like a good five minute out of the box experience is really good for bringing someone on, right? It'll be interesting to see what happens to this moving forward because what I'm also seeing is a lot of new projects are taking the interface from Pandas and replicating that, and you've talked to people who are doing similar things, right, but...

22:58 KENNEDY: Yeah like Dask for example, and stuff like that.

23:00 HARRISON: Like Dask, I was just playing the other day with a library called C-U-D-F, I don't know how you pronounce that but basically it's a Pandas on top of CUDA, so you can leverage your GPU to do Pandas-like operations. So it'll be interesting to see where that goes, it looks like in general the data science community is sort of honing in or adopting the Pandas interface as sort of a standard interface, but, you know, is there room for improvement? Room for something more for humans? I guess that remains to be seen there.

23:37 KENNEDY: Yeah it definitely seems like a lot of flowers are blooming.

23:40 HARRISON: Yeah. Which I think in general is good, competition's good, and you know, if you only have one tool you have to use that tool, right? But if there are multiple tools, and some are better at certain things then I think it pushes everyone to be better, so appreciate the competition there.

23:54 KENNEDY: For sure.

23:55 SOTTILE: How do you think a programming library that's not for coders works out?

23:58 HARRISON: Yeah, I'm not sure how to interpret that.

23:58 KENNEDY: You start by installing friendly-traceback and then you go from there.

23:58 HARRISON: Yeah, good one. I mean I also consider Excel a programming environment, right, I think Excel's the most common programming environment in the world, and lots of people use it. They won't admit that they're programming but I mean if you do a VLOOKUP or something like that you're programming using Excel so, you know...

23:58 SOTTILE: And Google Sheets is the great database.

23:58 HARRISON: Yeah. At some point I think you have to bite the bullet and learn some syntax and so I'm not quite sure how to interpret that statement there, but friendly interface? Pandas has gotten some flak for some things that might not be super intuitive or not Pythonic in that way, so we'll see whether this is an improvement on that.

23:58 KENNEDY: Maybe it's a way to graduate to Pandas.

23:58 HARRISON: Yeah. Yeah, this is your training wheels.

23:58 KENNEDY: Yeah, potentially, potentially. Now I started off this whole conversation by saying we could use this friendly-traceback to possibly gather information about crashes on the server 'cause it captures locals, but Anthony, this next one you've got might kick it up a notch, right?

23:58 SOTTILE: Yeah so I'm actually going to talk about Python remote-pdb. This is intended to be a small, over the network remote debugger. It's a very, very thin wrapper around pdb that ships in a single file, it's real easy to kind of drop into an existing environment and just like add it to your path using PYTHONPATH, or you can pip install if that's easier for you. It doesn't have all that many features, there's a bunch of other remote debuggers that are much more powerful, things like pudb, or rpdb, or PyCharm's debugger, or Visual Studio Code's debugger, or like any of the other things that I brought to the table, but I found this, it was pretty simple and it solved my use case. And it worked pretty well.

23:58 KENNEDY: That's cool. So it integrates with Python's new breakpoint feature which lets you plug in a new debugger, right?

23:58 SOTTILE: Yup! Yeah you just set some environment variables and anytime you call breakpoint in your code, the runtime knows how to import the right module and call the right stuff to call your debugger. So if you wanted to call remote-pdb you would just set PYTHONBREAKPOINT=remote_pdb.set_trace and it would just do the right thing.
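The dispatch described here goes through `sys.breakpointhook` (Python 3.7 and later): `breakpoint()` calls whatever that attribute points to, and the default hook consults the `PYTHONBREAKPOINT` environment variable each time it runs. The sketch below swaps in a stand-in hook so the mechanism can be observed without starting a debugger. `recording_hook`, `run_with_hook` and `buggy` are hypothetical names; nothing here uses remote-pdb itself.

```python
import sys

calls = []


def recording_hook(*args, **kwargs):
    """Stand-in for a debugger entry point; records the call and returns."""
    calls.append((args, kwargs))


def run_with_hook(func):
    # Temporarily replace the hook that breakpoint() dispatches to.
    original = sys.breakpointhook
    sys.breakpointhook = recording_hook
    try:
        return func()
    finally:
        sys.breakpointhook = original


def buggy():
    total = 1 + 1
    breakpoint()  # calls sys.breakpointhook() under the hood
    return total


result = run_with_hook(buggy)
print(result, len(calls))
```

Setting the environment variable achieves the same redirection from outside the program, with no code changes beyond the `breakpoint()` call.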

23:58 KENNEDY: Yeah that's pretty cool.

23:58 HARRISON: Yeah!

23:58 SOTTILE: And the access for this tool is to generally just use like Telnet or Netcat or socat or anything that can talk to a socket, and it basically gives you a pdb session remotely. I'm actually working on creating my own text editor just so I can learn curses and it was really useful to be able to debug a curses application. Because you can't really enter a pdb while it's trying to paint to your screen.

23:58 HARRISON: That's interesting. So I mean, even though it sets up a little server, you don't have to have it be on a different machine, right?

23:58 SOTTILE: Yep, yeah I was just developing it in one tab and I had a debugger in my other tab.

23:58 HARRISON: And so in general anything that's printing to, if you've got something that's printing to the screen or maybe doing input from the screen, that might be a case where this would be more appropriate than the built in debugging tools of Python.

23:58 SOTTILE: Yeah another use case might be if you're like using a web server, although like, Flask has nice tools for using a debugger, and Pyramid does as well, I'm sure the others do also, but it's a potential tool.

23:58 HARRISON: Yeah that's cool.

23:58 KENNEDY: Yeah, more tools are good. Tell folks really quick what curses is.

23:58 SOTTILE: Curses is a library which allows you to paint kind of graphical user interfaces but in a terminal, it's kind of how text editors like Vim or nano or Emacs draw out their UI.

23:58 KENNEDY: Or if I wanted to create a game for like a BBS

23:58 SOTTILE: Oh yeah you could make games with it too, I've seen some really good curses games.

23:58 KENNEDY: Yeah, that's pretty cool. All right, well that's definitely a good one and I hadn't heard of it so yeah, thanks for sharing that. And that's it for our main topics but I do have a few quick extras, I know we all do. I just want to share a story that just made me laugh, I really love it. So there was this, the press calls them a hacker, I don't really know what this person would be classified as 'cause it's a pretty low level hack. But a person was trying to avoid getting parking tickets, and they assumed that if what was inserted into the field that contains the license plate number was null, the systems at, you know, the county or whoever's going to give you a ticket would say, "Oh there's no address, or there's no license plate here. We can't send him a ticket." But in fact, quite the opposite is the case. So there's this person who got a custom license plate, which you can do in the US, and have like words on it, and they got the word "NULL", N-U-L-L, all capital. And then all of a sudden all the other places where there actually were nulls in the database started directing to this license plate, and he got $12,000 in parking tickets without even parking illegally because they started to receive all the failure cases of the database. For parking.

23:58 HARRISON: Ha, guess he hacked himself.

23:58 KENNEDY: Exactly.

23:58 SOTTILE: I don't know that hackers would take kindly to the naming there but I don't think it worked out how he wanted it to.

23:58 KENNEDY: It was definitely a backfire. Anyway, I'll link to that, it's pretty funny, and then just really quick, PyCon 2020 has been sort of officially announced, it's going to be earlier this year, trying to figure out exactly, yeah, yeah, it's going to be April 15 to 23 in Pittsburgh, and the website is already up so you can go check it out, sign up maybe, maybe you can submit a talk? I'm not entirely sure but the website at least is already up so I'll link to that and people can check that out. Matt, what do you got?

23:58 HARRISON: I recently released a course on Pluralsight on the XGBoost library. So XGBoost if you're not familiar with it is a library that a lot of people are using with great success in like Kaggle competitions for analyzing structured data and making predictive models around that. So if you're interested in an in depth course on XGBoost, not only how to use it but how to tune it, how to understand what the model's predicting when it comes out, check that out, I've got a bit.ly link, bit.ly/psxgb, Pluralsight XGBoost, so P-S-X-G-B if you're interested in that.

23:58 KENNEDY: Nice, yeah, we'll put the link in the show notes. Congrats on the new course, you and I have both written a fair number of online courses and that's a lot of work.

23:58 HARRISON: Yeah, thank you. I'm excited about it. I think it's a great course for anyone interested in XGBoost.

23:58 KENNEDY: Yeah, awesome. Anthony?

23:58 SOTTILE: Cool, I've got one quick little library, this is along the same lines as the curses work above. I found a tool called "hecate", I don't know how to pronounce it. It labels itself as a Selenium WebDriver, but for the terminal. And it's kind of a cool library that allows you to control a process in the background and make assertions about it, as a testing library.

23:58 KENNEDY: That's really wild.

23:58 HARRISON: So you're sending curses commands? Are you, is that like "expect"?

23:58 SOTTILE: So the way it works is it runs a tmux server in the background and then uses the tmux commands to interact with it so it'll send like up arrow or like control X or...

23:58 KENNEDY: It sends keys, kind of.

23:58 SOTTILE: Yep, pretty much.

23:58 HARRISON: Huh, that's cool!

23:58 SOTTILE: Yeah, and it takes screen grabs...
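
The approach described here (a background tmux session, keys sent in, the screen read back out) can be sketched with the plain tmux command line. This is not hecate's API, which is not shown in the episode. The helper names below are hypothetical, and the sketch assumes only that the `tmux` binary is installed and that `cat` exists on the machine.

```python
import subprocess
import time
from typing import List


def new_session_cmd(session: str, program: str) -> List[str]:
    """Start `program` in a detached (background) tmux session."""
    return ["tmux", "new-session", "-d", "-s", session, program]


def send_keys_cmd(session: str, *keys: str) -> List[str]:
    """Send key names such as 'Up', 'C-x' or 'Enter' to the session."""
    return ["tmux", "send-keys", "-t", session, *keys]


def capture_pane_cmd(session: str) -> List[str]:
    """Print the visible pane contents (a text 'screen grab') to stdout."""
    return ["tmux", "capture-pane", "-p", "-t", session]


def kill_session_cmd(session: str) -> List[str]:
    """Tear the background session down."""
    return ["tmux", "kill-session", "-t", session]


def run(cmd: List[str]) -> str:
    """Run a tmux command and return whatever it printed."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


def demo(session: str = "sketch") -> str:
    """Drive `cat` in the background and return what is on its screen.

    Requires tmux to be installed; not called automatically.
    """
    run(new_session_cmd(session, "cat"))
    try:
        run(send_keys_cmd(session, "hello", "Enter"))
        time.sleep(0.5)  # give the program a moment to redraw
        return run(capture_pane_cmd(session))
    finally:
        run(kill_session_cmd(session))
```

A test would call something like `demo()` and then assert that the expected text appears in the returned screen contents.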

23:58 HARRISON: I can see a utility in that for driving demos and that sort of thing as well. Oh yeah, that would be cool.

23:58 KENNEDY: Yeah you sit there and it looks like you're typing and you're just flying through it, and then you just get up "let me just show you something over here" and it just keeps going. People are like, "whoa!"

23:58 SOTTILE: Yeah. Yeah. And you can make it understand Emacs, you could even control Emacs, awesome!

23:58 HARRISON: Yeah. Works great for Emacs too.

23:58 KENNEDY: Sweet, all right, we always end the show with a joke or two, and this one is not like a laugh-out-loud sort of joke, but Matt, you're here, you do a lot of data science, I thought I'd bring some sort of scientific-esque humor here, and this one just really, deeply is satisfying to me. So it's a little story, I'll get your reaction in a minute. So there are two mathematicians sitting at a table in a pub having an argument about the level of math education among the general public. One of them is defending overall math knowledge, and he gets up and goes to the restroom, and on his way back he wants to prove his point, right, so he encounters the waitress and says, "Hey, I'll give you an extra $10 on your tip if you can answer a question for me. It doesn't matter what I ask, just say the words x-squared. X-squared, okay?" She's like, "Okay, sure, no problem." So a few minutes later the guy sits back down with his buddy and says, "I'll bet you $20 that even our waitress can tell us the integral of 2x." And the cynic's like, "I'll take that bet, buddy, no problem." So he beckons her over to the table, asks the question, to which she replies, "x-squared." And as this mathematician begins to gloat and demand his winnings, she says, "plus a constant." I don't know why I like that one.

23:58 HARRISON: At least she knows.

23:58 KENNEDY: Yeah, I don't know why I like that one but it's just satisfying to me, so, yeah.

23:58 SOTTILE: So on that, a brief aside, there was a poll on Twitter the other day whether they should teach statistics or calculus in high school.

23:58 KENNEDY: That's an interesting question.

23:58 SOTTILE: And I said the only times I've used calculus since high school, even though I much enjoyed the class, was tutoring other people in calculus.

23:58 KENNEDY: That's a good career path by the way, just, you know, when you're young. No, I hear you, but honestly, if you're going to offer up a math class in high school for sacrifice, geometry. It's got to be geometry. Replace that with some computer programming, serves the same purpose, logical thinking, let's do it.

23:58 SOTTILE: Awesome

23:58 KENNEDY: But yeah, I definitely see a good point. Alright, Anthony did you have another joke? Or you want to, should we wrap it up?

23:58 SOTTILE: I had a golang joke prepared, but then I panic'd

23:58 HARRISON: Whoa!

23:58 KENNEDY: Whoa. Ouch.

23:58 SOTTILE: You can cut that too

23:58 KENNEDY: Now, now, it's all good.

23:58 HARRISON: Don't worry about it

23:58 KENNEDY: It's all good, man. Thank you Matt Harrison, Anthony Sottile. Thank you both for being here, it's been really great to have you back on the show, and Anthony here for the first time.

23:58 SOTTILE: Yeah thanks for having me.

23:58 HARRISON: Yeah thanks for having me.

23:58 KENNEDY: Yeah, thanks guys. Bye.

23:58 HARRISON: Okay, bye!

23:58 KENNEDY: Thank you for listening to Python Bytes. Follow the show on Twitter via @pythonbytes, that's Python Bytes as in B-Y-T-E-S, and get the full show notes at pythonbytes.fm. If you have a news item you want featured, just visit pythonbytes.fm and send it our way, we're always on the lookout for sharing something cool. On behalf of myself and Brian Okken, this is Michael Kennedy. Thank you for listening and sharing this podcast with your friends and colleagues.
