

Transcript #39: The new PyPI

Recorded on Tuesday, Aug 15, 2017.

00:00 Hello and welcome to Python Bytes where we deliver Python news and headlines directly to your earbuds.

00:05 This is episode 39 recorded August 14, 2017. I'm Brian Okken and again Michael is on vacation and we have a guest host and this week we have Mahmoud Hashemi. Hey Mahmoud.

00:19 Hi there. Great to be here.

00:22 Yeah, you've been on Test and Code and you've been on Talk Python a couple times.

00:27 Yeah, a couple of my faves for sure.

00:29 Yeah, well, when I was looking up Talk Python, I noticed that you were on episode 4 and 54.

00:35 Yeah, and I don't know when Guido was on, you know, Michael was kind enough to ask my question, and I did like a panel thing. I don't know, I guess. Yeah, it's been really nice to have repeat appearances. People recognize me by my voice now. It's kind of, kind of strange, but like, I'm very appreciative at the same time.

00:52 That's good. That's great. And so thanks a lot for helping to do this today.

00:56 Yeah, hopefully I can do Michael right taking his spot here.

00:59 Well, let's just jump right in. I'm really excited about your first topic.

01:02 Oh, sure. So let's see.

01:04 First up, I mean one thing that's been on my radar, I'm not sure if you guys talked about this before, like sometimes I'm listening to Python Bytes and it's a little bit garbled or something.

01:12 Have you guys tried calling decode?

01:14 I'm kind of curious like why it's not Python strs.

01:17 But one thing that's been on my radar is the new PyPI.

01:23 So if you haven't been on distutils-sig, you may not have seen that there's actually a new PyPI, pypi.org.

01:33 And this is going to be the Python package index going forward.

01:37 So this is what we've been calling Warehouse before.

01:40 Is that right?

01:40 So Warehouse is the software that runs PyPI.

01:44 You know?

01:45 OK.

01:45 And so, yeah, it's a package index.

01:48 It's going to be where all of your wheels and sdists live.

01:52 And there's basically a lot of development that's happening here.

01:55 My friend Donald Stufft is doing an amazing job with his team.

01:59 Basically, yeah, we're up to 114,598 projects at the moment.

02:05 This even lists a number of files, almost a million files with 230,000 users.

02:12 And so, yeah, I would definitely check out this pypi.org for yourself.

02:16 But for the most part, I wanted to talk about how they're deprecating the old PyPI.

02:21 So pypi.python.org is now basically just a read-only interface.

02:26 And if you've tried to upload a package recently, then you may have seen an error, "HTTP 410," which is like a 404, but this is 410 Gone, meaning it was here, but now it's gone.

02:38 And so, yeah, you basically make sure to use a new version of setuptools, and it'll automatically start using the new one as long as your configs don't state otherwise.

02:47 You might have to update a config.

02:49 But this is a tremendous leap forward in a lot of ways.

02:52 And they need some help doing it too, you know.

02:55 So it's all open source on GitHub.

02:57 There are issues.

02:58 I'm working on one right now.

03:00 Yeah, it's got a lot of cool features.

03:02 Have you taken a look, Brian?

03:04 I've looked around a little bit.

03:05 Now, one of the things I've noticed, like right off the bat, is it says up at the top, there's a big red bar that says--

03:11 I know.

03:11 It's kind of scary.

03:12 Yeah.

03:13 So do you know, I'm guessing eventually at some point, the other interface will just redirect to here or is there?

03:20 >> I mean, cool URLs don't change.

03:23 Personally, in my view, I'd like it if they just kept it up and put the red bar over there, saying that this is an archived version of PyPI.

03:31 But for now, all those URLs are still working.

03:34 If you ask me, PyPI.org has been in use for so long because actually, if you've paid close attention, a lot of your downloads, pip is downloading from the new one.

03:44 >> Oh, okay.

03:45 So yeah, it's been in production a long time.

03:47 In fact, they just hit, I think, a petabyte a month in bandwidth downloads.

03:52 So yeah, just for a sense of the cost there, I think it's like in the tens of thousands, like 30, 40,000 a month to host PyPI.

04:01 And that's kindly donated by the Fastly CDN.

04:05 Should they stop feeling so generous, we gotta support our community somehow.

04:10 So there is a donate button here, but I think that right now, what they need most is sort of like people to work on cool features, like one that I saw being worked on that I'm very excited for. It's not strictly pypi.org, but same team, the Python Packaging Authority. They are working on making a dependency graph between all packages.

04:31 So if you've ever wondered what depends on what ahead of time, then this would enable that.

04:38 So yeah, how do I start working on it?

04:40 Do I go to the GitHub page?

04:42 Yeah, so I think it's github.com/pypa or I think it might be /warehouse.

04:48 Yeah, okay.

04:49 So and you know Donald has been very candid about like you know the areas that need development and he's been working very hard.

04:57 He's at Amazon now and he spends some time working on stuff there.

05:02 Oh, one last thing like distutils, right?

05:05 So there's still an email list called distutils-sig, where SIG stands for special interest group.

05:11 And so distutils-sig, you can just go join the listserv.

05:15 And you can read the archive and see the conversations they're having.

05:18 If you care about packaging, you're probably already on there.

05:21 But if you aren't, definitely subscribe.

05:23 I didn't know about it.

05:24 Yeah.

05:25 So we'll try to drop a link in the show notes for that.

05:29 Okay, well, that's, that's really cool.

05:31 Pretty good for first topic, you know, I don't know.

05:34 Yeah, definitely.

05:35 And the one thing I want to add is I know that Donald has been vocal before about how awful the previous code was.

05:43 Yeah, I mean, it's pretty old code, right?

05:46 Like, I don't even know, it may not predate WSGI, but it's pretty old.

05:50 You've looked at the new code.

05:52 I've looked at the new code.

05:52 I can talk about the new code if we got a second.

05:54 So I've looked at it, I've used it.

05:56 It's got 100% coverage.

05:58 It's got a lot of CI stuff set up.

06:00 It uses Docker.

06:02 I had a little bit of trouble, like, you know, with the make based approach to running the thing, but it's pretty complex.

06:09 Like, it runs, I think, an Elasticsearch and all this stuff. So basically, yeah, people shouldn't be afraid to help out just because they've heard bad things about the old code. No, the new code is, it's pretty idiomatic, I think. And you know, if you're familiar with SQLAlchemy, and I think it also uses maybe like Pyramid, I think, and it looks like the tests are in pytest, too.

06:31 Yeah, this is definitely in pytest, which is frankly, the only way I've heard and have I've also found myself.

06:38 So yeah, it's been good.

06:40 - Oh, I could talk about this for a long time, but let's move on to the next topic.

06:44 - Absolutely.

06:45 - So one of the things, I just read about this yesterday.

06:47 I read about it on Make, I think it's the Make website, but CircuitPython is now supported by a whole bunch of Adafruit hardware.

06:59 - It's great news for hardware hackers and also tinkerers like myself.

07:03 - And so we'll put a link in the show notes to the Make article. I had heard Adafruit announced CircuitPython in January. It's based on MicroPython, so CircuitPython is also open source. I'm not quite sure how they differ, but they've added some things to make it easier to control hardware, and they already had like two devices, the Metro M0 and Feather M0 Express versions, that support CircuitPython right off the bat.

07:37 And I guess they're working on a Circuit Playground Express.

07:41 All of these look like really fun things.

07:44 But the thing that really caught my attention was Gemma M0 that was announced at the end of July.

07:50 And this thing is like the size of a quarter.

07:52 It's a little small thing that you can make wearable software projects with, like LEDs and whatever.

07:59 And you just plug it into your computer, and instantly it's like an extra drive. You can see a main.py and you can just start programming in Python right away. Yeah, right, so basically it sort of functions kind of like a USB drive, and there's a single main entry point in there, and you can just modify it, and then, you know, you don't need to install anything or anything like that. Yeah, there's no loading. Apparently it does support Arduino, but right off the bat you don't have to install anything, you can just start programming. And right now they're currently out of stock, but I'm sure they get new stuff in pretty quick.

08:35 But it's under 10 bucks to start doing some wearable programming.

08:39 So I definitely have to get one of these.

08:41 Yeah, I can't wait to start wearing some running Python.

08:43 That'd be taking it to the next level.

08:45 And I'm also going to link to what I thought was great was they realized that, I mean, they are encouraging people to use Python if they can for programming hardware, but they realized that a lot of people are new to the Python community.

08:58 So there's a page called "Creating and Sharing a CircuitPython Library."

09:04 And it's got a whole bunch of great links, like basically just telling people what--

09:10 when we say library, we mean a package or a module with a setup file and doing it all right.

09:16 And there's little intros to GitHub and Read the Docs and Travis.

09:20 So is it like--

09:21 when you say package or module, is this their own format?

09:24 Or is this like Python packages, wheels, that sort of thing?

09:27 Yeah, it's just Python stuff.

09:28 But it's just really quick tutorials to get people up to speed fast.

09:32 Sure.

09:33 So it's like sort of a full--

09:35 it's got like an end-to-end thing.

09:36 It doesn't just send you left and right to other sites.

09:39 Yeah, right.

09:40 It's really telling you everything.

09:42 And they're pretty condensed.

09:44 Actually, they did a pretty good job condensing all that information.

09:47 Yeah, you don't need the whole context and history of Python packaging.

09:51 We've come a long way since eggs and that sort of stuff.

09:55 Yeah.

09:55 But then one of the things that is kind of interesting is they have a concept of bundles.

10:03 And really all a bundle is is a bunch of installable Python packages that are zipped up into a bundle.

10:10 Sure.

10:11 We normally don't really care about that because on a larger computer it's not that big of a deal.

10:19 But these little tiny devices you still have to care about how big it is.

10:23 So you're only, you might want to get everything that somebody cool has made, but you don't need it all.

10:28 You just need like the little part that you know, blinks the LED for you or whatever.

10:32 Sure.

10:33 So it sort of freezes it all together.

10:34 Yeah, these embedded applications are interesting.

10:36 So I maintain this one library called hyperlink.

10:40 And I guess it's pretty widely used because Twisted depends on it.

10:43 And so I've gotten some interesting feedback, a few things like one code review I just went through.

10:50 I promise this is related.

10:52 I'm using pytest and I'm writing my assert statements and, you know, I love that pytest rewriting with the great error messages and so forth. But I got a comment on my code review that these tests are not runnable in an embedded environment, because they will run with -OO, which elides all of those assert statements. And I'm like, well, you're kind of running the tests wrong if you're not using pytest, but in these embedded environments, I don't know, maybe the convention is different. So when you get yours, definitely test it out. Maybe you'll have to put a little caveat on your pytest recommendation if that's not what we can do on hardware, I don't know. Oh, that's interesting. Yeah, yeah, I'll definitely have to check that out. So I don't want the hardware people to not buy my book. That would be terrible. Well, that's the thing with something like hyperlink, which is for URLs. I'm like 99.9% sure it's gonna run exactly the same everywhere. So I'm confident that if it runs on my machine, it runs on Travis CI, it runs on AppVeyor or whatever, it's going to work there, I think. It'll be fine. But at the same time, hardware people can be sticklers, as I'm sure you know. So I respect that. I respect that.
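
The point about optimization flags eliding assert statements can be seen without any hardware. What follows is a minimal sketch, not anything from the hyperlink test suite: it uses the built-in `compile()` function, whose `optimize` argument mirrors the command-line flags (0 for no flag, 2 for `-OO`). The function `check_positive` and the helper names are hypothetical and exist only for this illustration.

```python
# Sketch: how optimization flags strip assert statements.
# compile()'s optimize argument mirrors the command line:
# 0 = no flag, 1 = -O, 2 = -OO.

SOURCE = """
def check_positive(n):
    assert n > 0, "n must be positive"
    return n
"""


def load_check(optimize):
    """Compile SOURCE at the given optimization level and return check_positive."""
    namespace = {}
    code = compile(SOURCE, "<example>", "exec", optimize=optimize)
    exec(code, namespace)
    return namespace["check_positive"]


def assert_fires(func):
    """Return True if func(-1) raises AssertionError, False otherwise."""
    try:
        func(-1)
    except AssertionError:
        return True
    return False


normal = load_check(0)     # like running plain `python`
optimized = load_check(2)  # like running `python -OO`

print(assert_fires(normal))     # the assert runs and fails
print(assert_fires(optimized))  # the assert was compiled away
```

A test that relies only on bare `assert` statements would therefore pass silently under `-OO`, which is the concern the reviewer raised.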

12:04 Cool. Yeah. Neat. Well, what do we got next? Mahmoud?

12:08 Oh, right. It's back to me. So I don't know. I mean, so I spent a lot of my time pretty deep into development of all sorts of infrastructural sorts.

12:18 And I find myself subscribed to python-dev, python-ideas, distutils-sig, and you can't read everything there and still have a life.

12:27 So only a few things catch my eye, but this one in particular caught my eye because my friend Hynek has this great library called attrs.

12:35 If you haven't heard of it, my other friend Glyph has a whole blog post that tells you why you have to use this library, attrs.

12:44 And it's basically class decorators that make writing high level classes very easy.

12:51 So it sort of derives from this sort of tradition of namedtuples, right?

12:56 Raymond Hettinger had this great idea to make namedtuples, which let us define a class-like structured thing within just one line.

13:04 But the problem with namedtuples is that if you want to add methods to it, then you have to inherit from it.

13:09 And they're immutable by default.

13:11 And they don't really-- even though they generate a dunder init for you, they don't do a whole heck of a lot of validation.
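
The namedtuple trade-offs described here can be sketched with the standard library alone. The `Point` and `PointWithNorm` names and the helper `try_mutate` are hypothetical, chosen only for illustration.

```python
from collections import namedtuple

# One line gives a class-like structure with named fields.
Point = namedtuple("Point", ["x", "y"])

p = Point(1, 2)


def try_mutate(point):
    """Return True if assigning to a field works, False if it is rejected."""
    try:
        point.x = 10
    except AttributeError:
        return False
    return True


# Adding a method means inheriting from the generated class.
class PointWithNorm(Point):
    __slots__ = ()

    def norm_squared(self):
        return self.x ** 2 + self.y ** 2


# The generated constructor does no validation: any values are accepted.
odd = Point("not a number", None)

print(p)
print(try_mutate(p))
print(PointWithNorm(3, 4).norm_squared())
print(odd)
```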

13:18 So attrs comes along, fixes all these things, adds a bunch of other cool functionality, and does it with class decorators.

13:24 It doesn't pollute your final object with anything you don't want.

13:28 Because you don't inherit from anything.

13:30 So you just inherit from object.

13:32 After Glyph's post took off or something, the core Python devs sat up and took some notice of this and said, maybe we have been neglecting a higher-level interface for quickly defining classes. You know, you just want to have four or five fields, all sort of batched together. And you don't want to have a lot of functions everywhere that have to define 15 arguments. So like, how can we quickly, in a nice, concise, Pythonic way, define a Python class?

13:59 And they came up with this new thing, which is still, I guess, kind of, I mean, I don't know if this is a little bit too deep underground, but there's this GitHub repo that Eric V. Smith, who is a Python core dev, has called "Data Classes." And the issues of this have been really interesting to watch because Hynek and a bunch of core devs have been debating, like, "Hey, should we just use attrs?

14:24 If attrs is getting so popular, should it just be part of the core Python?" And people seem to like it.

14:30 Why make something that's so close to it, that sort of thing.

14:34 There's sort of a draft PEP inside of the data classes repo and there's some examples of how it's used. It has some semantic differences, has some syntactic differences. I think that it's pretty interesting to watch and in fact they seem to be encouraging more experimentation in this area. Even though I like attrs, they seem to want even more options, at least from themselves. So, I don't know, I had a good time reading the issues, maybe other people enjoy it too.

15:00 >> Yeah, so is this, it's similar to attrs then?

15:03 >> Yeah, it's pretty similar to attrs.

15:05 The differences are sort of fine enough that you have to kind of look closely.

15:11 Basically, I think that what it is, is like there's actually an issue called "why not just attrs?"

15:18 And they sort of explain that they want to use like the new, I think, type hint syntax type stuff.
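
For a sense of what defining a class with type hint syntax looks like, here is a minimal sketch. It assumes Python 3.7 or newer, where this work eventually shipped as the standard-library `dataclasses` module; the draft discussed in the episode may have differed in details. The `Package` class and its field values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Package:
    # Fields are declared with type hint syntax; the decorator generates
    # __init__, __repr__, and __eq__ from them.
    name: str
    version: str = "0.0.1"
    dependencies: List[str] = field(default_factory=list)

    def label(self):
        # Methods go directly on the class; no inheritance is needed.
        return f"{self.name}=={self.version}"


pkg = Package("example", "1.2.0")
pkg.dependencies.append("other")  # instances are mutable by default

print(pkg)
print(pkg.label())
print(pkg == Package("example", "1.2.0", ["other"]))
```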

15:26 >> Okay. >> So yeah.

15:27 Other people like kind of said that, hey, maybe naming-wise, data classes is a little bit clearer than attrs, because someone who is a new Python programmer doesn't know that attr is an attribute or something like that.

15:42 That's true.

15:42 So, it has some syntactic differences.

15:45 Yeah.

15:45 And there are some big names in this discussion.

15:48 There are, there are.

15:49 So that's, I mean, it's sort of like the, the, the inner circle, right?

15:52 This is kind of like the sort of stuff that I have to follow.

15:56 Oh, that's awesome.

15:57 - Be on the edge here.

15:58 And it happens kind of behind the scenes, but I really do encourage people to join these email lists if you wanna see the action happening.

16:05 You know, you don't have to be a spectator or you don't have to sit maybe in the nosebleed section of the arena on open source, right?

16:12 You can get up close on the, on like, you know, get the front row seats.

16:16 And before you know it, you'll actually get involved.

16:18 It'll be fun.

16:19 - Yeah, that's great.

16:20 Oh, thanks for bringing that up.

16:21 That's cool.

16:22 Well, speaking of trying to get involved, unless you've had your head under a rock, data science is a thing.

16:30 Is it, really?

16:31 It isn't something that I have to use on a daily basis, but it's definitely something I want to pay attention to.

16:37 And I ran across--

16:39 there's a lot of books and tutorials that are huge, because it's a huge topic.

16:44 And I ran across an article called "Pandas in a Nutshell." And I like it, because it's a Jupyter Notebook style post, so you can just see the code working.

16:56 And it's mostly tutorial by example with just a little bit of extra code for explanation.

17:01 And the big part of it is really just talking about a couple of data structures.

17:05 It's talking about the Series data structure, which is a one-dimensional array with indices, so just kind of like a vector.

17:14 And then the DataFrame, which is like a two-dimensional array.

17:18 And all the sort of common things that you need to do with it, like specifying a custom index, or combining two series, or with matrix stuff, adding columns, adding a column that's based on another column.
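
A few of those operations look roughly like this. This is a minimal sketch assuming pandas is installed; it is not taken from the article, and the column names and numbers are made up for illustration.

```python
import pandas as pd

# A Series is a one-dimensional array with an index (custom labels here).
downloads = pd.Series([120, 340, 95], index=["mon", "tue", "wed"])
uploads = pd.Series([10, 20, 5], index=["mon", "tue", "wed"])

# Combining two Series aligns values by index label.
total = downloads + uploads

# A DataFrame is a two-dimensional table; each column is a Series.
frame = pd.DataFrame({"downloads": downloads, "uploads": uploads})

# Add a column computed from other columns, without an explicit for loop.
frame["ratio"] = frame["downloads"] / frame["uploads"]

print(total)
print(frame)
```

Operating on whole columns at once, rather than looping over rows, is the spreadsheet-like habit that carries over most directly.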

17:36 Then this sort of stuff sort of seems like Excel, like working on a spreadsheet.

17:40 I think for a lot of people, that is the natural next step when they want to get into programming.

17:46 It's either going to be doing, like, Visual Basic script of some sort inside of Excel.

17:52 Or maybe moving into Python.

17:54 Yeah, and I guess that's one of the things I like about this little nutshell article is that if somebody is already doing some things in spreadsheets and they want to switch to working with pandas, this might be a pretty good stepping point to try to get things going.

18:11 And it's actually something I'm going to grab some of the concepts in here to try to deal with some of the large amounts of data that I deal with on a daily basis as well.

18:20 - Oh, for sure.

18:21 - So I haven't used, and I bring this up because I'm just starting.

18:23 I'm trying to use pandas on a daily basis now.

18:27 - And I've actually faced a lot of the same challenges. Just because it's Python doesn't mean that it, you know, doesn't require some sort of kind of paradigm shift in your thought.

18:37 It's like thinking about data frames is very different than thinking about lists in Python or dictionaries in Python.

18:44 It's somewhere between Python and like full-blown relational databases.

18:48 And so you do have to change the way you think how to approach a problem, especially if you wanna get some performance out of the thing, 'cause it has all this great broadcasting logic that it can perform, but it's not gonna work if you just iterate over it in for loops.

19:02 - Yeah, and I guess that's where the data frames and series stuff comes in is because you wanna do some computation on everything or searching on stuff.

19:13 So it's kind of like a combination of a database and an in-memory database and something else.

19:20 - Where I work, some of our data scientists are coming from an R background and the data frame is based on an R construct, I believe.

19:27 So they find it quite natural and the Python is what they sort of struggle with and they come to me for that.

19:34 But a Python person would want to ramp up on the data frame itself.

19:38 And so this notebook seems like a great option to do that quickly.

19:41 So that's just a quickie. So that's it. Your last topic.

19:46 Oh, already. So yeah, basically, just yesterday, I was at this conference, PyBay 2017 is sort of the Bay Area, Silicon Valley regional Python conference, only the second annual one. There's, it's surprising how long it took to spin up here.

20:02 Meanwhile, PyOhio has been going for who knows how long. So anyways, but it was a great conference.

20:08 almost 500 developers, pretty good turnout, and a lot of great topics covered.

20:15 I gave a packaging talk, but the thing I'm going to talk about today is actually the opening panel was on static typing.

20:25 And it was quite an interesting mix.

20:28 First of all, it was very international.

20:30 They had people from Germany, Russia, Poland, USA, and Netherlands.

20:34 It seems like Europeans are big fans of static typing for whatever reason, Guido included.

20:40 So yeah, they had people from I think, let's see, PyCharm, University of California, Berkeley, then also Quora, Google, and I think another guy too.

20:52 So it was a really nice cross section of the industry and also the world.

20:59 And they just talked about the state of static typing.

21:02 So right now, just to bring you up to date, I'm not sure how recently you covered this stuff on the podcast, but there are currently three or four static type checkers.

21:13 So in Python 3, you can specify your types however you'd like.

21:17 Built into the language, it's not going to do a lot of complaining in case types don't match.

21:23 First of all, at runtime, nothing is checked, right?

21:26 So if you want to check it, it would be at a compile time step.

21:30 The annotations are still there at runtime, and then you have a static type checker, the most popular of which is mypy, run over that and check it, kind of like a linter or any other static analysis tool.
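
The split between runtime and static checking can be sketched in a few lines. The function `double` is hypothetical; the behavior shown is standard Python 3, where annotations are stored on the function but not enforced when it is called.

```python
def double(n: int) -> int:
    """Annotated as int -> int, but Python does not enforce that at runtime."""
    return n * 2


# The annotations are stored on the function and can be inspected.
print(double.__annotations__)

# Calling with a str raises no error; a static checker such as mypy
# would flag this call when run over the source file.
print(double("ab"))
```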

21:44 And so there are other ones too though.

21:48 Google has one that is not super well documented, but they use it internally.

21:54 Then PyCharm has this functionality as well, which is also kind of built from scratch.

22:00 And they made a pretty good case why you would want one built into PyCharm, which is that basically it can do incremental checking.

22:09 So while you're still writing, it can do sort of partial checks, maybe a little bit better than mypy.

22:14 Oh, right, the last person on the panel, Łukasz Langa from Facebook.

22:19 He also comes to my meetup.

22:20 Anyways, so yeah, he's very opinionated about types.

22:23 We'll get to that in a second.

22:25 One that wasn't talked about was pylint.

22:27 I was actually blown away.

22:29 I updated my Emacs config recently and I sort of integrated some more linting stuff.

22:34 The default pylint these days can do an amazing amount of inference.

22:38 It'll tell you you have the wrong number of arguments.

22:41 It'll tell you that this default doesn't match that type.

22:45 It'll do so many different things.

22:47 In addition to its standard, very opinionated idea of how many arguments a function should even have and that sort of thing. Anyways, so those are our four sort of type inference engines. And they all are slightly different.

23:02 But everyone seemed to get along pretty well on stage. And they talked about, you know, potentially in the future, actually merging these things and making a PEP that would allow them to all sort of comply together, maybe even turn into a single project. So that was nice to see. And one of the most interesting questions was basically from the audience. They said like, well, what is the real point behind the static typing? Like, what is the biggest benefit that you see? And there was a little bit of divergence on this, right? Some people like it for the strictness of it all, being, you know, kind of the dictator of your own code base or whatever, right? But everyone else seemed to be pretty much on the same page that this is for human readability. This is a sort of documentation that can then be checked automatically at a rather large scale. So it's attached to the function, but it's more than just a doctest. And so the interesting side effect of this is that even though they all work on static typing stuff, they have a pretty nuanced view of how much static typing you should apply. So they say that, like, you know, maybe a list of a certain type, right, but actually defining, say, a completely recursive type is, one, not supported, and two, maybe not even that desirable, because you don't want your function signatures to get super, super complex. So, yeah, I mean, it was interesting that they thought the human side of this was the most important part, as opposed to, say, a Haskell programmer or something, where they want the mathematical correctness of it all. It's also interesting, I would have liked to listen to the discussion of how much of it you should use.

24:47 Well it was at LinkedIn I think that they recorded it, it should go up pretty soon. Yeah I'll definitely, you know it was only a couple days ago but once the video is available I'll maybe send it to you, you can add it to the show notes.

24:57 Yeah. Some interesting side effects of this, by the way. So Cython does not support the new Python type syntax.

25:06 So even though all these guys are kind of on the same page and buddy-buddy, like, you know, for us people who really like Cython and have used it to achieve a lot of performance and type correctness to some degree are a little bit out of luck at the moment.

25:19 I think that people are working on making a pull request to it or something that would add support for this, but it's such a big change to the syntax, and Cython has its own type syntax, which is less focused on semantic types as this is and more focused on being in line with C types, which allows you to have more compact memory usage.

25:43 And the people on the panel were actually pretty clear that the static types advantage is not in performance.

25:49 So a project like PyPy, which actually can use types to achieve higher performance, they find that the JIT is faster without taking hints from the user in the code.

25:59 So it just disregards this stuff.

26:01 Because the JIT has the actual types.

26:04 So just a real quick thought experiment.

26:06 Like imagine that I say I'm going to pass you a list of integers.

26:11 That list is three integers long.

26:13 Okay, I can just check them.

26:14 One, two, three.

26:15 All integers, good to go.

26:16 No type error.

26:18 But if I pass you a list of 20,000 integers, every time I pass that to you, I have to check that every single one is an integer.

26:26 Otherwise, I want to have a type error.
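
The thought experiment can be written down directly. This is a hand-written sketch of what enforcing "list of integers" at call time would involve, not how any particular checker or JIT works; `checked_sum` is a hypothetical function.

```python
def checked_sum(numbers):
    """Sum a list after verifying at runtime that every element is an int.

    The check visits each element on every call, so its cost grows with
    the length of the list.
    """
    for position, value in enumerate(numbers):
        if not isinstance(value, int):
            raise TypeError(f"element {position} is {type(value).__name__}, not int")
    return sum(numbers)


small = [1, 2, 3]
large = list(range(20000))

print(checked_sum(small))
print(checked_sum(large))  # 20,000 isinstance checks before any real work
```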

26:28 That sort of thing is going a little bit against the spirit of Python and being sort of practical and duck typey and whatnot.

26:36 A friend of mine from Intel was sitting next to me and he was saying how he came to Python so he wouldn't have to type everything.

26:44 But thankfully, you don't have to type everything.

26:46 The standard library itself, for instance, all the type definitions for that are available in this joint typeshed repo that all of these static type people sort of built together.

26:58 And I'll link to that in the show notes for sure.

26:59 Yeah, my favorite use so far that I've come across for my own work is putting type hints in interface areas, like an API module, since that's how you interact with the package.

27:13 So those are great places for type hints.

27:14 Oh, for sure.

27:15 And so wait, are you saying that-- so there is this old thing, like, they're trying to get rid of it.

27:20 Basically, Python has these sort of stub files, these interface files.

27:23 Some people call them the header files for Python.

27:26 Like, I think it's a .pyi file.

27:28 Okay.

27:28 .pyi.

27:29 I was just thinking, like, I've got a package that has a whole bunch of internal code, but it has like an API module that you should-- people interact with from the outside world.

27:42 That's a great place for pretty much any interfaces where it's not you that's going to use it, but somebody else. Those are great places to put type hints if it matters. Oh, definitely. Definitely.

27:54 Cool, but I'm pretty new to it too. So thanks for bringing that up. That was very interesting. Yeah, yeah. And I mean, I think that they're still changing this stuff quite a bit. Right. So I, you know, early adopters go nuts. But for the rest of us that like a little bit more boring technologies, you know, I'm going to go ahead and let the auto inference engine of Pylint figure things out for me.

28:13 I'm not going to, you know, jump on the bandwagon so quickly.

28:16 And I'm glad you brought Pylint up.

28:17 I've been sort of dismissing it because I've been using Flake8, but I'll have to take a look at Pylint again.

28:24 Oh, yeah, they've definitely ramped up development on that again.

28:27 I mean, you have to for me anyways, right?

28:30 I just blacklist a lot of errors because I kind of don't agree with every single thing that they test for.

28:35 But they make it pretty easy to do.

28:37 You just change it in an INI file. No big deal.

28:39 Last topic again comes back to me finally getting my head out of thinking about pytest 24 hours a day.

28:46 And one of the things I want to start looking at is some of the web frameworks like Django and Flask.

28:55 I haven't played with them much personally and there's a bunch of personal projects and work projects I'd like to do with them. And also quite a few people that listen to Test and Code are web people. And so just to kind of get more understanding of that, I'm trying to learn more frameworks. And one of the things that I've had a hard time getting my head around is ORMs, or object-relational mappers. So luckily I ran across an article on Full Stack Python, which is Matt Makai's site. Amazing site. Yeah. Yeah. And it's basically, it's on Full Stack Python.

29:30 It's a, I don't remember what it's called, but I think it's just object relational mappers.

29:35 And it goes through what they are.

29:38 So an ORM is some code that automates the transfer of data from your internal Python objects and classes to database tables.

29:51 And they're useful so that you can write Python code instead of writing SQL queries.
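
To make the idea concrete, here is a toy, hand-rolled mapping between a Python class and a database table using only the standard library. It is a sketch of the concept under stated assumptions, not the API of SQLAlchemy, the Django ORM, or any other real ORM; the `Episode` and `EpisodeTable` names are hypothetical.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Episode:
    number: int
    title: str


class EpisodeTable:
    """Toy mapper between Episode objects and rows of an 'episodes' table."""

    def __init__(self, connection):
        self.connection = connection
        self.connection.execute(
            "CREATE TABLE IF NOT EXISTS episodes "
            "(number INTEGER PRIMARY KEY, title TEXT)"
        )

    def add(self, episode):
        # Object -> row: the caller never writes the INSERT statement.
        self.connection.execute(
            "INSERT INTO episodes (number, title) VALUES (?, ?)",
            (episode.number, episode.title),
        )

    def get(self, number):
        # Row -> object: query results come back as Episode instances.
        row = self.connection.execute(
            "SELECT number, title FROM episodes WHERE number = ?", (number,)
        ).fetchone()
        return Episode(*row) if row else None


connection = sqlite3.connect(":memory:")
episodes = EpisodeTable(connection)
episodes.add(Episode(39, "The new PyPI"))

print(episodes.get(39))
print(episodes.get(40))
```

The calling code only touches `Episode` objects; the SQL lives in one place. A real ORM generalizes this to arbitrary classes, relationships, and queries.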

29:56 And it talks about that and then also talks about why you need them and some downsides.

30:02 And yeah, so the downsides actually were interesting.

30:05 I didn't think that anybody would talk about what's wrong with using ORMs.

30:09 Yeah, I mean, realistically, there are some definite engineering trade-offs.

30:13 So what do you say?

30:14 Well, he said, well, a few things are impedance mismatch, which coming from electrical world, I was like, impedance mismatch?

30:22 That's like 50 ohms to 75 ohms, right?

30:25 Yeah, yeah.

30:25 But it's basically the way a developer is using the objects is different from how--

30:31 can be different from how the data is stored and joined in the tables in your database.

30:37 Especially if you've set up the tables in a way that's not like, it's contradictory to how it's being used all the time.

30:45 It might be slow, and reshaping your data might speed that up.

30:50 Then potential for reduced performance, and this isn't surprising to me. If you stick some code in the middle, it's not free, it's got to run.

31:00 >> Definitely not.

31:00 >> Then also shifting complexity from database to the application code, which this is something that I didn't quite understand right off the bat, but if you think about it, it's not too bad.

31:10 But databases are complex pieces of software that have things like stored procedures and a whole bunch of fancy join math and stuff.

31:21 >> Right.

31:21 >> That might not be supported by an ORM.

31:24 So if you had to do that stuff, you have to do it in your application instead.

31:29 So it's using a database in a simpler way, but that complexity has to go somewhere and it'll go in your application code.

31:36 Yeah, almost certainly.

31:37 But I mean, until you get like database specialists, then, you know, it makes it a little bit easier for you as, you know, a sole developer, for instance.

31:46 Yeah, so I punted at first and used document databases because I didn't have to think about ORMs right off the bat.

31:53 But I mean, so, but the thing is that an ORM, like he's correct, like a database is definitely a very advanced, complex tool.

32:00 But a lot of that advancement and complexity, you retain even when using an ORM.

32:05 For instance, a lot of document databases don't have great transaction models, don't have great, you know, sort of multi version concurrency models.

32:12 And, you know, so when they put all that work into Postgres, or even like MariaDB or something like that, you can just by using an ORM, it seems almost as simple as a document database, but you get that operational, you know, feature.

32:27 Yeah, I'd definitely heard of SQLAlchemy, but I hadn't heard of a couple of the others that he listed here: Peewee, Pony, and SQLObject. Have you used any of these?

32:41 Yeah, so SQLAlchemy is definitely my go-to, and I'll talk about why in a second. But yeah, I mean, I've used Django's ORM because I did the Django tutorial, and that's one of the first things they teach you. Django has a serviceable ORM, but there are some issues with it that SQLAlchemy actually does a much better job with. And I have used Peewee. In fact, I like Peewee.

33:03 It's sort of like a simplified version of Django. In my opinion, it basically says like, look, if you're not going to be SQLAlchemy, then you know, you can just be plain simple. And it does a pretty good job. But these days, SQLAlchemy has gotten so good that, you know, I just use it whenever I'm going to work with a relational database in Python.

33:23 So one thing that SQLAlchemy has is that it sort of has this working copy of all the models, and they end up being kind of like singletons within a given process space.

33:34 So with Django, you can actually get two copies of the same thing from the database within the same request or the same process.

33:45 And that means that basically concurrently somewhere else in your program it could change something, save it, and then when you change it in the request handler you're actually trying to work on, that will overwrite the previous change.

33:59 You know, like if you change column A in one thread and column B in another thread, whichever thread saves first is going to overwrite the other unchanged value.
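
The column A and column B scenario can be reproduced in a few lines. This is a sketch using plain `sqlite3` and dictionaries rather than Django itself; `load` and `save_all_columns` are hypothetical helpers that mimic an ORM handing out independent copies of a row and then writing every column back on save.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, a TEXT, b TEXT)")
conn.execute("INSERT INTO item VALUES (1, 'old-a', 'old-b')")


def load(pk):
    # Each call builds a fresh, independent in-memory copy of the row.
    a, b = conn.execute("SELECT a, b FROM item WHERE id = ?", (pk,)).fetchone()
    return {"id": pk, "a": a, "b": b}


def save_all_columns(obj):
    # Naive save: writes every column, including ones this copy never changed.
    conn.execute(
        "UPDATE item SET a = ?, b = ? WHERE id = ?",
        (obj["a"], obj["b"], obj["id"]),
    )


first = load(1)   # imagine this copy lives in one thread
second = load(1)  # and this one in another

first["a"] = "new-a"
second["b"] = "new-b"

save_all_columns(first)
save_all_columns(second)  # writes back the stale 'old-a' along with 'new-b'

final = conn.execute("SELECT a, b FROM item WHERE id = 1").fetchone()
print(final)  # ('old-a', 'new-b'): the change to column a is lost
```

The second save carries the stale value of column `a` with it, so the first change silently disappears. In Django, `Model.save()` writes all fields unless `update_fields` is passed, which is why two copies of the same row can behave like this.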

34:09 So there's a setting that's off by default, I think, in Django called "atomic requests," and you have to enable that to prevent that sort of situation.

34:18 But Django is not alone in this.

34:20 I think that Rails, at least for a very long time, did the same thing.

34:24 And Django, of course, is sort of Python's response to Ruby on Rails.

34:27 So yeah.

34:28 Does SQLAlchemy not have this problem?

34:31 So SQLAlchemy doesn't have this problem because basically, yeah, you only get one copy of that thing in your system.

34:36 It has this sort of local index of primary key to the object version of that row that you're representing, for instance.
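
That "local index of primary key to object" is what SQLAlchemy calls the identity map, and it is kept per session. The sketch below shows the idea with a toy `Session` class written for this illustration; it is not SQLAlchemy's implementation, and the `Item` and `Session` names here are hypothetical.

```python
import sqlite3


class Item:
    def __init__(self, pk, a, b):
        self.pk, self.a, self.b = pk, a, b


class Session:
    """Toy session that hands out at most one Item object per primary key."""

    def __init__(self, conn):
        self.conn = conn
        self._identity_map = {}  # primary key -> the one live object

    def get(self, pk):
        # Return the already-loaded object if there is one.
        if pk in self._identity_map:
            return self._identity_map[pk]
        row = self.conn.execute(
            "SELECT a, b FROM item WHERE id = ?", (pk,)
        ).fetchone()
        if row is None:
            return None
        obj = Item(pk, *row)
        self._identity_map[pk] = obj
        return obj


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, a TEXT, b TEXT)")
conn.execute("INSERT INTO item VALUES (1, 'old-a', 'old-b')")

session = Session(conn)
first = session.get(1)
second = session.get(1)

first.a = "new-a"
print(first is second, second.a)  # True new-a
```

Because both lookups return the same object, a change made through one name is visible through the other, and there is no second stale copy to write back later.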

34:45 So yeah, SQLAlchemy adds a lot of machinery, which makes SQLAlchemy a little bit more complex, but I've had a friend who I think spent days tracking down this issue with Django, and with SQLAlchemy it never would have happened.

34:59 So you pay some upfront costs with setup with SQLAlchemy, but I think it's definitely worth it.

35:04 When it comes to this sort of ORM thing though, like if I can provide some general advice, ORMs are sort of the tools of applications.

35:13 And if you want to form a real opinion on object relational mappers, you should look at and compare applications.

35:22 So I spent a fair amount of time reading Reddit source code, which does, I think, use SQLAlchemy.

35:28 And it uses it without the declarative object mapper.

35:31 It uses it with the legacy or lower level SQLAlchemy tools.

35:37 But you still get a real sense for where they use an ORM and where they don't.

35:41 And SQLAlchemy actually makes it very easy to pass through normal SQL text.

35:46 That's another thing I really like about it.

35:47 It understands that ORMs are an abstraction that's useful 90% of the time.

35:51 And for that last 10%, you really want the full power of the driver or the database itself.

35:58 Okay, cool.

35:59 I don't have any opinion on these extra couple links that I put in here, but Matt has some dedicated pages for SQLAlchemy and Peewee.

36:08 And one of the things I like about Matt's site anyway, Full Stack Python, is he gives his opinion and information when he has it.

36:16 And when somebody else has already explained it well enough or better, he just links to their stuff and says, go read that.

36:22 Yeah, absolutely.

36:23 No, I mean, he's a real team player in that regard.

36:25 But I also, I just got to give a shout out to him.

36:27 Like, he so consistently adds to the site.

36:30 It's become such a tremendous resource for someone who wants to develop an application.

36:34 I'm sure that the listeners of this podcast are, for the most part, already aware of it, but yeah, definitely check it out.

36:41 - Definitely.

36:42 Well, that's all of our topics so far.

36:45 We didn't address what you're up to lately other than helping out with podcasts.

36:50 (laughing)

36:52 - Yeah, no, it's funny.

36:53 I'm also like prepping for another podcast as well, Partially Examined Life, I guess.

36:59 But basically, yeah, what am I up to lately?

37:02 Well, I had a talk at PyBay and because it was based on blog posts, I thought it'd be easy to put together slides.

37:07 Now it still took, like just full disclosure here, it took like another 40, 50 hours to make slides from that blog post.

37:14 But it seemed really well received and so I'm very relieved right now.

37:17 I got some nice life events coming through, parents coming to town, keeping me real busy.

37:22 I also am working on this hyperlink library, like I mentioned earlier, URLs in Python and it's used by Twisted and some other big projects.

37:31 So fixing bugs in there is always kind of contentious, which is why I got a lot of support for people who work on things like setuptools, which is even more widely used.

37:40 So then, beyond this, let's see, yeah, writing blog posts, I got, I think my draft count is up to like 100 now.

37:49 But, yeah, maybe more conferences, more talks.

37:53 I don't know why I keep signing up for these things, but it's great meeting people out there.

37:56 People out there should really look into PyBay and regional conferences, meetups.

38:01 Oh, well, I run a meetup too, the Pyninsula meetup, the hottest new meetup in the Bay Area, Silicon Valley. And so yeah, this is programming, man. It's all about the terrible puns. But yeah, Pyninsula. I think we even have the site now, pyninsula.org. And you know, we're on Twitter and so forth. I do my best to record the talks.

38:27 But for people who want to break into this type of, you know, speaking and that sort of thing.

38:32 Just look at, look no further than your local meetup, right?

38:35 Go make a 15 minute, 30 minute talk.

38:38 See how it goes.

38:39 Iterate on it, right?

38:41 Have a brown bag at your company.

38:43 Just keep iterating on it and you know, something will stick.

38:46 And then you can submit it to something like PyCon or whatever.

38:50 That's a great idea.

38:51 I think a lot of people think that you just have to work really hard on a talk and give it once and then it's done, but a lot of people give them several times.

38:59 Yeah.

38:59 And also like if there's not a meetup in your area, just maybe start one.

39:03 Python programmers are literally everywhere.

39:05 So we, like, you know, even though there's a South Bay Python meetup, which is sort of like more towards Sunnyvale, like kind of south of Mountain View area.

39:16 And there's this SF Python meetup, which is up in San Francisco.

39:20 We put one right in the middle.

39:22 I guess California traffic's bad enough that we sort of have a captive audience, literally.

39:27 But we'll get like, you know, I think when Guido came, there were almost 100 people at the meetup.

39:32 And normally we get like 50.

39:34 But it's great because everyone can socialize in something a little more intimate.

39:38 It's a little less stressful when you're trying to give the talk yourself too.

39:41 Yeah.

39:41 So it wouldn't be a Python Bytes episode if I didn't plug my book.

39:45 By all means.

39:46 So one of the things I want to bring up is the Python Testing with pytest has a nice discussion forum.

39:52 It's kind of built into what Pragmatic offers for all the books.

39:56 But it's a, if you ever ask a question on there, it pings me and tells me, emails me and says, there's a question.

40:02 Just this morning I answered a question.

40:05 Somebody got on and said that they were actually this, I love this.

40:09 They said that the book is helping them understand testing better.

40:13 And I love comments like that, but he had a question about monkeypatch versus mock.

40:20 And I'm not going to get into it too much here, but I did reply to him and it's all up there for everybody else

40:26 to read too. So I'll have a link in the show notes to that.

40:29 That's great. Yeah, those sorts of comments really keep you going. I wish that my O'Reilly thing had had such a discussion forum. Instead, I got my feedback through reviews for a while. I mean, emails too. People email and I appreciate it. Yeah, I get them from all over the place. I get it through the discussion forum. I get it from Twitter, and we've got a Slack channel, so people come and tell me what's wrong in the Slack. Yeah, definitely. I know for just like sort of chatting here, right?

40:58 I've been really into like Riot.im, which is a Python based open source Slack sort of thing.

41:06 And there's also Zulip, which is just everywhere these days.

41:09 They're doing an amazing job.

41:10 So what's the first one, Riot?

41:12 Yeah, so Riot.im and it runs a sort of protocol called Matrix.

41:17 And it's a very, very large thing.

41:20 It's basically like you can have end to end encrypted chats with people who are on it.

41:24 but I use it because it's an IRC bridge.

41:27 Like I said, if you want to be sort of in this inner circle, see the goings-ons, IRC is still very much alive.

41:33 So you got your listservs and IRC and so forth.

41:38 And Riot makes that pretty easy to get into.

41:42 There's a Freenode bridge and you just join a Freenode thing and you can look at IRC through your browser while having end-to-end encrypted chats with your other friends.

41:50 It also has a sort of peer-to-peer video chat that works really, really well because it's just the WebRTC open source protocol, works great in Firefox.

42:01 - Well, I'm gonna cut you off 'cause we're running long.

42:04 - Oh, yeah, we're way long.

42:05 Anyways, that's great.

42:06 - Also, I think this is an awesome topic.

42:09 I think that you should come on to Test and Code and we can talk about IRC and communication channels.

42:16 That'd be fun.

42:16 - That's actually a great idea.

42:17 Yeah, for sure.

42:19 I'm always like coming up short with topics when they come, but yeah, here we are just chatting.

42:23 That's a great idea.

42:24 Again, thank you so much for coming on.

42:26 I love having new voices on here.

42:28 It's been my pleasure.

42:30 And thank Michael.

42:31 When he gets back, I'll send him an email.

42:33 This has been great.

42:34 Yeah, and we'll keep in touch.

42:38 Thank you for listening to Python Bytes.

42:40 Follow the show on Twitter via @PythonBytes.

42:44 That's Python Bytes as in B-Y-T-E-S.

42:47 Get the full show notes, including links, at PythonBytes.fm.

42:51 If you have a news story you'd like featured, visit pythonbytes.fm and send it our way.

42:56 We're always on the lookout for sharing something cool.

42:59 This is Brian Okken.

43:00 On behalf of myself and Michael Kennedy, thank you for listening and sharing this podcast with your friends and colleagues.
