
Transcript for Episode #39:
The new PyPI

Recorded on Tuesday, Aug 15, 2017.

Brian OKKEN: Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is Episode #39, recorded August 14th, 2017. I’m Brian Okken and again, Michael (Kennedy) is on vacation. We have a guest host and this week we have Mahmoud Hashemi. Hey, Mahmoud.

Mahmoud HASHEMI: Hi there. Good to be here.

OKKEN: You’ve been on Testing Code and you’ve been on Talk Python a couple of times.

HASHEMI: Yeah, a couple of my faves for sure.

OKKEN: When I was looking up Talk Python, I noticed that you were on Episodes #4 and #54.

HASHEMI: Yeah, and when Guido (van Rossum) was on, Michael was kind enough to ask my question and I did like, a panel thing. It’s really nice to have repeat appearances. People recognize me by my voice now. It’s kind of strange. I’m very appreciative at the same time.

OKKEN: That’s great. So, thanks a lot for helping to do this today.

HASHEMI: Hopefully I can do Michael right, taking his spot here.

OKKEN: Well, let’s just jump right in. I’m really excited about your first topic.

HASHEMI: Oh, sure. Let’s see, first up. One thing that’s been on my radar, I’m not sure if you guys have talked about this before. Sometimes I’m listening to Python Bytes and it’s a little bit garbled or something. Have you guys tried calling decode? I’m kind of curious why it’s not ‘Python Strs’.

One thing that’s been on my radar is, “The New PyPI”. If you haven’t been on distutils-sig you may not have seen that there’s a new PyPI, pypi.org. This is going to be the new package index going forward.

OKKEN: So, this is what we’ve been calling Warehouse before, is that right?

HASHEMI: Warehouse is the software that runs PyPI. So, it’s a package index. It’s going to be where all of your wheels and sdists live. Basically, a lot of development is happening here. My friend Donald Stufft is doing an amazing job with his team. We’re up to 114,598 projects at the moment. This even lists the number of files. Almost a million files with 230,000 users.

I would definitely check out this PyPI.org for yourself. But for the most part, I wanted to talk about how they’re deprecating the old PyPI. PyPI.Python.org is now basically just a read-only interface. If you’ve tried to upload a package recently then you may have seen an error, HTTP 410, which is like a 404, but this is 410, meaning it was here but now it’s gone. Make sure to use a new version of setuptools and it will automatically start using the new one, as long as your configs don’t state otherwise. You might have to update a config. This is a tremendous leap forward in a lot of ways. They need some help doing it, too. It’s all Open Source on GitHub, there are issues. I’m working on one right now. It’s got a lot of cool features.

Have you taken a look, Brian?

OKKEN: I’ve looked around a little bit. One of the things I noticed right off the bat is, it says up at the top, there’s a big red bar.

HASHEMI: I know, it’s kind of scary.

OKKEN: I’m guessing at some point the other interface will just redirect to here.

HASHEMI: Cool URLs don’t change. Personally, in my view I’d like it if they just kept it up and put the red bar over there as an archived version of PyPI. But for now all those URLs are still working. If you ask me, pypi.org has been in use for so long because actually, if you paid close attention, a lot of your downloads, pip is downloading from the new one. It’s been in production a long time. They just hit, I think, a petabyte a month in bandwidth downloads. Just for a sense of cost there, I think it’s in the tens of thousands, like $30,000 to $40,000 a month to host PyPI and that’s kindly donated by the Fastly CDN. Should they stop feeling so generous, you know… We’ve got to support our community somehow. There’s a donate button, but I think that right now, what they need most is people to work on cool features. One that I saw that’s being worked on that I’m very excited for, not strictly pypi.org but same team, the Python Packaging Authority, they are working on making a dependency graph between all packages. So, if you’ve ever wondered what depends on what ahead of time, then this would enable that.

OKKEN: How do I start working on it? Do I go to the GitHub page?

HASHEMI: Yes. I think it’s github.com/pypa/warehouse. Donald has been very candid about the areas that need development. He’s been working very hard. He’s at Amazon now. He spends some time working on stuff there.

One last thing, there’s an email list called distutils-sig, where SIG stands for ‘special interest group’. So, distutils-sig, you can just go join the listserv, read the archive and see the conversations they’re having. If you care about packaging, you’re probably already on there, but if you aren’t, definitely subscribe.

OKKEN: I didn’t know about it. We’ll drop a link in the show notes.

Okay, well, that’s really cool.

HASHEMI: Pretty good for a first topic?

OKKEN: Yeah, definitely. One thing I want to add is I know that Donald has been vocal before about how awful the previous code was.

HASHEMI: Yeah, it’s pretty old code. It may not predate WSGI, but it’s pretty old.

OKKEN: Have you looked at the new code?

HASHEMI: I’ve looked at the new code. I can talk about the new code if we have a second. So, I’ve looked at it, I’ve used it. It’s got 100% coverage. It’s got a lot of CI stuff set up. It uses Docker. I had a little bit of trouble with the make-based approach to running the thing but it’s pretty complex. It runs Elasticsearch and all this stuff.

OKKEN: Nice. People shouldn’t be afraid to help out just because they’ve heard bad things about the old code.

HASHEMI: No, the new code is pretty idiomatic, I think. If you’re familiar with SQLAlchemy and I think it uses maybe Pyramid.

OKKEN: And it looks like the tests are in pytest too.

HASHEMI: Yeah, the tests are definitely in pytest, which is frankly the only way, I have heard and also found myself. So, yeah, it’s been good.

OKKEN: I could talk about this for a long time but let’s move on to the next topic.

I just read about this yesterday. I read about this on the Make: website (makezine.com). It’s “CircuitPython Snakes its Way onto Adafruit Hardware”.

HASHEMI: It’s great news for hardware hackers and also tinkerers like myself.

OKKEN: We’ll put a link in the show notes to the Make: article. I had heard Adafruit announced CircuitPython in January. It’s Open Source. It’s based on MicroPython, which is also Open Source, so I’m not sure how they differ but they’ve added some things to make it easier to control hardware. They already had two devices, Metro M0 and Feather M0 Express versions, that support CircuitPython right off the bat. I guess they’re working on a Circuit Playground Express.

All of these look like really fun things, but the thing that really caught my attention was the Gemma M0, which was announced at the end of July. And this thing is like the size of a quarter, it’s a little small thing that you can make wearable software projects with, like LEDs and whatever. You just plug it into your computer and instantly it’s like an extra drive. You can see a main.py and you can just start programming Python right away.

HASHEMI: So, basically it functions like a USB drive and there’s a single, main entry point and you can just modify it. You don’t need to install anything or anything like that.

OKKEN: Yeah, there’s no loading. Apparently it does support Arduinos, but right off the bat you don’t have to install anything, you just start programming. Right now they’re currently out of stock but they get new stuff in pretty quick. But it’s under $10 to start programming some wearable programming. I definitely have to get one of these.

HASHEMI: I can’t wait to start wearing some running Python. That’d be taking it to the next level. (Laughs)

OKKEN: I’m also going to link to what I thought was great. They are encouraging people to use Python when they can for programming hardware, but they realize that a lot of people are new to the Python community, so there’s a page called, “Creating and Sharing a CircuitPython Library”. It’s got a whole bunch of great links telling people when we say library, we mean a package or a module with a setup file and doing it all right. And there’s little intros to GitHub and ReadTheDocs and Travis.

HASHEMI: When you say package or module, is this their own format or is this Python Packages, Wheels, that sort of thing?

OKKEN: Yeah, it’s Python stuff but it’s just really quick tutorials to get people up-to-speed fast.

HASHEMI: Sure, so it’s got like an end-to-end thing, it doesn’t just send you left and right to other sites?

OKKEN: Yeah, it’s really telling you everything. They’re pretty condensed. Actually, they did a pretty good job condensing all that information.

HASHEMI: Yeah, you don’t need the whole context and history of Python Packaging. We’ve come a long way since Eggs and that sort of stuff.

OKKEN: Yeah and one of the things that is kind of interesting is they have a concept of bundles. Really, all a bundle is, is a bunch of installable Python Packages that are zipped up into a bundle.

HASHEMI: Hm, sure.

OKKEN: We don’t normally care about that because on a larger computer it’s not that big of a deal. But these little tiny devices, you have to care about how big it is. You might want to get everything somebody cool has made, but you don’t need it all. You just need the little part that blinks the LED for you or whatever.

HASHEMI: Sure, so it sort of freezes it all together. These embedded applications are interesting. So, I maintain this one library called Hyperlink and I guess it’s pretty widely used because Twisted depends on it. So, I’ve gotten some interesting feedback on a few things. One code-review I went through – I promise this is related – basically, I’m using pytest and I’m writing my assert statements. I love that pytest rewriting, the great error messages and so forth. But I got a comment on my code review that these tests are not runnable in an embedded environment, because they will run with -OO which elides all of those assert statements. I’m like, you’re kind of running the tests wrong if you’re not using pytest. But in these embedded environments, I don’t know, maybe the convention is different.

So, when you get yours, definitely test it out. Maybe you’ll have to put a caveat on your pytest recommendation if that’s not what we can do on hardware. I don’t know.
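As a rough illustration of the -OO point above, here is a minimal sketch that runs the same one-line program in a fresh interpreter with and without the flag. The snippet text and the helper name run_snippet are made up for this example; only the standard library is used.

```python
import os
import subprocess
import sys

# A deliberately failing assert followed by a print, run as one tiny program.
SNIPPET = "assert 1 + 1 == 3, 'math is broken'; print('assert was skipped')"


def run_snippet(*flags):
    """Run SNIPPET in a fresh interpreter; return (exit code, stdout)."""
    env = dict(os.environ)
    env.pop("PYTHONOPTIMIZE", None)  # let only our flags control optimization
    result = subprocess.run(
        [sys.executable, *flags, "-c", SNIPPET],
        capture_output=True,
        text=True,
        env=env,
    )
    return result.returncode, result.stdout.strip()


normal = run_snippet()           # the assert runs and fails
optimized = run_snippet("-OO")   # the assert is compiled away
print(normal, optimized)
```

Without the flag the interpreter exits with an error before reaching the print; with -OO the assert never executes, so a test suite built only on assert statements would pass without checking anything.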

OKKEN: Oh, that’s interesting. I’ll definitely have to check that out. I didn’t want the hardware people to not buy my book. (Laughs) That would be terrible.

HASHEMI: That’s the thing, with something like Hyperlink, which is for URLs, I’m 99.9% sure it’s going to run exactly the same everywhere. So, I’m confident that if it runs on my machine, it runs on Travis CI, it runs on AppVeyor, it will be fine. But at the same time, hardware people can be sticklers, as I’m sure you know. I respect that.

OKKEN: Yeah, neat. Well, what do we have next, Mahmoud?

HASHEMI: Oh, right, it’s back to me. So, I spend a lot of my time pretty deep into development of all sorts of infrastructural sorts. I find myself subscribed to Python-Dev, Python-Ideas, distutils-sig. And you know, you can’t read everything there and still have a life, so only a few things catch my eye. But this thing in particular caught my eye because my friend Hynek (Schlawack) has this great library called attrs. If you haven’t heard of it, my other friend Glyph (Lefkowitz) has a whole blogpost that tells you why you have to use this library.

It’s basically class decorators that make writing high-level classes very easy. It derives from this sort of tradition of named tuples. Raymond Hettinger had this great idea to make named tuples, which let us define a class like a structured thing within just one line. But the problem with named tuples is if you want to add methods to it, then you have to inherit from it, and they’re immutable by default. And they don’t really generate a dunder init for you, they don’t do a whole heck of a lot of validation. So, attrs comes along to solve these things and a bunch of other cool functionalities. It does it with class decorators. It doesn’t pollute your final object with anything you don’t want because you don’t inherit from anything. So, you just inherit from object.
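The named tuple trade-offs mentioned here can be seen in a few lines of standard-library code. This is only a sketch; the names Point and PointWithNorm are invented for the example.

```python
from collections import namedtuple

# One line defines a small structured class with named fields.
Point = namedtuple("Point", ["x", "y"])


# Adding a method means inheriting from the generated class.
class PointWithNorm(Point):
    __slots__ = ()  # keep instances as lightweight as the plain tuple

    def norm_squared(self):
        return self.x ** 2 + self.y ** 2


p = PointWithNorm(3, 4)
print(p.norm_squared())

# Instances are immutable: assigning to a field raises AttributeError.
try:
    p.x = 10
    mutated = True
except AttributeError:
    mutated = False

# No validation happens: any value is accepted for any field.
odd = Point("not a number", None)
print(mutated, odd)
```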

After Glyph’s post took off, or something, the core Python devs sat up, took some notice of this and said, ‘Maybe we have been neglecting a higher-level interface for quickly defining classes. You know, you just want to have four or five fields all sort of batched together and you don’t want to have a lot of functions that everywhere have to define 15 arguments. So, how can we quickly, in a nice, concise Pythonic way, define a Python class?’ And they came up with this new thing, which is still, I guess, kind of deep underground. There’s this GitHub repo that Eric V. Smith – who is a Python core dev – has called Dataclasses. And the issues of this have been really interesting to watch because Hynek and a bunch of core devs have been debating like, ‘Should we just use attrs? If attrs is getting so popular, should it just be part of core Python? People seem to like it, why make something that’s so close to it.’ That sort of thing.

There’s sort of a draft PEP inside of the Dataclasses repo, and there’s some examples of how it’s used. It has some semantic differences, it has some syntactic differences. I think that it’s pretty interesting to watch and they seem to be encouraging more experimentation in this area. Even though I like attrs, they seem to want even more options, at least from themselves.

I had a good time reading the issues. Maybe other people will enjoy it, too.

OKKEN: Yeah, so it’s similar to attrs then?

HASHEMI: It’s pretty similar to attrs. The differences are fine enough that you have to look closely. Basically, I think that what it is is like, there’s actually an issue called, ‘Why not just attrs?’ And they explain that they want to use the new, I think, type hints syntax.
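For a sense of what the type-hint-based syntax looks like, here is a small sketch using the dataclasses module as it later shipped in the standard library with Python 3.7. It may differ in details from the draft repo being discussed at the time of recording, and the Package class and its fields are invented for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Package:
    # Fields are declared with type-hint syntax; __init__, __repr__ and
    # __eq__ are generated from them.
    name: str
    version: str = "0.1.0"
    tags: List[str] = field(default_factory=list)

    def label(self):
        return f"{self.name}=={self.version}"


pkg = Package("example", "1.0")
pkg.tags.append("demo")  # instances are mutable by default
print(pkg)
print(pkg.label())
```

Methods sit directly on the class with no inheritance involved, which is the same property the attrs discussion above highlights.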

OKKEN: Okay.

HASHEMI: Other people said, ‘Maybe naming-wise, Dataclasses is a little bit clearer than attrs, because someone who is a new Python programmer doesn’t know that attrs is an attribute.’ Or something like that. So, it has some syntactic differences.

OKKEN: There are some big names in this discussion.

HASHEMI: There are. That’s what I mean. It’s like the inner circle, right? This is the kind of stuff I have to follow from beyond the edge here.

OKKEN: That’s awesome.

HASHEMI: It happened kind of behind the scenes, but I really do encourage people to join these email lists if you want to see the action happening, you know. You don’t have to be a spectator. You don’t have to sit in the nosebleed section of the arena on Open Source, right? You can get up close and get the front row seats and before you know it, you’ll actually get involved. It will be fun.

OKKEN: Yeah, that’s great. Thanks for bringing that up. That’s cool.

Well, speaking of trying to get involved, unless you’ve had your head under a rock, data science is a thing.

HASHEMI: (Laughs) Is it, really?

OKKEN: It isn’t something that I have to use on a daily basis, but it’s definitely something I want to pay attention to. There’s a lot of books and tutorials that are huge because it’s a huge topic. I ran across an article called, “Pandas in a Nutshell”. I like it because it’s a Jupyter Notebook-style post, so you can just see the code working. It’s mostly tutorial by example with just a little bit of code for explanation. The big part of it is really just talking about a couple data structures. Talking about the Series data structure, which is a one-dimensional array with indexes, so like a vector. And then, the Dataframe, which is like a two-dimensional array.

All the common things that you need to do with it, like specifying a custom index or combining two series or with matrix stuff – adding columns, adding a column that’s based on another column. Then this stuff sort of seems like Excel.
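The operations listed above look roughly like this in practice. This is a sketch that assumes pandas is installed; the labels and numbers are made up and are not taken from the article.

```python
import pandas as pd

# A Series is a one-dimensional array with an index; here the index is custom.
downloads = pd.Series([120, 45, 300], index=["alpha", "beta", "gamma"])
uploads = pd.Series([3, 1, 7], index=["alpha", "beta", "gamma"])

# Combining two Series aligns them on their index labels.
total = downloads + uploads

# A DataFrame is a two-dimensional table; each column is a Series.
df = pd.DataFrame({"downloads": downloads, "uploads": uploads})

# Add a column that is computed from other columns, spreadsheet-style.
df["downloads_per_upload"] = df["downloads"] / df["uploads"]

print(total)
print(df)
```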

HASHEMI: Yeah, a spreadsheet. I think for a lot of people, that is the natural next step when they want to get into programming. It’s either going to be doing basic script of some sort in some kind of Excel, or maybe move into Python.

OKKEN: Yeah, I guess that’s one of the things I like about this Nutshell article is if somebody’s already doing something in spreadsheets and they want to switch to working with Pandas, this might be a pretty good stepping point to get things going. And I’m going to grab some of the concepts in here to try to deal with some of the large amounts of data that I deal with on a daily basis, as well.

HASHEMI: Oh, for sure.

OKKEN: I bring this up because I’m just starting. I’m trying to use Pandas on a daily basis now.

HASHEMI: I’ve actually faced a lot of the same challenges. Just because it’s Python doesn’t mean that it doesn’t require some sort of paradigm shift in your thought. Like, thinking about Dataframes is very different than thinking about lists in Python, or dictionaries in Python. It’s somewhere between Python and full-blown relational databases. So, you do have to change the way you think about how to approach a problem. Especially if you want to get some performance out of the thing, because it has this great broadcasting logic that it can perform but it’s not going to work if you’re just going to iterate over it with for loops.
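To make the contrast between for loops and whole-column operations concrete, here is a minimal sketch, again assuming pandas is installed. The column names and values are invented, and no timings are claimed.

```python
import pandas as pd

df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "quantity": [1, 2, 3]})

# Loop style: visit one row at a time in Python.
loop_totals = []
for _, row in df.iterrows():
    loop_totals.append(row["price"] * row["quantity"])

# Column style: one expression applied to whole columns at once.
df["total"] = df["price"] * df["quantity"]

# Broadcasting: the scalar is applied to every element of the column.
df["total_with_tax"] = df["total"] * 1.1

print(loop_totals)
print(df)
```

Both styles give the same totals here; the column style hands the work to pandas instead of stepping through rows in the interpreter.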

OKKEN: Yeah, I guess that's where the Dataframes and Series stuff comes in because you’re going to do some computation on everything and searching on stuff. This is kind of like a combination of an in-memory database and something else.

HASHEMI: Where I work, some of our data scientists are coming from an R background and the Dataframe is based on an R construct, I believe. They find it quite natural and the Python is what they sort of struggle with and they come to me for that. But a Python person will want to ramp up on the Dataframe itself, so this notebook seems like a great option to do that quickly.

OKKEN: Yeah, so that was just a quickie. That’s it. Your last topic.

HASHEMI: Already? So, basically, just yesterday I was at this conference, PyBay 2017, it’s the Bay Area/Silicon Valley regional Python conference. Only the second annual one. It’s surprising how long it took to spin up here, meanwhile PyOhio’s been going on for who knows how long. But it was a great conference. Almost 500 developers, pretty good turnout and a lot of great topics covered. I gave a packaging talk.

But the thing I want to talk about today is the opening panel was on static typing. It was quite an interesting mix. First of all, it was very international. They had people from Germany, Russia, Poland, USA and the Netherlands. It seems like Europeans are big fans of static typing for whatever reason, Guido included. So, they had people from, I think, PyCharm, University of California at Berkeley, then also Quora, Google, and I think another guy, too. It was a really nice cross section of the industry and also the world. And they talked about the state of static typing.

Right now, I want to bring you up-to-date. I’m not sure how recently you covered this stuff on the podcast. There are currently three or four static type checkers, so in Python 3 you can specify your types however you’d like. Built into the language, it’s not going to do a lot of complaining in case types don’t match. First of all, at runtime nothing is checked. So, if you want to check it, it would be at a compile time step. The annotations are still there at runtime and then you have a static type checker, the most popular of which is mypy, run over that and check it. Kind of like a linter, or any other static analysis tool.
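A short sketch of both halves of that statement: the hints are ignored when the code runs, yet they remain available for tools to inspect. The function repeat is made up for the example.

```python
def repeat(text: str, times: int) -> str:
    return text * times


# Nothing checks the hints at runtime: a list is accepted where str was
# declared, because list * int is also valid Python.
result = repeat([1, 2], 2)
print(result)

# The annotations are still there at runtime for tools to inspect.
print(repeat.__annotations__)
```

A static checker such as mypy, run as a separate step over this file, would be expected to flag the call that passes a list.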

So, there are other ones too. Google has one that is not super-well documented but they use it internally. Then PyCharm has this functionality as well, which is also kind of built from scratch. They made a pretty good case why you would want one built into PyCharm, which is that basically it can do incremental checking. So, while you’re still writing it can do partial checks, maybe a little bit better than mypy.

Oh, right. The last person on the panel, Lukasz Langa from Facebook, he also comes to my meetup. So, he’s very opinionated about types, but we’ll get to that in a second. One that wasn’t talked about was PyLint, so I was actually blown away. I updated my Emacs config recently and I sort of integrated some more linting stuff. The default PyLint these days can do an amazing amount of inference. It will tell you if you have the wrong number of arguments, it will tell you like, ‘Oh, the default doesn’t match that type.’ It will tell you so many different things, in addition to its standard, very opinionated idea of how many arguments a function should even have, and that sort of thing.

So, those are the four type inference engines and they all are slightly different. But everyone seemed to get along pretty well on stage and they talked about potentially, in the future, actually merging these things and making a PEP that would allow them all to comply together, maybe even turn into a single project. So, that was really nice to see. And one of the most interesting questions was basically from the audience. They said, ‘Well, what is the real point behind the static typing? What is the biggest benefit that you see?’ And there was a little bit of divergence on this. Some people like it for the strictness of it all, being the kind of dictator of your own code base, right? But everyone else seemed to be pretty much on the same page, that this is for human readability. This is a sort of documentation that can then be checked automatically on a rather large scale. It’s attached to the function but it’s more than a doc test. So, the interesting side effect of this is that even though they all work on static typing stuff, they have a pretty nuanced view of how much static typing you should apply. They say that maybe a list of a certain type, but actually defining a completely recursive type is 1) not supported and 2) maybe not even that desirable because you don’t want your function’s signatures to get super, super complex.

Yeah. It was interesting that they thought the human side of this was the most important part, as opposed to say, a Haskell program or something, where they want the mathematical correctness of it all.

OKKEN: I would have liked to have listened to the discussion of how much you should use of it.

HASHEMI: Well, it was at LinkedIn. I think that they recorded it and it should go up pretty soon. It was only a couple days ago. Once the video is available, I’ll send it to you and you could add it to the show notes.

There’s some interesting side effects to this, by the way. Like, some things to consider. Cython does not support the new Python type syntax. Even though all these guys are kind of on the same page and buddy-buddy. For us people who really like Cython and have used it to achieve a lot of performance and type correctness to some degree, we’re a little bit out of luck at the moment. I think that people are working on making a pull request to it that would add support to it, but it’s such a big change to the syntax. Cython has its own type syntax, which is less focused on semantic types as this is, and more focused on being in line with C types, which allows you to have more compact memory usage. The people on the panel were actually pretty clear that static type advantage is not in performance. So, a project like PyPy which actually can use types to achieve higher performance, they find that the JIT is faster without taking hints from the users not the code. It just disregards this stuff.

OKKEN: Oh, interesting.

HASHEMI: Because the JIT has the actual types.

So, just a real quick thought experiment. Imagine I say I’m going to pass you a list of integers. That list is three integers long, I can just check them, 1,2,3. All integers good to go, no type error, right? But if I pass you a list of 20,000 integers, every time I pass that to you I have to check that every single one is an integer, otherwise I’m going to have a type error. That sort of thing is going against the spirit of Python and being practical and duck-type-y and what not.
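The thought experiment can be written down directly. In this sketch the helper all_ints is hypothetical; it counts how many isinstance checks a runtime validation of the list would need.

```python
def all_ints(items):
    """Return (ok, checks): whether every item is an int, and how many
    isinstance checks it took to find out."""
    checks = 0
    for item in items:
        checks += 1
        if not isinstance(item, int):
            return False, checks
    return True, checks


print(all_ints([1, 2, 3]))
print(all_ints(list(range(20000))))
```

The work grows with the length of the list and would be repeated on every call, which is the cost a static checker avoids by doing its analysis ahead of time.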

So, a friend of mine from Intel was sitting next to me and he was saying how he came to Python so he wouldn’t have to type everything. But thankfully you don't have to type everything. For the standard library itself, for instance, all the type definitions are available in this joint typeshed repo that all of these static type people built together, and I’ll link to that in the show notes, for sure.

OKKEN: My favorite use so far that I came across for my own work is putting type hints in interface areas like an API module. That’s how you interact with the package. So, those are great places for type hints.

HASHEMI: Oh, for sure. And there’s this old thing that they’re trying to get rid of. Basically, Python has these stub files, these interface files. People call them the header files for Python. I think it’s a .pi file, .pyi.

OKKEN: I was thinking about a package that has a whole bunch of internal code, that has an API module that people interact with from the outside world. That’s a great place for pretty much any interface that’s not you that’s going to use it, that somebody else is going to use it. Those are great places to put type hints, if it matters.

HASHEMI: Oh, definitely. Cool.

OKKEN: But I’m pretty new to it so, thanks for bringing that up. That was very interesting.

HASHEMI: I think that they’re still changing this stuff quite a bit. Early adopters go nuts, but for the rest of us who like a little bit more boring technologies, I’m going to go ahead and let the auto inference of PyLint figure things out for me. I’m not going to jump on the bandwagon so quickly.

OKKEN: I’m glad you brought PyLint up. I’ve been sort of dismissing it because I’ve been using Flake8, but I’ll have to take a look at PyLint again.

HASHEMI: Oh, yeah. They’ve definitely ramped up development on that again. For me anyway, I just blacklist a lot of the errors because I kind of don’t agree with every single thing that they test for. But they make it pretty easy to do. You just change it in an .INI file. No big deal.

OKKEN: The last topic, again, comes back to me finally getting my head out of thinking about pytest 24 hours a day. One of the things I want to start looking at is some of the web frameworks, like Django and Flask. I haven't played with them much personally. And there’s a bunch of personal projects and work projects I’d like to do with them. And also, quite a few people that listen to Testing Code are web people, so just to kind of get more understanding of that, trying to learn more frameworks.

One of the things that I’ve had a hard time getting my head around is ORMs, or Object-relational mappers. Luckily, I ran across an article on FullStack Python, which is Matt Makai’s site.

HASHEMI: Amazing site.

OKKEN: I think it’s just called, “Object-relational mappers (ORMs)”. It goes through what they are. An ORM is code that automates the transfer of data from your internal Python objects and classes to database tables. And they’re useful so that you can write Python code instead of writing SQL queries. It talks about that and also talks about why you need them and some downsides.
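To show what kind of work an ORM automates, here is a hand-written mapping between one Python class and one table, using only the standard-library sqlite3 module. It is a sketch of the idea, not how SQLAlchemy, Peewee or Django implement it, and the Episode class and the save and load helpers are invented for the example.

```python
import sqlite3


class Episode:
    """A plain Python class that we map to a table by hand."""

    def __init__(self, number, title):
        self.number = number
        self.title = title


def save(conn, episode):
    # Object -> row: the kind of SQL an ORM would write for us.
    conn.execute(
        "INSERT INTO episodes (number, title) VALUES (?, ?)",
        (episode.number, episode.title),
    )


def load(conn, number):
    # Row -> object.
    row = conn.execute(
        "SELECT number, title FROM episodes WHERE number = ?", (number,)
    ).fetchone()
    return Episode(*row) if row else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE episodes (number INTEGER PRIMARY KEY, title TEXT)")
save(conn, Episode(39, "The new PyPI"))
loaded = load(conn, 39)
print(loaded.number, loaded.title)
```

Every new class or column means more SQL like this written by hand, which is the repetition an ORM takes over.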

The downsides actually were interesting. I didn't think anybody would talk about what’s wrong with using ORMs.

HASHEMI: Realistically, there are some definite engineering trade offs. So, what did he say?

OKKEN: Some of the few things are “impedance mismatch”. Coming from the electrical world, I was like, ‘Impedance mismatch? That’s like 50 ohms to 75 ohms.’ But it’s basically the way a developer is using the object can be different from how the data is stored and joined in the tables in your database. Especially if you’ve set up the tables in a way that’s contradictory to how it’s being used all the time. It might be slow and maybe reshaping your data might speed that up.

And then “potential for reduced performance”. This isn’t surprising to me. If you stick some code in the middle, it’s not free.

Also, “shifting complexity from database into the application code”. This is something that I didn’t quite understand right off the bat, but if you think about it, it’s not too bad. Databases are complex pieces of software that have stored procedures and a whole bunch of fancy join math and stuff that might not be supported by an ORM. If you have to do that stuff you’re going to have to do it in your application instead. It’s using a database in a simpler way, but that complexity has to go somewhere. It can go in your application code.

HASHEMI: But until you get database specialists, then it makes it a little bit easier for you as a sole developer, for instance.

OKKEN: So, I punted it first and used the document database because I didn't want to think about ORMs right off the bat.

HASHEMI: But the thing is, you’re correct, a database is an advanced tool but a lot of advance-ness and complexity you retain even when using an ORM. For instance, like, document databases don’t have great transaction models, don’t have great multi-version concurrency models. So, when they put all that work into Postgres or MariaDB or something like that, just by using an ORM it seems almost as simple as a document database but you get that operational feature.

OKKEN: Yeah, I’ve definitely heard of SQLAlchemy but I hadn’t heard of a couple of the others that he listed here. Peewee and Pony and SQLObject. Have you used any of these?

HASHEMI: Yeah. SQLAlchemy is definitely my go-to, and I’ll talk about why in a second but yeah. I’ve used Django’s ORM because I did the Django tutorial and one of the first things they teach you is Django has a serviceable ORM, but there’s some issues with it that SQLAlchemy actually does a much better job with. I have used Peewee, in fact. I like Peewee. It’s sort of like a simplified version of Django. In my opinion it basically says, look, if you’re not going to be SQLAlchemy then you can just be plain and simple. And it does a pretty good job. But these days, SQLAlchemy has gotten so good that I just reach for that every single time I work with a relational database in Python.

OKKEN: Okay.

HASHEMI: One thing that SQLAlchemy has is this working copy of all the models, and they end up being kind of like singletons within a given process space. So, with Django, you can actually get two copies of the thing from the database within the same request or the same process. That means that basically, concurrently somewhere else in your program, it could change something, save it. And then when you change the request handler you’re actually trying to work on, that will overwrite the previous change. Like if you change column A in one thread and column B in another thread, whichever thread that saves first is going to overwrite the other unchanged value. So, there’s a setting that’s off by default, I think, in Django called Atomic Requests and you have to enable that to prevent that sort of situation.

But Django is not alone in this. I think Rails, for a very long time, did this same thing. And Django, of course, is sort of Python’s response to Rails.

OKKEN: Does SQLAlchemy not have this problem?

HASHEMI: So SQLAlchemy doesn’t have this problem because you only get one copy of that thing in your system. It has a local index of primary key to the object version of that row that you’re representing, for instance. So, yeah. SQLAlchemy adds a lot of machinery. It makes SQLAlchemy a little more complex but, I had a friend who spent days tracking down this issue with Django and with SQLAlchemy it never would have happened. You pay some upfront costs with SQLAlchemy, but I think it’s definitely worth it.
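The difference between two copies of a row and one shared object per primary key can be sketched in plain Python with sqlite3. This is an illustration of the idea only, run sequentially in one thread; it is not SQLAlchemy's or Django's actual code, and the table, the helper names and the identity_map dictionary are all invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, a TEXT, b TEXT)")
conn.execute("INSERT INTO items VALUES (1, 'a0', 'b0')")


def fetch(pk):
    """Return a fresh dict copy of the row each time it is called."""
    a, b = conn.execute("SELECT a, b FROM items WHERE id = ?", (pk,)).fetchone()
    return {"id": pk, "a": a, "b": b}


def save(obj):
    """Write every column back, changed or not."""
    conn.execute("UPDATE items SET a = ?, b = ? WHERE id = ?",
                 (obj["a"], obj["b"], obj["id"]))


# Two independent copies of the same row, each changing one column.
first, second = fetch(1), fetch(1)
first["a"] = "a1"
second["b"] = "b1"
save(first)
save(second)      # also writes back its stale value for column a
lost = fetch(1)   # the change to column a has been overwritten

# Identity map: a cache keyed by primary key hands out one shared object.
identity_map = {}


def fetch_once(pk):
    if pk not in identity_map:
        identity_map[pk] = fetch(pk)
    return identity_map[pk]


conn.execute("UPDATE items SET a = 'a0', b = 'b0' WHERE id = 1")  # reset
first, second = fetch_once(1), fetch_once(1)
first["a"] = "a1"
second["b"] = "b1"
save(first)
kept = fetch(1)   # both changes survive because there was only one object
print(lost, kept)
```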

When it comes to this sort of ORM thing, like if I can provide some general advice, ORMs are the tools of applications. If you want to form a real opinion of object-relational mappers, you should compare applications. I spent a fair amount of time reading Reddit source code, which, I think, uses SQLAlchemy. And I think it uses it without the declarative object mapper; it uses it with the legacy, or lower-level, SQLAlchemy tools. But you still get a real sense for where they use an ORM and where they don’t. And SQLAlchemy also makes it very easy to pass through normal SQL text. That’s another thing I really like about it. ORMs are an abstraction that’s useful 90% of the time. For that last 10%, you really want the full power of the driver or the database itself.

OKKEN: Cool. I don’t have any opinion on these extra couple links I put in here, but Matt has some dedicated pages for SQLAlchemy and Peewee. One of the things I like about Matt’s site anyway, Full Stack Python, is he gives his opinion and information when he has it, and when somebody else has already explained it well enough or better, he just links to their stuff and says, ‘Go read that.’

HASHEMI: Absolutely, he’s a real team player in that regard. I also have to give a shout-out to him. He so consistently adds to the site, it’s become such a tremendous resource for someone who wants to develop an application. I’m sure the listeners of this podcast are, for the most part, already aware of it, but definitely check it out.

OKKEN: Definitely. Well, that’s all of our topics. We didn’t address what you’re up to lately, other than helping out with podcasts.

HASHEMI: (Laughs) It’s funny, I’m also prepping for another podcast as well, Partially Examined Life, I guess. What am I up to lately? Well, I gave a talk at PyBay and because it was based on a blog post I thought it would be easy to put together slides. It still took, full disclosure here, it still took another 40 to 50 hours to make slides from that blog post. But it seemed really well received, so I’m very relieved right now. I’ve got some nice life events coming through. My parents are coming to town, keeping me real busy. I also am working on this Hyperlink library, like I mentioned earlier, URLs in Python, and it’s used by Twisted and some other big projects. Fixing bugs in there is always kind of contentious, which is why I’ve got a lot of support for people who work on things like setuptools, which is even more widely used. Beyond this, let’s see, writing blog posts, I think my drafts count is up to like a hundred now. Maybe more conferences, more talks. I don’t know why I keep signing up for these things but it’s great meeting people out there. People out there should really look into PyBay and regional conferences, meetups. I run a meetup, too. The Pyninsula Meetup, the hottest new meetup in the Bay Area, Silicon Valley.

OKKEN: Pyninsula? That’s a terrible pun.

HASHEMI: (Laughs) Eh, this is programming, man. It’s all about the terrible puns. (Laughs) So, Pyninsula. I think we even have the site now, Pyninsula.org, and we’re on Twitter and so forth. I do my best to record the talks, but for people who want to break into this type of speaking and that sort of thing, look no further than your local meetup. Go make a 15 minute, 30 minute talk, see how it goes. Iterate on it. Have a brown bag at your company, just keep iterating on it. Something will stick and you can submit to something like PyCon or whatever.

OKKEN: That’s a great idea. I think a lot of people think that you just have to work really hard on a talk and give it once and then it’s done, but a lot of people give them several times.

HASHEMI: Yeah, and also, if there’s not a meetup in your area, maybe start one. I think programmers are literally everywhere. Even though there’s a South Bay Python meetup, it’s more toward Sunnyvale, south of Mountain View area. And there’s the SF Python meetup, which is up in San Francisco. We put one right in the middle and I guess California traffic’s bad enough we have a captive audience, literally. But we’ll get, I think when Guido came there were almost a hundred people at the meetup, and normally we get like 50. But it’s great. Everyone can socialize, and it’s something a little more intimate. It’s a little less stressful when you’re giving the talk yourself, too.

OKKEN: So, it wouldn’t be a Python Bytes episode if I didn’t plug my book.

HASHEMI: By all means.

OKKEN: One of the things I want to bring up is that Python Testing with pytest has a nice discussion forum. It’s kind of built in to what Pragmatic offers for all the books. But if you ever ask a question on there, it pings me and emails me and says there’s a question. Just this morning I answered a question. Somebody got on and said that the book is helping them understand testing better, and I love comments like that. But he had a question about Monkeypatch versus Mock and I’m not going to get into it too much here, but I did reply to him and it’s all up there for everybody else to read, too. So, I’ll have a link in the show notes to that.

HASHEMI: That’s great. Those sorts of comments really keep you going. I wish that my O’Reilly thing had such a discussion forum. Instead I got my feedback through reviews for a while. (Laughs) Emails, too. People email and I appreciate it.

OKKEN: Yeah, I get it from all over the place. I get it through the discussion forum, I get it from Twitter and we’ve got a Slack channel and people come and tell me what’s wrong in the Slack.

HASHEMI: I’ve been really into riot.im, which is a Python-based, Open Source Slack sort of thing. There’s also Zulip which is everywhere these days. They’re doing an amazing job.

OKKEN: What’s the first one?

HASHEMI: So, riot.im. It runs a protocol called Matrix and it’s a very large thing. It’s basically like you can have encrypted chats with people who are on it. I use it because it’s an IRC bridge. I guess if you want to be in this sort of inner circle and see the goings-on, IRC is still very much alive. So, you’ve got your listservs and IRC and so forth. Riot makes that pretty easy to get into. There’s a Freenode bridge and you just join a free thing and you can look at IRC through your browser while having encrypted chats with your other friends. It also has peer-to-peer video chat that works really well because it’s just the WebRTC Open Source protocol. It works great in Firefox.

OKKEN: I’m going to cut you off because we’re running long.

HASHEMI: Oh, wow, we’re way long.

OKKEN: I think this is an awesome topic. I think you should come onto Testing Code and we can talk about IRC and communication channels. That’d be fun.

HASHEMI: That’s actually a great idea. Yeah, for sure. I’m always coming up short for topics, but here we are just chatting. That’s a great idea.

OKKEN: Again, thank you so much for coming on. I love having new voices on here.

HASHEMI: It’s been my pleasure and thank Michael. When he gets back I’ll send him an email. This has been great.

OKKEN: Keep in touch.

Thank you for listening to Python Bytes. Follow the show on Twitter via @pythonbytes. Get the full show notes including links at pythonbytes.fm. If you have a news story you’d like featured, visit pythonbytes.fm and send it our way. We’re always on the lookout for sharing something cool. This is Brian Okken. On behalf of myself and Michael Kennedy, thank you for listening and sharing this podcast with your friends and colleagues.
