Transcript for Episode #3:
Python 3.6 is coming, and it's awesome plus superior text processing with Pynini

Recorded on Monday, Nov 21, 2016.

Michael KENNEDY: Python headlines and news delivered directly to your earbuds. Episode #3, recorded November 21st, 2016.

Hey, Brian. Hello, everyone.

Brian OKKEN: Hey. (Laughing)

KENNEDY: Hey, man. Great to chat with you. I’m really loving this show we’re putting on, this is so much fun.

But before I get to the headline actually I have some good news about the show.

OKKEN: What’s that Michael?

KENNEDY: Python Bytes is now brought to you by Rollbar. They take the pain out of errors and we’ll hear from them later. Thank you, Rollbar, for sponsoring the show. That’s really awesome. We appreciate it.

OKKEN: That’s really awesome. Thank you, Rollbar.

KENNEDY: Definitely. Alright, let’s get to the headlines.

Every week there’s so much cool stuff happening, right?

OKKEN: Yeah, and this last week, I just made a huge list of Python-related articles I read and really enjoyed. It was hard to pick which ones to talk about.

KENNEDY: Yeah, it really was. But you had to pick, right? Because we’re only doing 3 each. We’re trying to keep this short and on topic.

OKKEN: Yeah, and I thought maybe we picked the same three, but we didn’t. So, first up I’ve got an article titled “How to Get Superior Text Processing in Python with Pynini.” I think I’m saying that right. Or maybe it’s ‘nee-nee,’ I’m not sure.

KENNEDY: Pynini?

OKKEN: Like the “Knights of Nee” from Monty Python. This is from the O’Reilly Blog and it was written by Kyle Gorman and Richard Sproat.

KENNEDY: Both of whom are PhDs in linguistics of some variety or other, right?

OKKEN: Yeah, and actually because of that and because of the title, I was afraid this was going to be a hard thing to get into, but it’s a really easy article. My interest in it is mostly because when I read The Pragmatic Programmer a long time ago, it introduced me to the concept of mini-languages to solve simple problems like ‘This is the data I have’ and how to change it into the data I need. Often, I use regular expressions for things like that, but sometimes that’s not enough. The next step always seemed to be to build up a parser or an abstract syntax tree, and that’s way further than I want to go. I think this article provides a nice halfway ground, talking about what they call “finite state transducers.” It sounds like something in the Back to the Future movie.

KENNEDY: ‘Quick, Marty. Connect the finite state transducer!’

OKKEN: (Laughing) Yeah, exactly. But I like working on finite state machines and apparently, this is a finite state machine turned into something else. The article starts with a small introduction to what finite state machines are. They’re really easy to get a grasp of; if you look at the article there are some nice pictures. Then they talk about how that applies to text processing, and then it introduces the concept of a “finite state acceptor,” which just accepts things, and then transducers spit things out, I believe I’m getting this right. It also covers how to translate your thinking from regular expressions to these. And anyway, it goes through some of this and it gives an example using cheese names, and I love the Python tradition of using cheese names in examples. Then it introduces a library called Pynini from Google to help build these things up. At the end, I kinda got a little bit lost, but I wasn’t working through the examples myself. It is something that I think I might spend an afternoon poking through.

KENNEDY: Yeah, it is quite interesting, right? I mean, one of the things that makes me really happy about looking at this is 1) it’s not too complicated and 2) it gets you away from regular expressions, which anything that gets me away from regular expressions puts a smile on my face.

OKKEN: I actually really like regular expressions, although it is true there’s often times text processing where you just, it doesn’t work when you can’t do it line by line. I’m probably not going to use this right away but I appreciate this tool introduction so I can put it in my tool chest and next time I have a problem like this I might look this thing up.

KENNEDY: Right, you guys will be looking at some data and thinking, ‘Oh, you know what we need? A finite state transducer. We’re so gonna solve this.’ (Laughing)

OKKEN: I totally want to do that in a meeting. ‘I think we should put a finite state transducer in here.’ (Laughing)

KENNEDY: Fantastic. It does things like pluralize stuff. You can teach it; for example, you taught it what all the cheeses were and it could like, locate sentences that were talking about cheese and do lots of interesting processing. Not just pattern matching. It’s cool.

OKKEN: It isn’t, but it does start out with a simple example: if I tell it all the cheeses, it can look through the text, find the cheeses, and add a cheese tag around them or something. There are simple problems that it can solve as well.
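The cheese-tagging idea can be sketched in plain Python. This is a toy under stated assumptions, not Pynini's API: the word list, the tag name, and both function names are made up for illustration. The acceptor is a trie (each node is a state, each character a transition), and the tagging pass copies input to output while wrapping the longest accepted match, which is roughly the acceptor versus transducer distinction described above.

```python
# Toy illustration only: plain Python, not Pynini's API.
CHEESES = ["Cheddar", "Brie", "Gouda"]  # hypothetical word list


def build_acceptor(words):
    """Build a trie: each dict is a state, each character key a transition."""
    start = {}
    for word in words:
        state = start
        for ch in word:
            state = state.setdefault(ch, {})
        state[None] = True  # the None key marks an accepting state
    return start


def tag_cheeses(text, acceptor):
    """Copy text to the output, wrapping the longest accepted match in tags."""
    out = []
    i = 0
    while i < len(text):
        state = acceptor
        j = i
        last_accept = None
        # Follow transitions as far as the input allows.
        while j < len(text) and text[j] in state:
            state = state[text[j]]
            j += 1
            if None in state:
                last_accept = j
        if last_accept is not None:
            out.append("<cheese>" + text[i:last_accept] + "</cheese>")
            i = last_accept
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


ACCEPTOR = build_acceptor(CHEESES)
tagged = tag_cheeses("Do you have any Brie or Gouda?", ACCEPTOR)
```

The sketch ignores word boundaries, so it would also tag the start of a word like "Briefly"; a real tool would handle that kind of context.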

KENNEDY: Yeah, it’s great. I have a (UNCLEAR 4:49) toolbox.

Well, you know where I’d like to run it though? I don’t really want to run on Python 3.5. Definitely not on 2.7 anymore.

OKKEN: Ah, yeah. Nice transition there.

KENNEDY: Thank you. I want to run that on Python 3.6. You guys, there is amazing stuff coming in Python 3.6. They just released Beta 4.

OKKEN: Beta 4? Wow.

KENNEDY: Yeah, I don’t know what the exact release schedule of the final release is but I think it’s coming up pretty quickly. These things are looking pretty solid. Let me just give you a sense of some of the PEPS, Python Enhancement Proposals, that are coming.

First up, it seems like no programming language these days is content with its string formatting. And Python is like, ‘Look, we only have like 8 ways to format strings, we need another.’ But this new way is actually pretty good. There’s PEP 498, and what it is about is formatted string literals. So, imagine you’ve got a string, and there’s a variety of ways to do it. Maybe the newer way would be to do something like, I wanna say ‘Hi, my name is’ and you have a variable called ‘name.’ So, you might say, “Hi, my name is {0}”.format(name). Well, this kind of takes that idea and just says, ‘Why do you have to do the .format and arguments and orders and then number them? Why don’t you just put the name of the variable in the string?’ So, now you can say “Hello, my name is {name}”, and whatever the value of name is, if there’s a variable by that name, it will be put into the string as if you did the first thing I said.

OKKEN: That’s really cool.

KENNEDY: Yeah, it’s pretty cool. The only bummer is you have to indicate that it’s this new format string. I think it’s an F prefix on the string. Because otherwise you’d have dictionary stuff mixing in. Just for compatibility it’s got to have an identifier, but other than that it’s a really cool way to format stuff.
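A minimal sketch of the two styles described here, assuming Python 3.6 or later; the value assigned to name is made up for illustration.

```python
name = "Michael"  # hypothetical value for illustration

# Older style: a numbered placeholder filled in by str.format().
old_style = "Hi, my name is {0}".format(name)

# PEP 498 (Python 3.6+): the f prefix lets the variable appear inline.
new_style = f"Hi, my name is {name}"

# Without the f prefix the braces stay literal text, which is why
# the prefix is needed for backward compatibility.
plain = "Hi, my name is {name}"
```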

So, we also have another easy-to-understand but really nice one: PEP 515, which will let you put underscores in numeric literals. So, if you have a number like 100000 and you look at it, ‘Is that a million? Is that one hundred thousand? Is it ten million?’ It’s hard to know where the separators are. Now you can put in underscores, so you can have 1_000_000. Python basically just ignores the underscores, but they allow you to visually see that’s a million and not one hundred thousand.
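A short sketch of PEP 515 in practice, assuming Python 3.6 or later; the variable names are made up for illustration.

```python
# The underscores are purely visual; the values are ordinary numbers.
one_million = 1_000_000
one_hundred_thousand = 100_000
mask = 0b1111_0000  # works in binary, octal, and hex literals too

# PEP 515 also added "_" as a grouping option when formatting numbers.
formatted = format(one_million, "_")
```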

OKKEN: That’s kind of cool.

KENNEDY: Yeah, it’s pretty cool. It’s not going to be world changing.

One thing that is more significant: in Python 3.5 we have type annotations. So, I can have a function, and let’s say the function’s name is create and it has a parameter called entity. Well, if you’re given that function, you could be like, ‘I have no idea what this does. Entity maybe has to do with databases.’ But in Python 3.5 you can say entity: Customer, where Customer is a class of something, and then the tooling and whatnot would know to give you the right autocompletion and various type checking and whatnot for that type. But that only applied to functions; you couldn’t put it on variables, class fields, and other things that are outside of functions. This PEP 526 brings that to variables.
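A minimal sketch of the difference, assuming Python 3.6 or later. The names create, entity, and Customer come from the example above; the fields and the other variables are made up for illustration.

```python
from typing import List, Optional


class Customer:
    # PEP 526: annotations on class-level fields (new in 3.6).
    name: str
    orders: List[int]

    def __init__(self, name: str) -> None:
        self.name = name
        self.orders = []


# Annotations on function parameters and return values predate 3.6.
def create(entity: Customer) -> str:
    return "created " + entity.name


# PEP 526: annotations on plain variables (new in 3.6).
current: Optional[Customer] = None
retries: int = 3
```

The annotations are not enforced at runtime; they are there for editors, linters, and type checkers.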

OKKEN: That seems obvious. I don’t know why they didn’t have that at the start.

KENNEDY: Yeah, it definitely seems obvious. Maybe you don’t care too much but tooling will care. Linters might care. The tools you use could possibly get better because of this.

So, we have PEP 525, asynchronous generators. Async and await are awesome. Yield and yield from are awesome. But they were not friends previously. You couldn’t use them together. You could either have an async coroutine or you could have a generator, but you couldn’t have both. Now you can have both. Similarly, we have asynchronous comprehensions.
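A small sketch of both features together, assuming Python 3.6 or later. The function names are made up, and asyncio.sleep(0) stands in for real asynchronous I/O.

```python
import asyncio


async def ticker(count):
    """An asynchronous generator: it uses both await and yield."""
    for i in range(count):
        await asyncio.sleep(0)  # stand-in for real asynchronous work
        yield i


async def collect():
    # An asynchronous comprehension consuming the async generator.
    return [n * 2 async for n in ticker(3)]


loop = asyncio.new_event_loop()
try:
    result = loop.run_until_complete(collect())
finally:
    loop.close()
```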

You know, actually one of the biggest things is the change to dictionaries coming in Python 3.6.

OKKEN: What? Oh, yeah right.

KENNEDY: Python has had dictionaries since the very beginning. When you add fields or attributes to a class, they’re actually stored in dunder dict (__dict__). Dictionaries are the key word value. All over the place, dictionaries play a super important role.

Well, in this release, from 3.5 to 3.6, dictionaries are now 20% to 25% more memory efficient. They should be a little faster. They are ordered. So, the order that you put things in, they remember that. When you first start working with dictionaries, you’re like, multiple dictionaries can have the fields show up at different times and in different orders. It’s bizarre, it can just keep switching. Now it’s consistent.
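A quick sketch of what insertion ordering looks like. One caveat worth stating: in 3.6 the ordering of regular dictionaries is an implementation detail of CPython, and it became a language guarantee in Python 3.7. The class and variable names below are made up for illustration.

```python
class Show:
    def __init__(self):
        self.title = "Python Bytes"
        self.episode = 3


# Instance attributes live in a dictionary (the instance's __dict__).
attrs = vars(Show())

# Keys come back in the order they were inserted.
menu = {}
menu["zebra"] = 1
menu["apple"] = 2
menu["mango"] = 3
keys = list(menu)


def order_of(**kwargs):
    """Keyword arguments arrive in the order the caller wrote them."""
    return list(kwargs)


kw_order = order_of(z=1, a=2, m=3)
```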

OKKEN: The order dict (UNCLEAR 9:33) part and making them fast so it’s good. It’s all good stuff.

KENNEDY: It’s good stuff. We’ve got a new secrets module coming, so you have cryptographically strong random number generators. On Windows, they’ve now marked the Python executable as long path aware. I don’t know if you guys are aware, but Windows has had a crummy 260-character path limit for a long time. There’s a lot of legacy stuff that applies to newer code. So, now the long path limitations are gone on Windows. If you write code that has to run on Windows, then you’ll care a lot about that.
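For the secrets module mentioned at the start of this turn, a minimal sketch assuming Python 3.6 or later; the variable names and the list of choices are made up for illustration.

```python
import secrets

token = secrets.token_hex(16)          # 16 random bytes as 32 hex characters
url_token = secrets.token_urlsafe(16)  # URL-safe text, e.g. for reset links
pin = secrets.randbelow(10000)         # integer in the range 0..9999
pick = secrets.choice(["red", "green", "blue"])

# Constant-time comparison, intended to reduce timing attacks.
same = secrets.compare_digest(token, token)
```

The random module is not meant for security purposes; secrets draws from the operating system's strongest available source of randomness.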

OKKEN: Yeah. Actually, I try to write code that I don’t have to care about Windows.

KENNEDY: (Laughing) Yes, exactly.

If you look through the link, the release notes, the documentation there, there’s a lot more stuff. I’m scratching the surface on all the cool stuff coming in 3.6. If people needed one more reason to upgrade from 2 to 3, well, here’s like 20 or 30 more reasons.

OKKEN: Yeah, and I like the PEPs that you talked about. You left the links to those particular PEPs in the show notes, so that’s great.

KENNEDY: Yeah, definitely check them out.

OKKEN: Okay.

KENNEDY: Hey, Brian.

OKKEN: Hey.

KENNEDY: Before we get to the next headline, let me tell you about Rollbar.

OKKEN: Well, tell me about Rollbar.

KENNEDY: Alright, man.

So, Rollbar is sponsoring Python Bytes and we’re really excited about it. I use Rollbar for my website. Rollbar is a package you install on your Python web app, Pyramid, Flask, Django, whatever. Anytime there’s an error, you get an immediate notification and detailed error reports. Even though the Talk Python websites handle almost 2 million dynamic HTTP requests per month and they transfer 4 to 5 terabytes of data, I’m not worried about them.

OKKEN: Really?

KENNEDY: No, not really. Because if something goes wrong, my Slack, my email, my phone, everything will be blowing up with notifications. I don’t get many, but when they come in all the info is there. I often don’t have to debug it, I just go fix the problem. It’s pretty awesome, right?

OKKEN: That’s very awesome.

KENNEDY: I love Rollbar. So, the Python Bytes listeners, all you guys out there, you can have the same peace of mind. Just visit Rollbar.com/pythonbytes and sign up for the free tier. You get like 300,000 errors tracked, although I really hope you don’t have that many.

OKKEN: (Laughing) Yeah, definitely. You should do more testing if you do.

KENNEDY: You definitely should.

OKKEN: Well, next up for me… I really like efficiency, developer efficiency. It’s always kind of been important to me. And part of that is being able to set your personal development environment to help you succeed. I know it’s different for everybody, and I’m always tweaking mine.

I like articles when people write about them. There’s an article by Dougal Matthews called “Create an Excellent Python Dev Environment.” He goes through installing Python, but everybody knows how to do that. But he’s got some goals, like don’t muck with the system Python and make virtual environments easy to set up. One of the things I like about this article is that he starts right off the bat by stating his goals and why he’s chosen things, and that’s kind of neat.

KENNEDY: Yeah, that’s cool. Because you might look at it and go, ‘I would never do that’ but maybe your goals are different than his. You can really understand where he’s coming from because of that; that’s cool.

OKKEN: Yeah. Actually, the first part confused me because he was covering a project that I’d never heard of called pyenv. I’m more used to venv, which just comes with Python 3.6, or virtualenv. So, this is something different. I’ll just have to check it out. It’s something like virtual environment. It’s independent of Python. Sets them up separate, I guess?

KENNEDY: Yeah, but I haven’t used it that much either. I might be a little off here, but I think it’s for managing and creating and activating different virtual environments. It’s like your virtual environment manager. I think.

OKKEN: Oh, okay.

Next is pyenv-virtualenv, which is apparently similar to virtualenvwrapper, which a lot of people use. I don’t use either of those, and I still don’t get what they’re for. Straight virtual environments work fine. I don’t understand what more they need.

KENNEDY: You have different goals.

OKKEN: Yeah, that’s probably it.

The next thing he talks about is a thing called pipsi, which I hadn’t heard of and I just used it today. It’s a tool that will let you install Python-based command line utilities in their own virtual environments, so that you can just set your path to them. You can use them on a command line from anywhere. That’s really cool, just to have these little virtual environments for a command line utility.

KENNEDY: That’s really cool. I’d never really thought of doing it that way. That’s great.

OKKEN: When I was talking in one of the interviews I’ve done, I heard about more people using Atom than before. I tried to set up Atom and I needed to give it a path to flake8, and I’m like, ‘Oh, I’ll use pipsi to set this up so it can see flake8.’

But he does talk about, with pipsi, he usually includes tox for testing multiple environments which I highly recommend; mkdocs, which I think is a static site generator communication thingy. I’ve never used it; git-review, which I would totally use, it’s a gerrit integration for git but I don’t use git right now at work. I don’t actually do code reviews on my own code, in my free time. And then flake8, of course, which is my favorite recommended way to static analyze stuff.

As a side note, one of the things he was talking about for this pyenv is that it’s compatible with fish, which is a different shell that I’ve never heard of. I use bash. I tried Z shell for a while. I always think it’s fun to try new shells. Maybe I’ll give it a shot.

KENNEDY: Yeah, it’s definitely worth checking out. Fish sounds cool. I use Z shell, as well. I just actually formatted my computer and I’m like, ‘I’m going to try a different shell this time when I set everything up.’ I’ll check out fish.

I so love to see the way people put these environments together, and it could really help people who are new. It could help you discover things you don’t know about, like pipsi; that’s really great.

I was just stuck in traffic this morning and was thinking, for some reason, about this stuff. Other people, I don’t know what they daydream about, but this is what I was daydreaming about. I might try switching to conda instead of pip and try using conda for managing all these environments. I really like the way conda vets the packages and pre-compiles them for you. I was reading about something and it was like, ‘as long as you have this compiler setup it will probably be fine but it might not work so well.’ I’m like, ‘You know what, maybe I should use conda and all those problems are solved by the Continuum guys.’

I’m also doing a lot of virtual environments with Python 3 venv, and the one thing I like is that I’ve stopped using the main Python REPL and I’m using ptpython. Do you know ptpython?

OKKEN: ptpython? No, I don’t.

KENNEDY: ptpython is basically the regular REPL, but it has a lot better editing and history and autocompletion and all sorts of stuff you can do. For example, if you type a function in the regular REPL, you know it’s like three lines with indentation or whatever. If you want to edit it, you’ve got to remember to always replay them in the right order, up arrow 4 times, enter, up arrow 3 times, make the change… that kind of stuff. This will, if you hit up arrow for code suites, it will bring the whole thing back and you can edit it. Stuff like that.

OKKEN: Oh, okay. Do you work in the REPL very much?

KENNEDY: Not too much. Usually, I’m in some kind of editor. When I am there, I’ve been moving to ptpython; it’s pretty cool.

OKKEN: I think I’ll check it out.

KENNEDY: Yeah, check it out.

OKKEN: So, what’ve you got next?

KENNEDY: A lot of people have been checking out Python lately. There was an interesting article. It was an article that was posted as a GitHub repository, which is accompanied by a Go program that generated a bunch of statistics and graphs. The thing I’m actually linking to is a Reddit article with 350 votes and tons of comments, so you can check out the conversation.

But the headline is awesome. GitHub language statistics: Python is the 2nd Most Popular Language on GitHub for Active Repositories.

OKKEN: That’s awesome.

KENNEDY: Yeah. Python’s growth is just continuing and expanding. It’s such a cool time to be involved in the language, and here’s just one more stat to back that up.

OKKEN: Yeah. Looks like for the graphs you have to click through from Reddit to the GitHub page.

KENNEDY: Yeah, I’ve been linking to both.

They say, ‘We’re going to study active repositories.’ Because it doesn’t make sense to think about GitHub statistics if something was put there 5 years ago and hasn’t been touched. So, they said, ‘Let’s consider the set of active repositories.’ Those are created or updated during the period, and the period is a bunch of dates throughout the last three years, you’ll see when you get there. It has to have at least one star, so somebody has to care about it. Somebody has to care enough to have forked it, and it must be more than 10k. It can’t just be like, ‘Well, I put up this one file and somebody forked it,’ you know, something silly like that. It’s got to be like a real project. So, if you take all of those together, JavaScript turns out to be the most popular. But JavaScript, I’m sure, is highly overcounted. I’m pretty sure that’s generally true on GitHub statistics; almost every one of my Python projects has some JavaScript. You know what I mean?
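The four criteria can be sketched as a filter. Everything here is an assumption for illustration: the records, the field names, and the reading of "10k" as 10 KB are invented, and this is not the Go program from the linked repository.

```python
from datetime import date

# Hypothetical records; field names and values are invented for this sketch.
REPOS = [
    {"name": "web-app", "last_activity": date(2016, 11, 3),
     "stars": 12, "forks": 4, "size_kb": 830},
    {"name": "old-script", "last_activity": date(2011, 5, 20),
     "stars": 7, "forks": 2, "size_kb": 95},
    {"name": "one-file", "last_activity": date(2016, 11, 10),
     "stars": 1, "forks": 1, "size_kb": 2},
    {"name": "unnoticed", "last_activity": date(2016, 11, 15),
     "stars": 0, "forks": 0, "size_kb": 400},
]


def is_active(repo, period_start, period_end):
    """Activity in the period, at least one star, one fork, and over 10 KB."""
    return (
        period_start <= repo["last_activity"] <= period_end
        and repo["stars"] >= 1
        and repo["forks"] >= 1
        and repo["size_kb"] > 10  # assumes "10k" means 10 KB
    )


active = [
    repo["name"]
    for repo in REPOS
    if is_active(repo, date(2016, 11, 1), date(2016, 11, 30))
]
```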

OKKEN: Yeah, since it’s the front end, almost every project has it.

KENNEDY: Almost any web project in any language is going to be counted as JavaScript, to some degree, in the language category. So, I feel like JavaScript, while it is extremely popular, I think it’s massively overcounted in these types of stats. It seems to appear that way in the graph. But Python displaced Java, which we can all cheer for that, right? That’s awesome. But let me give you some stats that are pretty interesting; I think this might blow your mind.

So, back in November 2014, there were 1,790 active repositories by the 4 criteria above that were Python. In November 2015, there were 2,500, and if you look throughout the year 2015 and early 2016, it would vary, but it’s 2,000 to 2,500 or 2,600. By November 2016, there are 10,944 active repositories. It’s growing like crazy.

OKKEN: Yeah, I love seeing that. One of the things I think has changed in that timeframe, and I don’t know if it can be attributed to this, is that the cookiecutter project has made it a lot easier for people to set up something to share on GitHub.

KENNEDY: Yeah, cookiecutter is awesome. If you guys don’t know about cookiecutter, check it out. It’s a cool project. I’m sure it’s not hurting.

OKKEN: Anyway, that’s good. Also, it makes me think that I spent the right time learning a new language. I started from C++.

KENNEDY: Absolutely. It’s good to be in a place that’s growing. I mean, obviously take these stats with a grain of salt. They’re sliced by all these various criteria and so on, so they might not be perfect, but they give a glimpse into a world that is doing very well and is very fun to be part of. So, I’m happy to see it.

OKKEN: Okay, we’re up to my number 3 pick, which is “Handling Unicode Strings in Python.” It is from Juan Lee Song. I’ll link it, of course, in the show notes. I like it because it’s a nice article to introduce people to the concepts of text representation in Python. It hurts my head sometimes because I came up in the Python 2.7 world, and I think for anybody who learned with 2.7 and had to switch to a more recent 3.x version, sometimes it’s hard to understand how the switch happens. There’s a nice table for some of the things that are different. Then it goes through specifics of dealing with Unicode at things like I/O boundaries to databases or services. And then a few other examples deal with logging, JSON encoding, and Redis. There’s a nice pointer to a 2012 article from Ned Batchelder called “Pragmatic Unicode,” which I hadn’t read before. I learned a lot about Unicode in this last week.

KENNEDY: That’s great. If you have to juggle that stuff, it can be tricky. I remember talking to some of the Web Foundation guys. I think it was Kenneth Reitz and Requests, but it could have been Armin Ronacher and Flask. I’m not sure, I think it was Kenneth. One of the most challenging things in upgrading their projects from 2 to 3, or supporting 3, is the dramatically different way that bytes go to strings, and all this kind of stuff. So, if you’ve got to upgrade something from 2 to support 3, be really careful with the strings.
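A minimal Python 3 sketch of the bytes versus strings split discussed here. The sample text and the helper names read_message and write_message are made up; the decode-on-input, encode-on-output pattern is the usual advice for I/O boundaries and is only an illustration, not code from the article.

```python
text = "café"                # str: four Unicode code points
data = text.encode("utf-8")  # bytes: five bytes, since é takes two in UTF-8

# Python 3 refuses to mix the two types implicitly.
try:
    combined = data + text
except TypeError:
    combined = None


def read_message(raw: bytes) -> str:
    """Decode as soon as bytes come in from a socket, file, or database."""
    return raw.decode("utf-8")


def write_message(message: str) -> bytes:
    """Encode only at the last moment, when text leaves the program."""
    return message.encode("utf-8")


round_trip = read_message(write_message(text.upper()))
```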

OKKEN: Yeah. This is another one of those where I’m going to bookmark it, and the next time I come up against an issue I’ll look through this and see if it can help me figure out how to fix it.

KENNEDY: Yeah, awesome.

OKKEN: What have you got as your last one, Michael?

KENNEDY: My last one, I think this one may be a little surprising for you.

So, I ask many people what their favorite editor is over at Talk Python To Me. I’ve heard many answers. It would be fun to go back and actually do some data analysis and graphs. I would say Sublime Text is extremely popular. Emacs, very popular. Vim, it might be slightly beating Emacs. PyCharm is pretty popular. People hear me go on and on about PyCharm, I love PyCharm.

OKKEN: Really? I didn’t know that.

KENNEDY: Most people don’t know this about me but I actually do use it. I do love it a lot.

But one thing I came across was, I was fooling around with Visual Studio Code. This is not Visual Studio from Windows, the big .NET thing. This is a lightweight editor. It’s based on, I think, the same underlying stuff as Atom. It’s using Chrome as its underlying engine, based on some client-side HTML stuff. It’s a free editor. It’s growing quite a bit in terms of the extensions you can get for it. It’s got cool git integration and now it’s getting better Python integration.

So, my pick is this thing called Python extension for Visual Studio Code, it runs on Mac and it runs on Windows and Linux and so on. I have a link to the one there. What’s really interesting is, the bunch of features. If I had to ask you, Brian, how many times do you think this thing has been downloaded?

OKKEN: Uh, 23.

KENNEDY: (Laughing) I would guess, thousands, 20-30,000 times. How many people are installing full Python support. This is not just standard completion. If you were writing a package, it understands your package and it will tell you exactly all the names of all the elements symbolically in your package. It will do AST-based refactoring, really cool stuff. Anyway, 850,000 installs.

OKKEN: That’s actually incredible. Now I want to go try it.

KENNEDY: When you think of Python editors, almost a million people installed this Python support for their editor. Now, of course not every one of them is doing Python all the time and there’s not a one-to-one mapping between developers and editors, but this is way more popular than I realized. So, I thought, ‘Why don’t I play with it?’ I played with it this week and I’m pretty impressed for lightweight stuff. I think this is my new editor.

OKKEN: I actually just installed Atom this morning. Maybe I’ll try this one out as well.

KENNEDY: Yeah, it’s really similar to Atom so it’s worth giving it a try. It has linting and a bunch of linters you can configure, including flake8. It has full-on autocompletion as well as support for PEP 484, I think. It has Jupyter support, indenting, code formatting, refactoring, code navigation, debugging. You can remotely debug over SSH to a Flask app. That’s pretty serious for a plug-in. That’s awesome. Unittest, all the various tests you were talking about. Lots of good stuff. This is one of the unsung heroes of the editors, so I thought I’d highlight it.

OKKEN: Cool. And maybe next time you ask somebody they’ll say, ‘I use Code.’

KENNEDY: ‘Oh yeah, I use Visual Studio Code all the time.’

Anyway, it’s pretty cool. I’m excited to try it out.

OKKEN: Okay, I think I’ll try it out, too.

And that brings us to the end. But we do like to catch up with what both of us are up to. Michael, what are you up to lately?

KENNEDY: Well, I’ve been continuing to crank out the Talk Python To Me episodes. The last one I just released was “Parsing Horrible Things with Python,” which I wanted to make sure I pointed out here because I think it’s a really interesting tie-in to your first pick of the week. It’s another way to parse horrible things which I think is really powerful and interesting. I released that with Erik Rose and that was cool.

OKKEN: That was a really interesting episode. I really enjoyed it. Erik is, obviously, a brilliant guy. That was good.

KENNEDY: He was a great guest and I definitely enjoyed talking to him.

You last week had asked me if Awesome Python, the GitHub thing was the same as Awesome Python, the newsletter, and I said, I think so. Now I think I might be wrong. The Awesome Python newsletter comes from python.libhunt.com, which is a really cool place for Python news. I think one or two of the news items throughout the podcast, not this one but over all of them, I’ve gotten from that place.

That’s my picks, what’s up with me.

OKKEN: Python is so awesome we like to slap it on everything.

KENNEDY: It is awesome.

How about you? What are you up to?

OKKEN: I’ve been doing a lot of writing. I’ve got an editor deadline in about a week, exactly a week from today. I’m trying to put on full steam, and it’s the pytest book. I’m trying to get it ready so that it’s printable and I can take a stack of copies with me to PyCon in May 2017. That’s what I’ve been working on.

The latest episode of Test and Code is recorded and partly edited. This is number 25, with Dave Hunt, about Selenium, and there’s some really good content mixed in with some really stupid questions that I’ve asked. I need to go and cut out the stupid questions, then it will be good to go. Hopefully, we’ll get that out in the next couple of days.

I’m really excited that I have an interview scheduled tomorrow, hopefully it goes through, with David Hussman. He runs a place called DevJam Studios and he talks about Agile and he goes around and speaks to people. He’s a really good speaker so I’m looking forward to talking to him.

KENNEDY: That will be awesome. A slightly above-the-code talk about testing and whatnot.

OKKEN: Yeah, I don’t even know if he’ll get into testing. One of the things he likes to talk about lately is story mapping, which is sort of high level planning for an entire project above the writing stories level, and that’s pretty neat.

KENNEDY: Yeah, it definitely precedes code.

OKKEN: Yeah. Well, anyway, I really enjoyed talking to you this week. I look forward to seeing what people put out in the upcoming week.

KENNEDY: Thanks, Brian. I enjoyed it as well and I learned about some cool new things. I’m definitely checking out pipsi.

Thank you everyone for listening and we will see you next week.

Bye, Brian.
