Brought to you by Michael and Brian - take a Talk Python course or get Brian's pytest book


Transcript #57: Our take on Excel and Python

Return to episode page view on github
Recorded on Tuesday, Dec 19, 2017.

00:00 Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds.

00:05 This is episode 57, recorded December 19th, 2017.

00:09 I'm Michael Kennedy.

00:10 And I'm Brian Okken.

00:11 And we have a bunch of really cool stuff to share with you.

00:14 As always, we've been looking through all of the news sources and finding all the Python goodies for you.

00:20 Brian, what's the first goodie that you found?

00:22 Anthony Shaw, or I know him as Tony, has at work on GitHub.

00:27 I like his handle, Tony Maloney.

00:30 But he has been working on a tool called Retox.

00:33 And this is a tool you can really run anything with it.

00:38 But the original intent is to run your tests.

00:41 What Tox does is a tool that will run your setup if you have packages.

00:48 And you can turn that off if you don't want the setup to get tested.

00:52 But it'll test your setup and then run it in different environments.

00:55 And the typical example is running multiple Python versions.

00:59 Running all your tests in multiple Python or multiple library versions to make sure all your tests pass.

01:05 And there is a Detox version that can distribute that across processors.

01:11 Can speed it up two to three times or two to four times faster or more if you run it on distributed processors.

01:18 Or if you spent like, what, $15,000 on a new iMac Pro and got 18 cores?

01:23 Yeah, yeah.

01:24 You could be 18 times faster or 16 or something.

01:27 Anthony put together Retox, which does all this, but also does it with a nice GUI so you can watch your tests run, which is really kind of cool.

01:38 And he did it for Python 2 at first and then very quickly within a week, ported it to Python 3 with some problems along the way.

01:48 But he worked through them, which is nice.

01:50 Yeah, well done.

01:50 It also has the cool capability of watching a directory.

01:53 So you can have this GUI sitting up on a monitor somewhere if you've got a couple monitors or in the corner of your window.

02:01 And then it'll watch, you can have it watch your source code and when you, as you, as you save changes, it tests everything on multiple versions of Python or multiple hardware or whatever you want to do.

02:12 Yeah, that's really, really cool.

02:13 Very cool.

02:13 You know, I'll just describe kind of what it looks like.

02:15 So you run it in terminal or command prompt and you get like an ANSI art type of interactive thing.

02:24 And it shows you different columns for the different tests that are all running like Python 2.7, Python 3.6, Lint, PyLint, all these things running in parallel and their status.

02:33 It's pretty cool, right?

02:34 Yeah.

02:34 And one of the things I was trying to do this morning, getting a chance to quite get it done with meetings and all.

02:41 But instead of running Python 2 versus 3 and stuff, what I'm going to use it for is running the same set of tests with the same Python, but against different instruments.

02:51 So I can run across multiple hardware.

02:54 That's cool.

02:55 So you're parallelizing the same code running, but against, you're parallelizing the hardware, not the versions of Python.

03:02 Yeah.

03:03 Yeah, that's pretty awesome.

03:04 That'll be good.

03:04 Yeah, that'll be good.

03:05 So I thought this episode, I'd talk about some of the tools that I've been using for a little while that I really enjoy, but just haven't thought to bring up on the show.

03:14 So the first one I want to talk about is a database management tool for MongoDB called, it used to be called Robo Mongo.

03:20 It got bought by a company, which is an interesting story in and of itself, by a company called 3T.

03:25 And so now it's Robo 3T, but in my heart, it will always be Robo Mongo.

03:29 Anyway, what this thing is, is a shell for MongoDB that you type.

03:34 So a command line type of interaction with it, but all the results appear as a GUI result.

03:40 So you have this kind of part where you type in and you run commands against Mongo, but all the results show up in maybe a tree view, or you can right click and interact with them once the results come back and stuff.

03:51 So it's really, really nice if you're doing anything with MongoDB, runs on all the platforms, check it out, Robo 3T.

03:57 Nice way to visualize your database.

03:59 That's cool.

04:00 Yeah, it's one of the best, like simple visual interactions with databases because the majority of what you do is still the CLI.

04:07 It's still the straight interface for working with it.

04:09 But all the results that come out of that interaction are GUI based, which I think is cool.

04:14 It's also interesting in an open source way.

04:16 On one hand, it's number 34 most popular repository on GitHub, which is cool.

04:22 It's built with QT, which makes it look like a really nice cross-platform app, which you could build out with PySide or PyQt, I think, and get the same looking app.

04:31 So that's also a really interesting example.

04:32 And it was an open source project that was started somewhere in Eastern Europe.

04:37 I don't remember where.

04:38 It became really successful.

04:39 And this other company bought it.

04:41 And so here's also an interesting business model around open source stuff.

04:46 Yeah.

04:46 Cool.

04:47 So anyway, if you're doing anything with MongoDB, check out Robo Mongo.

04:50 Cool.

04:51 Sometimes, we talked about in the new version of Django, you don't have to do regular expressions anymore for...

04:58 For the routes, right?

04:59 For the route definitions, which I think is a positive thing for sure.

05:01 Yeah, definitely.

05:02 But there's definitely some times where I've been using regular expressions for as long as I've been programming, almost.

05:08 So programming's always been hard.

05:09 Yeah.

05:11 Yeah.

05:11 Yeah.

05:11 Shortly into my CS program, I got thrown a Perl book and said, learn this.

05:18 So yeah.

05:19 Anyway, there's a couple articles that came out recently that I thought were really good for people that need to get a handle on regular expressions quickly.

05:27 And one of them is regular expressions practical guide.

05:31 And it's a kind of a nice article that talks through using the RE package to do things like parse email addresses and phone numbers and URLs.

05:41 And those are good examples.

05:44 Even if you don't have to do that, everybody knows what those look like.

05:47 So it's good to sort of learn some of the regular expressions with that.

05:51 Yeah.

05:51 I really like the example driven approach as well.

05:54 Like, here's how you match a bunch of different things you might want to.

05:57 And also the fact that it's using the Python libraries and not just here's a random regular expression means it's really quick to just drop in directly, which is cool.

06:06 And then there's another article called Regular Expressions for Data Scientists that do some of this.

06:11 But then also it's mostly focused around parsing text, of course, with regular expressions.

06:18 But it's also a good intro and it dives a little bit deeper into find all and search.

06:26 And so I think check both those out if you want to beef up on regular expressions.

06:30 Yeah, that's really cool.

06:31 Nice.

06:31 I like the motivational introduction for the data science one.

06:36 Sometimes, you know, this might include searching massive corpus of text.

06:40 For example, suppose you're asked to figure out who's been emailing whom in a scandal of the Panama Papers.

06:46 That's 11.5 million documents.

06:49 We need regular expressions.

06:50 Let's go.

06:50 Yeah, definitely.

06:51 Yeah, very exciting.

06:53 Very exciting.

06:53 Cool.

06:54 So before we go on, let me just tell you about the sponsor for this week's episode.

06:58 That is DigitalOcean.

07:00 Thank you, DigitalOcean, for sponsoring the show as they are, I would say, the major sponsor for the show.

07:06 They definitely sponsor it more than anyone else.

07:07 And with good reason, they have really, really cool things going on over there.

07:12 You know, there's a lot of places you can get your cloud computing resources, but a lot of them are overly complicated, overly expensive, and so on.

07:19 With DigitalOcean, go over there, quickly set up a server, set up some storage, link them together.

07:25 It's really wonderful.

07:26 So this site's website's running on DigitalOcean, among a bunch of other things that I'm running as well.

07:32 So definitely a great place to check out.

07:35 And you can get started by going to pythonbytes.fm/DigitalOcean.

07:40 So keeping with my theme of things that I want to talk about that I've been using and really enjoying but haven't bothered to bring up on the show,

07:47 to complement the other side of the story with the MongoDB thing is MongoEngine.

07:53 So you've heard of SQLAlchemy, right, Brian?

07:55 Yeah, definitely.

07:56 Yeah, so that's probably the most popular way in Python to talk to relational databases.

08:01 Well, the MongoDB equivalent of SQLAlchemy is this thing called MongoEngine.

08:05 There's a handful of these so-called document, object document mappers, because you don't have a relational.

08:12 So it's not an ORM, it's an ODM.

08:15 And anyway, there's a handful of these for MongoDB, but this one really is quite needed.

08:20 Let's you take a class, derive from a base class, and map it into MongoDB.

08:24 And the way you work with it is very similar to Django's ORM, actually.

08:28 You map a class, you create an instance of it, you call save, you can do queries on it, all sorts of stuff.

08:32 But leveraging the hierarchical document TV nature of MongoDB, it also adds new features.

08:38 Like, MongoDB doesn't really have schemas in general, like, they're starting to add some of these features.

08:44 But there's no schema definition, it's just whatever you put in there.

08:47 With MongoEngine, you have a schema that matches your classes, so you have some reliable data structure.

08:53 MongoDB has no concept of required fields or constraints, like this number must be between 10 and 1,000.

08:59 But MongoEngine does, relationships, all sorts of cool stuff.

09:03 So it really provides a lot of nice structure around working with a NoSQL schema-less database.

09:08 Oh, that's nice.

09:09 Yeah, definitely check that out.

09:10 Yeah, so I've found this to be super helpful.

09:12 Like, actually, a lot of my sites are implemented, including Python Bytes, for example.

09:15 The one thing that's a little bit of a pain is the deserialization speed is not awesome.

09:23 So if you retrieve 100,000 records from MongoDB, that's really fast, probably.

09:28 But then it'll drag on, turning those into MongoEngine documents.

09:31 But if you're getting 50 or 100, it's totally fine.

09:34 For whatever reason, it's just not that fast deserialized and stuff.

09:37 And usually it's plenty fast.

09:38 But if you do it 100,000 times, it turns out to start to show off.

09:42 Okay.

09:42 Yeah.

09:42 All right.

09:43 What you got first, next?

09:43 Another Pretty Printer.

09:45 So this is, I guess it's called Pretty Printer.

09:48 Yeah.

09:48 An article called Introducing Pretty Printer for Python.

09:52 I think it's an extensible one.

09:53 It's kind of nice.

09:54 So somebody took, it's 3.6, or three, six and above only.

09:58 But somebody was taking a look at all the different ways to Pretty Printer their output for debugging your code.

10:04 And it wasn't really happy with them.

10:06 So added yet another.

10:08 But this is kind of extensible and cool.

10:10 So you can add on, if you have a particular way you want to print your classes or your calls to different functions, you can customize that.

10:20 And it's a pretty simple interface.

10:22 And I like it.

10:23 Yeah.

10:23 It's really nice.

10:23 So if you're in, say, the Python REPL, you can ask it to print out some stuff.

10:28 And it'll actually do things like format the dictionaries according to PEP 8 with the line breaks and indentation.

10:35 It'll colorize all the values.

10:38 Like the keys will be one color.

10:39 The values will be another.

10:42 Things like that.

10:43 So it really does.

10:44 It is really nice like that.

10:45 The colorizing is very nice.

10:47 That's pretty cool.

10:48 Yeah.

10:48 I would say that.

10:48 The colorizing and also the reformatting for human readable parts is quite nice as well.

10:56 So it's very nice and has a nice declarative API as well.

10:59 So you can even extend it, right?

11:01 There's an example in there to be able to extend your call with comments even.

11:08 So let me ask you a question.

11:11 What do you think the most popular database in the world is?

11:14 MySQL.

11:14 MySQL is definitely a good one.

11:16 I'm going to say Excel.

11:17 At least it turns them out of data.

11:20 The number of people that like create new quote databases.

11:23 Like Excel is this weird thing that people who don't quite know programming or technology use it to more or less play the same role, right?

11:32 You've got relationships.

11:33 You've got formulas and whatnot.

11:35 So Excel is really super important in the business world all over the place.

11:40 Well, it so happens that Microsoft is considering replacing the macro scripting language for Excel that is built into it, VBA, with Python.

11:51 Yes.

11:52 That would be very nice.

11:53 So on one hand, I'm like, whee, I don't really want to program Excel, right?

11:56 It doesn't make me super excited or super happy.

11:59 On the other, look at what the adoption of Python in the data science space has done for Python in general.

12:06 It's really, really complemented it and made it a better place.

12:10 I think that's what would happen if Python became the main scripting language of Excel as well.

12:14 There's all these people who are not really working in this space now.

12:18 All of a sudden, they'd be super interested and contributing things that who knows what they would actually come up with.

12:22 I'm imagining this is actually pretty cool.

12:25 I'm imagining a manager working with a spreadsheet and wanting to add some macro capabilities and getting a little bit too deep into it and getting a little lost, grabbing a developer and saying, hey, can you help me with this?

12:40 And right now, if it was me, I'd say, no, you're on your own.

12:45 I don't want to do that.

12:46 I would hide under a desk like, no, Michael's busy.

12:48 Michael can't talk right now.

12:49 He's looking for someone to work on PVA.

12:51 Stay away.

12:52 But if it's a Python interface there, yeah, I'd help out.

12:55 So I think the interaction between developers and managers would increase dramatically if Python was in Excel.

13:01 Yeah, I think it would be super cool.

13:03 Who knows what the chances of this actually happening are.

13:05 I tried to get Python into Windows because, you know, it doesn't ship with Windows.

13:12 So Microsoft has this place called User Voice where you can like, but maybe it's not even a Microsoft thing, but they use User Voice, which lets you vote on features and requests and things like that.

13:22 So I actually got over a thousand people to vote for shipping Python 3 with Windows 10 before it actually came out, but no dice.

13:29 However, they did put Python into SQL Server so you can do in-process machine learning against your data.

13:36 So there's one example of them doing this, and if they'll do it for SQL Server, they may well do it for Excel, which would be awesome.

13:42 So everyone listening can have a voice on this.

13:45 If you go there, click the link, upvote the item, and there's a survey you can fill out and tell them, Python 3, please, we'll take more of that.

13:54 That'd be awesome.

13:55 Yeah.

13:55 So if this sounds cool to you, be sure to go there, at least upvote it so that they know this is something we all want.

14:01 And call your congressman.

14:03 That's right.

14:03 No, that was net neutrality, which didn't really help so much.

14:07 But this might, this might, so that's awesome.

14:09 All right, well, that's our items for this week.

14:12 How about you?

14:13 Any personal news?

14:14 We've been getting over, still getting over being sick.

14:16 So we have a tree up now, but we haven't decorated it.

14:20 So that's our decorating for Christmas.

14:22 That's awesome.

14:22 That'll be a fun, fun thing.

14:24 Kids going to be home from school pretty soon?

14:26 They're all home now.

14:27 So that's fun.

14:28 So work is super productive these days.

14:30 Yeah, I can actually get to work at a decent time.

14:33 So that's good.

14:34 Cool.

14:35 All right.

14:36 Well, for me, I have a webcast that I'm doing.

14:39 And I guess one more MongoDB thing.

14:41 So I'm doing a webcast that I call Let's Build Something in MongoDB and Python.

14:46 I don't have to click the link.

14:48 I don't remember.

14:48 I think it's February.

14:49 It is February 22nd.

14:52 So there's a link at the bottom of the show notes that if you're interested, you can come attend the webcast.

14:56 It's just free.

14:58 It might be fun.

14:58 I'll definitely go.

14:59 So do you have like an intro to Mongo also?

15:03 I do have an intro to Mongo.

15:04 A free one, actually.

15:05 It's at freemongodbcourse.com.

15:07 Okay.

15:08 I just thought I brought that.

15:09 Yeah.

15:09 And it uses Mongo Engine and it uses RoboMongo.

15:13 How about that?

15:14 Nice.

15:14 But it doesn't use Excel.

15:15 Good.

15:16 All right.

15:18 Well, Brian, thanks so much for finding all these things and sharing with everyone.

15:21 Yeah, thanks.

15:22 Thank you for listening to Python Bytes.

15:25 Follow the show on Twitter via at Python Bytes.

15:28 That's Python Bytes as in B-Y-T-E-S.

15:31 And get the full show notes at pythonbytes.fm.

15:34 If you have a news item you want featured, just visit pythonbytes.fm and send it our way.

15:38 We're always on the lookout for sharing something cool.

15:42 On behalf of myself and Brian Okken, this is Michael Kennedy.

15:44 Thank you for listening and sharing this podcast with your friends and colleagues.

Back to show page