Acceleration in Python: Which Is Right for Your Project?

About

In this session, Dale Tovar, data scientist, will highlight three main ways that open source developers have approached writing performant Python libraries:

- Using multiple languages in the same module
- Writing the codebase in a compiled language and writing bindings to other languages
- Using a Python-accelerating library

Learn techniques for accelerating your own Python systems.

Dale Tovar
Data Scientist at OpenTeams

Transcript

Brian: [00:00:00] Welcome to this third session of the day. Again, this is the Tech Shares event, Top Big Data Opportunities for Telecom Companies. Our third session for the day is going to be a presentation by Dale Tovar, titled Acceleration in Python: Which Is Right for Your Project. Dale is a data scientist at OpenTeams Global, working on the engineering team there. Dale, welcome. Thanks for presenting today.

Dale: [00:00:26] Yeah. Thanks for having me.

Brian: [00:00:29] Before you get started, tell us just a bit about your background. What brought you to open teams and Python and this experience with acceleration technologies?

Dale: [00:00:37] Yeah, definitely. I started out as a musician and I was going to grad school for music theory and then jazz performance. And I got pretty interested in the sciences and I got involved with a neuroscience laboratory. And by some fortuitous happenings I ended up working there for a couple of years. And I got exposed to a lot of, well, really the Python PyData ecosystem. And I started to become an open source developer and contribute to some of these different packages. And that eventually led me to working at OpenTeams. But it also led me to be a bit more familiar with some of the workarounds that we have to do to have fast-running, compute-intensive code in Python. And so a lot of what I learned over the last few years working in the neuroscience lab really contributed to my understanding of how people tackle this issue in Python.

Brian: [00:01:41] That makes sense. It sounds great. Well, I think we’re all set. I’d say go for it. The floor is yours.

Dale: [00:01:47] OK, very good. So what I want to start with is we’re going to take a look at some of the motivations for acceleration in Python and why this is something that we need to be concerned with at all. And we’ll take a very brief look at the Python execution stack to see what our strategies can be and how we might tackle this problem. And then I will highlight three approaches for how we can accelerate Python. The first approach will have to do with using multiple languages and having them live together. The second approach will be about separating out compute-intensive code from our bindings in other languages. And the last approach that we’ll take a look at has to do with some of the recent packages and developments in things like Cython, Pythran and Numba for accelerating Python code in a really user-friendly way.

[00:02:47] And just to begin with, I want to take a look at some of the reasons that we write in Python in the first place. The main reason, I take it, is that it’s very simple and easy to read and write in Python. And as a result the development time in Python tends to be faster than it is in other languages. We can write similar code a bit faster. We can test things in a way that’s much faster than in some other languages. So there are a lot of reasons why we might want to be writing in Python. And it’s also very versatile. We can use it for scripting and I can interact with Python. And so I can look at data, I can examine objects. It’s a nice language to be able to work with.

[00:03:34] And so this is kind of what feeds into why we might want to overcome some of its obstacles, because its benefits and pros are so nice. So as a result of some of these pros, Python ends up getting heavily used. It’s heavily used in scientific computing. It’s very large in data science and machine learning, for development of applications and web applications as well. There are Python communities for most everything and Python is very large right now. And so the detriments that I mentioned have to do with it not being the fastest language.

[00:04:13] So for algorithmic code, for things that are a bit compute-intensive with a lot of loops and whatnot, C can be anywhere from 10 to 1,000 or more times faster than Python. It can be quite a bit faster to have code run in a compiled language. And so one would think that this Python runtime is prohibitively slow for a lot of the applications that we might want to use it for, like mathematical or scientific applications. But the main workaround here, and what we want to achieve, is to somehow have the benefits of a compiled language, the fast execution speed, but also have the fast development speed and the nice interactivity and simplicity that come with writing in Python.
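To make that gap concrete, here is a minimal sketch, not from the talk, of how one might measure a pure-Python loop against a compiled NumPy routine; the exact ratio will vary by machine and workload:

```python
import timeit

# Sum of squares of the first 100,000 integers, two ways.
py_stmt = "sum(i * i for i in range(100_000))"
np_stmt = "np.dot(a, a)"
np_setup = "import numpy as np; a = np.arange(100_000, dtype=np.int64)"

py_time = timeit.timeit(py_stmt, number=100)
np_time = timeit.timeit(np_stmt, setup=np_setup, number=100)

print(f"pure Python: {py_time:.3f}s  NumPy (compiled): {np_time:.3f}s")
print(f"speedup: ~{py_time / np_time:.0f}x")
```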

[00:05:06] And so the way we can make this work: if we look at the Python execution stack, at the top we have our Python source code and at the bottom we have our hardware. And generally how this works is that the Python compiler will take our source code, convert it to Python bytecode and feed it to the CPython interpreter. And this is the main bottleneck that we face in Python: the interpreter is very, very slow. It’s what allows us to have nice, fun, interactive sessions in JupyterLab, but it’s also the thing that really slows down the execution speed of the language.
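You can see the bytecode stage directly with the standard library’s dis module; each instruction in its output is dispatched one at a time by the interpreter loop, which is where the overhead comes from. A small sketch:

```python
import dis

def scale(values, factor):
    return [v * factor for v in values]

# Each line of output is one bytecode instruction; the CPython
# interpreter dispatches them one at a time, which is the
# overhead that makes tight numerical loops slow.
dis.dis(scale)
```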

[00:05:51] And so the little trick that we do here, and that everybody does if we want to be able to write fast-executing code, is to write code in a compiled language that extends the CPython virtual machine. And then we’re able to call these functions, these routines that we wrote, directly from the Python source code. And so this is the way that we can get this nice, friendly Python code, but under the hood we’re able to call these faster-running routines, which gives us the benefits of both worlds. And because we have to deal with this kind of obstacle, there are a lot of different strategies for how we might go about implementing this in a real-life repository.
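As a small illustration of the general idea (not how NumPy itself is wired up), the standard library’s ctypes module can call a compiled C routine directly from Python. A minimal sketch, assuming a Unix-like platform where the C math library can be located:

```python
import ctypes
import ctypes.util

# Locate and load the system's compiled C math library
# (assumes a Unix-like platform where libm can be found).
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C signature: double cos(double)
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double

print(libm.cos(0.0))  # 1.0, computed by compiled C code
```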

[00:06:46] And so the first strategy is to write computationally intensive code in C or in C++ directly, and then we can call these algorithms from the outward-facing Python code. And so, for example, we can look at NumPy, SciPy, PyTorch. These are libraries that use this kind of approach. For example, in SciPy’s scipy.sparse module, what you have is these sparse matrix classes that are outward-facing. They’re written in Python, and then all of the indexing, all of the mathematical code is written in C++. And so you have this nice separation of your algorithmic code, and you’re able to see these huge speed benefits and run your sparse matrix algorithms very fast.
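From the user’s side that split is invisible; a short usage sketch:

```python
import numpy as np
from scipy import sparse

# csr_matrix is a Python-facing class, but the product below
# dispatches to compiled routines under the hood.
A = sparse.random(1000, 1000, density=0.01, format="csr")
x = np.ones(1000)
y = A @ x  # fast sparse matrix-vector product
print(y.shape)
```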

[00:07:42] It’s similar with NumPy. We can see that a lot of the NumPy array object is written in C. And for the sorting algorithms, for example, you can just look at the exact C source code to see their implementations. And we can see how, by using both of these languages, we’re able to get real speedups. I can use NumPy, I can use SciPy, I can use PyTorch, and they all run very fast. But I can also have this interactive environment and analyze data. It’s because we’re using multiple languages here.

[00:08:17] And what I take to be the main pros here are that we have really fast code, and when we write in C or C++, we’re able to control our memory management in a nice way. And so this kind of approach definitely has some nice pros to it. What I will say, though, is that contributors will need to know C, C++ or Fortran to be able to make changes to the compute-intensive code. And so there is a bit of a language barrier that may prevent many Python developers from contributing to such a library. And especially since, in the scientific space, so many of our users come from the sciences and may not know C or C++, we might have a divide between the user base and the contributor base.

[00:09:08] And some people who may wish to contribute may have a bit of a difficult time contributing to at least the compute-intensive portions of the code base. Furthermore, libraries following this model cannot be easily ported to other languages. If I wanted to replicate NumPy in a language like R or Julia, I’d have to rewrite it from scratch and have it be separately developed and maintained. It would be somewhat difficult to use any of the existing NumPy code in another context because of the way it’s all tied together and written.

[00:09:47] And lastly, having multiple languages makes building and packaging a bit more complicated and difficult. It’s not really my expertise, but it is my understanding that having multiple languages like this can pose some difficulties. So, looking at a very different approach: we could just separate out the core of the algorithmic code and have it maintained in a separate repository. And then we can write separate bindings to data-science-friendly languages like R, Julia or Python. And you can see that with something like this, all of the main changes to the algorithmic code are centralized.

[00:10:28] And so if any updates or bug fixes need to take place, we can do that directly in a separately maintained repository. And then anything that has bindings to this can directly benefit, possibly without making any changes. And so it’s a nice organizational structure to separate out the core of your package and be able to have bindings to other languages. Some classic examples of this are the old linear algebra packages written in Fortran or in C, things like LINPACK or PROPACK or ARPACK. These are all very old libraries that are heavily used in a lot of different languages, including a lot of data science languages.

[00:11:15] And we can see that again NumPy and SciPy show up in this kind of thing, because under the hood of SciPy we have ARPACK and under the hood of NumPy we have these BLAS operations. And so we’re able to have this old, separately maintained code base. And similarly, for anything that makes use of these packages in R or Julia, all of the changes can be localized and can exist just in LINPACK or in PROPACK. Not that I think these packages get a lot of changes these days, but that’s a huge benefit that we can have.
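SciPy even exposes some of those Fortran routines fairly directly. For instance, scipy.linalg.lapack wraps compiled LAPACK routines (a close cousin of the libraries mentioned here) behind thin Python bindings; a minimal sketch:

```python
import numpy as np
from scipy.linalg import lapack

# Solve Ax = b by calling dgesv, a compiled Fortran LAPACK
# routine, through SciPy's thin Python bindings.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
lu, piv, x, info = lapack.dgesv(A, b)
print(x)     # solution vector, computed in Fortran
print(info)  # 0 means success
```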

[00:11:58] And so what I take the main pros to be here is that your code may reach more users across different languages, because you can write bindings and other developers can write bindings. And since almost all languages can interface with C or C++, you can really use these kinds of code bases in a lot of different languages. And furthermore, algorithmic changes and developments are centralized and in most cases need only occur in a single place. There are a lot of newer packages that adopt this kind of strategy. The Tensor Algebra Compiler (taco) package is one that I think is pretty cool. And it’s an example where you have all of this nice functionality that’s written in C++.

[00:12:47] And then people in R, people in Julia, people in Python, if they’re interested, they can write a wrapper for it and take advantage of this really nice existing code base. So the cons with this approach are that, similarly to the first approach, anyone who wants to make substantive changes to the library must know how to code in C++. And since most users of scientific computing libraries don’t know C++ or C or Fortran, contributions from your main user base may be a bit scarce. And especially with new projects without a large contributing base, it can be difficult to keep the steam going on an open source project.

[00:13:30] And so I think as a long-term solution this is a really great approach to take. But if you’re just getting started, it might have difficulties getting off the ground. It may be harder to get people interested in developing and working on the project with you. The last approach that I want to highlight is using a Python-accelerating library, things like Pythran or Cython or Numba, where we can write our compute-intensive code directly in Python. And what these tools do is they essentially compile your code, and you get to see all these huge speedups while just writing in Python the whole time.

[00:14:11] And so this is a really nice approach for most people, because if you’re a Python developer or if you’re a scientist and you have an idea for a package, you don’t have to know multiple languages and you don’t have to do the cumbersome packaging and building of all of these things together. You can just write in Python and you can be in your domain space. And you can think about your algorithms, the thing you want to write, and then you can just use a Python package to make it run at the speed that you would like. And so as an example here, here’s what a sparse matrix multiplied by a dense vector might look like over on the left.

[00:14:55] And we can see that this is kind of compute-intensive code. We have a few for loops. This would run very slowly in native Python. And so what I did over on the right is I just decorated the function with numba.jit. And then I had to make a few changes to the code, because Numba doesn’t recognize sparse matrices. I had to split up the matrix object into a few different arrays and make some subtle changes to the code. But in general you can see how these both just look like Python code, and they’re both easy to understand and write. But one of these runs near C++ speeds and the other one runs at the very slow Python speed.
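The slides aren’t reproduced in this transcript, but here is a sketch of what that kind of kernel might look like, assuming the matrix is split into the standard CSR arrays (data, indices, indptr) as described:

```python
import numpy as np
import numba

def csr_matvec(data, indices, indptr, x):
    # y[i] is the dot product of row i's nonzero entries with x.
    y = np.zeros(len(indptr) - 1)
    for i in range(len(indptr) - 1):
        for j in range(indptr[i], indptr[i + 1]):
            y[i] += data[j] * x[indices[j]]
    return y

# The same function, compiled by Numba: the decoration is the
# only change, since the loops already use plain arrays.
csr_matvec_jit = numba.njit(csr_matvec)
```

Calling the compiled version with the .data, .indices and .indptr arrays of a scipy.sparse.csr_matrix gives the near-compiled speed described, while both versions stay ordinary, readable Python.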

[00:15:49] And so you could see how, just with a little change, we’re able to see this huge improvement. And so some examples: I have SciPy in here again because there’s some Pythran code in there, but packages like PyData Sparse and sgkit make a lot of use of Numba, whereas scikit-image and scikit-learn are packages that rely pretty heavily on Cython code in a lot of places. And so you can see that in some of these main packages that get a lot of use, we see sometimes multiple strategies and sometimes really different strategies. And this might contribute to why it might be a little easier to get a new developer working on scikit-learn than it would be on something like NumPy.

[00:16:39] And so what I take to be the main pros of this approach are that code bases in a single language tend to be easier to write and maintain. They’re also easier to build and package, and there’s no language barrier for community contribution. So I can have scientists who are interested in what I’ve written, I can have anyone who’s interested, and if they know Python they might be able to make interesting changes to some of this algorithmic code. For anything that needs to run really fast, it would be much easier for them to adopt one of these packages and just write code that runs really fast, while still thinking in their domain.

[00:17:22] Furthermore, there’s a lot of good compatibility with NumPy and other libraries with some of these packages, and there’s sometimes GPU support. Furthermore, all of the above tools are pretty easy to get working. And so you saw in the Numba case, it was just a few changes and we were able to see some huge speedups. If I were to identify some cons, they would be that any pure Python library can’t really be easily ported to other languages. Like my earlier NumPy example, where if we want to replicate NumPy in R or Julia, we would have to kind of write the whole thing from scratch and not be able to reuse a lot of code. With pure Python packages that’s just a basic obstacle.

[00:18:11] And then lastly, these tools may require altering one’s code to work properly, as in the case with the Numba example I showed. And sometimes significantly: with Cython there sometimes need to be a lot of changes, and with Numba there sometimes need to be a lot of changes. And so it isn’t always as easy as just importing the package. There might be some restructuring that needs to take place, but I think it’s a pretty small con overall. And so, kind of to summarize what I’ve discussed, we have these three main approaches that I’ve seen in some of the main projects that a lot of people really use.

[00:18:52] And what I would say is that I think the second and third approaches are a bit more favorable. The second approach really allows us to engage with more communities, because anyone who’s interested in a project can just write a wrapper to that project and then maintain it in Julia or R or some other language. And a lot of times these may be easier to build and maintain. And definitely in the case of having bindings, we’re able to reach a much wider user base and developer base. And so I think that while the first approach is old and reliable, and some of the most important packages use it, like NumPy and SciPy, some of these newer approaches of using Numba and Cython, or of delegating different repositories by having a separate C++ package and then bindings in Python, are probably the way to go and seem to be more common these days. So I’m happy to take any questions.

Brian: [00:19:59] Terrific. Thanks very much, Dale, for that overview of these different approaches for acceleration. So anybody with a question, please post it to the comment thread. In the meantime, in terms of the accelerators, the Pythran and Cython and Numba that you mentioned, can you talk a bit about what the use cases are where you might choose one of those three over the others, where one works well and the others don’t, or situations like that?

Dale: [00:20:30] Yeah. So one thing that I would say right off the bat is that Numba has some nice GPU support, which I don’t know if Cython has; I think it might be a little more complicated there. So right off, if you’re interested in enabling GPUs, Numba would be a great way to go. In general I think that Numba is the simplest, and it’s the one that I’ve enjoyed working with the most. I haven’t used Pythran very much, but I would say that Cython is very reliable. There is a huge user base for Cython. And so if you’re looking for, like, Stack Overflow questions, there’s just a huge community around Cython.

[00:21:20] And so it’s very easy to get things working, because so many people have asked questions and so many people have answers. It’s a really strong community. And so that’s what I would say I like most about Cython: generally, if I have a question, it’s very easy to find the answer, and there are a lot of people who know these answers.

Brian: [00:21:40] Very cool. You mentioned the GPU support for things. I know a little bit about that. I didn’t know that Numba wires into GPUs very well. That’s news to me and I’m definitely going to file that away. So you’ve got CUDA and the CuPy connection to CUDA. From what I understand, the GPU acceleration is kind of a separate thing from these Python accelerators. So can you compare and contrast how it’s the same, if at all, and then how it differs, and how you approach it?

Dale: [00:22:18] How GPUs accelerate things versus how you use [Inaudible].

Brian: [00:22:23] How you would go about introducing GPU acceleration to your code?

Dale: [00:22:28] Definitely. I think this is another interesting question. So what I’ve talked about really is how do we organize scientific repositories, how do we organize our tools. And I think this introduces another interesting use case that a lot of people might want. And so one approach that I quite like is that a lot of tools work really well together. Like, if you guys are familiar with Dask: Dask is able to use Zarr arrays, it’s able to use CuPy arrays, it’s able to use NumPy arrays, and it parallelizes your code using different types of arrays.
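As a rough sketch of that interoperability (a minimal example, not from the talk):

```python
import numpy as np
import dask.array as da

# Dask wraps an existing array -- NumPy here, but CuPy or Zarr
# arrays work similarly -- and runs operations chunk by chunk,
# in parallel.
x = da.from_array(np.random.random((4000, 4000)), chunks=(1000, 1000))
result = (x + x.T).mean(axis=0).compute()
print(result.shape)  # (4000,)
```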

[00:23:13] And so one thing that’s quite nice is that some of these tools work really well together. So, if you want to enable GPU support, I think the easiest way in my mind would be to have some Numba-compiled functions and some Numba GPU-compiled functions, to be able to work with both. And then use things like the fact that we have all of this nice duck typing in Python, where as long as something is recognized as following a particular array protocol, we’re able to use it in place of a NumPy array or some other array.

[00:24:00] And so I think CuPy would be the easiest way for people to use GPUs in their Python context. And so what I would say is: write a lot of your code base in a way that is NumPy-friendly. And in a lot of cases, just by doing that, you may be able to use CuPy because of the duck typing features. I think a lot of this needs to be planned out explicitly. But I would say that using things like Numba in particular cases, and using duck typing with CuPy, would be a great way to be able to use GPUs, all in some kind of library that lives together nicely.
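A minimal sketch of that duck typing idea, with the CuPy part left commented out since it assumes a GPU is available:

```python
import numpy as np

def standardize(x):
    # Uses only methods that NumPy and CuPy arrays share, so the
    # same code runs on CPU or GPU depending on the input array.
    return (x - x.mean()) / x.std()

print(standardize(np.random.random(1_000_000)).dtype)

# With CuPy installed and a GPU available:
# import cupy as cp
# standardize(cp.random.random(1_000_000))  # runs on the GPU
```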

Brian: [00:24:43] That’s very cool. Yeah. It’s so nice that all the work being done in that area of the ecosystem is turning things into that kind of drop-in replacement, where you can pretty easily get those acceleration benefits.

Dale: [00:24:55] Yeah. It’s a very nice thing.

Brian: [00:24:58] So very cool. Well we’re coming up on the end of the time slot. I don’t see any questions coming into the comment thread. So I think we’ll just wrap it there. Dale, thanks so much for your time and for presenting your expertise on Python acceleration. We really appreciate it.

Dale: [00:25:15] Yeah. Thanks so much for having me.

Brian: [00:25:17] Happy to have you here. So that wraps this Tech Shares. We’re done for today, just three sessions. Be aware that the next Tech Shares event is coming up next week. It is a technology roundtable on Wednesday, November 9th. Check out the site tech-shares.com for more information. Until next time, thank you very much for attending and have a good rest of your day.