How to select the right open source software for your product stack

About

Businesses are increasingly adopting open source software. As a result, the open source services market is predicted to reach $50 billion by 2026. The thriving open source community releases software and its source code to the public, allowing anyone to use, modify and distribute the software under a license agreement. You can use open source software at every stage of the software development lifecycle. With so many open source options to select from, which is the best software for your needs?

Making a quick decision based on what the software does may seem like the right solution initially but may cause future problems like cost overruns, launch delays, or unplanned downtimes. 

A systematic approach to selecting what open source software to use will help you look at the broader picture and help you make the right decision.

Travis Oliphant, CEO at OpenTeams and Quansight, founder of Anaconda, NumFOCUS, and PyData, and creator of NumPy, SciPy, and Numba will share the process he uses to help companies select the right open source software for their product stack.

Travis Oliphant
Founder of OpenTeams, Quansight, Anaconda & NumFOCUS. Creator of NumPy, SciPy & Numba

Transcript

Brian: [00:00:00] And for anyone who is just joining us, a reminder that Tech Shares is a forum sponsored by OpenTeams for technology leaders to connect with business leaders to help their organizations create better software solutions, utilizing open source software more effectively. So for the second session of today, we have Travis Oliphant, a leader in the Python data science community with a wealth of experience developing applications and improving software for Fortune 500 companies.

[00:00:26] He led the creation of many open source cornerstones, including NumPy, SciPy, Numba, Conda, and the organizations NumFOCUS and PyData, and founded Anaconda and Quansight along with, most recently, OpenTeams. He holds a doctorate in biomedical engineering from the Mayo Clinic and a master of science in math and electrical engineering from Brigham Young University. So welcome to the second session, Travis.

Travis: [00:00:49] Thanks, Brian. It’s great to be here. I appreciate the opportunity to talk about open source.

Brian: [00:00:54] Absolutely. I’m enjoying it myself. Looking forward to it. So again the topic for this second session is how to select the right open source software for your product stack. And I have to assume with your extremely deep expertise in open source you have a lot of wisdom for us here.

Travis: [00:01:16] A lot of words, we’ll see if they’re wisdom.

Brian: [00:01:18] Oh, yeah. I guess, yeah, fair enough. We’ll leave that to the audience to decide. And for somebody coming new to it or trying to decide which tooling to choose, it can be a very confusing thing to step into. What is even out there? How do I choose which open source tooling to look into and to invest in time and money to bring into my company or my organization? What should people do to navigate this dizzying array of options in front of them?

Travis: [00:01:57] Yeah. It’s a great question. And I’m going to tell a little story here that might not seem relevant but I think it has some corollaries. When I was getting married, I had the same kind of mental process, mental thought. I met a fantastic woman who is now my wife. When I was young and I was trying to decide if we should get married or if I should ask her, I would think about this. Wait a minute, what about the rest of the world? What about there may be other people out there that I should choose? My wife didn’t really appreciate this line of thinking, by the way.

[00:02:28] It’s not advisable to mix mental gymnastics with romance. But me being who I am, I of course had to, I had to be totally transparent with her and talk about it. And I in fact wrote an essay about it. But it kind of illustrates the point that sometimes you have to go with what is helping you and you’re not going to solve the whole problem. What else is out there? What else could I use? What else could I do? And that’s kind of what was happening to me as a young man. And I had to kind of understand, wait, I can evaluate the situation. Here’s an amazing woman I have the chance to be with and I should do it.

[00:03:11] And so in some cases I’ll talk about some of the things you have to consider. But the number one point is go with what works for you. Go with what makes sense for you at that moment and then commit to it and build it and fix it and improve it. Once I made a commitment to marriage, I’m not constantly thinking, well, who else could I have chosen? Where else could I have gone? I’m making an investment in that relationship. Your software stack, it’s kind of similar. It’s not entirely the same. So some people will argue about that. I may be making an analogy that some, my wife being one of them, would probably hate, because a commitment to a person is stronger than a commitment to technology.

[00:03:50] But there’s something to learn from the fact that things are what you make them. And if it’s helping your business, stay with it unless there are concrete things you can point to and say, this isn’t working, this is not helping me and it’s actually getting worse. And sometimes just making the software better is what you need. So that’s one of the things. So we’ll probably get to that a little bit later as we talk about what happens when you choose your software. But the factors you consider are really what are your needs. A lot of times the choice of software stack is not intentionally made. That’s something I think we should acknowledge as an elephant in the room.

[00:04:25] In many, many cases the person who’s in charge, maybe the person who is in charge of development of a thing, somebody in the organization, has just pulled in a dependency without everybody understanding. And then fast forward three months and we’re like, oh, we’re dependent on this thing that nobody told us about. That happens because it’s so easy and because it’s so productive and it helps developers make so many more improvements. I’m not saying you should stop that behavior because honestly that’s how productivity is happening in the world today. Right now people are much more productive because, instead of going, hey, I have this thing to accomplish, let me go write all the code, they’re going, wait a minute, I’m just going to stitch together some libraries that are sitting out there.

Brian: [00:05:10] I benefit from them personally all the time.

Travis: [00:05:12] Exactly. So it’s not to say that you shouldn’t do that. It’s just to recognize this is the reality. Let’s make sure we understand that and then understand what we’re making dependencies on. So part of it is, I think good organizations are creating training for their staff and they’re helping people understand, cool, you’re going to build on this technology. Let’s just have a way to report on it and inform people that this is happening, and before it gets too baked in some amount of review can take place. Because there are things you might care about as an organization, as your developer team is pulling in dependencies on open source software.

[00:05:51] You want to understand some things like, one, what kind of support is available for this? Am I essentially signing up to support this open source piece that I now own or is there a community out there that I can ask questions to? Is there a commercial company out there that can actually support our use of this software? And those are questions that are really critical to understand. So we at OpenTeams have come up with an Open Team score, which essentially is a business score, so you can at least look at a score of the software you’re getting and understand, oh, if it’s a high score then it probably has a large range of things.

[00:06:27] And we’re in the process of making open source how we get to this score, so people can see the different components of it, then stitch their own score together based on how they want to weight the different components, because it has things like IP. What’s the IP licensing? Is that a concern? Is it GPL or is it BSD? Is it Apache? Does it have guarantees for patent protection? Because there are situations you can certainly stumble into if you’re not careful in adopting software. There’s things like, how am I getting the software? It’s one thing to go get the source code and then compile it yourself. Not that that’s free of attack vectors. It can be attacked if nobody’s paying attention to the source code.

[00:07:07] There are examples of people having injected code into the source code. People unwittingly just pull it in. So there’s that one. But the bigger one is binary. How am I getting the binary? Am I just going to some website? Does that website provide guarantees that there’s not a man-in-the-middle attack, that someone has not injected a binary up there? The other thing that can happen these days is people are putting typos on the big package managers; there are a large number of package managers out there. People are putting little off-by-one-character packages up there.

Brian: [00:07:40] Pluralization.

Travis: [00:07:41] Exactly. So I’m importing pandas and I forgot to put an S on it. I just imported panda.

Brian: [00:07:46] Not the same thing.

Travis: [00:07:47] Not the same thing. So all of a sudden you find yourself with the wrong package internally. So make sure, as you’re pulling in software, you have sandbox areas where people can pull in software and there’s a process to go from, I’m pulling in software, to, oh, I can review what my dependencies are, and then I go to deployment. I’m understanding what that is. A lot of organizations are doing that well, some not so well, some need a little bit of upgrade to how they’re doing it. So one, I think you want to enable your developers to make those choices. I think it’s really bad when you have some committee.

[00:08:22] Let’s say you have 10,000 developers and you have the Ministry of Open Source Selection. I would say any organization that does that is going to be behind within a year because you’re going to basically bottleneck behind that ministry. They’re not going to have enough information. So you want to segment it out to your developers who have the most knowledge, have the most understanding, but then have a little bit of a process. So as they pull the software in, there’s a point of review where you can say, ok, well, how are we going to get this software? What is the score associated with it? What are the risks associated with it?

[00:08:57] So we talked about IP risk. We talk about often just the community support, community help. Is the community very big? Is it a one-person community? Is it a ten-person community? Is it a 1,000-person community? How many other people are using this? It’s sort of like safety in numbers. If a million other organizations are using it then they’re going to have the same problem as I do. It’s like nobody got fired for hiring X company.
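As a rough illustration of the pandas/panda mix-up described above, a review step could compare each requested package name against a list of packages the organization has already vetted and flag near misses. This is only a sketch using the standard library's difflib; the allowlist contents, the function name check_dependency and the 0.8 similarity cutoff are assumptions made for this example, not part of any tool mentioned in the talk.

```python
import difflib

# Hypothetical allowlist of packages the organization has already reviewed.
APPROVED = {"numpy", "pandas", "scipy", "requests"}


def check_dependency(name: str, approved: set[str] = APPROVED) -> str:
    """Classify a requested package name against the reviewed allowlist."""
    normalized = name.strip().lower()
    if normalized in approved:
        return "approved"
    # A near miss of an approved name is a possible typosquat.
    # The 0.8 cutoff is an arbitrary choice for this sketch.
    close = difflib.get_close_matches(normalized, sorted(approved), n=1, cutoff=0.8)
    if close:
        return f"suspicious: did you mean {close[0]!r}?"
    return "needs review"
```

A name that is neither approved nor close to an approved name simply goes to the normal review queue, which matches the idea of a sandbox followed by a review step before deployment.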

Brian: [00:09:24] And it’s more likely that problems will be found quickly.

Travis: [00:09:26] Agreed. As the author of NumPy, I kind of understand that what’s happening now is like nobody got fired for using NumPy, because it’s just a massive project with lots of support at this point and lots of people who understand it, and it’s evolving. There are things that are building on top of NumPy like PyTorch and TensorFlow and so forth and going beyond it. What does the documentation look like? What’s the version I’m depending on? These are the kinds of things you have to do. And for more information I would say check out our work at OpenTeams on the Open Team score.

Brian: [00:09:55] Sounds good. Yeah. Good recommendation. So what you’ve described is somewhat from the perspective of building something from scratch, where you have the freedom to choose. You’re just building something new, bringing new things in. But there are also situations where you have existing software stacks and existing projects that are experiencing friction or challenges from proprietary lock-in or other issues with the non-open source tools that are out there. So if you would, talk a bit about how a company or an entity might think about the philosophy or the approach to switching to some open source tooling and the benefits that that might provide them. But at the same time there are real costs to moving away from an established system that works maybe well enough but has its issues.

Travis: [00:10:44] Yeah. Agreed. I think this is at the heart of a lot of people’s thought process right now. And one of the challenges, of course, is that innovation keeps happening in technology. I’ve talked to leaders at large organizations and many times over drinks or over coffee, I’ve heard kind of the complaint like, I spent $50 billion on technology, I’m still spending $50 billion. Is there never an end? And the answer is kind of, no, there isn’t. What you have to do is understand what is your spend and how do you get the most out of that spend. And I think there are evolving answers to that question. You have to pay attention to it. It’s not the same as it was 10, 12, 30 years ago. How you get the best value out of that spend is different.

[00:11:28] So one, you do have to have a real business need to move away from existing software that’s working. I think honestly, many times it’s better to stay. Even though I love technology, I love innovation, I love advancement, I’m pragmatic enough to understand that sometimes it’s actually not worth it to try to go replace something that’s working. You understand, well, this is working and let me just shore it up. And then you think about what’s the cost of maintaining. At some point the cost of maintaining something is going to be more than the cost of replacing. So somewhere in the lifecycle of a piece of software you want to make this decision ahead of that point. Because you don’t want to get to the point where you’re essentially spending more every year to maintain something than it would have cost to replace it three years ago.
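The idea of stitching your own score together from weighted components can be sketched as a plain weighted average. The component names and weights below are hypothetical placeholders; the actual components of the Open Team score are not spelled out in this conversation.

```python
# Hypothetical component names and weights, chosen only for illustration.
DEFAULT_WEIGHTS = {"license": 3, "support": 3, "community": 2, "distribution": 2}


def weighted_score(components: dict[str, float],
                   weights: dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Combine per-component scores (0-100) into one weighted average.

    Components missing from `components` count as 0.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(components.get(name, 0.0) * w for name, w in weights.items())
    return weighted_sum / total_weight
```

An organization that cares mostly about licensing could pass its own weights, for example `weighted_score(scores, {"license": 5, "support": 1})`, without changing how the individual components are measured.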
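The maintain-or-replace trade-off can be made concrete with a little arithmetic. The sketch below finds the first year in which cumulative maintenance spend passes a one-time replacement cost. It is deliberately simplistic: the growth rate, the horizon and the function name crossover_year are assumptions for illustration, and it ignores the ongoing cost of running whatever replaces the old system.

```python
from typing import Optional


def crossover_year(annual_maintenance: float, growth_rate: float,
                   replacement_cost: float, horizon: int = 30) -> Optional[int]:
    """Return the first year in which cumulative maintenance spend exceeds
    the one-time replacement cost, or None if that never happens within
    `horizon` years."""
    cumulative = 0.0
    yearly = annual_maintenance
    for year in range(1, horizon + 1):
        cumulative += yearly
        if cumulative > replacement_cost:
            return year
        # Maintenance is assumed to get more expensive each year.
        yearly *= 1 + growth_rate
    return None
```

The point of running a calculation like this early is the one made above: the decision should be taken before the crossover year, not after it.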

Brian: [00:12:15] Something like COBOL in finance seems like a really good case study there.

Travis: [00:12:18] Yes. So mainframes are still in practice. There are still a lot of people using mainframes. I have a lot of friends at IBM and a lot of people don’t realize that IBM still has a lot of revenue because they’ve built great solutions that are still working for their customers. And they don’t need to change them just because of the latest fad in technology; they’ve got an answer that’s giving them value and solving the business problem, and incrementally improving that can do it. So in the mainframes we’re actually making changes because we’re kind of going back to the mainframe. The whole move to the cloud is actually very, very similar to going back to the data center, where now the cloud is just a new mainframe. The cloud is a new mainframe.

[00:13:00] So you update the mainframe software and protocols and mechanisms and it actually doesn’t look that different, which is why IBM’s cloud strategy doesn’t look strange. And as you’re thinking about how they compete in a world of cloud, they’ve been doing cloud for years. So I think there’s a similar metaphor there of, hey, if you’ve got software, one, you do have to be intentional and not be afraid of understanding when software should just be continued with. You don’t need to replace it with every newfangled object that shows up.

[00:13:32] Usually the best approach is to recognize you have new projects emerging, new things that are happening in your organization, and that’s where you can try new technology. Hey, I’ve got this new thing that I want to do and there I’ll try something new. I’ll try a different approach. I’ll try something that helps me. Often there’s a new business need. Quite often, for example, I need speed. I just need this to be done faster. And the only way to get it done faster is to take advantage of, let’s say, GPU technology or cheaper cloud technology or ASIC hardware or something.

[00:14:08] So there’s a new need that shows up and once that’s there then it’s a straightforward business case because there’s value in the switch. So these are strong drivers to get started with open source technology: I’ve got a problem I’ve got to solve. Definitely not just because I want a fancy new car. The car’s helping me. I don’t have a need. But let’s say my car is a gas guzzler and it’s polluting the environment and I’m spending too much money on gas. Well, that represents a different need now. Now that’s a reason to switch, an incentive to switch. Same thing in technology. There’s stuff that happens all the time.

[00:14:43] So what happens typically with legacy software is the software becomes expensive to maintain. COBOL is a good example. My dad programmed in COBOL 40 years ago, 50 years ago. Now people still do but it’s hard to find them. There aren’t that many COBOL developers. So you’ve got a system that’s dependent on COBOL. You have to think, ok, is it going to do this? Then there’s the cost of maintaining the hardware. You can often see situations where that software actually doesn’t work on newer hardware, so you’ve got to basically maintain the old hardware. In extreme circumstances, like the space shuttle or the space program, they’ve got a dependency on hardware that doesn’t even exist anymore. So how do you get replacement parts for that x86, the 8086 or 80286?

Brian: [00:15:39] Simply not made anymore?

Travis: [00:15:40] Yeah. It’s just not made anymore. And so the supply chain and things you take for granted aren’t there. So there is a reason to stay current. But as a business you don’t have to stay overly current. There’s a ten year gap usually. And frankly, Brian, I think a lot of lessons were learned in the migration to cloud, which is still ongoing, that are relevant here. It’s like, yes, we should use the cloud but we don’t have to just wholesale jump to it and just stop everything we’re doing here. So I think it’s about making sure you’re staying ahead. You do have to have an approach here, especially as you have new things, new business practices where you need to take advantage of newer technology or the business doesn’t work.

Brian: [00:16:20] Yeah. All very good points. So one of the things you touched on briefly earlier that might be worth leaning into a bit is the IP and licensing question. Because whether you’re a company or just an individual with a side project, considering the licensing and the usage aspects of the dependencies that you take on is a significant practical consideration. What are your thoughts on educating oneself on business considerations as you think about dependencies, open source dependencies to bring on board?

Travis: [00:17:04] So you broke up at the very end when you were finishing your question. It’s about the licensing concerns and the business considerations about licensing.

Brian: [00:17:12] Yes. What are some of the pitfalls that businesses might face if they make a poor choice and bring in a license?

Travis: [00:17:21] Yeah. Lots of open source participants have had heated debates over the years about the right license to use, and it’s evolving, but there are real consequences. We’ll talk about three categories of licenses. One is the GNU license family. There are differences between the LGPL and GPL, and beyond the straight-up GPL, the AGPL is a newer one that has additional restrictions. But effectively it’s a license that says, here’s some free software, go ahead, but there are restrictions if you publicize or publish modifications. So if you use it internally, it’s fine actually. You can use it internally and it never sees the light of day. Doesn’t matter, you can use GPL all you want.

[00:18:05] Now there are some legal ramifications of using it internally for a multinational organization where you have multiple entities. Are you actually publishing that work to your other entities if you’re a multinational? There are some nuances there and, again, most of this has not been tried in court. There isn’t much legal precedent here. It’s actually a lot of mental thought because people are concerned about what they can get in trouble with. So there’s that kind of license. Many companies, because of the uncertainties, have just said we don’t want to deal with that. We’re not going to allow GPL.

[00:18:38] Now in practice, a lot of software has started to adopt AGPL these days more than in the past. So there are actually a lot of companies who are now inadvertently essentially getting dependencies on AGPL software. AGPL is simply, it’s kind of a reaction to the fact that large cloud organizations are making a lot of money off of open source and that’s not coming back to the creators of open source. So if you’re shipping Mongo and you have a company that has a commercial version of Mongo and you’re trying to help, you want money to go to the commercial company backing the open source software, then a large cloud vendor steps in and just offers the same service for free or for lower price. It’s disrupting that business model.

[00:19:22] Now I’m not going to chime in on the debate there about the propriety of it. I’m just saying that’s what’s motivating the AGPL. So AGPL is basically, you can’t do that. If we make something AGPL, if you even use it to host a service, you have to give back any changes, you have to publicize what you’re doing. So you actually have to make your server-side code open source. So that’s one issue. The GPL issue as a whole is one set of things where people think, oh, is that a risk I’m going to take in this part of my code base? And there’s another tranche, the BSD and MIT licenses, where it’s like, hey, use it, just don’t blame us or sue us, but use it however you like. And there’s a lot of people in that category.

[00:19:59] And there’s a newer family called Apache, and the Apache license family has an extra positive clause where people say, if you are committing code, if you’re providing code to this Apache-licensed software, you’re committing that there is no patent attached to this code. So MIT and BSD doesn’t mean there are patents, but they don’t require that assertion. Whereas Apache-licensed code requires the contributor to make the assertion essentially that there aren’t patents or other.

Brian: [00:20:34] It’s not encumbered.

Travis: [00:20:35] It’s not encumbered with this extra IP clause that isn’t clarified. And so many people really like the Apache license because it sort of helps them [Inaudible]. So from a business perspective you can kind of order these roughly and again cases will vary and people have different stories but Apache, MIT, GPL, effectively because of the different risks that are associated with it. So that’s an example. Now in a practical situation, a real situation, you do have to get a little more detailed than that. That’s kind of a high level view but you need some more perspective in order to really understand what your risk is.
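The rough ordering described here (Apache, then MIT/BSD, then the GPL family) could be turned into a simple first-pass filter over a dependency list. This is a sketch under stated assumptions: the numeric ranks, the placement of LGPL and AGPL, and the function name flag_dependencies are illustrative choices rather than anything established in the talk, and as noted above a real review needs more detail than a lookup table.

```python
# Lower rank = fewer obligations for the adopting business, following the
# rough Apache, MIT/BSD, GPL ordering from the discussion. The numbers and
# the LGPL/AGPL placement are assumptions made for this sketch.
LICENSE_RANK = {
    "Apache-2.0": 0,
    "MIT": 1,
    "BSD-3-Clause": 1,
    "LGPL-3.0-only": 2,
    "GPL-3.0-only": 3,
    "AGPL-3.0-only": 4,
}


def flag_dependencies(deps: dict[str, str], max_rank: int = 1) -> list[str]:
    """Return names of dependencies whose license ranks above `max_rank`
    or is not in the table at all (unknown licenses always get flagged)."""
    flagged = []
    for name, license_id in sorted(deps.items()):
        rank = LICENSE_RANK.get(license_id)
        if rank is None or rank > max_rank:
            flagged.append(name)
    return flagged
```

Flagging unknown licenses rather than silently accepting them keeps the filter conservative: anything the table does not recognize goes to a human.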

Brian: [00:21:11] Indeed. So we’re getting close to the end of time. We want to leave a little bit of time if there are any questions from any of the attendees. I don’t see any at the moment. If anybody does have any, pop them in. But until I see some come in, one of the things that you certainly have to deal with, so you mentioned that one great place to try to move to open source is when you have a new initiative or a new need. But whether you apply it there or whether you’re trying to replace a component, you do have to be concerned with compatibility of those new pieces with any existing code. So do you have some experience there or some thoughts there, how do we make sure that the new open source pieces play well with these?

Travis: [00:22:06] Yeah. Certainly have experience seeing this at scale. And I would say lots of effort has gone in at large organizations to try to handle this, with various degrees of success. A new software project will cause some amount of disruption simply because it’s going to say, here’s a new file format that has to be supported or here’s a new process or here’s a new UI or here’s a new system. Even simply, if you have a manifest of software dependencies in your organization that has to be maintained, it’s altering that manifest. So everything you do is going to have some amount of disruption. The question is how much rework does it require?

[00:22:51] And it really depends on where your legacy system or your older systems fit in the software lifecycle and in the maturity of software engineering practice. 50 years ago software engineering was, how do I even write this stuff? How do I get the computer to do what I want it to do? So you would have this huge stack and all this abstraction you’d build in order to accomplish what you’re trying to do. And then, ok, what about somebody else? And there are other abstractions. So you’ll end up with these language systems and frameworks that have, again, back to the framing problem notion we talked about in another session.

[00:23:37] Computer languages have the same challenge, in that there’s a framing of how to think about logically solving that problem. So you end up with a framing where there’s nowhere to hook in. So one of the innovations over the past ten years has been this notion of the web and the fact that I have these APIs. I have RESTful APIs; this is sort of a follow-on to SOAP and XML and other ways to have websites talking to each other. But there’s this emerging notion of server side architecture which is effectively, let me see if I can encapsulate what I’m doing in a function that passes well-known objects or, if they’re not well known, they’re very easily described with JSON.

[00:24:18] So that I can be really explicit about here’s what I do and here’s my interface. And that’s a software engineering innovation, which is basically let me solve a problem and publicize how I’m solving it, what I’m solving. So it really implies reuse. At the end of the day, the question is how much reuse is possible of your existing systems. And many times today it’s totally possible to take a legacy system and encapsulate it in a reusable function like that. You can actually wrap a RESTful API around a legacy system pretty straightforwardly and so make it accessible to something else. In 99% of the cases that’s possible. There may be some situations where there’s more difficulty.
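To make the wrapping idea concrete, here is a minimal sketch of putting a JSON-over-HTTP interface in front of a legacy routine using only Python's standard library. The function legacy_price stands in for whatever call reaches the real legacy system; it, the field names quantity and unit_cost, and the port number are all hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def legacy_price(quantity: int, unit_cost: float) -> float:
    """Stand-in for a call into the legacy system (hypothetical)."""
    return quantity * unit_cost


def handle_json(body: bytes) -> tuple[int, bytes]:
    """Translate a JSON request into a legacy call and back into JSON.

    Returns an HTTP status code and the encoded response body.
    """
    try:
        payload = json.loads(body)
        result = legacy_price(int(payload["quantity"]), float(payload["unit_cost"]))
    except (ValueError, KeyError, TypeError):
        return 400, json.dumps({"error": "expected quantity and unit_cost"}).encode()
    return 200, json.dumps({"price": result}).encode()


class LegacyHandler(BaseHTTPRequestHandler):
    """Thin HTTP layer: all the logic lives in handle_json."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, response = handle_json(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(response)))
        self.end_headers()
        self.wfile.write(response)


def serve(port: int = 8000) -> None:
    """Run the wrapper locally until interrupted."""
    HTTPServer(("127.0.0.1", port), LegacyHandler).serve_forever()
```

Calling `serve()` starts the wrapper; a client then posts `{"quantity": 3, "unit_cost": 2.5}` and gets back `{"price": 7.5}`. Keeping the translation in handle_json, separate from the HTTP handler, means the interface can be tested without starting a server. Every request also pays for the extra hop through JSON and HTTP, which is the latency cost discussed next.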

[00:25:07] So that has simplified the problem. That doesn’t work for certain kinds of problems where speed, parallelization is a problem because all of that, any type of wrapping puts a latency delay at least and sometimes a throughput shrinkage as well. Like I have constraints. So if I’m really operating at the edge of speed or the edge of size, some of these software engineering techniques don’t work. And I’m kind of back to the old days of let me just make something that works. And if you built something like that and that’s what you’re depending on right now, that’s going to be hard to integrate with. And so you may have to think about integration a little different way.

Brian: [00:25:50] One of the most entertaining examples of that kind of legacy wrapping that I’ve heard of was somebody who used a mouse pointer driver to automatically interact with graphical user interface software to do what they needed to do. But there’s no way in the world that you could parallelize or scale that; it’s all I/O bound.

Travis: [00:26:07] Yes. That’s a great example, Brian. Yeah, good point. So I think fundamentally, and this also matches my philosophy of testing, as you have a piece of software you rely on, testing becomes more and more important. You really need to be able to automate and test what you’re relying on. So testing is the key to understanding whether that disruption is going to hurt you or not, for sure.
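One way to automate and test what you rely on is to write a few small tests that pin down the exact behaviors your code depends on in a library, so that an upgrade or replacement that changes them fails loudly. The sketch below uses Python's built-in json module purely as a stand-in dependency; the class name and the two behaviors chosen are illustrative assumptions.

```python
import json
import unittest


class JsonDependencyContract(unittest.TestCase):
    """Pin the specific behaviors of a dependency that our code relies on."""

    def test_sorted_keys_give_stable_output(self):
        # We rely on a deterministic serialization when sort_keys is set.
        text = json.dumps({"b": 1, "a": 2}, sort_keys=True)
        self.assertEqual(text, '{"a": 2, "b": 1}')

    def test_round_trip_preserves_structure(self):
        # We rely on dicts and lists surviving a dump/load cycle unchanged.
        record = {"name": "numpy", "versions": [1, 2]}
        self.assertEqual(json.loads(json.dumps(record)), record)
```

Running these with `python -m unittest` before and after changing a dependency gives a quick signal on whether the change disrupts anything you actually use.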

Brian: [00:26:33] All right. Well, we are at the end of the session. Again it’s been a terrific conversation, Travis. Thank you so much for your time. Really appreciate it.

Travis: [00:26:40] My pleasure.

Brian: [00:26:41] And look forward to hopefully doing another session sometime soon.

Travis: [00:26:45] Thank you, Brian.

Brian: [00:26:46] Thanks very much. All right. So we are about to wrap. We’ll take another short break and then we’ll have our third session of the event with Dale Tovar presenting on accelerating Python. Back in a bit.