Technology Roundtable

About

The Technology Roundtable is an opportunity for technology architects in the technology industry to learn, innovate and collaborate with their peers. Roundtable members work together on industry priorities and general topics of interest and concern related to open source technology initiatives. As the Roundtable identifies topics of interest, working groups form to champion and advance agreed-upon Roundtable goals within that area of interest. In addition, open discussion and networking time allow attendees to discuss common problems and discover new and innovative solutions.

Other program highlights include:

The Roundtable chooses agenda topics
Interactive sessions that provide opportunities for discussion and debate
Analyst-led sessions that outline the latest industry research
Exclusive environment: technology architects, decision-makers, and influencers

David Charboneau
CTO at Mimoto

Lalitha Krishnamoorthy
Co-Founder and CEO, OpenTeams Global

John Pantano
Energy and Environmental Solution Architect at OpenTeams

Marc-André Lemburg
CEO at eGenix.com


Transcript

Brian: [00:00:00] Hello everyone and welcome to today's Tech Shares Technology Roundtable. Tech Shares is a forum for technology leaders to connect with business leaders in order to help their organizations create better software solutions that utilize open source software more effectively. So today we have something of a change of pace from the events that we've held so far. Whereas all the Tech Shares to date have been divided into sessions, where each session was either a one-on-one conversation or a presentation, today we're holding our first roundtable event. My name is Brian Skinn, and I'll be moderating today's discussion. If anyone has attended panel events at a conference, that's kind of the feel we're aiming for: engaging, interactive discussion among smart people on tech topics that you care about. So in that same vein of a panel event, we're open for your questions from the audience — if you have them, please post them in the chat. And I'll also note, if you're interested in participating as a speaker or a roundtable participant in a future Tech Shares event, please reach out to us at tech-shares.com/speakers. So in today's Tech Share, we have four great panelists with wide-ranging expertise across various areas of technology and open source. They'll be discussing a handful of topics, mostly in data strategy, data processing, and data management, and we're going to dip a bit into artificial intelligence as well. That should be a great discussion. So let me get started by briefly introducing the panel. I'll just go around alphabetically by first name to pick an order. First, David Charboneau, recently of OpenTeams, but now of Mimoto.AI. David, welcome. Please introduce yourself and tell us how you got where you are.

David: [00:01:40] Thank you, Brian. Hi, folks. David Charboneau, currently CTO at Mimoto. I got my start working on a really obscure programming language back in the mid-to-late 90s at IBM. From there, I moved on to work at Bank of America Merrill Lynch as a member of the core technology team. I did a bunch of freelance work for a while, did a few startups, including OpenTeams, and I'm still at a startup, currently focused on what we call continuous recognition for active threat detection and response.

Brian: [00:02:14] Excellent. Glad to have you here. John, would you like to go next?

John: [00:02:20] Sure. John Pantano. Glad to be here. I've been doing scientific computing for four decades, where we've been trying to emulate things on both the geological time scale and the human time scale of the operation of both environmental and oil and gas operations. I've probably used about 12 different languages, and I really believe that open source is a solution to a lot of the problems that we need to face.

Brian: [00:02:56] Terrific. Very glad to have you here as well. Lalitha, you’re next.

Lalitha: [00:03:02] All right. Great to be here, and nice to see you, Brian, and everybody here. So I'm co-founder and CEO of OpenTeams Global. I spent the first 20 years of my career at IBM; I started with IBM Research on federated technology. So yes, I'm definitely a data geek and a big fan of AI. At OpenTeams Global, I lead a team of open source professionals and experts and support both community-based and industry-backed open source communities, in order to make open source more sustainable and help it thrive for generations to come. So I'm excited to be here today. Thank you.

Brian: [00:03:48] Terrific. Yeah, we're looking forward to hearing what you have to say on those. Finally, Marc, please introduce yourself, if you would.

Marc: [00:03:57] Hi there. I'm Marc. First of all, thank you for having me on this panel. This is a very interesting discussion that we're going to have here. I'm based in Düsseldorf in Germany. I'm a longtime Python user — I've been using it since 1994 — and a core developer of Python; I added the Unicode support to Python in the early 2000s. I also have a consulting company called eGenix. I started that after developing a product that I then wanted to sell, so I started a company, made all the mistakes that you can possibly make, sold the product exactly once, and then basically turned to consulting. I've been doing that ever since, mostly in the financial services space, but also in energy and health services. Alongside that, I've been doing a lot of work in the Python community. I was chair of the EuroPython conference for a couple of years, and I've helped a lot of other conferences start and get off the ground. I'm also chair of the PSF Trademarks Committee, which basically makes sure that the trademarks get used correctly.

Brian: [00:05:09] Indeed, yeah — an important legal function. Well, clearly a lot of deep experience with the community and with open source, so thank you again for participating in the conversation today. With that, let's jump into the topics. Again, a reminder to the audience: any questions you have, please do drop them into the chat, and we'll work them in as we can. For the first topic of discussion — Lalitha, this is one that you suggested in our planning for this conversation — data-driven decision making. What is it? Why is it important? And why do businesses and the open source community need to care about it?

Lalitha: [00:05:46] Yeah. So the reason I brought up this topic — and I think it's a topic I'm hearing across the board when we meet with the different clients and partners in our ecosystem — is they all want to elevate their business into becoming a more data-driven business. They want to have data-driven decision making. They want to have predictive insights. They want to know what's going to happen before the event happens, so they can control their sales pipeline or their funnel. But I think there is a big misunderstanding that by adopting AI, they're going to solve a lot of these problems. The key, I think, for AI to be successful — for any company to have a successful AI adoption journey — is for them to have a very solid data strategy behind it. Because without data, or without a very strong information architecture, your AI is not going to scale. It's not going to get you the results that you need. So that is the conversation we're trying to land with most of the clients we talk to — with the CIOs and the CEOs of companies that want to adopt AI, that want to understand how they can further their decision making, cut costs around their IT, and overall increase their operational efficiencies.

[00:07:00] So our theory here is that when you have a solid data strategy to back everything that you do, you actually reduce the number of data stewards — the people who are basically spending their time mapping data and metadata in order to help you make decisions. And that matters because I think we all agree, across the board, that data is your most powerful currency as we go into this century. So how you actually create the data, how you govern your data, how you manage the data, and then how you democratize that data and use it to drive AI-driven decision making is going to really differentiate whether you're going to make your business successful in the future or not. So that's my perspective. Yeah.

Brian: [00:07:52] Excellent. Marc, David, John — any particular thoughts about data strategy and governance and challenges of that sort, or personal experiences, perhaps?

David: [00:08:04] I've been racking my brain while Lalitha was talking. There's a project that really stood out to me from a while ago, when I was at OpenTeams: we had helped a client adopt a data catalog, because one of the big issues they had with their data team was sharing connection details to different data sources. There's a really excellent open source data catalog project — if I remember during this session, I'll share it in the chat. But I think that's a key thing you need to consider. It's one thing to get all the data into one place. It's another thing to have a strategy that enables your team to identify those data sets and find them, to match the need to the content. So: properly tagging, properly organizing, and setting it up so that the folks who are going to be building your models or doing your data analysis know where to find the information and have some way of approaching it, rather than having to run to different departments and speak to people directly.

Brian: [00:09:08] And is that something that you would consider part of the data strategy — that, for example, ought to be part of the planning from the outset? That's the pre-work?

David: [00:09:17] Yeah. Have a clear understanding of how you are going to govern access, how you are going to govern where the data goes, and how you are going to make the data sets — or the different data sources — available to the teams that are going to need them. You know, the situation most mid-sized and larger companies have is that they've got a wide variety of data sources, lots of little silos. And part of the general strategy should be how you cross the boundaries between those silos so that you can unify the data.

John: [00:09:50] And do you think that the way it can be implemented is to have a third party, or a czar of some sort, to just be a facilitator — to kind of herd all these systems that are disconnected?

David: [00:10:09] I think that can go a long way, right? So that somebody is accountable for putting together the catalog, but then also for establishing standards and practices — so that if you publish a Jupyter notebook that accesses a specific data source, you make sure that that data source is in the catalog. Then it's not one person's job to make sure the catalog is up to date; everyone is continuously making sure that things are clearly labeled, have tags on them so they can be found, and are easy to share.

Marc: [00:10:40] I think that's one of the key things. The data catalog is very, very important. But it's also getting the right mindset in the company and teaching everyone about data literacy, so that everyone knows the importance of data, and everyone knows where to look for certain things or who to ask for access to something. I think that's very important. I've worked on a couple of projects where getting to the data was extremely hard, and finding the right people to ask for the data was the hardest part of the project. And then the second step was getting access to the data, because very often they had to have agreements with clients in place to be able to access the data, and so on.

[00:11:30] So that's something that, when you're creating a data architecture — when you're creating something you want to use to make decisions based on data — you have to keep in mind right from the start. It's very important to start on these processes early in the project in order not to hit any roadblocks further on. For example, in our case, we were building a portal, and when we wanted to start testing the portal with real data, we found that the agreements were not in place. So we had to wait three months to get the agreements in place to finally have some initial data to work with. And that's something that you don't want in your projects.

Lalitha: [00:12:17] Right. And just on that: when I'm in conversations with clients and I ask them, what is your business roadmap? They're like, boom, here it is. When you ask them, where's your product roadmap? Boom, here it is. What is your technology roadmap? They have it. But when I ask what their data roadmap looks like? Immediately there's silence all over the place, because they don't have one. And that's so fundamental. So I think that speaks to everybody's points here.

Brian: [00:12:45] Related to that — part of what I'm hearing is that it's not just the technology that you have to think of in the strategy. It's very important to have the human element: the oversight, the management, the gatekeepers. All of that is just as important a part as the technology that supports your data strategy. Is that fair to say?

David: [00:13:06] Absolutely. I think that's right on point. It's been an open secret for a long time, I think, in the data science community: there are two really hard problems that come before you even get started on the data science. There's getting access to the data, and then — another piece of the strategy we haven't talked about yet — getting the data cleaned up so that you can actually perform analysis. Those two things involve a lot of work around making sure the right agreements are in place, as Marc pointed out. But also, as we've all discussed, knowing who to ask, knowing who to talk to about getting access — one person may know where the data is, but they may not have the authority to grant you access.

Marc: [00:13:50] That's another point to really take into account. You have to have buy-in from the management for the project. It really has to have a lot of backing from the C-level of the company to make this happen, because it's a cultural change that you're putting in place. If you don't have buy-in from the management, you're not going to get very far.

Lalitha: [00:14:13] Yeah. When people think about implementing this, there are always those who say: hey, we need to hire four data scientists. And the question I then ask is, why? And they say, well, we have a lot of data; we want to know what it is telling us. Now, if that is the reason you're hiring these very expensive data scientists, I think that's a mistake, and I coach them accordingly — because you're not hiring them just to come and tell you what the data is telling you. It is really for you to build that culture across your entire organization, to Marc's point. It is a mind shift. There is a paradigm shift in how you think about the data of your users that you're collecting, how you maintain data privacy and governance, and how you keep all the compliance laws in place — but also, by the same token, how you're actually able to use that data to make your business more efficient.

[00:15:04] So I think that thinking has to change a little bit, around how you drive this digital and business transformation for your organization through data, and how you get everybody to buy into it — so that as an organization you operate as a data-driven business and make decisions based on the data that you have. It's really important to govern that data and use it to make these better decisions to run the company. And it's easier said than done, because we've seen this challenge with a lot of the clients that David and I have worked with, some of them at OpenTeams. We've seen that when we call these things out, a light bulb goes on in their head, and they're like, okay, how do I do that now — before they hire these data scientists, who are very, very expensive.

Brian: [00:16:00] I think that's a good transition, actually. We've talked a little bit about some of the challenges in implementing a data strategy, but you mentioned all these different things where the light bulb comes on for someone attempting to do this. What are some of the challenges that we haven't talked about so far that a company may run into, either as they're trying to formulate their data strategy, or — as they're trying to take a strategy that they've come up with and turn it into reality — the roadblocks they may hit?

Lalitha: [00:16:33] If I were to break this down, I think there are probably three or four components that I would bucketize them into, if you will. The first is the layer of how you collect the data — like I alluded to earlier, GDPR and the various compliance laws that are in place about what data you can and can't collect, and what you can and can't keep. So there needs to be a very conscious decision about consented versus non-consented data, anonymous versus non-anonymous data. As a company, as a business, you do need to start thinking very carefully about the different privacy laws you need to pay attention to, and how you collect the data. The second step in that process becomes where you store the data. I would say four years ago, every client I talked to was just about public cloud, public cloud. Today, the conversation has shifted to multicloud and hybrid cloud, and most of them are talking private cloud. I'm not hearing public cloud as much in the market. So where you store the data becomes extremely important. The third piece is the cleansing of data that I think David talked about: you can have all the data in the world, but if it's not cleansed and properly governed, you're not going to make a lot of sense out of it. So you do have to focus on that. And then finally, it's automation. You cannot hire hundreds of IT people just to manage and map your metadata. You need a strong sense of automation and a framework — and this is where open source communities are extremely strong and bring a lot of wealth. So, going back over it: data collection, data storage on the cloud, cleansing and governance, and then finally automation.

Brian: [00:18:22] Excellent. Anyone have particular thoughts on that? John, any experience with any of those four areas in your work with scientific data?

John: [00:18:32] You know, I want to go back to the point that management has to buy into it, so that you can collect the right data — not just what has been collected in the past, but what you need to collect in the future. With the Internet of Things and all these low-cost sensors available, there's a lot more data available at low cost, but it's not obvious how it's going to be used in the AI and machine learning space. It's an educational process to say we have to have that roadmap that Lalitha referred to for the data collection.

Brian: [00:19:25] Marc or David?

Marc: [00:19:30] I think another point that people need to consider is finding the right scale for the architecture that you want to put in place. I very often see people just running with the biggest system that they can possibly find in the market, which then is way too complex to administer, manage, and maintain — and is also usually very costly. And then, of course, you have the other extreme: basically everyone doing something small, with multiple different systems within a single company, which is also not ideal. So you have to find some kind of balance between those two, and do proper research before you actually head into the whole data strategy project, to make sure that you choose the right tooling and the right architecture for whatever you want to do. For example, big data 10 years ago was something completely different from what it is nowadays. Most companies nowadays don't have big data anymore, because — as Travis Oliphant, for example, mentioned in another session — nowadays you can put what used to be big data on a USB drive. When you're talking big data today, it's petabytes of data. When you do IoT data collection, you can get into this area of big data. But to be honest, most clients don't need the big data stacks to actually handle their data processing.

David: [00:21:19] I think this is more targeted toward data scientists. One of the things, even with people I've worked with directly and with clients, is having a good understanding of what kind of operation you want to perform on your data, how that affects performance, and how that affects what you should be choosing as your execution environment. We had a really large client that was using Spark, for example — using the data frame capability in Spark — and they were having awful, awful performance issues, when really they had a batch-oriented problem.

[00:21:57] So they were using a column-oriented store, but they were accessing everything row by row, which means all sorts of messaging across the network — just a lack of understanding of the implications of that choice, of how the data is structured and arranged in the execution environment. The cluster was absolutely necessary for what they were trying to do, but they weren't using it properly. So having a good understanding of what algorithms you need to choose, and how that affects the data structures and how they're arranged on the cluster, matters a lot. And then, to Marc's point, a lot of folks think they need a cluster when maybe what they need is just one really powerful machine — and they could be executing orders of magnitude faster on one machine versus relatively quite slowly on multiple machines.
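To illustrate the access-pattern point David is making, here is a minimal sketch in pandas (not the client's actual Spark code, and with made-up column names): the row-by-row loop fights the columnar layout, while the vectorized version operates on whole columns at once.

```python
import numpy as np
import pandas as pd

# Illustrative data: 100,000 rows in a column-oriented structure.
df = pd.DataFrame({
    "price": np.random.rand(100_000),
    "qty": np.random.randint(1, 10, size=100_000),
})

# Row-by-row access: each iteration materializes a row object,
# defeating the columnar layout (and, on a cluster, forcing chatter
# across the network for every row).
total_slow = 0.0
for _, row in df.iterrows():
    total_slow += row["price"] * row["qty"]

# Column-oriented access: one vectorized operation over whole columns,
# typically orders of magnitude faster.
total_fast = (df["price"] * df["qty"]).sum()
```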

Brian: [00:22:47] Getting the network latency out of the way.

David: [00:22:52] Yep. [overlap]

Brian: [00:22:53] You’re muted.

Lalitha: [00:22:57] To add on to David's point: some people believe that once you build a data science model, a machine learning model, it's done. But the reality is that it's never done. Some of us who've been in the data business for a while know the challenges that come with it — and, more importantly, the challenge of reproducibility of these machine learning models in different environments. The way a model behaves in your staging environment and in your QA environment is quite different. So the other thing to keep in mind is that it's not one-size-fits-all; it's a continuous process. The learning model that you have built requires consistent maintenance, support, and sustainability that you need to factor in as you get on this journey. [overlap]

David: [00:23:49] What Lalitha just pointed out also requires monitoring. So you need to establish metrics that you're watching, to make sure that the model is continuing to perform as expected.
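As a concrete illustration of David's point, here is a minimal monitoring sketch, assuming a hypothetical stream of (prediction, actual) pairs fed in by your serving layer; a real deployment would push such a metric into a dashboard or alerting system rather than printing.

```python
from collections import deque

WINDOW = 500       # how many recent predictions to judge the model on
THRESHOLD = 0.85   # alert if rolling accuracy drops below this target

recent_hits = deque(maxlen=WINDOW)

def record_outcome(prediction, actual) -> None:
    """Record one prediction/ground-truth pair and check model health."""
    recent_hits.append(prediction == actual)
    if len(recent_hits) == WINDOW:
        accuracy = sum(recent_hits) / WINDOW
        if accuracy < THRESHOLD:
            # In production this would page someone or trigger retraining.
            print(f"ALERT: rolling accuracy {accuracy:.1%} below target")
```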

Brian: [00:24:00] And there's the human communication element there too — an educational one, almost. If you have business leadership that isn't aware that the models aren't one-and-done, that it's a continuous maintenance effort, then they may be caught off guard: wait, I thought we could taper our spend on this. And in reality, no, you can't. So, reaching back to something you said earlier about just throwing money at it — oh, all we need to do is hire some expensive data engineers. One of the phrases that came up when we were discussing ahead of time was "data rich and insight poor," and that ties back to what you were talking about in that context, John: you have this ocean — this lake, ocean, universe — of data, but you don't really know what it says. You don't know how to make good use of it. That seems like another challenge: having the right tooling, the right methodology, the right approach to actually get useful insights out of your data. Thoughts there?

David: [00:25:07] John, you’re on mute.

John: [00:25:13] One of the big things that I think we underestimate is that there's got to be a decision that this data is driving, and you have to have the right business question framed. So it's not "I have a tool and I want the answer." It should be: I have a question — now, what do I need in terms of data and analysis to answer that question? And I think the critical thing is what business decision is going to be impacted. If you're collecting data, it should be changing the way that you're doing your business; otherwise you shouldn't be collecting that data. And unfortunately, people just collect the data because that's checking off the box of what you're supposed to be doing. I don't know — I'd like to hear the other panelists respond to that.

David: [00:26:14] Before OpenTeams, I was with an Internet of Things company, and I think one of our number one challenges with customers was helping them see that it wasn't enough to gather the medical data, or the performance of the device, or its operating parameters over time. You had to have a real understanding of what you were going to do with that information once you had it. There was a lot of prospective thinking, like: well, maybe we could do preventative maintenance, or maybe we could use this data to help the sales team know when to reach out to the customer to replace consumable parts. Those kinds of things are important, but you need to have that as part of the plan from the outset. Otherwise, what folks end up doing is putting a lot of cash into what they hope could turn into a business, without a plan ahead of time — and that puts a lot of pressure on delivering something within a year or two. So they've got a lot of information, but they don't necessarily understand what the information is telling them, or it's just not relevant to a business problem.

Lalitha: [00:27:19] Yeah. When people think about data, one of the things they really need to think about is that we're not just talking about data that's already captured and put away, which they can now go parse and run advanced analytics on. What businesses actually need to react to is real-time data that keeps coming from their customers and their interactions — from their CRM systems, their finance systems, their accounting systems, their blockchain systems. Sometimes they think: oh, I have all this data, I want to go make sense out of it. But that's not really solving the problem they need to solve in order to run their business more efficiently. For example, for me personally, I want to know what's currently happening at the top of my funnel and in my sales pipeline. I want to know how that's impacting propensity to buy, and how that's impacting this quarter. Yes, it's fantastic to look through last year's data, run advanced analytics, and tell me why we failed. But I would rather know that I'm going to fail next quarter. That's where the intelligence behind all of this data-driven decision making comes in — it's going to help me shore up my business, because that's the value of this to me.

[00:28:37] So when people think about data, I find sometimes they're thinking about the data sitting there: what do I do with it? But that's the wrong question. There is continuous data coming into your systems, into your organization. The question is how quickly you are able to process it through your advanced analytics, through your data and AI, and have it tell you what action to take as a business leader or a technology leader — so that you can actually salvage your business and make it more profitable. So it's a slightly different way of thinking: yes, there's the current data that exists, but there's continuous new data coming in, and what matters is your ability to turn that around into insights.

Marc: [00:29:16] To your point, Lalitha, I think you're making some very good arguments there. You know what you want from the data system, right? Because you are both on the technical side and on the business side. But typically, when you go into one of these data projects, those two roles are split: you have the tech team and you have the business team, and they usually don't speak the same language. That's what I often find very difficult — to mix and match, to make sure that the business people know what they actually can get out of the data, and also to make the data scientists and the data engineers understand what the business people are actually talking about and how to find that information in the data, because of course they don't necessarily have the domain knowledge to do this right.

[00:30:10] I think that's the biggest challenge that you have in these projects: finding a way to make the data scientists and data engineers understand what the business wants. And then you have a second problem. I don't know if this is still very common, but I had it in a couple of projects, where basically the business side just came to us and said: okay, we have all this data here, it's all in our data lake or data warehouse — please go and make some sense of it. Find the magic bullets in that data set. Give us predictions on how to do business in the next quarter — without actually telling us how we were supposed to achieve this. That's an impedance mismatch, and it's something that you have to solve very early in these projects as well.

Lalitha: [00:31:04] Yeah, and it's the toughest one. Marc, I think you hit the nail on the head. The majority of my time is really spent trying to be the glue code between the CEO, who wants business results — they're looking at top-line revenue, they're looking at their CapEx rate — and, on the other side, the CTOs and the CIOs, who are asking: well, how do I run this machine so that it's actually producing? Being the glue code between the two of them, so they can actually get on the same page about what's driving the business — the technology driving some aspects of the business, the business driving the technology in other circumstances — and trying to arrive at: okay, this is where I think you need to be going, and guiding them in that direction. So you're absolutely right, it's an extremely challenging spot, because the two are equally right, but what they're looking for is very different. We have to find that sweet spot that's going to help them move their business forward. There's no silver bullet there, like you said, but that's where we spend a lot of our time: making them understand what's important to them.

Brian: [00:32:16] All good? Well, that was some very good discussion on those challenges. Thank you, everyone. At this point, I'd like to shift gears a bit and move on to a still data-related topic, but one that Marc suggested he'd like to talk about, which is event-driven architecture as a paradigm for — I believe it's data collection. Is that right?

Marc: [00:32:40] Well, it's actually something that you can use for many things, right? I just came across this — for me it was new, a new way of thinking about architecture. When you actually dive into it, you realize that, okay, this is probably just a new term for something that's been around for a longer time, because if you've been in the IoT space, for example, then you've probably been using MQTT or similar kinds of architectures for a long time already. But what's interesting is that this architecture is now being developed on a larger scale, and it's also being used in areas outside the IoT space. I think it's a very interesting way of thinking about how to put together different systems. Maybe I should just give a short intro into what event-driven architecture actually is.

Brian: [00:33:38] Yeah, please.

Marc: [00:33:39] It's an architecture that's completely based around events, so everything is asynchronous. An event happens, and then you put that event into some queue. You can have subscribers to that queue that get informed about this new event, and then they can do something with it. So you have this publish/subscribe kind of approach, and a queue approach, which ideally makes sure that nothing gets lost and everything gets processed. By doing this, you can completely isolate the different components in your architecture, and the components can be implemented in different languages, in different systems, across networks and clusters. It's highly scalable — that's also why many larger companies use it; Amazon, for example, is using this for their whole shopping system. By separating all these different components, you can easily scale up, because you just have to put more workers listening on a particular queue; each one takes an event from the queue, processes it, and then pushes the result into some other queue to continue processing. So it's a very nice way of decoupling components, whereas the standard approach is typically to have a workflow engine — let's say Apache Airflow — take care of and manage the whole process from beginning to end. With event-driven architecture, this is different: you don't have that central entity anymore.

Marc: [00:35:15] Basically, every single component in the system takes care of whatever that component is supposed to do. And then, of course, you have to have some observability tooling around all this, to make sure that you don't lose any events or lose processing in your architecture. But I think it's a very interesting approach, and I'm pretty sure it's going to be big in the next couple of years — it's going to continue growing. The reason I got excited about this was that I was at an online conference recently where they talked about it. There's a standard called AsyncAPI built around event-driven architecture, which is similar to OpenAPI — which you probably all know from the REST or GraphQL kind of approach of doing things, which is synchronous. AsyncAPI basically takes that and adds this extra approach of doing things asynchronously, much like what you see in Python nowadays, where we used to only be able to do synchronous processing, and now we have the async keyword and can do asynchronous processing in Python as well. What I found a bit strange, though, is that Python is not really big in that community yet. And I would like to know your thoughts about this. Have you heard about this architecture before? Maybe even used it?
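For a feel of the queue-based decoupling Marc describes, here is a minimal in-process sketch using only Python's standard asyncio library; the event shape and worker name are illustrative, and a real deployment would put a broker (MQTT, AMQP, Kafka, and the like) between the components instead of a local queue.

```python
import asyncio

async def publisher(queue: asyncio.Queue) -> None:
    # A component emits events without knowing who will consume them.
    for order_id in range(3):
        await queue.put({"event": "order_placed", "order_id": order_id})
    await queue.put(None)  # sentinel: no more events

async def worker(name: str, queue: asyncio.Queue) -> None:
    # Subscribers pull events off the queue; scaling out just means
    # starting more workers listening on the same queue.
    while True:
        event = await queue.get()
        if event is None:
            break
        print(f"{name} processing {event}")

async def main() -> None:
    queue: asyncio.Queue = asyncio.Queue()
    await asyncio.gather(publisher(queue), worker("worker-1", queue))

asyncio.run(main())
```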

Lalitha: [00:36:49] It just reminds me, Marc, of a subscriber-and-publisher kind of model. But this was like five, seven years ago, when we had MQSeries — I don't know if any of you remember WebSphere MQ. IBM had a product, WebSphere, that had MQSeries, which was used as a message queue or a message broker. So we had this concept of subscriber and publisher. It sounds similar, but not the same, when you talk about AsyncAPI — maybe it's matured from where it was eight years ago to where it is headed now. But it sounds interesting. I haven't experimented with it, but it sounds like where I would say things would go, because when you tie this together with a microservices architecture, through a very stable API gateway of some sort, maybe this can actually make things scale up and perform faster, for sure.

David: [00:37:46] I'm just looking at the website for AsyncAPI real quick, and what's standing out to me is that it sounds like a similar situation to REST when Swagger was first introduced: REST existed, it was available, people were using it. The problem was discoverability. You needed a way to document what you provide, so that people could build tooling around consumption, as well as have an entry point into the documentation system. So I'm looking at this AsyncAPI description, and it looks a lot like OpenAPI, in the sense that it's saying: we have these different things you could subscribe to, here's the description of those things, and here's what we publish. What's awesome about that to me is that you could traverse it. Airflow and such are all about knowing exactly what you want to process and how you want the nodes to connect; the problem is that it doesn't make it easy to do ad hoc work, to build arbitrary workflow connections. So yeah, I think this is exciting, in that you have this combination of discoverability with ease of extension of the flow of information. Interesting — I hadn't seen it before. Thank you for introducing it. [overlap]

Marc: [00:39:19] John, you've probably been working with that.

John: [00:39:19] Yeah. Event-driven has worked very well for a lot of other things, with the Internet of Things and stuff like that, so it just makes sense that the paradigm can be applied to our programming as well. A lot of it is just attractive.

Brian: [00:39:46] But just to check my understanding: event-driven, within a single computer, is well established — many languages have event-driven paradigms on a small scale. One of the main innovations here is bringing it to a distributed, large-scale context?

Marc: [00:40:03] Yes, exactly. It's basically taking it out of the usual context where you use it. For example, if you're using Windows on your desktop — Windows is completely event driven. And the same on Linux platforms: if you take GTK or KDE, they're completely event driven. Likewise, for the IoT kind of stack, you typically have all these devices sending events to, let's say, MQTT — it's a protocol, but it sends the data to a broker, which then makes these events available to other subscribers so that they can process that data. It's a very easy way for these small devices to push data into a system that then processes the data to create dashboards, for example, without having to put all the logic on the small devices. And now this AsyncAPI idea basically takes this to the next step, so you can have different endpoints that connect to those queues — publish or subscribe to those queues, get events or push events into those queues.

Marc: [00:41:25] And then you can basically build your whole architecture around this. It's very much like what we had a couple of years ago with these enterprise bus systems, like what Lalitha mentioned. In the banking world you have that a lot: IBM MQSeries was used a lot, and nowadays it's AMQP that's being used a lot, along with various other implementations — all with the same intent. Especially in banking, you always have different systems that have to work together, and it's typically hard to do direct interfacing via APIs; that often has to do with the banking world not being very open, so they don't necessarily publish the documentation for their APIs. But they do provide integrations into those queuing systems. So you can use those systems to take the systems you have — which are usually black boxes — and still integrate them with other systems, which is a big thing in banking. And most likely it's also a big thing in other areas.
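As a concrete taste of the broker-based flavor Marc describes, here is a minimal sketch using the paho-mqtt client (written against its classic 1.x API); the broker host and topic name are illustrative, and it assumes a broker such as Mosquitto is reachable locally.

```python
import paho.mqtt.client as mqtt

BROKER = "localhost"            # illustrative: assumes a local broker
TOPIC = "sensors/temperature"   # illustrative topic name

def on_message(client, userdata, message):
    # The subscriber reacts to events with no coupling to the publisher.
    print(f"{message.topic}: {message.payload.decode()}")

# Subscriber side: register a callback and listen in the background.
subscriber = mqtt.Client()
subscriber.on_message = on_message
subscriber.connect(BROKER)
subscriber.subscribe(TOPIC)
subscriber.loop_start()

# Publisher side: a "device" pushes an event into the broker;
# the broker fans it out to whoever has subscribed.
publisher = mqtt.Client()
publisher.connect(BROKER)
publisher.publish(TOPIC, "21.5")
```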

Lalitha: [00:42:45] Have you seen this applied, Marc — I can see this kind of thing playing out really big in e-commerce platforms and marketplaces, because there are a lot more event-driven activities and applications interacting there. So maybe that's why you mentioned Amazon earlier, right?

Marc: [00:43:04] Right.

Lalitha: [00:43:06] Cool.

Brian: [00:43:07] Excellent. Thank you for suggesting that topic, Marc. It definitely will be interesting to see whether the event-driven architecture paradigm expands into different areas, to where it's more of a common thing. We did want to touch on AI for at least part of the discussion, and one of the topics we had discussed beforehand — you had suggested it, Lalitha — was considering the importance of information architecture, IA, as a key underpinning of AI. Do you want to talk a bit more about what you have in mind on that topic?

Lalitha: [00:43:50] Yeah, I think we hit on that a little bit when we talked about data, data strategy, and data platforms. But what I have seen is that people have this misunderstanding that AI is about all these new models that you're bringing in and creating, and that it's going to eliminate a lot of your IT staff — it's going to change things. And they have a shocked look on their face when I tell them that's really not the point. The point is that when you bring the right information architecture in to back your AI, what it actually allows you to do — and where we've seen the most success with clients — is improve the overall efficiency of running the business. We've seen AI take over some things like basic accounting ledger updates in their blockchain circuit; I've seen cases where insurance companies have been able to automate their CRM and their claims processing. There's a huge insurance company that we worked with that took a predictive model and was able to say: okay, these are the people we want to go after. I mean, the company is large, with like 50,000 salespeople and digital sellers. They wanted to know which clients to go after — which clients have the highest propensity to buy.

Lalitha: [00:45:21] So, in some ways the AI is a little bit boring, right? Because it's not actually doing something that you didn't anticipate. Where we've seen the most value, with the right information architecture behind it, is that it has actually helped these companies elevate their game. I mean, who wants to do these mundane jobs anyway? That's low-impact work. So they've made those tasks automatable, predictable, actionable, and they've been able to take those people and apply them to high-impact work. That has been the biggest win for AI, in my opinion, in the last five, six years, where we've been constantly hearing "we want to be an AI-driven company." Every company wants to do that and to get on the AI ladder, on this AI journey. But the most benefit they've all obtained is through this operational efficiency — making their development teams a lot more optimized for success, automating some of these tasks for business efficiency, and repurposing their staff toward something that actually delivers business results. So that was really the discussion point that I know some of us were talking about earlier.

David: [00:46:34] I really like that description. The way I've been thinking about it: you've seen this explosion in the past month and a half with the Stable Diffusion models that were released, and the different large language models that have come out, and there's been a lot of concern voiced about the automation this enables. I'm in agreement with you, Lalitha. When I look at it, it reminds me of what we saw around 1979 when VisiCalc was introduced: a lot of worry and concern that it was going to put accountants out of business. Instead, there was an explosion, primarily because it enabled previously impossible things — it alleviated the burden of all the tasks that were machine-runnable, leaving the more creative and higher-impact applications to the person. I think AI looks a lot like that: you end up having this virtual set of assistants, a little army of highly capable interns. You can give them high-level directions and they go off and do it, leaving you to do the things that are high impact and of strategic value.

Lalitha: [00:47:46] Yeah.

David: [00:47:52] One thing I wanted to mention today is that if folks haven't started thinking about how to pair large language models with the work that they're doing, you're behind the ball. It's time to start really thinking about that. I've been hearing rumblings that when GPT-4 comes out sometime early next year, it's going to blow folks' socks off. So it's really time to start thinking about how large language models could be used to help automate your environments and better inform the work that you're doing in your different areas. There's a lot of opportunity if you can figure out ways of better semantically labeling your data — to your point about information architecture — so that you can better pair models with, and train models on, your data and your problem sets, and work at a higher level.

Brian: [00:48:44] I know this is probably old AI compared to GPT-4 and such, but even just the voice recognition and text correction on my smartphone — being able to just kind of get close, and then it's like, oh yeah, that's what I wanted. I mean, sometimes it's wrong, but most of the time it is genuinely helpful and it speeds things up. And we're seeing that happen on a larger and broader scale — like GitHub Copilot in the software development space — with so many different interesting new products and applications being developed to do exactly that: fill those annoying little cracks of work that you fall into, with this AI assistance.

David: [00:49:30] I was using Tabnine earlier today and it wrote five or six lines of Python code for me. It was really just the body of an initializer, but I didn't have to type it. It was fantastic.

Lalitha: [00:49:44] It feels good when somebody can write my code, for sure.


Brian: [00:49:50] Any specific thoughts there, Marc or John? Fair enough. So, you mentioned Stable Diffusion, and obviously it's just been amazing to see some of the video generation that's coming out — that is awesome. But then there are also challenges around the veracity of the video you see. Can you trust the pictures you see online? Can you trust the video? So there are questions of trust in AI from a number of angles. One is: is what we're seeing real, or has it been fabricated? Then there's the other angle of AI-driven decision making — where maybe someone is denied a mortgage or has a decision made against them — and the explainability of the AI models. So there's this sphere of trust questions around AI. What thoughts there? Can we measure the various aspects of trust, or the reliability, or the visibility of the models and their outputs? What sorts of considerations are there?

Marc: [00:51:12] I think that's a very important question to ask, and a very important point to discuss — not only now, but also in the coming years — because, like you say, models are starting to actually make decisions: for example, whether you get a mortgage or not, or whether you can buy a house or move into some local area. I think those are all things that have to be discussed on the ethical side; it's not so much a technical question. On the technical side, there are some tools for explainability — you have LIME or SHAP or Skater or ELI5, these kinds of tools, and more are coming out. But there's still a long way to go to actually make sure that you can understand why a model is coming to a certain decision. And it's going to become even more interesting when you suddenly put models into, for example, self-driving cars, where they have to make ethical decisions as well.

Marc: [00:52:27] For example: am I going to hit this person there on the street, or am I going to drive into that tree next to it, where I'm pretty sure the driver is going to be dead afterwards? Those kinds of questions. Eventually we're going to have courts deal with this as well, and the courts are going to ask: who is responsible? Is it the driver? Is it the car manufacturer? Is it the programmer who was in charge of building that model? Is it the supplier of the training data? Maybe it's the model itself — maybe we need to make models themselves liable for something, and then, of course, you have to associate some ownership with that model. Or maybe no one is responsible — I think that would be the worst outcome of that particular discussion. So it's a very important discussion to have, and it spans not only the technical side but also the political and the ethical side of things.
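On the technical side Marc mentions, here is a minimal sketch of model explainability with SHAP, one of the tools he names; the dataset and model are illustrative stand-ins, not from any project discussed here.

```python
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

# Illustrative data and model standing in for a real decision system.
X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# TreeExplainer attributes each individual prediction to the input
# features, showing which ones pushed the prediction up or down.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Global view: which features drive the model's output overall.
shap.summary_plot(shap_values, X)
```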

Lalitha: [00:53:29] A key component to think about there — like we talked about with biased AI models — there was a use case we worked on with a client where their model would not allow cops to go to a certain neighborhood because, the way the model was designed, it deemed the neighborhood unsafe. And this is exactly Marc's point about having ethical models, so that we don't build unconscious bias into our models when we design them. It underpins the need to have a very diverse and inclusive team, culture, and talent within your organization. It becomes even more important, because the more diverse and inclusive your teams are, the more folks you have paying attention to how the model is being developed, and you avoid the possibility of building these biased AI models, which can lead to really bad consequences. Especially in health care — I've seen cases where, obviously, we have AI models in production that are being used by doctors and physicians to make medical decisions on your treatment plan.

[00:54:49] And I mean, those can't afford to be biased; they have to be a lot more open. So in some ways, when you think about open source and some of the community work that we do — community-backed open source — there is tremendous potential here, because one of the beauties of open source libraries and projects goes back to the hypothesis of open source software: you have a lot of people looking at this from various mindsets, from various backgrounds, so you have fewer defects. I'm not saying our defect backlogs are low or anything, but you have a lot more people looking at the problem. And when we all stand for ethical AI for good — there's a lot of motion out there in the industry about what the right AI is — I think this really helps, and I'm really happy to see the work in the open source communities around making AI actually unbiased.

Brian: [00:55:52] Any other thoughts there? Nothing in particular? All right. Well, we're coming up already on the end of our time slot, so I think we'll wrap it there. I didn't see any questions come into the chat, so nothing to ask on that front. Thank you so much to our four panelists for participating in this roundtable — really appreciate your time and your contributions. Thank you to everyone in the audience for attending. Again, it was a great conversation about data, about data strategy, and about event-driven architecture — I particularly enjoyed that last one; I hadn't been familiar with it going larger scale like that. So yeah, we really appreciate everyone's time. The recording of this event will be available soon — I don't know exactly when, but it'll be coming — and information on the recording will be distributed to all participants once it's available.

[00:56:51] Our next Tech Shares event is in a week and a half, November 18th at 11 a.m. US Eastern Time. That event is going to be focused on the energy industry, and it's entitled Boosting Renewable Energy with AI and Machine Learning. We look forward to having you participate in that. Finally, as I mentioned earlier, if anyone out there is interested in speaking or being a roundtable panelist at future Tech Shares events, please reach out to us at tech-shares.com/speakers. Thanks very much to our panelists and our audience, and enjoy the rest of your day. Have a good one.

Lalitha: [00:57:28] Thanks, Brian. 

Marc: [00:57:30] Thank you. Bye bye.