The March Of Ops: Dev To Data And Beyond With Steve Wallo, Ryan Yackel, And Brian Singer
Ops, ops, ops! The march of ops! Why is this term everywhere now? Well, ops is where the work is, really. And that’s a moving target. Thanks to DevOps, we got better at building. Thanks to DataOps, we got better at managing data. Now there’s MLOps and AIOps? That’s a good sign!
[00:00:40] Eric: All-star guests for you now, folks, for a very fun show all about the March of Ops. What is ops? I keep hearing about ops. There’s DataOps, DevOps, MLOps, AIOps, and all this ops stuff. Being a thinking type of person, I sat down and thought about it. I’m like, “What is ops?” Ops is the work. That’s the stuff that you’ve got to do every day.
Your ops people are the operations people who are doing things, and that’s changing. The work is changing. Why? It’s primarily because of automation, plus artificial intelligence, integration, and lots of different things. Low-code and no-code, for example, are all over the place these days. They have always been around. Low-code and no-code are not new, but they are much more prevalent now, in part because we have this whole confluence of amazing things happening on the innovation scale.
We are going to be talking to several guests, including Steve Wallo of a company called Vcinity. We have got Ryan Yackel and also Brian Singer lined up, and we are going to talk about ops. What are ops? Why are they important? Why are ops changing? It’s all this fun stuff. It’s changing because of automation primarily. Let’s go ahead and dive right in, and I will throw it over first to Brian Singer from a company called Nobl9, where you do SLOs.
SLA means Service-Level Agreement. Everyone in the tech and business world understands what an SLA is. It says, “You better do this, or we are going to penalize you.” SLO is a little bit different, which is fascinating. Let me do an opening statement for everyone. We’ll throw it out to Brian first. Tell us about your company and what you are doing.
[00:02:14] Brian: Thanks, Eric. I appreciate you having me on. Nobl9 provides a platform for companies to create, set, manage, and use Service-Level Objectives or SLOs as part of their overall ops strategy. The obvious question is, “What is an SLO?” I like to describe it as the point of ops. When we do ops, at the end of the day, the goal is to provide a service and to have that service running in a certain fashion for a certain amount of time.
All too often, we don’t ask ourselves the question, “What is that goal? How should the service be running? What is the customer expectation?” SLAs are a terrible way to answer that because SLAs don’t tell you anything about whether the customer is happy. They are a penalty if you breach some arbitrary goal. SLOs try to answer the question of how we need to operate this service to have happy customers and users.
[00:03:18] Eric: We are going to talk about that more. Let’s go around the room. We have got Steve Wallo from Vcinity. It’s a very interesting company. Tell us about yourself and what you are doing in the ops space.
[00:03:28] Steve: Thanks, Eric, again for having me on. What Vcinity does is allow you to reach data in real time that is not local to you. If you think about ops, maybe it’s business intelligence, or you put some AI or ML behind that. One of the things that kills that process is the fact that the information is usually not where it needs to be in order to do something with it.
People tend to take copies and move copies around to all of these different places, which hurts efficiency. What Vcinity does is remove that burden. We found a way to make the WAN perform like the local area network. What that means to the users and the operations is I can now reach out from my business applications, whether they’re in the cloud or something like that.
I can reach this remote information that has been presented to me from my operation structure in real time, where I don’t have to say, “I asked for it over here. It’s going to take me an hour to get it.” Now I can reach it instantly, and I don’t have to move things around. If you think about what that means from a business perspective, now I’m getting real-time insights.
I’m getting information that could be distributed geographically, and I can fuse it all together. I can use these tools anywhere and not have to make copies of data and move them around. It’s a different approach, but it changes 1) the way that you develop the code and 2) the way that you use it, because it’s a whole different strategy of where my data is and how I can get information faster.
[00:04:53] Eric: That is a revolutionary approach, I’m not going to lie, because you think about how much time and effort is spent trying to tackle the proximity question. We talked about that in an earlier show. We have a new show called Software in Motion. I recommend people check it out. It’s very interesting.
We talked about that very issue of data locality. Where is the data? What do you have to do to process it or to get it? You look at Snowflake and what they did, which was very clever: separating compute and storage. That was a pretty big deal. That changed things. We have all these different developments going on. We’ll get back to what Vcinity is doing because it is a seriously disruptive approach. Last but not least, we have Ryan Yackel out there from Databand. Tell us a bit about yourself and then what you folks are working on.
[00:05:36] Ryan: Thanks for having me, Eric. It’s good to hang out with you, Brian, and Steve. I’m the CMO at Databand. We got acquired by IBM, and we’ll tell you a little bit about why this is such an exciting space that we are in. It’s a big deal because it’s IBM. It’s exciting. What we do is data observability. The way that you can think about it specifically from the ops perspective is that traditionally, when people think about observability, they think about application performance monitoring.
Those are tools like Instana, New Relic, and Datadog. They are all about monitoring your production applications, microservices, cloud infrastructure, and all those things. What we have seen in the past few years or so is that a lot of software engineers are taking their skill sets and becoming data engineers.
They are taking all the skills they have with Python, CI/CD, and even DevOps. They are taking all those frameworks and applying them to data, specifically data pipelines. They are taking data from a source and pumping it through a pipeline, maybe through a processing engine like Spark as well. It ends up in a warehouse like Snowflake, which you mentioned, or a lakehouse like Databricks, and then it reaches your end consumer.
The problem is that if that whole process worked perfectly, that would be great, but it does not. Software delivery pipelines break, data pipelines break, and the data within those pipelines becomes very ambiguous. You don’t know if the data you are sending to your consumers is accurate. What we do is help you detect data incidents earlier and resolve them faster so you can deliver trustworthy data to the end consumers in your company.
[00:07:18] Eric: That’s great stuff. I love the focus on data pipelines, and the old expression when you talk about prevention is, “An ounce of prevention is worth a pound of cure.” There’s a better cliché, “A stitch in time saves nine.” There’s an old one. What they are talking about is if you get a hole in your sweater and fix it right away, you can do a stitch in time to solve that problem. If you wait, then you are going to have ten stitches to do, and it’s a very unpleasant process.
Ryan, you are getting me excited because you are focused on something which is a key driver for me and for a lot of these shows, and that is morale. Morale goes down when you are doing stupid things that make you upset because someone didn’t fix something, we don’t have the right tool, it doesn’t work properly, or whatever. That’s a drag. It’s going to bring your business down. It’s going to bring everyone down.
[00:08:14] Ryan: Talk to a software developer. I come from the software test automation space, and you ask developers, “What do you want to do?” They want to build code, applications, and software. They want to fix bugs in production. They don’t want to spend all their time testing stuff. It’s the same thing with data engineers.
They are responsible for 80% to 90% of your data workflows and a large amount of data. They don’t want to spend half their time maintaining broken pipelines, broken data sets, schema changes, random null records, and all these other things that go wrong. They are constantly firefighting, and we want to help them resolve that.
You can do the best coding and testing you can, but like with APM, you need something to catch the issues you may not be aware of. We do things like anomaly detection around that so we can tell you right away, “You didn’t even know this was an issue. There’s a broken pipeline. There’s a spike in data reads that you weren’t expecting,” and so on.
This is great because these are leading indicators, and maybe I will throw this over to Brian to talk about. Observability focuses on what we could call leading indicators. Say a pipe goes down in your house. We had a deep freeze, and I was lying in bed and heard something, and I’m like, “That doesn’t sound good.” It was a leading indicator that the pipe had broken. It froze. I looked out the front door, and there was water spraying onto the ice sheets in front of my house.
That was an unpleasant situation. If there had been an earlier leading indicator, I could have prevented that from happening and saved myself a lot of trouble. In the world of tech, data, and systems, some people enjoy troubleshooting. It can be fun. When you get it right, you’re like, “I solved the problem,” but if you are doing it all day, that’s misery.
[00:10:03] Brian: When you talk about morale and the data we have seen, one of the biggest issues for ops folks and engineers is the number of pages they get during the day and night, and pager fatigue overall. Anything that we can do to reduce the number of false alarms and pages that don’t lead to actual issues or aren’t actionable helps morale and employee retention, because it is hard to hire good ops people, SREs, etc.
If you thought about ops several years ago, it was more of a classic system administrator skill set, like, “Let’s stand up the VM, patch the VM, and make sure that it’s running.” With automation, it’s much more of a software engineering skill set where you are building platforms that are software. If you have systems that are constantly bugging and paging engineers, and they are not able to do anything about it, you are going to start to suffer some pretty serious employee churn.
[00:11:08] Eric: Employee churn is painful, especially in ops, because those people are doing the ops. When they leave, you are like, “That’s not good at all.” I’ll throw it over to Steve to comment. What do you think, Steve?
[00:11:22] Steve: You are spot on. That is one of the biggest problems. You mentioned data overload. I used to do some work for the government on fighter jets. It got to the point where the pilot could only hold so much information to make a decision, so the jet had to do it for them. That’s where things like AI and machine learning are helping that process.
One of the biggest challenges with that is that I have so much data, and it’s everywhere. If I had a way to look at all the data at one time, I could do a better job of filtering out stuff that isn’t important. Maybe I’m looking at ten locations. In the past, I might have said, “Here’s the data for this location,” filtered it to what I thought was important, and moved it back so I could do operations on it.
If I could look at the entire data set, I might find things in common that I never saw before because I had a very myopic view. If I can see that entire data set, which is what we enable as a company, then all of a sudden, I can create an SLO. It’s an advance in the pipeline that allows me to skip steps but get better results in the long run. It’s all about getting information in the time that you need to make decisions.
[00:12:36] Eric: That’s good stuff. I will throw it back over to Ryan, where this thread started. Knowing what to watch for is important. That’s what’s very cool about the world we are in now. I credit the open source community and a lot of the big data vendors, the early guys who went out there and forged this new territory. We can see so much more than we used to be able to see. That observability gives you lead time, so you get a leading indicator instead of something that smacks you in the side of the head.
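To make the anomaly-detection idea concrete, here is a minimal sketch of one common approach: a z-score test on a pipeline metric such as daily record reads. This is an illustrative assumption, not Databand’s actual method; the function name `is_anomalous`, the 3.0 threshold, and the sample numbers are hypothetical.

```python
from statistics import mean, stdev

def is_anomalous(history, latest, threshold=3.0):
    """Flag `latest` if it sits more than `threshold` standard deviations
    from the mean of `history` (a simple z-score test)."""
    if len(history) < 2:
        return False  # not enough data points to estimate spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return latest != mu  # perfectly flat history: any change is unusual
    return abs(latest - mu) / sigma > threshold

# Hypothetical daily read counts for one pipeline over the past week
daily_reads = [10_120, 9_980, 10_050, 10_210, 9_940, 10_070, 10_010]
print(is_anomalous(daily_reads, 10_100))  # False: within normal variation
print(is_anomalous(daily_reads, 48_000))  # True: the unexpected spike
```

Real observability tools typically account for seasonality and trends; a fixed z-score is only the simplest version of the idea.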
[00:13:06] Ryan: If you go to our website right now, we say, “No more surprises. Deliver trustworthy data.” We want to remove all the surprises you don’t know about. Brian was talking about platforms as software. That’s what’s going on in the data space too. In my world, the people we talk to are data engineers, platform teams, and data science teams. It used to be that everyone was a software company. That was the cool thing everyone used to say, “Everyone’s a software company now.” Now that’s table stakes. You have to be a software company, and it’s like, “Yeah.” It’s common.
In our space, everybody is a data company now. Everybody is asking, “How do you harness this data? How do you harness all the power of the data?” If you are feeding ML and AI pipelines or a business dashboard that you consume to make financial decisions about your revenue, and that data is wrong, that’s a big problem, because everything is being looked at through the lens of a trusted data model. If that trusted data model doesn’t hold up, your analyst or data science team has to go back to your engineering team. It’s very similar to a tester who found something in staging or production. They are going to have to tell them, “Go fix this. Why is this happening? Why does it keep happening?”
[00:14:23] Eric: That’s the kicker. I see you nodding your head over there, Steve. It’s why this keeps happening. That’s what destroys morale when it’s the same problem again and again. You cannot get upstream. You can’t get someone else to fix it for you, and you don’t have the authority to fix it. That’s when you are checking out the help wanted ads on your personal laptop.
[00:14:44] Steve: You’re right. Ryan, you brought up a good point. It’s the wrong data. From what we look at, old data can be just as much of a detriment. You are spot on. If you are working with the wrong information or old information, that’s not only going to cause a morale problem, but you might make decisions that affect your business in an unfortunate way.
[00:15:09] Eric: When you bring automation into the picture, too, it gets exponentially more dangerous. I will throw that over to Brian to comment on.
[00:15:18] Brian: Automation is a double-edged sword because it lets us move a lot faster. It gives us developer velocity and all that, but it creates a whole new class of problems. One of them, which you are alluding to in terms of who’s responsible upstream, is accountability. If you move to a GitOps style of deployment, where you are continuously deploying your software and using microservices, and it’s not one big release but teams are all releasing independently, then if something breaks, it’s very hard to understand whether it’s something that you did or something that happened upstream. That’s the double-edged sword of automation. Now that we are moving so quickly, we need different and better instrumentation to be able to understand what’s happening.
You are seeing a move toward that with some of the observability techniques that have come to bear in the last few years, like distributed tracing. That creates another challenge, which is data overload. There’s so much data about what’s happening in our systems, and the question is how you parse that and figure out what’s important.
[00:17:44] Eric: We are talking all things ops with a handful of pretty smart guys on the show. We have got Steve Wallo from Vcinity, Ryan Yackel from Databand.ai, and Brian Singer of Nobl9. Brian, I will throw it over to you to start off this segment. I love this concept of service-level objectives. Tell us again about how that maps into an SLA. SLAs can be relatively arbitrary.
There’s a reason we have them because we want to make sure people are doing what they are supposed to do. I get that, but how do you get there? How do you make that determination, especially in this new world where we have all this observability, new systems, new data, and endpoints? It’s gotten wildly more complex out there. The modern data stack we talk about is another layer of complexity too. Walk us through how your focus can help any one of these domains.
[00:18:44] Brian: What’s cool about service-level objectives is that there’s no financial penalty associated with them. We are only worried about customer happiness, so we can be honest about the goals we are trying to hit. The goal can be either stringent or relaxed. It depends on what the use case is and what the customer’s expectation is.
I like to use the example of your mail app. If you are opening it in a browser, you have an expectation about how fast it’s going to load on a cold start. Maybe if it loads within the first couple of seconds, you are happy with that. We could create an objective that says, “On a cold start, load this page within 2 seconds 99% of the time,” and now you have an SLO that you are tracking.
The cool thing about that 99% of the time is that 1% of the time, it is acceptable not to load within two seconds. We refer to that as an error budget. An error budget is how much unreliability we are willing to accept in this service. That’s very important because the more error budget we have to spend, the more we can do things like release new features, run experiments and tests, and so on.
You have that example of opening the mail app, and maybe that’s okay, but if you have a service that receives the mail, you don’t want the mail to get dropped on the floor. For that service, you might say, “99.99% of the time, which is four 9s, we are going to receive our mail,” or maybe it’s six 9s. You can start to set different goals and objectives depending on what the customer’s expectation is and what’s meaningful. That’s the power of service-level objectives.
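The error-budget arithmetic Brian describes can be sketched in a few lines. This is a minimal illustration, assuming an event-based SLO in which each cold-start page load counts as good if it finishes within 2 seconds; the function names and load counts are hypothetical, not taken from Nobl9’s product.

```python
from fractions import Fraction

def allowed_bad_events(target: float, total_events: int) -> Fraction:
    """Error budget as a count of events allowed to miss the objective.
    The target is converted via its decimal string so 0.99 is exactly 99/100."""
    return (1 - Fraction(str(target))) * total_events

def budget_remaining(target: float, good_events: int, total_events: int) -> float:
    """Share of the error budget left: 1.0 = untouched, <= 0 = exhausted.
    Assumes target < 1 (a 100% target leaves no budget to divide by)."""
    allowed = allowed_bad_events(target, total_events)
    actual_bad = total_events - good_events
    return float((allowed - actual_bad) / allowed)

# Hypothetical week of cold-start loads against "under 2 s, 99% of the time"
total_loads = 100_000
fast_loads = 99_400  # loads that finished within 2 seconds
print(allowed_bad_events(0.99, total_loads))            # 1000
print(budget_remaining(0.99, fast_loads, total_loads))  # 0.4
```

With 100,000 loads, a 99% target allows 1,000 slow loads; 600 slow loads leaves 40% of the budget to spend on releases and experiments.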
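To see how quickly the budget shrinks as nines are added, the sketch below converts an N-nines target into allowed failures per million events and allowed downtime over a 30-day window. The helper names and the 30-day window are illustrative assumptions.

```python
from fractions import Fraction

def allowed_failures(nines: int, events: int) -> Fraction:
    """Events that may fail under an N-nines target (e.g. 4 -> 99.99%)."""
    return events * Fraction(1, 10 ** nines)

def allowed_downtime_seconds(nines: int, window_days: int = 30) -> float:
    """Seconds of unavailability permitted over the window."""
    return float(window_days * 24 * 3600 * Fraction(1, 10 ** nines))

for n in (2, 4, 6):
    print(n, allowed_failures(n, 1_000_000), round(allowed_downtime_seconds(n), 2))
# 2 10000 25920.0
# 4 100 259.2
# 6 1 2.59
```

Going from four 9s to six 9s cuts the allowance from 100 lost messages per million to 1, which is why the target should follow what customers actually need.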
[00:20:32] Eric: Did you guys come up with this? Is that your concept?
[00:20:35] Brian: I wish I could say that we did. It’s something that’s been around for a while. They became very popular inside Google, probably several years ago. They started to be used by the site reliability engineering teams within Google. Within that organization, they are used everywhere, and they have caught on since then. You would be surprised how many modern DevOps organizations now rely on service-level objectives to set their reliability goals when operating services.
[00:21:03] Eric: I will throw it over to Ryan to comment on this. We talked about how you don’t want to have too many alerts. You don’t want to have too few alerts. You always want something in between. Increasingly, the goal is to find intelligent ways to aggregate these alerts into something that means something. I will throw out some historical context here. Many years ago, I had this super brilliant guy, Zohar Gilad from Precise, on the show. We were talking about troubleshooting, which is a big part of what SREs do. A big part of what IT people do is troubleshooting. I took a detailed briefing from them. They were the top-notch solution back then for that kind of thing.
Even still, you had to be freaking smart and knowledgeable to use this technology because all you were doing was looking at histograms and things. You are like, “CPU usage went up, and the network slowed down.” It’s not readily apparent, looking at the screen, what the heck has happened. You have to know the underpinnings. That challenge remains the same today. Even though we are simplifying things and getting closer and closer, the environment keeps getting more and more complex. What do you think about that as a general dynamic in the industry? How do you help folks deal with that?
[00:22:22] Ryan: That’s what we deal with at all our customers. You mentioned the modern data stack. If you search for the modern data stack on Google Images, you’ll find tens or hundreds of different logos in all these different little places. The market for the modern data stack is insane. It seems like a problem produced by marketers, and I know a lot of these images are produced by marketers. The reality is this. Our customers have hundreds to tens of thousands of data pipelines sourcing from tens to hundreds of different third-party sources.
It’s very complex and fast, and that’s at the pipeline level. Think of the pipeline as a train taking cargo from one place to another. Along that path, you have to see how fast that train is going. If it doesn’t arrive at a certain time, that’s a problem. If the cargo on the train is completely wrong, or new cargo was added and you didn’t know about it, that’s a problem.
Lineage comes into play when the train drops its cargo off someplace, or maybe switches to a different track. If it delivers the wrong data to the wrong person who’s going to consume it, that’s a problem. You are right about the complexity these companies are taking on, and they are buying software that they may not even be using. The question is, what are the critical pipelines they want to be able to monitor?
What we are able to do is put intelligent alerting around their critical pipelines. As Brian said, you don’t want to be woken up every single day about all these random pipelines. We can alert you on your most critical pipelines, at the right times, to the right person, with different severities in those alerts, so you can say, “Only notify me if it hits a certain anomaly threshold.”
We are able to do those types of things, which remove a lot of the noise but also make sure that you know exactly where those critical pipelines are in case they break. One of our customers told us that they had a critical pipeline that was reading from a database. They wouldn’t have had any idea that the database went down if we hadn’t been monitoring that pipeline. We were able to tell them about something they didn’t even know about by monitoring all these complex pipelines, so they were able to resolve and fix it.
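The two train checks Ryan mentions, whether it arrived on time and whether the cargo is what was expected, map roughly onto freshness and schema checks. The sketch below is a hypothetical illustration of both, not any vendor’s API; the function name, column names, and deadline are made up.

```python
from datetime import datetime, timezone

def check_run(finished_at, deadline, expected_columns, actual_columns):
    """Return a list of human-readable problems with one pipeline run."""
    issues = []
    # Freshness: did the "train" arrive by its deadline?
    if finished_at > deadline:
        issues.append(f"late by {finished_at - deadline}")
    # Schema: is the "cargo" what downstream consumers expect?
    missing = expected_columns - actual_columns
    unexpected = actual_columns - expected_columns
    if missing:
        issues.append(f"missing columns: {sorted(missing)}")
    if unexpected:
        issues.append(f"unexpected columns: {sorted(unexpected)}")
    return issues

# Hypothetical run of an orders pipeline that is due at 06:00 UTC
issues = check_run(
    finished_at=datetime(2024, 1, 1, 6, 45, tzinfo=timezone.utc),
    deadline=datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc),
    expected_columns={"order_id", "amount", "currency"},
    actual_columns={"order_id", "amount", "amount_usd"},
)
print(issues)
# ['late by 0:45:00', "missing columns: ['currency']", "unexpected columns: ['amount_usd']"]
```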
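Severity-based routing like this can be sketched as a small rule: page a human only for critical pipelines with a strong anomaly, open a ticket for weaker signals, and suppress the rest. The pipeline names, thresholds, and the three routing outcomes are hypothetical assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class Alert:
    pipeline: str
    anomaly_score: float  # e.g. a z-score produced by an anomaly detector

# Hypothetical set of pipelines the business has marked as critical
CRITICAL_PIPELINES = {"revenue_daily", "orders_ingest"}

def route(alert: Alert, page_threshold: float = 4.0, ticket_threshold: float = 2.5) -> str:
    """Decide how loudly to surface an alert."""
    if alert.pipeline in CRITICAL_PIPELINES and alert.anomaly_score >= page_threshold:
        return "page"      # wake someone up
    if alert.anomaly_score >= ticket_threshold:
        return "ticket"    # look at it during working hours
    return "suppress"      # likely noise; keep it out of the pager

print(route(Alert("revenue_daily", 5.2)))     # page
print(route(Alert("marketing_clicks", 5.2)))  # ticket
print(route(Alert("revenue_daily", 1.0)))     # suppress
```

Keeping the paging path narrow is the point: fewer non-actionable pages is what Brian tied to morale and retention earlier.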
[00:24:40] Eric: That’s some pretty cool stuff. That keeps the trains running on time. It keeps people focused on positive outcomes and what they are supposed to be doing all day instead of band-aiding things left and right. Steve Wallo, I’ll bring you back to comment on that, going back to this issue of morale and productivity. The reason I talk about morale so much is that when morale is high, productivity is through the roof. You can have a small staff and limited resources, and where there’s a will, there’s a way. But if you have a big staff, lots of money, and low morale, you are burning cash all day long.
[00:25:16] Steve: When we look at the makeup of the group here, we are talking about service-level objectives, which are about finding the associated metrics. We’re talking about observability, the stuff that Ryan brings, which is about understanding and being able to see those things. Then we start talking about the pipeline and all those things. Think about this from a morale perspective. I will use an example of an event that occurred somewhere. Maybe they are looking for something, based on security cameras or something like that in the city. One of the biggest issues we have seen in the past is that I want to take that footage to a cloud service, but it takes me half an hour to load the information up.
If you look at the people who want to get that information, they’re asking, “Where is this event occurring? What are the other things associated with it?” If I have to wait half an hour to get that information, I’m working with completely the wrong data. Something could be going on in a completely different area than I thought. That’s one of the things that hurts morale.
It’s unfortunate, but it’s solvable. I will go back to what Ryan was saying about the train, which is a pretty interesting analogy. Whether it’s cybersecurity or Snowflake, which you mentioned, the quicker I can get to that data, the more I can monetize it for myself and my customers. If you think about a train, what if I could say, “We don’t need the train. Maybe I can use data teleportation”? All of a sudden, that completely changes the game. That’s what we are bringing, and that’s how it affects morale and all that stuff. Those are the intangibles downstream, but the idea is to provide the right information that people need.
[00:26:45] Eric: That’s an excellent point. I will throw this fun comment over to Brian Singer. I have a developer friend who’s worked with some big brands that everyone recognizes. When you get into these web-first plays, developers are half the ballgame. We need marketers, salespeople, and stuff like that, but you are out there creating functionality with code. This is a great quote, and I want all of your comments on it. It’s hilarious. He said, “Busy is the enemy of creative.” What do you think about that, Brian?
[00:27:18] Brian: That’s why so many organizations now are focused on developer experience as a leading indicator for productivity. If you talk to a CIO, VP of infrastructure or VP of ops type people, one of the things that they are most focused on is developer experience. That is an area that requires thoughtful investment because if you leave it to chance, you are going to end up with people doing one million different things and creating a terrible developer experience which saps productivity, morale, and so on.
What is developer experience? It is how your developers work in your stack. The organizations that understand that are able to recruit, retain, and build better software on top of better and more reliable infrastructure. I find it so fascinating. If you look at the news, for example, Ford’s CEO has been vocal about how they have to become a software engineering company, with the investment going into developers and developer experience. This is an automotive company. They historically make engines, but now they are so focused on software engineering, software developers, and developer experience.
[00:28:33] Eric: On a previous show, we talked about a shocking graphic I saw not too long ago about the market caps of car manufacturers. On one side, you see Tesla. On the other side is every other auto manufacturer in the world combined, and they’re about equal. Tesla is worth about as much as every other manufacturer combined, which may surprise you.
Part of that is the fact that markets are forward-looking. Ryan, I will throw it over to you. I guarantee a big part of that is because Elon Musk thinks big. Tesla didn’t make incremental changes to the car. They went back and re-orchestrated everything from the ground up. He has this ecosystem where he’s got SpaceX. He is able to get lots of investment from the federal government and subsidies to do all this stuff.
What a clever guy. It’s vertical stack orientation. He owned the whole stack, and apparently, he’s going to build an airport now in Texas near his house. I love this guy. I love how big he thinks, but that goes to show you how big things can get if you think big and then work hard enough to get up into orbit. What do you think, Ryan?
[00:29:42] Ryan: I agree with that, and I go back to a little bit of the stories that I came back from the Gartner Conference. I was talking to lots of different people. It’s about the fact that you have these two groups that are going on in the data space. You have the analyst, science team, and engineering team. The engineering team and the platform team want to achieve the goals of the business. They want to be the next Tesla. They are getting a ton of pressure to compete in this insanely competitive space that we are in. They don’t want to be giving data or producing data to consumers that they can’t take action on, it’s dirty, or it’s bad.
I also say this morale problem where it impedes this innovation. At times, it silos. It’s never going away. I don’t know what’s going on. We were talking about breaking down silos, creating more silos, and trying to break down the silos. I don’t know what’s going on. Maybe there’s something going on there, but it still happens. When we talked to the analysts and science team, it’s not even like not meeting your SLAs or SLOs, which Brian was talking about. It’s more of the happiness and the confidence on this side that they don’t control. That’s not like they can reprimand the data engineering team, but they are a partner.
There’s this disconnect constantly of like, “Both teams want to go and be the next Tesla.” There’s still this disconnect between the data being correct or not. We want to help break down silos and make sure that when they are delivering data, it’s in a consistent space so they can scale versus not being to scale. The last thing I will say, too, is the amount of data that’s falling through or require more ML pipelines and deep learning pipelines. You’ve got to be able to scale that in a way that you can monitor the data because garbage in, garbage out. You are not going to be able to have that pipeline if the data is bad.Automation is a double-edged sword because it lets us move a lot faster, but it creates a whole new class of problems. Click To Tweet
[00:31:42] Eric: That’s a good point. You are right. Silos, break them down. Organizational structures are changing. Everything is changing, and the key is to stay on top of all this, and that’s why we have the show. Let’s dive right back, and I will throw it over to Steve to start us off. There’s that great quote, “Busy is the enemy of creative.” There are so many ways that you could do things, and what Vcinity does is very interesting and seriously game-changing.
I remember going back to the earliest days of this show, when we would talk in 2008 about data warehousing, ETL, and all this stuff. I have wrapped my head around all the ETL that is going on. I’m thinking, “This is crazy.” You are moving this data around and around again. It’s like musical chairs. If you are in kindergarten, you played musical chairs, and I was the only one that lost my chair.
[00:33:05] Steve: You are spot on. We have seen advancements in many different things, especially from these top core platforms and the tools that the guys created. From an operations perspective, I can move agilely all over the place. The problem is you still have this little data puppy that follows you around. You were talking about puppies earlier. I can move the app here, but then I sit around and wait for the data to get there. We were talking about Tesla thinking big, and it's also about thinking differently. When you start thinking big, it's, "I'm going to make a bigger part," and all these kinds of things.
It has to be different, because what we are doing, and how we can do it differently than the norm, runs counter to the status quo. As we walk through it, it's the way people use information, get information, monitor, observe, build, and all these things. In the pipeline, we have to find ways to remove the old and put in something different.
One of the biggest challenges we have as a company is that people don't believe it. As my CEO says, "If we figured out how to make elephants fly, the idea isn't dumb; the challenge is everybody else." When you come up with these things, whether as an institution, an enterprise, or a technology company, the challenge is figuring out how to expose people to these new things and how to bolt them into other things in a simple way, so they understand and use them.
Whether it's an increase in morale, productivity, insights, observability, or all those other things, that's part of the challenge, but that's also part of the fun, at least from my perspective. I'm sure you guys have seen this as well: we are creating world-class things that the market needs. The work is getting people to understand them, use them, and see the value.
[00:34:39] Eric: I will bring Brian in here to talk about these SLOs again, and then we'll go to Ryan and talk about groups, silos, organizational structures, and such. I'm a big fan of the cross-functional team or task force, where you get someone from operations, marketing, admin, finance, or the C-Suite to sit down in a room and talk for an hour.
What you should be looking at are these SLOs. That's what you want to look at and talk through, because that's how you are going to come up with new ideas for how to change things. The key is to get out from behind the eight ball of what a good friend of mine always called the tyranny of urgency. When I heard that, I got chills. If you are constantly in code red, putting out fires all day, that's not good for the business. What do you think about that, Brian?
[00:35:29] Brian: You are spot on. I'm guilty of the tyranny of urgency sometimes, and I apologize to my team for that. The interesting thing is that all of these folks are coming at the problem with different contexts. Your CFO or head of finance is looking at, "What's all this infrastructure costing me? What is the ROI on the investment?" Whereas maybe a salesperson or a product manager is saying, "I have to ship these features to make the next customer happy or to meet a sales target or whatever it is." The engineer or the ops person is saying, "This is built on a house of cards, and we have got to go refactor the code and make it all work." The funny thing is they are all right, but they lack the data to make a decision.
That's why we try to get folks together to discuss the actual goal for how this service will operate. Once you agree on that, everything else starts to fall into place. You say, "We all agree that for customers to be happy, this needs to happen." When you say, "We need to provision more infrastructure to meet this operational goal," the VP of finance says, "I agree with you. This is what has to happen. If we have to provision more infrastructure, I understand what I'm paying for." If we say to the product manager, "We have to pause releases right now to get a handle on this tech debt," you have data on which to base that decision, because everybody came together in the same room and decided on the operational goals.
[00:36:54] Eric: That’s a great point. Ryan, I will throw it over to you. We are going through this remarkable metamorphosis in business because of automation, data, observability, artificial intelligence, machine learning, and natural language processing. All of these things are converging at once. That is fundamentally changing how we do our jobs.
There's this interesting dynamic too, where one day I wake up thinking it's going to be a future of more specialization. The next day I wake up thinking, "No. It's going to be a future of more generalization, because some of the specifics are going to be handled for you." It's a strange balance of both, and you have to know your corporate DNA. What are your assets? What are your liabilities? What are you good at? Where are you in the market?
You have to assess and synthesize all of that, which you are going to do with data. You are going to look at the data and see, "This is what we are seeing." I'm guilty of thinking, "This is a brilliant idea. Why aren't people buying it? They are not buying it." Eventually, I realize, "We have got to do something different, because the bills won't get paid if we don't." It's a very fun time, but it's also a challenging time. What do you think?
[00:38:03] Ryan: If you think about data engineering teams in general, their skillsets are vastly different from those of data analysts. It seems like you have so many different data role types. There are data engineers and data analysts. There are analytics engineers now, and data scientists. There are ten other titles out there. They all have different specialty skills that they are good at. Even within those organizations, skillsets are very focused. For example, you could have a code-driven data engineering team using things like Apache Airflow, Spark, Python, and PySpark. They are doing all this code-based transformation work.
Then you have data engineers out there who are very SQL-focused. They are all over the place. I understand the whole "bring everybody together" idea, but there's also a benefit to making sure you have focused attention on the skills you need so the data is used in the most accurate way. I wouldn't suggest that a data engineer who loves to write Python code and use Airflow automatically go, "I want to learn more about being a data analyst," or that data analysts say, "Let's learn how to code in Python so we can understand data engineering." To me, we have got to make sure that in our space the silos are focused and specialized, but we don't want to propagate problems and keep having problems within those skillsets.
With data engineering, observability is so important because without it, you don't get the insights you need to make your pipelines better, and to make the downstream tasks that read from those pipelines better for the end consumer. If you don't have that, issues will keep propagating, and you will have problems continually.
That’s what we are trying to solve. It’s to make the data engineering team a rockstar data team. Make them a shining example of how to work with other teams by making sure that you are continuously monitoring what’s going on in your practices so that you don’t have a random Jimmy or Sally telling you, “This dashboard didn’t update. What’s going on? This is the wrong data. What’s going on?”
[00:40:18] Eric: I will throw it over to Steve to comment on. There are new jobs, new roles, and new ways of doing things. Almost everything can be reinvented. It's a fun time. What do you think?
[00:40:33] Steve: Let's bring in Jimmy, Sally, and all these different people. Let's say you took the company and put Jimmy in his own room and Sally in another room. We split everybody apart. The lack of collaboration, and of a structure that incorporates the entire team, is a challenge. That's what we are trying to break down with these silos.
One of the biggest issues behind that is that we are working on different pieces of information. If we can share everything so that everybody has the same insights and feedback from everybody else as a collective, that could change the game, and that collective boils down to, "We can all get to the data when we need it, where we need it."
[00:41:12] Eric: I think about Google Docs. I love Google Docs. It’s like, “How did Microsoft get blindsided by Google on their bread and butter core business model, which was the Office Suite?” Google Docs revolutionized my life. I love it. We are talking all about ops, different ops, and how work is changing. That’s the bottom line. It’s a fantastic show. Brian had a great idea for the bonus segment, all about how to use some of this data to create SLOs to make better decisions. Tell us what you were thinking.
[00:42:01] Brian: Ryan is able to collect a lot of information about the accuracy and freshness of our data. One of the most useful SLOs I have seen is something called the data freshness SLO. It tells us a lot about how we are doing in terms of keeping our customers happy with their data. It works like a latency or availability SLO, except you are talking about how fresh the data is.
For this particular application, we would expect data to be fresh within the last five minutes, call it 99% of the time. That creates an error budget of 1% where the data could be a little bit stale, and then you can create different thresholds on top of that. You can say that at four 9s, or 99.99% of the time, we expect it to be fresh within 20 minutes. That gives you a little bit of wiggle room, and eventually, the data is going to be accurate.
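The two-threshold freshness SLO Brian describes can be sketched in a few lines. This is a minimal illustration, not Nobl9's product or API: it assumes freshness is sampled as periodic checks that each record how old the newest data was, in minutes, and `FreshnessObjective` and `evaluate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class FreshnessObjective:
    """One threshold of a freshness SLO (hypothetical structure)."""
    max_age_minutes: float  # data counts as fresh if it is at most this old
    target: float           # fraction of checks that must be fresh, e.g. 0.99


def evaluate(objective: FreshnessObjective,
             ages_minutes: Sequence[float]) -> Tuple[float, float]:
    """Return (compliance, budget_consumed) for a series of freshness checks.

    compliance: fraction of checks where the data was fresh enough.
    budget_consumed: share of the error budget (1 - target) used up;
    a value above 1.0 means the objective was missed.
    """
    if not ages_minutes:
        raise ValueError("need at least one freshness check")
    if not 0.0 < objective.target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    good = sum(1 for age in ages_minutes if age <= objective.max_age_minutes)
    compliance = good / len(ages_minutes)
    error_budget = 1.0 - objective.target
    return compliance, (1.0 - compliance) / error_budget
```

For example, with 1,000 checks where 995 saw data 2 minutes old and 5 saw data 10 minutes old, the 5-minute/99% objective is met at 99.5% compliance and uses about half of its 1% error budget, while the 20-minute/99.99% objective is fully met with none of its budget used. Layering objectives this way is what gives the "wiggle room" Brian mentions.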
[00:42:51] Eric: What I like is that you are creating these views of the world. We all understand a dashboard. What you've got, from what I'm looking at, is lots of different views of things, so you can see it in totality and make sense of it. This is an age-old construct. Steve, I will throw it over to you. What gets measured gets managed. I weigh myself every day, so I pay attention to where I am. The lowest weight I have had in a long time was 185. I was pretty impressed by that, and I'm trying to get into the 170s, where I haven't been since college. The point is, if you pay attention to it, you will notice things. If you don't have metrics around it, then it's just somebody's opinion.
[00:43:34] Steve: This topic is central to what I do for a living. There is perishability with data; you mentioned that before. You want to work on the right things, but you have to understand what the right things are, which is where the rest of the team on the call comes into play. What do I get? Is it good? Is it bad? Is it old? What's going on here? Let's say I want to use the cloud. Am I able to move my data if I have cloud-based applications? Maybe there's data sovereignty. I'm doing a lot of work in the federal government space, where there's sensitivity. Can I even move my data to these tools that you guys offer to do something with it?
Maybe it's too big or changes too frequently. These are all the things that get in the way. From our perspective, at least, we are the glue, and you guys provide the management piece behind that. You mentioned operations and management. The more copies I have, the more I have to maintain, and the higher the risk of somebody getting hold of something. Our job is to minimize that, so we are not blasting people with the same thing twenty different times. It's a complement to what you guys talked about on the show and where the industry is going.
[00:44:36] Eric: I will throw it over to Ryan for some final thoughts here. Data is the lifeblood of business. Every application uses data. Every application will use data. Without data, applications are meaningless. We are getting better and better at being able to see data and put data in context, and we have these leading and lagging indicators on data, what it means, and where it goes. The more we focus on that, the better off we are all going to be. As I look at the metamorphosis that is going on, it's in process now. It's changing. You see it in job titles, because some jobs don't exist anymore now that that part got automated. That's great. You do have to start somewhere.
I use this example. Years ago, when I started in the newspaper business, I would lay out the paper as the editor. I had to write so many stories. I would have stories written, and then I would get what the production department called dummies, which are pages laid out with all the ads so I could see how many inches of content I needed for each page. If I didn't have enough content, I was up late writing articles, coming up with pictures, or doing something else.
The point is you always had to start somewhere. There's no way you can look at the totality. With even 14 or 15 articles, 5 pictures, and 18 pages to fill, you can't do that in your head at once. You have to start somewhere and build out from there, which is what I would always do. That's what I see as the big challenge for businesses. It's like, "Let's start somewhere and figure out what Susie and Bill are going to do." It's going to change. You are not going to be doing the stuff you did yesterday. That's probably good for morale, because it's fun to learn new stuff. What do you think, Ryan?
[00:46:14] Ryan: The insights you get specifically around the data in your space allow you to take on more workloads. For example, we had a customer scaling up their ML pipelines. Around 60% of their pipelines had at least one data incident, which is way too high, and they needed to add more pipelines to the mix. People want to take on more work. They want to be able to do more things and fun stuff. Based on monitoring the data as it was flowing, we were able to say, "Here is where all your problems are in these pipelines. Fix them." Now less than 1% of their pipelines have data incidents.
It's a great use case for fixing stuff so you can go faster and have a better time doing the stuff you want to do. You slow down and do that first. Brian was giving an advertisement for data observability when we first started this segment, because he was talking about data freshness, being able to see record counts, and data operations. That's exactly what observability does. It's like, "You are expected to get this thing." I did a webinar called How to Guarantee your Data SLAs with Data Observability. I didn't want to tell you that because you are an SLO guy, but I was using the term data SLAs. I like your term better. SLOs is a better term for the data engineering team to use than SLAs.
What you did was cool in showing that SLAs have financial implications on the consumer side. If a product isn't up and running because of the data that fed it, that's a problem for the business. There are also the SLOs that the data engineering team and the consumer team need to agree on so that they are good to go when it comes to data. One of the things we do is like what you said: "We expect the data to arrive at this time, on this day, and within this threshold, we are okay with it." We do that by putting anomaly detection in place that flags it if the data doesn't arrive or doesn't hit that data freshness metric.
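Ryan's "at this time, on this day, within this threshold" rule can be illustrated with a minimal sketch. It swaps in a fixed deadline check for the anomaly detection he mentions (which would learn the expected arrival pattern from history), and `check_arrival` and its status labels are hypothetical, not any vendor's API.

```python
from datetime import datetime, timedelta
from typing import Optional


def check_arrival(expected: datetime, tolerance: timedelta,
                  actual: Optional[datetime], now: datetime) -> str:
    """Classify one expected data delivery against a fixed deadline.

    This is a simple threshold rule, not anomaly detection:
    - "on_time": data landed no later than expected + tolerance
    - "late":    data landed after that deadline
    - "missing": no data yet and the deadline has passed (alert-worthy)
    - "pending": no data yet, but the deadline is still ahead
    """
    deadline = expected + tolerance
    if actual is not None:
        return "on_time" if actual <= deadline else "late"
    return "missing" if now > deadline else "pending"
```

For a delivery expected at 06:00 with a 30-minute tolerance, data landing at 06:20 is "on_time", data landing at 06:45 is "late", and no data by 07:00 is "missing"; only the last two would typically raise an alert.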
[00:48:27] Eric: Check all these guys out online. They are brilliant folks. I'm sure we'll get them back, and I'd love to do a deep dive on Databand and Nobl9 and get the latest on Vcinity. Folks, hop online. Send me an email at Info@DMRadio.biz if you want to be on the show. We'll talk to you next episode.
- How to Guarantee your Data SLAs with Data Observability
About Ryan Yackel
CMO | Product Marketing OG | Go-to-Market | PLG | Demand Generation | Brand Building | Sales Enablement
I like to tell stories, run go-to-market programs, and evangelize SaaS products.
About Brian Singer
Chief Product Officer @ Nobl9
Technology leader with experience in product development, strategy, and marketing. Skilled at translating customer requirements into scalable business models and executing on ideas to drive revenue growth. In-depth knowledge of cloud and SaaS technologies and markets.
About Steve Wallo
Steve currently serves as Vcinity's Chief Technology Officer, overseeing insertion of advanced technologies and strategies into customer architectures and future IT decision methodologies. He is responsible for bridging future IT trends into the company's existing portfolio capabilities and future offerings.
Prior to Vcinity, Mr. Wallo was CTO at Brocade Federal, responsible for articulating Brocade’s innovations, strategies and architectures in the rapidly evolving federal IT space for mission success.
Mr. Wallo has served the U.S. Government as the Chief Architect for the NAVAIR Air Combat Test and Evaluation Facility High Performance Computing Center.