Really Real: Real-Time Analytics Happening Now! With Greg Michaelson, Fadi Azhari And Damian Black
There’s never a wrong time for real-time! Historically, the challenge around real-time analytics was mostly the cost, but times have changed. A whole host of new solutions have recently arrived, with others coming down the pike. Check out this episode of #DMRadio to learn more! Host Eric Kavanagh interviews several expert guests, including Analytics veteran Greg Michaelson, formerly of DataRobot, along with Fadi Azhari of StarRocks and Damian Black formerly of SQLStream. Tune in as they discuss analytic databases, MPP architecture, the power of indices, and how to leverage vectorization for the optimal query engine.
Eric: We’re going to talk about something that frankly got me into this business a long time ago. I wrote an article about real-time data warehousing way back in the year 2002 or something like that. It took a while to get here. We do amazing things in the technology space, but real-time data warehousing was just a topic many years ago. Nowadays, it is the real thing in many ways. We’re seeing a whole evolution, a real renaissance, around real-time analytical platforms, whether for building apps, for data discovery, or for data science, for example. Lots of different things and lots of ways to skin the cat, too.
We have an all-star cast. We’ve got Damian Black, my good buddy from the SQLstream team, Fadi Azhari from a company called StarRocks, and Greg Michaelson, a veteran of the space from DataRobot, who’s been doing this stuff for a long time. We’ll talk about what’s happening, why it matters, and what it means for you. First of all, I’ll bring in Damian Black from SQLstream. SQL stands for Structured Query Language. Damian, tell us about your time at SQLstream. You guys saw this coming a long time ago.
Damian: In fact, it was inspired by internet and telecom data that was gushing out. It was defeating the ability of data warehouses at the time to process it, because you have millions of records coming in per second and need to continually analyze what’s going on. By the time you wait for queries to complete, you’re out of date by billions of records. It has all come full circle now. I now work at Guavus, part of Thales.
Thales acquired SQLstream. It’s been merged with Guavus. They do real-time analytics over telecom data. What we’re seeing now is this 5G wave, which is pushing for ultra-low-latency, distributed applications operating at the edge. SQLstream is the technology underlying all the application suites that Guavus is putting out to give you machine learning-driven analytics of what’s going on in all of these new enterprise applications and services.
SQLstream is also generally available in the market. We still sell it as a separate self-contained platform. It was the pioneer. It allows you to query and analyze using the SQL language by turning it inside out: you run the queries against the live streams of data. With millisecond latency, you get an instant stream of answers coming back. It’s clearly important. The whole world wants to be able to process distributed information and get answers immediately.
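The inverted model Damian describes, standing queries that sit over moving data rather than one-off queries over data at rest, can be sketched in plain Python. This is a toy illustration of the idea only, not SQLstream’s engine, and the SQL shown in the comment is hypothetical streaming-SQL syntax, not a specific vendor dialect:

```python
from collections import defaultdict

def continuous_query(events, window_seconds=60):
    """Toy 'streaming SQL' operator: a standing GROUP BY over a live
    stream, emitting one aggregate row per key each time a tumbling
    window closes, instead of querying stored data after the fact."""
    current_window = None
    counts = defaultdict(int)
    for ts, key in events:               # events arrive in timestamp order
        window = ts - (ts % window_seconds)
        if current_window is not None and window != current_window:
            for k, n in sorted(counts.items()):
                yield (current_window, k, n)   # answers stream out as windows close
            counts.clear()
        current_window = window
        counts[key] += 1
    for k, n in sorted(counts.items()):        # flush the final open window
        yield (current_window, k, n)

# Roughly the standing query (hypothetical streaming-SQL syntax):
#   SELECT STREAM window_start, key, COUNT(*)
#   FROM events GROUP BY TUMBLE(ts, INTERVAL '60' SECOND), key
stream = [(0, "call"), (10, "sms"), (30, "call"), (65, "call")]
print(list(continuous_query(stream)))
# → [(0, 'call', 2), (0, 'sms', 1), (60, 'call', 1)]
```

The key difference from a warehouse query is that results are yielded incrementally as data flows past; the query never "completes," which is why latency stays in milliseconds rather than growing with the backlog.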
Eric: Fast answers, that is what everybody wants, certainly in telco and in other intensive industries where you have tons of telemetry data flying around and you want to be able to solve problems quickly. The whole analytics industry came out of the data warehousing space, which was arguably designed many years ago. At least those architectures were: persist the data, then do your analysis on top of that relational data model, basically. Now there are lots of ways to do things. It’s the blessing and curse of our industry. The key is understanding what your use case is, what sources of data you have, and what’s going to be the most effective way for you to get that done.
Damian: The keyword here is telemetry. It’s the continuous distribution and availability of updated, time-based information that you have to make sense of to make the right decisions in this highly complex connected world. SQL turns out to be part of the holy grail for parallel processing, ironically. I started my career right back at university. I won’t embarrass myself or anybody else by saying how long ago that was.
Data flow architectures were the next big thing for parallel processing, but people were looking for a specialized language. You may remember Occam and also single-assignment languages that tried to provide a way of programming systems so you could extract the parallelism. The problem with those functional approaches, with their recursion, is that they prevent effective use of the hardware. Right in front of all of us was the SQL language.
Once you apply SQL to these infinite streams, you suddenly have the ultimate data flow language. You can do your parallel processing. It’s so beautifully adapted to the world of large numbers of inexpensive CPU cores, inexpensive high-performance memory, and everything connected with high-performance, inexpensive networks. It’s the right technology at the right time. With everything being connected, you want to make those decisions in real-time and take advantage of all of the information. The SQL and relational way is the only way you can do this in an optimized, reliable, mathematically sound way. That’s why the relational database is still at the apex of the data pyramid.
Eric: You talk about SQL. We often talk about the history of our industry and how we got to where we are. Pre-show we were joking about Hadoop, which came out of Yahoo as the engine for indexing the web. You had companies like Cloudera, Hortonworks, and MapR, building out what was the foundation of an ecosystem. There were a couple of big mistakes made along the way.
As I recall, Cloudera and Hortonworks thumbed their noses at SQL, and the vision was, “It’s going to be a future of MapReduce,” which was this construct designed to index the web. That turned out to be pretty hard to reverse engineer for a lot of analytic use cases, time series stuff, for example. It didn’t work as well.
We had this whole movement of SQL on Hadoop. We had people trying to bolt a SQL interface onto the Hadoop file system, and that also didn’t work terribly well. Then what happened? You have Snowflake come out as a relational database in the cloud. There were already other databases in the cloud that were doing pretty well, Redshift among them, but Snowflake took off. I came back from their conference and they’re talking about supporting all these different languages. They’re adding Python.
SQL is the foundational language used in that environment, but they’re talking about all kinds of data, workloads, and everything they’re focused on. It’s an ambitious agenda. One of the challenges here is that the environment keeps changing and evolving. Companies that want to use the latest and the greatest have to pay attention to that and figure out how they’re going to get access to that. What do you think about that?
Damian: The relational model, in terms of describing everything in tables and streams, is a great way of describing the attributes that you care about. Those attributes can be drawn from any information. It could even be from a voicemail, an image, semi-structured, or even unstructured information. SQL is a way of managing information. In SQLstream, we support Java, C++, and Python that can be dynamically loaded.
You need to do that because those are the languages that machine learning models are output in, and you want to be able to load those models to provide the smart AI interpretation of data and data streams. You want to have that working in conjunction with the good, old reliable SQL view of the world, which is structured, where you know what you want to do.
I view the AI approach and the machine learning model as the unstructured approach, because you don’t know what the logic is that connects the inputs to the outputs. You want that to be defined by the machine learning itself. You want those systems to work with structured reasoning as well. The relational world is the only one that provides that foundation. You’re seeing Snowflake and all of the big data solutions, even the NoSQL databases, all rushing to support, and in most cases already supporting, SQL, because it is ultimately the only way people can manage information.
Eric: It’s a de-facto standard. It’s everywhere. I don’t think it’s going away anytime soon. Certainly from what we saw at the Snowflake Summit, it’s popular. There are a lot of people who know SQL. Once there’s an established language that people know how to use and want to use, you have to go with the flow and deliver what people want to have delivered. Let’s now bring in Fadi Azhari from StarRocks, a company doing interesting things with a massively parallel analytics database. Fadi, tell us about yourself, what StarRocks is doing, and why you’re coming out with this technology.
Fadi: I’m the VP of Marketing. StarRocks has been in existence since 2020. We are completely SQL-compatible, real-time, fast analytics. Our focus is on delivering blazing performance in a way that makes analytics easy to use and democratized. We all know that the space has been crowded. Our focus is not to be the Swiss army knife for everybody.
We’re focusing on delivering that experience to everyone in the organization and on making sure that the speed they expect from real-time analytics is there to serve their needs. Organizations are going through a huge amount of digital transformation, adding new services, connecting their operations to their end-user services, and so forth. They need such an engine. That’s the focus of our organization and what we’re doing. We’re based in the San Francisco Bay Area.
Eric: You went back to the drawing board. You are focused on vectorization and leveraging the power of vectors. Can you explain, at least at a high level, what that means and why it’s important for analytics?
Fadi: If you think about the problem, “Why is real-time analytics so difficult?” a couple of things come to mind. I’ll give it a bit of context and then I’ll dive specifically into the area that you asked about. Data freshness and responsiveness are both important. It’s not just about getting the data, but being able to query and analyze it. One of the things that a lot of companies have been doing is denormalizing the data into flat tables.
By doing so, you can gain faster access to data, but it makes the data difficult to analyze if your data is constantly changing. We are a vectorized engine, and the way we approached it, we’re able to address both flat tables as well as snowflake schemas. We are able to work on both and achieve significant performance either way. The vectorized engine that we implemented addresses that in a way that makes it simple for people to adopt either approach and still achieve this performance advantage over what’s out there.
Eric: I looked it up after I came back from the Snowflake Summit. One of the founders of Snowflake came from a company called Vectorwise, which is part of the Actian portfolio. I remember learning about that years ago. They were one of the early database companies to focus on vectors. I remember from back in my early days, in the late 1990s or 2000s, I started working with a web company in the Flash space, Macromedia Flash, which came from a company called FutureSplash way back when.
They built The Simpsons website back in 1998 or 1999. I was blown away by it because you could zoom in on this graphic and it would remain clear. Why is that? It’s because they use vectors. They use a formula to tell the computer how to generate the image instead of a pixel-based approach, which is what JPEGs, GIFs, and other things like that use. The benefits are tremendous. There are lots of benefits, one of which is that the file size is much smaller, because you’re giving a set of instructions to the computer to do something. Is it similar to vectorization in this context?
Fadi: Let me dive a little bit more into what vectorization does. We believe we’re the only query engine that allows you to execute across the CPU, memory, and the storage layer. You’re taking advantage of all the layers and able to do that in parallel. In this case, it allows you to do the execution, but do it in an economical way. You’re honing in on the results you want to get out of your analytics, but doing it across all those layers in a parallel way. That’s how we use the vectorized query engine in our implementation.
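The core idea behind vectorized query execution, operating on columnar batches of values rather than interpreting one row at a time, can be sketched in plain Python. This is a toy illustration of the batching shape only, not StarRocks’ code: a real engine runs these tight loops over contiguous arrays so the compiler can emit SIMD instructions, which interpreted Python cannot do.

```python
def filter_sum_rowwise(rows):
    """Row-at-a-time interpreter: per-row dispatch overhead dominates."""
    total = 0
    for row in rows:                          # one tuple at a time
        if row["price"] > 100:
            total += row["price"] * row["qty"]
    return total

def filter_sum_vectorized(price, qty, batch=1024):
    """Vectorized operator: works on columnar batches. Each primitive
    (compare, multiply, sum) sweeps a whole batch before the next runs,
    which is the access pattern that lets real engines use SIMD and
    keep data hot in CPU caches."""
    total = 0
    for i in range(0, len(price), batch):
        p, q = price[i:i + batch], qty[i:i + batch]
        mask = [x > 100 for x in p]                    # WHERE price > 100
        total += sum(x * y for x, y, m in zip(p, q, mask) if m)
    return total

rows = [{"price": p, "qty": q} for p, q in [(50, 2), (150, 1), (200, 3)]]
price = [r["price"] for r in rows]
qty = [r["qty"] for r in rows]
assert filter_sum_rowwise(rows) == filter_sum_vectorized(price, qty) == 750
```

Both functions compute the same answer; the difference is purely in execution shape, which is where the order-of-magnitude speedups in vectorized engines come from.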
Eric: Maybe tell us a bit about some of the use cases. Who are you targeting? For Damian, telco is a hot industry for real-time data, but who do you target?
Fadi: Typically, we target organizations that are going through a major digital transformation, existing organizations that are adding new services, and SaaS services at the edge of the network. Think of all the innovative SaaS companies that are delivering services that are constantly changing and need analytics on the fly. One of the customers we have in the US is Airbnb. They’re using our engine to drive the analytics they need for Minerva, which is their metrics platform at the edge.
For instance, use cases where their users and their operations team are able to quickly analyze and understand what’s going on, and even deploy that in a highly concurrent way, because the platform can support tens of thousands of their end customers. Think of it as an analytics engine that allows people to flatten the amount of analytics you get from the edge to the user all the way inside the organization. That’s what we’re focusing on.
Eric: That’s a useful approach, quite frankly, because I refer to it as a sort of a B2B2C play. You’re providing services to your customers who are in turn extending the power of those services to their customers. That’s a wonderful reality for lots of different reasons. One of which is because it helps your clients better serve their customers.
Once you can interact with data, whether it’s some small dashboard inside your SaaS environment, for example, or it’s a little recommendation you’re getting from the engine about what to do, who to reach out to, how to respond to certain things, once you start getting that real-time speed, you don’t want to go back. It inspires people to get more active, to start asking more questions. That begins this whole conversation where you’re genuinely leveraging the power of analytics. You’re no longer just talking about being data-driven. You are being data-driven and analytics-driven in particular, and that’s good stuff.
There are lots of ways you can get that done. We talked to Damian Black of SQLstream; they’ve been doing real-time analytics for a long time. And Fadi Azhari from StarRocks; they’re building out an engine to do just that. Greg Michaelson is up next. He was at DataRobot for many years and has been around doing this stuff for a while. Greg, tell us a bit about yourself, where you see the industry going, and maybe where the inflection points are or are not yet.
Greg: I joined DataRobot back before the rise of automated machine learning and the citizen data scientist. I helped build that ethos, that push for putting AI in everything. Certainly, over the last years, I’ve found that there are so many vendors out there producing products that are solutions in search of a problem. You have these innovation teams at these large organizations that have big software budgets.
They go around to trade shows, talk to vendors, and play. Making the connection between those products and actual business problems, use cases, and so on is a super-challenging thing. My first question whenever I look at a company or technology is, “Do organizations need this, and how hard is it to convince people that it’s something they can apply and use?”
Eric: There are lots of reasons for that. You mentioned these innovation teams. I’ve been an advocate, and will remain one, of every mid-sized to large company having an information strategy or data strategy group. A lot of times you’ll hear it called a center of excellence, where you help educate the business about how these technologies can apply. That’s part of what we need to have if this stuff is going to work. You always want to make sure it aligns with a business goal, but then how do you do that? Do you think a center of excellence is the way to go?
Greg: The market for this technology is significantly smaller than most people think it is. If you’ve ever worked with a chief financial officer, you go to him and say, “Mr. CFO, we want you to use time series forecasting machine learning for your revenue forecast that you share with the board.” He will say, “If you have a dashboard, I’ll happily look at it, but thank you so much. We have a process and we don’t need stinking machine learning. We’re going to show it to the board.”
The people that are running businesses are perfectly happy to look at analytics if it happens to be around and happens to be useful. Even in the biggest companies, there is still a significant level of resistance to this technology. I don’t think that COEs and innovation teams are actually making it better in most cases. Largely speaking, innovation teams and centers of excellence have a reputation for being black holes of value, cost centers, not profit centers, to be sure. To me, the big challenge for vendors and buyers out there is a customer segmentation problem.
There are 5% of companies that know exactly how they want to use analytics, whether it’s real-time data, machine learning, dashboards, or whatever it might be. They know. They’ve got a plan. They’re going to go and buy the tools. They’re sophisticated in what they do. They use that technology to make their business run better. Everybody else has no clue. You look at these companies. They’re in the category of, “If this is easy and valuable, then we’ll use it. Otherwise, stay over there in the COE and leave me alone.” That’s what I’ve seen in the last few years.
Eric: There is some truth to that. I wonder how long that can last, especially as we go through a bit of a slowing down in the economy. One nice thing about low tide is you get to see where all the bodies are buried.
Greg: If you look at all these real-time data companies out there, probably less than 5% of them use their own product for real-time data analysis. Even machine learning companies: do they use machine learning to run their business? That’s a big question mark. To me, the big thing here is that there are organizations that know how to use this stuff and do it profitably. No vendor is going to be able to get a company to that point. That’s something their executives have to drive, which is what the big telcos, the big retail companies, the Walmarts and Krogers, and your largest healthcare companies are doing. Everybody else either doesn’t care or is completely clueless about how to use this technology.
Eric: That’s not surprising, because you have to have some business use case to get things rolling. I was impressed with what I saw from DataRobot. You wrote something on LinkedIn. If someone wants some good, entertaining content to absorb about DataRobot, just look up Greg Michaelson’s LinkedIn profile and you’ll find some comments there. Are we going through another AI/ML winter like we’ve seen before?
Greg: It certainly feels like a bubble to me in some ways. There has been so much venture capital out there. It’s been so easy to raise money and fund all of these different companies and ideas. There are cool companies out there. There’s a company called AtScale that does some innovative stuff. I talked with a company called Snowplow that’s doing neat stuff around click data and click analytics. There are diamonds out there, but there’s also a lot of noise. Expecting businesses that are not in that space to figure out what the diamonds are and separate the wheat from the chaff is a pretty tall order.
Eric: Let’s bring Damian Black back to comment on some of this stuff, because you’ve been doing this for a long time, Damian. Many years ago, you had to do some explaining to people about SQL on live data, on streaming data. Now, within the confines of the industry, the insiders understand what that stuff means. Part of the challenge is that there is this big, long tail of legacy technology, legacy processes, and legacy mindsets. Where do you think we are on the learning curve for leveraging real-time analytics for real business solutions?
Damian: First of all, I want to comment on the challenge of deploying AI. A lot of people are making decisions while operating in the real-time world. The CFO is probably one of those people that’s furthest away from that. Look at all of the connected things that are out there, whether vehicles, aircraft, or all forms of transportation devices. If you’re sitting in Ukraine and you’ve got a missile coming towards you, then you have to make decisions in real-time. You can’t sit around and wait for weeks and months. You’ve got to make decisions there and then.
The real-time use cases are coming now from the fact that everything is becoming connected. Anything that’s moving and generating a lot of data that affects people’s lives and safety becomes important for making real-time decisions. In terms of the question you were asking, people do move slowly. They change their perception slowly. I often think that a lot of people would be quite happy in their buggies with the horse dragging them along as fast as it possibly can, because that’s the way they look at the world.
A lot of people still think that what they need is an in-memory database because they want to do things faster. What they actually want is to be able to adapt incrementally to new information as it comes in. That is starting to take off, but it’s taken a full fifteen years for people to realize that. SQLstream is still the only company that I’m aware of that was designed from the get-go around SQL standards.
Everybody else has grafted a SQL interface on top as an afterthought, almost begrudgingly. That probably tells you something about how the world looks at things. If you design around it and embrace it properly from the get-go, then you find you’ve got a far better system. For example, we are somewhere between 10 and 100 times faster than Spark Streaming.
Spark Streaming is still a micro-batch-based system, which means you can’t react and respond to each new record. That new record might be an indication of a missile. If you’re a Ukrainian soldier, it’s rather important that you react and respond to every record. People are starting to understand the importance, and they start to see this like shoals of fish: they will suddenly turn, but you’ve got to get to that pivot point, that breaking point, when everything changes. We’re getting there because of the connectivity. There’s so much information that people want to react and respond to.
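Damian’s distinction between micro-batching and true per-record processing comes down to latency: in a micro-batch engine, an event cannot be acted on until its batch interval closes. A back-of-the-envelope sketch with hypothetical numbers (not a benchmark of Spark Streaming or any real system) makes the shape of the difference visible:

```python
def reaction_delay_per_record(arrival_ts, processing_ms=1):
    """Per-record engine: each event is handled as soon as it arrives,
    so reaction delay is just the processing time."""
    return [processing_ms for _ in arrival_ts]

def reaction_delay_microbatch(arrival_ts, batch_ms=500, processing_ms=1):
    """Micro-batch engine: an event waits until the batch interval it
    falls in closes, so worst-case added latency approaches batch_ms."""
    delays = []
    for ts in arrival_ts:
        batch_close = ((ts // batch_ms) + 1) * batch_ms  # end of this event's batch
        delays.append(batch_close - ts + processing_ms)
    return delays

events = [0, 100, 499, 500]                # arrival times in milliseconds
print(reaction_delay_per_record(events))   # → [1, 1, 1, 1]
print(reaction_delay_microbatch(events))   # → [501, 401, 2, 501]
```

An event arriving just after a batch opens waits nearly the full interval; only events arriving right before the cutoff get handled promptly. That gap is what per-record streaming engines are designed to eliminate.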
Eric: I’ll bring Fadi back in from StarRocks. What is fascinating to me is watching this industry and how we can learn from the mistakes of the past. Not that it always happens, but you’ll recognize things. Take Snowflake, for example. A couple of things they did that I thought were clever: Number 1) this whole separating of compute and storage, which is a big deal, but Number 2) they figured out that when a schema is wrong or suboptimal, in the old world of data warehousing, changing that schema was a pretty significant challenge, because then you have to reload data.
You have to do a whole bunch of stuff, and it’s a pain. They figured out, “If we can let people just tear it down, redesign it, and then rebuild it quickly, that’s a much different story.” That’s one of these little innovations that we’ve picked up along the way. From what I understand, one thing you folks at StarRocks figured out is, “It would be great to have this vectorization across the whole process, so we can optimize start to finish and be able to deliver that real-time analytical capability like people have been expecting for years.”
As Greg was musing about here, part of the challenge is that there have been promises like in that Lady Gaga song, Promises. There are lots of people in the enterprise who are like, “You told me that many years ago, and it didn’t quite work out.” I do feel like we have learned lessons and you can see it in the StarRocks architecture. What do you think?
Fadi: One of the things that we wanted to do is work backward from the customer pain points. We realized that people were not satisfied with what they had. They were treating the pain points by looking at a tree rather than the forest. It’s not about normalizing or denormalizing. They started out by saying, “We could denormalize the tables, and all of a sudden we’re going to get better performance,” instead of looking at the pain point that customers have: “I don’t want to have to worry about whether my data is denormalized or normalized. Make it easy. Make consuming that database and the engine easy.”
We looked at how we can build innovation into our product so you don’t have to worry about this. For instance, if your data is changing, how do you make sure that the data is still fresh and you can use it? That’s number two. The third problem with real-time analytics that wasn’t being addressed is the number of concurrent users that need to use the platform. That’s something organizations have failed at.
Most of the solutions out there support maybe about 100 or so, maybe 1,000 concurrent users. To make real-time analytics a reality for everyone in the enterprise, more people need to participate in that analytics platform, so you can enrich the platform and deliver better services. You’ve got to have more of everything. We figured out a way with our architecture to give you that scalability. Those are the promises that were not being delivered.
At StarRocks, we’re seeing it with some of the customers that we have. For instance, we have another customer using it, not an eCommerce platform but a social platform, that millions of users use to instantly understand what’s going on and get new information that’s relevant to them. That’s an example of a use case that illustrates how we solve those problems.
Eric: That’s an interesting point as I think about it, if you look at who’s going to win in particular markets. I won’t say it’s an outright Highlander market these days across the board, where there can be only one winner, but to a certain extent, we are in a monopolistic phase of our economy. If you look at ride-sharing, for example, there’s Lyft, but Uber is the dominant force. For search, it’s still Google. Google gets 70%, 80%, or maybe 90% of the searches.
I saw an article from The Atlantic that said, “Google is dying.” I’m like, “Here we go with the media hyperbole.” Google is not dying. They were complaining about search results or something. It’s like, “Go search on Yahoo, and then let me know what you get.” They’re not going to do that, because they’re going to go to Google, because everyone goes to Google. For their particular things, there are vendors that seem to be winning the space.
We are all things real-time analytics. It’s not new, but it’s newly popular, and it’s everywhere these days. Fadi, I’ll throw this one over to you. Even though I know there are lots of hiccups along the way, there are still some amazing things happening now with data and real-time data. I was hinting at this last segment. If you look at the companies that have succeeded and are doing well in the world, almost without exception, they have some deep analytics play.
They have figured out how to leverage speed, whether it’s in booking flights, whatever engine they have, ad networks, and things of this nature that are data- and speed-intensive. Do you think ad networks are the center of gravity in terms of innovation around analytics? Give us some sense from your perspective on how StarRocks and similar technologies are changing the game and shepherding the pivot points in business.
Fadi: You see it right in front of us in the customers we’re engaged with and the use cases that we have. The problem we’re trying to solve, which we’re delivering on with our platform, is to allow them to speed up the time to value of their services. At the end of the day, their business intelligence is driving their organization: what new services they need to deliver and what operations they need to run in order to deliver those services. Solving the problem at the heart by getting that real-time insight quickly, on fresh data, and easily across the organization is exactly what they’re looking for.
This is what’s starting to power, to your point, some of those organizations that are adopting and using it. We’re seeing business analytics and business logic driving it at the heart of their organization instead of some edge project inside the organization. This is adopted and maintained inside the organization with the knowledge of the entire organization, up to the CIO.
Eric: Damian, I’ll bring you back in here. Sometimes when you stop caring about something, it’s not because it went away. It’s because it’s everywhere and accepted. We’re seeing an interesting pivot point in the industry where analytics is no longer just something for the dashboard. A lot of times, it’s being baked right into operational systems and that’s where you want it. You don’t want it just to go to someone’s dashboard so they can think about it and go, “Maybe I should change this or that.” You want these analytics baked into operational systems so that you’re changing the behavior of people on the front lines and how they operate. What do you think?
Damian: You don’t rely on people to make split-second decisions reliably when you have so much information that’s coming from everywhere. There are a lot of bread and butter decisions that have to be made, whether it’s turning the kettle off when it’s boiling. When you look at everything that’s going on, you need to have that automation continually running reliably. That’s the bread and butter analytics that’s powering the whole world that we live in.
It’s a connected world of information. It’s the antithesis of people carefully pondering and making that strategic next move. What we need are the brakes to be applied in the car when someone’s stepping in front of it. We need all of those safety systems and systems that keep things running smoothly. If you look under the hood of how the systems are being put together, that’s where the analytics stuff is being placed now.
Eric: I’ll bring Greg Michaelson back in to talk about how things are changing, and maybe focus on the people who are getting it, figuring it out, and making changes. One thing I loved about DataRobot was how easy it is to use. This stuff is moving towards becoming under-the-hood technology. We may still be some distance from that. When you have to stand up a new cluster to get started on loading massive amounts of data and analyzing it, that’s not going to work anymore, at least not for a lot of use cases.
You need a tool that’s going to help you immediately understand something. That was always one of the perks of the DataRobot approach: you’d load a data set, it would throw twelve common algorithms at it, and it would come back and suggest the one algorithm with the greatest covariance or the greatest efficacy. We need more solutions that just work and fewer solutions that require a whole lot of effort on the part of the team to build something. What do you think?
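The model bake-off Eric describes, fitting several candidate algorithms and keeping the one that scores best, can be sketched in a few lines. This is a minimal illustration with two invented toy models and a made-up dataset, not DataRobot’s actual pipeline or selection metric:

```python
# Minimal sketch of "try several models, keep the best" via cross-validation.
# The model names and toy dataset here are invented for illustration.
import statistics

def mean_model(train):
    """Baseline: always predict the training mean."""
    mu = statistics.mean(y for _, y in train)
    return lambda x: mu

def linear_model(train):
    """Ordinary least squares fit: y = a*x + b."""
    xs = [x for x, _ in train]
    mx = statistics.mean(xs)
    my = statistics.mean(y for _, y in train)
    var = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in train) / var if var else 0.0
    return lambda x, b=my - a * mx: a * x + b

def cv_error(fit, data, k=5):
    """k-fold cross-validated mean squared error for one candidate."""
    folds = [data[i::k] for i in range(k)]
    errs = []
    for i, test in enumerate(folds):
        train = [p for j, f in enumerate(folds) if j != i for p in f]
        model = fit(train)
        errs.append(statistics.mean((model(x) - y) ** 2 for x, y in test))
    return statistics.mean(errs)

def pick_best(candidates, data):
    """Score every candidate and return the lowest-error one with all scores."""
    scores = {name: cv_error(fit, data) for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores

data = [(x, 2.0 * x + 1.0) for x in range(20)]  # noiseless linear toy data
best, scores = pick_best({"mean": mean_model, "linear": linear_model}, data)
print(best)  # the linear fit wins on linearly generated data
```

In a real AutoML tool the candidate list is much longer and the scoring metric is chosen per problem type, but the selection loop is the same shape.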
Greg: DataRobot is certainly cutting edge. It’s an unbelievable, amazing piece of software that will massively accelerate the rate of building these types of models. In particular, DataRobot’s time series components are revolutionary and substantially differentiated from anything else out there on the market. One of the things that I learned from our sales team at DataRobot was the three whys.
When you’re working with an organization and they are a potential customer of yours, you have to answer the three whys: why anything, why DataRobot, and why now? If you get a good answer to those three questions, then you probably have the potential for a good relationship there. The problem that I see is that most sellers skip the why anything question. They assume, “Of course, you need real-time analytics or machine learning.” They go right to why DataRobot, SQLstream, or StarRocks? “It’s because our product is awesome and you need it right now.”
The most important of those three questions, in my opinion, is the why anything question: “Why do I need this technology?” If you have a good answer for that, then certainly the world is yours as far as the vendors, products, and solutions out there. The hole in the market is getting to the place where you know what you need. It wouldn’t surprise me if some consulting companies, whether the big firms or boutique shops, came along and made a market out of helping organizations answer that why anything question. I don’t think that for many of these technologies there are good answers out there at all.
Eric: We have different teams. Fadi, I’ll throw this one over to you. I’m on the outside of all this, but I’ve talked to all the vendors, a lot of the end-users, the consultants, and so forth. It’s hard to get an exact feel for things. Your data warehousing team tends not to be your data science team, for example. Your governance team tends not to be the security team as an example. The lines are starting to blur maybe because there’s so much pressure from outside. I’m not entirely sure why.
I throw out this information strategy group as an idea. Greg talked about these innovation teams at large organizations, but on the client side you always need connections and all the right people at the table, so you make the right decisions and don’t make significant mistakes. I’ll throw it over to Fadi. Do you think we’re getting somewhere on that front? Do you think that the C-level executives and the boards are now figuring out, “This is real. We can’t play around. We can’t make mistakes in this environment”?
Fadi: If your company is in the business of providing an analytics platform, it has to explain why analytics matters in the first place. This is a good opportunity to engage organizations not just at multiple levels, but across departments. You can bring the conversation around to: why do you need this for your business? Before talking about your product, what are the pain points you’re trying to address?
Let that be the connective tissue across the organization. With the advent of analytics becoming easier to use, you’re able to have these conversations. You’re able to solve business problems. You can show how easy it is to use and how quickly you can surface information from data that ties directly to the business. That conversation is starting to happen.
Eric: You’re also getting a lot of folks who are spinning out of companies like Google, Facebook, and LinkedIn. They’re going out into organizations. One of the cooler things that happens is you’ll see a fairly new company with lots of clients right out of the gate. How does that happen? It happens because someone was working for a bigger company, maybe an Amazon or something like that.
They have relationships. They’re able to quickly tell a story and get venture funding. That’s the upside of the amount of money that Greg was talking about. We joke, it’s not millions or tens of millions anymore in our business. Hundreds of millions of dollars are getting thrown at these companies. We talked about a Series G for DataRobot that was a $300 million series. That’s a long way down the cycle of raising money, but it goes to show you that this industry is real. It’s alive.
We’re at the bonus segment of the show. We’ve been talking about all things real-time analytics, and now I want to get some career advice from our guests. I’ll throw it out to each of them for their thoughts on people looking for a new job or challenge: what might you want to know about yourself and about these jobs? Fadi, I’ll throw it over to you first. What’s some advice you can give to someone who reads about this stuff and says, “I want to learn about that. That sounds great”? What are some hot career paths and some suggestions you would have for people to get rolling?
Fadi: I would start out by saying follow the data, in the sense that whatever career move you’re making, look at careers that get you involved in either data analytics or AI that uses data analytics, things that are core to driving the next-generation economy. That’s at a market level. Because it’s a new market that’s evolving so quickly, there’s also an opportunity for a lot of folks to define a business purpose for their role.
Typically, those are roles that drive new products. Be a champion, be a leader in defining what that next-generation product might look like that fits a real need from customers. My advice is to get involved in roles where you understand what the real need is, where you see the opportunity, and where you help the organization drive that in the market.
Eric: I’ll throw it over to Greg. You said LinkedIn has a similar mantra on their site when you go and comment on someone’s page. LinkedIn will tell you, “Lead the conversation. Start a new thread about this.” They’re trying to get you to build out their platform and content. LinkedIn certainly has been focused on content for a while now, which is probably a good strategy. If you look at Twitter versus Facebook versus LinkedIn, they do have their own persona and set of practices of acceptable topics to discuss.
You’re not supposed to talk politics on LinkedIn. Take that to Facebook or to Twitter, for example. I like that about LinkedIn. It’s focused. Greg, I’ll throw it over to you. That’s good advice from Fadi to figure out what you enjoy. I always talk about the Venn diagram of things that you’re good at, things that can make you money, and things that people need. That little sweet spot in the middle is where you should go. What are your thoughts or career advice?
Greg: I’ve thought about this a lot. Before I joined DataRobot, I changed jobs about every year for two years, jumping from one thing to another. I realized that my problem was, wherever you go, there you are. There are a lot of people who can relate to that. Certainly, I’ve talked with many. They don’t really know exactly what they want to do. They’re younger and trying to find what that path is. What I always say is there’s no one right answer here. You can make lots of different choices. It’s a question of contentment. It’s not so much about finding that one perfect thing where the Venn diagram all meets up. It’s more about figuring out what your values are.
Some people want to make a lot of money. That’s a path, and you’re going to have to sacrifice some other things. It’s like in engineering: there are three things, cheap, fast, and good, and you only get to pick two of them. It’s the same with careers. You can have A, B, or C. If you want A, then maybe you’re not going to have as much of B and C in your life. Broadly speaking, my advice is to figure out what those things you want are.
After that, I wouldn’t rely so much on higher education. I earned a Ph.D. in Applied Statistics from the University of Alabama. It’s a great school. I learned a lot while I was there. I think that certainly in the data, computation, analytics, and AI space, most universities are behind where the actual market and where the vendors are.
The right approach, if that’s the path that you want to take, is to just go out and solve some problems. There’s certainly a vibrant open source community. Anything that you want to learn about machine learning or data analysis is freely available on the internet. The only people who are going to be successful at that are the people who want to do it. It’s hard to go out and teach yourself.
Eric: Damian, what’s your advice to upwardly mobile people?
Damian: It depends on where you are in your career. If you’re at the beginning and you want to make a change and get into the data space, one thing that’s different now is that everything is so accessible. The technology and the products are easy to get access to. There’s so much open source, so many try-before-you-buy trials and software-as-a-service systems, that you can make yourself different from 95% of the candidates in any area by taking the time to go hands-on, try a few things, and think for yourself about whether a product is as good as it says it is. “Is it easy to use? Can I get a result?”
You go to an interview and you can talk about, “I tried this product versus that product. This is what I found.” You’re going to be different from almost everyone they’re talking to, immediately. Bring business challenges that you’re looking to solve with it. You’ll be in rarefied company and in a good position to get the job.
Eric: There are so many products you can download and try for free. I was looking at StarRocks. Download it. Try it for free. If you’re leaning in this direction, download a few different technologies and play around with them. Note the differences: “This was easy. This was not as easy. The result was better over here.” If you have that detail walking into an interview, they’re probably going to break out the pen right there and hire you. This has been a great show. Look these guys up online: Damian Black of the SQLstream team, Fadi Azhari of StarRocks, and Greg Michaelson of K&M Foods. We’ll talk to you later.