Faster Than A Speeding Bullet? Probably! With Li Kang And Kevin Mergruen



October 18, 2022

 

How fast is fast enough? Every hour, every minute, every second? Data has to move fast. Especially with the technology today, you have to be up to date on everything. The market is changing rapidly, so you need to get your analytics in real-time. Once you get to sub-second demands for analytics, that’s a different class of solution, especially if you have hundreds or even thousands of analysts querying that data!

 

Join Eric Kavanagh as he talks to the SVP of Sales at GigaSpaces Technologies, Kevin Mergruen, and VP of Strategy at CelerData, Li Kang. Learn how In-Memory Data Grids work and find out how you can have real-time analytics. Remember, speed, consistency, and availability are mandatory regardless of what business you’re in.

Transcript

[00:00:54] Eric: We are going to talk about a pretty hot topic, a fast-moving topic. Is it faster than a speeding bullet? Probably. If you talk about how fast data moves around, it does move faster than speeding bullets. It moves at the speed of light, typically, but has to go over networks sometimes and has to make its journey across different systems through APIs and so forth. There are impediments. Some of them are very important impediments but we are talking about fast data and what it means, and why you want it.

We will be hearing from Kevin Mergruen of a company called GigaSpaces and Li Kang from CelerData, who is also working on the StarRocks Project. I want to talk real quick about fast data and why you want it. There are lots of different ways you can do this. In the old days, we used to buy bigger computers and big machines. You would scale up and get a faster machine. We then had this whole scale-out world, which we are still in. Cloud computing has been built from these scale-out data centers. The providers that run them are called hyperscalers.

You’ve got Microsoft, Google, etc. They provide you with what you need, but sometimes you want data fast. If you have lots of apps that are relying on this data, you don’t want your data center to be the bottleneck. If you have needs for thousands of analysts pinging certain data sets, some of the newer solutions are probably going to fall over if you try to do that. No one wants latency anymore. You want your stuff right away. You want it now.

The internet has changed expectations in that sense. I’ve heard interesting commentary that Facebook, for example, figured out the importance of speed and interactivity. Giving you what you want right when you want or at least seeming to do so. That keeps you around. If you have to wait for stuff, who wants that? A little spinning wheel of death on a Macintosh. I know what that is. I see it and I’m like, “I did it again. I tried to grab, copy and paste some vast amount of data into Microsoft Word.”

Don’t do that. Don’t try that. Paste and match the style or paste without formatting is your key. There are lots of different ways you can solve this. We are going to hear from Li Kang from CelerData and Kevin Mergruen from GigaSpaces. We will start with Kevin. Tell us a bit about yourself and what GigaSpaces does, and how you are delivering super-fast data.

[00:03:08] Kevin: Number one, Eric, thanks for inviting us to be on the show. It’s very much appreciated. You guys have a great show. I always enjoyed reading your past episodes. I’m the SVP for Sales and Operations with GigaSpaces out of our US location in New York City. GigaSpaces is based out of Israel. We are out of the Silicon Valley of Israel, which is Herzliya. It’s an amazing technology area of Israel.

We help organizations address the challenges of going digital with their business. If we look at what’s happened over the years of the pandemic, every organization out there is figuring out how to go digital with their business and provide better services to their constituents, customers, consumers, and whatnot. Most organizations, when they went down this path, found themselves using point-to-point integration.

Fast Data: Companies are using point-to-point integration when they go digital. That’s too slow. Most digital applications these days require integration into multiple backend data sources.

The majority of digital applications require integration into multiple backend data sources. When you start going down that path, what most organizations have found is that this alone is not delivering the speed, scale, and availability to meet the requirements of the market, which is changing so fast. Gartner came out with one of their innovation insights reports and cited the fact that, for organizations going down that path, an API-only integration model was a prohibitive approach in terms of delivering the speed and availability to meet their market needs.

They cited a new direction that they saw happening, this new architecture called the Digital Integration Hub. Essentially, move away from a coupled request-based architecture of what people were doing to an event-based decoupled architecture so that people can now deliver data faster to match the digital speed required and expected by consumers.
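As a rough illustration of the shift Kevin describes, here is a minimal Python sketch of an event-based, decoupled read path. Everything in it (the `EventHub` class, the `balance_changed` topic, the `balances` cache) is a hypothetical name invented for this example and is not GigaSpaces' or Gartner's API. The idea is only that systems of record publish changes once, and the digital app reads from a store that is already up to date instead of calling each backend on every request.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class EventHub:
    """Toy in-process publish/subscribe hub (hypothetical, illustration only)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        # The publisher does not know who consumes the event: that is the decoupling.
        for handler in self._subscribers[topic]:
            handler(event)


# Read-side store that the digital app queries; kept current by events.
balances: Dict[str, float] = {}


def on_balance_changed(event: dict) -> None:
    balances[event["account"]] = event["balance"]


hub = EventHub()
hub.subscribe("balance_changed", on_balance_changed)

# A system of record publishes changes as they happen.
hub.publish("balance_changed", {"account": "A-1", "balance": 125.0})
hub.publish("balance_changed", {"account": "A-1", "balance": 90.5})


def get_balance(account: str) -> float:
    """App-facing read: a local lookup, with no call to the backend system."""
    return balances[account]


print(get_balance("A-1"))
```

In a request-based design, `get_balance` would instead call the backend system on every request, so the app would be only as fast and as available as the slowest backend it depends on.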

We at GigaSpaces are one of the leading providers of what Gartner called an out-of-the-box digital integration hub, which meets the need for speed, flexibility, and agility so that organizations can deliver new digital apps faster than ever before to meet market demand. What we provide is a digital operational platform to meet the requirement and consistency of that organization’s needs. It has been phenomenal to see the experience of our clients because the end-user, the consumer, expects to see a very agile and fast response to their application.

If you are using a mobile app for your banking application, for example, people want to have access to all of their data, whether it’s account data, loan data or whatever the history may be rapidly and without any hiccups. If your mobile app isn’t working, if your provider is not doing the right job, that customer will find someone else to deliver it. Speed, consistency, and availability are always mandatory, regardless of what business you are in.

[00:05:55] Eric: How do you deploy? You are an in-memory solution, if I understand it correctly. What you are doing is expediting the delivery of data from various source systems to an application and, I presume, facilitating the write-back from the app to the data source somewhere. How long does that take? What does it look like?

[00:06:18] Kevin: The overall challenge is that I have multiple systems of record. I need to bring that into a format that can be very agile and available to be able to push it out to the digital apps. The challenge is how do I provide the means to ingest data, number one, and be able to bring the data into a platform, a high-performance data store that enables me to first bring in the relevant data for maybe the last 36 or 90 days’ worth of history based upon the requirement of the application.

What we do is provide a complete architecture, which includes all of the adapters, the ability to do the ETL, the batch load, and the change data capture requirements to push the data into our platform, into our distributed in-memory data grid. When we bring it into our grid, one of the challenges is that if you want to be able to deliver consistent performance, how do you use RAM versus SSD to have the best balance?

We have this concept of tiered storage, which enables us to put your highest priority, most relevant data into the RAM environment, the hot storage, and the next level, the secondary level of priority or relevance, into SSD and then serve it up through a unified data model into our digital layer. The digitalization layer enables our clients to build out their applications, whether it’s in Java, SQL, .NET or REST, for example.
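To make the tiering idea concrete, here is a small Python sketch under stated assumptions. The `TieredStore` class, the `age_days` field, and the 90-day rule are all made up for illustration, and two in-memory dictionaries stand in for RAM and SSD. It is not how the GigaSpaces product is implemented. It only shows a business rule deciding which records are hot, with a single `get` serving both tiers.

```python
from collections import OrderedDict
from typing import Callable, Dict


class TieredStore:
    """Toy two-tier store: 'hot' stands in for RAM, 'warm' for SSD (assumption)."""

    def __init__(self, is_hot: Callable[[dict], bool], hot_capacity: int) -> None:
        self.is_hot = is_hot                # business rule supplied by the caller
        self.hot_capacity = hot_capacity
        self.hot: "OrderedDict[str, dict]" = OrderedDict()
        self.warm: Dict[str, dict] = {}

    def put(self, key: str, record: dict) -> None:
        # Remove any previous copy so a key lives in exactly one tier.
        self.hot.pop(key, None)
        self.warm.pop(key, None)
        if self.is_hot(record):
            self.hot[key] = record
            if len(self.hot) > self.hot_capacity:
                # Hot tier is full: demote the oldest hot entry to the warm tier.
                old_key, old_record = self.hot.popitem(last=False)
                self.warm[old_key] = old_record
        else:
            self.warm[key] = record

    def get(self, key: str) -> dict:
        # One read path regardless of which tier holds the record.
        if key in self.hot:
            return self.hot[key]
        return self.warm[key]

    def tier_of(self, key: str) -> str:
        return "hot" if key in self.hot else "warm"


# Hypothetical rule: anything from the last 90 days is hot.
store = TieredStore(is_hot=lambda r: r["age_days"] <= 90, hot_capacity=2)
store.put("t1", {"age_days": 3, "amount": 10})
store.put("t2", {"age_days": 200, "amount": 20})
store.put("t3", {"age_days": 30, "amount": 30})
store.put("t4", {"age_days": 1, "amount": 40})

for key in ("t1", "t2", "t3", "t4"):
    print(key, store.tier_of(key), store.get(key)["amount"])
```

A real system would also promote and demote records based on observed access patterns; this sketch only applies the rule at write time and demotes on overflow.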

Fast Data: The concept of tiered storage enables you to put your highest-priority data into the RAM and the next-priority data into the SSD. Then you can serve it up through a unified data model into your digital layer.

They can now align and deploy new microservices based on their requirements, and because it’s all within this unified data model, they have the microservices closely aligned to where the data is as opposed to having microservices at the system-of-record level. It’s a much more efficient model, and they can deploy it initially on-prem, then hybrid in the cloud, as well as multi-cloud. The system also enables real-time replication of data.

We have one of our clients, which is one of the largest French banks, and one of their challenges was, “How do we deliver a customer 360 model so that we can see all transactions and all trades that are affecting a client and be able to see trades worldwide after they have been done?” In their instance, they have their main data center in Paris. Using our clustered model and our replication capabilities, they can automatically have their transactions replicated to their Hong Kong, London, and New York City data centers as close to real-time as possible.

That’s enabling them to deliver a five-nines experience in terms of availability and speed to all of their traders so they can do the best job for their clients. In that case, speed is the real key aspect, along with having a sensible design for how data flows. The whole thing is the fluidity of how you move data from your back-end systems of record into high-performance data storage and deliver it in an event-driven approach to the front-end applications. That’s what this whole system has been designed to do, specifically for operational, transactional requirements for digital business.
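The flow Kevin outlines, changes captured at a system of record and replayed into in-memory copies in several locations, can be sketched in a few lines of Python. This is a simplified illustration and not GigaSpaces' replication protocol: the `Replica` class, the change-record layout (`seq`, `op`, `key`, `row`), and the site names are assumptions made for the example.

```python
from typing import Dict, List


class Replica:
    """Toy in-memory copy that applies an ordered change log (illustration only)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.rows: Dict[str, dict] = {}
        self.last_seq = 0  # highest change sequence number applied so far

    def apply(self, change: dict) -> None:
        # Idempotent apply: a change that was already seen is ignored,
        # so a redelivered event cannot corrupt the replica.
        if change["seq"] <= self.last_seq:
            return
        if change["op"] == "delete":
            self.rows.pop(change["key"], None)
        else:  # "insert" or "update"
            self.rows[change["key"]] = change["row"]
        self.last_seq = change["seq"]


# Hypothetical change data capture stream from a system of record.
change_log: List[dict] = [
    {"seq": 1, "op": "insert", "key": "trade-1", "row": {"qty": 100}},
    {"seq": 2, "op": "update", "key": "trade-1", "row": {"qty": 150}},
    {"seq": 3, "op": "insert", "key": "trade-2", "row": {"qty": 40}},
    {"seq": 4, "op": "delete", "key": "trade-2", "row": None},
]

replicas = [Replica(name) for name in ("paris", "hong-kong", "london", "new-york")]

for change in change_log:
    for replica in replicas:
        replica.apply(change)

for replica in replicas:
    print(replica.name, replica.rows)
```

Because every replica applies the same ordered log, all four sites converge on the same state. A production system would add networking, batching, and failure handling on top of this basic idea.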

[00:09:20] Eric: That’s very cool stuff, and there are a bunch of companies in this space. There’s a company, MemSQL, that has now renamed itself SingleStore. They were focused on that memory-based approach that you are talking about. There’s this little company called Snowflake that came along, and at their big conference in Las Vegas, they announced Unistore.

They were a data warehouse in the cloud. They’re still a data warehouse in the cloud, but now, they are trying to focus on some of these transactional systems as well. I bring this up because you are talking about getting the functionality that’s in these containers, the microservices, right up against the data so that they can get the job done right away. That’s the ultimate environment. You’ve got all this activity happening, and you don’t have to start reaching across different systems and doing all these API calls to bring it together. How do you differentiate yourself from those other vendors? What’s your secret sauce, if you will?

[00:10:13] Kevin: Number one is that this has been built from the ground up as a fully integrated environment. It’s not a do-it-yourself environment where the user has to patchwork everything together. That’s the first thing. Most of these other systems out there are standalone environments. With a standalone in-memory data grid, they have not completed the southbound and the northbound aspects of things, and they don’t have a complete, configurable systems administration environment to manage, configure, and monitor performance in a profitable fashion. Ours is an all-in-one environment.

Number one, it is going to help you build, deploy, and maintain something faster than you would otherwise be able to do. Number two is the fact that this has a distributed in-memory data grid that has been built from the ground up to support high production use requirements in terms of concurrency, throughput, and ingestion. Those are very key aspects.

The other key area that we are seeing is the challenge of once you’ve built your 1st digital app, now you have to build your 2nd, 3rd, and 4th. How do you do that where you can leverage the environment you’ve built and already have a high-performance data store with key data in there? Now, you want to enhance that data. For the majority of other systems in the marketplace, you have to bring that system down to add data tables and add a new database completely into your digital application.

You need to bring it down if you are adding a new digital application to your platform. With the GigaSpaces Smart DIH (digital integration hub), you are not required to do that. One of the system’s key benefits is high availability, where you don’t need to bring it down to enhance what you are doing. Again, in terms of new tables or new data sources, or for that matter, completely new digital applications, we build on top of the existing data. That helps you be much more resilient and responsive to the marketplace without having the hiccups that people often see in the marketplace now.

[00:12:08] Eric: That’s interesting. You are providing a future-proofing architecture in a sense, right?

[00:12:13] Kevin: Correct.

[00:12:14] Eric: Our TV show is called Future Proof. Every time I hear one of those, I’m like, “Here’s an example of future-proofing. It is very clever.” Observability is a huge topic these days. There are all these new vendors that are doing observability. They are mostly paying attention to data feeds. What’s changing or what’s new? How is something different? How is that going to affect downstream data pipelines or whatever?

How do you identify cold data and then automatically move it to lower storage levels? How is that done? A big part of what you are talking about here is the ability to dynamically assess the hotness of data and thus, where it should go. It’s not technically garbage collection but it is moving things around. How does that work?

[00:13:00] Kevin: First off, you establish that in your business rules. We always have to look at how apps are designed, but building a new digital app is a top-down approach versus a bottom-up one. One of the differences is that business owners are the ones who are driving the need for the digital application itself. You are designing from a top-down perspective. I’m looking at what the requirements of the app are for the consumers and for the business model that I’m trying to build out. I’m then looking at how I need to source that.

Part of the definition of that is, what data do I need to have in hot storage at all times in the CDC stream? You are defining the business model, the methodology, and the rules to make sure that you are bringing in the data that you need. You are using the monitoring and the performance aspects of who’s hitting what area to keep updating and tuning that, but it needs to be done in a process-oriented manner.

That’s where you establish the rules so that you can do that properly. Now, in terms of how you scale up and scale down, the system has elastic capabilities so that you can increase and decrease as need be, but you need to establish the business rules of what should be in the hot data because that’s the most logical approach to getting it done.

[00:14:15] Eric: It’s a good point and segue for our next segment. I will bring Li Kang in from CelerData for our next segment but as a nice segue to that, app design. I had a great quote from none other than Michael Stonebraker. He made this great comment. He goes, “The fact is that 80% of code in the world now should just be thrown away.”

His point was that it was designed for old architectures and systems. This design point is quite important because when things change upstream in terms of how you can get data to your app, things should change in the app and how that app is designed. Do you see that as a big driver for businesses to recognize, “We have to reinvent things to leverage this fast data that we have now?”

[00:15:04] Kevin: It’s fascinating because one of the major benefits of the Smart DIH, in terms of being an event-driven architecture and having your data in an always-on environment, is the fact that now companies can spend 50% to 75% less of their developers’ time on the data integration side. They can spend more of their developers’ gray matter on the UX and the business rules associated with digital services.

The fact is that you better be agile. You better be able to develop new things very quickly because you have to create them at digital speed. The market is changing rapidly, and your competitors are changing so rapidly in the respective industries that if you are not able to modify and drive out new microservices very rapidly to affect change, you are going to be stuck looking at your competitors taking your market share.

One of the major things is going to be the ability to adapt quickly and build out new code based on the requirements of the marketplace. We don’t know, God forbid, when the next pandemic is going to come or what’s going to happen dynamically. Your ability to react at digital speed and focus your development resources in the proper manner is the most important thing.

[00:16:18] Eric: We are talking to Kevin Mergruen, and up next is Li Kang from CelerData. Kevin is from GigaSpaces. We are talking about all things fast data in this episode. Is it faster than a speeding bullet? Probably. There is data that is faster than bullets. Bullets go pretty fast. Data can go pretty fast as well. Next up, we have Li Kang from CelerData. They are doing some very cool things. Li knows a lot about production, getting the software up and running, and this whole open-source movement. Li, tell us a bit about yourself and CelerData. What are you working on to deliver fast data?

[00:19:46] Li: I’m in charge of go-to-market and product strategy at CelerData. CelerData is the creator of the StarRocks Project. StarRocks is a new analytical database. We serve customers who have a large amount of data and many concurrent users and who need to get insight quickly from that data. Our customers include companies like Airbnb, Alibaba Cloud, Lenovo, and Trip.com. These are some of the large enterprises.

Talking about fast, our mission is to make real-time analytics fast and easy for our customers. Real-time and analytics have not been very successful working together. Kevin talked about streaming and processing in today’s modern applications. In terms of transaction processing, including with streaming data, it’s becoming faster and faster. When it comes to the next phase, which is analytics, it has always been very difficult and has not been successful for enterprises. Our mission is to make it easy for companies to analyze data in real-time and to power today’s modern applications. That’s our mission. That’s our goal.

[00:21:06] Eric: We talked about this. There are lots of ways over the years that we have tried to expedite the delivery of data and the generation of analytics or insights but if you look around the world now, especially in light of the pandemic, which forced many businesses to go virtual, to go into the cloud to reinvent their business processes, analytics is now crucial. You could argue that we are not only in the information economy but the insight economy.

It’s where you have to learn things about your customers, prospects, network, the marketplace or whatever it is. You need to set up these analytic pipelines to feed you the insights that you need to make decisions. If you don’t, then you are going to be doing things the old-fashioned way, which is gut instinct or whatever the case may be, but that dynamic is changing, it seems to me, very quickly, and that’s why you see a StarRocks Project.

That’s why you see lots of different companies jumping on this bandwagon of new paradigms for developing an app. You think about how much things have changed. Even the last few years it’s quite remarkable. The cloud is a big part of that. Edge is a big part of that but fueling all that is going to be analytics and your vision was, “We are not going to be able to deliver that speed and that concurrency of analytics with these old architectures and databases.” Is that about right?

[00:22:29] Li: Yes. Absolutely, that is a great point. What does speed or fast mean for the analytics database? We talk about query speed and query latency all the time from a database standpoint. With a large amount of data, you want to run complex analytical queries. You want to get the result back instantaneously. We talk about sub-second latency. That’s all great, and people are working towards that goal with all the advancements in hardware and software technologies, but we tend to ignore or forget another aspect of fast, which is what I call data latency. How fast can I get this insight from the moment the latest transaction or latest event happens? We used to wait a day or a week to get this latest insight reflected in our analytics or applications, but that’s not the case anymore.
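The distinction Li draws between query latency and data latency comes down to simple arithmetic, sketched below in Python. The timestamps, the nightly 02:00 batch load, and the two-second streaming delay are invented numbers for illustration, not measurements from any product.

```python
from datetime import datetime, timedelta


def time_to_insight(event_time: datetime,
                    visible_time: datetime,
                    query_latency: timedelta) -> timedelta:
    """Total delay from an event happening to a query answer that reflects it.

    data latency  = how long until the event is queryable
    query latency = how long the query itself takes
    """
    data_latency = visible_time - event_time
    return data_latency + query_latency


event = datetime(2022, 10, 18, 9, 0, 0)   # a purchase at 09:00
query = timedelta(milliseconds=500)        # the same sub-second query in both cases

# Assumed nightly batch: the event becomes queryable at 02:00 the next day.
batch_visible = datetime(2022, 10, 19, 2, 0, 0)
# Assumed streaming ingestion: the event becomes queryable two seconds later.
stream_visible = event + timedelta(seconds=2)

batch_total = time_to_insight(event, batch_visible, query)
stream_total = time_to_insight(event, stream_visible, query)

print("batch pipeline: ", batch_total)
print("streaming path: ", stream_total)
```

With identical query speed, the batch pipeline still takes hours to reflect the event, which is why a fast query engine alone does not give you real-time analytics.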

As you said, the business environment is changing. We are moving to a virtual environment. We are doing more online businesses, and with all this information, with all these events, we need to see this latest event or latest transaction being reflected in my analytical applications instantaneously. I can’t wait for 2 hours, 4 hours or 24 hours. That’s not relevant to me anymore. Things like online eCommerce and recommendations.

The moment you purchase or browse some products, then you may get a promotion instantaneously. Things like a real-time social network. The moment you forward or share a post or like a post, that might affect another merchant’s promotion or advertisement strategy. That kind of application is becoming more and more relevant or it has become a must-have capability in these modern applications.

From the analytical standpoint, not only do we need to have a fast query speed but also, we need to have a fast or reduced data latency. Get the latest fresh data delivered. What does it mean in terms of application or analytical applications? As you said, when we talk about analytics, we always think about the long-term strategic type of applications or workload. It’s for the executive sitting in the boardroom looking at the next quarter, next year or maybe the next three years.

People will tell you, “Real-time is not that critical. I don’t need last-minute data. I don’t need the latest transaction.” That was true before, but in the world nowadays, not only do we need to support strategic decision-making but we also need to support day-to-day business operations like those applications I mentioned. That means this type of analytical capability cannot be a standalone functionality or capability anymore. Before, I could have an analytical engine to support dashboards or reporting applications. That was all I needed, but now, it needs to be able to support web and mobile applications, to support your eCommerce workload or your fraud detection workload. They all happen in real-time.

Fast Data: Analytic capabilities cannot be standalone. They need to be able to support web and mobile applications, e-commerce, and much more. Analytics capabilities need to be embedded in the application.

In that sense, analytics capabilities need to be embedded in the application in addition to being a standalone capability. That’s one aspect. Also, analytics used to only have to support a group of internal, almost elite users: the executives and the business analysts. Those are a very select, elite group inside the company, but now, with modern applications, I need to support every frontline sales guy or marketing person making a data-driven decision.

Not only that. I need to support my advertisement partners and my supply chain partners, and one step even further is that I need to support my consumers. Customers need to make a decision about a product and services. They need to query the transactional data sets that you accumulate over the years. That user base is different now. It’s not only internal users. We are talking about external users.

From the time standpoint, before, we were talking about batch-based daily, weekly, and quarterly. With these modern applications, we need to be able to provide analytical capabilities in real-time. A batch-based pipeline is not sufficient. These are some new developments in modern applications, and they demand a new set of capabilities from analytical databases. That’s what StarRocks and CelerData are trying to do.

[00:27:18] Eric: You brought up a lot of good points there. One is that we are not only talking about small incremental changes in the marketplace. In chemistry, there is this concept of a step change, where it’s a very significant change from one step to the next. It’s remarkably different. We are talking about orders of magnitude here in terms of pressure, whether that be the pressure of concurrent users, pressure in terms of large data sets being queried, and things of this nature.

You recognized that you are not going to be able to tackle this only downstream in some app with an API call. You have to reinvent the engine that is generating and doing these queries. There’s another big side of this equation I will throw over to you as well for the rest of this segment, which is that whole discovery side. You think about the difference between a dashboard and looking at what our sales are quarter by quarter or year-over-year. That’s useful for a director to go, “We need to get some more marketing or do something different to get the pipeline filled again.”

That’s a very different use case than operational analytics, where you are trying to manage a fleet of trucks that are going into Florida to get the power back up and things of that nature. These are very different environments. One of the real keys to success is the ability of the analyst or the business user to throw different ideas at the data very quickly and get back views in a very rapid-fire fashion so you can make these operational decisions that are mission critical. Again, it’s a very powerful use case that is anywhere. You are going to need that analytical capacity, that speed, and that diversity of source systems. This is the market that you are tackling. Talk real quick about the discovery side, about being able to come up with new ideas, and throw them at the data.

[00:29:09] Li: You have a large amount of data. A lot of the time, you need to be able to explore the data: running ad hoc queries, trying out different ideas, and doing different A/B tests or even different scenario analyses. That requires a query engine that is able to support this type of query without a complicated backend data pipeline. You move the data around and try to aggregate data in different ways.

This query engine or database must be flexible enough to support this ad hoc query workload. StarRocks is trying to help our customers by providing these querying capabilities, enabling a large number of business users to run ad hoc queries against the full data set. You don’t have to say, “I need to download a small amount of data to my desktop so I can run my ad hoc query. I have to build my own little database or even export it to my spreadsheet.” That’s not going to work in today’s business world.

Again, this type of workload and the real-time analytical workload are quite different in nature. From the technology standpoint and from the architecture standpoint, do you want to have two separate pipelines and two separate analytics engines to support these types of workloads, or would you rather have the option of one engine that supports both? That is critical for today’s business, and that’s also why StarRocks is trying to help our customers by providing a unified architecture to support both types of workloads.

[00:30:48] Eric: That’s interesting, and this is also a trend we are seeing in databases. Multimodal is one term that’s used to describe these databases, but is it different modes? Are they different environments, or is it the same exact foundation that is catering to both these different use cases?

[00:31:05] Li: If you look at why we have two different engines, the main reason is that real-time analytics has been very difficult because the query engine cannot handle a typical analytics data set very well, especially in an analytical database, where data is usually organized in a star schema and you have to join multiple tables.

Fast Data: Real-time analytics is very difficult to do because the query engine cannot handle analytics data sets very well. When data is organized in a star schema, most query engines will suffer.

That’s where most query engines will suffer, and they cannot deliver this insight fast enough when they query against a star schema. That’s when people start doing different things like combining them into one table and trying to break the star schema, and that way, we can get better query performance. Fundamentally, we developed a new query engine to address that issue to provide the best query performance.
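For readers who have not worked with a star schema, the Python sketch below uses the standard-library `sqlite3` module to show the two shapes Li is contrasting: a fact table joined to dimension tables at query time, and a flattened single table built ahead of time to avoid the joins. The table and column names (`fact_sales`, `dim_product`, `dim_store`, `flat_sales`) are invented for the example, and SQLite is used only because it ships with Python. It says nothing about how StarRocks executes joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Star schema: one fact table plus small dimension tables.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, region   TEXT);
    CREATE TABLE fact_sales  (product_id INTEGER, store_id INTEGER, amount REAL);

    INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
    INSERT INTO dim_store   VALUES (10, 'EU'), (20, 'US');
    INSERT INTO fact_sales  VALUES (1, 10, 5.0), (1, 20, 7.0),
                                   (2, 10, 11.0), (2, 20, 13.0), (1, 10, 2.0);

    -- The workaround: pre-join everything into one wide table.
    CREATE TABLE flat_sales AS
        SELECT p.category, s.region, f.amount
        FROM fact_sales f
        JOIN dim_product p ON p.product_id = f.product_id
        JOIN dim_store   s ON s.store_id   = f.store_id;
""")

# Query against the star schema: joins happen at query time.
star_rows = conn.execute("""
    SELECT p.category, s.region, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_store   s ON s.store_id   = f.store_id
    GROUP BY p.category, s.region
    ORDER BY p.category, s.region
""").fetchall()

# Same question against the flattened table: no joins, but the table
# must be rebuilt or patched whenever a dimension changes.
flat_rows = conn.execute("""
    SELECT category, region, SUM(amount)
    FROM flat_sales
    GROUP BY category, region
    ORDER BY category, region
""").fetchall()

print(star_rows)
print(flat_rows)
```

Both queries return the same answer. The trade-off is where the cost lands: the star schema pays for joins on every query, while the flat table pays with an extra pipeline step and duplicated dimension data.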

[00:31:58] Eric: There are different ways you could do workarounds. Those happen all the time but workarounds don’t last too long. Band-Aids get ruined quickly. We are talking about query engines, and you were talking about your query engine. Tell us a bit about that. Query engines and databases take a long time to build and vet. I heard a great quote from my buddy, Mark Mattson, years ago that said, “It takes a decade before you find the edge cases that will crash a new database.” That’s the old days. It’s probably five years these days but it does take time to see what happens when you get overloaded with either different types of data or fast data. Tell us a bit about your query engine, Li.

[00:34:42] Li: If you look at today’s databases trying to solve these real-time query problems, people talk about different approaches like building indexes in real-time or building a materialized view to speed up queries. We believe we need to solve this problem fundamentally, at the root cause. Indexes can be helpful. A materialized view can be helpful, but the root cause is the query engine. It’s not about any revolutionary or groundbreaking new theory or anything. We took the latest developments in hardware and software, paid attention to the engineering details, and developed a new generation of query engine.

Take technologies like cost-based optimizers, which have been around in the transactional database world for many years. Most of the analytical database vendors don’t use a cost-based optimizer because it’s too difficult. They tend to use a different optimization strategy. We decided to build this into our product.
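The core idea of a cost-based optimizer can be shown with a toy Python example: estimate the size of every intermediate result for each candidate join order, then pick the cheapest. The tables, row counts, and selectivities below are invented, and the cost model (sum of estimated intermediate row counts over left-deep join orders) is a deliberate simplification. It is not the StarRocks optimizer, which relies on collected statistics and a far richer search.

```python
from fractions import Fraction
from itertools import permutations
from typing import Dict, Tuple

# Assumed table statistics (row counts).
row_counts: Dict[str, int] = {
    "orders": 1_000_000,
    "customers": 10_000,
    "vip_list": 10,
}

# Assumed selectivity of each join predicate. Pairs with no predicate
# (orders and vip_list here) would be a cross join, selectivity 1.
selectivity = {
    frozenset({"orders", "customers"}): Fraction(1, 10_000),
    frozenset({"customers", "vip_list"}): Fraction(1, 10_000),
}


def estimate_cost(order: Tuple[str, ...]) -> Fraction:
    """Cost of a left-deep join order = sum of estimated intermediate row counts."""
    joined = [order[0]]
    rows = Fraction(row_counts[order[0]])
    cost = Fraction(0)
    for table in order[1:]:
        rows *= row_counts[table]
        for previous in joined:
            rows *= selectivity.get(frozenset({previous, table}), Fraction(1))
        joined.append(table)
        cost += rows
    return cost


# Enumerate every join order and keep the cheapest one.
best_order = min(permutations(row_counts), key=estimate_cost)
naive_order = ("orders", "customers", "vip_list")

print("naive order:", naive_order, "cost", estimate_cost(naive_order))
print("best order: ", best_order, "cost", estimate_cost(best_order))
```

Joining the two small tables first shrinks the intermediate result before the large table is touched, which is the kind of decision a rule-based strategy can miss when it does not look at data sizes.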

Another aspect is vectorization. Vectorization as a programming trend or method has been around for several years. Some analytical databases try to use this technology. They did it here and there. We said, “Let’s take a step back.” We look at it as a query engine and fully apply vectorized processing from end to end, so there are no data conversions during that whole process, across all these query processing steps. Data is always processed in a vectorized fashion, and that dramatically improves query performance.
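The structural difference between row-at-a-time and vectorized execution can be sketched in plain Python. This is only a shape-of-the-code illustration with made-up data: pure Python will not show the speedup, which in a real engine comes from columnar memory layout and SIMD instructions in native code. The function names and the tiny batch size are assumptions for the example.

```python
from typing import Dict, List

rows: List[dict] = [
    {"price": 10.0, "qty": 2, "region": "EU"},
    {"price": 4.0, "qty": 5, "region": "US"},
    {"price": 2.5, "qty": 4, "region": "EU"},
    {"price": 1.0, "qty": 1, "region": "EU"},
    {"price": 8.0, "qty": 1, "region": "US"},
]


def revenue_row_at_a_time(rows: List[dict], region: str) -> float:
    """Classic interpreter style: every operator runs once per row."""
    total = 0.0
    for row in rows:
        if row["region"] == region:
            total += row["price"] * row["qty"]
    return total


# Columnar layout: one list per column instead of one dict per row.
columns: Dict[str, list] = {
    name: [row[name] for row in rows] for name in ("price", "qty", "region")
}


def revenue_vectorized(columns: Dict[str, list], region: str, batch_size: int = 2) -> float:
    """Vectorized style: each operator processes a whole batch of column values."""
    total = 0.0
    row_count = len(columns["region"])
    for start in range(0, row_count, batch_size):
        end = start + batch_size
        # Filter operator: produce a selection mask for the batch.
        mask = [value == region for value in columns["region"][start:end]]
        # Projection operator: multiply two column slices element-wise.
        products = [p * q for p, q in zip(columns["price"][start:end],
                                          columns["qty"][start:end])]
        # Aggregation operator: sum the selected values of the batch.
        total += sum(value for value, keep in zip(products, mask) if keep)
    return total


print(revenue_row_at_a_time(rows, "EU"))
print(revenue_vectorized(columns, "EU"))
```

Keeping the data columnar through every operator is what "no data conversions" refers to: if one step switched back to rows, the batches would have to be unpacked and repacked around it.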

Other things, like materialized views, we build into our query processing as well. Other vendors have a similar materialized view, but we make it more intelligent. When data comes in, you have updates, deletes, and operations like that. Can you make sure your materialized view stays in sync so you don’t have to spend time rebuilding the materialized view?
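Keeping a materialized view in sync without rebuilding it means applying each insert, update, and delete to the precomputed result as a small delta. The Python sketch below shows that idea for a per-region total. The `RegionTotalsView` class and the sample rows are hypothetical, and real engines handle far more query shapes than a single sum. It is not StarRocks' implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable


class RegionTotalsView:
    """Toy incrementally maintained view: SUM(amount) GROUP BY region."""

    def __init__(self) -> None:
        self.totals: Dict[str, int] = defaultdict(int)
        self.row_counts: Dict[str, int] = defaultdict(int)

    def on_insert(self, row: dict) -> None:
        self.totals[row["region"]] += row["amount"]
        self.row_counts[row["region"]] += 1

    def on_delete(self, row: dict) -> None:
        region = row["region"]
        self.totals[region] -= row["amount"]
        self.row_counts[region] -= 1
        if self.row_counts[region] == 0:
            # Last row of the group is gone, so the group leaves the view.
            del self.totals[region]
            del self.row_counts[region]

    def on_update(self, old_row: dict, new_row: dict) -> None:
        # An update is treated as a delete of the old row plus an insert of the new one.
        self.on_delete(old_row)
        self.on_insert(new_row)


def recompute(rows: Iterable[dict]) -> Dict[str, int]:
    """Full rebuild, used here only to check the incremental result."""
    totals: Dict[str, int] = defaultdict(int)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


base_table: Dict[str, dict] = {}
view = RegionTotalsView()


def insert(key: str, row: dict) -> None:
    base_table[key] = row
    view.on_insert(row)


def update(key: str, row: dict) -> None:
    view.on_update(base_table[key], row)
    base_table[key] = row


def delete(key: str) -> None:
    view.on_delete(base_table.pop(key))


insert("s1", {"region": "EU", "amount": 10})
insert("s2", {"region": "US", "amount": 5})
insert("s3", {"region": "EU", "amount": 7})
update("s3", {"region": "EU", "amount": 9})
delete("s2")

print(dict(view.totals))
print(recompute(base_table.values()))
```

Each change costs a constant amount of work against the view, whereas the rebuild scans the whole base table, which is the time Li is talking about saving.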

A lot of engineering effort went into all of these things during the development process. As you said, we are continuously identifying those corner cases to refine it. Even our customers, some of whom are very seasoned developers from the academic world, have looked at our source code, given us stellar feedback, and been very impressed with the quality of our product. That is the main reason we have been so successful in the last couple of years and have helped many large enterprises, and a lot of small startups as well, to build analytical capabilities into their products or services.

[00:37:25] Eric: As I’m thinking about future-proofing architectures, I will throw this one over to you, Kevin, and then maybe back to you, Li, for commentary. I’m guessing, Kevin, that because you have built this model, this architecture is designed to expedite the delivery of data regardless of the source system it is coming from or the target it is going to for whatever the app is. That means you can absorb new technological advances like new chips, faster SDRAM, and things of this nature. You are able to bring that into your architecture such that there’s no disruption to the client, and in fact, things start working faster for the client. Is that about right?

[00:38:03] Kevin: That is very true. When we look at the way that things are being containerized and the way that people are delivering distributed environments, where you distribute across a cluster of nodes to improve the speed of your processing in terms of throughput, concurrency, and overall data size, you are able to get parallel performance that you’ve not seen in the past.

We also are seeing clients that have a need for hybrid models, where you can have your data local and your apps on the cloud, or you can have things distributed in whatever fashion makes the most sense from an optimization standpoint, so that you can take advantage of the architectures across many different types of approaches. What we are seeing, working with partners like HPE, for example, is that the performance you can get both on-prem and in the cloud with the latest chipsets is remarkable, so absolutely.

[00:38:55] Eric: That is a fascinating part of this whole equation. Li, I will throw this over to you. My business partner, Dr. Robin Bloor, our economist partner for The Bloor Group, had a slide probably a few years ago. It was a basic pyramid, and he talked about how all innovation starts at the hardware level, and then it moves up to the OS level and then the app level, and it keeps going up. Down at that hardware level, when things change significantly, that changes everything up the stack.

It matters to be able to stay on top of what’s coming down the pike: the new processors that are out, the Power10 chips that IBM talks about, the ARM chips. You guys, I’m sure, are paying pretty close attention to that, getting the chipset instructions and understanding how to optimize your code. Is that about right, Li?

[00:39:44] Li: Yes, absolutely right. Part of the query engine optimization we’ve done is to leverage vectorized processing. When we talk to customers, we already have customers who are inquiring about the latest chip developments from different vendors. We are even talking to some chip startups to make sure we understand the latest trends. Our engineering team and one of the chip startups are sharing ideas about how we can leverage the latest developments in the CPU world to further improve performance. You brought up a great point, and that’s exactly what we are doing.

I would also want to point out something about being future-proof. Another thing we are doing is that we open source our software. All the source code is on GitHub. We are not publishing only part of the code on GitHub and saying, “If you need the best performance, you also need to pay for that.” We don’t believe in that. All of these performance improvements, we make available on GitHub.

The idea is that we let people study our source code and criticize it. When they see places where they can make enhancements or improvements, they can contribute. We have customers who have already contributed or given us new ideas, including how to leverage the latest developments in hardware or how we can make our product better. That is another aspect of making our product future-proof.

[00:41:13] Eric: That’s a great way to end the show talking about open source. I will throw it over to Kevin to comment on open source as a mindset, discipline, and practice that has fundamentally changed enterprise software in the last few years. What do you think?

[00:41:27] Kevin: Open source absolutely has made a big difference. It’s a combination of open source, how you integrate it and coexist with existing legacy systems, and how you help make organizations more agile with what they deliver. Open source is here to stay, and the whole aspect of having a community participate in helping to enhance a new platform is a great thing. We’ve incorporated a lot of open source into our platform as well. Again, delivering at digital speed is what people are looking for, to make them more agile and to get the right cost-performance balance for what they are trying to achieve.

[00:42:02] Eric: We have talked to two experts here, Kevin Mergruen from GigaSpaces and Li Kang from CelerData. Look these folks up online. I’m sure they are both on LinkedIn. Send me an email if you want to be on the show at Info@DMRadio.biz. We want to know what you want to know. Tell us what the innovations are and what you hear that you like and don’t like. We will talk to you next time.

 

Important Links