The Analytics Edge

The Impact of Generative AI on Business with Amit Prakash, Co-Founder and CTO at ThoughtSpot

Episode Summary

This episode features an interview with Amit Prakash, Co-Founder and CTO at ThoughtSpot, a market-leading business intelligence platform that helps anyone explore, analyze, and share real-time business analytics and data easily with AI-powered analytics. In this episode, Amit talks about his journey in helping create ThoughtSpot, how data leaders should specifically be thinking about large language models, and the impact generative AI will have on business.

Episode Notes

This episode features an interview with Amit Prakash, Co-Founder and CTO at ThoughtSpot, a market-leading business intelligence platform that helps anyone explore, analyze, and share real-time business analytics and data easily with AI-powered analytics.

Amit has deep experience in building large-scale analytics systems. Prior to ThoughtSpot, Amit led multiple analytics engineering teams in the Google AdSense business, contributing $50M+ quarter-on-quarter growth to the business by improving analytical algorithms for AdSense. Previously, Amit was a founding engineer on the Bing team at Microsoft, where he implemented the PageRank algorithm for search from scratch. Amit received his PhD in Computer Engineering from the University of Texas at Austin and a Bachelor of Technology in Electrical Engineering from the Indian Institute of Technology, Kanpur.

In this episode, Amit talks about his journey in helping create ThoughtSpot, how data leaders should specifically be thinking about large language models, and the impact generative AI will have on business.

-----------

Key Quotes

“The trick is to capture the mental model of the end user so that you know what they're already anticipating. If they're already anticipating an increase during Christmas season, you can see that from the previous trend. If they're already anticipating the amount of revenue they produce from a particular state to be proportional to the number of stores in that state, can you capture that and then that way you can de-noise these insights and meaningful insights surface a lot more. And that requires a two-way conversation between the algorithm and the end user. And these are the kinds of things that you can do with LLM that wasn't possible before.” - Amit Prakash

-----------

Episode Timestamps

(02:29) How data leaders should be thinking about LLMs and generative AI

(06:46) Amit’s career journey and helping create ThoughtSpot

(15:06) Considerations around building a trustworthy AI system

(18:39) Unpacking large language models

(29:57) Institutional knowledge for good data reasoning and intelligence

(34:45) Modern data stack and data warehousing

(37:25) Moving towards generative AI and the impact it will have on business

(41:35) ThoughtSpot Generative AI Meetup

(43:57) Hosts’ after-thoughts

-----------

Links

Amit Prakash’s LinkedIn

ThoughtSpot Website

ThoughtSpot Generative AI Meetup

Thomas Dong’s LinkedIn

Vijay Ganesan’s LinkedIn

NetSpring Website

Episode Transcription

Narrator: Hello and welcome to The Analytics Edge, sponsored by NetSpring. This episode features an interview with Amit Prakash, Co-Founder and CTO at ThoughtSpot, a market-leading business intelligence platform that helps anyone explore, analyze, and share real-time business analytics and data easily with AI-powered analytics.

Prior to ThoughtSpot, Amit led multiple analytics engineering teams in the Google AdSense business, contributing $50 million plus quarter-on-quarter growth to the business. Previously, he was a founding engineer on the Bing team at Microsoft, where he implemented the PageRank algorithm for search from scratch. Amit received his PhD in computer engineering from the University of Texas at Austin.

In this episode, Amit talks about his journey in helping create ThoughtSpot, how data leaders should specifically be thinking about large language models, and the impact generative AI will have on business now. 

Please enjoy this interview between Amit Prakash, Co-Founder and CTO at ThoughtSpot, and your hosts Thomas Dong, VP of Marketing at NetSpring, and Vijay Ganesan, Co-Founder and CEO at NetSpring.

[00:01:18] Tom Dong: The Analytics Edge is a podcast about real world stories of innovation. We're here to explore how data-driven insights can help you make better business decisions. I'm your host Thomas Dong, VP of Marketing at NetSpring, and for today's episode, my co-host is Vijay Ganesan, Co-Founder and CEO at NetSpring. Thank you for joining me today, Vijay. 

[00:01:37] Vijay Ganesan: Thanks, Tom. Looking forward to this. 

[00:01:39] Tom Dong: All right, Vijay, you know our guest very well. Back in 2012, you both co-founded ThoughtSpot with a few others. Amit Prakash is now, of course, the CTO at ThoughtSpot. And ThoughtSpot is a market leading business intelligence platform that helps anyone explore, analyze, and share real time business analytics and data easily with AI powered analytics. Amit, thank you so much for being with us today.

[00:02:04] Amit Prakash: Thank you. It's a pleasure.

[00:02:05] Vijay Ganesan: Amit, welcome. I've been looking forward to having this conversation with you. I've seen you in action at ThoughtSpot, the brain behind ThoughtSpot's search and AI technologies.

[00:02:16] Tom Dong: All right, and today's topic is all the buzz: large language models, or LLMs. Generative AI has really become a mainstream topic. Even in the media, New York Times, Wall Street Journal, they're all writing about it. Its impact could, of course, be of the same scale as the internet and mobile. It's been proven to outperform humans in many tasks, sometimes by orders of magnitude. But before we dive any deeper into today's topic, Amit, could you tell us how data leaders should be thinking about LLMs and generative AI more broadly?

[00:02:51] Amit Prakash: It's a really powerful tool, and it's gonna find usage in every aspect of life and every aspect of work, including data. In general, the trend is gonna be that repeated work, where you're not really adding anything new intellectually, should be eliminated.

Particularly in the data domain, what happens is that on one end, the business users are so hungry for data and they can't get it, because there's a lot of manual work involved in getting a particular view of data. And on the other hand, the people who are working on generating data for the people making business decisions are always in the loop, doing this slice of data, that slice of data, which doesn't add a lot of value to their career or, in general, a lot of value to the company, but they're stuck in that loop.

So I think a lot of that is gonna be automated, and data organizations should think more about how to teach everyone to fish as opposed to feeding fish to them. So I think of it as data organizations moving from player to player-coach roles.

[00:04:04] Tom Dong: Okay, so different ways to fish. LLMs are one way to fish; generative AI is maybe a slightly different way to fish. What are the specific definitions? How should I distinguish between LLMs and generative AI, and maybe even AI and machine learning more broadly?

[00:04:19] Amit Prakash: So broadly, generative AI could be DALL-E generating images for you, or Imagen generating video for you, or GPT generating a body of text for you. Large language models are a narrower field inside generative AI, where language modeling basically means, given a body of text, you want to predict the next word.

And the "large" part of large language model basically means we're doing this with the help of really, really large neural networks, in particular transformer networks.
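As a concrete illustration of "predict the next word," here is a minimal sketch using a small public model through the Hugging Face transformers library; the model choice and prompt are purely illustrative, not anything discussed in the episode:

    # A language model simply continues the text, one likely token at a time.
    from transformers import pipeline

    lm = pipeline("text-generation", model="gpt2")
    completion = lm("Revenue in the last quarter grew because", max_new_tokens=20)
    print(completion[0]["generated_text"])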

[00:04:51] Tom Dong: That's very, very helpful, and obviously we're seeing many of the product announcements coming out of ThoughtSpot, where you're starting to embed GPT capabilities into your Sage offering.

So maybe let's take a moment and you can just walk us through your career journey in analytics: where you started and how you ended up at ThoughtSpot, helping found that company.

[00:05:12] Amit Prakash: Yeah, sure. So right after finishing my PhD, I joined a small group at Microsoft, maybe five or so engineers, and that team grew a little and went on to build the very first web-scale search engine that eventually became Bing.

And so over there I was responsible for a piece of the search engine called static ranking, where, before the query arrives, you put together all the different signals from the web graph, usage, and other things to be able to rank webpages. And then after that, I spent about five years leading a team at Google, where my team was responsible for both building the infrastructure for training large-scale machine learning models and training those models to predict the likelihood of somebody clicking on an ad in a given context.

So, as you know, most of Google's revenue comes from ads, so what we were doing was critically linked to Google's revenue, and every quarter we were on the hook for improving models in a way that increases revenue by a couple of percent.

So as a result, we had pretty much infinite resources at our disposal, and we were training machine learning models that were two orders of magnitude larger than anything else happening in the world. So that was a lot of fun. We added a lot of revenue for Google through that team, and then I started ThoughtSpot, where the key idea was that, in order to be truly data driven, you can't really work with static dashboards and reports that were conceived six months ago.

You need to be much more dynamic. You look at data and that produces a question in your head, and then you ask that question, and that produces another question, and that's how you get to your five whys. And that's just not possible in traditional BI product thinking, where there's a separate producer team that's producing data insights and a consumer team that's consuming those insights to make business decisions. We wanted to shrink the gap between when somebody has a question and when they get it answered, from what used to be weeks to a couple of seconds.

That's always been the vision of ThoughtSpot, and I kind of worked on the foundation of this thing.

[00:07:31] Tom Dong: So it sounds like, with ThoughtSpot, you've really moved the needle on this democratization of analytics that everybody's been talking about for years and years: reducing the complexity of the tools, and even the process and the people needed to support that. LLMs seem to be the missing link that you've been able to add very recently into the product.

I'm just curious, though: does this introduce any new or different types of challenges with this type of capability?

[00:07:58] Amit Prakash: So what most people are worried about with LLMs is hallucinations, where they'll give you convincing answers to questions that are wrong and you don't know that they're wrong. We've been largely successful at eliminating that for the use cases we care about, for two reasons.

One, we don't take the output of the LLM right away. The LLM generates SQL; we run that SQL and then we give you the output. So the output is always grounded in real data in a database, but the SQL query could be wrong, right? The best tool for eliminating hallucinations in LLMs is to use them only for reasoning and not for their memory, because they have faulty memory.

If you can bring everything that is relevant for answering a particular question into the prompt itself, then the LLM is not relying on its faulty memory; it's just doing the reasoning necessary to link these pieces together and translate that to SQL. So that's one thing. The other thing is that, because we had already invested a lot in doing this without LLMs, we're able to take care of a lot of this complexity and not expose it to the LLM.

As a result, the LLM's task becomes much simpler. To give you an example, suppose you ask, how much revenue did I get from California? You may be asking that of a data model that has potentially a thousand columns and a billion rows. What we can do is figure out that this question is about the revenue column and the state column, which live in two different tables, and we know the join path and everything.

So what we feed the LLM is basically a very small table that perhaps contains only the revenue column and the state column, and it has only two rows; maybe one of them is California. That allows the LLM to ignore all the other complexity about which column to go after and how to join these things, and also not have any confusion about whether it's "California" or "CA" in the database.

So it can generate SQL. Now, this SQL cannot be run against Snowflake or Databricks or something, right? Because it's supposed to be SQL on this pretend table with just two columns. So then we can take the information in that SQL and translate it into real SQL that needs to be run against the real data.

That's how we've been able to eliminate the hallucination problem. 
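Roughly, the "small pretend table" approach described above could look like the following sketch. This is a hypothetical illustration, not ThoughtSpot's actual code; the OpenAI client, the helper functions, and the schema are all assumptions.

    # Sketch of generating SQL against a tiny "pretend" schema, then rewriting it
    # for the real warehouse. All names are hypothetical.
    from openai import OpenAI

    client = OpenAI()

    def answer(question: str) -> str:
        # 1. Expose only the columns relevant to the question, plus a couple of
        #    sample values, so the model never has to rely on its memory.
        pretend_schema = "CREATE TABLE sales (revenue DOUBLE, state VARCHAR);"
        sample_rows = "Example state values: 'California', 'Texas'"
        prompt = (
            f"{pretend_schema}\n{sample_rows}\n"
            f"Write a SQL query over this table that answers: {question}"
        )

        # 2. The LLM only reasons over the tiny schema and returns SQL.
        resp = client.chat.completions.create(
            model="gpt-4", messages=[{"role": "user", "content": prompt}]
        )
        pretend_sql = resp.choices[0].message.content

        # 3. Rewrite the SQL on the pretend table into SQL on the real schema
        #    (real join paths, real column names) and run it on the warehouse.
        real_sql = rewrite_against_real_schema(pretend_sql)  # hypothetical helper
        return run_on_warehouse(real_sql)                    # hypothetical helper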

[00:10:32] Tom Dong: Right, right. So yeah, that builds trust, at least within your user base, for this very specific use case. But there are a lot of startups out there who are trying to build the tooling and infrastructure around LLMs. I'm just curious if you could talk us through some of the other considerations for being able to build a trustworthy AI system.

[00:10:54] Amit Prakash: Yeah, it really depends on the application. There's a continuum of approaches that you can use to build trust. The first one is obviously human in the loop, where the LLM is just generating a first draft and then somebody's reading it, verifying it, modifying it, and then putting it in front of the end user.

For example, if you're giving a medical diagnosis, that's the only way to do something like that, right? The next one is where you have trained another model that's just classifying the output as to whether it's acceptable or not. So, for example, if you want to eliminate any offensive output from the model, you could put another model in front of it that just classifies the output as to whether it's offensive or not; that's a much easier problem than actually generating the text, right?

Then the next one, which is kind of where ThoughtSpot is: let's say you're trying to go from natural language to SQL. For our end users, that SQL may be impenetrable; they don't know what's going on, how to understand it, how to modify it. But if you introduce an intermediate representation that the end user can understand and modify themselves, then it completely changes the game. So in the case of ThoughtSpot, you ask the question in natural language, it gets translated to the keyword search syntax that ThoughtSpot has, and then that keyword search syntax gets translated into SQL.

That keyword search syntax is very easy for any business user to understand, so they will know right away whether that translation was correct or not. And if it's not, they can modify it.
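The second rung of that continuum, gating a generator with a separate acceptability classifier, can be sketched in a few lines. The model names, label, and threshold below are assumptions for illustration only:

    # A small classifier screens the generator's output before anyone sees it.
    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")
    moderator = pipeline("text-classification", model="unitary/toxic-bert")

    def guarded_generate(prompt: str) -> str:
        draft = generator(prompt, max_new_tokens=80)[0]["generated_text"]
        verdict = moderator(draft)[0]
        # Classifying a draft is a much easier task than generating it.
        if verdict["label"] == "toxic" and verdict["score"] > 0.5:
            return "[withheld by moderation filter]"
        return draft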

[00:12:42] Tom Dong: Well, maybe let's switch gears a little bit here and talk through some of the technical implications of all these practical applications of LLMs. Vijay, you've been thinking a lot about applications of AI here at NetSpring, as well as over the years. Why don't you lead us through a quick discussion here, and maybe some of the nuts and bolts of implementing LLMs?

[00:13:03] Vijay Ganesan: Very exciting to double-click on some of the things you alluded to about use cases of LLMs in analytics.

As I think about this, there are really three areas I'd love to get your perspective on, where LLMs could potentially be very effective in enhancing analytical capabilities. One is making it easy to ask analytical questions. You know, I'm a business person, I have a question in my mind about my business. How do you make it very easy for me to ask that question and get an answer without having to learn a new tool and stuff like that? Sort of extensions to some of the things that you've done at ThoughtSpot, and what specifically LLMs help with in that area.

The second one is interpretation of answers. I'm looking at some visualization; a lot of times people struggle with interpreting the data they're seeing, even if it's a pretty chart. How can LLMs help with interpreting the data and highlighting key aspects that are in the data I'm looking at, but not quite obvious, right?

And then the third thing, which is the holy grail of analytics, is: hey system, can you tell me something interesting about my data, about my business, that I don't know? I don't wanna bother with asking any questions; you should know what I'm interested in and give me the answer. Right? So, these three areas, and any other areas you can think of: how do LLMs play a role?

[00:14:30] Amit Prakash: So the first one is, I guess, pretty obvious: I'm a business user, I have a data question, and I have a particular way of phrasing the question based on how I think about these business entities. How do I bridge the gap between the way I think about it versus the way the system needs it?

Right? I may be thinking, how much ACV is likely to be closed this quarter? What "likely to be closed this quarter" means is maybe that they have put the opportunity in a commit category, and that the close date of that opportunity is within the bounds of the beginning and the end of the current quarter, based on today's date.
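For illustration, that phrase might ultimately translate to SQL along these lines. The table, columns, and Snowflake-style date functions here are assumptions, not a real customer schema:

    # Hypothetical translation of "how much ACV is likely to be closed this quarter?"
    likely_to_close_sql = """
    SELECT SUM(acv)
    FROM opportunities
    WHERE forecast_category = 'Commit'                -- the opportunity is in a commit category
      AND close_date >= DATE_TRUNC('quarter', CURRENT_DATE)
      AND close_date <  DATEADD('quarter', 1, DATE_TRUNC('quarter', CURRENT_DATE))
    """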

So being able to make those decisions, starting from that sentence and translating all the way to SQL, has really profound implications for how much value you can get out of data. A quick example I can give: we were training a set of people who work on a trading floor, and their job every day is to pick up the phone and negotiate with a bank how much they're gonna pay for borrowing securities from some other bank for something that they're trying to do.

So these are people who are spending most of their day looking at numbers, making calls, and negotiating with another human at the other end. They're not data savvy, they're not SQL savvy. They were never going to do anything with data unless they had a tool like ThoughtSpot.

Within that training, one of these traders figured out that they were getting overcharged by a particular bank to the tune of $2 million a year compared to all the other banks. So there was some sort of price gouging being done by one partner bank that they had, and this person had enough business context to be able to ask that question and make that comparison. But without the ability to ask that question, it would never have been asked, because the people who are doing analytics are not thinking in those terms.

I've got example after example where, when you enable the person who has the strategic and tactical responsibility for making decisions to ask data questions, they ask very different data questions and create a lot more data value. The second one you talked about is explaining what's going on with the analysis.

I used to chafe at that one, to be honest. I was like, the chart is right there. You can see the line is going up and then coming down. Why does somebody else need to tell you that it peaked in April and then dropped 20% after that? Like, what's the point of all that? What I've learned over time is that these things are useful for two reasons.

One, a lot of human labor goes into writing these things for purely regulatory purposes. It's work that nobody wants to do: you're just reading a chart and writing something because it's a requirement, and automating that has a lot of value. But the second is that there are a lot of busy execs out there who don't even want to go into a dashboarding tool and look at the numbers. They just want the summary of what they would glean from that dashboard, and it makes sense to extract that summary and send it to them.

Right? What you can do is figure out what the most common things are that people take away from the data: what's been trending up, what's been trending down, what the patterns are, where the correlations are.

And you can search for the existence of those in the output data. It used to be that when you tried to describe it, you had to fit the numbers you computed into a template, and it used to read pretty robotic and annoying. Now, with the advent of LLMs, you can actually have very fluent language.

In fact, I've seen research papers where people are trying to figure out the right prompts so that the output looks more like what a journalist would write, with a catchy title and a catchy description, as opposed to just saying it in a bland way. So there's some interesting work going on there.

And we are working on narratives as well, but what we are doing is trying to go one level deeper than what you can see on the dashboard. What you might see on the dashboard is that a particular KPI that you care about dropped by 10%; if we see that, then we will also do the root-cause analysis of why it might have happened.

So when you get the summary out of the dashboard, it's not just what you could glean from the dashboard, but one layer deeper. You can obviously go one layer deeper everywhere, but the intelligence lies in focusing where it matters to go deeper. And then the last thing that you talked about is being able to tell somebody something that they didn't even ask for.

You and I both know this example very well, where there was a bank down in Australia that ran our AI insight capability, SpotIQ, and found that there were a lot of insurance claims they had paid out that were supposed to be paid out by somebody else. Because of a software bug, they were sitting in a queue that no one was looking at, and it was costing the bank to the tune of $20 to $30 million. One invocation of finding anomalies in their data in an automated fashion surfaced something like this.

We've seen other examples where abuse of a travel policy, costing the company millions of dollars extra, got exposed through this. This was possible to do even before LLMs. But there is an interesting idea that I've been brewing for a while, which is this: when you're looking at statistical anomalies, some of those are truly a surprise to the human being sitting at the other end of the computer, and some are just statistical anomalies but not real anomalies, in the sense that the person already knew.

For example, if I tell you that you had a lot more sales during Christmas week, the week before Christmas, than any other week, they'll say, tell me something I didn't know. That creates a lot of noise, and it hides the real insights from people's view, because you have to go through 20 supposed insights to find one real insight, and you might get bored and never look at that 20th one.

Right. So the trick there is to be able to capture the mental model of the end user so that you know what they're already anticipating. If they're already anticipating an increase during the Christmas season, you can see that from the previous trend. If they're already anticipating the amount of revenue they produce from a particular state to be proportional to the number of stores in that state, can you capture that? That way you can de-noise these insights, and meaningful insights surface a lot more. And that requires a two-way conversation between the algorithm and the end user. These are the kinds of things that you can do with LLMs that weren't possible before.
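One way to read that idea as code: pass the model the user's stated expectations along with the candidate statistical anomalies, and ask it to keep only the genuinely surprising ones. A rough sketch; the prompt wording, model, and example data are all assumptions:

    from openai import OpenAI

    client = OpenAI()

    known_expectations = [
        "Sales always spike the week before Christmas.",
        "Revenue per state is roughly proportional to the number of stores there.",
    ]
    candidate_insights = [
        "Sales in the week before Christmas were 3x the weekly average.",
        "Texas revenue is 40% below what its store count would predict.",
    ]

    prompt = (
        "The user already expects the following about their business:\n"
        + "\n".join(f"- {e}" for e in known_expectations)
        + "\n\nHere are statistically significant findings:\n"
        + "\n".join(f"- {c}" for c in candidate_insights)
        + "\n\nReturn only the findings the user would genuinely be surprised by."
    )
    resp = client.chat.completions.create(
        model="gpt-4", messages=[{"role": "user", "content": prompt}]
    )
    print(resp.choices[0].message.content)  # ideally only the Texas finding survives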

[00:22:17] Vijay Ganesan: That's very interesting. Here's a very fundamental question about LLMs and analytical computations. If you look at the more common examples of LLMs that you see, they're very text and human-language oriented. Now, how does that same set of concepts apply to analytical computations?

Does it translate directly? Is there something more that has to be done? Does it have to be specialized LLMs? How does that translate?

[00:22:47] Amit Prakash: The most powerful application of LLMs that I can see right now is code generation. Anything that you wanna automate, including data tasks, hopefully you can describe in simple terms, and that generates code, and that code does the task for you.

And that's really the way I feel like it's going to be in analytics. The largest LLMs out there today, like GPT-4 and Claude and Bard, are all trained on enough code that they can do code generation out of the box. And then there are more distilled models that were specifically trained on code generation, like, for example, Codex.

Google also has something called Codey, and ServiceNow launched StarCoder. One definite advantage these models have is that they are much smaller, so the latency and the cost of inference are lower. They may also be better at some of the code generation tasks than general-purpose ones, but in general, the larger models tend to have more common-sense reasoning capability, which makes it easier for them to fill in the gaps in a natural-language task description. So I think the jury is out on whether you're gonna prefer a distilled model just for code generation or a general-purpose model.

[00:24:15] Vijay Ganesan: So you talked about institutional knowledge. You know, for analytics to work well, you need a lot of context, right? You need to know a lot about the business to be able to produce intelligent analytical insights.

There are of course the prompts you can use in ChatGPT about the weather, or writing an essay, or things like that, where it's using the corpus of everything that's available on the internet to come up with a good answer. But how does that translate to data? Because a lot of data is sitting inside enterprises, in databases and data warehouses, and it's not the same as all the information that's available on the internet.

So that institutional knowledge that is necessary for coming up with good reasoning and intelligence: how do you make that work on data that is sitting inside the walled gardens of enterprises?

[00:25:08] Amit Prakash: Yeah. The most interesting thing about these large language models is what's called in-context learning, where you describe some information in the prompt and the model is able to learn from it and generalize very quickly, as opposed to classic training, right? So, for example, if you insert a sentence that, in this company, ACV is a representation of revenue, and then you ask a question about ACV, it will be able to pick up that instruction and use ACV and revenue interchangeably. Or if you describe the signature of a function, like, this is a function that turns a date into the quarter, and then ask it to generate code that requires use of that function, it'll start using that function most of the time.

So the right answer, most of the time, is that you pick up the relevant institutional knowledge and inject it into your prompt, and that's why vector databases have become such a huge thing lately. What you do is take all that institutional knowledge that's available in written form and chunk it into paragraphs, and then you create a representation of each paragraph's meaning in vector space through embeddings, and you store that in a vector database. When you're trying to answer a question, you turn that question into an embedding as well.

Then you do a k-nearest-neighbor search in the vector database, you find the k relevant paragraphs, you inject those into the prompt, and then you ask the model to do whatever it is you're trying to do: answer a question, generate code, things like that, right? So that's one way to inject institutional knowledge.
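A minimal sketch of that retrieval loop, using sentence-transformers for the embeddings and FAISS as the vector index. The library choices, chunking, and knowledge snippets are assumptions, not a recommendation of a particular stack:

    import faiss
    import numpy as np
    from sentence_transformers import SentenceTransformer

    # 1. Embed each chunk of institutional knowledge and index the vectors.
    docs = [
        "In this company, ACV means annual contract value and is our revenue measure.",
        "fiscal_quarter(date) returns the fiscal quarter for a given date.",
        "Opportunities in the 'Commit' category are expected to close this quarter.",
    ]
    encoder = SentenceTransformer("all-MiniLM-L6-v2")
    vectors = encoder.encode(docs, normalize_embeddings=True)
    index = faiss.IndexFlatIP(vectors.shape[1])  # inner product = cosine on unit vectors
    index.add(np.asarray(vectors, dtype="float32"))

    # 2. At question time, embed the question and pull the k nearest chunks.
    question = "How much ACV is likely to close this quarter?"
    q_vec = np.asarray(encoder.encode([question], normalize_embeddings=True), dtype="float32")
    _, hits = index.search(q_vec, k=2)
    context = "\n".join(docs[i] for i in hits[0])

    # 3. Inject the retrieved knowledge into the prompt for whichever LLM you use.
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nWrite SQL that answers it."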

[00:27:03] Vijay Ganesan: And this sort of ties into what you were saying earlier, about wanting the intelligence and the inference, but not the memory. That's what it relates to.

[00:27:12] Amit Prakash: Yeah. The thing with this approach is that right now the context windows are somewhat limited, so the most common context window that's available to people is somewhere around 2,000 to 4,000 tokens.

That gives you a couple of pages' worth of text, or maybe at most ten pages' worth, that you can put in there. You have people like Anthropic with Claude talking about a 100K context window, GPT-4 has a 32K context window version of the model as well, and then there are papers out there with a million-token context window, which allows you to put in really a lot of information, so you don't have to worry as much about whether you got the right information in there or not.

The other way is fine-tuning, of course. You can pick up a model that was trained on public data and then either retrain a few layers, like the last layers of that neural network, on your data, or you can let everything be perturbed by your training data. That way you're able to push the model a little bit towards your use case and let it learn a little bit more about it.

If you have, and I've heard numbers anywhere from 10,000 to a million instances of training data, then it's a reasonable thing to pick up a model and fine-tune it for your use case. Obviously it costs a lot more to fine-tune than to just do prompt engineering, but it's an investment that, in some cases at least, is totally worthwhile, and the cost of fine-tuning is going down substantially.
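The "retrain a few layers" option is easy to picture: freeze most of a pretrained network and train only its final layers on your own examples. A rough PyTorch/transformers sketch; the base model, label count, and training details are assumptions:

    import torch
    from transformers import AutoModelForSequenceClassification

    # Start from a model pretrained on public data.
    model = AutoModelForSequenceClassification.from_pretrained(
        "distilbert-base-uncased", num_labels=3
    )

    # Freeze the pretrained body so its weights stay untouched...
    for param in model.distilbert.parameters():
        param.requires_grad = False

    # ...and train only the classification head (the "last layers") on your data.
    optimizer = torch.optim.AdamW(
        [p for p in model.parameters() if p.requires_grad], lr=1e-4
    )
    # A normal training loop over your labeled examples would go here, e.g.:
    #   loss = model(**batch).loss; loss.backward(); optimizer.step()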

[00:29:03] Vijay Ganesan: Switching gears a little bit, let's talk about the modern data stack and data warehouses. As we know, the data warehouse is becoming the center of the universe for data in enterprises, right?

Everything is coming to the warehouse, even the kinds of data, like product instrumentation streams and IoT data, that ten years ago would never reach the warehouse. Today it's all reaching the warehouse, so the warehouse is becoming the center of the universe. What does that mean for LLMs?

Specifically, I was reading recently about Snowflake acquiring a startup to inject more generative AI capabilities inside the database. Now, data warehouses have always had machine-learning-type capabilities embedded, where you can use these things simply as functions in SQL. So are we going to see more LLM functions appear that you can use in a warehouse, within SQL?

[00:29:58] Amit Prakash: That would be interesting, wouldn't it? LLMs are pretty powerful generic functions as well. So if you're trying to do a classification task, you can give it like three examples of how to classify things, and it does a reasonable job after that of classifying things based on like everything else that it knows about.

So instead of invoking a classifier, you could just give it a prompt. Obviously the advantage is that you can get this without having to create a lot of training data and train a new model; you can get these things done much more easily. On the other hand, the inference time with these models is on the order of seconds, so if you're trying to process even a million rows of data, that's a lot of compute and a lot of wall-clock time.

So I think things will evolve; probably faster, smaller models will find their way into these kinds of things. I've also seen people try to train LLMs over tabular data, like rows and columns. I haven't seen that many successful use cases come out of that, but it's definitely an interesting idea that's worth exploring, and we'll see how that evolves over time.

And then, as in all other complex applications, LLMs can be a pretty useful assistant or co-pilot for auto-completing SQL, giving you the rough syntax of what you need, invoking stored procedures, and things like that. So I can definitely see that sort of stuff also being there.
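As an illustration of the few-shot "LLM as a generic function" idea from a moment ago: a classifier can be nothing more than a prompt with a handful of labeled examples. The model, labels, and examples below are assumptions:

    from openai import OpenAI

    client = OpenAI()

    FEW_SHOT = """Classify the support ticket as 'billing', 'bug', or 'other'.
    Ticket: "I was charged twice this month." -> billing
    Ticket: "The dashboard crashes when I filter by state." -> bug
    Ticket: "Do you integrate with Snowflake?" -> other
    Ticket: "{ticket}" ->"""

    def classify(ticket: str) -> str:
        # Three examples in the prompt stand in for a trained classifier.
        resp = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": FEW_SHOT.format(ticket=ticket)}],
        )
        return resp.choices[0].message.content.strip()

    # classify("My invoice total looks wrong") should come back as 'billing', but each
    # call takes on the order of a second, which adds up quickly over millions of rows.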

[00:31:43] Vijay Ganesan: Organizationally, how should a CDO think about LLMs?

Are we gonna see, like, AI groups being formed? What about ownership of data? There's a lot of training data, so what happens to the ownership of that data, and who owns the models? And what about transparency of these models and things like that? So just from a CDO, CIO, or CTO perspective, what would be your advice in terms of how they should think about structuring themselves organizationally with all of these new things coming in?

[00:32:17] Amit Prakash: In a large majority of cases, you don't need to train your own LLM. I think accessing a public LLM through an API is the right thing. Of course, you have to make sure that you have a secure setup where your data is not leaking out into a public LLM, and there are ways of doing that; for example, when you have an enterprise agreement and you're making API calls, none of that is getting persisted, with the large providers at least.

If you do start fine-tuning or training your own model, then you have to worry about your data pipelines being clean and reliable, versioning your models and going after the right version, worrying about drift, and things like that. That will require a lot more sophistication.

There are some tools out there for managing versions of your training data and your trained artifacts that you can use. Organizationally, I think it's still very new, so I don't think a proper discipline has emerged out of it yet, but a lot of what happened in classic machine learning and data science still applies.

I think those best practices will continue to be a good thing: you're doing your test and train separation, you're versioning your data and your models, you're monitoring your data pipelines for quality, and you're monitoring your model outputs for drift and things like that.

[00:33:52] Tom Dong: So, speaking of best practices, I know that we were chatting the other day about this new meetup for generative AI that you've been involved with.

Maybe here's a chance for you to promote that a little bit and tell other data leaders out there about opportunities to learn more. I think this would be fantastic for all of us to hear.

[00:34:11] Amit Prakash: Yeah, thanks for bringing that up. The last six months have been crazy, and anybody who's been trying to stay on top of this thing has realized that that's a very hard thing to do.

Even yesterday, our guest was saying that most AI researchers he knows kind of secretly feel that they are behind as well in catching up, because every week there's a new paper and you need a few hours to be able to read it. So the idea behind creating this monthly generative AI meetup was that we can all learn together and help each other out.

This was the very first one that we did; about a hundred people registered. The managing director of a venture firm led the discussion, and it was fantastic. We hope to continue this and have a fantastic speaker every month, and to have an engaged community around it where people are asking great questions and we are learning together. We don't have the date for the next one yet, but it'll be sometime in July, and we look forward to more people showing up.

[00:35:21] Tom Dong: And where should we go to sign up for that?

[00:35:25] Amit Prakash: I can share a URL that we could put in the notes, but if you search for ThoughtSpot Generative AI Meetup, it should show up.

[00:35:36] Tom Dong: Well, thank you so much for joining us today, Amit. This was a fascinating conversation on one of the hottest topics out there. It was a real pleasure to have an expert in the field like you share your experience, wisdom, and thoughts for the future today. So again, thank you very much for joining us. Vijay, any final thoughts from you?

[00:35:57] Vijay Ganesan: No, it's always a pleasure talking to you. You know, deep insights, well thought through and well presented. Really appreciate it, and I'm sure our viewers will benefit greatly from this. Thank you so much for doing this.

[00:36:08] Amit Prakash: Thank you. It was a wonderful conversation. I enjoyed it and thanks for hosting me.

[00:36:19] Tom Dong: Well, that was a really fascinating conversation with Amit. There was just so much there to distill. Vijay, why don't you take a stab at summarizing your key thoughts from today's show?

[00:36:30] Vijay Ganesan: One thing that I found interesting in what Amit said was that you want to leverage the reasoning and intelligence of LLMs, but not necessarily the memory, right, to avoid hallucinations. And so his approach of using vector databases and feeding the appropriate context into the prompt to avoid hallucinations, I thought that was a very interesting idea which I hadn't thought about before.

[00:36:58] Tom Dong: Yeah, absolutely. And for me, from my own journey in analytics, thinking about the massive business impact and potential here, I think back to 2011 and pop culture, when AI suddenly became mainstream with the Jeopardy! challenge of Watson over at IBM, and it's taken us, you know, 12, 13 years to come up with viable commercial applications of it. And some of the discussion we had around trust in these systems: that has really been the challenge for any analytics application to receive widespread adoption, right?

People need to trust it before they're able to use it. And when we think about using them for reasoning, I think you dropped the terms: they're great assistants, they're great co-pilots. That's where we are today, right? And it's proving to outperform humans on many tasks by giving us the ability to cover that first 80% with these technologies, but still have humans finish that last mile, or last 20%.

So really, the potential is still there, right? We're only scratching the surface of what these tools and technologies can do. And as we become more confident and trusting in these systems, they will certainly move beyond just assistants, and I'm sure humans will happily, in select cases, allow them to completely automate some processes.

[00:38:33] Amit Prakash: Yeah. 

[00:38:33] Vijay Ganesan: And he gave us a couple of examples of how a business person could get tremendous value from this kind of system, right? He was talking about a trader that was able to detect some fraudulent overcharging by a partner bank, and so on.

And there's an interesting concept there, which is that the way a business person thinks of analytics is very different from the way a data person thinks of analytics, so there's always an impedance mismatch there. With LLMs, it becomes easier for a business person to ask and get answers without having to translate that into something a data person understands.

So that really opens up phenomenal opportunities, because the data person cannot think the same way the business person thinks, and vice versa. We're stuck in this world where the business person has some idea in their head, but it's lost in translation, and so you don't get to the analytics.

Whereas in this world now, with LLMs, you don't have that impedance mismatch, right? So to me, those were great examples of how this is gonna really take analytics to the next level.

[00:39:39] Tom Dong: Thanks again, Amit and Vijay for helping me co-host today. That concludes today's show. Thank you for joining us, and feel free to reach out to either Vijay or myself on LinkedIn or Twitter with any questions or any suggestions for future topics. So until next time, thank you very much. 

[00:39:56] Narrator: Thank you for listening to this episode of The Analytics Edge. If you enjoyed the show, please take a moment to leave a rating and a review and share it with a friend and connect with our hosts on LinkedIn. This show is brought to you by NetSpring. Visit netspring.io to learn what next generation product and behavioral analytics can do for your business.