Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello, and welcome to this session, Why Most Data Projects Fail.
And I would say, more importantly, how to avoid it.
So let's start out by talking about what you see when
you go out on social media. When people talk about, why do data projects
fail? They think they fail because you chose Snowflake or
Databricks, or it was a data lake versus something
else, data warehouse, what have you, or it
was the programming language that you chose. You should have chosen
Python, should have chosen Java, should have chosen Rust.
But in reality, is that why projects fail? Well,
let's talk about why. And to help back this up,
we have this number from Gartner. This also corresponds with
my experience of how many data projects fail. And that is 85%
of data projects fail. And it brings up
the question, why is this so bad? Because that's a really, really terrible
number, 85%. So if you invert it,
then you say only 15% of data projects
actually succeed, which is still really bad.
So why can't more companies create more
value with data? That's a really good question, and it's one
I've spent a lot of time researching and trying to figure out why.
And part of this is that technology is just one
small piece. It's a smaller piece,
definitely important. Not going to lie, it is important, but it
is one small piece of your success with data.
It is not the entire thing. Normally, when people
talk with data teams, they just focus on the data, on the technology.
What technologies did you use? Programming language. But that's just one small
piece. What we're instead going to focus on are the
questions, the right questions that you should be asking. We need
to know who, what, when,
where, and how. These are really important questions
that we need to answer before we start these data projects.
Who is about the right people at the right ratios.
This is incredibly important because oftentimes companies will
say, I'm just going to hire a bunch of data scientists, and those data scientists
can go do this project. The reality is that those
projects don't really work well because we
don't have all the people we need. We need data scientists,
data engineers, and operations engineers.
And more importantly, we need them at the correct ratios, where we need
more data engineers than we need data scientists.
This is a really important part. Oftentimes teams
will have no data engineers or
just a few data engineers relative to the data scientists.
So this brings up this question that I often get,
especially from management, and they say, which team is most important?
Which team? If I can only do one, which one should I do?
And the reality is that all three of these teams
are necessary for success. We need
data engineering, we need data science, and we need operations.
Each one of these three things does a very specific
part or is an important part of
this ecosystem. I have another talk that you can
watch where I talk about what happens when each one of these is missing.
But suffice it to say, if we're missing data engineering, we're missing
key software engineering, distributed system skills
that data scientists do not have, and this is not their core competency.
Likewise with operations. If we're missing operations, we're missing
somebody who is making sure that things keep running well.
And likewise with data science. If we don't have data science,
I go so far as to say, why are we doing all this effort?
We're not getting that advanced analytics part.
That is a key part of the value creation, the optimal
value creation there. Next we go on to what.
What is the business value? This is incredibly important.
Doing data in and of itself, just for data or because
everybody else is doing it is not enough. What we have to do is
we have to say, what is the business value? If you're
just saying, especially now with AI,
we're just going to do some AI, we will do this or
that AI, that isn't enough. That isn't a business value.
AI is not a business value. Applying AI,
or applying data and AI to something to create business
value, yes, that's certainly applicable. But there
should be a clear and attainable path to value
creation. It is not just, we're going to go do AI
now. That doesn't work. That does not create any value.
It just creates a lot of strife and a lot of extra
work for the teams. So having a data
strategy isn't enough either. There needs to be a plan
and there needs to be execution. Oftentimes a company's
data strategy is just AI, frankly, or it says,
we're going to use our customer data to
understand our customers better. Not a data strategy either.
This isn't clear. Part of why teams fail is that
the executive management, perhaps the board or C-level, says,
here's our data strategy. We're going to use our data to
do something nebulous, something that you don't really
know what's happening, or there's no real way to measure it,
no direction to it. So what we need to do is we
need to have a data strategy that has a clear plan, it has
clear execution. If we don't do this, we won't
be able to go through and actually execute on it.
When are you going to generate value?
Oftentimes there's kind of two opposite ends of this when.
Oftentimes the when is as soon as possible
or this unattainable deadline. So these
unattainable deadlines just aren't feasible. They set the team back
from the very beginning and make it so that they are not going
to be successful. They're never going to be successful because
they've been set up for failure with a
deadline or a timeline that is not feasible at all.
It usually starts with a misunderstanding of
the difficulty of data projects
and how much time it takes to do a good data project.
They just think or have bought into a vendor or perhaps
read something. And that thing that they read said, it's easy.
Data projects are easy. You just need to use my software
and out they go. But that isn't feasible.
And then on the opposite end, we have this when it's ready.
The issue with when it's ready is that those teams often never
get something out the door. It's always, well, there's another
thing we need to do and another thing and another thing and another thing.
And we have this real issue with this dichotomy:
these teams never generate value. The
teams that are always behind are never
going to generate value or live up to those expectations.
For the when-it's-ready people, the biggest problem,
and I would say, especially now with the economy being
what it is, is that projects that take too long
to generate value are going to get canceled.
And this is really important because oftentimes boards and
C-level people will say, oh, you have enough time, don't worry.
Well, they also have short memories and they also have pressure from
various stakeholders, shareholders, for example, or money or
something's changed in the market. And where they said you have
all the time in the world, now you don't actually have that time.
You actually have, well, you need to get something done yesterday.
And so what I really strongly advise team
leads and managers of data teams is to
not take what they say at face value, but to
actually have a more feasible timeline
in mind. So don't take too long to generate value.
Those projects do get canceled. So what will happen
is that they'll promise too much and
they'll just never deliver on those expectations. So be
really careful there. This is how you set your team
up for failure. Data teams need to have a
very clear plan and architecture
of where each piece is going to be done: where are you going to put
this piece, where are you going to run things? Very,
very important. You need to have this plan,
or create this plan if you don't have one.
One of the most important parts that I would say here for
companies is that the data teams are often novice.
Frankly, they don't know what they're doing. They've never done this before
and it's difficult. And so what they'll do is they will, say,
reach out to a vendor. Sometimes that's a cloud vendor,
sometimes that's another vendor,
and they'll look at their website and the website gives them this.
Oh, wow, here's this plan. Oh, you just do this,
this and this. Oh, that sounds good. I can do that.
And so they'll go, follow that vendor plan. And guess what?
That vendor plan does not have any nuance to it.
It says, use our technology for every single thing.
And it's that way because it's written by a
marketing person who's paid to say, use our technology for everything.
That technology may not be the right tool, probably isn't the
right tool, but marketing people are paid to say,
use our technology for everything. And so as you
look at those vendor diagrams, those vendor white papers,
those vendor sorts of things, yes, it's going
to be wrong because their recommendations are not going to recommend
that you use the right tool. They recommend that you use their tool.
And especially new teams, new data teams
don't understand that nuance. They don't understand that a
team that is new will make these sorts of mistakes.
So it's really important that you get
either somebody who knows what they're doing, an architecture review,
something along those lines, to make sure that you are doing
the right things and using the right things.
Next, it's how, how will
the plan be executed? So data teams
need a clear plan that they're executing.
And oftentimes when a team is tasked
or perhaps a brand new team is there,
they're going to be executing and they're going to say,
here's our plan. Except that plan will often change.
Or they'll go and say, we want to
do ten things all at once. That's not going to work.
I've seen it way too many times. Those sorts of teams that
go and they're brand new, or they're very excited about
this and they try to go do ten things, guess what happens?
They don't do any of those ten things. So what
I always talk about when I mentor a team, and what
I tell them, is get a singular focus, one to
three things, ideally one or two things, not three things.
But sometimes executives get mad when
there's only three things or in their perception, not enough.
And so what we need is that focus, that focus on one to three
things. By doing those one to three things, you actually
get them done instead of trying to do
ten things at once. So what happens with ten things at once is that
you make very little progress. There's a lot of
switching in between them. And as a direct
result, instead of it taking maybe 1x
the amount of time for each, so one times ten,
it actually takes two or three times that, where if we had
just focused,
we would have gotten them done at a 1x amount of time.
So by focusing, we actually get more
done faster rather than trying to do ten things
and never get anything done. Very, very important to get that focused.
If we don't do this, the teams will get bogged
down in too many different directions.
Now, I have a bonus here for you, and that is why.
Why is our data valuable? Eventually,
you're going to be forced to come
to an accountant or some accounting meeting or somebody who's
looking at numbers, and they're going to say, you need to justify
why you're spending this amount on people, on cloud
costs, on licenses, whatever,
and you need to have very clear justification for
what you're doing. Otherwise, that bean counter,
that accountant is going to say, oh, well,
it doesn't matter what you're doing. We should just cancel that, or we should reduce
your staff from 20 down to two people. You're just not generating enough value.
So it's always important to go through,
to have clear data on, or clear information or
clear numbers on what is the value of what you're doing?
Why is that data valuable? A very common question here
is where do you get that number from? And the number comes
from the business, comes from the customer.
And how do you get that number? You ask them.
You actually work with your customer? Oddly enough,
yes. This may be surprising that we actually work with our business customer
not just to deliver on what they're asking, but to have
good numbers about what the value is that we're delivering.
Because eventually somebody is going to say,
what does that mean? What does that do for us?
I look for a 10x ROI on this investment. What that
means is if we're spending €100,000
on investing in either team or technology or something
like that, or on a project, we would look for a 10x
ROI on that investment. So spending that €100,000
should net a million euros. And
I usually target that for a few reasons. One is that
as we talk to the business people about what the value is that we're
creating, they will actually perk up and they'll listen, oh, well,
we'll make a million, for example. They'll listen to that. They'll actually
put an investment into that. But it also gives
you some cushion where, let's say you don't hit that 10x for whatever
reason, you only hit 5x. Well, you're now
at 500,000. So it gives you some cushion,
and it gives you this ability to talk to the business in
the numbers or in the way that they want you to talk to them.
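
As a rough illustration of the arithmetic just described, here is a minimal Python sketch. The function names (projected_value, roi_multiple) are hypothetical and only restate the figures from the talk: a €100,000 spend, a 10x target, and a 5x fallback.

```python
def projected_value(investment: float, multiple: float) -> float:
    """Value the project should return for a given ROI multiple."""
    return investment * multiple


def roi_multiple(value_created: float, investment: float) -> float:
    """ROI multiple actually achieved (value created divided by spend)."""
    if investment <= 0:
        raise ValueError("investment must be positive")
    return value_created / investment


investment_eur = 100_000

# Target: 10x on the investment.
target_value = projected_value(investment_eur, 10)

# Cushion: even if the project only reaches 5x, it still returns
# several times what was spent.
fallback_value = projected_value(investment_eur, 5)

print(f"10x target: EUR {target_value:,.0f}")
print(f"5x fallback: EUR {fallback_value:,.0f}")
print(f"Achieved multiple at fallback: {roi_multiple(fallback_value, investment_eur):.0f}x")
```

The value figures themselves still have to come from the business customer; the code only makes the multiplication explicit.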
What we've seen here is we've answered those questions.
What you have to do, you as the audience, you listening,
you need to start by answering those questions,
then move on to the execution. If you're in the middle of a data
project right now and you heard one of those questions
and you don't have an answer for it, you need to answer that now,
because you will eventually need to answer that question
in some fashion. Another thing that's important here is that not all
gaps are technology. Sometimes gaps are people
and technology. We need to check for that. So we need to
check for those gaps in the people and technology.
So AI projects, they're rarely just,
hey, we need to put this new technology in place. Let's put
some Spark in place. Now with GenAI,
let's put ChatGPT in place, and that will solve all the problems.
It may solve a problem. It may add ten problems.
It's just one piece of the puzzle.
Oftentimes, technical teams especially
think, oh, it's a Lego piece. We're missing that Lego piece.
Stick it in there. All good. What it actually is,
is often an organizational,
a people, and a skill change,
where even if we make that change in data, we start putting
this data in place, what will happen is that the organization
isn't ready to use it or the people aren't ready to use it,
or the people in the data team don't have the skill.
Programming especially. Sometimes they've taken their data
warehouse team and said, oh, you're data engineers now,
and they have no programming skills.
These are big problems. So it is really important that
we check for these gaps. These gaps are part of what are going to
make you fail in your projects. That's in my experience,
and I have a pretty significant amount of experience not just
consulting in this, but talking to others, interviewing others,
doing surveys. The next point is around
when you get help. It's really important to do this because
these problems don't fix themselves without specific and
concerted effort. Put a different way, if you
don't put effort into this, if you see a problem and
you don't put the direct effort into fixing it,
it will not change. It will not change for you.
There's the saying, time heals all wounds.
That isn't going to happen. What is going to happen is
that the same problems are going to keep on repeating. So there's a
lot of different types of help out there.
Outsourcing, technical consulting, management consulting,
lots of different possibilities. For example,
let's say you're a company that has no software engineering prowess.
Maybe outsourcing is the way to go rather than building a team.
Or if you do have a team that's floundering, probably need some technical
consulting or management consulting. There's a lot of possibilities out
there, but unless you avail yourselves of them, you won't
get your help and your fix. So most of
you who are watching this, you probably have a project that's
about to start. Sometimes what I like to do is actually
invert it, turn the situation upside
down and say, rather than saying,
where do you want to go and
how do you get there, maybe we just say,
what happens if all of our plans go wrong? What would have happened?
Now think about that for a second. Sometimes you
come into these projects with rose-colored glasses that say,
oh, yes, everything's going to
work out. But if we invert that and say, okay, well, what if it
goes wrong? What would have gone wrong? Well, it would have gone wrong
because the CEO will change his mind every millisecond,
or the marketing people will do this or
we won't get the resources for that. And by looking at that and looking at
the possible failures, you can start to change your plan and
make sure that your plan is more resilient to change.
Another question I like to ask is,
how good is a data engineer at a data science task?
So think about that for a second. Would you,
if you're a data scientist or you're a data science manager,
say, oh, we need an advanced model, let's have
a data engineer do that? And so,
okay, why are we having our data scientists do data
engineering tasks? Because they're really not
good at it. Quantifiably not very good.
And I do have a lot of data on this as well.
So why are we doing this? Another inverted
question that we would say is, what would the business say if
our project was canceled or the cluster was turned off or whatever
is providing that data just went away? If they say, what value?
What cluster? I don't care. Those are really bad
signs that the project is generating no or low value.
And that means that the business doesn't care about it. So when somebody's
looking to cut costs, somebody's looking to do something, those are the projects
that they're going to look at. But on the other side,
if the business says,
oh, you can't do that. I look at that report every
day. I look at that report every morning, and I base all these decisions
on that. Those are two very different sorts of views of
data and value creation. And if you aren't on that side where
people are saying, I need that, I use that every day,
that's a big problem. So why shouldn't you
copy someone else's architecture? That's a pretty big problem.
You heard me talk about it a little bit and say,
hey, if you go to a conference, and maybe
even at this conference, people will say, here's our architecture.
More than likely, you shouldn't copy it because
they're doing something different. They have a different goal. In fact,
I've worked and consulted at companies who are competitors
to each other, so same exact space, same exact industry.
And their architectures were different because they were trying to do two
different things. And what often happens in these architecture
talks is that they don't talk about the nuances,
they don't talk about the reason I chose this or we chose that.
There's some problems there. I think an interesting
example of this was Uber. Sometimes companies
will just show you this diagram all the way at the end, here's
the most complicated thing that's possible, but what they don't show
is how they got there. I think Uber was very interesting.
You can see the link down there to their diagram.
They started pretty
simply, and their first generation
started very simply. Second generation got
a little bit more complicated. You see a few more things.
Their third generation got even more complicated, even more things.
So there is a progression. And so what I really want you
to think about and think through is don't overengineer
and don't copy. Just because you saw that diagram
from Uber doesn't mean that you should be copying Uber.
Uber did something for Uber. They didn't do something for
you. So just doing that
could mean overengineering. That could mean underengineering.
Could be doing something that is completely unnecessary,
complete overkill for you. Only move
when it's necessary. As you saw, each one of those generations
was in response to a limitation that they started
hitting. Maybe you won't hit that limitation. Maybe you
don't have that limitation. Another thing to think about is
I ran a survey for data teams in 2023
and I received 81 responses,
from all different sizes of companies and maturity levels.
You can see the URL there if you want to go read it.
And what I did is I tried to look at the worst and the best.
I tried to look at the value generation that they were doing and
say, who is the worst? And then looked at
what they did, what their responses were. For example,
here we see the lowest and highest value creation. So the
red being the high value creation, blue being low value creation.
And these are their challenges. And what's interesting here is that
they both faced basically the same challenges.
You can see them pretty even across. However,
the low value creation teams, they faced some early mistakes.
High value faced more advanced things. But when
you looked at the methodologies, they were using more methodologies,
they had put effort into saying, this is how we're
going to organize ourselves. This is how we're going to structure
ourselves. Not everything is about technology.
What we see right here in this diagram is that methodology and
organization actually mattered quite a bit here.
We also talk about the number of teams. Sometimes teams
would say, oh, we can get by with just data engineering or just data science.
And we can see in this diagram that the highest
value creation in teams was with one
to three teams, really two to three teams. It's really,
really important: if you are trying to get by with, let's say, just data
science, you're going to hit that low value creation. Very
important as well are best practices. The high
value creation teams did the most number of best practices,
whereas the low value creation teams did the fewest.
It's very important to do those best practices, to consistently
do those best practices. And that brings us to Data
Teams. My book is subtitled A Unified Management
Model for Successful Data-Focused Teams. What we need is all three teams
working together to create value from data.
We have our data science. These are the ones that are the consumers.
They consume those data products. They create that advanced
analytics and they work with data engineering,
where data engineering is creating these data products. They make
sure that the software engineering is sound, that the distributed systems
are sound. And then we have operations.
Operations make sure that everything is running smoothly.
It allows the customers and the business to depend on a
working system. It's only once we have all three of these things
that we can truly be successful with our data teams.
Some of this I had to keep pretty high level.
I have a book, it's called Data Teams, if you'd like to read it.
It goes much deeper into the teams, how the teams work together,
as well as case studies. Now, those case studies are really important because
I put a lot of effort into not just my research
and my own experience, but I wanted to hear from others.
So there's full case studies of companies with ten plus years of
experience doing this and what that looked like.
So Data Teams is a great resource for understanding
this. So with that, I'd like to thank you for attending this session
and I wish you the best of luck in your data projects.