Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hey everyone, I'm Amar and I'm thrilled to welcome you to this session.
As a senior manager of software engineering at Lowe's, I have had the privilege of leading multiple teams that design and deliver platforms capable of scaling to millions of users.
But today, I'm going to dive into something even more exciting and transformative: how to build a resilient, real-time analytics pipeline, a topic that's redefining the way businesses operate in our fast-paced digital world.
Think about this: what if you could process over a million events per second while maintaining near-perfect uptime?
Imagine the competitive edge that your business gains, whether it is delivering lightning-fast personalized recommendations, enabling real-time decision making, or, for that matter, proactively addressing customer needs before they even arise.
In this session, I'm going to uncover the cutting-edge architectures and strategies that some of the leading companies are using to achieve these goals. I'll share real-world success stories, walk you through the technical challenges, the design, and the solutions, and provide a roadmap to help you implement these systems in your projects.
Whether you are here to supercharge your technical expertise, gather actionable insights to tackle your current challenges, or simply explore the incredible possibilities in real-time analytics, you are absolutely in the right place.
Let's go on. Before getting on to the subsequent slides, I would like to say that you can connect with me on LinkedIn at Amarnath Imadisetty. So, let's get on to the slides.
Let's try to understand what the customer expectations are. As I said, in today's digital era, 82 percent of customers expect instant, personalized responses. This is an absolute competitive advantage.
Organizations using real-time analytics experience a 26 percent boost in customer satisfaction metrics, and this leads to 3.2 times higher customer lifetime value.
So let's get on to the core benefits of real-time analytics. When we talk about real-time analytics, the primary thing to look at is the processing speed, the processing power at which we handle data. Real-time analytics has to handle a huge amount of data; I'm talking about 1 million events per second, right?
Instant personalization is also something that is embedded as part of real-time analytics: it allows businesses to tailor customer experiences in the blink of an eye. And that boils down to business agility as well. When I talk about business agility, companies using real-time analytics can adapt to changes within milliseconds.
Let's understand the real-world scenarios where companies are using this. Imagine a large e-commerce site; I hope most of you, or all of you, might have shopped at Amazon at some point. I'm taking Amazon as an example here.
Imagine the number of transactions that Amazon gets when a customer clicks on a product, adds it to their cart, searches for something, views the cart, makes a payment, or searches for deals. All this data has to be collected instantaneously. With the power of real-time processing, the system can track millions of these events happening at the same time, across the globe, ensuring a smooth and fast experience for every user.
If you have ever been to Amazon and you are shopping for, say, a refrigerator, the subsequent browse history will capture that interest in the refrigerator section and start recommending products for you.
Even when it comes to instant personalization, there are classic examples; most of us have watched Netflix or Amazon Prime Video. When we open the platform, it quickly analyzes our past viewing history and suggests shows or movies that we might like. Similarly, in the online shopping case I was just talking about with Amazon, as we browse through products we all notice personalized recommendations based on our preferences, such as "people who bought this also bought this," all made possible by real-time analytics.
When it comes to business agility, as I was saying, this empowers the business to adapt to the situation in sub-milliseconds to milliseconds.
For example, let's take ride-sharing apps like Uber or Lyft. They adjust prices dynamically based on the demand, the traffic conditions, and the number of available drivers. When demand spikes during rush hour, or during a special event, the system quickly increases prices to encourage more drivers to get on the road. This kind of fast decision making helps businesses stay competitive and responsive.
These benefits show how real-time analytics is not just a technical capability, but a game changer in creating better, faster, and more personalized experiences for our customers.
As I just talked through quickly, this slide shows how the major retailers are doing it, and how the travel industry does dynamic pricing adjustment. It's the same idea.
If you are looking for a flight ticket, as more and more searches, click-throughs, and demand come in, the ticket price for that flight gets increased or decreased. This empowers the business to do a great many things with instant personalization, and business agility is one of the primary ones.
Again, when it comes to the bigger aspect of real-time analytics, it certainly helps us in anticipating customer needs. It tells us what exactly a customer needs.
Let's say you go to Starbucks and they call your name as soon as they see you, already know what you usually order and what goes along with it; you would start to feel like a more engaged customer with that particular store, with that particular company.
And when we talk about churn reduction, real-time analytics helps in proactively addressing friction points before they escalate. This obviously helps in improving the complete customer engagement journey.
So let's dive deep into how we design, and what all is required to build, a real-time analytics pipeline. I'll first go through it at a high level.
Data collection is where you start gathering the data. As I was saying, unless you have data to act upon, you would not be able to predict or act on anything. The first step in the process is to collect the data; the next is to process it.
When I talk about processing the data, it could be cleansing the data, or something along the lines of normalizing it, making valuable insights out of it.
Then, obviously, we need data storage to store the data. And then the delivery: the data that has been synthesized needs to be put into a business format where there can be some actionable insights around it, which is the data delivery component.
And it all boils down to monitoring and resilience: unless we have monitoring and resilience of the overall application in place, the reliability, stability, and accuracy of the data cannot be guaranteed. So monitoring plays a big role in ensuring that data quality, data governance, data stability, and data accuracy come into the picture.
So let's dive into the first section, the data source. When I talk about data collection, there are two aspects to it: what is the data source, and what do we capture?
Taking the data source first: as I was saying, this is the starting point of any real-time analytics pipeline, where the data originates. Data sources are the platforms, systems, or devices generating information that needs to be analyzed. These are systems most of us use almost daily in our lives; most of us use websites, for example.
So what do we capture, and where do we capture it? In terms of where the data is available, it comes down to some of these data sources. Websites, where we capture user activities: what the user has viewed, clicked, or submitted.
The second aspect is mobile apps. Mobile apps help us capture user interactions like app navigation, product searches, and purchases.
And in the last 10 to 15 years, IoT devices have become a part of our lives. These could be home sensors, wearables, or industrial equipment that continuously generate telemetry data, for application health or for monitoring the activity around them.
At times we also need to parse through logs: server-generated logs or application logs that track system performance, errors, and maybe user interactions too. And we are all surrounded by social media; real-time customer opinions matter a lot, along with the trends that are happening and the mentions from platforms like Twitter, Meta, or Instagram.
And it comes down to payment systems as well, where transactional data records real-time purchases, refunds, or subscription renewals. So there are numerous places where data sources are available, and these are all widely used to derive some kind of real-time analytics. Yes, the data is available; now, how do we capture it?
As I said about data in real time, we all have to understand that it is a continuous mechanism; it's not a batch job. As soon as an event happens, you are acting upon it, or the system is acting upon it. So data is collected continuously and transmitted in real time. Some of this can be achieved using technologies like APIs, event trackers, or data streaming frameworks. Each data source that I just talked about contributes unique insights that help businesses better understand their operations and customers.
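To make the capture side concrete, here is a minimal sketch, assuming the kafka-python client and a hypothetical "clickstream" topic, of a web backend publishing an event the moment it happens rather than in a batch:

```python
# Minimal sketch: continuous event capture with kafka-python.
# The topic name "clickstream" and the event fields are assumptions.
import json
import time
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def track_event(user_id, event_type, product_id):
    event = {
        "user_id": user_id,
        "event_type": event_type,  # e.g. "view", "click", "add_to_cart"
        "product_id": product_id,
        "ts": time.time(),
    }
    # Sent as the event occurs: a continuous stream, not a batch job.
    producer.send("clickstream", value=event)

track_event("u-123", "click", "p-987")
producer.flush()
```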
Take healthcare as an example: they might not be as interested in payment systems compared to the website or the transactions they are dealing with. When it comes to retail, they might be more interested in social media to see how guest purchases are happening, plus the websites customers are browsing and the competitive sales that are happening; they need all that information. So depending on the business we are in, the source varies.
Some examples I can certainly talk through as real-time examples: as we just discussed with Amazon, every time a user searches for a product, clicks on an item, adds an item to the cart, goes to the payment page, and completes a purchase, the data in this whole journey flows continuously into the real-time analytics system, allowing it to act immediately. As this process happens, we start providing recommendations, tracking inventory, and analyzing customer preferences for future recommendations.
A simple example: if the user was looking for wireless headphones, the real-time analytics system collects this input and immediately processes it to suggest related products, specific brands, or associated accessories.
These days most of the larger companies have both a mobile-app-based and a web-based application, so sometimes the tie-in of the session between the mobile app and the web is highly required. Let's assume we went to the Amazon app and searched for a product; we want to show the recommendations when we go to the Amazon website as well. So in the real-time example, once we start capturing the data and understanding the business operation, business agility, and business need, the data will be captured and transmitted to the system.
So, the primary step here, data processing, is the bread and butter of the application. We have data, but if you cannot make use of the data, it does not make a lot of sense; we need to derive actionable insights out of it.
In this step, the raw data collected from the different sources I talked about is analyzed, transformed, and cleansed into something useful. This step makes the data more understandable and actionable for real-time decision making.
Let's take an example: say we want to predict what a customer expects. Unless the customer data has a continuous pattern, we would not be able to derive real-time insights. To put it concretely, let's assume we want to understand what Amazon will make in terms of revenue tomorrow. Unless we have historical data that is continuous and actionable, from which we can compute the average, the mean, and other statistics around the history, we would not be able to act upon it.
So if we start on day one, we might not be able to derive many insights. Yes, we can do some geo-based analysis and a few other attributes, but continuity, a historical approach, is always required to make this real-time decision making work. During this step, we determine whether the data is actionable or not.
So what exactly happens during data processing? When the data enters our system, it might be messy or pretty unstructured, because, as I said, we can get the data from websites, mobile apps, payment systems, IoT devices. We have numerous data sources we are pulling from, and each has its own data structure. Sometimes there is too much data; sometimes there is too little.
Processing primarily involves data cleaning. During this phase, we try to fix errors and remove duplicated data, and we also check whether the data we have received is complete. A simple example: if we get an identical purchase from two sources, the system has to keep only one. Yes, there are times when we might need the duplicated data for some insights, but many times, when it comes to real-time decision making, data duplication should be avoided.
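As a minimal sketch of that dedup step, assuming each purchase event carries a unique order_id (an assumption, not something from the talk), the stream can keep only the first occurrence:

```python
# Minimal sketch: drop duplicate purchase events arriving from two sources.
# Assumes each event has a unique "order_id"; production systems would use
# a bounded or TTL-based store instead of an unbounded in-memory set.
seen_orders = set()

def deduplicate(events):
    for event in events:
        key = event["order_id"]
        if key not in seen_orders:
            seen_orders.add(key)
            yield event
```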
Then there is data transformation, where data is typically transformed into the format required by the subsequent steps, for data visualization or the data delivery aspect.
A simple example, as I talked about: the clickstream data, the user data captured on the website, can be translated into something like "what's the most clicked product in the last 10 minutes."
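That "most clicked product in the last 10 minutes" transformation maps naturally onto a windowed aggregation. Here is a minimal sketch using PySpark Structured Streaming, with the Kafka topic name, field names, and console sink all assumed for illustration:

```python
# Minimal sketch: sliding-window click counts with PySpark Structured Streaming.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json, window
from pyspark.sql.types import StringType, StructType, TimestampType

spark = SparkSession.builder.appName("top-clicked-products").getOrCreate()

schema = (StructType()
          .add("product_id", StringType())
          .add("event_type", StringType())
          .add("event_time", TimestampType()))

clicks = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "clickstream")          # assumed topic
          .load()
          .select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*")
          .where(col("event_type") == "click"))

# Count clicks per product over a 10-minute window, sliding every minute.
top_products = (clicks
                .withWatermark("event_time", "1 minute")
                .groupBy(window("event_time", "10 minutes", "1 minute"),
                         "product_id")
                .count())

(top_products.writeStream
 .outputMode("update")
 .format("console")       # stand-in for a real sink (cache, DB, topic)
 .start()
 .awaitTermination())
```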
At this point it also involves analysis and insights: as I just said, can this data be used or not? We can apply some kind of algorithm, or machine learning models, to generate insights around it, such as predicting which products a user is most likely to buy based on their behavior.
So we talked about the steps involved; just to run them down: data cleaning, data transformation, and analysis and insights.
So what do the industry tools look like? When it comes to data streaming, without question Apache Flink and Apache Spark are the pioneers among open-source frameworks. They can handle large amounts of data in real time.
I have tested them myself and am very well aware of it. There have been use cases where we had to process more than a million records per second, and I did not have any problem.
When it comes to AWS, AWS Lambda has the capability to execute small tasks quickly as and when new data comes in. It is quite helpful for the delta variations that we talk about.
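As a minimal sketch of that "small task per new record" pattern, here is what an AWS Lambda handler behind a Kinesis stream can look like; the trigger, field names, and print sink are assumptions for illustration:

```python
# Minimal sketch: an AWS Lambda handler processing a Kinesis batch.
import base64
import json

def handler(event, context):
    for record in event["Records"]:  # standard Kinesis event shape
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        # Small, fast delta work: tag the event with a derived field.
        payload["is_purchase"] = payload.get("event_type") == "purchase"
        print(json.dumps(payload))   # stand-in for a downstream sink
    return {"processed": len(event["Records"])}
```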
Another one is Google Dataflow, which processes streams of data from various sources and transforms them; it is a full-fledged product available from Google.
And as I mentioned, why is this important? Without data processing, the raw data being collected is too large and too chaotic to understand or reuse. Unless we process, organize, and analyze the data to extract meaningful patterns and trends, it cannot be considered for actionable insights.
Again, some real-time examples: we just talked about a user searching for a product, and within milliseconds the system processes this data to update the personalized recommendations. As simple as that: when we try to buy a washer, the subsequent recommendation should be more on the dryer side, not, say, a toilet.
It also analyzes how many searches are happening at the same time, to adjust inventory, prices, recommendations, et cetera.
In the case of the Uber or Lyft ride that we talked about, or any of the popular ride-sharing apps, the app processes data like the rider's location, the driver availability, what the traffic looks like, any events happening, the traffic volume, and the climatic conditions, and derives the estimated fare. It has to process all of it and make a reasonable insight, and in the majority of cases this processing happens in real time, ensuring we get pretty accurate ETAs and pricing as well.
And when it comes to financial institutions, in banking and fraud detection, when a transaction occurs, the data processing checks for particular patterns to detect fraud. Many of us have encountered this: if we use a credit card in a shop we have never been to, or get tagged for unusual spending, it blocks the transaction then and there. At that point, the system has detected our usage pattern, the location at which the transaction happened, the type of transaction, and the amount, and it probably also has to churn through the limits that we have set.
There are numerous examples like this. This has empowered banking and many other industries to act in real time, which has great potential. So data processing plays a huge role in understanding what all the attributes are and how to translate the data.
Again, when it comes to data storage, this is pretty much where the processed data is saved, so it can be accessed quickly, or in near real time, as and when needed.
Every part of this is quite crucial, because it ensures the system is reliable; it has to be fault tolerant, highly available, and geo-located, because this is the reliable place where the data is available to act upon and to retrieve information without any delays.
What happens exactly at the data storage stage? Once the data has been processed, it needs to be stored in a way that supports, or empowers, real-time access. The data must be accessible instantly for any analytics and decision making.
And as I was saying, you have to understand the volume at which it comes: if you take the credit card example I just talked about, there are numerous transactions, across numerous credit cards and numerous locations. So the storage has to be able to handle extremely high volumes of data, so it can retrieve fast and the system can act fast.
And it boils down to this: for reliable data storage, the data has to be stored in a secure manner. When I talk about a secure manner, I'm not only talking about PCI, HIPAA, and so on; that depends on the industry and the data we are storing. When I talk about security here, it also has to be highly available, across multiple zones, and it should have data replication, so that in case one data center or one system goes down, you have backup data available to act upon.
When it comes to data storage in real-time analytics, some of the classic options are, first, in-memory stores, where data is stored in memory for quick access. Let's assume a user was trying to add an item to the cart and then stayed on that page for a good amount of time. At that point, we can immediately act on that event and start showing a promotion, if that's what the business is looking for.
For these classic in-memory, ultra-fast access scenarios with TTL-based expiry, Redis and Memcached play a big role. Redis and Memcached are primarily used to pull recommendations from cached data to respond instantly, because as soon as we get onto the website, we want the recommendations quickly.
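Here is a minimal sketch of that pattern with redis-py; the "rec:<user_id>" key scheme and the 10-minute TTL are assumptions, not anything prescribed in the talk:

```python
# Minimal sketch: TTL-based recommendation caching with redis-py.
import json
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def cache_recommendations(user_id, product_ids, ttl_seconds=600):
    # Entries expire automatically, so stale recommendations age out.
    r.set(f"rec:{user_id}", json.dumps(product_ids), ex=ttl_seconds)

def get_recommendations(user_id):
    raw = r.get(f"rec:{user_id}")
    return json.loads(raw) if raw else None  # miss -> recompute upstream
```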
When it comes to another set of databases, NoSQL is another popular category. This is primarily required for high-scale, large-volume data storage, where we might have to apply many conditions to derive the insights.
MongoDB, Cassandra, and DynamoDB are some of the pretty popular NoSQL databases. A ride-sharing app like Uber probably stores the trip details there, which is what we see whenever we go to the ride-sharing app, and also the driver availability, for quick updates and retrieval.
And with machine learning and AI booming, time-series predictions are used heavily, so one of the popular choices for real-time analytics is time-series databases. They are ideal for data that changes over time: your x-axis is time, and your y-axis is your attribute data.
Let's go back to one of the examples I mentioned: what does my company make tomorrow? What does it make the day after tomorrow? What does it make after three days? When I ask what my company makes, it could be many attributes: what do my sales look like, what do my orders look like, or what does my store make?
Based on this data, many things can be done: I would be able to project my inventory, project the volume of data coming in, and project the number of people entering the store, so I can be well equipped in advance.
Some of these time-series databases: InfluxDB is one of the popular ones, we have TimescaleDB, and Druid is another DB that we all refer to. And monitoring systems like Prometheus use a time-series DB to store server performance data in real time.
Another thing when it comes to real-time analytics: sometimes we want to store the data at a much larger scale, including the historical data I talked about. In order for machine learning models to work accurately, sometimes we want to give them a lot of input data, to accommodate seasonality, climatic situations, and the overall journey of the system. So we need large data storage.
At that point, we jump into data lakes. This is where we can store either raw data or processed data, if we want to do large-scale data processing or handle large data volumes in the future. Amazon S3 and Hadoop HDFS are some of the data lakes in the market.
If you take Netflix as an example, it stores user data in data lakes for later analysis, such as finding what the long-term viewing trends look like.
So when we talk about these four different styles of data storage systems, in-memory, NoSQL, time-series, or data lakes, what are the key features that we need to look for in a data storage system?
It should have absolutely low latency; it does not matter what kind of DB we use, it should have fast read and write speeds for real-time access. It needs scalability, the ability to handle growing traffic or growing data volume without slowing down, and fault tolerance, as I said earlier.
As I said at the start, the data storage should not lose any data, even if there is a hardware failure. With all this, it also boils down to how flexibly we can access it: do we need structured, semi-structured, or unstructured data? So when we think about data storage, fault tolerance, low latency, scalability, and accessibility play the bigger roles.
This is from my reading on how Amazon inventory data is stored: the real-time item inventory is stored in Amazon's DynamoDB, and it ensures accurate stock updates for millions of products. When an item is purchased, the system instantly updates the inventory level, so the customer only sees what is in stock.
When it comes to Netflix streaming recommendations, it stores data on user viewing habits, like the last movies or shows that have been watched and the preferences, in a distributed DB like Cassandra; the data is accessed in real time to suggest movies or shows tailored to each user.
IoT sensors: most of us are equipped with smart homes in one way or another these days, with smart thermostats and cameras. They send data to a time-series DB like InfluxDB, where it is stored and processed to send alerts and adjust things in real time. Some of these thermostats are also equipped to store all this in near real time; they watch the temperature, learn what we are using and how we are using it, including eco settings, and adjust the temperature accordingly.
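Here is a minimal sketch of writing one such thermostat reading with the influxdb-client library for InfluxDB 2.x; the bucket, token, and tag names are assumptions for illustration:

```python
# Minimal sketch: writing thermostat telemetry as a time-series point.
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

client = InfluxDBClient(url="http://localhost:8086",
                        token="my-token", org="my-org")
write_api = client.write_api(write_options=SYNCHRONOUS)

point = (Point("thermostat")          # measurement name (assumed)
         .tag("device_id", "home-42")
         .field("temperature_c", 21.5))
write_api.write(bucket="telemetry", record=point)
```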
As we talked about all these aspects of data collection and data processing: as much as every component is important, data storage plays a huge role, since this is the place where all this data lives. Without efficient data storage, even the best real-time processing systems we have built, in terms of collecting and processing the data, would absolutely fail, because the data would not be available when required, and high latency would frustrate the users; most of the time, when we open these recommendation experiences, users expect instant responses.
The other challenge is that if data is lost, we would not be able to make accurate, reliable insights out of it in the long term.
We talked about a lot of things on data storage; yes, we have the data stored. Now, how do we make it available to drive actionable insights or share it with the users?
This is probably the final step, though I have listed monitoring and resilience as a separate item; that should really be baked into every component.
When I talk about data delivery, this is the final step in the real-time analytics pipeline, where the processed insights that have been stored reach the users or applications. This is the place where we share them with the systems, applications, and users to drive actionable insights.
So what happens during data delivery? In simpler terms, the data is made accessible to the required destinations.
What could some of these destinations be? You could push it into visualization tools like Superset or Looker, where you provide insights about the user journey or user analytics for the teams or stakeholders to act upon.
Sometimes you have to provide this data through APIs and microservices as well, to deliver data to act upon.
You might want to put it into Grafana or similar systems that do real-time alerting: when you identify an anomaly, as in the credit card situation where a fraudulent transaction is detected, you want to alert the user immediately.
And sometimes you also want to fire some kind of trigger: in the previous case you alerted the user and blocked the transaction attempt; in the other case, where you decided there is some anomaly or event happening, as in the Uber or Lyft ride-sharing app, you want to update the price of the ride on the fly.
So how is the data delivered? It relies on multiple mechanisms: it could be a RESTful API, a WebSocket, a Kafka topic with message brokers, or real-time notifications. There are plenty of ways the data gets delivered to the systems that act upon it, for visualization, notifications, or alert triggers.
So what are the types of data delivery? I'm just going to take a minute and talk through the types of data delivery with some real-world examples.
First, push delivery: sending data proactively to users or systems. Say a notification pops up on our phone when an Uber ride is about to arrive.
Then, pull delivery: allowing users or systems to request data when required. A simple example is a stock trading app that lets you refresh the page to update the market prices.
These are the ways delivery happens: in the case of a push notification we proactively send it, and in the other case the data is available and you pull it when required.
Again, just as we talked about the key features of data storage, what are the key features of effective delivery? Low latency again plays a big role, because we have to ensure near-instantaneous delivery.
And here, reliability plays a big role, because it's not an asynchronous fire-and-forget request; there should be a way to guarantee that the data is delivered correctly, even during system failures. Assume you are waiting for your Uber car to arrive and you never received the notification; it's going to be a struggle to identify which driver, which car, and what information.
It also boils down to the security aspect: we have to ensure that when we send the data, we protect the data's sensitivity during transmission.
So this brings us to the section on how we design: what are the popular structures and tools available in the market, and how do we get to a state where we design the end-to-end aspect of it?
Again, when it comes to implementing real-time analytics, we have to assess the current state and what kind of infrastructure is available. Kafka might not be a feasible solution for everyone; AWS Lambda might not be a feasible solution for everyone. So depending on where and how the data is available, we start by understanding the existing data infrastructure and then derive the possible capabilities.
As the data becomes available, let's try to understand the objective we are trying to achieve. When it comes to real-time analytics, is this business driven, or can we make data-driven decisions out of it?
Once we start thinking about the objective, understand the state the data is in, and know the options available within the company or the system we are working on, then, across the various technology components I just discussed, we pick the appropriate technologies. And again, it all boils down to how you deploy and optimize.
As I said, when you deploy and optimize, the final piece, as soon as you are getting into production, is that monitoring and resilience play a huge role. This will ensure the key features required for data storage, data processing, data collection, and data delivery are in place; this piece ensures the system operates smoothly and can handle unexpected issues.
And we measure against performance goals like 99.99 percent uptime, which is what I was talking about: we want to ensure we achieve essentially no downtime, which translates to 99.99 percent uptime.
So what is monitoring? Primarily, monitoring involves continuously observing the health and performance of the system. This includes tracking metrics like response times, throughput, errors, and resource utilization. If something goes wrong, monitoring systems trigger alerts for quick intervention.
What is resilience? It's the system's ability to recover from failures and continue functioning without any downtime or data loss. This helps ensure that the system, the pipeline we are building, stays reliable and keeps customers satisfied even under high stress or unexpected events.
When we talk about the components of monitoring, we are talking about checking CPU, memory, disk usage, and network performance. It's about ensuring the servers don't get overloaded during a Black Friday sale on an e-commerce retail platform.
We also talk about application performance monitoring, where we track the behavior of an application for latency, errors, response times, et cetera, something like monitoring a ride-sharing app to ensure ride requests are processed in milliseconds.
And we want to collect the logs for debugging errors.
When it comes to alerts and notifications, tools like Grafana and Prometheus help build the dashboards and raise real-time alerts within the system. In case the data pipeline crashes, the operations team, or whichever team is responsible for this work, can be notified through Teams, Slack, email, SMS, PagerDuty, and other alerting channels.
While we do all of this, we want to ensure the system can handle failures: is the traffic evenly distributed across servers to prevent overload, is auto-scaling in place to automatically adjust the number of resources, is the data being copied between multiple regions or servers accurately, and does traffic get redirected to the backup system when the primary system fails?
We have plenty of performance testing tools and chaos engineering tools to drive some of these things. I have seen an article about how Netflix uses chaos engineering to test resilience by deliberately introducing failures into the system, to ensure it can handle real-world disruptions.
And most retailers have to test server performance thoroughly and extremely to ensure it's ready for the big sale days like Amazon Prime Day, Black Friday, Cyber Monday, and other large sale events. At that point, they have to ensure auto-scaling can process that data.
So, to accommodate the real-time data analytics pipeline: as much as we build these systems, we want to ensure that monitoring and resilience are built into every component, to handle the large amount of data that's available.
Now, what about sentiment analysis on top of this? With real-time monitoring, as I just talked about, you continuously monitor and analyze customer emotions across media, reviews, support tickets, and chat interactions with AI-powered sentiment detection. Then you proactively reach out to understand how the customer is feeling and try to translate negative feedback into positive experiences; before the customer reaches out, we can practically reach out ourselves to identify what's happening.
And at the same time, we can also check whether this is a trend across the board, to uncover some of the larger problems the application is facing.
If we start doing that sentiment analysis on the customer support side, where it is applicable, it absolutely helps customer support: 67 percent faster processing of requests, a dramatic improvement in response time to customer issues, and 90 percent first-contact resolution, where the ticket is not bounced between levels and the issue is resolved quickly.
This also helps with auto-chat applications, where we don't need a physical presence to address all those things.
This is what the industry and some of these articles talk about: the average ROI can be impressive. There are companies that have reported achieving 287 percent ROI within the first year of deployment. And 94 percent of the customer feedback says that real-time personalization plays a big role in maintaining brand loyalty. It takes about a year to implement all of this, stabilize the data, and understand what works and what does not.
As we get to the end of the complete real-time data analytics pipeline, let's take an end-to-end example.
So, I'm just taking an Amazon as an example.
A customer searches for a product, on an e commerce website.
so, so the search event, the event that the customer has been, searched will be
sent to, this is your data collection aspect, website, and then the data, right?
The data.
is being sent to a Kafka topic, right?
so data, if you remember the architecture that we were talking about, data,
source, and then the, data capture.
So data source is your website where we have captured it.
And then data, Capture is placed where we have sent it to Kafka, right?
Now, as soon as the user searches for the product, the event is sent to Kafka for ingestion, and then we have Apache Flink process that data to check the preferences and suggest products. The result is stored in DynamoDB for quick access.
Then the data delivery: the real-time recommendation appears on the customer's screen within milliseconds. And Prometheus monitors across all these pipeline stages to ensure the system works resiliently.
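Stitching the stages together, here is a minimal end-to-end sketch of the consumer side: collection from a Kafka topic, a stand-in processing step, and caching for low-latency delivery. The topic name, event fields, and the placeholder recommendation logic are all assumptions:

```python
# Minimal sketch: consume search events, derive recommendations, cache them.
import json
import redis
from kafka import KafkaConsumer

cache = redis.Redis(host="localhost", port=6379, decode_responses=True)
consumer = KafkaConsumer(
    "search-events",                      # assumed topic name
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v),
)

for record in consumer:                   # data collection
    event = record.value
    # Stand-in for the Flink/ML step: fabricate related-product IDs.
    recs = [f"{event['query']}-related-{i}" for i in range(3)]
    # Storage for the delivery layer, with a 10-minute TTL.
    cache.set(f"rec:{event['user_id']}", json.dumps(recs), ex=600)
```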
So what does real-time analytics do in this case? It helps you stay competitive in today's fast-paced digital era, deliver personalized experiences at scale, and absolutely drive measurable results in customer satisfaction, loyalty, and revenue.
As we come to the end of this: companies like Netflix, Amazon, and Uber, along with many retailers, healthcare organizations, and other industries, have started setting the gold standard for real-time analytics. By following the design patterns and architecture that have been discussed, I think any business can transform its operations and customer experience.
With that, I would like to thank you again for taking the time to attend my session. If you have any questions, please do connect with me on LinkedIn at Amarnath Imadisetty. I would like to connect with you and help you out as much as I can.
Thank you all. Have a good day and good evening.