Transcript
This transcript was autogenerated. To make changes, submit a PR.
Are you an SRE, a developer, or a quality engineer who wants to tackle the challenge of improving reliability in your DevOps? You can enable your DevOps for reliability with Chaos Native. Create your free account at Chaos Native Litmus Cloud.

Hello everyone. In today's session I am going to talk about streaming and near real time analytics. This is near real time analytics, not real time.
So if you look at today's world, most companies are looking for near real time analytics or real time analytics, which are of course challenging to achieve most of the time. Gone are the days when you would get the data the next day, analyze it for the next week, and try to figure out what to do and how to react to a situation like fraud. Things have changed. To create value, companies must derive real time insights from a variety of data sources that produce high velocity data, and of course in huge volumes. Having an analytics solution in place enables faster reaction in real time to events that affect the business. The need for analyzing heterogeneous data from multiple sources, whether internal or external, is greater than ever, making the analytics landscape ever evolving with numerous technologies and tools and making the platform more and more complex.
So building such a solution is not only time consuming but costly, because it involves quite a lot of things: selecting the right technologies, acquiring the right talent pool, and ongoing platform management, operations and monitoring. What I'm going to talk about is how to make this platform build a bit easier and less expensive, keeping in mind that most of the time building an analytics solution is an afterthought: you first build the core features, and then you talk about analytics. By that time, you probably have less budget left.
So in this session we'll discuss and demo how you can leverage AWS products and services to create a near real time analytics solution with minimum to no coding. Note, minimum to no coding, for an ecommerce website. Of course it's a demo website; I've just built a one page website to showcase how the data flows from one end to the other, and how you can integrate with preexisting data sources if you need to. And most of the time you probably will need to, because you would need to integrate with other back end systems and join the data for analytics and reporting.
So of course the solution needs to have a set of advantages, no different than any other one. In this case, it is easy to build on AWS; as I said, there is no coding, or very minimal coding depending on how exhaustive a feature set you want to build. It is elastic and fully managed: it is auto scalable horizontally and vertically, fully managed by AWS, and pretty much a serverless solution. Like any other AWS product and service, it is highly available and durable. It has seamless integration with other AWS services like Lambda, ECS, Fargate or EKS, RDS, and S3, which is again a core aspect of the whole solution, the data lake. And last but not least is the cost: it is pay as you go. If you don't use it, you don't pay for it.
So let's quickly go over the agenda. Over the next 10 to 15 minutes, I'll quickly go over why real time data streaming and analytics, the principles of data streaming, and near real time streaming on AWS, meaning what options we have in hand. At the end I'll go over one use case and a demo covering that use case end to end.
So let's quickly turn our attention to why real time analytics. As I briefly covered earlier, companies must derive real time insights from a variety of data sources. Gartner in 2019 emphasized that, saying data integration requirements these days demand more real time streaming, replication and virtualization capabilities. Gone are the days when you do offline processing over days or weeks or months, right? So I think that pretty much sets the scene. Now, before I go into the details, I just
wanted to take you through a quick case study about this.
Epic Games' Fortnite. Real time data streaming and analytics guarantee in this game that gamers stay engaged, resulting in one of the most successful games currently in the market. For those who are not really familiar with it, Fortnite is set in a world where players can cooperate on various missions, fight back against a mysterious storm, or attempt to be the last person standing in the game's battle royale mode. It has become a phenomenon, attracting more than 125 million players in less than a year, so it is quite popular in that sense. So what is the challenge here? It is a free to play game with revenue coming entirely from in game microtransactions, meaning its revenue depends on continuously capturing the attention of gamers through new content and continuous innovation. To operate this way, Epic Games needs an up to the minute understanding of gamer satisfaction, helping guarantee an experience that keeps them engaged. That's a challenge, right? Because you need to understand every gamer, how they are reacting, and make sure they're happy all the time and their experience is seamless, while at the same time collecting data. So what was the solution?
Epic collects billions of records on a daily basis,
tracking virtually everything happening in the game, how players interact,
how often they use certain weapons, and even the strategies
they use to navigate the game universe. More than 14 petabytes of data are stored in the data lake powered by Amazon S3, so Amazon S3 plays a significant role here, and it is growing by two petabytes per month. It's a massive amount of information.
So as you can see, data loses value over time. This was published by Forrester. If you look from left to right, time critical decisions are made within minutes, and as you move towards the right, the data becomes more historical and is used for batch processing, business intelligence reporting or machine learning training data. So the value of data diminishes over time. To get the most value from the data, it must be processed at the velocity at which it is created at the source. Organizations in pursuit of a better customer experience will inevitably need to start driving towards more reactive, intelligent and real time experiences. They just can't wait for data to be batch processed, and thus make decisions and take actions too late. So reactivity will differentiate your business. To achieve a better customer experience, organizations need to work with the freshest data possible. So I think that's pretty clear here.
Now, if I go further down and analyze the different use cases of the data, and what kind of use cases you would have in an organization in terms of the timeline of using that data: as you can see here, messaging between microservices is a classic example where you need millisecond delays; you can't afford minutes here. Responsive analytics, like web application and mobile application notifications, need to happen within milliseconds, when things are happening at the back end of the front end and micro interactions are taking place. Then for log ingestion, think of IoT device maintenance or CDC, where you capture changes from a source to a destination database; in those scenarios you can think of having seconds of delay. Whereas in a typical ETL, data lake or data warehouse scenario, you can have minutes, hours or days of delay in terms of analytics. So again, this clearly articulates the importance of data over time
as we move forward. So what is the trend? One of the great things about data streams is that many customers find they can be used for messaging, and they enable the development of real time analytics applications down the road. As a result, we are seeing customers replace message queues with data streams to provide an immediate boost in the capability of their architecture. They are effectively moving away from batch workflows to lower latency, streaming based applications, and data streams are the event backbone for services: streams have become the backbone of event driven microservice interaction. Messaging is still there, but it is slowly moving into a real time streaming kind of mode. And of course we have MSK; we'll talk a little bit later about that AWS service, which is the managed Kafka service. Again, we talked about CDC, change streams from databases, and streaming machine learning and real time automation are also slowly becoming popular. So the fundamental message here is that the world is moving towards near real time or real time, stream based interaction across systems. So what
happens here? So effectively, you ingest data as it is generated, and you process it without interrupting the stream. That is important, because when you are processing data, your ingestion, the whole streaming process, should not get disturbed or delayed in the process. So you have ingestion, then you process the data, and then you create analytics, which can be real time, near real time, or completely batch based. The idea is to decouple each one of them, making sure they are all frictionless, and at the same time create that real time or near real time experience for the consumers, right?
So fundamentally, if you look at the principles of data streaming: data can be produced, captured and processed in milliseconds; data is buffered, enabling parallel and independent I/O; and data must be captured and processed in the order it is produced. So there are three fundamental needs. Number one, in order to be real time, data needs to be produced, captured and processed in milliseconds. If not, you can't react in real time. That is important: you have to be able to produce, capture and process in milliseconds, not seconds. Second, you need a system that scales up to support the ingestion needs of your business, but also allows you to build your own applications on top of the data collected. Otherwise, you'll need to chain data feeds together, which adds latency and erodes your ability to react in real time. Third, ordering is critical, because your application needs to be able to tell the story of what happened, when it happened, and how it happened, relative to other events in the pipeline. So while you're talking about real time, the sequence of events, the order in which records are processed, is also extremely important; otherwise you lose track of when things happened when things are happening really fast. So all three are equally important, as I articulated.
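To make the ordering point concrete, here is a minimal producer sketch in Python with boto3; the stream name and event field names are placeholders I've assumed for illustration. Records that share a partition key land on the same shard, so they are read back in the order they were written.

```python
import json
import time
import boto3

kinesis = boto3.client("kinesis")
STREAM_NAME = "clickstream-demo"  # hypothetical stream name


def send_click(user_id: str, product_id: str, action: str) -> None:
    """Write one click event. Using the user id as the partition key keeps
    all of that user's events on the same shard, so they are consumed in
    the order they were produced."""
    event = {
        "userId": user_id,
        "productId": product_id,
        "action": action,                  # e.g. "buy" or "view"
        "ts": int(time.time() * 1000),
    }
    kinesis.put_record(
        StreamName=STREAM_NAME,
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=user_id,
    )


if __name__ == "__main__":
    send_click("user-42", "product-7", "view")
    send_click("user-42", "product-7", "buy")
```

Kinesis assigns each record a sequence number within its shard, which is what lets a consumer reconstruct the order of events later.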
Now, moving on: so what are the challenges of data streaming? Organizations face many challenges as they attempt to build out real time data streaming capabilities and embark on generating real time analytics. Data streams are difficult to set up, tricky to scale, hard to make highly available, complex to integrate into broader ecosystems, error prone and complex to manage, and over time they can become very expensive to maintain. As I mentioned earlier as part of my introduction, these challenges have often been enough of a reason for many companies to shy away from such projects. In fact, these projects always get pushed back for those various reasons. The good news is that at AWS, it has been our core focus over the last five plus years to build solutions that remove those challenges.
So let's talk a little bit about what is there for real time and near real time streaming on AWS. How do you address those challenges on AWS? The AWS solution is easy to set up and use; it has high availability and durability, with data replicated across three Availability Zones by default; it is fully managed and scalable, reducing the complexity of managing the systems over time and scaling as demand increases; and it comes with seamless integration with other core AWS services such as Elasticsearch for log analytics, S3 for data lake storage, Redshift for data warehousing, Lambda for serverless processing, et cetera. Finally, with AWS you only pay for what you use, making the solution very cost effective. So basically what I'm saying is you pay only for the part you use; if you don't use it, you do not pay for it. That makes the whole solution very cost effective, which is of course one of the biggest criteria when deciding to build analytics: how much it's going to cost, both upfront and on an ongoing basis in terms of maintenance. So let's talk about
streaming data architecture. So, data streaming technology lets customers ingest, process and analyze high volumes of high velocity data from a variety of sources in real time, as we have been discussing. You enable the ingestion and capture of real time streaming data, store it based on your processing requirements, which is essentially what differentiates this from an MQ type of setup, and process it to tap into real time insights. You can set alerts and email notifications, trigger other event driven applications, and finally move the data to a persistence layer. So what are those steps?
Your data sources are devices and/or applications that produce real time data at high velocity. Then comes stream ingestion: data from tens of thousands of data sources can be written into a single stream; you need a pipe where you can push the data through. Once you push the data through, you should be able to store it: data is stored in the order received for a set duration, and can be replayed indefinitely during this time. With the Kinesis products, like the Kinesis Data Streams we are talking about, you can store that streaming data for up to a year, so you can effectively replay it as many times as you want if you need it in future. You can decide to keep it for a month, or one day, or any duration, up to 365 days. Once you store it, you then process that data: records are read in the order they are produced, enabling real time analytics or streaming ETL. We'll cover this again as part of Kinesis Data Firehose, the product we'll be using for our demo. And then at the end you store the data for a longer duration. It could be a data lake like S3, a database, or any other solution you might think of.
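As a small aside on that retention point, here is a sketch, using boto3 and a placeholder stream name, of how the retention window could be extended from the default 24 hours up to the 365 day maximum mentioned above; the API takes the retention period in hours.

```python
import boto3

kinesis = boto3.client("kinesis")

# Hypothetical stream name; a new stream starts with 24 hours of retention.
STREAM_NAME = "clickstream-demo"

# Extend retention so records can be replayed for up to a year.
# 365 days * 24 hours = 8760 hours, the current maximum.
kinesis.increase_stream_retention_period(
    StreamName=STREAM_NAME,
    RetentionPeriodHours=8760,
)

# Verify the change.
summary = kinesis.describe_stream_summary(StreamName=STREAM_NAME)
print(summary["StreamDescriptionSummary"]["RetentionPeriodHours"])
```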
So what are those near real time and real time streaming products within AWS? This is the Kinesis set of products. We made it very easy: customers can collect, process and analyze data and video streams in real time without having to deal with the many complexities mentioned before. The Kinesis services work together and provide flexibility and options to tailor your streaming architecture to your specific use cases. Kinesis Data Streams allows you to collect and store streaming data at scale, on demand. Kinesis Data Firehose is a fast and simple way to stream data into data lakes or other end destinations, again at scale, with the ability to execute serverless data transformations as required. And then Kinesis Data Analytics allows you to build, integrate and execute applications in SQL and Java. These three services work together to enable customers to stream, process and deliver data in real time.
Then you have MSK, Amazon Managed Streaming for Apache Kafka, which is a fully managed service for customers who prefer Apache Kafka, or who use Apache Kafka alongside Kinesis to enable specific use cases. And at the end you have Kinesis Video Streams, which allows customers to capture, process and store media streams for playback, analytics and machine learning. Out of these five, we'll be using two of them in the demo: Kinesis Data Streams and Kinesis Data Firehose.
Moving on, a little more about Kinesis Data Streams and Kinesis Data Firehose, which I'll cover now. KDS, popularly known as Kinesis Data Streams, is a massively scalable and durable real time data streaming service. It can continuously capture gigabytes of data per second from hundreds of thousands of sources such as website clickstreams, which is the one we'll be using as part of the demo, database event streams, financial transactions, social media feeds, IT logs and location tracking events. The data collected is available in milliseconds, enabling real time analytics use cases such as real time dashboards, real time anomaly detection, dynamic pricing and many more. So you can make your streaming data available to multiple real time analytics applications, to S3 or to AWS Lambda, within 70 milliseconds of the data being collected. That is fast, right? You probably can't get better than that: taking the data which is being ingested and pushing it to S3 or Lambda within 70 milliseconds. It is durable, secure and easy to use. It has the KCL, the Kinesis Client Library, plus connectors and agents, and it integrates easily with Lambda, Kinesis Data Analytics and Kinesis Data Firehose. It is elastic: you can dynamically scale your applications, and the stream can scale from megabytes to terabytes of data per hour and from thousands to millions of PUT records per second. You can dynamically adjust the throughput of your stream at any time based on the volume of your input data. And of course it is low cost: Kinesis Data Streams has no upfront cost, and you only pay for the resources you use, for as little as $0.015 per shard hour. For the latest pricing you can go to the AWS website and get the information in more detail.
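To show what the consuming side can look like when you are not using Lambda or Firehose, here is a minimal polling sketch with boto3; the stream name is a placeholder and I assume a single shard, which matches the small demo stream. In practice you would usually let Firehose, Lambda or the KCL do this work, as we do in the demo.

```python
import json
import time
import boto3

kinesis = boto3.client("kinesis")
STREAM_NAME = "clickstream-demo"  # hypothetical single-shard stream

# Start reading the first shard from its oldest available record.
shard_id = kinesis.list_shards(StreamName=STREAM_NAME)["Shards"][0]["ShardId"]
iterator = kinesis.get_shard_iterator(
    StreamName=STREAM_NAME,
    ShardId=shard_id,
    ShardIteratorType="TRIM_HORIZON",
)["ShardIterator"]

while True:
    resp = kinesis.get_records(ShardIterator=iterator, Limit=100)
    for record in resp["Records"]:
        click = json.loads(record["Data"])   # records arrive in order per shard
        print(record["SequenceNumber"], click)
    iterator = resp["NextShardIterator"]
    time.sleep(1)                            # simple poll loop for the sketch
```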
Now, moving on to the Firehose. Kinesis Data Firehose is again a fully managed service, and it is the easiest way to reliably load streaming data into data lakes. We'll be using this as part of the demo to load the data into S3. It captures, transforms and loads streaming data into S3, Redshift, Elasticsearch and Splunk, enabling near real time analytics with the existing business intelligence tools you are already using today. It is a fully managed service that automatically scales to match the throughput of your data and requires no ongoing administration. It can also batch, compress, transform and encrypt data before loading it, minimizing the amount of storage used at the destination and increasing security. You can easily create a Firehose delivery stream from the AWS console and configure it with a few clicks, and we'll cover how easy that is, because we'll literally be finishing the demo end to end, for an ecommerce website clickstream data collection, in 15 to 20 minutes. Within that time we will be setting up Kinesis Data Streams, Kinesis Data Firehose, Lambda, S3 and, at the end, Amazon QuickSight. All of it can be done within the demo time of 20 minutes, so in real life you could spend a few hours to set it up and get the data for the first time. With Kinesis Data Firehose, you only pay for the amount of data you transmit through the service, plus charges applicable for data format conversion. There is no minimum fee or setup fee as such, like many other AWS services.
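We'll do the delivery stream setup in the console during the demo, but for reference, here is a rough sketch of what a comparable setup could look like with boto3; every name and ARN below is a placeholder, and the exact configuration would need adjusting for your own account, buckets and IAM roles.

```python
import boto3

firehose = boto3.client("firehose")

# All names and ARNs below are placeholders for illustration only.
firehose.create_delivery_stream(
    DeliveryStreamName="conf42-website-demo-firehose",
    DeliveryStreamType="KinesisStreamAsSource",
    KinesisStreamSourceConfiguration={
        "KinesisStreamARN": "arn:aws:kinesis:us-east-1:111122223333:stream/clickstream-demo",
        "RoleARN": "arn:aws:iam::111122223333:role/firehose-read-kinesis",
    },
    ExtendedS3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::111122223333:role/firehose-write-s3",
        "BucketARN": "arn:aws:s3:::conf42-website-demo-sept-2021",
        # Firehose appends a YYYY/MM/dd/HH/ time prefix after this by default,
        # which gives the year/month/day/hour layout used in the demo.
        "Prefix": "website/",
        "BufferingHints": {"IntervalInSeconds": 60, "SizeInMBs": 5},
        "ProcessingConfiguration": {
            "Enabled": True,
            "Processors": [{
                "Type": "Lambda",
                "Parameters": [{
                    "ParameterName": "LambdaArn",
                    "ParameterValue": "arn:aws:lambda:us-east-1:111122223333:function:clickstream-transform",
                }],
            }],
        },
    },
)
```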
So what we'll do now is look at how the five steps of the data stream architecture we have seen align with, or fall within, the AWS products we have been talking about. This is an example: from the left, there are data sources producing millions of records from different devices and different applications, and that data is getting streamed. The stream is stored in KDS, Kinesis Data Streams, and then the stream is processed using Kinesis Data Analytics or Kinesis Data Firehose, or you can use Kinesis Video Streams, which is not shown on the screen here. And at the end you store the data for longer term analytics. So that's pretty much what I wanted to discuss to take you through the AWS streaming solution before getting into the use case and the demo.
Now let's start the demo. But before I actually go to the browser console and take you through the entire demo, I just want to spend a few minutes explaining the use case and the architecture behind it. It's a very simple use case: I have a demo website which lists a set of products, and it has two actions, buy and view details. The objective of this demo is to capture the user actions, like how many people are clicking on buy and which product they're buying. As simple as that. Now quickly, from the left: the users log in or sign up, then they browse the products, then they view the product details, and then they decide whether or not to buy. That's it. Now let's move into the actual architecture.
So the simple website is hosted in an S3 bucket, and the website is accessed through CloudFront. In front of CloudFront you have the web application firewall, popularly known as WAF. Now, every time a user clicks on something, an HTTP request goes to CloudFront, and that click information is streamed through Kinesis Data Streams. From Kinesis Data Streams it is consumed by Kinesis Data Firehose, and for every record consumed by Firehose you can trigger a Lambda to do additional processing of the data. Once it is processed by Lambda, it goes to S3. Of course, Kinesis Data Firehose can send data to many destinations, including S3, Redshift, Elasticsearch and others. So in this case we are pushing all that clickstream information from CloudFront, through the Kinesis data stream and Firehose, to S3. And once it lands in S3, we use the serverless ETL platform of AWS Glue and its crawler to create a data model for the whole analytics platform.
Then we view the data through the QuickSight and Athena integration. Again, everything is serverless, just a few clicks away from the actual data showing up in QuickSight. The entire flow could take, depending on how you have automated the setup, a few minutes, definitely less than five. Here in Kinesis Data Firehose there is a minimum buffer interval of one minute, so you have to allow a buffer of one minute. That means when the data comes from CloudFront, through the Kinesis data stream, to Firehose, there is a minimum delay of one minute, because Firehose accumulates data for a minute and then pushes it to the next stage. So by the time it arrives in S3, maybe a couple of minutes have passed; then you trigger the crawler, the workflow goes through and loads the data, and you can do data integration with your other systems, which might take a few more minutes; and then eventually it will appear in QuickSight. So let's say within five minutes you will get the data, from the time somebody clicks on a product to the time you see it in QuickSight. The idea in this case is to see, maybe at the end of the day or maybe every hour, once you have launched a product, which product is more popular, which one people are buying more or viewing more, that kind of information. So you don't need to build any other analytics platform; you just use the clickstream data and view it in QuickSight, which can be used by the business users.
Now with that, I will move to the browser to show you the entire setup. We'll start with Kinesis Data Streams. I've already pre created the stream, but I'll show you how to create it, and then we'll move to CloudFront and see how CloudFront is bound to the Kinesis data stream.
So let me switch to the browser now. All right, I have Kinesis Data Streams here, and I've already created a data stream called conf42 website demo. I could create a new one, but it takes a few minutes to deploy. It's very simple and straightforward: it basically asks you for the capacity of your Kinesis data stream, and the capacity is calculated based on the average size of the records and the number of records coming in per second. So let's say you have ten records per second coming in and only one consumer reading the stream; it will then calculate how many shards you need. For more information about shards and everything else, you can go to the AWS documentation when you have time. So I've created that already.
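As a back of the envelope version of that capacity calculation, here is a small Python sketch; it assumes the published per shard write limits of roughly 1 MB per second or 1,000 records per second, and the record sizes used below are just illustrative numbers.

```python
import math


def estimate_shards(records_per_second: int, avg_record_kb: float) -> int:
    """Rough shard estimate for a Kinesis data stream.
    Each shard ingests up to 1 MB/s or 1,000 records/s, whichever is hit first."""
    by_throughput = (records_per_second * avg_record_kb) / 1024.0  # MB/s needed
    by_record_count = records_per_second / 1000.0
    return max(1, math.ceil(max(by_throughput, by_record_count)))


# The example from the talk: ten records per second.
# With, say, 2 KB average records, one shard is plenty.
print(estimate_shards(records_per_second=10, avg_record_kb=2))    # -> 1
print(estimate_shards(records_per_second=5000, avg_record_kb=3))  # -> 15
```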
Now let's move on to CloudFront. I have created a CloudFront distribution which points to my S3 origin. The behaviors all point to the same S3 origin, which is basically a single page website like this, as simple as that: I have buy buttons and I have view details buttons. That's pretty much it. Now let's look at the CloudFront configuration I created and work with that. What I've done is I have gone to the logs; under logs you have real time configurations. A real time configuration basically takes the clickstream, whatever requests go through CloudFront, and captures it. Now let me create one, although I'll not save it because I already created one, just to show you what information you need to provide when you create the configuration.
So you give it a name and a sampling rate, which is nothing but the percentage of the clickstream data you want to capture. Is it 50%? Is it 100%? Let's say I put 100%. Then, which fields of the request do you want to capture? You can capture any of these fields which come as part of the HTTP request, but the most important bit for us is to capture the URI query parameters. Apart from that, you can capture other information like the country, the port, the IP addresses, et cetera. And then, what is the endpoint where the data will get pushed? Remember I created the KDS, the Kinesis data stream conf42 website demo, which is nothing but this one here in the other tab; I'm connecting to that Kinesis data stream. Now, once it is connected, back in the CloudFront real time configuration, as I showed you, you can pick up whatever information you want from the requests coming in. We have decided to go with 100% sampling, which means every request coming in is considered, and of course it is getting delivered to the Kinesis data stream which we already created earlier. Now let's go back to the architecture and see how much we have covered. I'll just pull up the architecture slide. Here you go.
So here we already have the S3 bucket with the website, we have configured CloudFront, and we have configured the data stream. So now every request is going to CloudFront, and the click stream is also getting delivered to Kinesis Data Streams. What we'll do next is set up Firehose so that it can consume those clickstream records from Kinesis Data Streams, process them with Lambda, and push them to the S3 bucket. Let's go back to the browser.
So we have Kinesis Data Firehose, and I've already created a delivery stream; let's see what information it has. As always, this is the source: as we have seen in the architecture diagram, we are getting the data from the Kinesis data stream. Then we are consuming that and transforming the records using a Lambda function. The Lambda function is nothing but taking the input, all those attributes coming from the request, massaging it, and picking out the product id, product name, status, et cetera. Very simple and straightforward, but you can of course have a much more complicated Lambda function which raises different alerts and so on if you want. Then I'm putting the output into the destination bucket, which is conf42 website demo September 2021, and I'm creating a prefix for the bucket so that I know in which year, month, day and hour the request came through, for my further analysis. I have some other configuration which you can ignore for the time being, like encryption, et cetera. I also have a bucket set up for errors; in case a processing error happens, those records will go into that bucket.
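The actual Lambda isn't shown in the talk, but a minimal transformation function for this kind of setup might look roughly like the sketch below. It assumes the CloudFront real time log fields arrive as a tab separated line with the URI query string among the configured fields; the field order and the query parameter names (productId, productName, status) are assumptions based on what the demo captures. It follows the Firehose transformation contract of returning each recordId with a result of Ok and base64 encoded data.

```python
import base64
import json
from urllib.parse import parse_qs


def lambda_handler(event, context):
    """Kinesis Data Firehose transformation Lambda (sketch).
    Each incoming record is a base64-encoded CloudFront real-time log line."""
    output = []
    for record in event["records"]:
        line = base64.b64decode(record["data"]).decode("utf-8").strip()

        # Assumed field order from the real-time log configuration:
        # timestamp, c-ip, cs-uri-stem, cs-uri-query (adjust to your config).
        fields = line.split("\t")
        query = parse_qs(fields[3]) if len(fields) > 3 else {}

        flattened = {
            "timestamp": fields[0] if fields else None,
            "client_ip": fields[1] if len(fields) > 1 else None,
            "product_id": (query.get("productId") or [None])[0],
            "product_name": (query.get("productName") or [None])[0],
            "status": (query.get("status") or [None])[0],
        }

        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            # Newline-delimited JSON so Glue and Athena can read the objects in S3.
            "data": base64.b64encode(
                (json.dumps(flattened) + "\n").encode("utf-8")
            ).decode("utf-8"),
        })
    return {"records": output}
```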
Now let's go to S3. Since I've already sent some requests, my S3 bucket is already populated with some records. If you see here, under the website prefix I have year 2021, then the month, then day ten and day eleven of September. At 01:00 UTC I had requests come in, and at 02:00 UTC more records came in.
So now the data is all there in S3. Once it goes to S3, if you remember, we have Glue and the crawler. I have a workflow already created here, and what it does is go through S3 and create a data catalog, a database catalog. So I have a database called conf42 demo, and I have the table it has created, called website, because the prefix of this S3 bucket is website; it has picked that up. Back in the console, if you look at this table, it has picked up all those attributes which are part of the request, and it has done partitioning based on year, month, day and hour, so that your analytics can run faster and process based on those partitions. Now I have the data here and the tables created.
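To show how those partitions get used, here is a sketch of an Athena query over the crawled table, run through boto3. The database and table names follow the demo (conf42 demo database, website table), but the exact column names, partition values and the results bucket are assumptions.

```python
import boto3

athena = boto3.client("athena")

# Count buy clicks per product for one partitioned day (names are illustrative).
QUERY = """
SELECT product_name, COUNT(*) AS buy_clicks
FROM website
WHERE year = '2021' AND month = '09' AND day = '11'
  AND status = 'buy'
GROUP BY product_name
ORDER BY buy_clicks DESC
"""

response = athena.start_query_execution(
    QueryString=QUERY,
    QueryExecutionContext={"Database": "conf42_demo"},
    ResultConfiguration={"OutputLocation": "s3://conf42-athena-results/"},
)
print(response["QueryExecutionId"])
```

Because the WHERE clause filters on the partition columns, Athena only scans the objects under that day's prefix, which is exactly why the crawler's year/month/day/hour partitioning keeps the analytics fast and cheap.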
Then I just need to trigger this workflow. What it does is crawl through the latest data which has come into the S3 bucket and put it into the table. So as visitors visit the website, the data keeps coming in, and you can schedule your workflow to run as frequently as you want. Every time the workflow runs, your S3 data lake gets populated with the latest data, and we can then pull that same data from QuickSight. But before I go to QuickSight, I just want to show you what the workflow looks like. So this is a typical workflow: it is crawling and then populating the database, the tables, over the data in S3.
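Triggering and scheduling is all point and click in the console, but for completeness, here is a sketch of kicking off the same kind of Glue workflow programmatically with boto3; the workflow name is a placeholder.

```python
import boto3

glue = boto3.client("glue")

# Placeholder workflow name for the demo setup.
WORKFLOW_NAME = "conf42-website-demo-workflow"

# Start a run, equivalent to clicking "Run" in the Glue console.
run = glue.start_workflow_run(Name=WORKFLOW_NAME)
print("Started run:", run["RunId"])

# Check how the run is progressing.
status = glue.get_workflow_run(Name=WORKFLOW_NAME, RunId=run["RunId"])
print(status["Run"]["Status"])   # e.g. RUNNING or COMPLETED
```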
I can see that multiple runs have happened for this workflow, and if I open one of them, it tells me that every step went through successfully and when I last ran it. All right, now let's go to QuickSight. In QuickSight you can create dashboards from different sources, as you can see on the screen, out of the box. We could have gone directly from S3, but in that case we couldn't have done data integration across different sources, whereas with Athena you can get data from different sources, create a workflow, and using that workflow integrate them and create a proper data set for your analytics tools to pick up.
So I'm not going to create a new data set, because I already created one, and I'll show you the data set I created. This is it; it already has 34 rows in it, and it was last refreshed about twelve minutes ago. So let's see whether doing a refresh again increases the row count. I am not sure whether I made any more requests in between, but while that is happening, what I want to do is show you how the data actually comes into QuickSight, right?
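The refresh I'm clicking here can also be done through the API; the sketch below, assuming a SPICE dataset and placeholder account and dataset IDs, kicks off the same kind of refresh with boto3.

```python
import time
import uuid
import boto3

quicksight = boto3.client("quicksight")

# Placeholders: your AWS account id and the QuickSight dataset id.
ACCOUNT_ID = "111122223333"
DATASET_ID = "conf42-clickstream-dataset"

# Start a SPICE refresh (an "ingestion") for the dataset.
ingestion_id = str(uuid.uuid4())
quicksight.create_ingestion(
    AwsAccountId=ACCOUNT_ID,
    DataSetId=DATASET_ID,
    IngestionId=ingestion_id,
)

# Poll until the refresh finishes.
while True:
    desc = quicksight.describe_ingestion(
        AwsAccountId=ACCOUNT_ID, DataSetId=DATASET_ID, IngestionId=ingestion_id
    )["Ingestion"]
    if desc["IngestionStatus"] in ("COMPLETED", "FAILED", "CANCELLED"):
        print(desc["IngestionStatus"], desc.get("RowInfo"))
        break
    time.sleep(5)
```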
So this is the data which you can see in QuickSight, by day and hour. The dark blue is the data from the 10th, 10 September, which came in at 09:00, 10:00, twelve and 01:00, and on the 11th it is at 01:00 and 02:00 in the afternoon. And then you can dissect the data however you want: if you want the product name, you can get the product name and see which product was clicked more. You can go by hour, as you have seen, or you can go by month, but in this case there is only one month of data available, on the 10th and 11th, as you can see. So that's pretty much it. Let me go back to the presentation for a moment.
So in summary, I think we have covered this pretty much end to end. If you take a look from the left, the user accesses the website, which is hosted in S3, through CloudFront. For every request which goes through CloudFront, the request log stream goes to Kinesis Data Streams, and the Kinesis data stream is connected to Kinesis Data Firehose as a consumer. When the records come into Firehose, the Lambda gets kicked in for every record, and you can do anything you want with that record using the Lambda. You can use that Lambda to communicate with further downstream systems if you have any specific scenario, or you can send a notification through SNS, or trigger an email, anything you want as such.
So let's say you have a very high value product which people are clicking on quite a lot but not buying. You can have a scenario where you count those clicks and keep the count in a DynamoDB table, and when the count reaches a certain number, the Lambda sends a notification raising a concern. So you can handle all those different scenarios.
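That scenario could be handled with a few lines inside the same Lambda; here is a hedged sketch, with the table name, SNS topic ARN and threshold all hypothetical, that increments a per product view counter in DynamoDB and publishes an alert once it crosses the threshold.

```python
import boto3

dynamodb = boto3.client("dynamodb")
sns = boto3.client("sns")

# Hypothetical names and threshold.
TABLE_NAME = "product-click-counts"
TOPIC_ARN = "arn:aws:sns:us-east-1:111122223333:high-interest-low-sales"
THRESHOLD = 100


def record_view_without_buy(product_id: str) -> None:
    """Atomically bump the view counter for a product and alert when it
    crosses the threshold without a corresponding purchase."""
    resp = dynamodb.update_item(
        TableName=TABLE_NAME,
        Key={"product_id": {"S": product_id}},
        UpdateExpression="ADD views :one",
        ExpressionAttributeValues={":one": {"N": "1"}},
        ReturnValues="UPDATED_NEW",
    )
    views = int(resp["Attributes"]["views"]["N"])
    if views == THRESHOLD:   # fire exactly once when the threshold is hit
        sns.publish(
            TopicArn=TOPIC_ARN,
            Subject="High interest, low conversion",
            Message=f"Product {product_id} has {views} views and no purchases.",
        )
```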
And then after that the data goes to S3, and again we used a prefix, website plus year, month, day and hour; you can define whatever prefix you want. That data is then picked up by the Glue job, and the crawler picks it up from S3 based on the frequency which has been set up. Then, through the Athena and QuickSight integration, the data is available in QuickSight. So if you look at the entire end to end architecture, there was no coding involved; only the Lambda was used, and only to reshape the clickstream for the clarity of the data that goes through S3 to QuickSight. Apart from that, there is nothing else. Effectively, you know the user behavior, how they are clicking on the different products, in QuickSight within minutes.
That's pretty much it. Thank you for joining in; good to have you guys. If you want to learn more about the AWS analytics platform, QuickSight and the rest, those are all available: you can go to the AWS website and, based on which area you want to focus on, get additional information. Thank you once again. Have a wonderful day.