Data analytics in browser with AlaSQL.JS
Video size:
Abstract
Today’s PCs are more powerful than the chips used in the Apollo mission. Participants will learn to implement a robust frontend data analytics framework using AlaSQL.JS. Data can be imported to AlaSQL very fast, then custom queries can be written to do different complex analytical functions without hitting the server. Participants will also learn to build a custom reporting framework that sits in the browser and is customized for different users’ needs, which is highly desirable for multi-tenant architecture.
In the end, participants will get to know about a few more interesting use cases and open source projects to contribute to.
Summary
-
Today we are going to discuss technology and approach to solving data analysis problem on any devices that run JavaScript. AlaSQL is a lightweight, easy to use client side in memory SQL database designed to work in browsers as well as in node JS. Activating data at age makes it possible to ask patent questions and get more timely answers.
-
AlaSQL also provides a way to use the browser's local storage and DOM storage as a data storage. This is silver bullet for all the edge operations. There are certain limitations. The number of files you increase can make the application suffer.
Transcript
This transcript was autogenerated. To make changes, submit a PR.
Jamaica real
time feedback into the behavior of your distributed systems
and observing changes exceptions errors
in real time allows you to not only experiment with confidence
but respond instantly to get things working again.
Close hi,
I'm Michael Korra, CEO and co founder of Globe.
Today me and my co speaker Bhavijit are going to talk about
how we can leverage the processing power of client devices
to do data processing and data analysis.
One such framework is AlaSQL that we are going to
go deep dive into. So by the time this
talk will be over, 12 billion megabyte of data will
be generated across the world and a massive amount of this
data will be generated outside the traditional data center. So cloud
will extend to the cases and it will not be cloud
versus edge, it will be cloud with edge. So that's the future
that we're going towards. Activating data at
age makes it possible to ask patent questions and get more timely
answers. Age can be your mobile, your laptop,
smartwatch, gaming consoles, home appliances
can be anything. Today we
are going to discuss technology and approach to solving data analysis problem on
any devices that run JavaScript. And this talk
is not going to explain the differences or comparison
between different SQL libraries
available for the browsers. It only focuses on
ALS. Now some background
when someone shares an analysis implementation of JavaScript,
you're not just seeing a static snapshot of the work, you are basically running
it real time on browser live. So you
can go beyond passive reading by querying data, tweaking assumptions
and asking more questions, searching and everything.
La SQL is a lightweight, easy to use client side
in memory SQL database designed to work in browsers
as well as in node JS. It is open
source. It has strong focus on query speed,
data source flexibility for the relational database.
Like you can import cases, Excel,
SQLite databases and everything over to here.
Moreover, it can handle schemaless data as well as graph data.
It can handle both traditional relationship tables
as well as nested JSOns which is nosql. You can
export store import data from local storage
index, DB, Excel and many more.
So AlaSQL was initially created in 2014
with the need for running NoSQL capabilities
on browsers because at that point in time there are very few
engines which are capable of running NoSQL features
and then there are a lot of other significant releases.
So here are the details for you.
JavaScript is one of the most flexible language
or frameworks available for communication.
It is flexible as well as it supports
experimentations. Just think of you can just open the developer tools
and do experimentations directly on a wave
application. It also supports collaboration real time.
So think of you are using analysis with
your colleagues remote over the Internet
on a fixed set of data which is running in individual browsers.
Now there are multiple databases available for
client side processing. AlaSQL sits into the requirement
of use cases where we don't require a persistent storage,
we just need fast in memory data processing for
joining, filtration, grouping of data, searching and everything.
Moreover, it is also suitable for a use case where there is a large volume
of data on the server. You query to do the initial filtration,
take the data back to the browser and then run processing,
pre processing and other sort of queries on top of
a few specific features of AlASQL are fundamentally important
for this topic where we are discussing about processing
data on age on browsers. So it is
widely supported, it is extensible,
it has the ability to execute SQL against any data set,
against JSON arrays and all.
So it is fast, it is in memory. SQL data processing
is very much important for bi use cases, for ERP
applications, fat clients and many more. ETL is
also one of the very important use case. It also
supported in most of the major browsers,
mobile applications, node cases and all.
It supports inbuilt compilation,
query optimization like any other SQL engines.
It supports indexing,
pre indexing as well. Complex join operations can be
run very efficient. Now I'm
going to discuss about few fundamentals of AlaSQL,
so consider a use case where you need to augment
the capabilities of SQL statements with actual JavaScript
function, custom or inbuilt. So you can also do that. I'll show
you some examples after the stop after this slide.
It has flexibility to import data from various sources,
be it Excel cases, JSON and many more.
If you're working in a use case where you need to extend the in memory
capabilities for persistent storage so that you can also do them.
Export the data to any of your required format,
then few of the performance threads so
the compiled statements and functions are created when
you actually write the SQL statements where
expressions are pre filtered for the joins,
joint tables are pre indexed.
ArieSQL uses hash tables for its indexes
and upon index creation all entries in the table are hashed
and stored in the database object. Sorry, in the JavaScript object.
Now let me talk about the engine. So when
a statement in SQL is parsed and
processed, so it is segregated into an abstract
syntax tree which is kind of a nested JSON.
So I have given you one example. So where this is
the SQL statement and after the
parsing and formation of the abstract
syntax. This is something that is actually processed
in the engine. Now I will hand it
over to my co speaker Akajit for a technical deep
dive. Before going into the details, I would like
to share a small application which I have created which can shed some light
into the potential of this library. What I have created
is an application which can pick a local file
and then render the records and then we can
apply any operations you like. For the example purpose,
I have created two simple examples. Search duplicate,
count and average. Search duplicate is going to give you
all the duplicate records and count
and average as it states can provide you all the count,
average and sum. These can
be done by basic JS logics, but what about
the complex join or joining two files at a time?
Those will not be as simple as such to me.
LSp provides variety ways of reading files
importing from csv text files Excel
files this feature can be useful in cases where we
want to create a software where we just pass an excel
or csv file and do some operations over it.
Now I can now
what if a requirement comes which can't be solved by quake?
AlaSQL provides its user defined functions which we
can use to create our own functions and then use it in the quake.
AlaSQL also provides a way to use the browser's local storage and
DOM storage as a data storage. Let's say
for a scenario where you have to send a large data
set to the server for processing. What you can do is
you can do the processing in the client side and then send the data to
the. AlaSQL also provides a way to
use the browser's local storage and DOM storage as a
data storage. Let's say for a scenario you want to send
a huge amount of data set to the server for processing. What you can
do is you can process the data in the client side and then send the
process data to the server.
This can be customized in the other way around.
Also, let's say you have a huge amount of data set present in the server
but it needs preprocessing every time client
interacts with it. What you can do is you can send the whole
data to the client side and then do the processing and render the
data in the client side rather than requesting it every
time any kind of changes is made by the client.
This is silver bullet for all the edge operations.
There are certain limitations. Let's say you are bringing in a
huge amount of data set from the server side to the client side the
amount of time it will take to take the data from the server
to the client side. That time will make the application
suffer. Then there are join operations.
Let's say you are uploading multiple files to apply join operation
across three to four files. Now keep
in mind the number of files you increase can make the application suffer
because it's going to apply the operations over multiple records which
the data is been added in library. Also, as they have
mentioned in their talks, joining a subselect does not
work. Thank you.