Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello everyone, I am Kalyan Prasad and I'm going to talk about
financial network analysis using Python.
Thank you so much for joining my talk and
I am really so excited to be here today.
Some cheap marketing. I am a self-taught data scientist and
analytics manager. And yeah, of course, I'm a community person.
I love being involved with different communities, and I try to help those
communities as much as I can. Currently I'm associated
with the following organizations: PyCon India, PyCon Hyderabad,
HydPy, and Humans for AI, where I perform different
roles and responsibilities. In all of these organizations I
always love to give back to the community, so I always look for opportunities
to share my knowledge, and I also do
mentoring in hackathons and in other
community activities. So these are my
social platforms. Feel free to follow
or connect with me, and in case you have any feedback
or suggestions or anything for me, feel free to
write to me. I'll be responding to each and every message.
That's pretty much about me. So here is the outline
for today's talk. We'll start by understanding the history of graphs.
Then we'll talk about what networks are and how we can construct
a network structure. Then we'll see the evolution of financial
networks. Then we'll try to understand the power and importance
of Python graphs. And then we'll straightaway
see some action on financial network analysis with
two different case studies. So without any further delay,
let's get started.
History of graphs. As we all
know, data visualization is a powerful way to simplify
and interpret the underlying patterns in the data.
The use of graphs is one such visualization technique,
and it is incredibly useful and helps businesses
make better data-driven decisions. Now,
what exactly are graphs? In order to
understand the concept of graphs, we first need to understand the concept called
graph theory. So here I'll be quickly talking about
the origin of graph theory to get a better understanding
of graphs. Graphs were first introduced
in the 18th century by the Swiss mathematician Leonhard
Euler. His
attempt at, and ultimate solution to, the famous Königsberg
bridge problem, which you are seeing here, is
generally referred to as the origin of graph theory.
So we'll try to understand what exactly the Königsberg
bridge problem is, how Euler solved that problem, and
how graph theory originated from it.
So first things first. Königsberg
had four main areas and seven bridges.
The question asked here was pretty straightforward: can you
cross each bridge only once and return to the starting point?
So while you are crossing the bridges, you should keep two things
in mind. The first one is that you should not
leave any bridge uncrossed. The second is that
each bridge should not be crossed more than once.
So, Euler's insight for this
problem was that the only relevant data is the main areas
and the bridges connecting them, meaning that Euler
recognized the relevant constraints are the four main areas
and the seven bridges. Then he
drew the first visual representation of a modern graph, which
you can see here. So this graph basically represents
a set of points, which are known as nodes,
connected by a set of lines, which are known as edges.
So this was the problem, and this was the insight he shared.
Later, after experimenting with multiple graphs, altering
the number of nodes and edges, Euler abstracted this
problem and created a very generic rule based
on nodes and relationships that applies to
any connected system, which you can see here.
So you can see the nodes and relationships that can be applied to any connected
system. So from there, graph theory originated, and it has
been in demand for decades. In modern
times, graph algorithms, graph applications, and graph
analytics have been booming and exploding
in multiple industries.
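As a small illustration of the bridge problem described above, here is a minimal sketch that models the four areas and seven bridges as a NetworkX multigraph and checks whether a closed walk using every bridge exactly once exists. The area labels A to D and the variable names are illustrative assumptions, not something shown in the talk.

```python
import networkx as nx

# Four land areas (nodes) and seven bridges (edges). The labels A-D are
# illustrative; a MultiGraph is used because some areas share two bridges.
bridges = [
    ("A", "B"), ("A", "B"),  # two bridges between A and B
    ("A", "C"), ("A", "C"),  # two bridges between A and C
    ("A", "D"),
    ("B", "D"),
    ("C", "D"),
]
konigsberg = nx.MultiGraph(bridges)

# A closed walk that uses every edge exactly once exists only if the graph
# is connected and every node touches an even number of edges.
degrees = dict(konigsberg.degree())
odd_nodes = [node for node, degree in degrees.items() if degree % 2 == 1]

print("Bridges per area:", degrees)
print("Areas with an odd number of bridges:", odd_nodes)
print("Walk crossing each bridge once and returning exists:",
      nx.is_eulerian(konigsberg))
```

Every area touches an odd number of bridges, so the check reports that no such walk exists, which matches the conclusion Euler reached.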
Now, what are networks? Network data
are generated when we consider relationships between two or more
entities in the data, like highways connecting
cities, friendships between people, or their
phone calls. In recent times, a huge amount of
network data is being generated and analyzed in multiple
fields. For example, in sociology,
there is a huge interest in analyzing blog networks,
which can be built based on their citations, to look for divergent
structures between political orientations.
Networks have been extensively studied in
graph theory, an area of mathematics. So networks are known as
graphs in mathematics. In a
nutshell, a network is a system of nodes connected
by linkages. A node can be a
person, a firm, an industry, or even a
geographical area. Correspondingly,
different types of relationships are represented as linkages.
Each node and edge can hold specific properties
which describe its characteristics.
Don't worry if you are not able to catch what exactly a
node is, what exactly an edge is, and how they
can hold specific properties and all that stuff.
I'll try to explain all these things with an interesting example
in the next slide.
As I mentioned earlier, a network consists of
two main items, nodes and edges,
which together form a network or graph. Networks
are also associated with metadata,
meaning that networks can hold some metadata
with them. Now let's try to understand the
network structure with an interesting example.
So the best part of any conference is
the networking, whether it is a physical
conference or a virtual conference. We all
love to do networking. Do you agree with me or not? I'm sure you
will definitely agree with this statement.
In a conference we all love to do networking. We all love to connect with people.
We all love to build relationships and friendships. So, considering
the same conference example, I'll
try to explain this network structure.
So let's say that Kalyan and Mark
are two friends who
connected in January 2022 at
the conference. So the nodes here are Kalyan
and Mark, and they also have
metadata associated with them, which is stored
as key-value pairs, similar to Python
dictionaries, where we have key-value pairs.
So the keys here are age and location,
and the values are the number and the country. And
the conference friendship is represented as a line between the nodes.
It also has metadata associated with it,
which is the date, meaning the date
when we actually first connected.
So this is
how our friendship has been built through the conference
network. That is why I named it the conference
network. So, coming to the exact point
of a network structure, this is exactly how we define
the network structure for any problem when we are
dealing with network analysis in real time.
I hope you got a
better understanding of the network structure.
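The conference example above can be sketched in NetworkX as two nodes and one edge, each carrying key-value metadata. The ages, countries, and the date below are placeholder values chosen only for illustration; they are not taken from the slide.

```python
import networkx as nx

conference = nx.Graph()

# Nodes with metadata stored as key-value pairs (placeholder values).
conference.add_node("Kalyan", age=30, location="India")
conference.add_node("Mark", age=35, location="USA")

# The friendship is an edge; its metadata is the date of first contact
# (placeholder date).
conference.add_edge("Kalyan", "Mark", date="2022-01-01")

# Node and edge metadata behave like Python dictionaries.
print(conference.nodes["Kalyan"])
print(conference.edges["Kalyan", "Mark"])
```

Looking up a node or an edge returns a plain dictionary, which is why the comparison with Python dictionaries in the talk carries over directly to code.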
So here are some real-time examples of
network analysis. First is social networks
like Facebook, Instagram, and Twitter. In
social networks, we model the relationships between
people. For example, we try to identify
the influencers in social media, and we model the relationships
between those influencers. So that is the sort of analysis
we do in social networks. When it comes to biological
networks, in a human disease network, we say that
two diseases are linked
if they share at least one common gene.
So those are the kinds of studies we do in human disease network analysis.
When it comes to financial networks, we study the
correlation between stocks based on their daily prices
or any other parameters. And there are also many other examples
in different domains. So all of these
complex systems can be understood better if we
see them through the lens of a network. So I
believe that with this, you got a pretty fair
understanding of the connection between graph
theory, networks, and data science.
So next we have indicators. Indicators are
very important in network analysis.
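As a rough sketch of the indicators covered in this part of the talk, the three centrality flavors (degree, closeness, and betweenness) can be computed with NetworkX on a small made-up graph. The graph and node labels are assumptions for illustration only. Note that NetworkX reports closeness as the reciprocal of the average shortest-path distance, so a higher score means a node is closer to the others.

```python
import networkx as nx

# A small illustrative graph: B is the hub, D is a bridge towards E.
G = nx.Graph([("A", "B"), ("B", "C"), ("B", "D"), ("D", "E")])

# Degree centrality: fraction of the other nodes a node is connected to.
degree = nx.degree_centrality(G)

# Closeness centrality: based on shortest-path distances to all other nodes.
closeness = nx.closeness_centrality(G)

# Betweenness centrality: how often a node lies on shortest paths
# between other pairs of nodes.
betweenness = nx.betweenness_centrality(G)

for name, scores in [("degree", degree),
                     ("closeness", closeness),
                     ("betweenness", betweenness)]:
    top_node = max(scores, key=scores.get)
    print(f"{name:12s} top node: {top_node}  scores: {scores}")
```

In this toy graph the same node wins on all three measures, but on real data the flavors often disagree, which is why the definition of importance matters.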
The crucial thing in network analysis is to identify
the important nodes in a network. This is known
as measuring the centrality of a network. So
centrality aims to identify the most important node in
a network. In simple terms, it measures
how central a node is within the graph.
Different nodes could be considered important depending
upon how importance is defined.
Centrality also comes in different flavors, and
each flavor defines the importance
of a node in a different way, which leads to a variety
of ways of measuring centrality. Some of the most commonly
used flavors in real time
are degree centrality, closeness centrality, and betweenness
centrality. I'll quickly talk about
all these flavors. Again, I'm not
going into detail about each of these flavors because it
goes beyond the scope of this talk. Degree
centrality: as the name suggests, a
node with a higher degree has a higher centrality,
meaning that the higher the degree of a node,
the more important it is in the graph. That is why we call it
the most connected node. Closeness centrality:
this centrality calculates the
shortest paths between all nodes and assigns
a score to each node based on the
sum of its shortest paths. So it is the
fastest communicating node. And finally,
betweenness centrality measures the number of times a
node lies on the shortest path between other nodes,
and it represents the degree to which a node stands
between others. So this is the most influential node
in a graph. So that's all about
indicators. And next is the
most awaited and important topic in our talk,
which is the evolution of financial networks. Financial
network analysis has been on the research agenda since the financial crisis of 2008.
So the crisis has played a huge role
in advancing the understanding of financial networks.
After the 2008 crisis, many economists have
come around to the view that the very network architecture of
the financial system plays a central role in shaping
systemic risk. In fact, many of the
ensuing policy actions have been motivated by
these insights. As a result of
those insights, network science concepts have
been cross-applied to the finance field after the 2008
crisis. From there, financial networks got
into full swing, and they have become an active topic
not only in data science, but also in finance.
There are some major areas of interest and applications for the study
of financial networks. For example, interbank networks,
stock correlation networks, agent-based models, and there are also many
other applications of financial networks.
So in our talk, we are dealing with stock correlation
networks. We'll see stock correlation networks in real
time. Several studies and research projects have
been conducted on stock correlation networks. The
research and studies are still ongoing, and they're also trying
to find even better techniques
for studying stock correlation networks. So far,
the stock correlation network has proven its efficiency in
predicting market movements, which is very positive
news for financial networks.
Now, as we all know, financial data is very
complex data. So how do we actually deal with
this complex data, or how do we build
better networks with this financial data?
So this is where the power of Python
graphs comes into the picture. Now,
why Python? Python is a
general-purpose, high-level programming language whose
design philosophy emphasizes code readability,
clear syntax, dynamic typing, a
strong online community, numerous libraries,
and fast prototyping. And it
also has expressive features. So that is why
Python is so powerful. Now, in
order to create powerful graphs, we need to have software.
NetworkX is very good,
highly productive software for doing complex network analysis.
And this software is very flexible:
nodes can be any hashable object in Python. So a node
can be text, an image, or an XML record,
and edges can hold arbitrary data. So maybe it can
be weights or time-series data.
So this software is a treasure trove of graph algorithms,
meaning that we can use many standard graph algorithms, and
we can solve many complex problems with this software.
And it is very easy to use.
So I think I have given enough of a download
on the theoretical part. Let's straightaway
jump into action to see some real-time financial
network analysis. Let me quickly
switch to my code notebook.
Okay. All right, so here is my
code notebook.
So here is the code notebook which I have created
for this demonstration. Considering
the time constraint, I have already executed the entire code.
But don't worry, I'll explain each and every point in the code so that
you'll get a better understanding of the concept.
So I installed a couple of libraries for this demonstration.
I installed NetworkX, and I've also installed Yahoo
Finance to pull some data from Yahoo Finance.
So basically this notebook has been divided into
two sections. In the first section we'll take some sample stocks
and do basic network analysis. In the
second section we'll take some asset prices, and we'll deep
dive into financial network analysis and
build some interesting visuals. And we'll
find some interesting insights from those visuals.
So let's start with section one now.
So as usual, we have imported the necessary libraries here.
I have imported a couple of libraries, and then I'm loading my
data here. So for the first equity,
I created a variable called ticker, and I'm
creating a ticker object and passing the ticker for Tesla.
So the first equity which I've selected here
is Tesla. Once I execute this,
we get a ticker object for Tesla. With this ticker object,
we can access the entire information related to Tesla.
So I've created a variable again, called tesla,
and then I'm using my
ticker object, and I want to access the
institutional holders of Tesla. Once I execute
this, we get the institutional holders,
their shares, and the values of those holdings.
So all these companies are the institutional holders of
Tesla, meaning that all these companies
have some part of the ownership in Tesla
stock. Next, what I'm doing is
I'm adding a new column to my data frame which represents
the ticker symbol of the company.
So the reason for adding this company column is
easy mapping when we build the network graph.
What I mean exactly here is, for
example, BlackRock is holding this
many shares, they have this much value, and
this holder is mapped to Tesla. So for
that kind of understanding, I'm creating a company column
here. So this is our clean data frame for
the Tesla stock. Next I'm taking another equity,
another stock. This time I'm selecting Google
stock. So again I have created a ticker object
for Google. And then again I'm requesting
the institutional holders of Google. And again I'm
also adding the company column to this Google data frame
so that we get a clean, complete
data frame of the institutional holders of Google.
Cool. So next, what I'm doing is I'm combining
these two data frames. I'm combining both the Tesla and Google
data frames with the pandas concat
function. So I created a variable called combined, and I'm
calling the pandas concat function and passing these two data frames.
Once I execute this, we get an entire
data frame of both Google and Tesla and their respective
institutional holders. So far, so
good. So we'll start with the basics
of network analysis here.
So I've created a variable called p, and
then I'm calling a function here called nx.from_pandas_edgelist.
NetworkX has a pretty
handy function for when we are dealing with a data frame,
which is called nx.from_pandas_edgelist, where
I'm passing my data frame and I'm also giving my source
and my target. The source here is the institutional
holders and the target here is the company. So we
want to map each company
against its institutional holders.
Once I execute this, we get a network graph
object here, and you can also see the nodes
in our graph.
The nodes here are Tesla and its respective institutional
holders.
And if you look at the edges, you can see
that there is an edge between Vanguard
and Tesla, and Vanguard is also connected to Google.
So this is the edge list of our graph.
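The steps above can be sketched as follows. To keep the example runnable offline, two small hand-written data frames stand in for the institutional holders tables that the talk pulls from Yahoo Finance; the column names "Holder" and "Company", the ticker symbols, and the share counts are assumptions made for illustration.

```python
import networkx as nx
import pandas as pd

# Stand-ins for the institutional holders tables (placeholder numbers).
tesla = pd.DataFrame({"Holder": ["Vanguard Group", "Blackrock"],
                      "Shares": [100, 80]})
tesla["Company"] = "TSLA"   # assumed ticker symbol column

google = pd.DataFrame({"Holder": ["Vanguard Group", "State Street"],
                       "Shares": [90, 60]})
google["Company"] = "GOOG"  # assumed ticker symbol column

# Stack both tables into one data frame.
combined = pd.concat([tesla, google], ignore_index=True)

# Each row becomes one edge: institutional holder -> company.
p = nx.from_pandas_edgelist(combined, source="Holder", target="Company")

print(list(p.nodes()))
print(list(p.edges()))
```

A holder that appears in both tables, like Vanguard here, becomes a single node with two edges, which is what makes shared ownership visible in the graph.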
So finally we'll plot our network graph. NetworkX
has a function for plotting, which is
nx.draw. There we need to pass the graph which we
have created, which is called p. And I'm also passing with_labels equal to
True, which means that I want labels to be shown
on my graph. So once I plot this,
you can see a network
graph here. So we got our nodes
and edges. But for
me it is very easy to understand what the nodes and edges are
in this graph. If I show it to you at this
point of time, or to any
person who is looking at this
graph for the first time, they won't understand what the nodes and edges are
in this graph. So for that, what I'm
doing is I'll make this plot much clearer by adding
colors here.
So for that, I've created an empty list
called colors, and I'm quickly doing a
loop here which checks whether a node is in my
combined data frame's company values. If so, I
want that to be red, and it should be appended to my
colors list. Otherwise it should be shown as green.
So again we are plotting this
final plot with the draw function, where I'm passing my
graph and the labels. This time I'm also passing the
node colors, because we have created a list here, colors. So I'm passing
node_color equal to colors. Once I execute this code,
you can see that the company nodes are in
red. So the red nodes here are Google and Tesla, and the
green nodes are all the institutional holders. So you
can clearly see Tesla and its respective institutional holders, and
Google and its respective institutional holders.
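A minimal sketch of the coloring step described above: company nodes become red and holder nodes green. The small data frame, its column names, and the helper function plot_holders are assumptions for illustration; the plotting call is wrapped in a function so the color logic can be checked without opening a figure.

```python
import networkx as nx
import pandas as pd

# Placeholder holder/company pairs standing in for the combined data frame.
combined = pd.DataFrame({
    "Holder": ["Vanguard Group", "Blackrock", "Vanguard Group"],
    "Company": ["TSLA", "TSLA", "GOOG"],
})
p = nx.from_pandas_edgelist(combined, source="Holder", target="Company")

# One color per node, in the same order as p.nodes().
company_names = set(combined["Company"])
colors = []
for node in p.nodes():
    if node in company_names:
        colors.append("red")    # company node
    else:
        colors.append("green")  # institutional holder node


def plot_holders(graph, node_colors):
    """Draw the graph with labels and per-node colors (hypothetical helper)."""
    import matplotlib.pyplot as plt

    nx.draw(graph, with_labels=True, node_color=node_colors)
    plt.show()
```

Calling plot_holders(p, colors) would then render the colored graph. The color list has to follow the order of p.nodes(), because the drawing function matches colors to nodes by position.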
So with this, we can also further expand
our analysis by identifying who
the majority institutional holders in Tesla or
Google are. We can also do some
correlation and comparison between
the institutional holders. And we can also identify the top five
or top ten institutional holders in Tesla or
Google. For example, if you see here,
Tesla is held by a management group LLP,
but that management group LLP
may not be in the top five or top ten. So you can identify those
kinds of insights if you further expand your analysis.
But for now, we'll keep things simple. My
main objective for this section is to show you
how we can do basic network analysis with
financial stocks.
But what I can suggest is that you definitely give
it a shot and try
to find some interesting insights from
these graphs. So with this, I'll conclude my
first section. And now
we'll jump into our second section.
In this section, we're taking
ETF prices, which are nothing but asset prices over a
period of time, and we'll dive into financial network
analysis, and we'll come up with
very meaningful visuals and insights from those network
graphs. As usual, again,
I've imported the necessary libraries here for this
case. The objective for this data set
is to identify the correlation between asset classes.
In order to achieve that,
we need to analyze and visualize the relationships between our
asset classes, which we'll be doing now, and you'll
see it in a while. So again, I am loading
the data here. I created a
variable called etf, where I'm reading my
asset price data, and once I execute this,
I get my ETF prices
here. ETF is nothing but an exchange-traded fund.
So it is a kind of security. An ETF
may include different types of investment
securities: it includes stocks, it includes
bonds, it includes commodities,
currencies, or some other types
of investments as well.
All those kinds of securities and assets
are included in the exchange-traded fund data.
If you see here, we have 40 columns and 1013
rows, meaning that we have 40 different asset classes
and we have 1013 rows for each asset
class. Okay, cool. So next
we are converting the date to datetime objects. Often when
we are dealing with time series data,
we first need to check that our
date is in the right data type, and we need to
set the date as
the index. So this is exactly what I'm doing here.
I'm taking the etf date column and
passing it to a function called pd.to_datetime,
where I'm converting the
date column from an object, which is in a string format,
to a datetime object, and
then I'm setting the date column as the index.
Once I execute this, you can see here that now
the date is the index and we can see all our asset classes.
Cool. And if you want to understand
the start and end period of our data,
you can see that our data starts from
11th January 2013 and it ends
on December 10,
2017.
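The date handling described above can be sketched like this. The column name "Date", the asset labels, the dates, and the prices are placeholders, since the talk's data file is not shown.

```python
import pandas as pd

# Placeholder price table with the date stored as strings.
etf = pd.DataFrame({
    "Date": ["2020-01-02", "2020-01-03", "2020-01-06"],
    "ASSET_A": [100.0, 102.0, 101.0],
    "ASSET_B": [50.0, 49.0, 49.5],
})

# Convert the string column to datetime objects, then use it as the index.
etf["Date"] = pd.to_datetime(etf["Date"])
etf = etf.set_index("Date")

# The first and last dates give the period covered by the data.
print(etf.index.min(), etf.index.max())
```

With a datetime index in place, only the asset columns remain as data columns, which is convenient for the return and correlation calculations that follow.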
We have asset prices for
this period. Cool. So next,
what we are doing is converting the prices into daily log returns.
So what is a log return? What do you mean by log return? It
is a way of calculating the rate of return on an investment.
Before we actually proceed to calculating the
correlation matrix and comparing correlations between assets,
we first need to convert our asset prices into
daily log returns. The reason for
doing this is that it
allows us to compare the expected returns
between two assets much more easily. So that is the reason why we are converting
our asset prices into daily log returns.
So I'm creating an empty data frame for the log
returns, because we want
to calculate the daily log return on each asset.
So what I'm doing is I'm quickly looping through each column in
my data frame and calculating the daily log return for
that column. And finally
I'm storing all the daily log returns in my
log return data frame. Once I execute
this, you can see that we get
daily log return values for all our asset classes.
Cool. Now we are good to proceed
to calculating the correlation matrix. So I've
created a variable called correlation matrix. And what we are
doing is a pairwise correlation by
using a built-in pandas function called corr.
I am taking my log return data frame and then
calling the corr function on it. Once I execute this,
you can see the correlation values
of our asset classes. Instead of looking
at these raw correlation numbers,
let's try to visualize this correlation matrix
and try to understand the
insights through the visualization.
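The log-return and correlation steps described above can be sketched as follows. The daily log return is the natural log of today's price divided by yesterday's price. The talk loops over the columns one by one; this sketch uses an equivalent vectorized form, and the asset labels, dates, and prices are placeholders.

```python
import numpy as np
import pandas as pd

# Placeholder prices indexed by date.
prices = pd.DataFrame(
    {"ASSET_A": [100.0, 102.0, 101.0, 105.0],
     "ASSET_B": [50.0, 49.0, 49.5, 48.0]},
    index=pd.to_datetime(["2020-01-02", "2020-01-03",
                          "2020-01-06", "2020-01-07"]),
)

# Daily log return: ln(price_t / price_{t-1}). The first row has no
# previous price, so it is dropped.
log_returns = np.log(prices / prices.shift(1)).dropna()

# Pairwise (Pearson) correlation between the assets' daily log returns.
correlation_matrix = log_returns.corr()

print(log_returns)
print(correlation_matrix)
```

The correlation matrix is symmetric with ones on the diagonal, because every asset is perfectly correlated with itself.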
So as we all know, the traditional way
of visualizing a correlation matrix is usually a
heat map. This is exactly how I do it in real time.
When comparing correlation matrices, I simply plot a heat
map and try to analyze which correlations
are positive and which are negative. That's exactly
how I do it. So I did the same thing here.
I've written some HTML styling for
my cluster map. So I've taken a seaborn clustermap,
which visualizes the matrix as a heat map and also
identifies the clusters
of our assets, so that we can see which assets
are similar to each other. We can clearly see which assets
are behaving closely or similarly to each other.
So we'll see that here.
Once I plot this,
we can clearly see that we
got our clustered heat map of the correlation between ETF
price returns.
So first things first.
The heat map is color coded
here, and you can see
the dark blue color which is highlighted here indicates that
there's a strong positive correlation, where the correlation
value equals one. The yellow
color here highlights that the assets are uncorrelated,
where the correlation value equals zero. And the
red color is a negative correlation,
where the correlation value equals minus one.
And if you observe this heat map, we can
see some interesting insights here. For example,
ETF assets like EWI,
EWQ, Em, EWJ are
all highly correlated, and
you can see that they are close to each other.
Just like that, you can also see EWB
and NLU; these are also strongly correlated
assets. And if you see
an ETF like Pxx, which
is another ETF asset, you can
see that it is negatively correlated to equities.
And if you
also look at the FXY
currency here, FXC which is Japanese,
you can see the Japanese currency is moving
in the opposite direction. And you
can also see that all these are riskier
asset classes.
The heat map here is conveying one-dimensional
information. We are
only able to see the strength of the
correlation between assets.
But if we want to see
the volatility of the assets and
how the annualized returns of the asset classes are performing,
we are not able to find such things in this heat
map. So what we can do here is
take all these insights and findings from
this heat map and further investigate
them
with network graphs, and we'll try to build some meaningful
visuals and find insights in a more
meaningful way. Let's see
those things now.
Next we'll see the financial network analysis using NetworkX.
As I mentioned, NetworkX is one of the most
popular Python libraries for doing complex network analysis.
In order to analyze correlations in
a network, we need to convert our
correlation matrix into an edge list so
that we can easily create graphs and compare the correlations.
So what I'm doing here is converting the correlation
matrix into an edge list and renaming the columns. I created
a variable called edges, where I'm converting
my correlation matrix, resetting its index,
and then renaming the columns. I'm
giving the names asset one,
asset two, and correlation.
And then finally, if we execute this,
you can see our edge list data
frame here. The nodes here are the asset
classes, and the connections between nodes, which are known as edges,
hold a numeric value, which is the correlation value here,
and these values correspond to the correlation between the
respective pair of nodes.
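One way the conversion described above could look is sketched below, using a small made-up correlation matrix. The column names asset_1, asset_2, and correlation follow the spoken description but are assumptions, as is the step that drops each asset's pairing with itself, which the talk does not mention explicitly.

```python
import pandas as pd

assets = ["ASSET_A", "ASSET_B", "ASSET_C"]

# Placeholder correlation matrix (symmetric, ones on the diagonal).
correlation_matrix = pd.DataFrame(
    [[1.0, 0.8, -0.3],
     [0.8, 1.0, 0.1],
     [-0.3, 0.1, 1.0]],
    index=assets, columns=assets,
)

# stack() turns the square matrix into one row per (row asset, column asset)
# pair; reset_index() moves both labels into ordinary columns.
edges = correlation_matrix.stack().reset_index()
edges.columns = ["asset_1", "asset_2", "correlation"]

# Drop the pairs where an asset is matched with itself (assumed step).
edges = edges.loc[edges["asset_1"] != edges["asset_2"]].copy()

print(edges)
```

Each pair still appears twice here, once in each direction. That is harmless for an undirected graph, because both rows describe the same edge.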
So we have successfully created an edge list now. With
this edge list we can create a graph.
I'm creating a variable called g. As you have seen earlier
in the first section, there is a function called nx.from_pandas_edgelist,
which
helps us to create a graph from
an edge list. So I'm passing my edge list here,
which is called edges, and I'm also passing my source and target.
Besides the source and
the target, I'm also passing the edge attribute, which is the correlation here.
And if you look at the information of our graph here,
our graph contains 39 nodes and 741
edges, meaning that we
have 39 asset classes and
741 connections between those asset
classes. Now let's visualize
our network. What I'm doing is
creating subplots here
and also passing some Matplotlib properties. You can
look them up when you explore the documentation. I'm also
creating layouts here. I'm creating different layouts,
and I want all these layouts
to be plotted on separate graphs. So for that
I am quickly writing a conditional statement here.
And then I'm calling my NetworkX plotting
function, which is nx.draw. I am passing my graph,
and I'm also passing the labels here and
giving the node size. I'm also giving a node color.
You can check all these node colors and edge colors
in the documentation page, or you can find them on Google
as well. I'm also giving a layout
and an axis here. And I'm also giving a title
for each of the layouts. Once I execute
this, you can see
the four different layout plots of
the graph. So let me quickly show you
this. You
can see that we got a circular layout, a random layout,
a spring layout, and a spectral layout.
All these plots look pretty fancy if you observe
them, but they
actually fail to convey the information which we are actually
looking for from our network graph. The main
thing which we are looking for in our network graphs is
to be able to identify the correlations
between assets. But these plots fail
to show
that. So what we can do is
improve these plots by taking certain steps
and approaches so that we
can build a meaningful network graph. We'll
see how we can do that now. Firstly,
I'm removing edges. I really want to cut the
unwanted edges in the graph so that my graph shows
more meaningful information. In order to remove
the edges, I'm giving a minimum correlation threshold
to remove the edges in the graph. The threshold here is 0.5,
which I'm taking. And then again I'm creating a new graph
here, where I'm passing my edges, source, and
target as well. And then I'm creating a list to store
the removed edges. Maybe you can also do it
another way, but here I
want to store everything in this remove list.
So for that, I'm quickly looping through
my edge list and finding the correlations which are below
my correlation threshold. If the absolute
correlation is less than my given threshold,
I want those edges to be
appended to this remove list.
And finally, I'm removing all the edges in
this remove list from the graph. If we execute
this, you can see that a total of
530 edges were removed. If
you recall, earlier we had 741 edges
in the graph, so now we have only 211. We
have removed 530 edges from our graph.
So we have removed all unnecessary edges from the graph.
You'll get to see that when we
plot the final pictures.
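The graph construction and edge filtering described above can be sketched on a toy edge list. The column names and correlation values are placeholders; only the 0.5 threshold comes from the talk.

```python
import networkx as nx
import pandas as pd

# Placeholder edge list: one row per pair of assets.
edges = pd.DataFrame({
    "asset_1": ["ASSET_A", "ASSET_A", "ASSET_B"],
    "asset_2": ["ASSET_B", "ASSET_C", "ASSET_C"],
    "correlation": [0.8, -0.3, 0.1],
})

# Build the graph and keep the correlation as an edge attribute.
g = nx.from_pandas_edgelist(edges, source="asset_1", target="asset_2",
                            edge_attr="correlation")
edges_before = g.number_of_edges()

# Collect every edge whose absolute correlation is below the threshold.
threshold = 0.5
remove = []
for asset_1, asset_2, correlation in g.edges(data="correlation"):
    if abs(correlation) < threshold:
        remove.append((asset_1, asset_2))

# Remove the weak edges; the nodes themselves stay in the graph.
g.remove_edges_from(remove)

print(f"{len(remove)} of {edges_before} edges removed")
```

Using the absolute value keeps strong negative correlations as well as strong positive ones, so only the weak relationships are dropped.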
Next, I'm doing some styling here. The
reason for doing the styling is to
make my plot more meaningful. The styling
is not here to give you some fancy stuff;
it is more to show you some meaningful stuff
in the visual. So for that,
I've written simple custom functions
here in order to avoid writing multiple lines
of code. First, I have
quickly defined a function for selecting the color, where I'm passing a correlation
parameter. If the correlation is less than or
equal to zero, I want my color
to be returned as red; otherwise it should be green. Then again,
the same with selecting the thickness
of the edges. I'm passing
a correlation parameter, plus parameters
called benchmark thickness and scaling factor. And I've written
that I want it to return the
benchmark thickness multiplied by the absolute correlation
raised to the power of the scaling factor. All
these calculations you'll get to understand when you see
the final plot. So don't get confused or scared
by looking at all these things.
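One possible reading of the styling helpers described above is sketched here. The function names, the default values, and the node size helper are assumptions; the thickness formula follows the spoken description as benchmark thickness times the absolute correlation raised to the scaling factor.

```python
def select_color(correlation):
    """Red for zero or negative correlation, green for positive."""
    if correlation <= 0:
        return "red"
    return "green"


def select_thickness(correlation, benchmark_thickness=2, scaling_factor=3):
    """Edge width grows with the strength of the correlation.

    The exponent makes weak correlations much thinner than strong ones.
    Default values are illustrative assumptions.
    """
    return benchmark_thickness * abs(correlation) ** scaling_factor


def select_node_size(degree, scaling_factor=50):
    """Node size proportional to the number of connections (assumed form)."""
    return degree * scaling_factor


print(select_color(-0.2), select_color(0.7))
print(select_thickness(0.5), select_thickness(0.9))
print(select_node_size(4))
```

Because the absolute correlation is at most one, raising it to a power above one shrinks weak correlations faster than strong ones, so the strongest relationships stand out visually.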
And the same with the node size as well: I've written
a custom function for node size too. Cool. So let me
execute these. Next we are identifying
the positive and negative correlations. It is important to identify
which assets are positively correlated and which assets are negatively
correlated in our asset classes. In order
to identify those positive and negative correlations, the edge
colors will help us, because we select them
depending upon whether it is a positive or
negative correlation. So for that what I'm doing here is
I have created two empty lists called
edge colors and edge width. Then I've
written a loop: for key, value
in nx.get_edge_attributes of
correlation, dot items. For each item,
I want my select color value to be appended to
this edge colors list. The select color is nothing but the custom
function which we have written here, so I'm passing the value to that same custom function
and its result gets appended to
this edge colors list. The same goes with select thickness:
its result will be appended to this edge width
list. And finally,
I'm also doing the same thing for node
size, assigning the node size depending upon the number of connections, meaning that
the more connections a node has,
the bigger the size of the node will be, and
it shows how many strong correlations
it has.
You'll see that as well when we go to the visual.
So let me execute this one. And now
it's time for our final graph. So we have taken different
steps in order to improve our final graph. Let's see whether this
will help us to reach our
goal, and whether we are able to draw some meaningful insights from
our network graph or not.
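Before plotting, the styling lists described above could be built along these lines. This is a sketch only: the toy graph, the default parameter values, and the `select_node_size` helper with its scaling factor are assumptions, since the talk only says that node size depends on the number of connections.

```python
import networkx as nx


def select_color(correlation):
    return "red" if correlation <= 0 else "green"


def select_thickness(correlation, benchmark_thickness=2, scaling_factor=3):
    return benchmark_thickness * abs(correlation) ** scaling_factor


def select_node_size(degree, scaling_factor=50):
    # Hypothetical helper: node size proportional to number of connections.
    return degree * scaling_factor


# Toy graph that already contains only the strong correlations.
G = nx.Graph()
G.add_edge("VGT", "XLK", correlation=0.95)
G.add_edge("VGT", "DIA", correlation=0.80)
G.add_edge("XLK", "DIA", correlation=0.75)
G.add_edge("DIA", "GLD", correlation=-0.60)

edge_colors = []
edge_widths = []
# get_edge_attributes returns a dict keyed by edge, in edge order.
for edge, corr in nx.get_edge_attributes(G, "correlation").items():
    edge_colors.append(select_color(corr))
    edge_widths.append(select_thickness(corr))

# G.degree() yields (node, degree) pairs in node order.
node_sizes = [select_node_size(degree) for _, degree in G.degree()]

print(edge_colors)
print(node_sizes)
```

Because the lists follow the graph's own edge and node order, they can be passed directly to the NetworkX drawing functions.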
Again, I'm creating a fig size here and passing
a font size, and then calling my plot function,
where I'm passing my graph parameter and also passing a layout.
This time I want only the circular layout here, and
I have given the node labels and the node sizes,
the node size list which we have created above.
The same goes with the edge color
and the edge width as well: I'm passing the same
lists which we have created here.
And then I'm giving a title, price correlation, since
we are understanding the correlations.
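A plotting call along the lines just described might look like the sketch below. The toy graph, figure size, font size and styling constants are assumptions; `nx.draw` and `nx.circular_layout` are standard NetworkX functions, and the non-interactive `Agg` backend is chosen here only so the example runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, an assumption for portability
import matplotlib.pyplot as plt
import networkx as nx

# Toy graph that already contains only the strong correlations.
G = nx.Graph()
G.add_edge("VGT", "XLK", correlation=0.95)
G.add_edge("VGT", "DIA", correlation=0.80)
G.add_edge("XLK", "DIA", correlation=0.75)
G.add_edge("DIA", "GLD", correlation=-0.60)

# Styling lists, built in the same order NetworkX iterates edges and nodes.
corrs = [c for _, _, c in G.edges(data="correlation")]
edge_colors = ["red" if c <= 0 else "green" for c in corrs]
edge_widths = [2 * abs(c) ** 3 for c in corrs]
node_sizes = [50 * degree for _, degree in G.degree()]

fig, ax = plt.subplots(figsize=(6, 6))
nx.draw(
    G,
    pos=nx.circular_layout(G),  # place all nodes on a circle
    ax=ax,
    with_labels=True,
    node_size=node_sizes,
    edge_color=edge_colors,
    width=edge_widths,
    font_size=10,
)
ax.set_title("Price correlation")
```

In a notebook, calling `plt.show()` afterwards (with the default backend) would display the figure.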
So once I execute this one,
you can see
the price correlation
graph. Let's try to understand all the changes we
have made in this
graph compared with what we have seen above.
Firstly, we have removed the edges with weak correlations
and kept only the edges which have strong and
significant correlations. Secondly, we have
also added colors to indicate the positive and negative correlations:
all the positive correlations are in green and all the negative
correlations are in red, which you can see here.
Through the edge thickness we can also see
the relative strength of the correlation between nodes.
And we have also adjusted the size
of each node, which represents the number of
strong correlations that node has with the others.
So for example, if you
see VGT, the Vanguard one, its node size
is pretty big, and it has quite a lot of
strong correlations with others in the network.
And if you also look at DIA,
the node size is also big, so it also has
some strong correlations. And same goes with ebod here
and same goes with XLK here. So all
these have more strong correlations compared with
the others in the network. And if
you also look at
the graph, the majority are strongly
correlated, so most of the asset classes here are strongly correlated.
And if you also observe the small nodes,
like for example GDX
or XLU or FXF, which are all different ETFs,
they are negatively correlated with other assets.
The only thing which we are not able to figure out
from this is which assets are similar
to each other in terms of their correlations.
So this is the only thing which we are not able to
figure out in this network. In order
to identify that, what we can do is further improve
this visual by taking a different layout approach.
So let's see that now. Here
what I'm doing is I'm taking a layout called
the Fruchterman-Reingold layout.
What this layout will do is basically cluster
the nodes which are strongly correlated to each other, and it
allows us to identify the groups of assets with similar properties.
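As a rough illustration of the layout step, the sketch below computes Fruchterman-Reingold positions for a small toy graph. The graph and the fixed seed are assumptions; `nx.fruchterman_reingold_layout` is the force-directed layout that NetworkX also exposes as `spring_layout`. Edge weights are ignored here (`weight=None`), so the grouping comes purely from which strong-correlation edges exist.

```python
import networkx as nx

# Toy graph: a tightly connected equity group plus a loosely attached asset.
G = nx.Graph()
G.add_edge("VGT", "XLK", correlation=0.95)
G.add_edge("VGT", "DIA", correlation=0.80)
G.add_edge("XLK", "DIA", correlation=0.75)
G.add_edge("DIA", "GLD", correlation=-0.60)

# Force-directed layout: connected nodes attract, all nodes repel,
# so densely connected groups tend to end up close together.
pos = nx.fruchterman_reingold_layout(G, weight=None, seed=42)

for node, (x, y) in pos.items():
    print(f"{node}: ({x:.2f}, {y:.2f})")
```

The resulting `pos` dictionary can be passed as the `pos` argument of `nx.draw` in place of the circular layout.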
So let's see how it looks now.
I'm calling the nx draw function, where I'm
passing my graph as the parameter, and this time
I'm giving the Fruchterman-Reingold layout. The rest of the
parameters remain the same as we have seen. So
once I execute this,
you can clearly see how
it has clustered the nodes which have
a strong correlation between each other.
And we can also see that it has clearly identified the
groups of assets with similar properties.
For example, if you take GLD,
which is a commodity, it has been successfully
grouped with assets of similar properties. And same goes
with, for example, BND: all
these bond ETFs have been successfully grouped with their respective
similar assets. And same goes here:
this is quite a
large cluster of equities, and it has been successfully
grouped by similar properties.
This is pretty cool, but the only
glitch in this entire visual is that
the labels are overlapping in this large cluster
of assets. And we
are not able to see the nodes
clearly either, because they are quite packed. So what
we can do is quickly improve this visual
by taking an approach
called the minimum spanning tree.
So what exactly is a minimum spanning tree?
The minimum spanning tree is very famous and often
used in financial network analysis. So what
exactly will the
minimum spanning tree do? The minimum spanning
tree keeps only the minimum set of edges needed to connect
all the nodes in the graph, which means
that it removes all the clutter in the network.
So we'll see how our minimum spanning tree
helps us to identify our insights
and reach our goal.
So here I'm creating a minimum spanning tree. Again, I'm adding
colors to my minimum spanning tree, and then I'm creating
my minimum spanning tree plot here and calling my
plotting function. And the best part here
is that NetworkX has a
built-in function which calculates the minimum spanning
tree for us. So here I'm passing the built-in function and
passing labels. And the layout is again the Fruchterman-Reingold
layout, because this
layout helps us to identify the groups
of assets with similar properties, and we can
quickly identify the correlations with this layout.
So I'm using the same thing here, and all the other parameters remain the same,
with the title given here.
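The built-in function mentioned above is `nx.minimum_spanning_tree`. The sketch below shows one way it could be applied to a correlation graph. The talk does not say which edge attribute is used as the weight, so converting each correlation to a distance of 1 minus its absolute value is an assumption made here, chosen so that the strongest correlations are the ones the tree keeps. The toy values are also made up.

```python
import networkx as nx

# Hypothetical strong correlations between four tickers.
edges = [
    ("VGT", "XLK", 0.95),
    ("VGT", "DIA", 0.80),
    ("XLK", "DIA", 0.75),
    ("DIA", "GLD", -0.60),
    ("VGT", "GLD", 0.55),
]

G = nx.Graph()
for a, b, corr in edges:
    # Assumed transformation: strong correlation means short distance.
    G.add_edge(a, b, correlation=corr, distance=1 - abs(corr))

# Keep the cheapest set of edges that still connects every node.
mst = nx.minimum_spanning_tree(G, weight="distance")

print(sorted(mst.edges()))
print(f"{G.number_of_edges()} edges before, {mst.number_of_edges()} after")
```

A spanning tree over a connected graph with n nodes always has n - 1 edges and keeps every node, which is why the plot becomes much less cluttered while no asset disappears.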
Now you can clearly see how
it has removed the clutter. Our minimum spanning
tree looks more readable, and it
has successfully removed the unnecessary edges
from our graph, so it is more
readable now. Now you can clearly see the cluster
of equities here with similar properties, and the
same with our commodities, bonds and
also currencies here. The structure
is very clear, and we have been successfully
able to identify
the correlations between assets. And we have also seen the groups
of assets with similar properties in
our graphs. So with this I'll conclude
my section two part. To
summarize the things: in this talk we have seen
how history and graphs
came into the picture. We have seen what networks are and
how to define the network structure. We have also seen how the
financial network evolution came into the picture. And we also
understood the power of Python graphs, why NetworkX
and why Python are so powerful for
doing complex network analysis. Coming
to the hands-on part, we have seen two sections. In the first section
we did some basic network analysis on financial data,
and in the second section we took the ETF
prices and dived deep into our network analysis
of the asset correlations. Initially, we looked at the asset correlations
with our heat map, and we found some
interesting insights and issues in the heat map.
We then further
investigated and analyzed with our network graphs.
We have also seen some potential issues with
our fancy graphs, which we improved
by taking certain steps
and different approaches. And finally, we have
seen
different layout approaches
to reach our final goal, where we
have seen the correlations between our asset classes and which are the positive
and negative correlations.
We have removed the
unnecessary edges, and we have also seen the groups of
assets with similar properties. So all
these things have been achieved through our network graphs
with the power of Python and NetworkX.
Yeah, so that's all I have on my plate
today. Let me quickly jump to my
slides here. These are some
great references if you want to study network analysis
and graph theory, so feel free to check them out.
I really appreciate
you all for being patient and listening to my talk.
If you have any questions, feel free to
ping me on the platform. I'll be addressing each and every
question. Thank you so much for having me
today. Have a great day.