Transcript
This transcript was autogenerated. To make changes, submit a PR.
Today I will talk about a topic that is not commonly covered at such conferences.
We'll discuss core infrastructure, and data center
infrastructure in particular.
Of course, all engineering topics are very complicated
and deep and it's impossible to cover them all.
I just want to slightly open this box and have a look inside,
hoping that, in case of need, you will know where to look
deeper. But before that, let me introduce myself.
My name is Yigor, and I have been leading critical IT
infrastructure for more than 15 years now.
I am the head of IT infrastructure at one of the world's biggest
marketplaces. Before that, I led infrastructure transformation
at one of Europe's largest banks and IT ecosystems.
Developers, IT architects, and other field professionals
often lack a solid understanding of core IT infrastructure.
This gap in knowledge can be fatal, especially for critical systems.
So today we'll try to bridge the gap and talk about
why it's important to understand one of the main parts of core
infrastructure, the data center level.
We'll discuss the data center infrastructure setup and operations,
and the main idea of this talk is to invite you on a virtual
tour of the data center's main systems and let you see
how these systems work.
Imagine hosting your systems on core infrastructure
as if you were sending your child to daycare.
You could choose the cheapest or closest kindergarten, but will
your child be safe and sound there? As a
responsible parent, you would normally research and select
the best option to ensure your child's safety and comfort.
In the same way, you should investigate and choose the best infrastructure
environment for your systems to provide the required level
of reliability and resilience.
That's why it's important to understand what core infrastructure
your systems are hosted on; we in the infrastructure department
call this infrastructure awareness. This concept
presumes a deep understanding of all the engineering and IT systems
behind the product level.
You may wonder whether infrastructure awareness is
important for all kinds of IT systems, and the answer is:
it depends on how critical the system is.
The more critical the system is, the deeper the infrastructure awareness
should be. If you are working on a personal pet
project, you can actually run it on your local
PC, and if something fails, it's not a
big deal. Only you as a developer are affected
and you can fix it at your own pace. However,
if you are responsible for a corporate application used by many people
and tied to business processes, the stakes get higher.
Even if a small back-office application for ordering lunches
goes down for a week, it will be inconvenient
but not disastrous. Still, it is a much greater
issue than a pet project failure. And in the
case of a failure of a customer-facing marketplace application,
the impact of downtime will be huge:
critical business processes will be down, and even the very
existence of the business can be at risk.
Many root causes of critical failures are located at the
infrastructure level: a rack or server fails,
connectivity fails, and so on. That's why we should
think very carefully about core infrastructure. This is exactly what
I mean by infrastructure awareness.
So first of all, you need to define how critical your system
is and decide what to do in case of possible failures.
Usually, system architects limit themselves to the platform
level. For example, they plan several instances, and
if one fails, the rest will
bear the load. If we deal with non-critical systems,
it's totally enough to stop at this level, and that's it.
But if the system is critical, then you must
drill down further. There are many more questions to be
answered. For example, where are your servers and virtual servers hosted?
What kind of cloud provider is used, if any?
Are your servers old or new? What is the quality
of the data centers they are hosted in? In what geographical regions and
countries are those data centers located?
And many, many more. One of the
most important elements of core infrastructure that developers and architects
often overlook is data centers. I'll give
you a moment to try and figure out what systems
a data center includes.
The three pillars of any data center are electricity,
cooling, and connectivity. Let's elaborate
a bit on each. Electricity is the lifeblood of data
centers. Obviously, without a reliable power
supply, servers and other hardware cannot function.
Data centers often have multiple power sources,
including backup generators and uninterruptible power supplies,
which provide continuous operation during grid power outages.
Then comes cooling. As you may know, data centers
generate a significant amount of heat. Effective cooling
systems remove heat energy from IT equipment.
They prevent hardware failures caused by overheating,
and later on we'll talk about it in more detail.
Basically, the goal is to remove the heat. And then there is connectivity,
without which data centers are completely useless;
that is a must. All the other engineering systems you
may have heard of, like fire suppression, physical access control,
and automation, are actually optional.
Data centers come in different sizes,
but how do we measure them? Do we need to know the area
of the building, or the number of racks or units? No,
no, and no. The funny thing
is that data centers are measured in kilowatts.
In other words, we use power, in watts, to understand
the capacity of a data center. Of course,
more often we speak about thousands of kilowatts, or megawatts.
A small data center has a power capacity of up
to 3 MW; a medium-sized data center, 3 to 10 MW;
and we consider anything above 10 MW
a large data center. For context, 1 MW
is enough to power about 200 homes.
And what about XXL size? When we
speak about hyperscalers such as Amazon,
Google, Meta, and so on, their data center
capacity reaches a gigawatt.
To give you an idea of how big this number is, let me just
say that, all in all, in 2023 data centers consumed
around eight gigawatts of power, so you can see that half of
this amount is consumed by the hyperscalers.
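To make the sizing scale concrete, here is a minimal sketch of that classification. The thresholds are the ones mentioned above; the function name and the gigawatt cut-off for the hyperscale tier are my own illustrative choices, not from the talk.

```python
def classify_data_center(capacity_mw: float) -> str:
    """Classify a data center by its power capacity in megawatts,
    using the rough thresholds from the talk."""
    if capacity_mw < 3:
        return "small"
    elif capacity_mw <= 10:
        return "medium"
    elif capacity_mw < 1000:          # assumed boundary for the XXL tier
        return "large"
    else:
        return "hyperscale (gigawatt class)"

# A 5 MW facility is medium-sized; 1 MW of it could power roughly 200 homes.
print(classify_data_center(5))      # medium
print(classify_data_center(1500))   # hyperscale (gigawatt class)
```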
As we have discovered, data centers need power, and
I like to visualize it as a tree-like structure
with a trunk and numerous branches and leaves.
It is a fitting metaphor for redistributing electricity from
one central source to numerous consumers, ensuring that power
reaches every endpoint device.
Electricity comes from the local power grid over
high-voltage transmission lines on its way to the data center's
consumers. The high voltage is transformed to medium and
then to low, usable voltage. From
the transformer, electricity goes through the primary distribution panel,
which you can see on the screen, to be further distributed
to the IT equipment load. Electricity
also always goes through a UPS, which stands for
uninterruptible power supply, to provide backup power in
case of a grid failure.
On the screen, on top of the racks, you can see a distribution
busbar with hanging red distribution outlets,
which are connected to the server racks.
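To make the power path easier to picture, here is a minimal sketch of the chain described above, modeled as a simple list of stages. The voltage figures and the rack PDU stage are typical illustrative values and names, not figures from the talk.

```python
# A simplified model of the power path from the grid to a server rack,
# following the stages described above. Voltage figures are illustrative.
power_path = [
    ("utility grid",               "high voltage, e.g. 110 kV"),
    ("transformer",                "steps down to medium, then low voltage, e.g. 400 V"),
    ("primary distribution panel", "splits power across the facility"),
    ("UPS",                        "batteries bridge the gap until the generator starts"),
    ("distribution busbar",        "runs above the racks with tap-off outlets"),
    ("server rack PDU",            "feeds the individual servers"),
]

for stage, role in power_path:
    print(f"{stage:<27} -> {role}")
```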
That's what a diesel generator looks like. In case
of a prolonged grid failure, it will take on the IT load,
and the data center will keep working as long as
needed; a diesel generator can operate even for
months. Now we know
that data centers consume a huge amount of electricity,
and I have a tricky question for you:
what is all this electricity actually used for?
I'll give you a moment to make a guess.
The answer is heat. Electricity is 100%
transformed into heat and into nothing else.
To get rid of the produced heat, we need to take it away
somehow, and there are many ways to do it.
Cooling techniques fall into two main groups: air-based
and non-air-based. Air cooling, used by the
vast majority of data centers, includes common air conditioning
systems, while non-air cooling systems use water,
oil, or solid materials.
Air conditioners are data centers' most common air
cooling method. They function the same
way as the ones we have at home, just bigger.
Chillers are the second most common cooling system,
using water or water-based solutions to transfer heat
from the IT equipment halls to the outside. They are
more energy efficient than air conditioners but require more
complex installation and maintenance.
Adiabatic cooling involves chambers or mats in which
water evaporates and cools the air.
This technique is normally used in addition to
other cooling methods. Exotic methods
include Peltier elements, which rely on thermoelectric
effects, and underwater data centers like Microsoft's Project
Natick.
There is a method that stands out from the other cooling techniques.
It's called free cooling, and it uses normal outside air
as it is: free cooling is just running
outdoor air through the IT equipment and
nothing more. If the air is warm, we just
need to run more air through the servers. As free
cooling requires nothing but fans and outside
air, this technique is extremely energy efficient.
To be honest, free cooling is my favorite cooling method,
one in which I have quite broad experience, and I
am always eager to discuss it and answer any questions about
it.
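To show why "just run more air" works, here is a minimal sketch of the underlying heat balance: the airflow needed to carry away a given IT load depends on how much the air is allowed to warm up. The load and temperature figures below are illustrative assumptions, not numbers from the talk.

```python
# Sensible heat balance for air cooling: P = rho * cp * Q * dT,
# so the required volumetric airflow is Q = P / (rho * cp * dT).
AIR_DENSITY = 1.2         # kg/m^3, roughly at sea level
AIR_SPECIFIC_HEAT = 1005  # J/(kg*K)

def required_airflow_m3s(it_load_w: float, delta_t_k: float) -> float:
    """Volumetric airflow (m^3/s) needed to remove it_load_w of heat
    while letting the air warm by delta_t_k kelvin."""
    return it_load_w / (AIR_DENSITY * AIR_SPECIFIC_HEAT * delta_t_k)

# A 10 kW rack with air allowed to warm by 10 K needs ~0.83 m^3/s.
# On a warm day the usable delta T shrinks, say to 5 K, so the fans
# must move roughly twice as much air -- exactly the "run more air" idea.
print(round(required_airflow_m3s(10_000, 10), 2))  # ~0.83
print(round(required_airflow_m3s(10_000, 5), 2))   # ~1.66
```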
As I said, free cooling is one of the most efficient cooling technologies, and it
provides the best power usage effectiveness, or PUE,
value, which is one of the most important metrics for understanding
the efficiency of data centers. Many companies
worship PUE as a sacred animal. Let's try to
figure out why it's so important.
So what is PUE? Power usage effectiveness is a
ratio that describes how efficiently a data center
uses energy, specifically how much energy is
used by the computing equipment in contrast to the cooling and other
overhead that support that equipment.
Ideally it equals one, which means that all the energy
is used by the IT equipment without any waste.
However, achieving this perfect scenario is
impossible in the real world. There are different methodologies
for calculating PUE. For example, there is a
way to calculate an instantaneous PUE; on the
other hand, a more comprehensive assessment is
to calculate an annual average PUE.
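As a quick illustration of the two methodologies, here is a minimal sketch; the meter readings are invented for the example.

```python
def pue(total_facility_energy: float, it_equipment_energy: float) -> float:
    """PUE = total facility energy / IT equipment energy (>= 1 in energy terms)."""
    return total_facility_energy / it_equipment_energy

# Instantaneous PUE from a single pair of power readings (kW).
print(round(pue(1300, 1000), 2))  # 1.3

# Annual average PUE from monthly energy readings (MWh); cooler months
# with low cooling overhead pull the average down.
total_by_month = [820, 800, 760, 730, 720, 750, 790, 800, 760, 730, 740, 800]
it_by_month    = [600] * 12
print(round(pue(sum(total_by_month), sum(it_by_month)), 2))  # ~1.28
```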
I have just said that it is impossible to achieve a PUE
equal to one. But what do you think: is it
possible to achieve a PUE of less than one?
The answer is yes, but for this we need to
switch from the terms of power to the terms of money.
For example, a few data centers in the USA and Europe
use heating pipes and sell hot water to the nearest
towns. In this case, if we calculate their PUE
in terms of money, it will be even less than one,
because they make a profit from selling this hot water.
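Here is a back-of-the-envelope sketch of that idea: if we price the energy flows and subtract the heat-sales revenue from the overhead, the money-based ratio can drop below one. All the quantities and prices below are invented for illustration; this is just one possible way to put numbers on the speaker's point.

```python
# Energy figures for one year (MWh) -- illustrative, not real data.
it_energy_mwh       = 10_000
overhead_energy_mwh = 3_000          # cooling, distribution losses, etc.
electricity_price   = 100            # assumed $ per MWh
heat_sales_revenue  = 450_000        # assumed $ earned selling hot water

energy_pue = (it_energy_mwh + overhead_energy_mwh) / it_energy_mwh
money_pue  = ((it_energy_mwh + overhead_energy_mwh) * electricity_price
              - heat_sales_revenue) / (it_energy_mwh * electricity_price)

print(round(energy_pue, 2))  # 1.3  -- always >= 1 in energy terms
print(round(money_pue, 2))   # 0.85 -- can dip below 1 once the heat is sold
```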
Now, on to more tangible things. Many IT professionals are well
familiar with servers and racks, but there are also
those who are not, and this is an opportunity to take a quick
tour of a data hall and get an overall impression of its
setup. For many
of you this is obvious, but I'll mention it anyway: data center
servers are mounted in racks, which are standardized frames or
cabinets that host multiple servers and network equipment.
This is the way they look,
and here is the view from another angle.
Each rack contains many units, allowing servers to be stacked
vertically. This setup provides easier operation
and maintenance and maximizes space efficiency.
As you can see, network ports and drive slots are located
at the front of the servers so that a data center
engineer can easily access them.
Now we are witnessing the rise of ML
and AI technologies, so more and more data center
capacity is allocated to GPUs.
Though physically small, GPUs consume significant
amounts of electricity: one GPU
node can consume more than 8 kilowatts of power.
That means they also produce the same huge amount of heat.
To address this problem, data centers have to upgrade their
electrical and cooling infrastructure to support the increased
power and cooling demand.
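To put that in perspective, here is a minimal sketch comparing how many nodes fit into a typical rack power budget. The rack budget and the CPU node figure are assumptions for illustration; the 8 kW GPU node figure is the one mentioned above.

```python
# How many nodes fit into a rack's power budget? Illustrative figures only.
def nodes_per_rack(rack_budget_kw: float, node_power_kw: float) -> int:
    return int(rack_budget_kw // node_power_kw)

RACK_BUDGET_KW = 15    # assumed power (and thus cooling) budget per rack
CPU_NODE_KW    = 0.7   # assumed typical CPU server
GPU_NODE_KW    = 8     # a GPU node can draw even more than this, as noted above

print(nodes_per_rack(RACK_BUDGET_KW, CPU_NODE_KW))  # ~21 CPU nodes
print(nodes_per_rack(RACK_BUDGET_KW, GPU_NODE_KW))  # only 1 GPU node
# Every kilowatt drawn also has to be removed as heat, which is why both the
# electrical and the cooling infrastructure need an upgrade for GPU racks.
```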
To wind up, let us quickly go through today's takeaways.
The three pillars of any data center are electricity,
cooling, and connectivity. Today we had a virtual
tour of the data center electricity infrastructure.
We made an overview of DC cooling technologies,
with a special focus on free cooling. We also briefly covered
data center servers, including GPUs. But if
you take away only one thing from today's talk,
then let it be this: the more critical your
system is, the deeper your infrastructure awareness should
be. I hope this short data
center infrastructure tour was useful. If you are
interested in any of the topics discussed above, especially
free cooling, you are always welcome to contact me directly
or refer to my articles on Hackernoon. Thank you
very much.