Transcript
This transcript was autogenerated. To make changes, submit a PR.
Welcome to my presentation about affordable machine learning platforms. Data is becoming the new oil, and the ability to harness it through machine learning and artificial intelligence is no longer a luxury reserved for large corporations. It has become a necessity for businesses of all sizes, researchers, and individual developers. However, the journey to effective machine learning can often seem expensive. High costs, complex infrastructure, and the need for specialized hardware can create significant barriers to entry. Many small businesses, startups, and individual developers find themselves wondering: how can we leverage the power of machine learning without breaking the bank? This is where affordable machine learning platforms come into play. These platforms democratize access to machine learning capabilities, providing cost-effective, scalable, and user-friendly solutions to build, deploy, and manage machine learning models. They are designed to lower the barriers, enabling you to turn your data into actionable insights regardless of your budget. Today, we will explore the landscape of affordable machine learning platforms, understand their key features, and learn how they can be utilized to maximize efficiency while minimizing costs. Thank you for joining us. Let's get started.
Before we dive into the details, let me walk through today's agenda. We will cover the following topics. We will start with an overview of what an affordable machine learning platform is and why it is essential in today's data-driven world. We will identify the key stakeholders and user groups who benefit the most from an affordable machine learning platform. Then we will break down the essential components of a machine learning platform, explaining which parts are necessary for an affordable one. Finally, we will dive deeper into the technical aspects that make an affordable machine learning platform efficient and powerful. Let's begin with the first topic: what is an affordable machine learning platform?
To start, let's define what a machine learning platform is. A machine learning platform is a comprehensive environment that provides the necessary tools, frameworks, and infrastructure to develop, train, deploy, and manage machine learning models. It streamlines the entire machine learning workflow, from data preprocessing and model building to deployment. An affordable machine learning platform is generally designed to work effectively with a single GPU or a few GPUs, focusing on resource sharing to ensure cost efficiency and broad accessibility. So the key point of affordability is GPU sharing.
Who needs this machine learning platform? Before answering this question, I am going to discuss why sharing GPUs is crucial in creating an affordable machine learning platform. First, let's talk about cost. High-performance GPUs, which are necessary for running complex machine learning tasks, come with a hefty price tag. For many startups, small businesses, and individual researchers, this cost can be a significant barrier to entry. Consider that GPUs are often idle outside of regular working hours. In many organizations, these expensive resources sit unused after the workday, leading to inefficiency.
Most applications also have CPU and I/O work in between GPU kernel launches, so the GPU utilization of a deep learning model running on a single GPU is, most of the time, much less than 100%. This means that even during working hours there can be periods when GPUs are not fully utilized. Moreover, GPUs are getting more powerful each year, and experimenting with a new model allows, and sometimes even requires, the use of smaller hyperparameters, making the model use much less GPU memory than usual. Such tasks lead to underutilization and inefficiency.
Typically, GPUs are the most expensive part of a machine learning platform and also the component that most affects platform utilization. Increasing GPU utilization is therefore key to reducing costs and building an affordable machine learning platform.
Now, about who needs this platform. First, let's talk about startups and small businesses. The key for startups and small businesses is to respond flexibly to market demands at a low cost. When building a machine learning platform, they shouldn't invest heavily from the start. Instead, they should look for cost-effective solutions that can scale as their needs grow.
Next, educational institutions, including universities and research labs, need to provide students with hands-on experience in completing end-to-end machine learning tasks. While modern GPUs might be overkill for education, an affordable machine learning platform can offer a practical path to industry practices without the excessive cost. This approach enables students to gain valuable skills and experience while helping institutions manage their budgets effectively.
Nonprofit organizations also operate on tight budgets and need to maximize their impact
with limited resources. They can
use machine learning to analyze data, optimize operations,
and drive their missions more effectively.
An affordable machine learning platform provides them with the necessary
computational power without diverting too much of their
funds from their primary objectives.
Freelancers and consultants in the field of data science and machine learning work independently or in small teams. Access to an affordable machine learning platform allows them to offer competitive services and solutions to their clients without the need to invest heavily in expensive hardware. It can help them maintain flexibility and scalability in their operations. In summary, an affordable machine learning platform can significantly benefit various groups. Before introducing the affordable machine learning platform, I would like to first describe what components a typical machine learning platform should consist of.
Here I use a simplified diagram. Through this diagram, we can see that a typical machine learning platform can be divided from top to bottom into an application layer, an infrastructure layer, and a hardware layer. The application layer is divided into a machine learning part and a data part. Let's talk about the machine learning part first. Typically, the machine learning part is divided into four pieces: data engineering, experiment, training, and inference. Data engineering is responsible for collecting, cleaning, and preparing data to ensure it's reliable and suitable for machine learning tasks. The experiment phase involves exploring and analyzing data, testing different algorithms, and creating model prototypes to find the best solution. The training phase works on training models using historical data and optimizing their parameters to achieve the best performance. The inference phase involves deploying trained models into production environments to make predictions on new data.
The data part is also an indispensable part of the machine learning platform. The data-related subparts usually include a feature store, model management, and a data lake. The feature store is responsible for storing and serving feature data consistently across training and inference to ensure reproducibility. Model management involves tracking and versioning machine learning models, driving collaboration, and managing model deployment pipelines. The data lake serves as a centralized repository for storing vast amounts of raw and processed data, enabling efficient data retrieval and analysis.
However, to create an affordable machine learning platform, we have decided not to include the data components at this stage. The reason is that our target users typically handle smaller datasets, and there is no immediate need to establish a dedicated data platform at this point. Moreover, data platforms and machine learning platforms can be decoupled, so that as our business scales up in the future, we can build a dedicated data platform separately. So for the affordable machine learning platform, the scope is the dark-colored part of the diagram: the machine learning part and the infrastructure part. I will dive deep into the most critical technical points in these two areas: scalable container environments and GPU sharing.
To better understand the requirements for scalable container environments, let's revisit some typical business scenarios. Educational institutions often operate with a single machine equipped with a few GPU cards, suitable for classroom use and small-scale research projects. Startups and small businesses typically have a setup consisting of a few PCs, each with GPUs, ideal for initial product development and small-scale deployment. Freelancers and consultants usually work with a single PC equipped with only one GPU, which is perfect for individual projects and consultancy work. Here we find a conflict. From the business-scenario perspective, the hardware setup may consist of only one or a few PCs. However, building a machine learning platform requires multiple container environments, including experiment, training, and inference environments. To manage these environments effectively, we need to introduce Kubernetes, and a typical Kubernetes installation requires at least three nodes. The challenge now becomes how to deploy Kubernetes on a single PC while ensuring compatibility for potential multi-node expansion.
The answer is to introduce OpenStack. To further illustrate our needs, consider a typical business scenario where the initial hardware setup consists of only one PC. As the business grows, the hardware may expand to multiple physical machines or virtual machines and potentially transition to a cloud environment. This gives rise to the challenge of ensuring compatibility with heterogeneous hardware environments while maintaining scalability. OpenStack is well suited to address this issue: as shown in the diagram on the OpenStack website, it excels in heterogeneous hardware compatibility. OpenStack also provides a dedicated single-machine deployment tutorial, making it a perfect fit for our requirements. Additionally, since OpenStack provides a virtual machine environment, it allows for a seamless transition to either a self-hosted cloud or a public cloud environment in the future without disrupting the operation of the Kubernetes cluster and the machine learning platform.
This diagram illustrates the potential lifecycle of a typical affordable machine learning platform. In the initial phase, OpenStack is used to support single-machine setups and ensure compatibility with heterogeneous hardware. As the platform evolves, OpenStack continues to provide compatibility with more complex environments, including cloud infrastructure. In the last stage, OpenStack can be seamlessly replaced by other environments, further enhancing flexibility and scalability. Next, I will introduce one of the most critical technologies in building an affordable machine learning platform: GPU sharing. Before diving deep into the technical details, let's first understand the mainstream GPU sharing solutions and their applicable scenarios from a big-picture perspective. Since machine learning tasks generally use NVIDIA GPUs, we will start by looking at several official NVIDIA solutions, including Multi-Instance GPU (MIG), GPU time-sharing, and Multi-Process Service (MPS).
We can see that MIG allows true parallelism for multiple tasks on the same GPU with the highest level of isolation, making it suitable for inference tasks and small-scale training tasks, although it's also the most expensive option, which we will discuss in detail later. GPU time-sharing involves time-slicing a single GPU, which causes context switching between different tasks and increases total task time; this makes it a poor fit for latency-sensitive inference tasks but reasonable for relatively asynchronous training tasks. The last solution, MPS, is the earliest GPU sharing solution. It merges multiple tasks into a single GPU context, so if one task fails, all tasks fail. Thus, it's only suitable for experimental scenarios.
Among third-party solutions, I will primarily introduce Tencent's GaiaGPU, from TKE (Tencent Kubernetes Engine). Let's now dive into the principles of these solutions and their advantages and disadvantages.
MIG is a technology that allows a single NVIDIA GPU to be partitioned into multiple isolated instances. Each of these instances has its own dedicated resources, such as memory, compute cores, and bandwidth. This separation ensures that multiple workloads can run simultaneously on the same GPU without affecting each other, thus maximizing resource utilization and performance. The primary advantage of MIG is the strong isolation it provides between different tasks: by dedicating specific resources to each instance, it prevents one task from impacting the performance or stability of another. This makes MIG particularly suitable for environments running diverse workloads side by side. Additionally, MIG allows for precise resource allocation, enabling efficient use of GPU capabilities and improving overall system scalability. However, there is an important drawback to consider: the cost of using MIG can be very high, as only NVIDIA's high-performance professional GPUs support this technology. As shown in the table on the right, the minimum requirement is an NVIDIA A30 to support MIG, which doesn't align with the goal of an affordable machine learning platform.
Let's talk about the next candidate: GPU time-sharing. NVIDIA time-slicing is a feature that allows a single GPU to be shared by multiple processes or users. By dividing the GPU's compute resources into time slices, each process or user gets a dedicated time slice during which it has full access to the GPU's resources. This enables multiple tasks to run on the same GPU in a sequential manner, providing the illusion of parallel processing while ensuring that each task gets a fair share of the GPU's capabilities. The advantage of time-slicing is increased flexibility and better resource utilization: it allows multiple users or processes to share a single GPU without the need to partition the hardware resources physically. There are, however, some disadvantages to time-slicing.
In a time-slicing setup, multiple processes share the same GPU VRAM, leading to potential memory contention. If one process consumes a large amount of VRAM, it can leave insufficient memory for other tasks, causing performance degradation or failures. Additionally, the shared memory space increases the risk that inefficient memory management by one process gradually consumes more VRAM, impacting the performance and stability of other processes. Regarding fault isolation, time-slicing doesn't provide strict isolation between processes. If one process encounters a fault or crashes, it can potentially affect other processes sharing the same GPU resources. This lack of isolation can lead to system instability and unpredictable performance, making it challenging to ensure reliable operation in production environments.
The next solution is NVIDIA MPS. The core principle of MPS lies in allowing multiple processes to share a single GPU context. Traditionally, each CUDA application would create its own GPU context, leading to resource waste and context-switching overhead. By sharing a single GPU context, MPS reduces these inefficiencies. Additionally, MPS merges command queues from different processes into a single queue, thereby facilitating more efficient scheduling and execution of commands, which in turn minimizes GPU idle time. The primary advantage of NVIDIA MPS is performance: by reducing context switching and managing commands more efficiently, MPS can significantly enhance the performance of concurrent processes. Furthermore, unlike MIG, MPS and time-slicing both work on consumer-grade GPUs. There are notable disadvantages associated with MPS, though. The primary issue is memory isolation: the shared GPU context results in less strict memory isolation compared to separate contexts, which can lead to memory contention and potential data security concerns. Another significant drawback is fault isolation: if one process encounters a fault, it can impact other processes sharing the same GPU context. Finally, configuring and debugging MPS can be complex and requires specialized knowledge and experience.
Outside of the official solutions, many third-party vendors have proposed their own GPU-sharing solutions. A typical example is Tencent's GaiaGPU. Tencent provides a complete GPU-sharing solution that is fully open source. GaiaGPU is a GPU resource-limitation component and belongs to the CUDA-hijacking category: it manages each container's memory usage by intercepting CUDA's memory allocation and release requests, thereby achieving memory isolation. The only caveat is that context allocation doesn't go through the malloc function, so it's impossible to know how much memory a process uses for its context; therefore, the current memory usage has to be queried from the GPU each time. In terms of computing-power isolation, users can specify the GPU utilization rate for a container. vCUDA monitors utilization and takes action when it exceeds the limit; both hard isolation and soft isolation are supported. Since a monitor-and-adjust scheme is used, computing power cannot be limited over a short period, and only long-term fairness can be guaranteed.
Therefore, it's not suitable for scenarios where task times are extremely short, such as inference tasks.
Machine learning platforms are an extremely broad topic, and even if we limit the scope to building an affordable machine learning platform, there are still many aspects to share. However, I believe that GPU sharing and scalable container environments are among the most critical parts.
A few years ago, when I was developing a machine learning platform, we successfully ran our platform on a small PC cluster with only a few machines, shown in the picture. This demonstrated that, by applying the technological solutions mentioned previously, we can indeed build an affordable machine learning platform aimed at educational institutions, NGOs, freelancers, and startups. Finally, I hope my insights have provided valuable guidance and inspiration for your own affordable machine learning platform. That's all, and thank you all for joining this online session.