Transcript
Hello everyone. My name is Mitri Vaishinka and I am a software engineer at MidAF Games. I have specialized in mobile game and backend system development for over a decade, I am a member of the International Game Developers Association, and I author articles on Medium and Hackernoon. Let's talk about the pillars of our stack that ensure our operations run smoothly and efficiently.
First on the list is our CI/CD organization, which is the backbone of getting a large development team up and running smoothly. This setup ensures that integration and delivery happen seamlessly and allows for frequent updates without disrupting the user experience. Next, we focus on backend infrastructure, preparing it for scalability. As our user base grows, it's crucial to have a backend robust enough to handle increasing load while maintaining performance. And finally, let's not forget our blue-green deployment strategy. This method allows us to release new versions of our software without having to pause for maintenance or technical work. It means zero downtime and a smoother, more reliable experience for our users. Altogether, these elements form an integrated approach to development and deployment, setting the stage for both agility and reliability. Let's start with CI/CD organization.
In our technology stack, we've adopted some of the industry's best practices to ensure a seamless and efficient deployment process. We operate under the infrastructure-as-code paradigm, which allows us to manage and provision our technological infrastructure through machine-readable definition files, enabling rapid deployment and version control for the sake of environment reproducibility. Particularly for continuous integration and staging, we utilize containerization techniques. This enhances both portability and consistency across all stages of development. To further bolster our testing efforts, we employ an emulator farm that allows us to simulate various scenarios and environments. Simplification is key in our approach, as we've streamlined the deployment of both test and production servers to be as easy as a single click. Finally, we run a comprehensive suite of checks at the merge request stage to ensure quality and functionality. These practices collectively contribute to a robust, agile and highly dependable development ecosystem.

So first, we use versioning to track changes in our infrastructure as code, giving us the ability to revert to previous states effortlessly. Typically, we use Git.
Second, automation is at the heart of our operations. It frees us from manual toil and allows for agile adaptation. Third, our use of code for describing infrastructure guarantees repeatability and consistency, making transitions from testing to production environments seamless. Fourth, our code doubles as our documentation, offering transparency in understanding how our infrastructure is set up and configured. Fifth, scalability and adjustments are straightforward: we simply modify the code and apply the changes, effectively future-proofing our infrastructure. Lastly, our code-based approach enhances collaboration among team members and enables rapid responses to ever-changing business requirements. These guiding principles form the cornerstone of our efficient, reliable and agile infrastructure management strategy.
So with TeamCity, you can store all your configurations using the Kotlin DSL. With GitLab, you can do the same with YAML, and GitLab also offers the convenience of performing checks after pushing.
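To make this concrete, here is a minimal sketch of what a versioned TeamCity configuration kept in the repository might look like in Kotlin DSL. The build type, step names and Gradle tasks here are hypothetical placeholders, not our actual project settings:

```kotlin
// .teamcity/settings.kts -- a minimal, hypothetical TeamCity configuration stored in version control
import jetbrains.buildServer.configs.kotlin.v2019_2.*
import jetbrains.buildServer.configs.kotlin.v2019_2.buildSteps.script
import jetbrains.buildServer.configs.kotlin.v2019_2.triggers.vcs

version = "2019.2"

project {
    buildType(BuildServerConfigs)
}

object BuildServerConfigs : BuildType({
    name = "Validate and build server configs"

    vcs {
        root(DslContext.settingsRoot)
    }

    steps {
        // Hypothetical Gradle task names; the real steps depend on the project
        script {
            name = "Validate configs"
            scriptContent = "./gradlew validateConfigs"
        }
        script {
            name = "Assemble server configuration"
            scriptContent = "./gradlew assembleServerConfig"
        }
    }

    triggers {
        // Run automatically on every push, mirroring the after-push checks GitLab offers
        vcs { }
    }
})
```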
TeamCity serves as a linchpin in our CI/CD architecture. It brings several advantages that streamline our development and deployment processes. One of its key strengths is seamless integration with a wide array of version control systems such as Git, SVN and Mercurial, ensuring that our development workflow remains fluid regardless of our choice of version control. Additionally, TeamCity's settings system is exceptionally flexible. It allows us to tailor our CI/CD pipelines according to specific needs and conditions, and that truly sets it apart. Thanks to its capability for parallel and distributed builds, TeamCity dramatically accelerates the build process by intelligently allocating tasks across multiple agents, thereby reducing time to market and enhancing productivity. These features collectively make TeamCity an invaluable asset in our quest for a more agile, efficient and reliable tech environment.

GitLab runs a series of tasks automatically for your project every time changes are made, ensuring continuous code testing and readiness for deployment. GitLab CI allows you to automate various stages of the deployment lifecycle, from code testing to building to deployment to production. One of its key features is the ability to create complex pipelines with multiple parallel and sequential tasks, making GitLab CI a powerful tool for development teams. Configuration is described using the .gitlab-ci.yml file, making the setup process transparent and easily customizable. GitLab CI is closely integrated with GitLab itself, which simplifies the setup, monitoring and management of all aspects of CI/CD.
For auto builds, build configurations are added for which automatic building is triggered based on certain conditions, for example based on time. At the moment the trigger for an automatic build fires, critical validation and building of the server configs are carried out first. After that the server is built, clients for the three main platforms are built as well, and after successful completion of the builds the server is launched. Once the server is successfully launched, the client versions are uploaded to App Center so they can later be downloaded to devices. Thus, thanks to autobuilds in the project, a fresh client-server pair with the latest uploaded changes appears every one or two hours. Here are the steps to build a server: first, assemble the server configuration; then build the server based on that configuration; and finally deploy the server.
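As a rough illustration of those three stages, here is a minimal Kotlin sketch of the chain. The function names, shard name and artifact path are hypothetical, not our actual build code:

```kotlin
// A hypothetical sketch of the three build stages chained together:
// assemble configuration -> build server -> deploy server.
data class ServerConfig(val shardName: String, val version: String)
data class ServerBuild(val config: ServerConfig, val artifactPath: String)

fun assembleServerConfig(shardName: String, version: String): ServerConfig {
    // Stage 1: collect and validate configuration for the target shard
    println("Assembling configuration for $shardName ($version)")
    return ServerConfig(shardName, version)
}

fun buildServer(config: ServerConfig): ServerBuild {
    // Stage 2: compile the server against the assembled configuration
    println("Building server from config for ${config.shardName}")
    return ServerBuild(config, artifactPath = "build/server-${config.version}.jar")
}

fun deployServer(build: ServerBuild) {
    // Stage 3: provision a cloud VM and launch the freshly built server
    println("Deploying ${build.artifactPath} for shard ${build.config.shardName}")
}

fun main() {
    val config = assembleServerConfig(shardName = "user-shard-1", version = "1.0.42")
    val build = buildServer(config)
    deployServer(build)
}
```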
This functionality is convenient for testing new features, fixing bugs, or conducting experiments in an isolated environment without affecting the main development process. As a result of the build, the server configuration and a cloud virtual machine for the server are created, and the server is launched. A unique virtual machine name is assigned to each user in TeamCity, and in the client you can select the login address from the list of shards to connect to the server. Created servers are deleted twice a week, on Thursday night and Saturday night, and these on-demand servers provide flexibility and autonomy in the development and testing process.
The development process utilizes a merge request system to ensure that individual changes do not break the build or obstruct others' work. Developers create a new branch from an updated develop head, make commits related to a specific task or bug, and then push the changes to the target branch. A merge request is then created either immediately after the first push or right before merging into the develop branch. The merge request undergoes various automated checks and possibly manual reviews. Once approved, the merge request can be merged automatically when all tests pass, freeing the developer to move on to other tasks. If a merge request becomes outdated or irrelevant, it's essential to close it to prevent clutter and confusion in the list of active merge requests.
The validation system serves as an integrated tool within the Unity project for validating assets and configurations within its interface. Users can execute predefined or custom validators on selected assets or groups of assets. Validators examine specific aspects of an asset to ensure its integrity, such as broken links. A validator can be marked as critical to enforce checks during build and merge request pipelines. Results are displayed at the bottom of the window, and any errors are logged in Unity.
So the next big part of our CI is Git hooks. The client hooks system consists of several components, including the installer and the commit-msg, pre-commit, post-commit, and post-rewrite hooks. Each serves a specific function in the version control workflow. The installer is a binary that automatically downloads and updates the hooks from a central repository, placing them in the local Git hooks directory; it also creates scripts to trigger the hooks and keep them up to date. The commit-msg hook checks the messages attached to commits, pre-commit performs a variety of file checks, and post-commit and post-rewrite mainly handle notifications after a commit or rebase operation. The protected branch hook restricts pushes to certain branches, essentially making them read-only and preserving their integrity. The rebase required hook ensures that before you can push your changes, you must rebase onto the most up-to-date version of the repository, thus minimizing conflicts. The new branch name hook validates the naming convention of new branches. Lastly, the message content hook enforces standardized commit messages through the use of regular expressions. These tools serve as proactive measures to maintain a high standard of coding practices.
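To give a feel for the message content hook, here is a minimal Kotlin sketch of the kind of check it could run on the commit message. The "TASK-123: description" convention and the pattern are hypothetical; the real hook and its regular expressions are project-specific:

```kotlin
// A minimal sketch of a commit-msg style check.
// Git passes the path to the commit message file as the first argument to the commit-msg hook.
import java.io.File
import kotlin.system.exitProcess

fun main(args: Array<String>) {
    val messageFile = args.firstOrNull() ?: exitProcess(1)
    val firstLine = File(messageFile).readText().trim().lineSequence().first()

    // Hypothetical convention: commit subject must start with a task ID
    val pattern = Regex("""^[A-Z]+-\d+: .+""")
    if (!pattern.containsMatchIn(firstLine)) {
        System.err.println("Commit message must start with a task ID, e.g. 'GAME-123: fix login crash'")
        exitProcess(1) // non-zero exit rejects the commit
    }
}
```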
Now let's dive into our backend infrastructure.
In the architecture of the platform, a variety of tools and frameworks are employed to ensure efficient and robust operation. Eclipse Vert.x serves as the backbone of the messaging system and also provides clustered storage for shared data, offering a scalable solution for high-speed data handling. Complementing Vert.x is Hazelcast, an in-memory data grid that enhances performance and scalability. For data streaming and log management, Apache Kafka plays a crucial role as a data broker, allowing for real-time analysis and monitoring. On the database side, PostgreSQL is used to store persistent data, offering flexibility in data storage approaches. Lastly, Ansible is utilized for automating server application configuration, ensuring that the platform's diverse services are seamlessly integrated and easily manageable.
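As a minimal sketch of how the Vert.x and Hazelcast pieces fit together, here is a clustered Vert.x node bootstrapped with the Hazelcast cluster manager. The event bus address and handler are hypothetical, and the real configuration is of course richer than this:

```kotlin
// A minimal sketch: start a clustered Vert.x node using Hazelcast as the cluster manager.
import io.vertx.core.Vertx
import io.vertx.core.VertxOptions
import io.vertx.spi.cluster.hazelcast.HazelcastClusterManager

fun main() {
    val clusterManager = HazelcastClusterManager()
    val options = VertxOptions().setClusterManager(clusterManager)

    Vertx.clusteredVertx(options) { result ->
        if (result.succeeded()) {
            val vertx = result.result()
            // The event bus and shared data are now clustered across all nodes.
            // Hypothetical address used purely for illustration:
            vertx.eventBus().consumer<String>("game.ping") { msg -> msg.reply("pong") }
        } else {
            result.cause().printStackTrace()
        }
    }
}
```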
The architecture of the platform consists of several key components, each designed to handle specific functionality. The account server is responsible for user authentication and maintains information regarding all connected game servers. Next, the game server is the central hub for all game mechanics, logic and data, ensuring that the gameplay experience is consistent and engaging. On the administrative side, Game Tools Web serves as a comprehensive tool for managing both player accounts and server settings. Lastly, Game Tools ETL works behind the scenes and funnels game logs from Apache Kafka into the Game Tools database, thereby enabling robust data analysis and reporting.

The account server is an HTTP server with its own database and consists of several components. The authentication component is responsible for user authentication and for distributing users among the game servers and their front-end components. The billing component processes in-game purchases, and the game server configuration component is used to communicate with game servers, announce maintenance, and perform other release-related tasks.
The game server comprises a cluster of either physical or logical nodes, each made up of multiple components and services that interact seamlessly through a common event bus. At the front end, we have the frontend component, responsible for managing TCP connections and verifying clients via the account server. It serves as the main gateway for client-server communication. The dispatcher queues and delegates client requests and messages to the appropriate parts of the system. The scheduler plays a crucial role in time-sensitive game mechanics, providing timers and subscriptions to various components. Our DB operation executor ensures smooth and asynchronous interaction with databases, while the resource system holds the configuration for game mechanics. Moreover, all server activities are diligently logged by our log system, which sends the logs to an Apache Kafka message broker for analysis. The server also hosts an array of specialized mechanics components, such as those for missions, quests and mail, making it a comprehensive and flexible platform for an immersive gaming experience.
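To illustrate the shape of such a mechanic component, here is a minimal sketch of a verticle wired to the common event bus, in the spirit of the mail component. The address, payload fields and reply are hypothetical:

```kotlin
// A minimal sketch of a game-mechanic verticle listening on the shared event bus.
import io.vertx.core.AbstractVerticle
import io.vertx.core.json.JsonArray
import io.vertx.core.json.JsonObject

class MailVerticle : AbstractVerticle() {

    override fun start() {
        // The dispatcher would forward client requests to an address like this (hypothetical)
        vertx.eventBus().consumer<JsonObject>("mechanics.mail.fetch") { msg ->
            val playerId = msg.body().getString("playerId")
            // In the real component this lookup would go through the DB operation executor
            msg.reply(JsonObject().put("playerId", playerId).put("mail", JsonArray()))
        }
    }
}
```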
Our Game Tools suite is an essential part of our comprehensive game management ecosystem. It consists of two pivotal components. First is Game Tools ETL, which stands for extract, transform and load. This component is responsible for siphoning off game logs from our Apache Kafka message broker, processing these logs, and then persisting the transformed data into its own dedicated database. This ensures that we have a streamlined, reliable repository for game analytics and insights.
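Here is a minimal sketch of the extract step of such an ETL: consuming game-log records from Kafka and handing them off for transformation and loading. The topic name, group id and bootstrap address are hypothetical, and the persist step is left as a placeholder:

```kotlin
// A minimal sketch of the "extract" part of a Kafka-to-database ETL.
import org.apache.kafka.clients.consumer.KafkaConsumer
import java.time.Duration
import java.util.Properties

fun main() {
    val props = Properties().apply {
        put("bootstrap.servers", "kafka:9092") // hypothetical broker address
        put("group.id", "game-tools-etl")
        put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
        put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    }

    KafkaConsumer<String, String>(props).use { consumer ->
        consumer.subscribe(listOf("game-logs")) // hypothetical topic
        while (true) {
            for (record in consumer.poll(Duration.ofSeconds(1))) {
                // Transform the raw log line and load it into the ETL's own database here
                println("offset=${record.offset()} log=${record.value()}")
            }
        }
    }
}
```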
The second part is Game Tools Web, an administrative tool designed to offer real-time access to essential service data.

The gaming platform incorporates a range of specialized services to enhance the user experience. Photon is utilized for PvP and cooperative gameplay, while leaderboards offer a universal system for storing and ranking player achievements. The friends service takes care of the list of friends and provides referral information. Each player also has a player profile, which gives a detailed account of their in-game activities and statistics. The replay service is a tool for storing gameplay replays. Additional functionalities include mail for in-game messaging, chat for broader social interactions including group settings, matchmaking for effectively pairing up players as opponents or teammates, clans for organized group activities, and push notifications to keep players updated with real-time information sent directly to their devices.
Our current architecture combines various systems and components coordinated through Photon Cloud for real-time multiplayer gaming. Photon Cloud offers low-latency data centers worldwide, and its versatility extends to applications beyond gaming, like text and video chat. Our primary data storage is PostgreSQL, because we find that relational databases are generally more reliable and easier to validate than other data storage models. For message brokering, we use Apache Kafka due to its out-of-the-box horizontal scaling and high reliability. We also use Hazelcast as an in-memory database that integrates with Vert.x, our framework for building reactive applications. Our stack includes Vert.x, which supports multiple programming languages and operates on the reactor pattern. Despite its benefits, Vert.x can lead to complicated code, especially if the language you are using isn't fully supported by the framework. In such cases, alternatives like the Quasar project could be considered, although Quasar wasn't being actively maintained when we began our project in 2017.
For transactional operations, we've created a custom object that allows linear operations within message processing. This approach covers most of our use cases. During testing, we discovered and reported a log queuing issue in Vert.x, which the developers have since addressed.
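The talk doesn't show that custom object, but the idea behind "linear operations" can be sketched as a small queue that forces asynchronous steps to run strictly one after another. The class name and the example operations are hypothetical, and it assumes use from a single event-loop thread, so it does no synchronization of its own:

```kotlin
// A minimal, hypothetical sketch of serializing async operations during message processing.
import java.util.concurrent.CompletableFuture

class LinearOperations {
    private var tail: CompletableFuture<Void> = CompletableFuture.completedFuture(null)

    // Append an operation; it will not start until every previously enqueued operation has finished.
    fun enqueue(operation: () -> CompletableFuture<Void>): CompletableFuture<Void> {
        tail = tail.thenCompose { operation() }
        return tail
    }
}

fun main() {
    val ops = LinearOperations()
    ops.enqueue { CompletableFuture.runAsync { println("charge currency") } }
    ops.enqueue { CompletableFuture.runAsync { println("grant reward") } }
    ops.enqueue { CompletableFuture.runAsync { println("persist result") } }.join()
}
```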
We monitor performance metrics using Prometheus and visualize them using Grafana. This setup helped us fine-tune our Vert.x configuration and resolve bottlenecks. Our game cluster is a collection of machines running instances of Vert.x and Hazelcast, with each node running various game mechanics. These mechanics are encapsulated in Vert.x verticles, which handle different tasks like game model loading or arcade tasks. To manage all of this, we use a comprehensive admin interface. Scaling for performance is relatively straightforward. Our current hardware can comfortably support 150,000 users, and if we reach CPU limitations, we can add servers to the cluster. Our PostgreSQL setup might be the first bottleneck in terms of scalability, but deferred synchronization through Hazelcast can alleviate this issue.
Now let's review our blue-green deployment process. So why did we choose to employ a blue-green deployment strategy for our operations? The decision to go with this approach wasn't taken lightly, as it does come with its own set of architectural and operational costs. However, in our specific context, the advantages clearly outweigh the expenses. There are two main reasons behind this decision. First and foremost, downtime is not just inconvenient, it's expensive. Even a minute of downtime can have significant financial implications for us. The second reason is unique to our focus on mobile games. When we publish a new version of our mobile game client, it needs to go through a store review process, which isn't instantaneous and can take several days. This means we absolutely need the ability to support multiple game server instances concurrently to align with the release cycles of mobile app stores. So the blue-green deployment strategy provides us with the flexibility and reliability we need to meet our business requirements.
Our setup involves three main elements: your client and two servers named alpha and beta. The objective is to transition the game traffic seamlessly from alpha to beta, all while ensuring that players experience zero interruptions. This discreet migration process involves not just the game servers and the clients, but also a specialized account server. The sole role of this account server is to provide a client with the appropriate game server address for connection. It also keeps track of the servers' statuses, which is essential meta-information that helps coordinate the switch. The goal is to make this transition as smooth as possible, so players remain blissfully unaware that any change has even occurred.
Let's walk through how our game update system achieves zero downtime, ensuring an uninterrupted game experience for our players. Initially, the alpha server is live while beta is stopped. When a player enters the game, the client contacts the account server to find out which game server is currently active. The account server responds with the address of alpha, and the client connects accordingly. Now, when it's time to update, alpha is declared as stopped and beta is set to live. Alpha then sends a reconnect broadcast to all its connected clients. On receiving this, the clients re-establish their contact with the account server, which now provides the address of beta. The client switches its connection to beta seamlessly, all without the player noticing any disruption. Through this coordinated dance between the account server, alpha and beta, we effectively achieve zero downtime during server updates.
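Here is a minimal sketch of that coordination: the account server hands out the address of whichever game server is currently live, and a reconnect broadcast from the retiring server simply sends clients back to the account server to ask again. The class names, addresses and status values are hypothetical:

```kotlin
// A minimal, hypothetical sketch of the blue-green switch coordinated by the account server.
enum class ServerStatus { LIVE, STOPPED }

data class GameServer(val name: String, val address: String, var status: ServerStatus)

class AccountServer(private val servers: List<GameServer>) {
    // Called by the client on login, and again after receiving a reconnect broadcast
    fun resolveGameServer(): GameServer = servers.first { it.status == ServerStatus.LIVE }
}

fun main() {
    val alpha = GameServer("alpha", "alpha.example.com:7777", ServerStatus.LIVE)
    val beta = GameServer("beta", "beta.example.com:7777", ServerStatus.STOPPED)
    val accountServer = AccountServer(listOf(alpha, beta))

    println("Client connects to ${accountServer.resolveGameServer().name}")   // alpha

    // Update time: statuses flip, and alpha broadcasts "reconnect" to its clients
    alpha.status = ServerStatus.STOPPED
    beta.status = ServerStatus.LIVE

    println("Client reconnects to ${accountServer.resolveGameServer().name}") // beta
}
```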
There are some areas for further enhancement. First, our quality assurance specialists have expressed the need for a final testing phase on the new version of the game server before players are allowed to join. Second, we want to allow the client to complete certain activities on the same game server where they started. To facilitate these improvements, we introduced a new server status called staging. During this staging phase, access to the game server is granted to a select group: our QA specialists, for organizing final testing, and ordinary players who specify the preferred game server during the login request. This added layer of sophistication ensures both quality control and an enhanced user experience.
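Extending the previous sketch with the staging idea, the routing rule might look roughly like this: QA, or a player who explicitly names a server at login, may be sent to a staging server, while everyone else keeps landing on the live one. Again, the names and statuses are illustrative only:

```kotlin
// A minimal, hypothetical sketch of server selection once a STAGING status exists.
enum class Status { LIVE, STAGING, STOPPED }

data class Server(val name: String, val status: Status)

fun resolveServer(servers: List<Server>, isQa: Boolean, preferred: String? = null): Server {
    // A player may ask for a specific (non-stopped) server in the login request
    if (preferred != null) {
        servers.firstOrNull { it.name == preferred && it.status != Status.STOPPED }?.let { return it }
    }
    // QA specialists are allowed onto the staging server for final testing
    if (isQa) {
        servers.firstOrNull { it.status == Status.STAGING }?.let { return it }
    }
    // Everyone else goes to the live server
    return servers.first { it.status == Status.LIVE }
}

fun main() {
    val servers = listOf(Server("alpha", Status.LIVE), Server("beta", Status.STAGING))
    println(resolveServer(servers, isQa = true).name)                      // beta
    println(resolveServer(servers, isQa = false).name)                     // alpha
    println(resolveServer(servers, isQa = false, preferred = "beta").name) // beta
}
```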
This is how our enhanced game update mechanism works. As illustrated in the given example, initially alpha is live and beta is stopped, with clients connected to alpha. The first change occurs when alpha remains live but beta transitions to the staging status. This allows our QA team to perform final tests on beta while keeping the bulk of the player traffic on alpha. Once beta clears QA, it becomes live and alpha switches to staging. At this juncture, alpha sends out a broadcast reconnect event. However, if a player is engaged in an activity like a battle, the client has the option to ignore the reconnect signal and stay on alpha. Finally, when alpha transitions to the stopped status, any new login attempts are directed towards beta. This scheme offers flexibility for various update scenarios, whether we are rolling out a completely new game version or simply pushing updates to fix bugs in the existing version. It ensures both robust quality assurance and an uninterrupted gaming experience for our players.
While having multiple versions of servers running could technically allow players who haven't updated their game to continue playing, it introduces complexity: the need to maintain both forward and backward compatibility across different system components, like databases or inter-server interactions. To simplify this, we've adopted a strict versioning policy: a client of version X will only connect to a server of the same version X, and similarly for version Y. This approach eliminates the need for double work in maintaining protocol compatibility. Within the same version, server changes are permissible as long as they don't affect the client interaction protocol, giving us room for operational flexibility. As a result, the account server now needs to be aware of each game server's version, and the client is required to specify which version of the game server it wishes to connect to. This streamlines the system while allowing us ample space for ongoing improvements.
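The version rule itself is simple enough to show in a few lines. A minimal, hypothetical sketch of how the account server might filter candidates by the version the client declares at login:

```kotlin
// A minimal sketch of the strict version-matching rule: client X only ever gets a server of version X.
data class VersionedServer(val name: String, val version: String, val live: Boolean)

fun resolveByVersion(servers: List<VersionedServer>, clientVersion: String): VersionedServer? =
    servers.firstOrNull { it.live && it.version == clientVersion }

fun main() {
    val servers = listOf(
        VersionedServer("alpha", "1.0", live = true),
        VersionedServer("beta", "2.0", live = true),
    )
    println(resolveByVersion(servers, "1.0")?.name) // alpha
    println(resolveByVersion(servers, "2.0")?.name) // beta
}
```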
So version 2.0 is slated to replace the existing 1.0. Once our QA specialists have given beta the green light, we initiate what we call a soft update. During this phase, beta goes live and a fraction of players gain access to the 2.0 client via their respective app stores. If all goes well, with no critical bugs, we expand this to 100% of the player base. Contrary to the usual blue-green deployment flow, the server running the previous version doesn't initiate any reconnections when a new version is rolled out. Now, if any issues surface, we employ the blue-green deployment process to transition players to a third server, gamma, which contains the necessary fixes. Meanwhile, players on the 1.0 client can continue their session on alpha. Ultimately, we initiate a hard update, shutting down alpha and halting all 1.0 login attempts. Players are then prompted to update their clients to continue playing. This nuanced approach not only ensures a smooth transition, but also incorporates contingency plans for unexpected hiccups.
Our server update process has been streamlined to such an extent that it's entirely managed by our QA specialists using a straightforward Game Tools interface. Here is how it works in a nutshell. The account server holds its complete state in a database and is itself entirely stateless, making it highly robust and flexible. A QA specialist uses Game Tools to instruct the account server to change beta's status from stopped to staging. This is where the final checks take place, and once beta is confirmed to be live, the same QA specialist uses Game Tools to prompt alpha to send a reconnect signal to all connected clients, initiating their migration to beta. This approach offers a simplified, user-friendly method for QA specialists to manage the complex process of server updates, ensuring a seamless player experience while maintaining the integrity of our game service.
This approach not only allows us to roll out game server updates without downtime, but also enables us to quickly address game mechanics bugs and optimizations. Imagine a scenario where a critical error occurs during a particular game activity. Players aren't left stranded: they can still enjoy other aspects of the game while we rapidly deploy a fix. This ensures that their next attempt at the original game activity is likely to be error-free. Another advantage that deserves special mention is our ability to fix client-side bugs through the server. This is critical because updating the mobile client through the App Store takes time, making client-side bugs potentially more damaging than server-side ones. There have been instances where a minor adjustment to server responses effectively convinced the client to behave. While we can't always count on such fortunate outcomes, our blue-green deployment system remains our safety net even in the most unexpected of situations.
So let's draw some conclusions. In summary, the three pillars of our tech stack work in unison to create a highly effective, agile, and dependable ecosystem for both development and deployment. First, our CI/CD organization acts as the spine of our development structure. It not only integrates a large team, but also allows for seamless updates without affecting the end users. Second, our backend infrastructure is flexible and explicitly engineered for scalability, to meet the demands of a growing user base without sacrificing performance. And finally, our blue-green deployment strategy ensures zero downtime during software updates, giving our users a seamless and reliable experience. Collectively, these pillars establish a technology environment that is both state of the art and extraordinarily reliable.
Thank you for your attention and see you next time.