Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello everyone.
Thanks for joining me today at the Conf42 Kube Native 2024.
My name is Ramneet and I will be delivering a talk on leading high
impact launches, strategies for mastering critical engineering rollouts.
This talk is based on my learnings leading major initiatives such
as paid sharing at Netflix.
I will dive into the roles and responsibilities of a launch
captain and will also share specific strategies and tactics that you
can use to master highly critical launches in your own organization.
So let's get started.
Now, what do we mean by a critical launch here?
Every launch is different.
Some launches require a lot more rigor than others.
Such launches typically have some common characteristics, such as
large scale user and revenue impact.
That is, if the launch is going to affect a large proportion of your customer
base and has high revenue implications.
This would also mean that the launch could be high visibility.
That is, it is tracked for a long time and
very closely by the senior leadership, which makes it even more important
to ensure clear communication to seek inputs and share updates.
Such launches may also be a huge cross functional effort.
That is, they involve an army of cross functional teams to come together and
deliver, which brings coordination as a skill to the forefront.
Now, you may think that this can be a shared responsibility of
a couple of key stakeholders.
But that may not be the right approach and can result in diffusion of
responsibility or something known as the bystander effect, where people
are less likely to take an action where there are other people involved.
To avoid this, I would highly encourage you to consider a dedicated
role, that of a launch captain.
So let's talk about the responsibilities.
There are two primary responsibilities for a captain.
One is to ensure launch readiness and smooth execution.
And the other is to be the focal point of communication for all cross
functional partners and leadership.
Now you must be thinking: who should be the launch captain? In a team that's filled
with engineers, data scientists, designers, and other cross functional team members,
who is best equipped to execute this role?
At Netflix, we typically have senior engineers play this role.
But it can practically be anyone on the team who's close to the solution,
although there are definitely some must haves that are helpful in
ensuring your success in this role.
The first is that you should be clearly able to understand the product
feature, the intended business impact, and the engineering systems involved.
You should be able to look at the launch from different perspectives, such as that of
a user, a customer service professional, an engineer, a data scientist, etc.
You should be able to visualize the launch to identify the expected
trends and the not so expected trends, that is failures and risks.
And you should also be a great collaborator because ultimately
you would have to lead the cross functional team towards a common
goal of smooth execution and success.
Now let's go a little bit deeper into the foundation.
The way I see launch captaincy, it has three support pillars for success.
The first is strategy.
Now a launch captain starts playing an active role in the weeks leading up
to the launch. While the entire team is heads down on delivering the right solution,
making sure it's scalable, and testing the solution, the launch captain is
looking further ahead, visualizing and preparing for the launch.
As part of this, the captain drafts a launch plan that captures details
of how a launch should be executed.
The captain also identifies metrics that will provide a signal as to how
the launch is progressing.
It is always good to develop a mental model of what the metrics should look
like in case of a successful launch.
Identifying potential risks and mitigations is an important part
of this role and ensures minimal surprises on the day of the launch.
The last part is about reviewing functional and operational readiness.
Now functional readiness means the solution is as per the product
specifications and there are no blockers.
It is in your best interest as a launch captain to keep a tab on
the functional readiness and have regular checkpoints to flag risks.
Last but not the least, reviewing operational readiness.
Operational readiness is important because in a way it would determine your
effectiveness in navigating the launch.
It ensures that you have the right metrics and dashboards in place
to assess how the launch is progressing and relay the status with confidence
to a larger group and the leadership.
The other two pillars are that of collaboration and communication.
In projects such as these, there will be a huge cross functional team and
you would have to work with all of them to ensure readiness on all fronts.
For example, there could be a user facing feature where you could expect
users to reach out to customer support.
In that case, customer service, knowledge base articles and FAQs should
be updated in time for the launch.
And if the launch is sensitive in nature, you would want to very
closely coordinate these actions.
This underlines the importance of collaboration.
Under this, you would also need to identify key stakeholders and
their roles to support the launch.
The launch captain alone cannot navigate the entire launch and will always defer to
domain and system experts to weigh in, in case of issues or for validations.
The last pillar is that of communication.
As I mentioned, these launches can be closely tracked by senior leadership
and can have larger implications on other business outcomes.
That is, there are high stakes. Clear and timely communication is key in ensuring we
cover the risks and keep the group updated and empowered to help where needed.
Launch rooms for such launches can have many teams, and
navigating those discussions
while keeping the group focused on the outcome becomes extremely important.
Now, having shared all the context and responsibilities, this role
may seem daunting to some, but in the next section, I'm going to talk
about specific steps you can take to master each phase of the launch.
And we will start with pre launch.
So the first thing you can do in the pre launch phase is run readiness
checks across teams and systems.
This could be a simple asynchronous check in on Slack or any other collaboration
tool that you use where every team reports their status to indicate if
they are on track for the launch.
Teams that are not on track should be asked for the specific steps that need to
be taken or issues that need to be resolved for them to be back on track.
These blocking issues could be related to dependencies on other
teams or bandwidth constraints.
The intent here is to identify and surface the risks up front.
It also informs and empowers the larger group and the leadership to help
unblock the launch whenever needed.
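To make the readiness check in concrete, here is a minimal sketch in Python.
It is not from the talk: the TeamStatus shape, the team names, and the blocker
text are all hypothetical, and a real check in would more likely live in Slack
or a shared document than in code. The point is only that each team reports
on track or not, and that blockers and next steps for at-risk teams are
surfaced up front.

```python
from dataclasses import dataclass, field


@dataclass
class TeamStatus:
    """One team's asynchronous readiness report (hypothetical shape)."""
    team: str
    on_track: bool
    blockers: list[str] = field(default_factory=list)    # issues to resolve
    next_steps: list[str] = field(default_factory=list)  # steps to get back on track


def summarize_readiness(reports: list[TeamStatus]) -> str:
    """Build a short summary that surfaces at-risk teams and their blockers."""
    at_risk = [r for r in reports if not r.on_track]
    lines = [f"{len(reports) - len(at_risk)}/{len(reports)} teams on track"]
    for r in at_risk:
        lines.append(f"AT RISK: {r.team}")
        lines.extend(f"  blocker: {b}" for b in r.blockers)
        lines.extend(f"  next step: {s}" for s in r.next_steps)
    return "\n".join(lines)


# Hypothetical reports, for illustration only.
reports = [
    TeamStatus("payments", on_track=True),
    TeamStatus(
        "customer-support",
        on_track=False,
        blockers=["FAQ update waiting on final copy"],
        next_steps=["Get copy sign-off from product"],
    ),
]
summary = summarize_readiness(reports)
print(summary)
```

A summary like this can be shared with the larger group and the leadership so
that they know where help is needed.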
The second part is reviewing operational readiness.
As I said before, this is important to ensure you have all the right signals and
metrics in place to navigate the launch.
Identify key engineering and product stakeholders who would be the point
of contact to indicate the health and readiness of their respective teams
and domains.
These stakeholders will also play a key role in ensuring that the launch
captain has all the required metrics and signals available to monitor the launch.
Collaborate with these people to identify the relevant metrics, map
them to a launch dashboard, and reach a consensus on the expected trends.
The dashboards are important and should answer all the expected
frequently asked questions, such as how many users signed up or how many
users bought an item, depending on your use case. These launch dashboards
should have a high level view and troubleshooting views, which allow
more detailed debugging of issues.
The next part in the pre launch phase is to review the launch logistics.
And the main part of launch logistics is a launch runbook.
Launch runbook is a definitive guide or source of truth for executing the launch.
All the pre launch preparation and the work that you would do would get captured
ultimately in this launch runbook.
It includes sequence of steps needed to execute the launch,
along with all the other details.
I would highly encourage you to document
everything, from little minor steps to reminders, in the runbook.
Critical launches get intense.
So always assume you will likely not recall all the minor
details at the right time.
Add them to your runbook.
We will cover the runbook in detail in the coming slides.
Now, while the launch runbook is primarily drafted by the launch captain using
inputs from other team members, it is a good practice to review the final
runbook with the extended team and ensure alignment and a shared understanding
of the overall sequence, as well as the individual's roles and responsibilities.
The other part of launch logistics is to plan for the launch day.
You should set up calendar invites for the launch day and for the launch check
ins, clearly marking individuals who are required to be present and share the
metrics that they will be responsible for.
So let's talk a little bit about the runbook.
As we said, this is going to be the source of truth for managing the launch.
And I have added a little snippet to show you what kind
of information can go into a runbook.
So we can capture the overview of the product or the feature
that we are trying to launch.
The teams involved, along with the launch accountability matrix, which is
primarily for the operational support.
There has to be a pre launch checklist, which is all the items that need
to be executed in a specific order.
And these could be items such as a go/no-go from all the stakeholders,
configurations that need to be enabled,
systems that need to be pre scaled, and teams that need to be notified
before starting the launch.
The runbook then captures a launch day section, which talks about specific
steps that you need to execute and also how to
validate each step during the launch.
All the monitoring dashboards are captured in the runbook along with
mitigation and rollback scenarios so that you don't have to do a last
minute brainstorming on how to roll back something that has not gone as expected.
There's also a section for post launch. That is, in the post launch phase, you
will have launch check ins to capture metrics, which would go into the runbook,
along with issues that are under investigation,
tech debt, and learnings that the group would share later.
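The runbook sections described above could be sketched as a simple data
structure. This is an illustrative Python sketch, not the runbook template
shown on the slide: the class names, field names, and sample entries are
assumptions. It also shows one way to respect the ordering of the pre launch
checklist, by always returning the first item that is not yet done.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ChecklistItem:
    description: str
    owner: str
    done: bool = False


@dataclass
class Runbook:
    """Hypothetical runbook skeleton mirroring the sections from the talk."""
    overview: str
    teams: list[str]
    pre_launch_checklist: list[ChecklistItem]    # executed in this order
    launch_steps: list[tuple[str, str]]          # (step, how to validate it)
    dashboards: list[str]
    rollback_scenarios: dict[str, str]           # scenario -> mitigation
    issues_under_investigation: list[str] = field(default_factory=list)
    learnings: list[str] = field(default_factory=list)

    def next_pending_item(self) -> Optional[ChecklistItem]:
        """Return the first checklist item that is not done, or None."""
        for item in self.pre_launch_checklist:
            if not item.done:
                return item
        return None


# Sample content is made up for illustration.
runbook = Runbook(
    overview="Launch of feature X to all users",
    teams=["backend", "customer-support"],
    pre_launch_checklist=[
        ChecklistItem("Go/no-go from all stakeholders", "launch captain", done=True),
        ChecklistItem("Pre scale dependent systems", "backend"),
        ChecklistItem("Notify customer support", "launch captain"),
    ],
    launch_steps=[("Enable config for first ramp stage", "Check launch dashboard")],
    dashboards=["launch overview", "troubleshooting"],
    rollback_scenarios={"error spike": "Set ramp back to 0 percent"},
)
pending = runbook.next_pending_item()
print(pending.description if pending else "Checklist complete")
```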
Let's come to the launch day.
Now putting all this effort prior to the launch should ensure
that the launch goes smoothly.
And even if there are unexpected issues, the team is ready with
possible mitigation measures.
On the day of the launch, as a launch captain, you should use
the runbook to guide the team for all the launch proceedings.
Lead the launch room with clarity and empower the group to voice
their concern or opinions while ensuring the launch stays on track.
Now there can be scenarios where there may be conflicting opinions.
Moderate the room to ensure everyone's voice is heard while ensuring we are on
track to meet the objective for that day,
which could be, for instance, a 20 percent ramp by this particular time.
The launch captain should feel empowered to hear dissent.
And in case there is no consensus in the room, you should be the informed captain
to decide on the next steps while clearly articulating the risk and the trade offs.
Let's jump to monitoring.
Any high impact launch usually has a staged or a ramp rollout, which means we
start rolling out to a small percentage of users and then slowly increase this
percentage over a span of hours or days.
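One common way to implement a percentage ramp like this is to hash each user
into a stable bucket from 0 to 99 and include the user when the bucket is
below the current ramp percentage. The Python sketch below assumes that
approach. The talk does not say how the ramp was implemented, and the salt,
the user ids, and the ramp schedule here are hypothetical.

```python
import hashlib

# Hypothetical ramp schedule: percentage of users at each stage.
RAMP_STAGES = [1, 5, 20, 50, 100]


def bucket(user_id: str, salt: str = "example-launch") -> int:
    """Map a user to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def in_rollout(user_id: str, percent: int) -> bool:
    """A user is included when their bucket is below the ramp percentage."""
    return bucket(user_id) < percent


# Made-up user ids, for illustration only.
users = [f"user-{i}" for i in range(1000)]
for percent in RAMP_STAGES:
    included = sum(in_rollout(u, percent) for u in users)
    print(f"{percent:>3}% stage: {included} of {len(users)} sample users included")
```

Because the bucket for a user never changes, anyone included at an early stage
stays included as the percentage increases, which keeps the experience
consistent for users during the ramp.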
In the early stages of the launch, where there is less traffic, it is
always useful to spot check the first few instances of success or failures to
verify if everything is looking good.
It is good to document the success and failure metrics and also compute
the success and failure percentage.
This helps us flag any deviations as the traffic ramps up since we would expect the
success and the failure percentage to stay somewhat in the same range as we ramp up.
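As a worked example of this check, the sketch below computes the failure
percentage at an early stage and at a later stage and flags a deviation.
The counts and the tolerance of one percentage point are made up for
illustration. What counts as "somewhat in the same range" is a judgment
call for each launch.

```python
def failure_rate(successes: int, failures: int) -> float:
    """Failure percentage out of all observed attempts."""
    total = successes + failures
    if total == 0:
        raise ValueError("no traffic observed yet")
    return 100.0 * failures / total


def deviates(baseline_pct: float, current_pct: float,
             tolerance_pts: float = 1.0) -> bool:
    """True when the current rate moved more than the tolerance (in points)."""
    return abs(current_pct - baseline_pct) > tolerance_pts


# Hypothetical counts: early low-traffic stage versus a later stage.
baseline = failure_rate(successes=98, failures=2)
current = failure_rate(successes=9_400, failures=600)

if deviates(baseline, current):
    print(f"Flag for review: failure rate moved from {baseline}% to {current}%")
```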
It is awesome to see a launch go smoothly without any issues.
But that may not be the case always or often.
During the launch, each team will be monitoring their metrics.
A launch captain or any of the team members can notice an
anomaly, which can be an unexpected trend or a spike in errors.
When this is noticed, the launch captain should identify and call upon specific
teams to share their insights on the potential issue to help assess
two things, severity and impact.
The next step then is to summarize the issue for the extended group and clearly
communicate the impact and the severity, flagging whether this is a launch blocker
or a critical or major issue.
If it is a critical or a major issue,
capture it under issues under investigation in the runbook.
And you can designate a small subset of the team to follow up on this.
If it is a launch blocker and needs more thorough investigation, lead a
smaller group into a separate war room.
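The triage flow described above can be summarized in a few lines of Python.
This is only a sketch of the decision as stated in the talk. The Severity
names and the wording of the actions are illustrative, and in practice the
severity call comes from the domain experts in the room, not from code.

```python
from enum import Enum


class Severity(Enum):
    LAUNCH_BLOCKER = "launch blocker"
    CRITICAL = "critical"
    MAJOR = "major"


def next_action(severity: Severity) -> str:
    """Route an issue based on the severity agreed in the launch room."""
    if severity is Severity.LAUNCH_BLOCKER:
        return "Lead a smaller group into a separate war room"
    # Critical and major issues are tracked without stopping the launch room.
    return ("Capture under issues under investigation in the runbook "
            "and designate a small group to follow up")


for severity in Severity:
    print(f"{severity.value}: {next_action(severity)}")
```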
While the investigation is in progress, keep the group informed
on the updates and help brainstorm possible resolution mechanisms.
Concluding the launch day.
Now, hoping that the launch is all successful,
always conclude the launch day with a status update and metrics.
Call out potential risks and issues encountered and the next steps.
It is possible that some teams have different support
models in the early phases of the launch.
So always align on the support models and the points of contact
for each team in case there is any issue in the late hours of the day.
Last but not the least, the initial few days of the launch can be taxing.
Always include a thank you note and a reminder for the team to rest
and recover for the coming days.
Now let's talk about the post launch phase.
As we celebrate the launch and step into the post launch phase, we should
transition from
active to passive monitoring.
That is, let the automated alerts take over once the launch is
stable and scale down on the real time or active monitoring.
We should also use this time to shift gears and shift the team's
focus towards tuning pageable and email alerts, which are going to
be the backbone of monitoring.
We can also use this time to follow up on any critical or major issues
that were reported during the launch.
It is a good and a healthy practice to initiate retrospectives, to gather
feedback and learnings for the future.
This brings us to the end of the presentation.
Thank you everyone.
I hope this was useful for all of you.
Again, my name is Ramneet.
If you want to follow up with me and have a discussion, I'm very
happy to connect on LinkedIn.
Thank you and have a good day.