Transcript
This transcript was autogenerated. To make changes, submit a PR.
Good day. Welcome to the session on getting started with GitHub
Actions and some best practices along the way. I'm Ranjan
Mohan, the senior security engineer at my low security.
So a couple of prerequisites for this presentation would be that you
are familiar with Git and the concept
of commits, branches and remote versus local repositories.
And also familiarity with any remote git based service such
as GitHub, Bitbucket or GitLab would be
useful. Diving right into the session: what are GitHub Actions?
In an oversimplified sentence, GitHub Actions can be
described as the automation platform on GitHub.
You could use this to perform any actions related to
git repositories hosted on GitHub. And these
actions, the way they function, can be split into
three phases. Broadly, the first would be the trigger,
as in the event that causes the
action to start or to run. It could be
anything from changing a branch to opening or
managing a pull request, or even scheduling a
cron expression to run the action periodically.
And these actions run on GitHub hosted runners,
which are basically virtual machines that they host in their environment.
And at the moment there are three operating systems that are supported
by GitHub runners, and that would be macOS,
Windows and Ubuntu. And once the action
is triggered, it runs the
specified commands and code that you have set to run as a part of
the action. And the way you define an action is in
a YAML format using the syntax specified
by GitHub. And once the action runs,
you could use it to enforce status checks. And this is the phase
which distinguishes CI pipelines from
other sorts of pipelines. In case of continuous integration pipelines where
you want to ensure code quality checks or even security
scan checks, you could have GitHub actions that
could report pass or failure status and use that
to enforce PR merges or changes to a particular
branch, to ensure that you have a baseline
code quality check for all changes that go
into your repository. Now when we talk about maintaining
GitHub Actions, I have a few questions that
would help determine where and how you should maintain them.
Do you reuse the same action for multiple repositories within the
same GitHub organization? If the answer is yes,
then you would want to store the action in the .github repository
of that organization so that it's
easy to set it up in the remaining repositories
of the same organization. The only problem
this solves is the ease of setting up the action in
other repositories, but there is still a problem that
it doesn't address, which is if the original action file,
the YAML file, changes, you would need to manually propagate the changes
to all the repositories using the action. The second question
would be does your action need access to secrets,
any secrets from GitHub secrets, or any other
secret storage utility that you have? If yes,
please ensure that you do not trigger any
of these actions from any untrusted forks. I will
explain and also show you why.
Now moving on to the Git
repository. This is an organization that my friends
and I have been working on for the past three years. It's called padao.
It contains an array of Python and Java, and a few JavaScript
libraries that you could use for your day-to-day programming.
Now, as you can see, there are 30 repositories that are a part
of this organization at the moment. And for each organization
you could create a special repository called the .github repository,
which would house templates for
anything to do with the organization. For example, you could
host workflow or GitHub Actions related templates
over here. You could host the readme for the organization within
the profile folder, and you could also store issue
or pull request templates for the organization for all repositories in the
organization within the care sub folder. Now let's focus on the
workflow templates part of it. As you can see, there are
three actions over here within this folder. One would be the Maven build
YAML, another would be the Maven package publish YAML,
and another one would be the Python build YAML. We're going to look into the
Maven build YAML as an example action. And as you
can see, there are three files associated with this action. One would
be the properties JSON, which basically contains
the metadata for that particular action: the name, the description, and so
on. And then you have the scalable vector graphics (SVG) file, which is the icon
for the action, and the YAML file, which actually contains
the steps that need to be run or executed as a part of the action.
Let's take a look at the JSON file. You can see it
has some standard metadata fields such as name, description, icon name, and categories.
One interesting field over here is called file patterns,
which can be used to suggest this action
for any repository in your organization based
on any file patterns that are matched. For example,
any repository in my organization that contains a pom.xml file
will be suggested this Maven build, clean, test and verify
action in the Actions tab. We'll be looking
into it shortly. Now, going back to the workflow templates
folder, let's try to look at the action and see
what are the different parts of it. We start out
with specifying the name of the action, and this "on" section over
here is the trigger section. This is where you specify what
triggers this particular action or workflow. For
this particular action,
pushing to the default branch or opening a pull request
against the default branch are the triggers. And as you
can see, we're using a variable called default branch,
which we can only use in template actions that
are specified within the .github repository. When you
try and set up this action in any of the other repositories
within this organization, it gets replaced by
the default branch of the repository you're setting it up for.
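As a rough sketch, the trigger section described here might look like the following in a workflow template. The workflow name and file location in the comment are assumptions made for illustration; `$default-branch` is the placeholder that GitHub substitutes when the template is set up in a repository.

```yaml
# Hypothetical template file stored under workflow-templates/ in the .github repository.
name: Maven build   # assumed name, not taken from the talk

on:
  # Run when someone pushes directly to the default branch
  push:
    branches: [ $default-branch ]
  # Run when a pull request is opened against the default branch
  pull_request:
    branches: [ $default-branch ]
```

Once the template is added to a repository, the placeholder would be replaced with that repository's actual default branch name, for example `main`.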
Now moving on to jobs.
We have created a job called build and we specify that it runs
on a bunch of operating systems, which
are Ubuntu, Windows and macOS. And I've specified
the Java value, as in the JDK version, as
14.0.1. And the steps of the action, or
this particular build job, are as follows. It starts by
using a preexisting action on GitHub called checkout,
and this checks out the current repository. And then I've
specified a git command to check out any
submodules if they are present in the repository. And then I'm setting up
the JDK using another predefined or pre built action
on GitHub. And after that I'm basically
using this step to cache any maven
related dependencies that I have pulled as a part of
setting up this project. So for example, if my
Java project has, say, 15 dependencies which it needs
to pull every time it needs to run the project or test the project,
if I end up caching them, this cache can be used
by any other runs
of this GitHub action to save some time on setting up the
dependencies. That is the advantage of this particular step.
And then we do a linting check using
a tool called Checkstyle, and then a static code analysis check
using a tool called PMD. And then I run all the Java
unit tests using JUnit, and I use JaCoCo
to enforce and verify code coverage.
So as you can see, this yaml file specifies the
set of steps that I would like to be run as a part
of this action, and it is triggered whenever
someone pushes to the default branch directly, or if they open a
pull request against it.
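The build job walked through above could be sketched roughly as the `jobs` section below. This is not the speaker's actual file: the action versions, the `zulu` distribution, the cache path and key, and the exact Maven goals are all assumptions, and the sketch presumes that Checkstyle, PMD and JaCoCo are configured in the project's pom.xml.

```yaml
jobs:
  build:
    strategy:
      matrix:
        # The three operating systems mentioned in the talk
        os: [ubuntu-latest, windows-latest, macos-latest]
        java: ['14.0.1']
    runs-on: ${{ matrix.os }}
    steps:
      # Check out the current repository
      - uses: actions/checkout@v4
      # Check out any submodules, if present
      - name: Check out submodules
        run: git submodule update --init --recursive
      - name: Set up JDK
        uses: actions/setup-java@v4
        with:
          distribution: zulu   # assumption; the talk does not name a distribution
          java-version: ${{ matrix.java }}
      # Reuse downloaded Maven dependencies across runs
      - name: Cache Maven dependencies
        uses: actions/cache@v4
        with:
          path: ~/.m2/repository
          key: ${{ runner.os }}-maven-${{ hashFiles('**/pom.xml') }}
          restore-keys: ${{ runner.os }}-maven-
      - name: Lint with Checkstyle
        run: mvn checkstyle:check
      - name: Static code analysis with PMD
        run: mvn pmd:check
      # Assumes the JaCoCo coverage check is bound to the verify phase in pom.xml
      - name: Run unit tests and verify coverage
        run: mvn verify
```

Whether that exact JDK build is available depends on the chosen distribution and runner image, so the version may need adjusting in practice.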
Now that we took an example and
we went through it, let's look at another
example of a GitHub workflow within a repository
called report. So this repository called report
is a repository that I personally use to generate reports
on the padao organization. It generates the report
as a readme within the same repository.
So when we look at this particular action, you can
see that not only are the trigger points any
pushes to the main branch or pull request to the main branch,
which happens to be the default branch for this repository.
There is also a cron expression specified as
a schedule. So this ensures that this action is run
once every day. So that's the
advantage of using any cron expression as a schedule.
You could specify the action to be triggered periodically or
at a specified time, every single day or on a periodic
basis. Another thing
that I've specified as a part of the triggers for
this action is something known as workflow dispatch. And what
this lets me do is it lets me manually trigger this action.
In general, whenever you just use pull request,
push, or even schedule, there wouldn't be a
facility for you to manually trigger this action. But since I've specified workflow
dispatch, that is possible, and how we go about
doing that is we go to the actions tab, we click on the action and
then we would see this breadcrumb over
here which says it has a workflow dispatch event trigger
and it gives you an option to run this particular action
on any branch within this repository. So I
can select my branch as main and click on Run workflow,
and we should be able to see the job shortly.
The action has been queued,
and when I click on it, it's starting to set up the job.
It's running all the steps within the action file,
installing all the dependencies and this is quick because I have already cached
them and I'm retrieving them and
then I'm running the report. So after it runs the
report, it commits and pushes the report to the same repository, which you'll
be able to see, and it caches the current dependencies
in order for it to be reused later, and so on.
Perfect, looks like the job has completed successfully.
Now if I go to the code and I look at the readme.
Nice. This happens to be the latest report from today, the 26
November at 06:56 a.m. UTC time,
which is the current time. And this report generation
action also runs periodically, once every day,
and it basically displays all the repositories in this organization
along with certain information as to whether it's maintained or not, whether it is
publicly exposed or whether it's a private repo, how many open issues
are there, and so on. So feel free to use this template
to generate your own reports using GitHub actions.
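A minimal sketch of a report workflow along these lines is shown below. The workflow name, the cron time, the `generate_report.sh` script and the commit message are hypothetical placeholders; only the combination of push, pull request, schedule and workflow dispatch triggers comes from the talk.

```yaml
name: Generate organization report   # assumed name

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]
  schedule:
    - cron: '0 0 * * *'   # once a day at 00:00 UTC (assumed time)
  workflow_dispatch:       # adds the manual "Run workflow" option in the Actions tab

permissions:
  contents: write          # the job pushes the regenerated README back to the repository

jobs:
  report:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Generate report
        run: ./generate_report.sh   # hypothetical script that rewrites README.md
      - name: Commit and push the report
        # Skip on pull requests, where the checkout is not on a pushable branch
        if: github.event_name != 'pull_request'
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add README.md
          # Only commit when the report actually changed
          git diff --cached --quiet || git commit -m "Update report"
          git push
```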
Now coming back to the organization page,
one other aspect of configuring
settings for this organization would be looking
at a section known as secrets. See, whenever you run any
actions, whether you use them for CI related
goals or any other tools, you might require some sort
of secrets or credentials in order to trigger API calls or
do some work. And all those custom secrets
that you need, you would configure in GitHub secrets
and then fetch in your action. So how it goes about doing this is
whenever you configure something in the secrets section of
the organization, it can be exposed as an
environment variable within the runner running the action.
So you could just ask your script to fetch the environment variable corresponding
to the secret and use it. That's how you can use it.
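As a small sketch of this pattern: a secret is not visible to a step unless the workflow maps it explicitly, typically through the `secrets` context. The secret name `API_TOKEN`, the job name and the script below are hypothetical.

```yaml
jobs:
  call-api:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Call an API with a secret
        env:
          # Map the organization or repository secret to an environment variable
          API_TOKEN: ${{ secrets.API_TOKEN }}   # hypothetical secret name
        # Hypothetical script that reads $API_TOKEN from the environment
        run: ./call_api.sh
```

Scoping the variable to the single step that needs it, rather than to the whole job, keeps the secret out of steps that have no use for it.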
But as much as this prevents us
from checking the secret into
our repository, it also
poses one small issue. Let's take a look at this particular
PR. This particular PR is merged
into the main branch of one of the repositories within the padao
organization. So we're looking at the jpopper
repository, and the changes are coming from a user called
SIl s one 10, and they are getting
the changes from their fork containing
the branch called issue 33. So since they control
the code that they write in issue 33, they could
also modify the GitHub action to
read the secrets and expose it via
an API call to their own servers,
or just send it externally to any
other public facing service
that they own. So that could lead to compromising
secrets or any other sensitive information that you
retrieve using GitHub secrets in your action. So you need to
be wary whenever you run GitHub actions
on pull requests or changes that come from
forked repositories. And how we go about configuring
this is something that's interesting. When you go to
settings,
the actions tab lets you configure
only a restricted amount of things. It lets you say okay,
you can only allow the actions in this organization and
reusable workflows, or even actions that are created
by GitHub or verified creators. This helps to an
extent, but it still doesn't prevent people
with forked repositories from making changes and running that action
as a part of the new pull request. So we go to the organization
page and the settings section, and we
go to actions. And over
here we see a section called Fork pull request workflows
from outside collaborators, and we see three
different settings: require approval for first-time contributors
who are new to GitHub; require approval for first-time contributors,
as in those who are contributing for the first time to
any repository within this organization; and require approval for all outside
collaborators. Ideally, you would
want to require approval for all outside collaborators, and only when
the approval is successful, as in a person
has approved the pull request or the action request,
will the workflow run for
their changes. Another interesting setting
change over here is workflow permissions.
For example, in case of the report repository,
I would need read and write permissions for the GitHub action
because it writes the report back to the repository. But in case you just
need to build and you don't need to write back anything to the repository,
always, always keep read permissions as the default.
Only read permissions as the default.
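Beyond the organization setting shown in the demo, the same least-privilege idea can also be expressed in a workflow file with the `permissions` key. The fragment below is a sketch; the job name and build command are assumptions.

```yaml
# Restrict the workflow's token to read-only access to repository contents
permissions:
  contents: read

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build without writing back to the repository
        run: mvn verify   # assumed build command
```

A workflow that has to push changes, such as the report generator, would request `contents: write` instead.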
Okay, perfect. Now that we were able to look at
GitHub secrets and how they're used, and what not
to do when dealing with pull requests from forks,
let's get back to the presentation.
So as a follow up
set of links, I have specified three links that you could
use to not only set up GitHub actions for your repositories,
but also maybe look into using
it for many other programming languages such as Go,
JavaScript, and so on. So the first link over here
has information on setting up starter actions for
a variety of programming languages on GitHub, and the
second link is to the organization report that I generate.
You could always fork it, modify it as
per your needs for your organization and go on. And I've also
specified workflow templates for Java and Python repositories
that we use in our organization.
I'll specify all these links as a part
of the description section as well. Feel free
to refer to them, and feel free to post any questions or comments,
or reach out to me via LinkedIn. I'd be more than happy to connect and
answer and help in any way. Thank you for your time. Hope to
see you now in the presentation.