Transcript
Hello and welcome to this session on GitHub, where we're focusing
on fortifying your code base with GitHub specifically. There's a lot of
great features in GitHub. We just got through GitHub Universe and there's some
amazing Copilot and AI innovations there and so many features
that I think a year from now you'll wish you had started today to implement
some of these things. And I love that quote by Karen Lamb showing here as
we get started in today's session. My name is Travis and I work
as a distinguished software engineer for a company called SPS Commerce.
And you may not have heard of SPS Commerce. That's because we're a business-to-business
organization that's focused on connecting suppliers and retailers together
into a massive retail network, the world's largest retail network,
in fact. And I focus specifically there on developer experience.
And you might be asking yourself, developer experience, what does
that mean exactly? That can mean so many different things to different people.
And over the last few years, this is one of my favorite definitions that I've
seen pop up and landed on, and that's that developer
experience is the activity of studying, improving, and optimizing
how developers get their work done. So we're not interested in how a developer is
going to communicate with the HR department to change their address.
Instead, we're focusing on how we can engage with the developer and have
their user experience and their developer principles line up to form this frictionless
experience that they can use day to day to deliver code to
production, to deliver features to production. And that's
so important, especially as you think about the history of your organization, the number
of existing tools that you have that are kind of forming this different experience of
this CI/CD tool, this source control tool, this observability tool.
And we need to all bring them together to form this nice cohesive ecosystem that
allows you to have the best quality of life possible. And one of my favorite
quotes kind of describing this problem is that developers work in
rainforests, not planned gardens. This idea of
a rainforest or a jungle is that these tools have really
popped up in your organization over the last 20 years as particular needs arose,
but they haven't been curated or planned together into a cohesive ecosystem.
And so as we think about how we can
more effectively create planned gardens for our developer experience,
the reality is that there's a lot of work to do,
especially when we think about just coding alone. As an engineer
specifically or developer who's writing code to deliver to production,
your job is far more than just delivering code. In fact,
you are expected to deal with infrastructure as code, CI/CD pipelines,
dev environments, and configuration.
You also, which is really important for today's discussion,
have to deal with a plethora of supply chain, SAST, DAST, and remediation issues,
all related to security. And on top of that, or I should say
on bottom of that, you have to deal with code quality, tech debt, feature flags,
testing and of course just the overhead of the day to day
operation within an organization, whether it be meetings or management or
just other stuff. And when we examine this and we pull the
stats from Software.com, we find that developers code
on average 52 minutes a day. That's not very much. And so
we need to make those 52 minutes longer and better quality,
with a better quality of life, so you can accomplish more during that time. From a
productivity perspective, there's this quote from Software.com
CTO Mason McLeod, who says code time is often undervalued,
continually interrupted and almost wholly unmeasurable. And I definitely agree
with that, especially in my coding experience. So we need to
work to improve daily work. We need to fix bottlenecks,
we need to include more automation, we need to reduce feedback cycle
durations. Codified best practices is one of my favorites. I don't want
to have to read a whole bunch of documentation. I want it to be part
of the process that I'm working in and the tool set that I'm working with.
Effective documentation is so important; so many times we don't
even think about documentation and how important it is for it to be not just present,
but also accurate. And of course, streamlining collaboration.
And one of the key toolsets that we find in developer experience
that can impact many of these areas is GitHub. And GitHub
has had a long, interesting journey from when it started way
back as early as 2008. Right? That's when we first saw GitHub
and they were really focused on the idea of Git repository hosting:
"No longer a pain in the ass. Finally a code repository that works
as well as you do," which is incredible. At the time we were just happy
to get managed source control that worked so excellently. Quickly,
they realized what they were onto. And in 2011 we see their mission and their
focus move towards this, lowering the barriers of collaboration
by building powerful features into our products that make it easier to contribute,
which is true. We see them moving beyond just Git hosting and saying, we're going to
allow you to collaborate better. And of course, moving on
to the acquisition by Microsoft in 2018, we see the complete developer
platform: build, scale, and deliver secure software. And
if you've been paying attention, especially to GitHub Universe, there are lots of
new, exciting features that were launched even this particular month.
And so now GitHub has transitioned, as of November
2023, to the world's leading AI-powered
developer platform. And that's an exciting place to be in.
But at the same time, recognize that staying up to date with GitHub features is
almost a full-time job, it would seem. If you track the
releases per month, only going back as far as 2018, you can
see that we're getting as many as 60 to 70 feature releases
of GitHub per month. And that's such an explosion
of capabilities that it's both exciting and has
you worrying about what to focus on and what not to focus on. So I
found a lot of our teams are looking for hints at where to explore,
where to go next. So as we dive in today on fortifying your code
base, we're zoning in on how GitHub can maximize your developer productivity,
specifically with two GitHub tools.
This is important. If we look at the Gartner 2020 report,
it says that 29% of organizations were shifting towards
consolidating security vendors due to operational inefficiencies.
And we see that growing: it grew to 75%
in the same report in 2022, and I imagine in 2024 it's going
to be even more interesting on top of that. And so what is
that all about, consolidating security vendors due to operational inefficiencies?
Well, we find some answers deeper inside the Dynatrace report, focusing on
application security, where it talks about tool sprawl. And if you're
in developer experience, you know, tool sprawl is a big problem. We have so many
tools all over the place, and this comes back to that curated
garden that we want to build. It's very difficult when you have so much
individual or independent tooling and incumbents that are there. And so as
we look at this, we gauge that we're already in source control; GitHub does
so much of what we need already. What if it could do more? What can
it do for us from a security perspective, to rein in that tool sprawl
and allow us to focus on what we do best in code?
And GitHub really is, in some cases, that Swiss Army knife of
tooling. And at the same time, a lot
of the tooling it has does an incredibly great job of integrating with the ecosystem.
And so today we want to look at Dependabot which is all about transparency
and automation to keep your supply chain dependencies up to date.
And it's going to be super effective. If you haven't seen Dependabot yet,
it's going to feel like a breath of fresh air. And of course, GitHub Advanced
Security we've seen recently take a large presence
on GitHub, and it's all about the centralization and the transparency
of code security, really focusing on static code analysis and how
it can support that. And so with that, let's dive in. Let's take a
look at GitHub Dependabot. And this is all about
supply chain security. And in this particular feature,
GitHub defines it as: monitor vulnerabilities in dependencies
used in your project, and keep your dependencies up to date with Dependabot.
What does that actually mean? Don't worry, we're going to explore it. But this idea
that in all of your repositories, whether it be PyPI
packages in something like a requirements.txt, whether it be a NuGet config
for .NET, or whether it be a Maven settings.xml,
whatever you have, whatever ecosystem you're
in, you have a number of dependencies. You rely on abstractions that
are really important, but keeping them up to date can feel like a
nightmare, right? But if we look at the Mend.io 2021
report, it says that over 90% of CVEs aren't present in the most recent
dependency versions. That's incredible. That means that the single best
security practice that you can do in terms of consuming external supply
chain security is to just keep your packages up to date all the time.
Just use the latest and you're going to save yourself a lot of pain.
And I like to think about this as Mend.io describes it, which is kind
of like going to the dentist. If you only update your dependencies every five
years, it's going to be painful, right? It's really going to hurt. But if you're
doing it every month or continually every week, it becomes second nature.
It's a simple best practice, right? Just as we think about CI/CD
and doing that more often. And so we'll dive into three components of Dependabot:
alerts, security updates, and version updates.
All right, so first bit of an overview. If you go into
your GitHub, you're going to need admin access to your repository and you'll be able
to find this security section that we'll be exploring today, which is code
security and analysis. And it's got a dependency graph
present. And dependency graph has been around a long time in GitHub and basically maps
all of these supply chain dependencies. So that way you can generate a pretty clear
software bill of materials, or SBOM. And turning that on
is free and cheap and easy, and there's no reason you shouldn't use your dependency
graph. And once you have that data set enabled, then you
can begin to take advantage of the Dependabot features that we just introduced,
and there you'll be able to then drill in. You can
see your dependency graph where you can actually take a look at all the packages
in your repo or better yet, see what dependencies are used across your entire organization
as a part of that SBOM. And when you drill into it,
then you'll be able to look at your Dependabot alerts. And so by enabling
the Dependabot alerts, we can very quickly say: well, here's my dependency graph,
but highlight for me the things that are critical or high concerns
related to CVEs that are out there. And you get that as a part of
your security tab that you can see here. And on that security tab you can
drill in and check out the individual details of each and every one of these.
And there's no other infrastructure you have to turn on for this, you just simply
have to enable the feature. Once it's enabled,
you'll be able to drill in. And from here you can do a couple of
things. First, what's pretty neat is you can actually create a security update
immediately from this particular issue, and it's going to create
a pull request on your repository for you. If you decide that
this isn't a fix that you need to make, or perhaps the surface area of
this particular CVE doesn't affect the way that you're using it, well, you can easily
dismiss it. And there's plenty of workflow options that allow you to track and
see why certain things were dismissed over time. And so you also
have the option in your organizational settings to turn
on this capability across the entire organization. You can enable and disable
it all from there as an administrator and an org owner.
However, a word of warning as you begin to turn on and play with these
features, especially the ones that actually create pull requests,
and that's the security updates. Alerts just tell me about
a problem; security updates actually submit pull requests when there's a security
concern. Before enabling security
updates for everyone, keep in mind that if you have 3,000 repos in your
organization, you're about to turn that on across the board, and
each one of those may submit a pull request, which in turn will submit
a status check related to your build provider, and all of a
sudden you're about to kick off a plethora of builds that's really going to jam
up that queue, I think. So just be careful as you think about organizational
rollout, but it does seem pretty trivial and easy
to do. You can also find views at that level
about who has it enabled, who has alerts enabled versus security updates,
and how many of your repos are protected.
Version updates take us to the next level; then we say, I don't just want security updates,
actually give me updates for all packages that are out there, any package that I
have in my ecosystem, and I'm a big fan of using version updates across the
board. And GitHub defines version updates as automated pull
requests that keep your dependencies updated even when they don't have
any vulnerabilities. And so you can see here an example of
a pull request that's been created that clearly outlines
an update that I'm making for this particular package, and has release notes and
commit information available to you, as well as labels that are there. And the
supported ecosystems are pretty substantial here. I think you'll find that
a lot of the core languages that you work with will be supported, whether it
be Go, Maven, Gradle, npm, NuGet,
pip, Elm, even some interesting ones that you might not
have thought of, like Docker, for example, or Terraform modules, or even
Git submodules or GitHub Actions; they can all be updated.
If you're specifying a Dockerfile and it uses semantic versioning,
you can automatically have that FROM statement updated as a part of Dependabot.
And a little bit on my wish list is that Helm charts could be part of
that too, but maybe we'll see that in the future. It does support
private feeds as well, so you likely have internal packages
that are part of your organization, and you can include those here as a
part of it too, and organizationally configure secrets that
would allow private access to a JFrog feed, for example.
You can specify an update schedule, which is important because you don't always just want
to update in real time. Sometimes you want that to happen on a regular cadence.
You also have metadata configuration, and we'll talk about the metadata configuration options
in a second. And we have behavioral configuration, and we'll see that
too. So as we begin to explore, you'll find that that dependency graph now is
going to be populated. And as a part of that, here's where you can generate
that SBOM that we talked about. And 83% of security teams don't
have access to a fully accurate SBOM in real time, which is crazy, considering
you can have that for free here. You can hit the
check for updates button and look for updates anytime that you need to and
process through that. All right, so moving on to configuration. Now, version updates
are not configured through the UI like the rest of the Dependabot capabilities
were. Version updates actually move into source
control, and you configure them the way you'd expect, with a YAML file.
So you're going to create a YAML file called dependabot.yml,
and you're going to place that under the .github metadata folder that exists
in your repository. Then we're going to specify version two, because Dependabot comes
from a previous preview that had a different schema. So we're just specifying the version
of schema we want to use, followed then by a series of registries.
These could be private registries inside your organization that you want to make use of.
In this case, I'm going to use a private NuGet feed that's attached to Azure
DevOps. And you can see here that I can tokenize and use secrets
that are pulled from the organizational level, which is great. It means I can use
this configuration across many repositories.
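As a rough sketch, that registries block might look something like the following; the feed URL, account, and secret name here are illustrative assumptions rather than the exact values from the demo:

```yaml
# .github/dependabot.yml (partial) - private registry definition; names are illustrative
version: 2
registries:
  nuget-azure-devops:
    type: nuget-feed
    url: https://pkgs.dev.azure.com/my-org/_packaging/my-feed/nuget/v3/index.json
    username: svc-dependabot@example.com        # hypothetical service account
    password: ${{ secrets.AZURE_DEVOPS_PAT }}   # pulled from organization-level Dependabot secrets
```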
And now I'm going to indicate the ecosystems I want to update and the directories
for those. So if you have a monorepo, you can specify multiple ecosystems
in a single file and specify just one if you need. And you can set
that schedule here in the interval of how often you want to update. You can
also have several other options around open pull request limits. In this case,
I'm going to say I don't want any more than ten pull requests ever at
a time. You can also include additional metadata around custom labels,
assignees, reviewers, commit messages, lots of information
you can explore for how you want to customize and piece together your workflow for
how it creates pull requests. What's neat though, is that you
have the ability to ignore certain dependencies. In many cases you
have some of your capabilities, or I should say some of your
packages are updated in like a nightly build, and you might retrieve those far
more often than you want. An example of this that I've seen is like AWS
SDK seems to have almost a build every single day for
some of them. And well, I want that build. I want to get updated.
Boy, I don't necessarily want to worry about it every single day,
maybe once a week or whatever that cadence is. You can ignore certain types
of updates, and you can also ignore in some cases, if you're not ready
to make a major upgrade to your system, ignore major version numbers or patch version
numbers, depending on what you want. One of the largest additions
that makes Dependabot so much better now than it was a few
months ago is the ability to handle grouped
pull requests. And by that I mean it will actually group several changes
or several package updates into a single PR.
And that's essential, because generating ten separate pull requests causes a lot
of problems, a lot of noise. In some cases, the granularity is
so small that updating one package causes another one to break, and you'll never
get both of those to pass your status checks as it creates those pull requests
in GitHub for you, requiring some manual intervention and moving between branches
in order to figure it out. And so this is why grouped updates allow
us to say, hey, take all of those test dependencies and squash
them together into one pull request. Take those core dependencies and those
packages that rely on each other. Make sure they're together in one pull request.
Take all of those AWS updates and make sure they're in one pull request
together, not individual ones. And this is pretty essential,
I think, for the effectiveness and the productivity
of Dependabot. And so if you've come from Dependabot years ago and you
thought it's too noisy for me, try it again, because this is a big difference
that's enabled and available now. So custom
groups are awesome. I can add those. I can add exclude patterns per group
so I can say include all of these, except these. You can also
do a catch-all, where you could actually say I want all my dependencies
in one easy pull request. And that makes it nice and easy to validate
and merge when it's successful. But what about when it's not successful?
Then you have to try and filter through and understand exactly which update failed.
So there can be some good and some bad with that. It also supports dependency
types as well. So you can say, hey, I want all of my production dependencies
or development dependencies if your ecosystem supports
that. And of course you can do other update
types to say, I actually only want minor or patch version updates,
don't give me major version updates. Those are something that I need to plan for;
I can't just have PRs sitting open for them.
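Pulling those pieces together, here's a rough sketch of what that update configuration might look like in dependabot.yml; the package names, patterns, and cadence are illustrative assumptions, not the exact config from the demo:

```yaml
# .github/dependabot.yml (partial) - update rules; package names and patterns are illustrative
version: 2
updates:
  - package-ecosystem: "nuget"
    directory: "/"                       # monorepos can add more entries per directory
    registries:
      - nuget-azure-devops               # the private feed defined under `registries`
    schedule:
      interval: "weekly"                 # a regular cadence instead of real-time updates
    open-pull-requests-limit: 10         # never more than ten open PRs at a time
    labels:
      - "dependencies"
    ignore:
      - dependency-name: "AWSSDK.*"      # skip the near-daily AWS SDK patch releases
        update-types: ["version-update:semver-patch"]
      - dependency-name: "SomeFramework" # hypothetical: not ready for the next major yet
        update-types: ["version-update:semver-major"]
    groups:
      test-dependencies:                 # squash test packages into one PR
        patterns:
          - "xunit*"
          - "Moq"
      aws:                               # keep all AWS updates together in one PR
        patterns:
          - "AWSSDK.*"
```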
The usage of Dependabot with grouped updates, and updates in general, is critical. I know,
at SPS Commerce, one of the key use cases that we have as well is
inner source distribution, really focusing on velocity.
And so internally when you're setting up a new library and you're distributing it and
your applications are consuming it, typically the only reason
these applications are going to update a version number without something
like Dependabot is because they did an initial install, they're doing
a major upgrade, or they need a feature that's actually part of that release
and they've been following it. Otherwise, the only way you're going to get upgrades is
through Dependabot. And so if you're interested in that at all, feel free to
check out another session I have at other conferences
called Compelling Code Reuse in the Enterprise. You can feel free to Google
that and find it online as well. But this is essential to enabling
inner source distribution and velocity. And you can filter your
updates in Dependabot by using the allow tag and
saying I actually only want this individual dependency to be updated. And so if you're
not going to use it for the rest, at least use it for your internal
organizational velocity.
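As a minimal sketch of that allow filter (the internal package prefix is a hypothetical example):

```yaml
# Only raise version-update PRs for our own internal packages; the prefix is hypothetical
updates:
  - package-ecosystem: "nuget"
    directory: "/"
    schedule:
      interval: "daily"
    allow:
      - dependency-name: "MyCompany.*"
```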
And so with that, a couple of thoughts. Some pitfalls. If you're not
using grouped updates, you need to be, because that is a big difference here that
makes it go ten times further. There's no auto merge capability.
So assuming your checks pass and everything's good, there's no ability
to merge it in without some additional extensions or using GitHub Actions
to accomplish that (there's a common workaround sketched below). And I would love a feature here that allowed us
to look at the package maturity or the package age and say,
I only want to include updates for packages that are x number of days old.
I want someone else to go through the process of finding those particular bugs and
kind of have a pre baked period for that.
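On that auto-merge gap, here's a sketch of the common GitHub Actions workaround; it assumes auto-merge is enabled on the repository and that branch protection requires your status checks to pass before the merge actually happens:

```yaml
# .github/workflows/dependabot-automerge.yml - illustrative workaround, not a built-in feature
name: Dependabot auto-merge
on: pull_request

permissions:
  contents: write
  pull-requests: write

jobs:
  automerge:
    if: github.actor == 'dependabot[bot]'
    runs-on: ubuntu-latest
    steps:
      - name: Enable auto-merge so the PR lands once required checks pass
        run: gh pr merge --auto --squash "$PR_URL"
        env:
          PR_URL: ${{ github.event.pull_request.html_url }}
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```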
There are alternatives. If you're not in the GitHub ecosystem and you're really
liking this, one alternative out there, it's kind of deprecated now, is NuKeeper.
It was kind of NuGet-specific, but it had just a ton of features
and was really before its time. And a more popular one
would be Renovate, which you can make use of; Renovate is cross-platform
and provides a lot of the same functionality, if not even more capabilities
in some cases. Merge queues: if you're using
merge queues, which is a brand new GitHub feature as well, we don't have time
to cover that today. But you can actually integrate and use merge queues along with
Dependabot to try and get some of that grouped-update effect, and kind
of throttle some of those deploys a little bit, so that you can group
a number of merged Dependabot updates all at the same time. And
custom dependencies: looking at this,
trying to understand your dependency chain, what's proprietary
and what's internal, can be helpful, but can also be really
problematic as well. And of course, from a security governance
perspective, enable those defaults, get your dependency graphs on, get your
alerts on, and have access to that SBOM,
and begin to assess what your organizational perspective
looks like from a security standpoint. And you'll be able to actually see
who's using some of the packages you maybe thought were a little bit funny.
So with that, I want to move on to GitHub Advanced Security.
And while Dependabot was all about supply chain, kind of
scanning other people's code and consuming other people's code, GitHub Advanced
Security is a feature that is all about thinking about the practices around your
own code security: so now, the code that we actually write. And that's why
it pairs very well. And going back to our introduction,
you'll recall that we talked a lot about this tool sprawl and
team silos. And Dependabot is great,
but it doesn't necessarily allow you to hook in with other tools. What we're going
to find is that GitHub Advanced Security provides a centralization,
a mechanism for visibility of not just information that
we're seeing related to GitHub itself that is generated, but how we
can integrate other tools into the same interface as well,
which is a massive advantage compared to what we're seeing elsewhere.
And so we want to do a little bit of an overview. We want to
check out code scanning, and we want to then separately check out CodeQL,
which is going to interact with code scanning to provide some static analysis as a
part of that centralization. And as we get started, we'll see
a couple of other components here with GitHub Advanced Security as well.
First, you're going to be in the same security settings section that we were
in before for Dependabot, but you're going to scroll down the page a little more,
and you're going to find GitHub Advanced Security in there. It's got
these two sections that you can enable here;
enabling them gives you access to code scanning
and secret scanning. Code scanning
is basically what we're going to focus on more in a minute. But to give
you a preview of secret scanning, we'll see that too.
And that's where we can receive alerts, or even block commits to your
repository that it thinks contain secrets.
For GitHub Advanced Security, it's important you know that this is a paid portion
of the ecosystem. And so depending on whether you're a public repo, or
you're an enterprise, or what your on-premise implementation
is, you'll have to look at the licensing for this. And the licensing
is a bit odd, mind you: it's actually one license per user for every
active committer, meaning anyone who committed in the last 90 days on your particular repository.
And once you're licensed in that organization, then you don't take up a license in
another repository that's there. So just be mindful of that.
But as we dive into secret scanning, I think you'll find that it's interesting to
see that push protection, when it went
generally available for public repos, blocked over 17,000
credentials in one year, which is incredible. And so enabling
secret scanning is a no brainer. If you have the license, you're going to want
to turn that on and you can verify then if a secret is valid or
not, as well. So as it detects a secret inside your
repository or the code that you're committing, it can actually go and verify that
with providers. So think about AWS: taking those
particular credentials and seeing that not only did I find credentials that match a
pattern, but I've actually validated these credentials are real and they work.
That's obviously going to raise a much larger security risk
than invalid credentials or credentials that don't match a particular pattern.
And so as we take a look at this and we're thinking about the
number of blocked credentials in a year, think about the impact this can have on
your organization. I'm sure your security team would love that. And in
addition, you can also add custom patterns that you can see there in the background.
You can turn on push protection, so as someone commits, don't even let them
commit; they're going to see this message here instead that says, hey, I see a
secret in your code based on this custom
pattern, or based on our standardized patterns. You might
internally, for example, have your own implementation of a token, and you can codify those
patterns across the organization and include them. But better yet,
if you're following GitHub Universe, we saw that GitHub Copilot, which is
basically finding its integration into everything we do in GitHub,
has the ability to auto detect passwords based on the context and
information around it. So that's exciting to see that being even more effective
for detecting credentials even without custom patterns in place. So that's
great, but let's dive into code scanning. Secret scanning is a no brainer.
Turn that on. If you have a license, there's no reason not to. But code
scanning has a lot more interesting architecture and details
that we need to think about. First of all, recognize that code scanning
allows me to include a number of tools. And so you can see here,
the first thing it asks is: well, what tools would you like to turn on that
can contribute to code scanning by detecting anomalies
and coding errors? So first is the first-class citizen, CodeQL.
CodeQL was a purchased product, or I should say an acquisition by GitHub;
the product was originally Semmle, and now they've integrated that capability
first class, with an integrated CLI that can upload
directly to the code scanning capability here. So you
can go ahead and hit the setup option. And this setup option here is
essentially going to create a GitHub Actions workflow for you, ready to go,
that can execute on your repository. And of course you can explore other workflows
and pull those up.
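Roughly, the generated workflow looks something like this simplified sketch; the language and query suite are illustrative choices, not the exact file GitHub produces:

```yaml
# .github/workflows/codeql.yml - simplified sketch of a CodeQL code scanning workflow
name: CodeQL
on:
  push:
    branches: [ main ]
  pull_request:

permissions:
  security-events: write            # required to upload results to code scanning

jobs:
  analyze:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: github/codeql-action/init@v3
        with:
          languages: csharp                 # illustrative; pick your repository's languages
          queries: security-and-quality     # include code-quality queries, not just security
      - uses: github/codeql-action/autobuild@v3   # or replace with your own build steps
      - uses: github/codeql-action/analyze@v3     # runs the queries and uploads the SARIF
```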
We'll just shelve the idea of CodeQL here for a second now, and we'll talk about the interface that code scanning provides
that any tool can contribute to. First, here is the
interface. It looks a lot like Dependabot. In fact, you'll see when I go
to the security tab and I scroll down, there's the Dependabot section for
vulnerability alerts, and right below that is code scanning. And you also
see there's a secret scanning section. So it's all very nicely outlined on where
you find your alerts on different components. And here under code scanning,
then you get the same classic view the GitHub provides.
Here's a list of the different warnings or critical items or even notes
that we've detected related to your code specifically.
Drilling into one of those then gives you the nice view that you can see
exactly what happened. In this case, it's calling out a generic catch clause,
indicating that you probably should be more specific in your exceptions and not just
catch everything generically. And of course you still have your workflow on the right. You can
see there where you can dismiss a particular code scanning item and say, I'm not
going to fix this, or this is actually just used in tests,
it's not production code, so I'm not going to worry about it. And that information
again is just part of the workflow that's tracked, so you can see who
dismissed something and the reasoning why, with a bit of a description.
And what makes code scanning so great? Not just the centralization
of it, but the fact that it executes on your pull requests.
And so when you're configuring code scanning in the security section, you're going to have
this option to say, what's your pull request check failure? Do I
want to fail pull requests if code scanning detects an error? Probably,
I think so. The best thing that we can do is to shift this left
as far as we can, meaning for engineers and developers, the best experience is
I'm submitting a pull request. I'm going to have other people look at and make
comments on the pull request. Why not have code scanning automatically do that
as well, and reject or fail the status check?
That's exactly where I'm already working; that's the zone I'm in. And so we
can configure the level of failure that we want. We can also configure
a status check here to actually bubble up as a first class citizen.
So you can see that check and see whether it's passing or failing.
But the best part about code scanning on pull requests
is that it actually creates an annotation on your code as well. So just like
any other reviewer, you get that right on your code, only for
the code you changed. You're not actually going to see this for all errors in
your system; that wouldn't make it easy for you to get a pull request in.
You need kind of a baseline to start from. But code scanning by default will
only block you if you're introducing a
net-new item in the code that you've changed.
And so in this case, here's a warning saying I have a useless local variable
and I've also configured it to give me code quality warnings. I don't just care
about security-related information; give me some obvious things like unused
variables, because I can just clean up my code too.
Once you've worked with it in a pull request like this, it's so nice:
it takes away some of that manual effort where maybe an
individual contributor would have come in, reviewed this, and called out some of those
things. I can have all those obvious things fixed, and all the
security problems fixed, before a reviewer even gets to my code.
And so in my mind, I love what Mike Lyman from Synopsys says.
He says it makes no more sense to write code without code scanning tools
than it does to write a paper without spell check. Just like we're all using
AI now to help us as well; the difference with something like AI
and Copilot is that it still has the
potential to write security problems too, because it's trained on our code
bases. So you're going to want to continue to scan all of
your code, no matter where it was generated or who created it.
And so for me, this is fantastic. Correlating alerts from different
tools is labor intensive with many false positives. But now
if I can shift this left as far as possible to the pull request workflow,
this is a huge key in ensuring that these things are fixed before they
even get introduced. And on top of that, with GitHub Copilot
and where it's going to take us, they've introduced the ability to autofix,
meaning that right on the pull request, when I have something like a useless
assignment to a variable, I can just hit the autofix button and have it clean
that up for me, making me one step faster on some
of those tedious things that are maybe obvious. But as we
dive in more to this idea of what is code scanning and what is CodeQL,
it might not be entirely separated for you yet. And so I
want to just discuss the differences and where those barriers are a little
bit. Code scanning is the framework, right? It sits on GitHub. It acts
as a user interface that we can interact with that provides alerts and capabilities
that are tracked across the GitHub ecosystem. And you as an engineer,
a developer, an operator, interact with those, whether at a specific repo or
at an aggregated level in your organization.
But CodeQL and the rest of these tools sit outside of that. We choose when
we want to run CodeQL, formerly Semmle,
or any of these other great tools that are out there, whether you're using Sonatype
or 42Crunch or Checkmarx; all of them can also contribute
and upload information to code scanning, meaning that now I can begin
to pick and choose, and use CodeQL for code scanning,
but I can use 42Crunch to also submit security analysis on
an OpenAPI design, or I can use another one of these providers
to submit information to code scanning about
infrastructure as code related concerns. So you can explore just
a ton of those other options. When I took this screenshot, there were 67.
I'm sure there's a lot more now, but essentially we get code
security analysis, and that's given to us from CodeQL. That's free.
We get code quality analysis, meaning I've enabled queries not just for
security, but also for those unused local variables and the other gotchas that I want
to call out. It is database driven. So CodeQL
is specifically going to create a database and index all your code locally, and then
you'll fire queries against it. That's how it operates. But the
queries that it runs are also open source queries that you can find on GitHub
today. You can take a look at them and understand completely what kind of things
it's searching for in the code. And you're going to find that CodeQL is pretty
well adopted across a ton of languages in the GitHub ecosystem, and these are definitely
all the core languages that we use at SPS Commerce, so that makes a lot
of sense. But the key is that whatever tool you're
using, each of them is going to kind of execute differently, and you'll have to
investigate and explore that and figure out how you're going to then upload information
into the code scanning framework. Here's how it works for CodeQL specifically.
You'll have a GitHub repository, and you'll have a
database create option: you're going to call codeql database create.
You can say, here's my language, here is the database that
I want to create, and it's going to go and index against the
repository that you give it. And you can specify
custom build commands that you want, or many other overrides here.
It's going to look at the CodeQL query packs and queries that exist
on GitHub today. CodeQL is going to create that database,
and then we're going to specify the query packs that we want to use,
a QLS query suite, along with the database that was created, and
we're going to say, create a SARIF file from this. So now it's basically taking
the database, taking the queries, and executing all those commands.
The output of that, then, is a SARIF file. And a
SARIF file, if you're not familiar with that, is the Static Analysis Results Interchange
Format. It streamlines how static analysis tools share results; it's
essentially a generic JSON schema. And so you can
follow that schema by creating your own tools and uploading to code scanning, or use
many of the existing tools that follow that format and upload to it.
Now, of course, with the tight integration that we see between CodeQL
and GitHub, the CodeQL CLI comes built in with a codeql
github upload-results command, which hits an API endpoint that I can
pass the SARIF file to for that particular repository, and that's it:
it's submitted to code scanning pretty easily. And you can submit multiple
configurations to that, so different subdirectories, different tools,
they can all contribute and create this suite of capabilities that you're
analyzing against your codebase.
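As a sketch of that manual CLI path, shown here as GitHub Actions job steps to stay consistent with the earlier examples; the paths, language, build command, and suite reference are illustrative assumptions, and it assumes the CodeQL CLI is available on the PATH:

```yaml
# Illustrative job steps driving the CodeQL CLI by hand and uploading the SARIF output
steps:
  - uses: actions/checkout@v4
  - name: Create the CodeQL database (indexes the repository's code)
    run: codeql database create ./codeql-db --language=csharp --command="dotnet build"
  - name: Run a query suite against the database and write a SARIF file
    run: |
      codeql database analyze ./codeql-db \
        codeql/csharp-queries:codeql-suites/csharp-security-and-quality.qls \
        --format=sarif-latest --output=results.sarif
  - name: Upload the SARIF file to code scanning
    env:
      GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    run: |
      codeql github upload-results \
        --repository="$GITHUB_REPOSITORY" \
        --ref="$GITHUB_REF" \
        --commit="$GITHUB_SHA" \
        --sarif=results.sarif
```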
You might be asking, what is a CodeQL query exactly? I'm no expert on CodeQL queries. I'm still
learning as well. But think of it as a standard kind of SQL-like
query language that you can drive, where you're importing libraries
and using a from statement, a where statement, and a select statement. Here's an
example of how you can really simply find an empty if statement
and then go ahead and write that as a custom
query. And so there's lots of tutorials you can find online about that for
writing custom queries. You can also define custom query packs, meaning I can
configure the exact set of queries that I want to use in a YAML file
and then provide that to the CodeQL CLI as well to really fine-tune
it. And queries also come in query suites, and you can create your
own suites internally for your organization: figure out what makes sense for you
and pull those together.
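A minimal sketch of what such a query suite (.qls) file can look like; the pack, suite, and directory names are illustrative assumptions:

```yaml
# my-org-csharp.qls - illustrative custom query suite
- description: "Org-tuned C# suite"
- import: codeql-suites/csharp-security-and-quality.qls   # start from the maintained suite
  from: codeql/csharp-queries
- queries: custom-queries                                  # add our own queries from this directory
- exclude:
    precision:
      - low                                                # drop noisy low-precision results
```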
There is a VS Code extension that can make that easy, but you can see that, generally speaking,
the CodeQL repository itself, where the open source maintained queries
are, is fairly popular, updated fairly regularly, and is, in
my opinion, maintained by lots of great experts. And so I'm
glad to be able to pull in what they're doing, but also augment it with
some of the small minute things I might want to add.
So Advanced Security provides a ton of stuff, but there can be a
high setup cost, and that depends: are you using GitHub Actions? It can be easy
to set up. But do you have specific dependencies?
Do you have specific requirements in order to build that now need
to be integrated with it? It will take a little bit of architectural understanding
in order to put that together, but in some cases it's as simple as running
the CLI tool, understanding your build command, and away you go.
Dynatrace says 62% of organizations use four or more solutions.
Well, I'm really glad that this is a simple integrated experience. This is one
final solution that we can put a lot of backing behind and
see it in one central pane of glass. It is remote only,
and that's something to consider. A lot of our teams have asked about it: well,
I want some of that analysis done in my IDE locally.
And you can see that information in your IDE when
it pulls it from GitHub, and you can see it locally and highlighted in your
code, but it's not generated locally. It has to be done on the server, or
you have to do it as part of the CodeQL CLI commands.
And that can take three, four, five minutes. So this is not something
that is comparable to linting in real time, where you'll get those results right there;
and GitHub has indicated that's not their intention either. So you might want
to look elsewhere for some of those easier linting problems that you're solving.
And of course, the VS Code extension helps you pull down that information and see
it. The pull request workflow is fantastic; we all use that workflow
at our organization. And if you do too, this is a great place where you can
put it in organizationally from a governance perspective and begin to rally
around it. Depending on where you're at and what
your investment in GitHub is, the cost can be significant, but we've
found it to be actually significantly lower than some of the other comparables
and some of the other tools out there that would do something similar. So there's
a really nice blend of capabilities in getting code scanning and then
using CodeQL as part of that for free.
As we said, you can write custom queries, you can bring your own. I'm really
looking for custom queries that we can write on YAML and JSON and basically
unsupported languages even, so I can detect other things, other linting warnings,
and other kinds of organizational problems in our code bases
that we're seeing. But the complexity of writing custom queries does take
a little bit of onboarding experience and knowledge to get started with. So it's not
the simplest. And in terms of interoperability, it's a
huge win here: an ecosystem of tools in the standard SARIF format, where you can
even build your own integration, is the win that you're
looking for, I believe, and this is what we're looking for in terms of building
our ecosystem of security tools together. So that's all
the time that we have for today. Thanks for checking out this talk on fortifying
your codebase with GitHub. I hope these two tools are something you're able
to take advantage of, especially Dependabot; that one's really easy to get started with.
Code security is a little bit more involved, but not that
difficult either, especially if you're already on GitHub actions.
And at the end of the day, this comes down to the quote that we
started with, which is that developers work in rainforests, not planned gardens. And so if
we can bring the GitHub ecosystem a little bit more to being that planned garden
for engineers, let's give them that quality of life and let's continue to
work towards this centralized ecosystem and this single pane of glass.
So, thanks all, and we'll catch you at another talk.