Conf42 DevSecOps 2023 - Online

Fortifying Your Codebase with GitHub

Abstract

At a time when security tool sprawl is out of control, centralizing your security processes becomes not just a convenience but a necessity. GitHub’s integrated toolset not only bolsters security but also significantly enhances the developer experience.

Summary

  • This session focuses on fortifying your codebase with GitHub, viewed through the lens of developer experience. Effective documentation matters, and developers code on average only 52 minutes a day. We need to make those 52 minutes longer and better.
  • GitHub Dependabot monitors vulnerabilities in the dependencies used in your project. Over 90% of CVEs are not present in the most recent dependency versions, so the single best security practice is to keep your packages up to date all the time.
  • GitHub has a dependency graph that maps all of these supply chain dependencies. By enabling Dependabot alerts, you can see which vulnerable dependencies are used across your entire organization. A word of warning applies as you turn these features on: security updates open pull requests automatically.
  • Dependabot version updates move configuration into source control. One of the largest additions is the ability to handle grouped pull requests, which is essential to Dependabot's effectiveness and productivity.
  • GitHub Advanced Security is all about the practices around the security of your own code. It gives you access to code scanning and secret scanning. Be aware that this is a paid portion of the ecosystem.
  • Code scanning allows you to plug in a number of tools and executes on your pull requests. The best part is that it annotates your changed code directly.
  • Code scanning is the framework: it acts as a user interface that tools contribute to. The queries CodeQL runs are open source and available on GitHub today. Whatever tool you use, each executes differently before uploading results.
  • That's all the time we have for today. Thanks for checking out this talk on fortifying your codebase with GitHub. I hope these two tools are something you're able to take advantage of. Let's continue working towards a centralized ecosystem.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello and welcome to this session on GitHub, where we're focusing on fortifying your codebase with GitHub specifically. There are a lot of great features in GitHub. We just got through GitHub Universe, where there were some amazing Copilot and AI innovations, and so many features that I think a year from now you'll wish you had started implementing today. I love that quote by Karen Lamb showing here as we get started in today's session. My name is Travis and I work as a distinguished software engineer for a company called SPS Commerce. You may not have heard of SPS Commerce; that's because we're a business-to-business organization focused on connecting suppliers and retailers into a massive retail network, the world's largest retail network, in fact. I focus there specifically on developer experience. And you might be asking yourself: developer experience, what does that mean exactly? It can mean so many different things to different people. Over the last few years, this is one of my favorite definitions to land on: developer experience is the activity of studying, improving and optimizing how developers get their work done. So we're not interested in how a developer communicates with the HR department to change their address. Instead, we're focusing on how we can engage with developers so that their user experience and their developer principles line up to form a frictionless experience they can use day to day to deliver code and features to production. That's so important, especially as you think about the history of your organization and the number of existing tools that each form their own experience: this CI/CD tool, this source control tool, this observability tool. We need to bring them all together into one cohesive ecosystem that gives you the best quality of life possible. One of my favorite quotes describing this problem is that developers work in rainforests, not planned gardens. The idea of a rainforest, or a jungle, is that these tools have popped up in your organization over the last 20 years whenever there was a particular need, but nobody curated or planned what the ecosystem should look like. So as we think about how to more effectively create planned gardens for our developer experience, the reality is that there's a lot of work to do, especially when we think about coding alone. As an engineer or developer writing code to deliver to production, your job is far more than just delivering code. You're expected to deal with infrastructure as code, CI/CD pipelines, dev environments and configuration. You also, and this is really important for today's discussion, have to deal with a plethora of supply chain, SAST, DAST and remediation issues, all related to security. And on top of that, or I should say on the bottom of that, you have to deal with code quality, tech debt, feature flags and testing, and of course the overhead of day-to-day operation within an organization, whether that's meetings, management or just other stuff. And when we examine this and pull the stats from Software.com, we find that developers code on average 52 minutes a day. That's not very much. So we need to make those 52 minutes longer and better, with better quality of life, so you can accomplish more during them.
From a productivity perspective, there's this quote from Software.com CTO Mason McLeod, who says code time is often undervalued, continually interrupted and almost wholly unmeasurable. And I definitely agree with that, especially in my own coding experience. So we need to work to improve daily work. We need to fix bottlenecks, include more automation, and reduce feedback cycle durations. Codified best practices is one of my favorites: I don't want to have to read a whole bunch of documentation; I want it to be part of the process I'm working in and the toolset I'm working with. Effective documentation is so important; so many times we don't even think about how important it is for documentation to be not just present but accurate, and of course to streamline collaboration. And one of the key toolsets we find in developer experience that can impact many of these areas is GitHub. GitHub has had a long, interesting journey from when it started way back in 2008. That's when we first saw GitHub, and they were really focused on the idea of Git repository hosting: "No longer a pain in the ass. Finally a code repository that works as well as you do," which is incredible. At the time we were just happy to get managed source control that worked so excellently. Quickly they realized what they were onto, and in 2011 we see their mission and focus move towards lowering the barriers of collaboration by building powerful features into their products that make it easier to contribute. Which is true: we see them moving beyond just Git hosting and saying, we're going to let you collaborate better. And then, following the acquisition by Microsoft in 2018, we see the complete developer platform: build, scale and deliver secure software. If you've been paying attention, especially to GitHub Universe, there are lots of new, exciting features that launched even this particular month. And so now GitHub has transitioned, as of November 2023, to the world's leading AI-powered developer platform. That's an exciting place to be. But at the same time, recognize that staying up to date with GitHub features is almost a full-time job. If you track the releases per month, and I'm only going back as far as 2018, you can see we're getting as many as 60 to 70 feature releases of GitHub per month. That's such an explosion of capabilities that it's exciting, but it also has you worrying about what to focus on and what not to focus on. So I've found a lot of our teams are looking for hints at where to explore, where to go next. As we dive in today on fortifying your codebase, we're zoning in on how GitHub can maximize your developer productivity, specifically with two GitHub tools. This is important: if we look at the Gartner 2020 report, it says that 29% of organizations had shifted towards consolidating security vendors due to operational inefficiencies. And we see that growing: it grew to 75% in the same report in 2022, and I imagine in 2024 it's going to be even more interesting on top of that. So what is that all about, shifting security vendors due to operational inefficiencies? Well, we find some answers deeper inside the Dynatrace report focusing on application security, where it talks about tool sprawl. And if you're in developer experience, you know tool sprawl is a big problem. We have so many tools all over the place, and this comes back to that curated garden we want to build.
It's very difficult when you have so much individual and independent tooling, and incumbents that are already there. So as we look at this, we gauge: we're already in source control, and GitHub does so much of what we need already. What if it could do more? What can it do for us from a security perspective to rein in that tool sprawl and let us focus on what we do best, code? GitHub really is, in some cases, that Swiss Army knife of tooling, and a lot of the tooling it has does an incredibly good job of integrating with the ecosystem. So today we want to look at Dependabot, which is all about transparency and automation to keep your supply chain dependencies up to date. It's going to be super effective; if you haven't seen Dependabot yet, it's going to feel like a breath of fresh air. And of course GitHub Advanced Security, which we've seen recently take a large presence on GitHub: it's all about the centralization and transparency of code security, really focusing on static code analysis and how it can support that. And so with that, let's dive in. Let's take a look at GitHub Dependabot. This is all about supply chain security, and GitHub defines the feature as: monitor vulnerabilities in dependencies used in your project and keep your dependencies up to date with Dependabot. What does that actually mean? Don't worry, we're going to explore it. The idea is that in all of your repositories, whether it's PyPI packages in a requirements.txt, a nuget.config for .NET, or a Maven settings.xml, whatever ecosystem you're in, you have a number of dependencies. You rely on abstractions that are really important, but keeping them up to date can feel like a nightmare, right? Yet if we look at the Mend.io 2021 report, it says that over 90% of CVEs aren't present in the most recent dependency versions. That's incredible. It means the single best security practice you can adopt for consuming external dependencies is to just keep your packages up to date all the time. Use the latest and you're going to save yourself a lot of pain. I like to think about this the way Mend.io describes it, which is kind of like going to the dentist. If you only update your dependencies every five years, it's going to be painful; it's really going to hurt. But if you're doing it every month, or continually every week, it becomes second nature. It's a simple best practice, just as we think about CI/CD and doing things more often. So we'll dive into three components of Dependabot: alerts, security updates and version updates. All right, first a bit of an overview. If you go into your GitHub, you'll need admin access to your repository, and you'll find the security section we'll be exploring today, which is code security and analysis. It has a dependency graph option. The dependency graph has been around a long time in GitHub and basically maps all of these supply chain dependencies, so that you can generate a pretty clear software bill of materials, or SBOM. Turning that on is free and cheap and easy, and there's no reason you shouldn't use your dependency graph. Once you have that data set enabled, you can begin to take advantage of the Dependabot features we just introduced, and from there you'll be able to drill in.
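As a quick aside, once the dependency graph is on, you don't have to use the UI to get at that SBOM. To my knowledge GitHub also exposes it through a REST endpoint; here's a minimal sketch (OWNER/REPO are placeholders, and the token needs read access to the repository):

```bash
# Sketch: export a repository's SBOM (SPDX JSON) from the dependency graph.
curl -H "Accept: application/vnd.github+json" \
     -H "Authorization: Bearer $GITHUB_TOKEN" \
     https://api.github.com/repos/OWNER/REPO/dependency-graph/sbom
```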
You can see your dependency graph, where you can take a look at all the packages in your repo, or better yet, see what dependencies are used across your entire organization as part of that SBOM. When you drill into it, you'll be able to look at your Dependabot alerts. By enabling Dependabot alerts, we can very quickly say: here's my dependency graph, but highlight for me the things that are critical or high concerns related to CVEs that are out there. You get that as part of your security tab, which you can see here, and on that security tab you can drill in and check out the individual details of each and every one of these. There's no other infrastructure you have to stand up for this; you simply have to enable the feature. Once it's enabled, you'll be able to drill in, and from here you can do a couple of things. The first, which is pretty neat, is that you can create a security update immediately from this particular issue, and it's going to create a pull request on your repository for you. If you decide this isn't a fix you need to make, or perhaps the surface area of this particular CVE doesn't affect the way you're using the package, you can easily dismiss it, and there are plenty of workflow options that let you track and see why certain things were dismissed over time. You also have the option in your organizational settings to turn this capability on across the entire organization; you can enable and disable it all as an administrator and org owner. However, a word of warning as you begin to turn on and play with these features, especially the ones that actually create pull requests, which is security updates. Remember: alerts tell me about a problem; security updates actually submit pull requests when there's a security concern. If you enable security updates for everyone, keep in mind that if you have 3,000 repos in your organization, you're about to turn that on across the board, and each one of those may submit a pull request, which in turn will submit a status check to your build provider, and all of a sudden you've kicked off a plethora of builds that's really going to jam up the queue. So just be careful as you think about organizational rollout, though it is pretty trivial and easy to do here. You can also find views at that level about who has alerts enabled versus security updates, and how many of your repos are protected. Version updates take us to the next level. They say: I don't just want security updates; give me updates for all packages that are out there, any package in my ecosystem. And I'm a big fan of using version updates across the board. GitHub defines version updates as automated pull requests that keep your dependencies updated even when they don't have any vulnerabilities. You can see here an example of a pull request that's been created, which clearly outlines an update for this particular package and has release notes, commit information and labels available to you. And the supported ecosystems are pretty substantial here.
I think you'll find that a lot of the core languages you work with are supported, whether that's Go, Maven, Gradle, npm, NuGet, pip, or Elm, and even some interesting ones you might not have thought of: Docker, for example, or Terraform modules, or even Git submodules and GitHub Actions can all be updated. If you're specifying a Dockerfile and it uses semantic versioning, you can automatically have that FROM statement updated as part of Dependabot. A little bit on my wish list is that Helm charts could be part of that too, but maybe we'll see that in the future. It supports private feeds as well; you likely have internal packages that are part of your organization, and you can include those here too, with organizationally configured secrets that allow private access to, for example, a JFrog feed. You can specify an update schedule, which is important because you don't always want to update in real time; sometimes you want that to happen on a regular cadence. You also have metadata configuration, which we'll talk about in a second, and behavioral configuration, and we'll see that too. So as we begin to explore, you'll find that the dependency graph is now populated, and as part of that, here's where you can generate that SBOM we talked about. And 83% of security teams don't have access to a fully accurate SBOM in real time, which is crazy when you can have that for free here. You can hit check for updates and look for updates any time you need to and process through that. All right, moving on to configuration. Version updates are not configured through the UI like the rest of the Dependabot capabilities were. Version updates actually move into source control, configured in the way you'd expect with a YAML file. So you're going to create a file called dependabot.yml and place it under the .github metadata folder that exists in your repository. Then we specify version 2, because Dependabot comes from a previous preview that had a different schema, so we're just specifying the schema version we want to use, followed by a series of registries. These can be private registries inside your organization that you want to make use of. In this case, I'm going to use a private NuGet feed attached to Azure DevOps, and you can see that I can tokenize it and use secrets pulled from the organizational level, which is great; it means I can use this configuration across many repositories. Next I indicate the ecosystems I want to update and the directories for those. If you have a monorepo, you can specify multiple ecosystems in a single file, or just one if that's all you need. You can set the schedule here with the interval for how often you want to update. You also have several other options, such as open pull request limits; in this case I'm going to say I never want more than ten pull requests open at a time. You can also include additional metadata around custom labels, assignees, reviewers and commit messages; there's lots you can explore to customize how it creates pull requests and piece together your workflow. What's neat, though, is that you have the ability to ignore certain dependencies.
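Pulling those pieces together, a minimal sketch of such a dependabot.yml might look like the following (the feed URL, secret name and ignore pattern are placeholders, not the exact slide content):

```yaml
# .github/dependabot.yml
version: 2
registries:
  azure-nuget:                      # hypothetical private NuGet feed on Azure DevOps
    type: nuget-feed
    url: https://pkgs.dev.azure.com/my-org/_packaging/my-feed/nuget/v3/index.json
    token: ${{ secrets.AZURE_DEVOPS_TOKEN }}   # org-level Dependabot secret
updates:
  - package-ecosystem: "nuget"
    directory: "/"                  # monorepos can add one entry per directory/ecosystem
    registries:
      - azure-nuget
    schedule:
      interval: "weekly"            # a regular cadence instead of real time
    open-pull-requests-limit: 10    # never more than ten PRs open at once
    labels:
      - "dependencies"
    ignore:
      - dependency-name: "AWSSDK.*" # near-daily releases; skip the patch noise
        update-types: ["version-update:semver-patch"]
```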
In many cases, some of your packages are updated in something like a nightly build, and you might receive those updates far more often than you want. An example I've seen is the AWS SDK, which seems to have a build almost every single day for some packages. I want those updates, but I don't necessarily want to deal with them every single day; maybe once a week or whatever that cadence is. So you can ignore certain types of updates, and you can also, if you're not ready to make a major upgrade to your system, ignore major or patch version numbers, depending on what you want. One of the largest additions, which makes Dependabot so much better now than it was a few months ago, is the ability to handle grouped pull requests. By that I mean it can group several changes, several package updates, into a single PR. That's essential, because generating ten separate pull requests causes a lot of problems and a lot of noise. In some cases the granularity is too small: updating one package causes another one to break, and you'll never get both of them to pass your status checks as individual pull requests, requiring manual intervention and moving between branches to figure it out. This is why grouped updates allow us to say: take all of those test dependencies and squash them together into one pull request; take those core dependencies and the packages that rely on each other and make sure they're together in one pull request; take all of those AWS updates and put them in one pull request together, not individual ones. This is pretty essential, I think, for the effectiveness and productivity of Dependabot, so if you tried Dependabot years ago and thought it was too noisy, try it again, because this is a big difference that's now enabled and available. So custom groups are awesome: I can add them, and I can add exclude patterns per group, so I can say include all of these except these. You can also do a catch-all, where you say I want all my dependencies in one easy pull request, which makes it nice and easy to validate and merge when it's successful. But what about when it's not successful? Then you have to filter through and understand exactly which update failed. So there can be good and bad with that. It also supports dependency types, so you can say I want all of my production dependencies or development dependencies, if your ecosystem supports that. And of course you can set update types to say I only want minor or patch versions updated; major version updates are something I need to plan for and can't just have PRs opened for. I'll show a sketch of a grouped configuration at the end of this section. And so the usage of Dependabot with grouped updates, and updates in general, is critical. At SPS Commerce, one of our key use cases is inner source distribution, really focusing on velocity. Internally, when you're setting up a new library and distributing it, and your applications are consuming it, typically the only reasons those applications will update the version number without something like Dependabot are that they did an initial install, they're doing a major upgrade, or they need a feature that's part of it and they've been following it. Otherwise, the only way you're going to get upgrades is through Dependabot.
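Here's that grouped-updates sketch, under the same hypothetical NuGet ecosystem as before (group names and patterns are illustrative, not from the talk):

```yaml
# excerpt from .github/dependabot.yml
updates:
  - package-ecosystem: "nuget"
    directory: "/"
    schedule:
      interval: "weekly"
    groups:
      aws:                           # all AWS packages land in one PR
        patterns:
          - "AWSSDK.*"
      test-dependencies:             # squash test-only packages together
        patterns:
          - "xunit*"
          - "Moq"
        exclude-patterns:
          - "xunit.analyzers"        # include all of these, except this one
        update-types:
          - "minor"
          - "patch"                  # majors still get their own planned PR
    # Alternatively, restrict updates to internal packages only, for
    # inner-source velocity (note: 'allow' limits what gets updated at all):
    # allow:
    #   - dependency-name: "MyCompany.*"
```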
And if inner source distribution interests you at all, feel free to check out another session of mine from other conferences called Compelling Code Reuse in the Enterprise; you can Google that and find it online as well. But this is essential to enabling inner source distribution and velocity. You can also filter your updates in Dependabot by using the allow key, saying I only want this individual dependency to be updated (shown commented out in the sketch above). So even if you're not going to use version updates for everything else, at least use them for your internal organizational velocity. And with that, a couple of thoughts and some pitfalls. If you're not using grouped updates, you need to be, because that is the big difference that makes this go ten times further. There's no auto-merge capability, so even assuming your checks pass and everything's good, there's no built-in way to merge the pull request without additional extensions or a GitHub Actions workflow to accomplish it; a sketch of that follows at the end of this section. I would also love a feature here that looked at package maturity or package age and said: only include updates for packages that are at least x days old; I want someone else to go through the process of finding those particular bugs, a kind of pre-baked soak period. There are alternatives if you're not in the GitHub ecosystem and you like all this. One alternative, now largely deprecated, is NuKeeper; it was NuGet-specific, but it had a ton of features and was really before its time. A more popular one today is Renovate, which is cross-platform and provides much of the same functionality, if not more capability in some cases. Merge queues: if you're using merge queues, which is a brand new GitHub feature as well (we don't have time to cover it today), you can integrate merge queues with Dependabot to get some of that grouped-update effect and throttle deploys a little, so that a number of merged Dependabot updates go out at the same time. And custom dependencies: looking at your dependency chain and understanding what's proprietary and what's internal can be helpful, but can also be really problematic. Of course, from a security governance perspective, enable those defaults: get your dependency graphs on, get your alerts on, get access to that SBOM, and begin to assess what your organization looks like from a security perspective. You'll be able to see who's actually using some of the packages you maybe thought were a little bit funny.
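As promised, here's a sketch of closing that auto-merge gap with a small GitHub Actions workflow, based on the commonly documented dependabot/fetch-metadata pattern. It assumes auto-merge is enabled on the repository and that branch protection enforces your required status checks:

```yaml
# .github/workflows/dependabot-automerge.yml
name: Dependabot auto-merge
on: pull_request

permissions:
  contents: write
  pull-requests: write

jobs:
  automerge:
    runs-on: ubuntu-latest
    if: github.actor == 'dependabot[bot]'
    steps:
      - name: Fetch Dependabot metadata
        id: metadata
        uses: dependabot/fetch-metadata@v2
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
      - name: Enable auto-merge for minor and patch updates
        if: steps.metadata.outputs.update-type != 'version-update:semver-major'
        run: gh pr merge --auto --squash "$PR_URL"
        env:
          PR_URL: ${{ github.event.pull_request.html_url }}
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```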
So with that, I want to move on to GitHub Advanced Security. While Dependabot was all about supply chain, scanning and consuming other people's code, GitHub Advanced Security is a feature all about the practices around the security of your own code, the code we actually write, and that's why the two pair so well. Going back to our introduction, you'll recall we talked a lot about tool sprawl and team silos. Dependabot is great, but it doesn't necessarily let you hook in other tools. What we're going to find is that GitHub Advanced Security provides centralization, a mechanism for visibility into not just the findings GitHub itself generates, but results from other tools integrated into the same interface as well, which is a massive advantage compared to what we're seeing elsewhere. So we'll do a little bit of an overview, check out code scanning, and then separately check out CodeQL, which interacts with code scanning to provide static analysis as part of that centralization. As we get started, we'll see a couple of other components of GitHub Advanced Security as well. First, you're going to be in the same security settings section we were in for Dependabot, but scrolled down the page a little more, where you'll find GitHub Advanced Security. It has two sections you can enable, which give you access to code scanning and secret scanning. Code scanning is what we'll focus on in a minute, but to give you a preview of secret scanning: that's where you can receive alerts, or even block commits to your repository, when it thinks they contain secrets. For GitHub Advanced Security, it's important to know this is a paid portion of the ecosystem, so depending on whether you're a public repo, an enterprise, or running on premises, you'll have to look at the licensing. And the licensing is a bit odd, mind you: it's one license per user for every active committer, counted over the last 90 days on your repositories; but once a user is licensed in the organization, they don't take up another license in another repository. So just be mindful of that. As we dive into secret scanning, I think you'll find it interesting that push protection, when it went generally available for public repositories, blocked over 17,000 credentials in one year, which is incredible. So enabling secret scanning is a no-brainer: if you have the license, you're going to want to turn it on. You can also verify whether a detected secret is valid or not. As it detects a secret inside your repository, or in the code you're committing, it can go and verify it with providers. Think about AWS: taking those particular credentials and confirming that not only did I find credentials matching a pattern, I've validated these credentials are real and they work. That obviously raises a much larger security risk than invalid credentials or strings that merely match a pattern. So as we look at the number of blocked credentials in a year, think about the impact this can have on your organization; I'm sure your security team would love it. In addition, you can add custom patterns, which you can see there in the background, and you can enable push protection, so that as someone commits, we don't even let it through: they'll see a message that says, I see a secret in your code, based on this custom pattern or our standardized patterns. You might internally, for example, have your own implementation of a token, and you can codify those patterns across the organization and include them.
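For illustration, a custom pattern is essentially a regular expression over your code. A hypothetical internal token format might be matched with something like:

```text
# Hypothetical custom secret scanning pattern:
# matches internal tokens such as  acme_live_<32 hex chars>
acme_(live|test)_[0-9a-f]{32}
```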
But better yet, if you were following GitHub Universe, we saw that GitHub Copilot, which is finding its way into basically everything we do in GitHub, has the ability to detect passwords based on the context and information around them. So it's exciting to see secret detection become even more effective, even without custom patterns in place. That's great, but let's dive into code scanning. Secret scanning is a no-brainer: turn it on; if you have a license, there's no reason not to. Code scanning, though, has a lot more interesting architecture and details that we need to think about. First of all, recognize that code scanning allows me to plug in a number of tools. You can see here the first thing it asks is: which tools would you like to turn on to contribute to code scanning's detection of anomalies and coding errors? First is the first-class citizen, CodeQL. CodeQL came to GitHub through an acquisition; the product was originally Semmle, and they've integrated that capability first class, with an integrated CLI that can upload directly to the code scanning capability here. So you can go ahead and hit the setup option, and that setup option essentially creates a GitHub Actions workflow for you, ready to go, that executes on your repository; you can also explore other workflows and pull those in. We'll shelve the idea of CodeQL for a second now and talk about the interface that code scanning provides, which any tool can contribute to. Here's the interface; it looks a lot like Dependabot. In fact, when I go to the security tab and scroll down, there's the Dependabot section for vulnerability alerts, right below that is code scanning, and below that a secret scanning section, so it's all very nicely laid out for finding your alerts across the different components. Under code scanning you get the same classic view GitHub provides: a list of the warnings, critical items, and even notes it has detected about your code specifically. Drilling into one of those gives you a nice view of exactly what happened; in this case it's calling out a generic catch clause, indicating you should probably be more specific in your exceptions and not just swallow everything. And of course you still have your workflow on the right, where you can dismiss a particular code scanning item and say I'm not going to fix this, or this is only used in tests, it's not production code, so I'm not going to worry about it. That information again becomes part of the tracked workflow, so you can see who dismissed something and the reasoning why, with a bit of a description. And what makes code scanning so great is not just the centralization, but the fact that it executes on your pull requests. When you're configuring code scanning in the security section, you'll have this option for the pull request check failure: do I want to fail pull requests if code scanning detects an error? Probably, I think so. The best thing we can do is shift this as far left as we can, meaning the best experience for engineers and developers is: I'm submitting a pull request, other people are going to look at it and make comments, so why not have code scanning automatically do that as well, and reject or fail the status check? That's exactly the zone I'm working in. And if you used the setup option mentioned earlier, the generated workflow looks roughly like the sketch below.
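This isn't byte-for-byte what the setup wizard emits, but a minimal CodeQL workflow in that spirit might look like this (language, branches and schedule are placeholders; the security-and-quality suite is what also surfaces the code-quality alerts discussed next):

```yaml
# .github/workflows/codeql.yml
name: CodeQL
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
  schedule:
    - cron: "30 5 * * 1"            # weekly scheduled scan

jobs:
  analyze:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write        # required to upload results to code scanning
    strategy:
      matrix:
        language: [csharp]          # placeholder; one job per language
    steps:
      - uses: actions/checkout@v4
      - uses: github/codeql-action/init@v3
        with:
          languages: ${{ matrix.language }}
          queries: security-and-quality   # security plus code-quality queries
      - uses: github/codeql-action/autobuild@v3   # or replace with explicit build steps
      - uses: github/codeql-action/analyze@v3
```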
We can configure the level of failure we want, and we can also configure a status check to bubble up as a first-class citizen, so you can see whether it's passing or failing. But the best part about code scanning on pull requests is that it actually creates an annotation on your code as well, just like any other reviewer, and only for the code you changed. You're not going to see this for every existing error in your system; that wouldn't make it easy to get a pull request in, since you need a baseline to start from. Code scanning by default will only block you if you're introducing a net-new item in the code you've changed. In this case, here's a warning saying I have a useless local variable, and I've also configured it to give me code quality warnings; I don't just care about security-related findings, give me the obvious things like unused variables, because I can clean up my code too. Once you've worked with it in a pull request like this, it's so nice that it takes away some of the manual effort where an individual contributor would have had to come in, review this, and call those things out. I can have all the obvious things fixed and the security problems fixed before a reviewer even gets to my code. In my mind, I love what Mike Lyman from Synopsys says: it makes no more sense to write code without code scanning tools than it does to write a paper without spell check. And just like we're all using AI now to help us, the difference with something like AI and Copilot is that it still has the potential to write security problems too, because it's trained on our code bases. So you're going to want to continue to scan all of your code, no matter where it was generated or who created it. For me, this is fantastic. Correlating alerts from different tools is labor-intensive, with many false positives; but if I can shift this left as far as possible, into the pull request workflow, that's a huge key to ensuring these things are fixed before they're even introduced. On top of that, with GitHub Copilot and where it's taking us, they've introduced the ability to auto-fix, meaning that right on the pull request, when I have something like a useless assignment to a variable, I can just hit the auto-fix button and have it cleaned up for me, one step faster through some of those tedious, obvious items. But as we dive deeper into what code scanning is versus what CodeQL is, the two might not be entirely separated in your mind yet, so I want to discuss the differences and where the boundaries are. Code scanning is the framework. It sits on GitHub and acts as the user interface we interact with, providing alerts and capabilities that track across the GitHub ecosystem, and as engineers, developers and operators we interact with those, whether at a specific repo or at an aggregated level in your organization. But CodeQL and the rest of these tools sit outside of that.
We choose when we want to run CodeQL (formerly Semmle) or any of these other great tools out there, whether you're using Sonatype or 42Crunch or Checkmarx; all of them can contribute and upload information to code scanning. That means I can pick and choose: use CodeQL for code scanning, but use 42Crunch to also submit security analysis of an OpenAPI design, or use another provider to submit findings about infrastructure-as-code concerns. You can explore a ton of those options; when I took this screenshot there were 67, and I'm sure there are more now. Essentially we get code security analysis, which is given to us by CodeQL for free, and code quality analysis, meaning I've enabled queries not just for security but also for those unused local variables and the other gotchas I want called out. It is database-driven: CodeQL creates a database by indexing all your code locally, and then you fire queries against it. That's how it operates. The queries it runs are open source queries you can find on GitHub today; you can look at them and understand completely what kinds of things it searches for in the code. And you'll find CodeQL is pretty well adopted across a ton of languages in the GitHub ecosystem, definitely all the core languages we use at SPS Commerce, so that makes a lot of sense for us. But the key is that whatever tool you're using, each of them executes differently, and you'll have to investigate and figure out how you'll upload information into the code scanning framework. Here's how it works for CodeQL specifically. You have a GitHub repository and a database create option: you call codeql database create, give it your language and the database you want to create, and it indexes the repository you point it at. You can specify custom build commands and many other overrides here, and it will use the CodeQL query packs and queries that exist on GitHub today. Once it has created that database, you specify the query pack you want to use, a .qls query suite, along with the database, and tell it to produce a SARIF file. So now it's taking the database, taking the queries, and executing all those commands. The output is a SARIF file, and if you're not familiar with SARIF, it's the Static Analysis Results Interchange Format: it streamlines how static analysis tools share results, essentially a generic JSON schema. You can follow that schema with your own tools and upload to code scanning, or use any of the many existing tools that emit that format. And of course, with the tight integration between CodeQL and GitHub, the CodeQL CLI comes built in with codeql github upload-results, which hits an API endpoint that I pass the SARIF file to for that particular repository, and that's it: it's submitted to code scanning, pretty easy. You can submit multiple configurations too, for different subdirectories and different tools, and they can all contribute to this suite of capabilities you're analyzing against your codebase. End to end, that CLI flow looks roughly like the sketch below.
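A minimal sketch, assuming a .NET project and the published C# query suite (the repository, paths and suite name are placeholders; adjust for your language):

```bash
# 1. Build a CodeQL database by watching the build (compiled language)
codeql database create ./codeql-db \
  --language=csharp \
  --command="dotnet build /t:rebuild"

# 2. Run a query suite against the database and emit SARIF
codeql database analyze ./codeql-db \
  codeql/csharp-queries:codeql-suites/csharp-security-and-quality.qls \
  --format=sarif-latest \
  --output=results.sarif

# 3. Upload the SARIF file to code scanning for this repo/ref
codeql github upload-results \
  --repository=my-org/my-repo \
  --ref=refs/heads/main \
  --commit="$(git rev-parse HEAD)" \
  --sarif=results.sarif
```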
You might be asking: what is a CodeQL query, exactly? I'm no expert on CodeQL queries; I'm still learning as well. But think of it as a standard, SQL-like query language, where you're importing libraries and using a from statement, a where statement, and a select statement. A classic example is how simply you can find an empty if statement, and from there you can go ahead and write your own custom queries; there are lots of tutorials online about that. You can also define custom query packs, meaning I can configure the exact set of queries I want to use in a YAML file and provide that to the CodeQL CLI to really fine-tune it. Queries also come in query suites, and you can create your own suites internally for your organization, pulling together what makes sense for you; there's a VS Code extension that makes that easy. Generally speaking, the CodeQL repository itself, where the open source maintained queries live, is fairly popular, regularly updated, and in my opinion maintained by a lot of great experts. So I'm glad to pull in what they're doing, and also augment it with some of the small, minute things I might want to add. So Advanced Security provides a ton of stuff, but there can be a high setup cost, and it depends: if you're using GitHub Actions it can be easy to set up, but do you have specific dependencies or specific build requirements that need to be integrated? It can take a little architectural understanding to put together, though in some cases it's as simple as running the CLI tool with your build command and away you go. Dynatrace says 62% of organizations use four or more solutions, so I'm really glad this is one simple, integrated experience we can put a lot of backing behind and see in one central pane of glass. It is remote-only, and that's something to consider. A lot of our teams have asked about getting some of that analysis done locally in their IDE. You can see the findings in your IDE when it pulls them from GitHub, highlighted in your code, but the analysis isn't generated locally; it has to run on the server, or you run it yourself through the CodeQL CLI commands, and that can take three, four, five minutes. So this is not comparable to real-time linting, and GitHub has indicated that's not their intention either; you might want to look elsewhere for the easier linting problems you're solving. Of course, the VS Code extension helps you pull that information down and see it, and the pull request workflow is fantastic; we all use that workflow at our organization, and if you do too, this is a great place to standardize organizationally from a governance perspective and rally around. Depending on where you're at and what your investment in GitHub is, the cost can be significant, but we've found it to be significantly lower than some of the comparable tools out there that do something similar, and there's a really nice blend in getting code scanning and then using CodeQL as part of it for free. As we said, you can write custom queries and bring your own; I'm really hoping for custom queries we could write against YAML and JSON, basically unsupported languages, so I can detect other linting warnings and organizational problems in the code bases we're seeing.
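To make the query structure concrete, here's a sketch modeled on the well-known "empty if" tutorial query for JavaScript (the metadata header shown is the typical shape, abbreviated):

```ql
/**
 * @name Empty if statement
 * @kind problem
 * @problem.severity warning
 * @id example/empty-if
 */

import javascript

// Find 'if' statements whose then-branch is an empty block and that
// have no else-branch, so the statement does nothing at all.
from IfStmt ifstmt
where
  ifstmt.getThen().(BlockStmt).getNumStmt() = 0 and
  not exists(ifstmt.getElse())
select ifstmt, "This 'if' statement is redundant: its body is empty."
```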
But the complexity of writing custom queries does take a bit of onboarding experience and knowledge to get started with, so it's not the simplest. In terms of interoperability, though, it's a huge win: an ecosystem of tools speaking the standard SARIF format, where you can even build your own integration, is exactly what we're looking for as we build our ecosystem of security tools together. So that's all the time we have for today. Thanks for checking out this talk on fortifying your codebase with GitHub. I hope these two tools are something you're able to take advantage of, especially Dependabot; that one's really easy to get started with. Code scanning is a little more involved, but not that difficult either, especially if you're already on GitHub Actions. And at the end of the day, this comes back to the quote we started with: developers work in rainforests, not planned gardens. If we can bring the GitHub ecosystem a little closer to being that planned garden for engineers, let's give them that quality of life, and let's continue working towards this centralized ecosystem and this single pane of glass. So, thanks all, and we'll catch you at another talk.

Travis Gosselin

Distinguished Software Engineer, Developer Experience @ SPS Commerce
