Transcript
            
            
              This transcript was autogenerated. To make changes, submit a PR.
            
            
            
            
              This talk today is called Declarative Everything: a GitOps
            
            
            
              and automation based approach to building efficient developer platforms.
            
            
            
              Before diving in too deep, let's understand what a
            
            
            
              developer platform even is. On the right-hand side over here we have production-ish
            
            
            
              things. On the production side,
            
            
            
              obviously, there's the true production environment. 100% of the traffic
            
            
            
              that it receives is usually from external users.
            
            
            
              There might also be staging, which might get a
            
            
            
              small percentage of the production traffic. And again, all of this depends on
            
            
            
              how the organization has implemented various blue-green
            
            
            
              types of environments. So yeah, the traffic
            
            
            
              split depends on how that's configured. On the non production
            
            
            
              side of the house, we have various different environments.
            
            
            
              Development is usually one of the biggest ones that engineers get to interact
            
            
            
              with on a day to day basis. This might be a local
            
            
            
              development environment on a local workstation, or it might be a remote machine that an
            
            
            
              engineer normally SSHes into, and so on and so forth.
            
            
            
              Then there's CI/CD. That
            
            
            
              environment normally serves two types of usage.
            
            
            
              One is from the various authors of all of the features and changes that
            
            
            
              are being proposed. The other is from our colleagues who are teammates
            
            
            
              who get to review the code and see the outputs of the various
            
            
            
              test executions. Then we have our classes of
            
            
            
              pre-production environments, which aren't really production, but are production-ish
            
            
            
              in the sense that it might be something for our QA teams
            
            
            
              to work off of, or a dev environment, which is an environment
            
            
            
              shared by multiple engineers to perform various types
            
            
            
              of end to end testing. A user acceptance testing environment
            
            
            
              might also be in that bucket, which is a
            
            
            
              place where a product manager, for example, might go to make sure the
            
            
            
              definition of done as was outlined in whatever product
            
            
            
              document has been implemented appropriately.
            
            
            
              We talked a little bit about a developer platform.
            
            
            
              Now let's talk about the developer workflows, and then we'll try to converge
            
            
            
              the two of them. A dev workflow normally
            
            
            
              includes, and I'll use the terminologies SDLC and
            
            
            
              software development lifecycle interchangeably. They are essentially
            
            
            
              the same thing. The SDLC includes two parts: the inner
            
            
            
              loop and the outer loop. The inner loop is when the engineer is actively
            
            
            
              building something, which is the dev happening inside of an IDE.
            
            
            
              Some local testing might happen. Then as an engineer,
            
            
            
              we might push this branch up somewhere to get it deployed into
            
            
            
              some environment where we can do our end to end testing. And out there
            
            
            
              we can start to identify some issues. If there are any issues,
            
            
            
              it goes back into dev, and that loop continues till the engineer is happy with
            
            
            
              their implementation and it meets the definition of done. Once
            
            
            
              all of that's good, it goes into code review. Again, if everything
            
            
            
              looks fine over there, we'll move forward into the CI CD stage.
            
            
            
              If it's not, it goes back into dev, goes back into the inner loop
            
            
            
              of the SDLC, changes keep getting implemented, and so on and so forth.
            
            
            
              On the other side, the code review is the start of the outer loop.
            
            
            
              We hit CI/CD after that, then pre-production, and then prod.
            
            
            
              In every single one of these stages, if there's something that goes wrong,
            
            
            
              we go back into dev, which is the start of the inner loop of the
            
            
            
              SDLC. Otherwise, we just keep going forward into the next stage
            
            
            
              till we hit production, which is when our software is going and
            
            
            
              serving live humans. On the right hand side,
            
            
            
              it's just a different way of looking at the inner and the outer loop of
            
            
            
              the SDLC, where the inner loop, as I covered, is the REPL of
            
            
            
              software development, which is the read-eval-print loop.
            
            
            
              Once things are fine, we end up using git as our context
            
            
            
              transfer mechanism. We push a branch for code review, for example,
            
            
            
              and then it hits the outer loop of the SDLC,
            
            
            
              goes through that entire loop, and then it'll go into some sort of a
            
            
            
              pre production environment. Staging might be called a pre
            
            
            
              prod environment, or a part of the production infrastructure, depending on
            
            
            
              how the organization is usually set up.
            
            
            
              Next, I'm an engineer.
            
            
            
              I've been given a ticket. I need to ultimately get this change into production.
            
            
            
              What are the workflows that happen underneath? I will
            
            
            
              get the latest version of the source code, pull main,
            
            
            
              do a git checkout -b new-user-table, if that's the ticket
            
            
            
              I'm working on, T-1234, and then I pull up
            
            
            
              my IDE, get my local dev env set
            
            
            
              up. After that, I might wait for my IDE to index a
            
            
            
              little bit, and then I start coding. Once some code has gotten
            
            
            
              written, I might do some form
            
            
            
              of local verification, might push it to an environment
            
            
            
              of some sort to be able to perform some level of end to end testing.
            
            
            
              And one theme across this talk today is that a
            
            
            
              lot of the content I cover will be more applicable to the
            
            
            
              microservice side of the house, although some of these constructs
            
            
            
              should be applicable to monolith architectures as well.
            
            
            
              So now I've done some testing, the feature seems
            
            
            
              to be working. I'll write some unit and some integration tests, and then once I'm
            
            
            
              satisfied with the definition of done, I send it out for a code review.
            
            
            
              My colleague, my teammate, will read the code. They'll check
            
            
            
              how CI was running for the tests I just implemented.
            
            
            
              They'll also make sure that older tests haven't broken as a result
            
            
            
              of my change. Then a diligent code reviewer
            
            
            
              might also try to get this into a preview environment of some sort, right?
            
            
            
              To check how the functionality, et cetera, is working. If everything looks good,
            
            
            
              they approve. If there's feedback, they send it back. And again,
            
            
            
              it's the same inner loop of the SDLC till things look good.
            
            
            
              After things look good, the engineer can merge the change into main.
            
            
            
              As soon as that merge action happens, tests run.
            
            
            
              If things are looking good, the engineer can move it into pre-prod,
            
            
            
              do some more testing; if things are good, deploy to staging; more
            
            
            
              testing; if things are good, move it to prod. That's normally
            
            
            
              the day to day workflow of getting a change into production.
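
The progression described above, where a change either advances to the next stage or goes back to dev, can be sketched in a few lines of Python. This is a minimal sketch and was not shown in the talk: the stage names and the `run_outer_loop` helper are hypothetical, and each stage's check is assumed to be a simple pass/fail callable.

```python
from typing import Callable, Dict, List

# Outer-loop stages, in the order a change moves through them.
# The names are illustrative; an organization may have more or fewer stages.
STAGES: List[str] = ["code_review", "ci", "pre_prod", "staging", "prod"]


def run_outer_loop(checks: Dict[str, Callable[[], bool]]) -> str:
    """Walk the stages in order and report where the change ends up.

    `checks` maps a stage name to a zero-argument callable returning True
    when that stage passes. Stages without a check are treated as passing.
    Returns "dev" on the first failure (back to the inner loop), otherwise
    "released".
    """
    for stage in STAGES:
        check = checks.get(stage, lambda: True)
        if not check():
            return "dev"
    return "released"
```

For example, `run_outer_loop({"staging": lambda: False})` sends the change back to dev without ever evaluating the prod stage.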
            
            
            
              Now, which parts of this can benefit from
            
            
            
              automation? I have it bolded in this image over here. The code and IDE
            
            
            
              process, right? Checking out the latest branch,
            
            
            
              or rather, pulling the latest main,
            
            
            
              getting into a new branch, making sure the IDE is indexed, all the dev environment
            
            
            
              stuff is set up, all of my dependencies are there.
            
            
            
              That's a bunch of developer toil and friction that can probably benefit from
            
            
            
              automation. Testing in an environment: can an engineer automatically get an environment
            
            
            
              to test in that is just based on whatever branch that they are operating on?
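
One way to tie an environment to a branch is to derive the environment's name deterministically from the branch name. The sketch below is only an illustration: the `env_name_for_branch` helper and the `preview` prefix are hypothetical, and it assumes the name must be a valid DNS label (lowercase alphanumerics and hyphens, at most 63 characters), which is the rule Kubernetes applies to namespace names.

```python
import hashlib
import re

MAX_LABEL_LEN = 63  # maximum length of a DNS label


def env_name_for_branch(branch: str, prefix: str = "preview") -> str:
    """Derive a stable, DNS-label-safe environment name from a branch name."""
    # Lowercase, then collapse every run of other characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", branch.lower()).strip("-")
    # A short hash of the original branch keeps names distinct even when two
    # branches produce the same slug or the slug has to be truncated.
    digest = hashlib.sha1(branch.encode("utf-8")).hexdigest()[:8]
    # Space left for the slug: total minus prefix, digest and two hyphens.
    room = MAX_LABEL_LEN - len(prefix) - len(digest) - 2
    slug = slug[:room].strip("-")
    parts = [prefix, slug, digest] if slug else [prefix, digest]
    return "-".join(parts)
```

Because the name is a pure function of the branch, the same branch always maps to the same environment, so automation can create it on push and find it again later for teardown.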
            
            
            
              So those are bits where automation can directly help. For unit
            
            
            
              and integration testing, I know there's a lot of work happening in the
            
            
            
              AI realm today, still very early, but maybe computers can also help
            
            
            
              us write some tests and also help with the code review process.
            
            
            
              Usually we all have automated tests running in CI
            
            
            
              already, as soon as a certain branch is put up for review or
            
            
            
              a request to merge to main is made. On the functionality side,
            
            
            
              checking the functionality in a preview environment, that's another area where
            
            
            
              automation can help significantly. And when tests are passing and
            
            
            
              approval has been given by a teammate,
            
            
            
              we should automatically be able to merge things into main, run those tests
            
            
            
              in an automated fashion,
            
            
            
              automatically provision a pre production environment, run tests in there.
            
            
            
              If things are good, get it into staging, and so on and so forth.
            
            
            
              So now we talked about dev platforms,
            
            
            
              we talked about developer workflows, how changes get into production.
            
            
            
              Now, GitOps. What is GitOps? GitOps, at
            
            
            
              its core, you might have noticed we
            
            
            
              were talking about creating a bunch of infrastructure in various stages
            
            
            
              as we were trying to get a change into prod.
            
            
            
              GitOps is just trying to drive a configuration-backed
            
            
            
              approach to codify various DevOps
            
            
            
              best practices, which can be applied to infrastructure automation
            
            
            
              using this. Many companies call these golden paths.
            
            
            
              The ultimate rationale behind all of this is that standardization will fundamentally
            
            
            
              improve the developer experience. And by
            
            
            
              removing humans from having to
            
            
            
              make a variety of these almost menial decisions,
            
            
            
              to go and tear down environments, create new environments, et cetera, automation can
            
            
            
              help us reduce costs, killing environments when they are not actively being
            
            
            
              used, for example. How can this be achieved?
            
            
            
              Some sort of configuration that lives right alongside your source code.
            
            
            
              Your app logic and business logic are living in source.
            
            
            
              Why can't all of the supporting logic around codifying
            
            
            
              stages of the SDLC also live alongside it?
            
            
            
              And ultimately, why this matters: the concept of CI
            
            
            
              exists for continuous integration, right? How can we have repeated
            
            
            
              automation happening every time we merge a new change, tests get run, et cetera,
            
            
            
              et cetera? Because it fundamentally helps us improve
            
            
            
              our development and deployment times. So that's why
            
            
            
              GitOps is important. Now, we talked
            
            
            
              about environments multiple times in the last three or
            
            
            
              four slides. What is an environment at its core?
            
            
            
              An environment needs to have whatever runtime
            
            
            
              requirements or dependencies an application needs. It could be something
            
            
            
              as simple as having a Go compiler available,
            
            
            
              or something that can execute a process,
            
            
            
              a binary as a process. It might need
            
            
            
              access to some sort of dev, test, et cetera,
            
            
            
              data. It might need access to some form of downstream services.
            
            
            
              It might need upstream services as well if I'm trying to, for example,
            
            
            
              test an API gateway change to make sure that
            
            
            
              the website still functions correctly. It also needs to be accessible to a human
            
            
            
              so that they can either test the environment or actually access it to figure
            
            
            
              out what's going wrong. Depending on the use case,
            
            
            
              if it's a production environment, obviously it shouldn't be isolated from
            
            
            
              prod. But if it's a non production environment, some level of isolation from
            
            
            
              production is important. Depending again on
            
            
            
              how the environment is being used, it might need some form of source code,
            
            
            
              build tools, your IDE, and various other developer-specific tooling,
            
            
            
              and depending on when you're using it. Again, if it's a
            
            
            
              prod environment or a staging environment, that's usually a shared tenancy construct.
            
            
            
              But if it's my local development workstation, that's a dedicated tenancy construct.
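
The properties listed above (runtime dependencies, data, downstream and upstream services, human access, isolation from prod, developer tooling, and tenancy) can be captured as a declarative description of an environment. The following is a minimal sketch of such a description in Python; the `EnvironmentSpec` class, its field names, and the `validate` rule are hypothetical and only mirror the list from the talk.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Tenancy(Enum):
    SHARED = "shared"        # e.g. prod or staging
    DEDICATED = "dedicated"  # e.g. a local development workstation


@dataclass(frozen=True)
class EnvironmentSpec:
    """Declarative description of one environment (illustrative fields only)."""
    name: str
    is_production: bool
    runtime_dependencies: List[str] = field(default_factory=list)
    data_sources: List[str] = field(default_factory=list)
    downstream_services: List[str] = field(default_factory=list)
    upstream_services: List[str] = field(default_factory=list)
    developer_tooling: List[str] = field(default_factory=list)
    human_accessible: bool = True
    isolated_from_prod: bool = True
    tenancy: Tenancy = Tenancy.DEDICATED

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the spec is consistent."""
        issues: List[str] = []
        if self.is_production and self.isolated_from_prod:
            issues.append("a production environment cannot be isolated from prod")
        if not self.is_production and not self.isolated_from_prod:
            issues.append("a non-production environment should be isolated from prod")
        return issues
```

A spec like this could live next to the source code and be checked in CI, so that a misconfigured environment is caught before anything is provisioned.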
            
            
            
              So ultimately, we'd like to get to a world where whatever configuration that we are
            
            
            
              storing alongside our source code can give us environments that are
            
            
            
              well tuned and essentially configured appropriately for
            
            
            
              whatever stage of the SDLC that we are currently in. Testing,
            
            
            
              debugging, et cetera, all of these are angles
            
            
            
              at verifying that the software we are now proposing
            
            
            
              or making a change to is fundamentally
            
            
            
              not causing regressions anywhere, and it's meeting the appropriate definitions of
            
            
            
              done. Testing
            
            
            
              at its core exists for just
            
            
            
              one core reason: acceptance criteria. Right?
            
            
            
              The product document that I received as an engineer has some
            
            
            
              form of features and functionality that it aims to establish.
            
            
            
              Has my implementation achieved them all? And I'm primarily,
            
            
            
              again, because I mentioned I'm going to focus more on the microservices
            
            
            
              side of the house. Again, this is also applicable to monoliths,
            
            
            
              but I'm taking more of an emphasis on services and not libraries and
            
            
            
              modules over here. The reason for that is oftentimes a
            
            
            
              database doesn't live on the same machine as
            
            
            
              where our application is running. So a network call is happening
            
            
            
              to either talk to a database or to a downstream service, or how upstream
            
            
            
              services are talking to me. So functionality, we covered
            
            
            
              that. On the interoperability side, I'm calling various
            
            
            
              downstream services, usually as part of any feature.
            
            
            
              Am I calling those systems appropriately? Am I breaking any API
            
            
            
              patterns? Are the latencies that I experience
            
            
            
              something that's acceptable to my end users or
            
            
            
              my target end users? And ultimately on the
            
            
            
              confidence side of the house, which is, again, why
            
            
            
              testing exists: is this change that
            
            
            
              I'm introducing right now going to cause future production
            
            
            
              deployments to be more error prone? Will it cause more
            
            
            
              alerts to go to our on call? Has the feature really been implemented
            
            
            
              in the simplest and the most resilient way possible?
            
            
            
              So normally we
            
            
            
              have source code, right? On the left-hand side, I've kind of tried to represent
            
            
            
              this as a Monorepo structure. All of these primitives
            
            
            
              apply just as cleanly if a company has multiple smaller
            
            
            
              repos, like micro repos, where
            
            
            
              every microservice comes out of its own repo. Doesn't matter.
            
            
            
              You can see various libraries, modules, et cetera, all of it living
            
            
            
              together. There's some software that runs in the middle which will
            
            
            
              ultimately cause some form of artifacts
            
            
            
              to get built. For example, a Docker container, an OCI image of some
            
            
            
              sort that then gets sent to our
            
            
            
              production workload management system or software,
            
            
            
              which gets us to go and run this latest version
            
            
            
              of a certain artifact and it runs it, and then all of our systems
            
            
            
              go and talk to each other. So how
            
            
            
              does this software actually end up in production? Normally I'll
            
            
            
              just use a couple of Kubernetes examples, but this is applicable to
            
            
            
              any sort of runtime.
            
            
            
              Kubernetes has these deployment YAMLs or Helm charts, et cetera.
            
            
            
              Ultimately I push a new branch somewhere, or if it's
            
            
            
              a new change in the main branch, a new Docker image might get
            
            
            
              built. This image gets pushed into a container
            
            
            
              registry somewhere. A field is updated
            
            
            
              in my Helm chart, which is normally in the source code.
            
            
            
              This field is probably going to say like this is the
            
            
            
              latest version of this image, and then the Helm chart gets applied against my
            
            
            
              production Kubernetes cluster. At that
            
            
            
              point in time the Kubernetes cluster knows okay, I need to pull this image down
            
            
            
              from this appropriate container registry. I pull that down, get it
            
            
            
              deployed, stuff works.
            
            
            
              Now we covered the prod deployment
            
            
            
              process. What is pre-prod? Pre-prod normally involves engineers,
            
            
            
              as I was saying, pushing their branches somewhere, having the CD system go
            
            
            
              and deploy that change into a shared tenancy pre production environment
            
            
            
              of some sort. These are great for
            
            
            
              the most part till I'll
            
            
            
              give an example. I am working on a front end change. Bob is
            
            
            
              working on an API gateway update. Sally is going and running
            
            
            
              a database migration. When I push my front end
            
            
            
              change into the pre prod environment and I suddenly see an issue related to
            
            
            
              a database migration pop up in the error logs,
            
            
            
              that didn't happen because of me. A GitOps-based
            
            
            
              approach, as we can see on the right-hand side over here, can let
            
            
            
              us get into a world where ephemeral preproduction environments are just
            
            
            
              coming up, where each of our changes gets tested in isolation.
            
            
            
              And finally, when all of the changes land up in
            
            
            
              the main branch, similar tests can also be run to make
            
            
            
              sure the current state of the main branch is pretty healthy,
            
            
            
              that it's green. But during our
            
            
            
              non-production and dev processes, being forced to share a
            
            
            
              pre-production environment with all of our colleagues again leads to a lot of
            
            
            
              developer friction.
            
            
            
              Now, images. Normally,
            
            
            
              in CI, we don't regularly go and
            
            
            
              spin up our own version of a full tenancy cluster of
            
            
            
              some sort where proper end to end testing is happening.
            
            
            
              There can be some cases where images get built and this is
            
            
            
              just a little spec on how something like that might happen,
            
            
            
              but ultimately it just boils down to check out the latest
            
            
            
              source code, make sure all of your relevant credentials
            
            
            
              are set up for whatever container registry you're using,
            
            
            
              build the image, push it there, and then finally call
            
            
            
              whatever your workload management system is. In this case it's a Kubernetes example.
            
            
            
              Call it with the latest version of this image that's in this
            
            
            
              registry, and Kubernetes is instructed to please
            
            
            
              go and apply these changes.
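
The build, push, and apply steps just described can be sketched as a small Python function that produces the commands a CI job would run. This is an illustration rather than the pipeline from the talk: the `deploy_plan` helper, the registry host, and the chart path are hypothetical, and the `image.tag` value name is an assumption about how the chart is written (many charts use it, but it is chart-specific). Source checkout and registry login are assumed to have been done by earlier CI steps.

```python
import shlex
from typing import List


def deploy_plan(registry: str, app: str, git_sha: str,
                release: str, chart_dir: str, namespace: str) -> List[List[str]]:
    """Return the commands to build, push and roll out one image version."""
    # Tag the image with the commit SHA so every build is uniquely addressable.
    image = f"{registry}/{app}:{git_sha}"
    return [
        ["docker", "build", "--tag", image, "."],
        ["docker", "push", image],
        # Assumes the chart reads the image tag from a value named image.tag.
        ["helm", "upgrade", "--install", release, chart_dir,
         "--namespace", namespace,
         "--set", f"image.tag={git_sha}"],
    ]


if __name__ == "__main__":
    # Print the plan instead of executing it, so it can be reviewed first.
    for cmd in deploy_plan("registry.example.com/team", "api", "abc123",
                           "api", "./chart", "preview-demo"):
        print(shlex.join(cmd))
```

Returning the commands as data instead of running them directly keeps the function easy to test and lets a CI job decide how to execute each step.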
            
            
            
              And in this world, in the CI/CD process that
            
            
            
              I just talked through, normally in our
            
            
            
              tests we might have configuration to set up some form of environment wherein
            
            
            
              tests are run and every engineer then has to verify
            
            
            
              that the test failures are not environment related. If there are test
            
            
            
              failures, of course, they need to make sure that it's only related to the changes
            
            
            
              that they have introduced. For environment-related
            
            
            
              changes, again, in this world, the DevOps team is the one that's normally
            
            
            
              responsible for making sure the environment is left in a pristine state. In
            
            
            
              the GitOps world that we talk about, which is what you can see on the
            
            
            
              right-hand side, the test suite gets instantiated and an
            
            
            
              ephemeral environment comes up, all backed by configuration
            
            
            
              that lives right alongside my source code. So if I didn't change any
            
            
            
              of my environment config in source, the ephemeral environment
            
            
            
              that comes up, that's a golden path.
            
            
            
              In that environment, we know for sure that the only
            
            
            
              changes that are going in at that point in time are the changes that I've
            
            
            
              implemented for my feature. So when I run the test now,
            
            
            
              all failures should only be happening because of my changes
            
            
            
              and nothing related to the underlying infra. This is
            
            
            
              where DevOps teams can essentially add superpowers to
            
            
            
              their capabilities. DevOps teams can just have configuration
            
            
            
              wherein whatever golden path they want to attain is codified and
            
            
            
              left in source code, and every engineer just goes and spins up their own version of
            
            
            
              it. In the code review process, again, it's very similar.
            
            
            
              Oftentimes engineers, whoever the reviewer is,
            
            
            
              need to make sure that, in the changes that we are reviewing,
            
            
            
              nothing is failing because of underlying infra issues or
            
            
            
              nothing is succeeding because of potential underlying infra issues as well.
            
            
            
              So as a result, preview environments, et cetera, everything can be very
            
            
            
              ephemeral. They just come up, I make my changes, and when changes get
            
            
            
              merged into main, all of the infrastructure that was spun up
            
            
            
              just gets deleted automatically.
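
              The lifecycle just described (an environment comes up for a change and is deleted on merge) can be sketched roughly as follows. This is only an illustrative model, not any real platform's API: the class name, the method names, the `preview-pr-N` naming scheme, and the `config_ref` parameter are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class EphemeralEnvironments:
    """Hypothetical tracker: one short-lived environment per open pull request."""

    # Maps a pull request number to a description of its environment.
    active: dict[int, str] = field(default_factory=dict)

    def on_pull_request_opened(self, pr_number: int, config_ref: str) -> str:
        # The environment is described only by config that lives next to the
        # source code, identified here by a (hypothetical) config reference.
        env_name = f"preview-pr-{pr_number}"
        self.active[pr_number] = f"{env_name}@{config_ref}"
        return env_name

    def on_pull_request_merged(self, pr_number: int) -> bool:
        # Merging into main removes the environment.
        # Returns True if an environment existed and was removed.
        return self.active.pop(pr_number, None) is not None


envs = EphemeralEnvironments()
envs.on_pull_request_opened(42, "abc123")
envs.on_pull_request_merged(42)
```

              The point of the sketch is that nothing about the environment outlives the change it was created for.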
            
            
            
              So dev and CI stages will normally
            
            
            
              resort to using Docker for many of these functionalities.
            
            
            
              You can see over here, the r has
            
            
            
              gotten cut off due to my screenshot. I apologize for that. But ultimately
            
            
            
              you can see that these are all just Docker Compose files.
            
            
            
              If a developer, for example, needs a MySQL database,
            
            
            
              we'll normally not go to AWS, for example, and spin up an RDS database
            
            
            
              just for that. We just use Docker to do our basic feature
            
            
            
              functionality testing and this usually suffices
            
            
            
              for dev. But as we get closer to CI and
            
            
            
              the other stages of the SDLC, that's where having more
            
            
            
              production symmetric infrastructure might be more
            
            
            
              helpful in weeding out bugs and getting a better sense of latencies,
            
            
            
              et cetera, et cetera, as a user would experience when
            
            
            
              interacting with a prod environment.
            
            
            
              So now, why is all
            
            
            
              of this automation important?
            
            
            
              How exactly does it improve our developer experience? So, on the left
            
            
            
              hand side over here, let's take a normal dev loop.
            
            
            
              I write code, I build, compile,
            
            
            
              I run it, inspect to see if everything works fine,
            
            
            
              make a commit, move on. Let's say as an engineer,
            
            
            
              I'm working 6 hours a day. That's about, what, 360 minutes?
            
            
            
              This entire loop takes about five minutes to run. So out of 360,
            
            
            
              I can do this about 72 times in a day. So 360
            
            
            
              over five. On the right-hand side, as soon as we go into
            
            
            
              this microservice containerized world,
            
            
            
              we get to this. Like, we do our code, we do our builds,
            
            
            
              and then there's the container build, et cetera. All of the container
            
            
            
              ergonomics come in, and that's pretty time consuming sometimes.
            
            
            
              And ultimately, you can see that five minute loop for getting the same
            
            
            
              task done is now taking nine minutes. And because we
            
            
            
              were working 360 minutes a day, which was our assumption,
            
            
            
              now, 360 over nine is about 40 iterations. So on
            
            
            
              the left hand side, I could do 72 iterations. On the right,
            
            
            
              I can only do 40, which is roughly a 45% degradation,
            
            
            
              which can take a significant amount of time away from the inner loop
            
            
            
              of the development process.
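
              The arithmetic above can be checked with a few lines of Python. The 6-hour day and the 5-minute and 9-minute loop times come from the talk; the function names are just illustrative.

```python
def iterations_per_day(working_minutes: int, loop_minutes: int) -> int:
    """Number of complete dev-loop iterations that fit in a working day."""
    return working_minutes // loop_minutes


def degradation_percent(before: int, after: int) -> float:
    """Percentage of iterations lost when going from `before` to `after`."""
    return (before - after) / before * 100


WORKING_MINUTES = 6 * 60  # 6 hours a day, as assumed in the talk

fast_loop = iterations_per_day(WORKING_MINUTES, 5)  # traditional inner loop
slow_loop = iterations_per_day(WORKING_MINUTES, 9)  # containerized inner loop
loss = degradation_percent(fast_loop, slow_loop)

print(f"{fast_loop} vs {slow_loop} iterations, {loss:.1f}% fewer")
```

              With these inputs the loss works out to a little under 45%, which matches the rounded figure quoted in the talk.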
            
            
            
              So this dev workflow that we had talked about in the past, like the one
            
            
            
              you see on the left, is the traditional dev workflow for all of us.
            
            
            
              Pull latest code, set up local env, wait for the IDE
            
            
            
              to index, write code, do all of that testing, et cetera, et cetera. It's a
            
            
            
              long loop. On the right-hand side, taking this GitOps-based model,
            
            
            
              we can actually, when we want to do dev, just get an environment,
            
            
            
              immediately hook our IDE into it. Everything has already
            
            
            
              been set up for us. I don't have to wait around for anything. I just
            
            
            
              do all of my testing. I know that this is a completely dedicated environment
            
            
            
              wherein Bob and Sally's changes, et cetera, none of it's going to influence me.
            
            
            
              All of the issues that I discover have to be fundamentally related to
            
            
            
              the code I just wrote. So now when I'm sending
            
            
            
              my code for a code review, I have much more confidence
            
            
            
              that things are going to work to spec. And in
            
            
            
              the CI/CD to pre-production workflows, if there are some changes on the
            
            
            
              main branch that need to get tested, some of that stuff is getting pushed later
            
            
            
              in the pipeline. It's not going and impacting every single engineer
            
            
            
              every time they are trying to write code again.
            
            
            
              So, for dev purposes, right,
            
            
            
              Normally, whenever we are developing a feature,
            
            
            
              service dependencies are normally pretty isolated. Sure, we have
            
            
            
              our IDE running somewhere. But wouldn't it be great, like on
            
            
            
              the right hand side, you can see the red circle calls the green
            
            
            
              circle, which calls the pink looking circle over there.
            
            
            
              If we could get a workflow replicated wherein that red
            
            
            
              was actually calling green, which is currently running inside my
            
            
            
              IDE with my debugger connected to it, that would let
            
            
            
              me have much greater confidence in the software I was building, because I know
            
            
            
              what the longer request chains are going to look like, and I know exactly how
            
            
            
              my changes are going to function with respect to them.
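
              One way to picture the red, green, pink chain with the green service swapped for a copy running in the IDE is a lookup table with an override. This is a toy, in-process sketch only: the service functions, the registry, and `build_registry` are made up for illustration, and a real setup would route network traffic rather than call Python functions.

```python
from typing import Callable, Dict, Optional

# Each "service" takes a request and the registry it uses to find the next hop.
Service = Callable[[str, dict], str]


def red(request: str, registry: dict) -> str:
    # Red always calls whatever is registered as "green".
    return "red -> " + registry["green"](request, registry)


def green_shared(request: str, registry: dict) -> str:
    # The copy of green that everyone else uses.
    return "green(shared) -> " + registry["pink"](request, registry)


def green_local(request: str, registry: dict) -> str:
    # My copy of green, imagined as running in the IDE with a debugger attached.
    return "green(local) -> " + registry["pink"](request, registry)


def pink(request: str, registry: dict) -> str:
    # End of the chain.
    return f"pink handled {request!r}"


def build_registry(overrides: Optional[Dict[str, Service]] = None) -> Dict[str, Service]:
    """Default routing, with optional per-developer overrides on top."""
    registry: Dict[str, Service] = {"red": red, "green": green_shared, "pink": pink}
    registry.update(overrides or {})
    return registry


my_env = build_registry({"green": green_local})
trace = my_env["red"]("GET /orders", my_env)
print(trace)
```

              Red and pink are untouched here; only the green hop is redirected, so the rest of the request chain behaves as it normally would.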
            
            
            
              So there are various ways of looking at it, right? The green one
            
            
            
              I was just showing you in the previous slide is an example where my
            
            
            
              service or my feature exists in the middle of a
            
            
            
              long microservice call chain.
            
            
            
              Similarly, if I'm testing something at the start of a call chain or at
            
            
            
              the end of a call chain, again, having access
            
            
            
              to a production like environment, wherein all of
            
            
            
              the network calls, et cetera, are seamlessly handled,
            
            
            
              for me under the hood, wherein the only thing that
            
            
            
              is under test is the change that I'm working on.
            
            
            
              That would help engineers get to having a lot more confidence
            
            
            
              a lot faster in the SDLC, right? While they are in the
            
            
            
              inner loop,
            
            
            
              instead of having to discover issues after the code hits the outer loop and go back into
            
            
            
              the inner loop. Because that context switching, right? That's the most important
            
            
            
              and expensive part for us software engineers.
            
            
            
              So ultimately, to wrap up, the takeaways: as
            
            
            
              engineers, we all know that issues will always be easier to fix when
            
            
            
              they're caught way before production. The fastest would
            
            
            
              be when we were writing the code in the first place, the first time we
            
            
            
              are writing that line of code. One way in which this can
            
            
            
              be achieved is by having GitOps-backed
            
            
            
              access to production-symmetric or production-like
            
            
            
              environments for every stage of the SDLC as I'm
            
            
            
              writing my code. And why this is important is
            
            
            
              that when different stages of the SDLC are configured differently, it all adds to
            
            
            
              different bits of developer friction and ruins the dev ergonomics.
            
            
            
              So at its core, taking a GitOps-based approach would also remove
            
            
            
              the drift between these various stages of the SDLC. So,
            
            
            
              yeah, thank you. Thanks for listening to my talk. And I work at this
            
            
            
              company called DevZero, where we are trying to operationalize various things
            
            
            
              that we discussed in this talk today. If you're interested
            
            
            
              in checking out how any of these things can be applied to
            
            
            
              your day-to-day engineering workflows, check us out at www.devzero.io.
            
            
            
              Thank you.