Conf42 Platform Engineering 2024 - Online

- premiere 5PM GMT

DevOps, 12-Factor, and Platform Engineering

Abstract

Learn about the business wisdom of the 70s and 80s and how it laid the foundation for modern, cloud-native systems incorporating DevOps practices. Get a glimpse into what’s next by exploring internal developer portals.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello, everyone. Welcome to Conf42 and this talk about DevOps, 12-factor, and platform engineering. We're going to dig into the history of DevOps, look at what I believe its true origins are, and talk about where this practice has evolved to today. This is probably the third iteration of this talk that I've given. It used to be called DevOps, 12-Factor, and Open Source. As the industry has adopted new patterns around what's mainstream and what's really helping developers with their flow and their productivity, I've updated this talk along with it. So this is where we are now in 2024, where we're talking about platform engineering. We're building multi-cloud developer platforms where we create our code and deliver our code. The work that it takes to put together all the various tools and pieces of the platform, the different permutations of platforms that we have, has now evolved into this practice of platform engineering. And so that's what we're going to be talking about today. My name is Justin Reock, and I've been around for a little while at this point, having had several different roles, but I really started focusing mostly on developer productivity over the last five years or so. There's a lot that's happened in this space around developer experience and seeking the better productivity outcomes that can happen when we invest very heavily in developer experience. So if you've seen any of my recent work, that's mostly what I've been focused on. I'm currently the head of DevRel for Cortex, where we ship an internal developer portal. That internal developer portal pattern, I truly believe, is the most important pattern right now, helping people move from an orchestration stage of their platform to a choreography stage of their platform. I'm always happy to talk more about that.
So where did DevOps come from? Where did it begin? This is the commonly accepted place, at least for the word: DevOpsDays Belgium, Patrick Debois, 2009. This is the first time we heard the term coined. But I think that DevOps started long before this. I think it started with a mindset, and it's this mindset: it's no longer the big beating the small, but the fast beating the slow. You've probably seen this quote, or some version of it, associated with DevOps. It has become, I think, the rallying cry for DevOps, but really it's talking about throughput and agility. It's no longer the big, monolithic organizations that are gobbling up all of the market share. The small, fast, disruptive startups are the ones that come out of nowhere. And this mindset started long before that quote, with something that I like to refer to as the ancient business wisdom of, in this case, the 1970s and 80s, more specifically the work of Dr. Eli Goldratt and the theory of constraints. Now, I know that we can go back even further and talk about the work that was being done with statistical process control. W. Edwards Deming, if you've read Beyond the Goal, was a huge influence on Goldratt's work, and you absolutely will find Deming's work all over what we're doing right now with platform and DevOps and even artificial intelligence. But this talk is going to focus mostly on the theory of constraints, why it's important to DevOps, and how it's shaped a lot of the underlying themes of DevOps. If you're not familiar with Dr. Goldratt or the book The Goal, Dr. Goldratt was a physicist turned business analyst, so to speak. His big intellectual leap, and it was such a bit of genius, was that you could apply the rules of physics, as they work within complex machinery, to business.
You could take the same terminology and the same basic physics principles, and you could turn around and map that to the way that an organization, in this case a production and manufacturing organization, can work and flow. Now, this book, The Goal, is not some business textbook, and I think that's part of its appeal. It is a fictional account of a VP of manufacturing who's struggling within their company to keep pace with the competition. A change agent is introduced to the business, somebody with an understanding of the theory of constraints, and they use that to transform the business. If you're not familiar with The Goal, you're probably familiar with The Phoenix Project at this point, at least in these crowds. And you should know that Gene Kim will be the first to tell you that The Phoenix Project is just a retelling of The Goal, except now through the eyes of DevOps. It's teaching the same theory of constraints that was taught in The Goal. Format-wise, it's also very similar. This book, again, is not a business textbook. It is a page-turning business fiction novel about, in this case, a VP of engineering moving through a major migration and struggling. A change agent comes in, introduces DevOps principles, and saves the business. So it's another great way to learn about the theory of constraints. Okay, what is the theory of constraints and how does it relate to DevOps? How did it start helping us with this fast-beating-the-slow mindset? There's a lot to the theory of constraints. Essentially, it's a theory that says that a system will never be able to perform better than its constraints. And sometimes we can use those constraints: we can put constraints in a system to try to achieve better flow within that system. Where it relates most directly to DevOps, I think, is in defining any complex machine according to three different basic
bits of its anatomy. The cost: the energy associated with making this machine do what it's supposed to do. The throughput: that machine actually doing its job. And then the inventory: the raw materials, the individual parts that work together, at a cost, to create throughput for the machine. So the theory of constraints defines machinery according to these three types of anatomy: cost, throughput, and inventory. Now, if we take a quick intellectual leap, what do these components map to in a software organization? In a software organization, cost is still cost. This is FTE, the full-time engineering hours. This is hands on keyboard, writing code. This is the personnel cost and the tool cost of actually creating it. Inventory is the code itself. That's the raw material that we're hoping to manufacture into something that will be lucrative for our business. Now, the throughput of any business, and certainly a software business, is money. Because what is the job of the machinery of a business? It's to make money. So when we talk about these three parts of the theory of constraints in terms of a software engineering business, cost is still cost, inventory is code, and throughput is money. Now, the big mind shift that happens in The Goal and in the theory of constraints is that Western businesses tend to prioritize cost as the most important part of this machinery. That's why you see short-term solutions like layoffs happening. But if you look at the long-term success of layoffs and other types of cost-cutting initiatives, they are almost universally unsuccessful in the long term unless there's a complementary plan to either maintain or improve throughput.
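The central claim of the theory of constraints mentioned earlier, that a system never performs better than its constraint, can be made concrete with a small sketch. The stage names and daily capacities below are hypothetical numbers chosen for illustration, not figures from the talk, and modeling software delivery as a simple serial pipeline is a simplifying assumption.

```python
# Hypothetical delivery stages and how many changes each can handle per day.
# These numbers are illustrative assumptions, not data from the talk.
STAGE_CAPACITY = {
    "write_code": 20,
    "code_review": 8,
    "test": 12,
    "deploy": 30,
}


def find_constraint(capacities):
    """Return (stage, capacity) for the slowest stage, i.e. the constraint."""
    stage = min(capacities, key=capacities.get)
    return stage, capacities[stage]


def system_throughput(capacities):
    """A serial pipeline can never deliver more than its slowest stage."""
    return min(capacities.values())


constraint, limit = find_constraint(STAGE_CAPACITY)
print(f"constraint: {constraint}, system throughput: {limit} changes/day")

# Speeding up a stage that is not the constraint changes nothing...
faster_coding = {**STAGE_CAPACITY, "write_code": 40}
# ...while improving the constraint raises throughput for the whole system.
faster_review = {**STAGE_CAPACITY, "code_review": 12}
```

In this toy model, doubling the capacity of `write_code` leaves throughput at 8 changes per day, while raising `code_review` to 12 lifts the whole system to 12. That is one way to read the argument that improvement effort belongs at the constraint rather than at cost alone.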
Throughput, in other words, is the business's ability to keep creating its product, doing what it does, and continuing to be lucrative. You have to prioritize that. A layoff is generally just a band-aid. And if we look beneath the surface, it can actually be very detrimental to throughput. We get things like brain drain in the organization, or we have to hire new people later when we realize we laid off too many, which has been happening over the last year or so in our industry because of AI and over-indexing on things like that. So we actually hurt our throughput with some of these cost-cutting initiatives. The important part of the theory of constraints is that throughput is the most important part. The machine must keep doing its job better and better. Now, how does that translate to the way that we deliver software? If the inventory is code, and the cost is that developer actually writing code and putting that code together, then throughput comes down to the distance between the time that the developer is creating that code and the time that code is no longer work in progress and is out in the market, making money for the business. Anything in the middle, any time that cost is still being expended on assembling inventory and that inventory has not yet been released, is all waste. That's all friction. And there are so many things that can go wrong before we actually release that code to production. So are we starting to pick up on something here? A huge part of DevOps has been reducing the distance between hands on keyboard creating the product and getting that product out to market, making money for the business. I come from a time, a lot of us come from a time, when you would write your code, create a JAR file or whatever, and literally copy it onto a disk. You'd walk it to another part of the organization.
They'd pull that down onto some testing system, which probably didn't match the system that you were developing on in any way. And when they released it, there was really no guarantee that it would work the same way in production, because back then there was not a lot of guarantee that the production systems matched the systems we were testing on, or that the conditions were the same. So we were talking weeks between being able to create this code and being able to release that code out to market. So this right here, prioritizing throughput and reducing the distance between creating code, from inventory and cost, and actually creating throughput and money for the business, is what we must prioritize. This is what we're trying to optimize. And that's very much what DevOps tells us: the only initiatives that will positively impact long-term performance are the ones that increase throughput while simultaneously decreasing cost. Everything else is not going to have the long-term effect that you want, unless you think you can break the laws of physics. This is also where continuous improvement comes in. The inventory that's sitting there, in between the time that it's created and the time that it's released to production and making money for the business, is susceptible to entropy, the second law of thermodynamics, and rot. We see this in different forms with code. We've heard the term code rot before, but that can mean different things. It can mean a dependency going out of date or some vulnerability being disclosed, and now we have to stop what we were doing on this one part of the code and make sure that we remediate this vulnerability. And if any of this happens while the code is work in progress and in flight, it can greatly increase the distance
between the time that we're writing code and the time that we're actually releasing it. So this is where this notion of continuous improvement also comes in. It's battling entropy, this ever-present force that adds all this variability to our work in progress, because nothing ever really sits still. As long as we're continually improving something, we're fine. We're effectively outrunning entropy. And that's where this notion of continuous improvement comes from, and moreover the idea that it doesn't really matter what you're improving as long as you're improving something, because then you are using feedback to outrun entropy. Now, this is where the DevOps conveyor belt came from. All this theory around how we shorten the distance between hands on keyboard and actually creating money for the business, and how we make sure that we're prioritizing throughput, looks very much like the DevOps conveyor belt at this point. Always move forward. Even if we break something, just release a fix right afterwards, and make sure that our deployment frequency can be high. That's our deployment frequency DORA metric: how often are we actually able to get our code out? And so we have this wonderful conveyor belt. This is just one small permutation and set of technologies that might be representative of a typical DevOps pipeline. But this is why we got here: because of the prioritization of throughput, because we needed to build systems that could create cultures of continuous improvement and could create high-velocity, constantly releasing types of systems. Okay. So why did free software become so much of what's underpinning these platforms today? Why is open source so essential to being able to build these types of platforms?
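Before turning to open source, the two measurements that run through the conveyor belt discussion above, the distance from writing code to releasing it and how often code gets out, can be sketched in a few lines. DORA names these lead time for changes and deployment frequency. The timestamps below are hypothetical, and treating each change as exactly one deployment is a simplifying assumption rather than the official DORA calculation.

```python
from datetime import datetime, timedelta

# Hypothetical (commit_time, deploy_time) pairs for changes released in one week.
CHANGES = [
    (datetime(2024, 9, 2, 9, 0), datetime(2024, 9, 2, 15, 0)),   # 6 hours
    (datetime(2024, 9, 3, 10, 0), datetime(2024, 9, 4, 10, 0)),  # 24 hours
    (datetime(2024, 9, 5, 8, 0), datetime(2024, 9, 5, 11, 0)),   # 3 hours
]


def deployment_frequency(changes, window_days):
    """Average number of deployments per day over the observation window."""
    return len(changes) / window_days


def mean_lead_time(changes):
    """Average time a change spends as work in progress before release."""
    total = sum(
        (deployed - committed for committed, deployed in changes),
        timedelta(),
    )
    return total / len(changes)


print(f"deploys/day: {deployment_frequency(CHANGES, window_days=7):.2f}")
print(f"mean lead time: {mean_lead_time(CHANGES)}")
```

Shrinking the mean lead time is the "reducing the distance" goal in numeric form, and a rising deployment frequency is the conveyor belt moving faster.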
I really think about open source in terms of the friction created when we do something without transparency and without ubiquitous accessibility to our work, versus when we do something with that level of transparency and make our work available to as many people as possible. I love to use the story of Pythagoras here to tell, I think, the cautionary tale. Most of us are familiar with Pythagoras, or at least the Pythagorean theorem, a squared plus b squared equals c squared, the way to calculate the hypotenuse of a right triangle. But of course Pythagoras made so many, not all, but so many of the contributions to modern geometry. At the time when he was teaching geometry, around 700 BC in the city of Crotone, which is in modern-day southern Italy, Pythagoras ran a school of geometry, and there were two circles of the school. There was the inner circle, called the mathematikoi, or the learners. These were the only folks that were really allowed to learn geometry and practice geometry. Then you had the outer circle of the school, called the akousmatikoi, or the listeners. They were not allowed to. They were actually more running errands and doing chores for the school. They were really more like staff. And then you had everybody else, who was not even allowed into the school of geometry at all. Now, geometry at the time, 700-ish BC in Crotone, was the force that was responsible for civic infrastructure: running water, housing, shelter, all the things that you needed to build a modern city. You needed geometry to make all that infrastructure work. And this esoteric math and language was being taught by Pythagoras in this very secretive way. That made a lot of people very unhappy, which we'll get to in a moment.
Meanwhile, you look at somebody like Sir Isaac Newton, who took what he learned and published it in what was possibly one of the first public domain engineering works. I like to think of this as one of the first examples of something that would be considered open source. This was the publication of Principia Mathematica. These were Isaac Newton's learnings and math that he made available to other practitioners, to other people who wanted to be able to take these principles and apply them to their own work. This is a much better type of spirit, but it's also a much more frictionless type of spirit. It allows much more accessibility to the knowledge, and it allows people to be more free in the way that they create whatever it is that they're doing. Another great example of these open societies is the Florentine bottega, where sculptors and painters and artists and craftspeople would all meet together to discuss what was going on and to learn from one another. Leonardo da Vinci was discovered in one of these bottegas. A more modern example could be a Parisian salon: again, an open forum where different people from society could talk about the issues of the day in an open and transparent manner. These types of gatherings, and this way of doing things in a more transparent manner, are why open source makes so much more sense for building large-scale cloud platforms and the things that we build with them now. Could you imagine if something like the TCP protocol, for instance, were closed? If what you had to run underneath your cloud infrastructure and networking were a closed protocol that you had to somehow license, I think nothing that we have would work. Look at the Linux operating system, the proliferation of the Linux server operating system: how much further it's been able to bring DevOps, how much we've been able to do from an operating system perspective at every level of scale.
Certainly there are some closed and proprietary licenses for Linux, but overall this is open source. Things did not end well for Pythagoras. There was a nobleman at the time who felt that he should be included in the Pythagorean school, that he should learn geometry. His name was Cylon, actually. Cylon would run Pythagoras out of town and murder tons of his students, and ultimately Pythagoras would die in exile. And we don't really have full attribution for all of Pythagoras's work. It's actually very hard for us to know how much of this work was created by him and how much by other mathematicians, because, again, all of this was done without any type of transparency. Open source was essential to taking these principles of frictionless development and deployment that would build what we think of now as modern DevOps and platform engineering, removing barriers to accessibility to the software, and making sure that we could build these systems in a very frictionless way. And so now we have this terrifying, post-Cambrian explosion of all of these other types of solutions that come together to build a modern cloud platform. We have all of these different solutions for databases and streaming and messaging and application definition and API gateways and service proxies, for all of these different pieces. And some permutation of these different pieces of, many times, open source software is what we now refer to as our platform. Now, to take advantage of this new way of deployment, this more ubiquitous and portable means of deployment, we have to start thinking differently about the way that we write our software. We actually have to code differently if we want to take full advantage of this more portable means of deploying our software. The good news is there's a wonderful framework that's already available for this, called the 12 factors.
It was put together by folks from Heroku and other very well-known people in the industry, and it's a very easy-to-understand framework for reducing the amount of friction in your cloud deployments. Now, I won't make the claim that every successful cloud deployment has to be fully 12-factor. That is not true. But there is an inverse relationship between the amount of friction that you will experience in doing your deployment, and in operating and maintaining your code in these environments, and how 12-factor your app is. In other words, the less 12-factor it is, the more friction, the more tech debt, the more toil you should expect when you're pushing this app into production. So of course it does behoove you to get as far as you can with these. So what are they? We go across different characteristics of the code itself, but in every factor, the important thing to remember is really the goal of it, and that is to keep the code portable and accessible and visible, no matter what type of platform it's running in and no matter what cloud it's running on. So these are things like, for instance, tracking all of our code in one single codebase in revision control, and doing many deploys out of that one codebase. That way, that one codebase can very easily become a trigger to launch a CI job or a full deployment pipeline if we need to, and any changes can be triggered off of that single codebase as well. Dependencies: explicitly declare and isolate dependencies. This is where the shift to not including dependencies as part of our build has happened over the last couple of decades. We've got better ways now, things like package.json in npm, for instance, that allow us to have these really powerful, semantic-versioning-controlled ways of declaring dependencies.
They're explicitly declared and isolated so that, no matter what, that part of the supply chain can always be rebuilt for the app in any cloud. We want to store config in the environment. And more specifically, as you've seen, we've moved back to the old environment variables. The reason for that is because they're so ubiquitous: every sort of POSIX operating system has a concept of them. So again, multi-cloud, multi-platform, multi-operating system, we can use environment variables, and they're super easy to manipulate from outside of the platform. All of our code these days is buried in some cloud image with some Kubernetes node deployed on it with some Kubernetes job running inside of it. How many layers of abstraction are there between us and the app now to make all of this multi-cloud stuff possible? It's easy to manipulate those environment variables from some outside administrating system, within the operating system, and it's ubiquitous. Treat backing services as attached resources. Any type of database, media source, anything like that should be considered an attached resource. This is not a brand new pattern. This is more like enforcing patterns such as JNDI, for instance, or other patterns that create abstraction between the resources that are needed by the application and the application itself. How can we decouple those things, but also make it very easy for the app to have access to those backing services when it needs them? Separating our build and run stages: CI versus CD, right?
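The config and backing-service factors described above can be sketched together. This is a minimal illustration, not a prescribed implementation: the variable names `DATABASE_URL` and `LOG_LEVEL`, the `Settings` class, and the example URL are all assumptions made for the sketch. The point is that the app reads everything it needs from the environment and treats the database as a resource located by a URL, so swapping it out needs no code change.

```python
import os
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class Settings:
    database_url: str
    log_level: str


def load_settings(environ=os.environ):
    """Read config from the environment; fail fast if a required value is missing."""
    try:
        database_url = environ["DATABASE_URL"]  # hypothetical variable name
    except KeyError:
        raise RuntimeError("DATABASE_URL must be set in the environment") from None
    return Settings(
        database_url=database_url,
        log_level=environ.get("LOG_LEVEL", "INFO"),  # optional, with a default
    )


def describe_backing_service(url):
    """Break an attached-resource URL into the parts a client would need."""
    parts = urlparse(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "name": parts.path.lstrip("/"),
    }


# Example with an explicit mapping standing in for the real environment.
example = load_settings({"DATABASE_URL": "postgres://db.internal:5432/orders"})
print(describe_backing_service(example.database_url))
```

Passing the environment in as a parameter keeps the function easy to exercise without touching the real process environment, which is a design choice of this sketch rather than part of the 12-factor text.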
For build and run, that means making sure that our build and all of our inner work is happening in one process, and that the stage that actually runs the code, which will ultimately be handled by automation and some other operating platform, is separate. Execute our app as one or more stateless processes, because again, stateless processes are easy to view and easy to manipulate from an operating system perspective by some outside administrating agent, whether that's Kubernetes or whatever. Export services via port binding. TCP is ubiquitous. We've all got it. So use port binding specifically as the way to publish and export services, so that we can all agree on that model and it'll work in every cloud, whether we're talking about security groups or about ingress controllers. As long as everything's happening over TCP ports and we're using port binding, then we're all speaking the same language. Scale out via the process model, with concurrency. Because our processes are stateless, we are now able to do that, which is great, because now we can scale those processes across multiple containers, across multiple clouds, and work at any scale. Make sure that our services are 100 percent disposable: fast startup, graceful shutdown, again so that we can pick the service up, run it in some cloud, destroy it when we don't need it anymore, and leave no trace. Ultimate portability. Dev/prod parity: a lot of folks love to talk about Docker being this thing that's really improved and sped up the way that we can get a quick operating system layer and do something with our app. And that's really important. But another textbook problem that was solved by Docker was this parity between development and production systems. Again, I mentioned that when I was coming up as a developer, like a lot of us were, this was never guaranteed. We were literally running physically different servers.
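As an aside on the port binding and disposability factors above, here is a minimal sketch of a stateless process that exports an HTTP service on a port taken from the environment and shuts down gracefully on SIGTERM. It uses only the Python standard library. The `PORT` variable name, the default of 8080, and the `HealthHandler` class are assumptions made for illustration.

```python
import os
import signal
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Stateless handler: every request is answered without stored session state."""

    def do_GET(self):
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def resolve_port(environ=os.environ):
    """Port binding: the platform tells the process which port to listen on."""
    return int(environ.get("PORT", "8080"))  # hypothetical variable and default


def make_server(port, host="0.0.0.0"):
    return HTTPServer((host, port), HealthHandler)


def main():
    server = make_server(resolve_port())

    def handle_sigterm(signum, frame):
        # shutdown() blocks until serve_forever() returns, so it must be
        # called from a thread other than the one running the server loop.
        threading.Thread(target=server.shutdown, daemon=True).start()

    signal.signal(signal.SIGTERM, handle_sigterm)
    try:
        server.serve_forever()
    finally:
        server.server_close()  # release the socket and leave no trace
```

Calling `main()` starts the service. An orchestrator can then scale it out by starting more copies on different ports or hosts and dispose of any copy by sending SIGTERM.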
Someone had to make sure that config was synced between those physically different servers, and it never was. As a result, you didn't have this parity. So the tests might pass in a QA environment or staging environment, you move to production, some tiny thing is different, the whole thing crashes, and everybody's in a fire drill for the whole weekend. And that was just what it was. Docker helped us solve for this big time by providing ubiquitous operating system layers that would be common no matter where we were deploying our app. Logs: treat our logs like event streams. I mentioned those abstraction layers before. Please don't make your admins log into five different layers of abstraction just to get visibility on what's happening with your app. Your app should be broadcasting its events. It shouldn't be writing to some log file locally. It should be pushing those logs out to some aggregator, Datadog, whatever it is, that makes it very easy to get access to those metrics. And then finally, any admin or management tasks that need to be run across the app should be one-off processes, again so that this admin process can be run, and even automated in the long run, by ubiquitous orchestration. All right, so great. We've done all this stuff. We have all this fantastic open source that's given us this amazing ability to decompose into microservices. And now we're just deploying to multi-cloud and everything's so simple and great. No, not remotely. All of this freedom and portability has led in many cases to more complexity and disorganization. We really over-indexed on velocity, and we didn't always take the time to make sure that we were doing things in an organized way. And this leads to unnecessary friction and toil for developers, because this is the state of the art for most companies right now.
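Going back to the logs factor for a moment, one common way to treat logs as an event stream is to have the app write one structured event per line to standard output and let the platform forward that stream to whatever aggregator is in use. The sketch below assumes that convention. The `JsonFormatter` and `build_logger` names and the chosen fields are hypothetical, and no specific aggregator API is implied.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON event on one line."""

    def format(self, record):
        return json.dumps(
            {
                "level": record.levelname,
                "logger": record.name,
                "message": record.getMessage(),
            }
        )


def build_logger(name, stream=sys.stdout):
    """Log to a stream (stdout by default), never to a local file."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]  # replace any handlers from earlier setup
    logger.setLevel(logging.INFO)
    logger.propagate = False     # avoid duplicate output via the root logger
    return logger


log = build_logger("orders")
log.info("order %s created", 42)
```

Because the process only emits to its own output stream, an admin does not need to log into the container or node to find a file. The collection and routing of the stream is left to the execution environment.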
We have, if you look at the bottom of this slide, all these different monoliths and mini-services and microservices and things that are owned by different engineering managers and were worked on in different silos. And they're all getting managed over spreadsheets or Confluence wikis or something like that. These are lagging systems of record that somebody then has to go back and update. So of course they never get updated, and they're always out of date. So we don't know who owns what, and we don't know where our dependencies are. We end up tracking with things that look like this: we have our GitHub repos and our service names and our Slack channels, but when anything has to change in this ever-growing service catalog, it becomes so brittle, a system that makes it very difficult for cross-functional teams to understand and get any type of usefulness out of. So it's just bad. We need to be rethinking this now. We've now moved into a level of platform engineering where we have to take that step from orchestration and organization of our services to true choreography. We need to be able to add our final layer of abstraction, which I believe is the internal developer portal. That's what we're doing at Cortex and what's been done at Backstage. We need a single, non-lagging, real-time, continuously monitoring system of record that's accessible by every engineer and that can be used to align standards and ownership and service quality across the organization. If you'd like to learn more about that pattern in particular, we are having the world's first-ever in-person conference dedicated to internal developer portals. It's happening in New York City on October 24th, and we would love to see you there and talk more about this solution. Thanks so much for your time. I really appreciate it, and enjoy the rest of the conference.
Justin Reock

Head of DevRel @ Cortex
