Conf42 JavaScript 2023 - Online

Git submodules: we have monorepo at home

Abstract

If you’re struggling with reusing common components across teams and projects when developing applications with a shared architecture, ETG has a solution. In my talk, I’ll share our experience with Git submodules. We'll discuss the pros and cons as well as the technical details.

Summary

  • You need to implement feature X on Project A and then release it on Project B as well. Half a year later, you receive a notification about a task for Project C. If such drama happens with your project from time to time, you will definitely like this presentation.
  • A monorepository is a way to manage code where you keep all the code for many projects in one place using one git repository. The problem of reusing common code is as old as time. What are the advantages of this solution?
  • Submodules are just like regular git repositories. If you make changes to files within a submodule, you need to commit and push them from inside the submodule to save the changes. This way, changes related to one project won't force an update for another project. In addition to all of this, we also get a bunch of other benefits.
  • A git submodule is the same kind of dependency as the other packages in your package.json. Components and functions, besides being abstract and parameterized, should be as small as possible. Documenting each of these aspects is another way to create documentation for new team members.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hi everyone. I guess many of you can relate to this situation. Early in the morning you wake up feeling good before work, gently turn the alarm off and open your laptop to check your email and chat. You've received a notification in your email: you have been assigned a new task. You need to implement feature X on Project A. You work hard all day, do your job and release feature X on Project A. Then some time passes. You wake up in a great mood again, turn the alarm off and open your email. You have received another notification about a task: you need to implement feature X on Project B. At first you think it's deja vu, but upon closer inspection, you realize it's a different project that just needs the same feature. You have read about best programming practices and know you should reuse code. But the feature seems small and you already know everything about how to build it. Plus, the projects are from different domains. It's unlikely to happen again in the future, right? So you copy the code and release feature X on Project B quickly and easily. That's a good practice too. But half a year later, waking up and opening your email, you feel a panic attack coming on. You've received a notification that you need to implement feature X on Project C. You say enough is enough. It's time to move everything to common components. You open Project A, look at your feature but don't recognize it. Everything in it has changed. You open Project B and search for feature X. But here as well, it's completely different and not even the same as in Project A. You are left with no choice. So for the third time, you create feature X and release it on Project C. If such drama happens with your project from time to time, then you will definitely like this presentation. And today I will play the role of Bill Murray.

My name is Vadim. I have been in development for ten years.
I work as a frontend team lead at ETG, and I also write a series of articles on GitHub called 2 Minutes Dev Notes. The series is about the daily challenges programmers face. As I said, I work at ETG. ETG manages companies specializing in hotel bookings worldwide. It operates in more than a hundred markets, represented by brands like ZenHotels and RateHawk. Here is what it looks like on the inside. The B2C segment includes hotels, the most popular product of the company. Hotels are also represented in the B2B segment, which means that the hotel engine can be used for the needs of other businesses apart from hotels. In the B2B segment, the company also has a set of transportation products. What are these products? The sale of flight tickets, the sale of train tickets, and transfers. About a year ago, I joined ETG and became responsible for the frontend of all transportation products. And as you might guess, I encountered the drama I showcased at the beginning.

So here is what we have. The frontend is written in Next.js. All apps have the same set of pages. The apps have similar domains. There is a similar set of services to interact with and similar work with suppliers. But each product has its own unique aspects, which prevent us from simply creating one application and coloring it in three different shades. Hence we encounter our drama: everything seems to be the same, but it's not. So the very first pain point I started to address was how to create a feature once and forget about it.

In general, the problem of reusing common code is as old as time. As soon as any company starts to grow a little and has more than one project, it faces the issue of code reuse, not to mention big tech. And when we hear big tech and the problem of code reuse in the same sentence, the first thing we think of is a monorepository. A monorepository is a way to manage code where you keep all the code for many projects in one place using one git repository.
This is different from the usual method where each project has its own separate git repository and git history. So it seems that we could conclude the presentation right here: create a monorepository, place common components in a separate folder, and all applications will have access to them. However, the monorepository didn't suit our needs. So what's the problem? As I said before, our products are very similar, but they also have significant differences. They have different backends in different programming languages. In general, there are many edge cases where the scenarios do not overlap. Therefore, developing all products in one place with a single git history is not a good idea. Additionally, since they have different backends and different suppliers, each product should have its own release cycle and versioning, even if we are talking about the same feature. Of course, we could use some of the trendy tools for working with monorepositories. These tools let us work with the git history of only a specific project and its dependencies, and they enable independent releases. All of these tools work; however, they are another thing to learn and maintain. Given that we have a small team of frontend developers, introducing almost any new technology creates a bus factor of one. Since it's clear that I would be the one maintaining it, I wouldn't want to block all the teams. Additionally, this adds unnecessary complexity. It's much simpler to copy a feature from one project to another than to continuously maintain an abstraction that's only beneficial in a couple of cases. To learn more about the concept of over-abstracting, you can watch Dan Abramov's talk "The WET Codebase". So Bill Murray, gripping onto a stubborn idea, couldn't escape from the loop and faced failure one more time. Once we rejected the monorepository, we basically had one option left for reusing code.
It's a polyrepository with packages. I must say it's quite a popular approach in the company. Most UI components that are shared between teams and should be common are placed in a separate git repository. There is a dedicated team that maintains it, and all other developers in the company can contribute to it. In most of the teams I know, the commonly used package manager is Yarn, and Yarn allows specifying a repository URL as a project dependency in package.json. This let us use a package in a project even without setting up a corporate registry. What are the advantages of this solution? It's a widely used method within the company, ensuring uniformity. Change history is isolated and semantically accurate within a single repository. It has an intuitive workflow: you complete your work in the common package, bump the package version, and update the package version in the application. So Bill Murray went to sleep hoping that he had broken the loop and that the new day would be different from yesterday.

But the next morning... At first glance, it seems like the solution is suitable and it's time to start moving the common code to a separate repository. However, a few questions have arisen. How do we test everything during development? And how do we write code that can actually be reused? These might seem like strange questions, but they are not without reason. Look, when I talk about a shared component library, it's pretty clear: components are something that can be easily encapsulated and parameterized. We need a button. We write a function that takes a couple of parameters and returns some UI. How can we ensure that such UI will be displayed correctly in our application when we import it? We can use a tool like Storybook and create a story. Storybook will show us how our button will look in our application when we add it there. If, in addition to components, we need to reuse some business logic, we still have unit tests.
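To make the package workflow above concrete, here is a minimal TypeScript sketch, assuming the shared repository is consumed through a git URL in package.json whose part after `#` names a tag (Yarn accepts a branch, tag, or commit there). The package name `shared-ui`, the `example.com` URL, and the helper `bumpGitDependency` are hypothetical; the sketch only shows what "update the package version in the application" amounts to when the version is a git ref.

```typescript
// Only the package.json fields this sketch touches.
interface PackageJson {
  name: string;
  dependencies: Record<string, string>;
}

// Replace the "#ref" suffix of a git-URL dependency (for example a tag) with a new ref.
// Returns a new object; throws if the dependency is missing or is not a "git+...#ref" spec.
function bumpGitDependency(pkg: PackageJson, dep: string, newRef: string): PackageJson {
  const spec = pkg.dependencies[dep];
  if (spec === undefined) {
    throw new Error(`dependency ${dep} not found`);
  }
  const hashIndex = spec.lastIndexOf("#");
  if (!spec.startsWith("git+") || hashIndex === -1) {
    throw new Error(`dependency ${dep} is not a git URL with a #ref: ${spec}`);
  }
  const updated = `${spec.slice(0, hashIndex)}#${newRef}`;
  return { ...pkg, dependencies: { ...pkg.dependencies, [dep]: updated } };
}

// Hypothetical application manifest pinned to tag v1.2.0 of a shared repository.
const app: PackageJson = {
  name: "project-a",
  dependencies: {
    "shared-ui": "git+ssh://git@example.com/acme/shared-ui.git#v1.2.0",
  },
};

const bumped = bumpGitDependency(app, "shared-ui", "v1.3.0");
console.log(bumped.dependencies["shared-ui"]);
// git+ssh://git@example.com/acme/shared-ui.git#v1.3.0
```

In practice you would edit package.json by hand or rerun `yarn add` with the new ref and commit the updated lockfile; the helper just isolates the string change that every consuming application has to make after each release of the shared package.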
We extract the logic into separate functions or helpers and write unit tests for them. This way we determine how our logic will be reused within the application. But what should I do if I am using Next.js? For example, I need to extract a custom page that connects to an Apollo client and validates getInitialProps across all pages. With this level of reuse, I have to move half of the project infrastructure from the project repository to a separate repository and recreate a similar setup there just to test my logic. In essence, I need to invent my own Storybook, but for components of a different level. You might say: just use yarn link, keep everything in your own repository, link your separate library and develop. When you are finished, you unlink, update the version, and you are good to go. But this may seem counterintuitive. It's like agreeing that we have a polyrepository instead of a monorepository and then suddenly resorting to linking packages, essentially reinventing workspaces. We wanted to come up with something that would let us harness the full power of the monorepository approach while remaining independent and continuing to use the polyrepo approach. And as you can probably guess from the name of the presentation, we found a solution.

So, git submodules. When we talk about git submodules, there is no difference from a regular git repository; it's the same thing. Now, instead of storing the package version in package.json, we store a reference to a commit in the submodule's history. This reference gives us access to all of the files as they were at that commit. This way you no longer need to manually handle linking packages. As soon as you install a submodule, a regular directory appears in the project containing the necessary package files. Let's look at how we can do it.
To add a submodule to the project, you need to execute the command git submodule add and specify the URL of your shared repository. After running the command, you can check the status of your project, and you will see that two files have been added. The first is the git submodules configuration file, .gitmodules, where all the submodules used in the application are listed. The config file itself is very simple. It contains the name of the submodule, the path where the submodule is placed in the application, and the URL the submodule is fetched from. The second file, or more accurately a special link entry, is the submodule itself. To confirm that it is a reference to a commit, you can execute git diff --cached (git submodule add stages both changes) and see output saying that it is a commit hash. To make the submodule's files appear under its name, you need to fetch them. You run git submodule update --remote --init, and you can wrap it in an npm script, for example, to make it easier to use in local development or a CI pipeline. In our company we have such a script that we run with yarn, and that's it. Now we have a directory in the project that leads to another repository. We move the shared code into the submodule and reuse it. There is no need to do anything separately within the submodule since all the code is used only in the project. Bill Murray commits his changes and goes to sleep.

But in the morning, Bill Murray remembers that he has a team and that they need to learn how to use a new tool, even if it's good old git. As I mentioned before, submodules are just like regular git repositories. So if you make changes to files within the submodule, you need to execute the following commands to save the changes: cd into the submodule path, then git commit and git push. Once you commit changes in the submodule, the link in your parent repository will also point to the latest commit you made.
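The .gitmodules file described above uses plain git-config syntax: one `[submodule "name"]` section per submodule with `path` and `url` keys, and optionally `branch`. As an illustration, here is a minimal TypeScript reader for that common layout. It is a sketch, not a full git-config parser: it assumes simple `key = value` lines and ignores quoting, includes, and other config features; the submodule name, path, and URL are made up.

```typescript
// One entry of a .gitmodules file. `branch` is optional.
interface SubmoduleEntry {
  name: string;
  path: string;
  url: string;
  branch?: string;
}

// Keep only sections that have the keys git needs (name, path, url).
function pushIfComplete(entries: SubmoduleEntry[], e: Partial<SubmoduleEntry>): void {
  if (e.name && e.path && e.url) {
    entries.push({ name: e.name, path: e.path, url: e.url, branch: e.branch });
  }
}

// Parse [submodule "name"] headers followed by `key = value` lines.
// Blank lines and comment lines starting with # or ; are skipped.
function parseGitmodules(text: string): SubmoduleEntry[] {
  const entries: SubmoduleEntry[] = [];
  let current: Partial<SubmoduleEntry> | null = null;

  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#") || line.startsWith(";")) continue;

    const header = line.match(/^\[submodule\s+"([^"]+)"\]$/);
    if (header) {
      if (current) pushIfComplete(entries, current);
      current = { name: header[1] };
      continue;
    }

    const kv = line.match(/^([A-Za-z][\w-]*)\s*=\s*(.*)$/);
    if (kv && current) {
      const key = kv[1].toLowerCase();
      if (key === "path" || key === "url" || key === "branch") {
        current[key] = kv[2];
      }
    }
  }
  if (current) pushIfComplete(entries, current);
  return entries;
}

// Hypothetical file contents, in the tab-indented layout git writes.
const gitmodulesText = [
  '[submodule "shared"]',
  "\tpath = shared",
  "\turl = git@example.com:acme/shared.git",
].join("\n");

for (const e of parseGitmodules(gitmodulesText)) {
  console.log(`${e.name} -> ${e.path} from ${e.url}`);
  // shared -> shared from git@example.com:acme/shared.git
}
```

A reader like this can be handy in a CI step that needs to know where submodules live; in a real pipeline, `git config -f .gitmodules --get-regexp path` returns the same information straight from git.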
And to avoid losing these changes, you now need to commit the changes in the parent repository as well. It's the same process: git commit, git push. If you don't specify anything additional when starting to work with submodules, the submodule fetches the default branch. Therefore your project will use files from this branch, but you can control this. To do so, you can specify the branch property in the .gitmodules file, and then the submodule will start pulling changes from that branch. Unfortunately or fortunately, in the submodule's configuration you can't specify a specific commit hash, only a branch or a tag. In any case, the parent repository will store the latest commit hash from the branch or tag, but with a branch, this hash may change when new changes appear on the branch we are tracking. Because of this, our team decided to move away from tracking the common default branch directly. This way, changes related to one project won't force an update for another project. What do I mean? We have created three more branches, one for each project, and use the default branch as a synchronization point. Let's say someone starts working on a feature in the Avia project. They have their own target branch, Avia core, from which they create their feature branch and make changes in it. While the changes are in progress, they also need to change the data source for the submodule and specify their feature branch. Once the feature is completed, the feature branch is merged into the target branch and Avia core moves forward. However, this doesn't force the other projects to immediately pull the latest changes from the main branch. This common feature might be needed in other projects later, and when these changes are needed, the main branch is pulled into the appropriate core branch of the project, for example Rail or Transfers. So this approach looks like one common development branch and three production branches, one for each project.
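The branch setup described above can be sketched in TypeScript: each project's submodule tracks its own core branch, a feature branch temporarily replaces it while shared work is in progress, and the result ends up in the `branch` key of .gitmodules. The project keys and branch names (`avia-core`, `rail-core`, `transfers-core`, `feature/x`) are hypothetical stand-ins for the team's real ones. In git itself, `git submodule set-branch --branch <name> <path>` (Git 2.22 and later) updates that same key.

```typescript
// Hypothetical project keys and core branch names, for illustration only.
type Project = "avia" | "rail" | "transfers";

const coreBranch: Record<Project, string> = {
  avia: "avia-core",
  rail: "rail-core",
  transfers: "transfers-core",
};

interface SubmoduleConfig {
  name: string;
  path: string;
  url: string;
  branch: string;
}

// A project's submodule tracks its own core branch by default,
// or a feature branch while a shared feature is in progress.
function trackedBranch(project: Project, featureBranch?: string): string {
  return featureBranch ?? coreBranch[project];
}

// Render one [submodule] section in the tab-indented layout git writes to .gitmodules.
function renderGitmodules(cfg: SubmoduleConfig): string {
  return (
    [
      `[submodule "${cfg.name}"]`,
      `\tpath = ${cfg.path}`,
      `\turl = ${cfg.url}`,
      `\tbranch = ${cfg.branch}`,
    ].join("\n") + "\n"
  );
}

const base = { name: "shared", path: "shared", url: "git@example.com:acme/shared.git" };

// Normal state: Avia tracks its core branch.
console.log(renderGitmodules({ ...base, branch: trackedBranch("avia") }));
// During development of a shared feature: point at the feature branch instead.
console.log(renderGitmodules({ ...base, branch: trackedBranch("avia", "feature/x") }));
```

With the branch key set, git submodule update --remote moves the pinned commit to the tip of that branch only, so a project sees changes from main after main has been merged into its own core branch.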
If you look more closely, you will notice that this approach differs from the package approach. With packages, we have the opposite: three development branches, which are actually feature branches, and one production branch that forces applications to use new stuff every time there is an update. Submodules allow us to avoid this. In addition to all of this, we also get a bunch of other benefits. We can easily search through files in the entire application, including the submodule. As I mentioned earlier, the IDE treats the submodule as a regular directory, so the search is no different. We can now store regular TypeScript in our git submodule, import it into the application and use it seamlessly without a build step. You no longer have to wait for the build pipeline of your common library to finish in order to use a fresh version in your application. The shared code from the submodule is integrated into your application code, and the pipelines are combined. You build once, and if everything compiles, it means the common submodule is in order. This also includes linting and testing. We get an agnostic tool that can fit into any architecture, as the submodule is conceptually just a directory with a set of files of a certain category, something like a shared module. In addition to all of the above, we can conveniently reuse not only isolated entities like components, but also large pieces of application logic without rebuilding the infrastructure. And essentially we have resolved our drama. Now features are developed uniformly for all products and follow the same concept. And the icing on the cake: as soon as we introduced submodules, we fixed several bugs in the shared functionality that had simply been copied from project to project. Bill Murray seems to have figured something out.
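Because the application and the submodule now share one pipeline, it can help to fail early when a submodule is missing or pinned to an unexpected commit. The talk does not describe such a check; this is a hypothetical TypeScript sketch that parses the output of `git submodule status`, where each line starts with a state character (a space when in sync, `-` when not initialized, `+` when the checked-out commit differs from the one recorded in the parent, `U` on merge conflicts), followed by the commit hash and the path. It assumes paths without spaces, and the sample output is made up.

```typescript
// States encoded by the first character of each `git submodule status` line.
type SubmoduleState = "ok" | "uninitialized" | "out-of-sync" | "conflict";

interface SubmoduleStatus {
  state: SubmoduleState;
  commit: string;
  path: string;
}

const stateByPrefix: Record<string, SubmoduleState> = {
  " ": "ok",
  "-": "uninitialized",
  "+": "out-of-sync",
  U: "conflict",
};

// Parse the whole output; the optional trailing "(describe)" part is ignored.
function parseSubmoduleStatus(output: string): SubmoduleStatus[] {
  const result: SubmoduleStatus[] = [];
  for (const line of output.split(/\r?\n/)) {
    const m = line.match(/^([ +U-])([0-9a-f]{40,64})\s+(\S+)/);
    if (!m) continue;
    result.push({ state: stateByPrefix[m[1]], commit: m[2], path: m[3] });
  }
  return result;
}

// Hypothetical pre-build guard: describe every submodule that is not in the "ok" state.
function submoduleProblems(output: string): string[] {
  return parseSubmoduleStatus(output)
    .filter((s) => s.state !== "ok")
    .map((s) => `${s.path}: ${s.state}`);
}

const statusOutput = [
  ` ${"a".repeat(40)} shared (heads/main)`,
  `+${"b".repeat(40)} legacy-shared (v1.0.17)`,
].join("\n");

console.log(submoduleProblems(statusOutput).join("\n")); // legacy-shared: out-of-sync
```

A CI step could run git submodule status, pass its stdout to submoduleProblems, and stop the build if the returned list is not empty.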
As soon as he started doing good things for the team, his typical day changed, and now events happen differently. Perhaps this is the only way to break the loop. All right, next I would like to share some considerations on what to keep in mind if your team setup is similar to ours and you also want to transition to submodules. After all, besides the pros, there are always cons and nuances to deal with. This process always comes down to a tradeoff where we sacrifice something to get something else. So here we go. A git submodule is the same kind of dependency as the other packages in your package.json. Nothing in the submodule should import anything from the application. You should follow this rule to avoid turning your submodule into something that only works in a limited number of projects while failing in others because the files are not in the right places. On the other hand, the submodule is allowed to be aware of other dependencies in your project. It's similar to installing a state manager for React: you expect it to work in any React project and not to work where React is absent. If you have several multilingual applications, it's logical to assume that common text should be translated in the same way, so that translators don't do the same work multiple times for each project. As soon as you have prepared the code for release, it's best to enter the submodule, create a tag, and specify it as the branch in the submodule config in your main application. This is for the future: if something breaks and you need to debug code that depends on the submodule, it will be a significant help. Look, you can make a mistake. For example, you release version 1.0.17 and then add more code to the submodule. The submodule code can change a lot before the error is discovered and may already be incompatible with the version of your application in production.
Tags come to the rescue in this case. We know which submodule version we used for the release, so we can put this tag into the submodule config and reproduce the bug in our application with a compatible version. But it is not necessary to create tags every time. You can achieve the same effect by simply switching to a past commit, branching off from it, and specifying this branch in the submodule config; then we can test our application in the same way. Components and functions, besides being abstract and parameterized, should be as small as possible. This is necessary for tree shaking. If one file contains a hundred functions, of which 90 are used in Project A and Project B and only ten are used in Project C, it's better to separate them. This way, Project C doesn't need to include code in its bundle that it doesn't use. There are situations where a feature is developed at the same time in two projects; the stars aligned. In such cases, the two projects might obviously block each other while requesting the common parts. In this case, one project should work on the common part that can be reused later, while the second project focuses on things specific to its own requirements. It's crucial to define the contract or interface you expect from the common part to avoid issues with its future use. So one team works on the common part with a known interface, and the second team makes changes in their project using that interface. Then they can switch roles: the first team works on their project, and the second team can start testing once the first team is done. The second team already has feedback and can make the necessary adjustments to the shared code. And perhaps the most important advice: the tips above may not directly relate to working with submodules, but it's crucial to document each of them. What I have shared with you is based on our experience of using submodules for almost a year.
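The contract-first workflow for a feature built in two projects at once can be illustrated with a small TypeScript sketch. The "recent searches" feature, the `RecentSearchesStore` interface, and both implementations are hypothetical. The point is that the shared code depends only on the agreed interface, never on application files, and the second team can build against a stub until the shared implementation lands.

```typescript
// Hypothetical contract agreed on before either team starts coding.
// The shared implementation in the submodule depends only on this interface,
// never on files from a particular application.
interface RecentSearchesStore {
  add(query: string): void;
  list(limit: number): string[];
}

// Team 1: the shared implementation that would live in the submodule.
class InMemoryRecentSearches implements RecentSearchesStore {
  private items: string[] = [];

  add(query: string): void {
    // Move a repeated query to the front instead of storing a duplicate.
    this.items = [query, ...this.items.filter((q) => q !== query)];
  }

  list(limit: number): string[] {
    return this.items.slice(0, limit);
  }
}

// Team 2: project code written against the interface only.
function renderRecent(store: RecentSearchesStore): string {
  const items = store.list(3);
  return items.length === 0 ? "No recent searches" : items.join(", ");
}

// While the shared part is in progress, team 2 can use a stub with the same shape.
const stub: RecentSearchesStore = { add: () => {}, list: () => ["Paris"] };
console.log(renderRecent(stub)); // Paris

// Once team 1 is done, the real implementation drops in without changes to renderRecent.
const store = new InMemoryRecentSearches();
store.add("Paris");
store.add("Rome");
store.add("Paris");
console.log(renderRecent(store)); // Paris, Rome
```

Swapping roles later works the same way: whichever team owns the shared part only has to keep the interface stable, and the other team's code keeps compiling.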
Documenting each of these aspects is another way to create documentation for new team members joining your project and team, so don't overlook this. Let's summarize. The problem of reusing code has been around for a long time. A monorepository is an excellent solution, but it is more suitable for very large teams and may be challenging for smaller teams due to the need to maintain yet another new technology. A polyrepository solves the problem of reusability and separate histories, but it is better suited for entities that are easy to isolate and for which you can set up their own infrastructure. Git submodules allow you to combine the convenience of a monorepo with the ability to address the issues a polyrepo solves. Listen to your team and document all the gaps in your processes. This is your contribution to the documentation for future team members. And that's all. Subscribe to my socials, leave feedback, and thank you all for your attention.

Vadim Tsaregorodtsev

Lead Software Engineer @ Emerging Travel Group
