Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hi everyone.
I guess many of you can relate to this situation. Early in
the morning you wake up feeling good before work, gently turn
the alarm off, and open your laptop to check your email and chat.
You have received a notification in your email: you have been assigned
a new task. You need to implement feature X on Project A.
You work hard all day, do your job, and release feature X on Project
A. Then some time passes. Put your
little hand in mine. You wake up in a great mood again, you turn
the alarm off and open your email. You have received another notification
about a task: you need to implement feature X
on Project B. At first you think it's deja vu,
but then upon closer inspection, you realize it's a different project and
it just needs the same feature. You read about best programming
practices and know you should reuse code. But the feature seems small and
you already know everything about how to make it. Plus, the projects are from
different domains. It's unlikely to happen again in the future,
right? So you copy the code and release feature X on
Project B very quickly and easily. That's a good practice
too. But half a year later,
waking up and opening your email, you feel a panic attack coming
on. You've received a notification that you need to
implement feature X on Project C. You say enough is
enough. It's time to move everything to common components.
You open Project A, look at your feature, but don't recognize
it. Everything has changed. In a panic, you open Project B and
search for feature X. But here as well, it's completely different
and not even the same as in Project A. You are
left with no choice. So for the third time, you create feature X
and release it on Project C. If such drama
happens on your projects from time to time, then you will definitely like this presentation.
And today I will play the role of Bill Murray. My name
is Vadim. I have been in development for ten years.
I work as a front-end team lead at ETG, and I also
write a series of articles on GitHub, 2 minutes Dev notes. This
series is about the daily challenges programmers face.
So as I said, I work at ETG. ETG manages
companies specializing in hotel bookings worldwide.
It operates in a hundred-plus markets, represented
by brands like ZenHotels and RateHawk.
So here is what it looks like on the inside. The B2C
segment includes hotels, the most popular product of the company.
Hotels are also represented in the B2B segment. That means
that the hotel engine can be utilized for the needs of other businesses
apart from hotels. In the B2B segment, the company also has a set
of transportation products. What are these products?
The sale of flight tickets, the sale of train
tickets, and transfers. About a year ago,
I joined ETG and became responsible for the front end of all
transportation products. And as you might guess, I encountered the
drama I showcased at the beginning.
So what do we have? The front end is written in Next.js.
All apps have the same set of pages.
The apps have similar domains. There is a similar
set of services to interact with and similar work with suppliers.
But each product has its own unique aspects, which prevent simply creating
one application and coloring it in three different shades.
Hence we encounter our drama. Everything seems to be the same,
but it's not. So the very first pain point
I started to address was how to create a feature once
and forget about it. In general, the problem of reusing
common code is as old as time. As soon as any company starts to
grow a little and has more than one project, it faces the
issue of code reuse, not to mention big tech. And when
we hear big tech and the problem of code reuse in the same
sentence, the first thing we think of is a monorepository.
A monorepository is a way to manage code where you keep all the
code for many projects in one place using one git repository.
This is different from the usual method where each project has its own separate
git repository and git history. Therefore,
it seems that we can conclude the presentation: create a
monorepository, place common components in a separate folder, and all applications
will have access to it. However, the monorepository didn't
suit our needs. So what's the problem? As I
said before, our products are very similar, but they
also have significant differences. They have different backends in
different programming languages. In general, there are many
edge cases where the scenarios do not overlap.
Therefore, developing all products in one place with a single git
history is not a good idea. Additionally, since they have
different backends and different suppliers, each product should have
its own release cycle and versioning, even if we
are talking about the same feature. Of course we can
use some of the trendy tools for working with monorepositories.
These tools allow us to check the git history only for a specific project
and its dependencies and enable independent releases.
All of these tools work. However, they are another thing to
learn and maintain, and given that we have a small team of
front-end developers, introducing almost any new technology
creates a bus factor of one. Since it's clear that I
will be the one who adds it, I wouldn't want to block the whole team.
Additionally, this adds unnecessary complexity.
It's much simpler to copy a feature from one project to another rather
than continuously maintain an abstraction that's only beneficial in a
couple of cases. To learn more about the concept of
over-abstracting, you can watch Dan Abramov's presentation on
the topic of the WET codebase.
So Bill Murray, gripping onto a stubborn idea,
couldn't escape from the loop and faced failure one more time.
Once we rejected the monorepository, we basically had one
option left for reusing code: a polyrepository and
packages. I must say it's quite a popular approach in
the company. Most UI components that are shared between teams
and should be common are placed in a separate git repository.
There is a dedicated team that maintains it, and all other developers
in the company can contribute to it. In most of the teams I
know, the commonly used package manager is Yarn, and Yarn allows
specifying a repository URL as a project dependency in package.json.
So this lets us use a package in a project even without setting
up a corporate registry. What are the advantages of this solution?
It's a widely used method within the company, ensuring uniformity.
Change history is isolated and semantically accurate within a
single repository. It has an intuitive workflow:
you complete your work in the common package, bump the package version,
and update the package version in the application. So Bill Murray
went to sleep hoping that he had broken the loop and that the new
day would be different from yesterday. But the next morning...
At first glance, it seems like the solution is suitable and it's time to
start moving the common code to a separate repository.
However, a few questions have arisen. How do we test
everything during development? And how do we write reusable code
that can actually be reused? They might seem like strange questions,
but they are not without reason. Look, when I talk
about a shared component library, it's pretty clear: components are
something that can be easily encapsulated and parameterized.
We need a button. We write a function that takes a couple of parameters and
returns some UI. How can we ensure that
such UI will be displayed correctly in our application when we import it?
Right, we should take a tool like Storybook and
create a story. Storybook will show us how our
button will look in our application when we add it there.
If in addition to components we need to reuse some business logic,
we still have unit tests. We extract the logic
into separate functions or helpers and write unit tests for them.
This way we determine how our logic will be reused within the application.
But what should I do if I am using Next.js? For example,
I need to extract a custom page that connects
with an Apollo client and validates getInitialProps across all pages.
With this level of reuse, I have to move half of the project
infrastructure from the project repository to a separate repository
and create a similar one there to test my logic.
In essence, I need to invent my own Storybook, but for components of
a different level. You might say: just use
yarn link, keep everything in your own repository,
link your separate library, and develop. That way, when
you are finished, you can unlink, update the version, and you are
good to go. But this may seem counterintuitive.
It's like agreeing that we have a polyrepository instead of a monorepository and
then suddenly resorting to linking packages and essentially reinventing
workspaces. We want to come up with something that allows
us to harness the full power of the monorepository
approach. However, we want to remain independent and continue using
the polyrepo approach. And as you can probably guess
from the name of the presentation, we have found a solution.
So, git submodules. When we are talking about git submodules,
there is no difference from a regular git repository. I mean, it is the
same thing. Now, instead of storing the package version in
package.json, we store a reference to a commit in the
submodule's history. This reference gives us access to all of
the files that were current at that commit. This way
you no longer need to manually handle linking packages.
As soon as you install a submodule, a regular directory appears in the
project containing the necessary package files.
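To illustrate the idea that a submodule is stored as a reference to a commit and shows up as a regular directory, here is a minimal shell sketch. It builds two throwaway local repositories in a temporary directory. The names shared and app, the path src/shared, and the file greet.ts are assumptions made for illustration, not names from the talk. The protocol.file.allow setting is only there because the sketch uses a local path instead of a real remote URL.

```shell
#!/bin/sh
# Sketch only: repository names, paths and file names are hypothetical.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work="$(mktemp -d)"

# A throwaway "shared" repository standing in for the common code.
git init -q "$work/shared"
echo 'export const greet = () => "hi";' > "$work/shared/greet.ts"
git -C "$work/shared" add greet.ts
git -C "$work/shared" commit -q -m "Add greet helper"

# A throwaway application repository that will consume it.
git init -q "$work/app"

# Recent Git versions require protocol.file.allow for local-path submodules;
# with a real remote URL this override would not be needed.
git -C "$work/app" -c protocol.file.allow=always \
    submodule --quiet add "$work/shared" src/shared

# The index records the submodule with mode 160000 plus the pinned commit hash.
entry="$(git -C "$work/app" ls-files --stage src/shared)"
pinned="$(git -C "$work/shared" rev-parse HEAD)"
echo "$entry"

# The shared files are available as a regular directory inside the project.
ls "$work/app/src/shared"
```

Mode 160000 in the printed index entry is how git marks a commit reference rather than a regular file, and the hash next to it is the commit of the shared repository that the application is pinned to.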
Let's look at how we can do it. To add a submodule to the project,
you need to execute the following command: git submodule add, and specify
the URL of your shared repository. After running the command,
you can check the status of your project and you will see the
following. We can see
that two files have been added. First, the git submodules
configuration file, .gitmodules, where all the submodules
used in the application are listed. The config file itself is very
simple. It contains the name of the submodule, the path where the submodule is
placed in the application, and the URL the submodule is fetched from.
The second file, or more accurately a link, is the submodule
itself. To confirm that it is a reference to a commit,
you can execute the command git diff and see output that says
it is a hash. To make the submodule
files appear under its name, you need to fetch them.
You need to run the command git submodule update --remote --init,
and you can wrap it in npm scripts, for example, to make it
easier to use in local development or a CI pipeline.
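The steps just described (add the submodule, inspect the status, the .gitmodules file and the diff, then fetch the files in a fresh checkout) can be sketched as follows. This is a hedged example against throwaway local repositories; the names shared, app and teammate and the path src/shared are assumptions, and protocol.file.allow is only needed because local paths stand in for real remote URLs.

```shell
#!/bin/sh
# Sketch only: repository names and paths are hypothetical.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work="$(mktemp -d)"

# Throwaway shared repository with one file.
git init -q "$work/shared"
echo 'export const greet = () => "hi";' > "$work/shared/greet.ts"
git -C "$work/shared" add greet.ts
git -C "$work/shared" commit -q -m "Add greet helper"

# Throwaway application repository.
git init -q "$work/app"
cd "$work/app"

# Add the submodule: <repository URL> <path inside the project>.
git -c protocol.file.allow=always submodule --quiet add "$work/shared" src/shared

git status --short     # two staged entries: .gitmodules and src/shared
cat .gitmodules        # submodule name, path and url

# The addition is staged, so --cached is needed to see it in a diff.
# The last line of the output reads "+Subproject commit <hash>".
git diff --cached --submodule=short src/shared

git commit -q -m "Add shared code as a submodule"

# A teammate's fresh clone has an empty src/shared until the files are fetched.
git clone -q "$work/app" "$work/teammate"
cd "$work/teammate"
git -c protocol.file.allow=always submodule --quiet update --remote --init
ls src/shared
```

The last command is the one worth wrapping in a package script, so that local setups and CI jobs run the same update step.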
For example, in our company we have a yarn script that updates the git submodules,
and that's it. Now we have a directory in the project that leads to another
repository. We move the shared code into the submodule and reuse
it. There is no need to do anything separately within the submodule,
since all the code is used only in the project. Bill Murray
commits his changes and goes to sleep. But in the morning...
Bill Murray forgot that he has a team and that they need to learn how
to use the new tool, even if it's good old git.
As I mentioned before, submodules are just like regular git repositories.
So if you make changes to files within the submodule, you need to execute
the following commands to save the changes: cd into the submodule path, git commit,
git push. Once you commit changes in the submodule, the
link in your parent repository will also update
to point to the latest commit you made. And to
avoid losing these changes, you now need to commit the changes in the parent
repository as well. It's the same process: git commit, git push.
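The two-step save described above (commit and push inside the submodule, then commit the moved reference in the parent) might look like this. It is a sketch against throwaway local repositories: the names seed, shared.git and app, the path src/shared, and the file names are assumptions. A bare repository is used for the shared code only so that the push from the submodule is accepted.

```shell
#!/bin/sh
# Sketch only: repository names, paths and file names are hypothetical.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work="$(mktemp -d)"

# Seed a bare "shared.git" that plays the role of the hosted shared repository.
git init -q "$work/seed"
echo 'export const greet = () => "hi";' > "$work/seed/greet.ts"
git -C "$work/seed" add greet.ts
git -C "$work/seed" commit -q -m "Add greet helper"
git clone -q --bare "$work/seed" "$work/shared.git"

# Application repository with the submodule added and committed.
git init -q "$work/app"
cd "$work/app"
git -c protocol.file.allow=always submodule --quiet add "$work/shared.git" src/shared
git commit -q -m "Add shared code as a submodule"

# Step 1: change, commit and push inside the submodule.
cd src/shared
echo 'export const bye = () => "bye";' > bye.ts
git add bye.ts
git commit -q -m "Add bye helper"
git push -q origin HEAD

# Step 2: the parent sees src/shared pointing at a new commit; record it.
cd "$work/app"
git status --short
git add src/shared
git commit -q -m "Bump shared submodule"
# In a real project, a git push of the parent repository would follow here.
```

If step 2 is skipped, teammates who update the parent repository keep getting the old submodule commit, even though the new one already exists in the shared repository.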
If you don't specify anything additional when starting to work
with submodules, by default the submodule fetches the default branch.
Therefore your project will use files from this branch, but you can
control this. To do so, in the .gitmodules file
you can specify the branch property, and then the submodule will
start pulling changes from that branch. Unfortunately or fortunately,
in the submodule's configuration you cannot specify a specific commit hash,
only a branch or tag. Anyway, the parent
repository will store the latest commit hash from the branch or from
the tag, but in the case of a branch, this hash may change
if new changes appear on the branch we are looking at. Due to
this, our team made a decision to abstract from the common default
branch. This way, changes related to one project
won't force an update for another project. What do I mean?
We have created three more branches, one for each project, and use
the default branch as a synchronization point.
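A minimal sketch of the branch property and of a per-project branch follows. The branch name avia-core, the repository names shared and avia-app, and the path src/shared are assumptions for illustration; the talk only names the branch informally. The sketch sets the property with git config -f .gitmodules, which writes the same line one could add to the file by hand.

```shell
#!/bin/sh
# Sketch only: repository, branch and path names are hypothetical.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work="$(mktemp -d)"

# Shared repository with a default branch (main) and one per-project branch.
git init -q "$work/shared"
cd "$work/shared"
git symbolic-ref HEAD refs/heads/main
echo "base" > shared.txt
git add shared.txt
git commit -q -m "Initial shared code"
git branch avia-core

# The avia application tracks avia-core instead of the default branch.
git init -q "$work/avia-app"
cd "$work/avia-app"
git -c protocol.file.allow=always submodule --quiet add "$work/shared" src/shared
git config -f .gitmodules submodule.src/shared.branch avia-core
git add .gitmodules
git commit -q -m "Track avia-core in the shared submodule"
cat .gitmodules        # now contains a "branch = avia-core" line

# Someone changes the default branch for the needs of another project...
cd "$work/shared"
echo "change for another project" >> shared.txt
git commit -q -am "Change needed elsewhere"

# ...and updating the avia submodule does not pick that change up.
cd "$work/avia-app"
git -c protocol.file.allow=always submodule --quiet update --remote
git -C src/shared log --oneline -1   # still the "Initial shared code" commit
```

The submodule only moves when avia-core itself moves, which is what keeps one project's changes from forcing an update on another.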
Let's say someone started working on a feature in the Avia
project. They have their own target branch, Avia
core, from which they create their feature branch and make changes
in it. While the changes are in progress, they will also need
to change the data source for the submodule and specify their feature branch.
Once the feature is completed, the feature branch is merged into the target
branch and Avia core moves forward.
However, this doesn't force the other projects to immediately pull the
latest changes from the main branch. It's possible that this common feature
might be needed in other projects later, and when these
changes are needed, the main branch is pulled into the appropriate core branch of
the project, for example railways or transfers.
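The branch flow described above can be sketched inside the shared repository alone. The branch names avia-core, rail-core and feature-x and the file names are assumptions for illustration. The talk does not spell out how a finished feature reaches the main branch, so the merge of avia-core into main below is an assumed step, marked as such in the comments.

```shell
#!/bin/sh
# Sketch only: branch and file names are hypothetical.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work="$(mktemp -d)"

# Shared repository: main is the synchronization point, plus project branches.
git init -q "$work/shared"
cd "$work/shared"
git symbolic-ref HEAD refs/heads/main
echo "base" > shared.txt
git add shared.txt
git commit -q -m "Initial shared code"
git branch avia-core
git branch rail-core

# Feature work for the avia project starts from its target branch.
git checkout -q -b feature-x avia-core
echo "feature X" > feature-x.txt
git add feature-x.txt
git commit -q -m "Add feature X"

# Feature finished: merge it into the target branch, avia-core moves forward.
git checkout -q avia-core
git merge -q --no-ff -m "Merge feature X into avia-core" feature-x

# Assumed step: avia-core is merged into main, the synchronization point.
git checkout -q main
git merge -q --no-ff -m "Sync avia-core into main" avia-core

# rail-core is untouched until the rail team decides it needs the feature.
before="$(git ls-tree --name-only rail-core)"

# Later, when the feature is needed: pull main into the project's core branch.
git checkout -q rail-core
git merge -q --no-ff -m "Pull main into rail-core" main
after="$(git ls-tree --name-only rail-core)"

echo "rail-core before: $before"
echo "rail-core after:"
echo "$after"
```

Each application keeps its submodule pointed at its own core branch, so the moment a shared change arrives in a project is decided by that project's merge from main.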
So this approach looks like we have one common development
branch and three production branches, one for each project.
If you look more closely, you will notice that this approach slightly
differs from the package approach. With packages, we have
the opposite: three development branches, which
are actually feature branches, and one production branch that forces
applications to use new stuff every time there is an update.
Submodules allow us to avoid this. In addition to all
of this, we also get a bunch of other benefits. We have the
ability to easily search through files in the entire application,
including the submodule. As I mentioned earlier,
the IDE treats the submodule as a regular directory, so the
search is no different. Now we have the capability to
store regular TypeScript in our git submodule, which we can import into
the application and use seamlessly without a build step.
You no longer have to wait for the build pipeline of your common library
to finish in order to use a fresh version in your application.
The shared code from the submodule is integrated into your application code
and the pipelines are combined. You can build
once, and if everything compiles, it means the common submodule
is in order. This also includes linting and
testing. We got an agnostic tool that can be
fitted into any architecture, as the submodule is conceptually
a directory with a set of files of a certain category,
something like a shared module. In addition to
all of the above, we also have the convenience of reusing not
only isolated entities like components, but also large pieces of
application logic without the need to rebuild the infrastructure.
And essentially we have resolved our drama.
Now implemented features develop uniformly for all products and
follow the same concept. And the icing on the cake: as soon
as we introduced submodules, we fixed several typos in the
shared functionality that was simply copied from project to project.
Bill Murray seems to have figured something out. As soon as he started doing
good things for the team, his typical day changed, and now events
happen differently. Perhaps this is the only way to break the loop.
Alright, next I would like to share some considerations with you on
the topic of what to keep in mind if your team setup is similar to
ours and you also want to transition to submodules.
After all, besides the pros, there are always cons and nuances to
deal with. This process always comes down to a tradeoff where
we sacrifice something to get something else. So here we
go. A git submodule is the same kind of dependency
as the other packages in your package.json. Nothing
in the submodule should import anything from the application.
You should follow this rule to avoid turning your submodule into something
that only works in a limited number of projects while failing in others
because the files are not in the right places. On the other hand, the submodule
is allowed to be aware of other dependencies in your project.
It's similar to when you install a state manager for React.
You expect it to work in any React project and not work where
React is absent. If you have several multilingual
applications, it's logical to assume that common texts should
be translated in the same way, so that translators don't do the same work
multiple times for each project. As soon as you have prepared
the code for the release, it's best to enter the submodule,
create a tag, and specify it as a branch in the submodule config
in your main application. This is only for the future.
If something breaks in the future and you need to debug code that depends on
the submodule, it will be a significant help. Look,
you can make a mistake, for example, in version 1.0.17, release
it, and then add more code to the submodule. The submodule code can
change a lot before the error is discovered and may already be incompatible
with the version of your application in production. Tags come to the
rescue in this case: we know which submodule version
we used for the release. We can add this tag to the
submodule config and reproduce the bug in our application on the compatible branch.
But it is not necessary to create tags every time.
You can achieve the same effect by simply switching to a past commit,
branching off from it, and then specifying this branch in the submodule config,
and then we can test our application in the same way.
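The debugging flow described above can be sketched like this: mark the released state of the shared code, let the shared code move on, then branch off from the released state and point the submodule config at that branch. The tag name release-1, the branch name debug-release-1, the repository names and the path src/shared are assumptions for illustration. The sketch uses the branch variant, since a branch name in the submodule config is what git submodule update --remote is documented to follow.

```shell
#!/bin/sh
# Sketch only: repository, tag, branch and path names are hypothetical.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work="$(mktemp -d)"

# Shared repository: a tagged release followed by a later change.
git init -q "$work/shared"
cd "$work/shared"
git symbolic-ref HEAD refs/heads/main
echo "released code" > lib.txt
git add lib.txt
git commit -q -m "Code shipped with the release"
git tag release-1
echo "later change" >> lib.txt
git commit -q -am "Later change"

# Application repository; its submodule currently sits on the later change.
git init -q "$work/app"
cd "$work/app"
git -c protocol.file.allow=always submodule --quiet add "$work/shared" src/shared
git commit -q -m "Add shared code as a submodule"

# To reproduce a production bug, branch off from the released state...
git -C "$work/shared" branch debug-release-1 release-1

# ...point the submodule config at that branch and update.
git config -f .gitmodules submodule.src/shared.branch debug-release-1
git -c protocol.file.allow=always submodule --quiet update --remote

cat src/shared/lib.txt   # contains only the line "released code"
```

Once the bug is reproduced and fixed, the branch entry in the submodule config can be switched back to the project's regular branch.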
Components and functions, besides being abstract and parameterized,
should be as small as possible. This is necessary for tree shaking.
If one file contains a hundred functions, of which 90
are used in Project A and Project B, and only ten are used in Project
C, it's better to separate them. This way, Project C
doesn't need to include code in its bundle that it doesn't use.
There are situations where a feature is developed
at the same time in two projects: the stars aligned. In such
cases, it's obvious that the two projects might block each other while requesting
the common parts. In this case, one project should work
on the common part that can be reused later, while the second
project focuses on things specific to its own requirements.
It's crucial to define the contract or interface you expect from the
common part to avoid issues with its future
use. So one team works on the common part
with a known interface, and the second team makes changes in their
project using that interface. Then they can switch roles.
The first team works on their project, and the second team can start
testing once the first team is done. The second team has already got feedback
and can make the necessary adjustments to the shared code. And perhaps
the most important advice: all the above-mentioned tips may
not directly relate to working with submodules, but it's crucial to document
each of them. What I have shared with you is based on our
experience of using submodules for almost a year.
Documenting each of these aspects is another way to create documentation
for new team members joining your project and team,
so don't overlook this. Let's summarize.
The problem of reusing code has been around for a long time.
A monorepository is an excellent solution, but it is more suitable for very
large teams and may be challenging for smaller teams due to the need to
maintain yet another new technology. A polyrepository
solves the problem of reusability and separate histories,
but is better suited for entities that are easy to isolate and
for which you can set up your own infrastructure.
Git submodules allow you to combine the convenience of a monorepo
with the ability to address issues from a polyrepo.
Listen to your team and document all gaps in your processes.
This is your contribution to the documentation for future team members. And
that's all. Subscribe to my socials, leave feedback, and thank you
all for your attention.