Like them or loathe them, microservices are still all the rage. I'm seeing more and more companies go down this route. In particular, a common start to the journey into microservices is breaking up an existing monolith.
In this post, I'll show how you can split off several pieces of a large repository into a single, smaller repo while maintaining the git history. I'll be using AutoMapper as an example, but the same steps should apply if you need to split several project from a larger solution and keep the history.
At the end of this process you should have:
- A new repository containing all the files, projects, test etc that you want to spliut
- Full git history of the split off code
- Any branches and tags you want to keep
Background Info
The tool we're going to use, git filter-repo, works by playing back every commit from the old repo into a new repo, keeping only the bits you tell it to. We're going to do this "filtering" on a "path" basis, i.e. if the commit contained a file that was changed in a path we're interested in, keep it, otherwise ignore it.
To demonstrate this, without using an overly contrived example, let's use the AutoMapper repo. At the time of writing, the top level directory contains a number of files and folders called "docs", "lib" and "src". Inside src there are more folders called "AutoMapper", "BenchMark", "IntegrationTests" and "UnitTests":
We're going to pretend that we want to split off a new repo containing ".editorconfig", ".readthedocs.yml", "docs" and everything in "src/Benchmark". Totally meaningless and non-functioning, but it's something you can do as you read along and demonstrates the technique.
Step 0 - Install git-filter-repo
First we need to install git filter-repo. I found the installation instructions a little difficult to follow as I don't use Scoop (a package manager on Windows I'd never heard of), so if you're struggle, I did the following:
- Install Python via chocolatey - I think you can uses the Windows Store too
- Clone git filter-repo
- Update the first line of git-filter-repo so it's "python" instead of "python3"
- Run
cp git-filter-repo $(git --exec-path)
Step 1 - Identify the files/folders to keep
In our scenario, I've already decided what we want to keep in the new repo. If you're doing this for a real project, it's a little harder. I think there are two choices:
- Option 1 - Start a new solution, copy across the "runnable" project you want and see if it runs. If it doesn't, add the missing projects it mentions and try again. Repeat this till it's does.
- Option 2 - Copy the current solution and delete projects until it breaks
Once that's done, it's probably a good idea to run all your tests to make sure everything has come across.
Once you're happy the new solution works, you need to create a new text file, say "files-to-keep.txt" containing a file/folder one per line of what's left. In our scenario we would have:
.editorconfig
.readthedocs.yml
docs
src/Benchmark
Step 2 - Creating the new repo
If you're on windows, do all the following commands in "git bash" which should've been installed with git.
1. Clone the original repo
For various safety reasons, git filter-repo must be run on a fresh clone. You'll likely be repeating these steps many times, so I also took a zip of the freshly cloned repo to speed up the repeat process (especially if you have poor download speeds).
2. Analyse the repo
It's a good idea to "analyse" the repo as there might be some gotchas. So run git filter-repo --analyze
.
Pay attention to any warnings in the output. For example, I got the following warning:
$ git filter-repo --analyze
Processed 59324 blob sizes
Processed 1908 commitswarning: inexact rename detection was skipped due to too many files.
warning: you may want to set your diff.renameLimit variable to at least 476 and retry the command.
Processed 2089 commits
Writing reports to .git\filter-repo\analysis...done.
Which can be solved by running git config diff.renameLimit 476
.
Keep repeating the analyze command and resolving them until there are no more warnings.
3. Perform the filter
Finally, run the following command to perform the actual filter:
git filter-repo --paths-from-file <path_to_file_from_previous_step>
You now have a new fresh repo with just the files/folders we want to keep with the history intact. Try checking the logs with something like git log --pretty=oneline --abbrev-commit
and confirm it's worked.
Step 3 - Creating and Testing new solution file
So far so good, but we need to confirm what we've copied across works. To do this, we need to create a new solution file, add all the projects to it and confirm it runs.
Depending on the number of projects you moved across, you might want to script this task. I used:
- echo "dotnet new sln" > add_to_solution.sh
- find . -name "*.csproj" -exec echo "dotnet sln add {}" \; >> add_to_solution.sh
- ./add_to_solution.sh
Note: The solution file created will take the name of the folder, so rename it if need be.
Open the solution in Visual Studio and confirm it builds, runs and all the tests pass. (We can't do that in this example)
Be warned, I had this step fail several times because I missed something while constructing the list of files/folders I was interested in. Hopefully you've had better luck.
(Optional) Step 4 - Clearing out unwanted git refs
One issue you might have, is that the tool creates all the current branches, tags, refs etc including "replace" refs. To see what you have run git show-ref
.
I didn't want all of mine, but thankfully it's possible to automate the removal by running:
git show-ref | grep -v '\/tags\|\/dev\|\/release' | sed -e s/[0-9a-f]*/delete/ > show-ref-delete-commands.txt
- This will delete all branches and replace refs that are not tags, dev, release/*. You might want to compare this file with the output ofgit show-ref
to be absolutely surecat show-ref-delete-commands.txt | git update-ref --stdin
You should now have a working new solution with only the branches and tags you're interested in. All that's left is pushing to a new origin.
Step 5 - Push to a new origin
I was using Azure DevOps, so when I created a new repoit contained instructions on how to push an existing repo. Hopefully yours does too, but if not, it's something like:
git remote add origin "new_origin"
git push -u origin --all
That should push the tags too, but if not, git push --tags
will.
Final step - Test a fresh clone
Take a fresh clone of this repo and retest. You should now have a new repo, with the history intact, of the piece of the monolith split off.
If you don't, you will hopefully be able to "fall forward" by starting again at step 1 and adding more files.
Comments Section