Imagine you are working on the next feature for your employer's main app and suddenly you notice something terrible: there is a hardcoded secret committed in the remote repository. This is something we must avoid at all costs, because anyone with access to that repository can see the secret and potentially use it to access live data or bring the system down.
After some investigation, you notice that the commit that introduced the secret is quite old, and lots of new commits have been added since then. There is no easy way out of this mess by undoing commits because you need to change every single one of them. What can we do to solve this? In this blog post we are going to discuss how we can remove secrets that have been committed to remote repositories quickly and effectively.
Strategy to follow
Because git records every change in a commit, we have to rewrite the repository's history from the moment the secret was introduced. This is a very sensitive task when a large team works with the repository: every engineer has it cloned locally, and rewriting the history means their own copies must be updated as well, or they will face divergent branches and lots of merge conflicts.
To rewrite a branch of a repository there are several tools to aid us: git-filter-repo and BFG Repo-Cleaner are some examples. In this blog post we are going to use the former, but keep in mind that other tools exist for this purpose.
The strategy will be as follows:
- Switch to the branch you want to remove secrets from (usually important ones like
- Identify the file(s) that have secrets committed and make a backup of them.
- Use the git-filter-repo tool with those files as arguments to remove them from all commits.
- Strip the secrets from the backup files and commit them again to the repository.
- Do a force push to rewrite the complete history of that branch in the remote repository.
- Make everyone in the team update their local repositories to the new one.
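For the identification step, git's pickaxe search (git log -S) can help locate the commits and files involved. The following is a minimal sketch rather than a complete audit: the function names find_secret_commits and find_secret_files are made up for this post, and the search only finds the literal string you pass in.

```shell
# List every commit, on any branch, whose changes added or removed the string.
# Run it from inside the repository.
# Note: the string ends up in your shell history, so clear that afterwards.
find_secret_commits() {
    # -S is git's "pickaxe" search; --all walks every ref, not just HEAD.
    git log --all --oneline "-S$1"
}

# List the files in which those commits added or removed the string.
find_secret_files() {
    # --format= suppresses the commit header so only file names remain;
    # sed drops any blank separator lines before de-duplicating.
    git log --all --format= --name-only "-S$1" | sed '/^$/d' | sort -u
}
```

Calling find_secret_files with the leaked value gives the list of paths to back up and later pass to the history rewrite.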
Removing the secrets from the repository
git-filter-repo is a script created specifically to help with this kind of repository rewrite. You can find all the information about the script in its GitHub repository. In a nutshell, it removes the files containing the secret from every commit. Keep in mind that, by default, it rewrites every branch and tag in the clone, not only the branch that is checked out.
If the secret lives in a file that the solution needs, make a copy first so you can add it back after stripping the secret from it.
This tool is a Python script, so first you have to install it with the following command:
$ pip install git-filter-repo
Make sure that the repository is clean and that you have no files staged or stashed. The tool will warn you about this anyway, but keep in mind that the command performs a huge rewrite: you might lose data if you are not careful.
The best way to do this is to clone the remote repository again, so you are sure that the copy is clean.
After the installation is complete, go to the root directory of the solution, check out the branch that you wish to strip the secrets from and execute the following command for each file you wish to delete:
$ git-filter-repo --invert-paths --path PATH-TO-FILE-WITH-SECRETS
If everything went correctly the terminal should show an output similar to this:
Parsed 197 commits
New history written in 0.11 seconds; now repacking/cleaning...
Repacking your repo and cleaning out old unneeded objects
Enumerating objects: 210, done.
Counting objects: 100% (210/210), done.
Delta compression using up to 12 threads
Compressing objects: 100% (127/127), done.
Writing objects: 100% (210/210), done.
Building bitmaps: 100% (48/48), done.
Total 210 (delta 98), reused 144 (delta 75), pack-reused 0
Completely finished after 0.64 seconds.
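To double-check the result, we can ask git whether any commit still touches the removed path. This is a small sketch: path_is_gone is a hypothetical helper name, and the check only covers the exact path given, so a copy of the secret under another file name would need a separate string search.

```shell
# Succeeds (exit status 0) only if no commit reachable from any ref
# touches the given path. Run it from inside the rewritten repository.
path_is_gone() {
    # With a path after "--", git log lists only commits that changed it,
    # so empty output means the path no longer appears in the history.
    [ -z "$(git log --all --oneline -- "$1")" ]
}
```

For example: path_is_gone PATH-TO-FILE-WITH-SECRETS && echo "history is clean". git-filter-repo also accepts --path more than once, so several files can be removed in a single run.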
Now is the time to add the updated files with the secrets stripped from them. Do it and create a new commit adding those files.
Once you are sure that no secrets remain in the repository, it is time to upload the new branch to the remote repository. It is paramount that your team is aware of this and that nobody has feature branches open from the old branch, to avoid huge conflicts afterwards. You are now going to change the past, so no parallel work should be in progress.
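You can rewrite the remote repository with the following command. Note that git-filter-repo removes the origin remote by default after a rewrite, so you may need to add it back with git remote add before pushing: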
$ git push origin --force
ATTENTION: This is an EXTREMELY SENSITIVE COMMAND, because it rewrites the remote branch and can cause data loss. For this reason, you most likely won’t have the rights to run it in the first place. If that is the case, go to your git hosting provider and allow force pushes on that particular branch. DO NOT FORGET to turn that option off after you are done.
Before every team member can start working again, they have to pull the changes from the remote. Since the history has been rewritten, they cannot do a standard pull. They need to rewrite their own local history with:
$ git pull --rebase
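A rebase replays any local-only commits on top of the new history. For team members who have nothing local worth keeping, a hard reset to the rewritten branch is a simpler alternative. The sketch below is an assumption of mine and not part of the original procedure: sync_to_rewritten_branch is a made-up name, and it assumes the remote is called origin.

```shell
# Make the local branch identical to the rewritten remote branch.
# WARNING: discards local commits and uncommitted changes on that branch.
# Usage: sync_to_rewritten_branch BRANCH   (assumes a remote named "origin")
sync_to_rewritten_branch() {
    git fetch origin &&
    git checkout -q "$1" &&
    git reset --hard "origin/$1"
}
```

Cloning the repository again achieves the same result and leaves no trace of the old history in the new copy.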
From here on, work can continue as usual with a secretless repository.
Addendum - 08/09/2023
As Pablo commented on my LinkedIn post, the secret is still saved locally in the git reflog. Every team member will have that secret in their own reflog, and may even bring it back from the dead if they use it to recover another commit.
We can delete the reflogs by executing the following command; every team member should do this:
$ git reflog expire --expire=now --all
This is a destructive command, so we need to be careful when using it. The best thing to do is to wait a while after the initial rewrite to make sure everything works as intended, and then delete the reflogs.
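Expiring the reflog only removes the references to the old commits. The objects themselves stay in the local object store until git garbage-collects them. A minimal sketch that does both steps in one go follows; the function name purge_unreachable_objects is hypothetical.

```shell
# Expire every reflog entry, then delete objects nothing references anymore.
# WARNING: irreversible; commits reachable only through the reflog are lost.
purge_unreachable_objects() {
    git reflog expire --expire=now --all &&
    git gc --prune=now --quiet
}
```

After running it, git cat-file -e OLD-COMMIT-ID should fail for a commit that existed only in the old history, which is a quick way to confirm the cleanup.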
The remote git will also have its own reflogs, but since the reflogs are local and do not propagate through clones, fetches, and push/pulls, no one but the provider will have access to those secrets. If the information is very sensitive, you can always contact your provider’s support and ask them to clean up the reflogs and any cached views they may still have.
How to prevent this from happening
Make sure you have an up-to-date .gitignore file that lists all configuration and secret files, so they are never committed to the main repository. This is usually done not only for secrets but for every file that is not part of the solution being developed yet is needed for it to run (e.g. external dependencies, cache files…).
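To confirm that a rule actually covers a given file, git check-ignore can be used. A tiny sketch, where is_ignored is a hypothetical helper name:

```shell
# Succeeds (exit status 0) only if git would ignore the given path
# under the ignore rules currently in effect.
is_ignored() {
    git check-ignore -q "$1"
}
```

For example, is_ignored .env || echo ".env is NOT ignored", with .env standing in for whatever file holds your secrets. Note that ignore rules have no effect on files that are already tracked, so a file committed earlier has to be removed from the index before the rule applies to it.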
Another defense is to add automatic tooling to detect if a secret has been committed, locally or at the moment a PR is created. A famous tool for this is gitleaks.
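To illustrate the idea of a local check, here is a deliberately naive stand-in that greps the staged changes for suspicious assignments. It is not gitleaks and nowhere near as thorough: the function name and the default pattern are hypothetical examples, and a real setup should rely on a dedicated scanner.

```shell
# Fails (exit status 1) if any line added in the staged changes matches
# the pattern. The default pattern is an illustrative example, not a rule set.
check_staged_for_secrets() {
    pattern=${1:-'(password|secret|api_key|token)[[:space:]]*[=:]'}
    # Keep only added lines ("+...") and drop the "+++ b/file" headers.
    if git diff --cached -U0 | grep '^+' | grep -v '^+++ ' |
        grep -E -i -q "$pattern"; then
        echo "Possible secret in staged changes; commit aborted." >&2
        return 1
    fi
    return 0
}
```

Calling this function from .git/hooks/pre-commit makes git refuse the commit whenever the check fails.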