I just kicked some serious computer butt today. I would like to share.
Original problem
I just wanted a way to recursively .gitignore some .csv files in my work folders. CSV files, as you know, can get big (BIG). And GitHub doesn't like that; in fact, it was bouncing my git push -u origin master back, complaining that my push contained files over its size limit (wow). My previous .gitignore only ever excluded CSV files sitting exactly one subfolder down from the top folder:
path/to/top/folder/*/*.csv
But this was no longer true. I had just massively reorganized my folders, and my CSVs were now scattered at arbitrary depths throughout my various project folder trees.
WHAT TO DO?
Well, first, this works in your .gitignore:
path/to/top/folder/**/*.csv
Specifically, that double wildcard, **, says "wildcard recursively". That is, "match through any number of sub-directories, including none". What joy! Which led me down a rabbit hole: wait, can I wildcard recursively in my shell?
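To see the difference for yourself, here is a small sketch that builds a throwaway git repo with hypothetical CSVs at three depths and asks git check-ignore which ones each pattern catches. The folder path is the placeholder from above; the file names and the ignored_by helper are made up for illustration. It's plain POSIX sh and only needs git.

```shell
#!/bin/sh
# Sketch: compare the old one-level .gitignore pattern with the recursive one.
# Runs in a throwaway repo; the CSV names and ignored_by() are hypothetical.
set -eu

cd "$(mktemp -d)"
git init -q .

# hypothetical CSVs at three depths under the placeholder folder
mkdir -p path/to/top/folder/a/b/c
touch path/to/top/folder/top.csv \
      path/to/top/folder/a/one.csv \
      path/to/top/folder/a/b/c/deep.csv

# ignored_by PATTERN FILE: succeeds if a .gitignore holding only PATTERN ignores FILE
ignored_by() {
  printf '%s\n' "$1" > .gitignore
  git check-ignore -q "$2"
}

for f in path/to/top/folder/top.csv \
         path/to/top/folder/a/one.csv \
         path/to/top/folder/a/b/c/deep.csv; do
  for p in 'path/to/top/folder/*/*.csv' 'path/to/top/folder/**/*.csv'; do
    if ignored_by "$p" "$f"; then verdict=ignored; else verdict="not ignored"; fi
    printf '%-28s %-34s %s\n' "$p" "$f" "$verdict"
  done
done
```

The old */*.csv pattern only catches one.csv, because a single * never crosses a slash; the ** version catches all three, including top.csv, since ** may also stand for zero directories.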
Yes, but it's time to move to zsh
I had just finished a very nice book about bash, so I didn't want to haul my well-earned shortcuts, aliases, and all that over to a totally new shell interpreter. Ugh! (Though, to be fair, Wikipedia describes zsh as an extended Bourne shell that borrows many features from bash, so a lot of it carries over.)
But I saw a StackOverflow post saying that zsh has the double glob (**) built in, while bash (my current shell, and at the time the macOS default) only supports it from version 4.0 on, through its globstar option, and the bash that ships with macOS is the much older 3.2. So instead of upgrading bash just to get globstar, I decided to pull out the GUTS OF THE SHELL ENTIRELY and just install zsh.
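For the record, here's roughly what the bash route would have looked like: a minimal sketch assuming bash 4.0 or later (the /bin/bash that ships with macOS is 3.2 and has no globstar). The proj/ layout and file names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: recursive globbing in bash. Needs bash 4.0+ (macOS's /bin/bash is 3.2).
# The proj/ layout and file names are hypothetical.
set -e

cd "$(mktemp -d)"
mkdir -p proj/a/b
touch proj/top.csv proj/a/one.csv proj/a/b/deep.csv

shopt -s globstar   # let ** match any number of directories, including none
shopt -s nullglob   # a pattern with no matches expands to nothing, not itself

csvs=(proj/**/*.csv)
printf '%s\n' "${csvs[@]}"
```

Without shopt -s globstar, bash treats ** like a plain *, so the same pattern would only find CSVs exactly one folder down (here, just proj/a/one.csv).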
LET'S DO THIS.
Installing zsh
zsh is just another shell interpreter, with some nice add-ons and wing-dings on top of what bash offers. I found oh-my-zsh, a "community-driven" zsh configuration framework: basically, people have already futzed with zsh profiles to make things pretty and functional, add autocomplete, and so on. In general, I think it's wise to follow crowds on tech stuff: where goeth the crowd, there be-eth the StackOverflow answers.
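Anyway, here's me installing zsh and getting going:
# OSX has zsh installed, but I saw somewhere that it's old? so let's update
brew install zsh
# chsh only accepts shells listed in /etc/shells, so register Homebrew's zsh first (skip if already listed)
which zsh | sudo tee -a /etc/shells
# change default shell from bash to zsh
chsh -s "$(which zsh)"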
Then close and reopen iTerm2! Next came futzing around with the oh-my-zsh stuff: enabling oh-my-zsh, plus its plugins and themes, in my .zshrc. I also copied all my various aliases and such from .bashrc and .bash_profile over to .zshrc. Everything basically worked fine. (Yes, I'll put my dotfiles on GitHub someday.) For prettiness, I used the pure prompt theme, which is not part of oh-my-zsh and so needed some overriding. Here's what it looks like when I've got my pyenv environment activated (www_env), I'm on the git master branch, and I'm in the blog folder:
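The .zshrc side of this might look something like the fragment below. It's a sketch, not the exact config behind this post: it assumes oh-my-zsh is installed in ~/.oh-my-zsh and that pure was cloned into ~/.zsh/pure, and the plugin list is just an example. Setting ZSH_THEME to an empty string keeps oh-my-zsh's own themes from fighting with pure.

```shell
# ~/.zshrc fragment (sketch): oh-my-zsh plus the pure prompt.
# Assumed paths: oh-my-zsh in ~/.oh-my-zsh, pure cloned to ~/.zsh/pure.
export ZSH="$HOME/.oh-my-zsh"
ZSH_THEME=""          # empty: don't load an oh-my-zsh theme, pure draws the prompt
plugins=(git)         # example plugin list; add your own
source "$ZSH/oh-my-zsh.sh"

# pure isn't an oh-my-zsh theme, so load it through zsh's own prompt system
fpath+=("$HOME/.zsh/pure")
autoload -U promptinit
promptinit
prompt pure
```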
Doing the thing I came here to do
After spending an hour or so getting zsh all ported over, all I wanted to do was remove the giant CSV files from my git history. They had been committed several commits back in my local branch's history, and I was stuck: I needed to roll back to the last commit before the CSV dump and remove the CSVs (temporarily!) from the directory entirely. I needed to rewrite history. Since I had added the path/to/top/folder/**/*.csv line to my .gitignore, git was no longer picking up new changes to the CSVs, but .gitignore does nothing about files that are already committed.
So my plan was:
- Banish the CSV files to a temporary data/ folder.
- Roll back to an earlier commit (pre-CSV dump).
- Re-commit everything, now without any CSVs in the git history.
- Push up to origin/master! 😍
In zsh, all that was:
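# 1. banish any CSVs in any subdirectory to a new top-level data/ folder
#    (-prune skips data/ itself; mv -n refuses to overwrite a same-named CSV)
mkdir -p data
find . -path ./data -prune -o -name '*.csv' -exec mv -n {} data/ \;
# 2. roll back to earlier commit, pre-CSV troubles; a plain (mixed) reset moves
#    the branch and the index but leaves the files in the working tree alone
git reset [SOME_SHA]
# 3. re-commit everything
git status
git add .
git commit -m "OMG CSVS"
# 4. Pusshhhhhh!
git push -u origin master
# 5. OH SHIT THE CSVS, WHERE DO I PUT THEM BACK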
At this point, my good ego feelings quickly disappeared. Since I had moved every CSV out of its directory and plopped it down into data/, I no longer knew which CSV was associated with which directory (and thus, which project). I was horrified. And my CSVs had cryptic names like A.csv and eligible.csv. Huh!?
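In hindsight, step 1 could have left a trail. Below is a minimal POSIX sh sketch (not the command actually used above) that writes each CSV's original path to a manifest before moving it, plus a function that reads the manifest to put everything back. The manifest name data/MANIFEST.tsv and the function names banish_csvs and restore_csvs are hypothetical, and it assumes file names contain no tabs or newlines.

```shell
#!/bin/sh
# Sketch: move every CSV into data/, logging "original<TAB>new" paths first so
# the move can be undone. data/MANIFEST.tsv, banish_csvs, and restore_csvs are
# hypothetical names. Assumes file names contain no tabs or newlines.
set -eu

banish_csvs() {
  mkdir -p data
  : > data/MANIFEST.tsv                 # start a fresh manifest (overwrites an old one)
  # skip data/ and .git/; print every other *.csv path
  find . \( -path ./data -o -path ./.git \) -prune -o -name '*.csv' -print |
  while IFS= read -r src; do
    dest="data/$(basename "$src")"
    if [ -e "$dest" ]; then             # name clash: leave this CSV where it is
      echo "skipping $src: $dest already exists" >&2
      continue
    fi
    mv "$src" "$dest"
    printf '%s\t%s\n' "$src" "$dest" >> data/MANIFEST.tsv
  done
}

restore_csvs() {
  tab=$(printf '\t')
  # move each CSV back to the path recorded in the manifest
  while IFS="$tab" read -r src dest; do
    mkdir -p "$(dirname "$src")"
    mv "$dest" "$src"
  done < data/MANIFEST.tsv
}
```

The idea would be to run banish_csvs from the repo root before the reset, and restore_csvs once the clean commit is pushed.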
Thankfully, I knew that all the CSVs had been used (either created or imported) by some data analysis in Python. So I could just grep through all my .py files (recursively, throughout all the directories and sub-directories) to find where each CSV file was mentioned:
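# zsh's ** already does the recursion; -F matches '.csv' literally (as a regex, . is any character), -n shows line numbers
grep -nF '.csv' ./**/*.py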
This showed me that, for example, Python file X in directory Y had been doing stuff with A.csv. Voilà! I could return A.csv to its rightful directory. And now, since all CSVs were .gitignored, I would never have to worry about this again!
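The one-CSV-at-a-time lookup can also be looped over everything in data/. This is a sketch under the assumption that each script mentions its CSV by file name; scripts that build file names dynamically won't show up. find_csv_homes is a hypothetical helper name, and grep -w is used so that A.csv doesn't also match something like DATA.csv.

```shell
#!/bin/sh
# Sketch: for each CSV in data/, list the .py files that mention its file name.
# find_csv_homes is a hypothetical helper; a script that builds file names
# dynamically (e.g. with string formatting) will not be found this way.
set -eu

find_csv_homes() {
  for csv in data/*.csv; do
    [ -e "$csv" ] || continue           # data/ has no CSVs: skip the literal glob
    name=$(basename "$csv")
    echo "$name:"
    # -r recurse, -l list file names only, -w whole-word match (so A.csv does not
    # also hit DATA.csv), -F treat the name as a fixed string, not a regex
    homes=$(grep -rlwF --include='*.py' "$name" . || true)
    if [ -n "$homes" ]; then
      printf '%s\n' "$homes" | sed 's/^/  /'
    else
      echo "  (no .py file mentions it)"
    fi
  done
}
```

Running find_csv_homes from the repo root prints each CSV name followed by the candidate scripts, which is the same information the single grep gave, just grouped per CSV.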