Git Deep Dive

A Version 1 Technical Meetup talk covering the deep internals of git.

Introduction

This repository holds the slide-deck and set-up scripts used to give the Git Deep Dive technical meet-up talk at Version 1 on 2018-06-25, by Éibhear Ó hAnluain.

The pack comes with the following:

  • This document
  • The slide-deck, GitDeepDive.pdf
  • A script to set up a simple git repository for exploration purposes: simple-setup.sh
  • A script to set up the mimic of a development team and processes: gitDemo.sh
  • A script to set up another simple repository to run through the process for completely clearing a file out of a git database: largeFile-setup.sh.
  • A .gitignore file.

Setup

  1. Run the script simple-setup.sh. This will create a git repository in simpleRepo and populate it with some commits. The commits in this repository will correspond to the diagrams in the slide-deck that relate to merging.
  2. Run the script largeFile-setup.sh. This will create another git repository in largeFileRepo, populate it with the same commits as in simpleRepo, and then add further commits that create, manipulate and finally remove a large binary file. This repository is used for the demonstration of removing a file completely from a repository.
  3. Run the script gitDemo.sh. This will replicate a previous git repository as a bare repo in devTeamDemo/javaBootcampNoEclipse.git, and then create a number of clones to represent the actions of a team lead and two developers, and mimic collaborative development among them. gitDemo.sh takes one optional parameter, -s, which will cause the progression to stop for 15 seconds following each "actor's" push in order to afford the opportunity to look at the git log.
  4. Use the gpg utility to generate a key-pair so that you can work through the commit- and tag-signing slide and sample commands. If you have access to gpg2 and not gpg, you'll need to set the following git config to make sure it's picked up: git config --global gpg.program gpg2.
  5. Open the file GitDeepDive.pdf, and as you go through the slides, refer to the Sample Commands section below for commands you can execute to get further insights.

Sample commands

Slide: Configuration

# Go to where the GitDeepDive.pdf file is and set your BASE_DIR
# environment variable.
cd <where-you-have-this-repo-cloned-to>
export BASE_DIR=$(pwd)
# List your basic git config
git config --list
# Go into the simple repo location and look at the various config
# contexts
cd ${BASE_DIR}/simpleRepo
git config --local --list
git config --global --list
git config --system --list
# Set some settings
git config --global user.name "<YourName>"
git config --global user.email "<YourEmailAddress>"
# Use a text editor to edit your config
git config -e
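
The three config contexts can also be explored without touching your real settings. The following is a minimal sketch, assuming a throwaway HOME directory (the `sandbox` path and the two names are hypothetical), showing that a value set with --local takes precedence over the same key set with --global. Run it in a separate terminal or subshell, because it changes HOME and the current directory.

```shell
# Use a throwaway HOME so the real global config is left untouched.
sandbox=$(mktemp -d)
export HOME="$sandbox/home"
unset XDG_CONFIG_HOME GIT_CONFIG_GLOBAL
mkdir -p "$HOME"
git init -q "$sandbox/repo"
cd "$sandbox/repo"

# Set the same key in two contexts.
git config --global user.name "Global Name"
git config --local user.name "Local Name"

# Without a context flag, git reports the most specific value.
effective_name=$(git config user.name)
global_name=$(git config --global user.name)
echo "effective: $effective_name (global: $global_name)"

# Show which file each value was read from.
git config --show-origin --get-all user.name
```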

Slide: fetch and merge, not pull

# Go into the clone belonging to one of the developers in the
# development team demo area
cd ${BASE_DIR}/devTeamDemo/javaBootcampNoEclipse.dev1
# Update the clone, but don't merge anything
git fetch --prune
# Review the local and remote branches.
git branch -va
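
To see why fetch is the safer first step, the sketch below builds a hypothetical bare repository with two clones (`dev1` and `dev2`, named here only for illustration) in a temporary directory. After `dev2` pushes, a fetch in `dev1` moves origin/master but leaves the local master where it was, so nothing is merged until you ask for it.

```shell
work=$(mktemp -d)
cd "$work"

# Seed repository with one commit on master.
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "one" > file.txt
git add file.txt
git commit -q -m "First commit"

# A bare "server" repository and two developer clones.
cd "$work"
git clone -q --bare seed origin.git
git clone -q origin.git dev1
git clone -q origin.git dev2

# dev2 commits and pushes.
cd "$work/dev2"
git config user.name "Developer Two"
git config user.email "dev2@example.com"
echo "two" >> file.txt
git commit -q -am "Second commit"
git push -q origin master

# dev1 fetches: only the remote-tracking branch moves.
cd "$work/dev1"
before=$(git rev-parse master)
git fetch -q --prune
local_after=$(git rev-parse master)
remote_after=$(git rev-parse origin/master)
git branch -va
```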

Slide: Merging approaches: fast-forward

cd ${BASE_DIR}/simpleRepo
# Check out the master branch and review its log
git checkout master
git log --decorate --graph --oneline --all
# Merge in the Rel1 branch and review the new log.
git merge Rel1
git log --decorate --graph --oneline --all
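
A fast-forward can be confirmed from the refs themselves. This is a small sketch in a throwaway repository (the file contents and commit messages are made up for illustration): after the merge, master and Rel1 name the same commit, and that commit has a single parent, so no merge commit was created.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "base" > information.md
git add information.md
git commit -q -m "Base commit on master"

# Rel1 moves ahead while master stays put.
git checkout -q -b Rel1
echo "release 1 work" >> information.md
git commit -q -am "Work on Rel1"

# master has no commits of its own, so the merge only moves the ref.
git checkout -q master
git merge -q --ff-only Rel1

master_tip=$(git rev-parse master)
rel1_tip=$(git rev-parse Rel1)
parent_count=$(git cat-file -p master | grep -c '^parent ')
echo "master=$master_tip Rel1=$rel1_tip parents=$parent_count"
```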

Slide: Merging approaches: merging strategies

cd ${BASE_DIR}/simpleRepo
# Check out the master branch, merge the Rel2 branch and review the
# new log.
git checkout master
# You'll be prompted for a commit message here.
git merge Rel2
git log --decorate --graph --oneline --all
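
When both branches have commits of their own, git has to create a merge commit. The sketch below, again in a throwaway repository with made-up file names, shows that the resulting commit records two parents. The `--no-edit` flag accepts the default commit message so the example runs without opening an editor.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "base" > information.md
git add information.md
git commit -q -m "Base commit"

# Rel2 and master each gain a commit, so the histories diverge.
git checkout -q -b Rel2
echo "rel2" > rel2.md
git add rel2.md
git commit -q -m "Work on Rel2"

git checkout -q master
echo "master" > master.md
git add master.md
git commit -q -m "Work on master"

# A true merge: a new commit with two parents.
git merge -q --no-edit Rel2
parent_count=$(git cat-file -p HEAD | grep -c '^parent ')
git log --decorate --graph --oneline --all
```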

Slide: Merging approaches: Rebase

# Remove and refresh the simpleRepo
${BASE_DIR}/simple-setup.sh
cd ${BASE_DIR}/simpleRepo

# Check out master, review its log, merge Rel1 and review the log
git checkout master
git log --decorate --graph --oneline --all
git merge Rel1
git log --decorate --graph --oneline --all

# Check out Rel2
git checkout Rel2
# Rebase Rel2 onto the now-new master
git rebase master
# Review the log
git log --decorate --graph --oneline --all
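
Rebasing replays commits rather than moving them, which is easy to check. In this sketch (throwaway repository, illustrative file names), the tip of Rel2 gets a new commit ID after the rebase, and its parent becomes the tip of master.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "base" > information.md
git add information.md
git commit -q -m "Base commit"

git checkout -q -b Rel2
echo "rel2" > rel2.md
git add rel2.md
git commit -q -m "Work on Rel2"

git checkout -q master
echo "master" > master.md
git add master.md
git commit -q -m "Work on master"

# Replay Rel2's commit on top of the current master.
git checkout -q Rel2
old_tip=$(git rev-parse Rel2)
git rebase -q master
new_tip=$(git rev-parse Rel2)
new_parent=$(git rev-parse 'Rel2^')
master_tip=$(git rev-parse master)
echo "Rel2 was $old_tip, is now $new_tip"
```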

Slide: refs

# Remove and refresh the simpleRepo
${BASE_DIR}/simple-setup.sh
cd ${BASE_DIR}/simpleRepo

# Look at the contents of .git/refs/heads and one of the heads itself
ls -l .git/refs/heads/
cat .git/refs/heads/Rel2

# Tag a branch, look at the branch ref and the tag ref
git tag Rel1.0 Rel1
cat .git/refs/heads/Rel1
cat .git/refs/tags/Rel1.0

# Look at the HEAD ref
cat .git/HEAD

# Go to another clone and look at the refs for the origin remote
cd ${BASE_DIR}/devTeamDemo/javaBootcampNoEclipse.dev1
ls -l .git/refs/remotes/origin/
# .git/packed-refs holds refs that git has packed into a single file
# (typically at clone time or by git gc). Refs updated since then
# appear as loose files under .git/refs.
cat .git/packed-refs
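
The relationship between loose refs and packed-refs can be reproduced by hand. This sketch assumes the default "files" ref storage (a repository using another ref backend will not have these files) and uses a throwaway repository: a branch starts life as a small file holding a commit ID, and `git pack-refs --all` moves it into .git/packed-refs.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "base" > information.md
git add information.md
git commit -q -m "Base commit"
git branch Rel2

# A loose ref is a file containing the commit ID.
loose_value=$(cat .git/refs/heads/Rel2)
resolved=$(git rev-parse Rel2)

# Pack all refs: the loose file disappears, the entry moves to packed-refs.
git pack-refs --all
cat .git/packed-refs
```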

Slide: Annotated tags

cd ${BASE_DIR}/simpleRepo

# Look at a "lightweight" tag
cat .git/refs/tags/Rel1.0
# What type of object is it pointing to?
git cat-file -t $(cat .git/refs/tags/Rel1.0)
# What's in the object it's pointing to
git cat-file -p $(cat .git/refs/tags/Rel1.0)

# Create an annotated tag and look at *it*
git tag -a -m "Formal release of 1.0" Rel1.0.prod Rel1
cat .git/refs/tags/Rel1.0.prod
# What type of object is it pointing to?
git cat-file -t $(cat .git/refs/tags/Rel1.0.prod)
# What's in the object it's pointing to
git cat-file -p $(cat .git/refs/tags/Rel1.0.prod)
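
The difference between the two kinds of tag can be condensed into a few checks. In this sketch (throwaway repository, tag names borrowed from the commands above), the lightweight tag resolves straight to a commit, while the annotated tag resolves to a tag object that in turn points at the commit.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "base" > information.md
git add information.md
git commit -q -m "Base commit"

git tag Rel1.0
git tag -a -m "Formal release of 1.0" Rel1.0.prod

lightweight_type=$(git cat-file -t "$(git rev-parse Rel1.0)")
annotated_type=$(git cat-file -t "$(git rev-parse Rel1.0.prod)")

# Peel the annotated tag down to the commit it refers to.
peeled=$(git rev-parse 'Rel1.0.prod^{commit}')
head_commit=$(git rev-parse HEAD)
echo "lightweight -> $lightweight_type, annotated -> $annotated_type"
```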

Slide: blame

# Look at the lines of the file on master
git checkout master
git blame information.md
# .. and on Rel1
git checkout Rel1
git blame information.md
# And on master after Rel2 has been merged in.
git merge Rel2
git blame information.md

Slide: Tag and commit signing

# Look at your secret keys
gpg --list-secret-keys

# Check out master and make a change to information.md
git checkout master
cat <<EOF >> information.md
An additional line for demonstrating commit-signing.

EOF

# Add and commit the change, signing the commit.
git add information.md
git commit -S<secretKeyID> -m "Update to information.md"

# Look at the commit object
git cat-file -p master

# Create an annotated tag from the Rel2 branch, signing it.
git tag -s -u <secretKeyID> -m "Release 2." Rel2.0 Rel2
# Look at the tag object
git cat-file -p Rel2.0

# Verify the signed tag and the signed commit.
git tag -v Rel2.0
git log --show-signature -1

Slide: Git Objects blobs

# Generate the SHA1 of the contents of the ~/.bash_history file as git
# would compute it
cat ~/.bash_history | git hash-object --stdin
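
The ID git assigns to a blob is the SHA1 of a short header ("blob", the size in bytes, and a NUL byte) followed by the content. The sketch below reproduces that by hand and compares it with `git hash-object`. It assumes the `sha1sum` utility is available (on some systems the equivalent is `shasum`) and that git is using its default SHA1 object format; the sample content is made up.

```shell
content_file=$(mktemp)
printf 'An example line of content.\n' > "$content_file"

# Size in bytes, with any padding from wc removed.
size=$(wc -c < "$content_file" | tr -d ' ')

# What git computes.
git_hash=$(git hash-object --stdin < "$content_file")

# The same value by hand: header, NUL byte, then the content.
manual_hash=$( { printf 'blob %s\0' "$size"; cat "$content_file"; } | sha1sum | cut -d ' ' -f 1)

echo "git:    $git_hash"
echo "manual: $manual_hash"
```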

Slide: Git Objects trees

cd ${BASE_DIR}/devTeamDemo/javaBootcampNoEclipse.dev1
# Get the tree object for the latest version of the top-level of the
# project
git cat-file -p master | grep '^tree'
# Look at the contents of that tree object:
git cat-file -p $(git cat-file -p master | grep '^tree' | sed 's/^tree //')
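
The grep and sed pipeline above can be replaced by git's own revision syntax. As a sketch in a throwaway repository (file and directory names are illustrative), `master^{tree}` resolves directly to the tree object of the commit at master, and `git ls-tree` prints its entries.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

mkdir src
echo "top level" > README.txt
echo "nested" > src/Main.txt
git add README.txt src/Main.txt
git commit -q -m "Base commit"

# Two ways to find the top-level tree of the latest commit.
tree_from_pipeline=$(git cat-file -p master | grep '^tree' | sed 's/^tree //')
tree_from_revparse=$(git rev-parse 'master^{tree}')
tree_type=$(git cat-file -t "$tree_from_revparse")

# List the entries: blobs for files, trees for sub-directories.
git ls-tree master
```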

Slide: Git Objects commits

cd ${BASE_DIR}/simpleRepo
git log --graph --all --decorate --oneline
git cat-file -p HEAD
git cat-file -p <anyOtherCommitId>

Slide: Git Objects tags

# Find all the tag objects
git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize)' --batch-all-objects | grep tag
git cat-file -p <anyOfTheTagObjects>

Slide: "Content Addressable Filesystem"

# Find all the objects, and select one that refers to a file
git rev-list --all --objects
# Look at the contents of the selected object
git cat-file -p <selectedBlobObject>
# Use git to get the SHA1 of the contents of that object
git cat-file -p <selectedBlobObject> | git hash-object --stdin
# The name of the file for that object is based on the SHA1.
ls -l .git/objects/...
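
For an object that has not yet been packed, the file name can be derived from the ID: the first two hexadecimal characters name a directory and the remainder names the file. This sketch uses a throwaway repository and a made-up file, `notes.txt`; once `git gc` has run, objects may live in pack files instead and the loose file will no longer exist.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q .

echo "Some file content" > notes.txt
# Staging the file writes its blob into the object database.
git add notes.txt
blob_id=$(git hash-object notes.txt)

# Directory = first 2 characters of the ID, file = the rest.
object_path=".git/objects/$(echo "$blob_id" | cut -c 1-2)/$(echo "$blob_id" | cut -c 3-)"
ls -l "$object_path"

# Hashing the stored content gives back the same ID.
rehashed=$(git cat-file -p "$blob_id" | git hash-object --stdin)
echo "$blob_id"
echo "$rehashed"
```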

Slide: The reflog

# Look at the reflog, then clear it completely and look at it again.
git reflog
git reflog expire --expire=now --expire-unreachable=now --verbose --all
git reflog
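
Before expiring the reflog it is worth seeing what it protects. In this sketch (throwaway repository, illustrative commits), a commit discarded with `git reset --hard` is still recorded in the reflog as `HEAD@{1}` and can be restored from there. Once the reflog is expired, that safety net is gone.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "one" > information.md
git add information.md
git commit -q -m "First commit"
echo "two" >> information.md
git commit -q -am "Second commit"

# Throw the second commit away.
lost=$(git rev-parse HEAD)
git reset -q --hard 'HEAD~1'

# The reflog still remembers where HEAD was before the reset.
git reflog
recovered=$(git rev-parse 'HEAD@{1}')

# Restore it.
git reset -q --hard "$recovered"
restored_head=$(git rev-parse HEAD)
```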

Slide: fsck and gc

git fsck
git gc
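
The interplay between the reflog, fsck and gc can be shown end to end. In this sketch (throwaway repository; the `scratch` branch is hypothetical), a commit is orphaned by deleting its branch. After the reflog is expired, `git fsck --unreachable` reports the commit, and `git gc --prune=now` deletes it from the object database.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "base" > information.md
git add information.md
git commit -q -m "Base commit"

# Create a commit on a scratch branch, then delete the branch.
git checkout -q -b scratch
echo "scratch" > scratch.txt
git add scratch.txt
git commit -q -m "Commit that will be orphaned"
orphan=$(git rev-parse HEAD)
git checkout -q master
git branch -q -D scratch

# The HEAD reflog still refers to the commit, so expire it first.
git reflog expire --expire=now --all

# fsck now lists the commit as unreachable.
unreachable_listed=$(git fsck --unreachable --no-reflogs 2>/dev/null | grep -c "$orphan")

# gc with --prune=now removes unreachable objects immediately.
git gc -q --prune=now
if git cat-file -e "$orphan" 2>/dev/null; then
  orphan_state=present
else
  orphan_state=pruned
fi
echo "listed=$unreachable_listed state=$orphan_state"
```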

Slide: Useful commands

# Go into the repo where the large file had been created
cd ${BASE_DIR}/largeFileRepo
# List all the blob objects in increasing order of object size.
git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize)' --batch-all-objects | sed -n 's/^blob //p' | sort -n --key=2
# Look for the file name associated with an object
git rev-list --objects --all | grep <blobID>
# Look for the commits that made changes to a specific file
git log --follow -- "largeInformation.md"
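
The size listing and the name lookup can be combined into one pipeline. This is a sketch, not part of the talk material: it feeds the output of `git rev-list --objects` into `git cat-file --batch-check` and uses the `%(rest)` placeholder to carry each path through, so the largest blob is reported together with its file name. The repository and the file names `small.txt` and `big.bin` are made up.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "small" > small.txt
head -c 200000 /dev/zero > big.bin
git add small.txt big.bin
git commit -q -m "Add a small and a large file"

# One line per blob: "<size> <path>", sorted by size; keep the largest.
largest=$(git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
  sed -n 's/^blob //p' | sort -n | tail -n 1)

largest_size=${largest%% *}
largest_path=${largest#* }
echo "largest blob: $largest_path ($largest_size bytes)"
```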

Slide: Permanently removing a file from your git db

# Preserve information on the tags, as you may need this later.
for tag in $(git tag)
do
  echo "${tag},$(git log --format="%H,\"%cn\",\"%ci\",\"%s\"" ${tag} | head -1)"
done | tee /tmp/tag_list.csv

# Create a local branch for all the remote branches. This does nothing
# in this demo as there is no remote.
for branch in $(git branch -r | grep -v HEAD | sed 's/\ \ origin\///')
do
  git branch ${branch} origin/${branch}
done

# Check out each branch and determine the amount of space it uses
for branch in $(git branch | sed 's/^..//')
do
  git checkout -q ${branch}
  du -sk . | sed "s/\./${branch}/"
done | tee /tmp/branch_sizes.out

# Check out the Rel1 branch
git checkout Rel1
# Use git-filter-branch to remove the file from all the commits on the
# branch
git filter-branch --tree-filter 'rm -f largeInformation.md' --prune-empty HEAD
# Delete the backup refs that filter-branch saved under refs/original/
# (will fail for some branches)
git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d

# Do the same for all the other branches
for branch in Rel2 Rel3 Rel4 master
do
  git checkout ${branch}
  git filter-branch --tree-filter 'rm -f largeInformation.md' --prune-empty HEAD
  git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
done | tee /tmp/cleanup.out

# Clear out the reflog
git reflog expire --expire-unreachable=now --all
# Run the garbage collector on the repository
git gc --prune=now
# Run an FSCK on the repo
git fsck --unreachable --no-reflogs

# Check out each branch again and determine the amount of space it
# uses
for branch in $(git branch | sed 's/^..//')
do
  git checkout -q ${branch}
  du -sk . | sed "s/\./${branch}/"
done | tee /tmp/branch_sizes_post_process.out
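
The whole removal procedure can be rehearsed on a tiny repository before trying it on a real one. The sketch below is a condensed variant under stated assumptions, not the exact sequence above: it uses `--index-filter` with `git rm --cached` (usually faster than `--tree-filter` because no files are checked out), rewrites all branches in one pass with `-- --all`, and sets `FILTER_BRANCH_SQUELCH_WARNING` to skip the warning that recent git versions print. It then confirms that the large blob is gone from the object database.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "notes" > information.md
git add information.md
git commit -q -m "Add information.md"

# Add a large file, then remove it in a later commit.
head -c 200000 /dev/zero > largeInformation.md
git add largeInformation.md
git commit -q -m "Add large file"
large_blob=$(git rev-parse HEAD:largeInformation.md)
git rm -q largeInformation.md
git commit -q -m "Remove large file"

# Rewrite every commit on every branch without the file.
export FILTER_BRANCH_SQUELCH_WARNING=1
git filter-branch --index-filter \
  'git rm -q --cached --ignore-unmatch largeInformation.md' \
  --prune-empty -- --all >/dev/null 2>&1

# Drop the backup refs, the reflog entries and the orphaned objects.
git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
git reflog expire --expire=now --all
git gc -q --prune=now

if git cat-file -e "$large_blob" 2>/dev/null; then
  blob_state=present
else
  blob_state=removed
fi
commit_count=$(git rev-list --count --all)
echo "blob: $blob_state, commits left: $commit_count"
```

Both commits that touched only the large file become empty after the rewrite, so `--prune-empty` drops them and a single commit remains.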