Git Deep Dive

A Version 1 Technical Meetup talk covering the deep internals of git.

Introduction

These are the slide-deck and set-up scripts used for the Git Deep Dive technical meet-up talk given on [2018-06-25 Mon] in Version 1 by Éibhear Ó hAnluain.

The pack comes with the following:

  • This document
  • The slide-deck, GitDeepDive.pdf
  • A script to set up a simple git repository for exploration purposes: merge-setup.sh
  • A script that mimics a development team and its processes: gitDemo.sh
  • A script to set up another simple repository to run through the process for completely clearing a file out of a git database: largeFile-setup.sh.
  • A .gitignore file.

Setup

  1. Run the script merge-setup.sh. This will create a git repository in simpleRepo and populate it with some commits. The commits in this repository will correspond to the diagrams in the slide-deck that relate to merging.
  2. Run the script largeFile-setup.sh. This will create another git repository in largeFileRepo and populate it with the same commits as in simpleRepo, then add further commits that create and manipulate a large binary file and finally remove it. This repository is used for the demonstration of removing a file completely from a repository.
  3. Run the script gitDemo.sh. This will replicate a previous git repository as a bare repo in devTeamDemo/javaBootcampNoEclipse.git, and then create a number of clones to represent the actions of a team lead and two developers, and mimic collaborative development among them. gitDemo.sh takes one optional parameter, -s, which pauses the run for 15 seconds after each "actor's" push so that you can look at the git log.
  4. Use the gpg utility to generate a key-pair so that you can work through the commit- and tag-signing slide and sample commands. If you have access to gpg2 and not gpg, you'll need to set the following git config to make sure it's picked up: git config --global gpg.program gpg2.
  5. Open the file GitDeepDive.pdf, and as you go through the slides, refer to the Sample Commands section below for commands you can execute to get further insights.

Sample commands

Slide: Configuration

  # Go to where the GitDeepDive.pdf file is and set your BASE_DIR
  # environment variable.
  cd <where-you-have-this-repo-cloned-to>
  export BASE_DIR=$(pwd)
  # List your basic git config
  git config --list
  # Go into the simple repo location and look at the various config
  # contexts
  cd ${BASE_DIR}/simpleRepo
  git config --local --list
  git config --global --list
  git config --system --list
  # Set some settings
  git config --global user.name "<YourName>"
  git config --global user.email "<YourEmailAddress>"
  # Use a text editor to edit your config
  git config -e
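
The way the config contexts interact can be sketched in a throwaway repository. This is a minimal illustration, not part of the talk's scripts: the scratch directory, the temporary HOME and the two names are assumptions made for the example. It overrides HOME so that your real global config is left alone, so run it in a disposable shell.

```shell
# Assumption: a scratch area with its own HOME, so the real
# ~/.gitconfig is never touched.
scratch=$(mktemp -d)
export HOME="${scratch}/home"
unset XDG_CONFIG_HOME
mkdir -p "${HOME}"
git init -q "${scratch}/repo"
cd "${scratch}/repo"

# Set the same key in two contexts.
git config --global user.name "Global Name"
git config --local user.name "Local Name"

# The local (repository) value takes precedence over the global one.
effective=$(git config user.name)
echo "${effective}"

# Show every value for the key together with the file it came from.
git config --show-origin --get-all user.name
```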

Slide: fetch and merge, not pull

  # Go into the clone belonging to one of the developers in the
  # development team demo area
  cd ${BASE_DIR}/devTeamDemo/javaBootcampNoEclipse.dev1
  # Update the clone, but don't merge anything
  git fetch --prune
  # Review the local and remote branches.
  git branch -va
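
To see why fetching and merging are kept as separate steps, the following sketch builds its own tiny "origin" and a clone of it (both hypothetical, created in a scratch directory) and shows that a fetch moves only the remote-tracking branch. The Demo identity is an assumption made so that the commits can be created anywhere.

```shell
scratch=$(mktemp -d)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# A stand-in for the shared repository, with one commit on master.
git init -q "${scratch}/origin"
cd "${scratch}/origin"
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "first"

# A developer clones it, after which the origin gains a commit.
git clone -q "${scratch}/origin" "${scratch}/dev1"
git commit -q --allow-empty -m "second"

# Fetching updates origin/master but leaves the local master alone.
cd "${scratch}/dev1"
git fetch -q --prune
local_tip=$(git rev-parse master)
remote_tip=$(git rev-parse origin/master)

# Review what would come in, then merge as a deliberate second step.
git log --oneline master..origin/master
git merge -q --ff-only origin/master
merged_tip=$(git rev-parse master)
```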

Slide: Merging approaches: fast-forward

  cd ${BASE_DIR}/simpleRepo
  # Check out the master branch and review its log
  git checkout master
  git log --decorate --graph --oneline --all
  # Merge in the Rel1 branch and review the new log.
  git merge Rel1
  git log --decorate --graph --oneline --all
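
A fast-forward creates no new commit: the branch pointer simply moves. The sketch below shows this in a scratch repository. The branch names echo the ones in simpleRepo, but the repository and the Demo identity are assumptions made for the illustration, and --ff-only is used only to make git refuse anything other than a fast-forward.

```shell
scratch=$(mktemp -d)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q "${scratch}/repo"
cd "${scratch}/repo"
git symbolic-ref HEAD refs/heads/master

# master has one commit; Rel1 adds one more on top of it.
git commit -q --allow-empty -m "base"
git checkout -q -b Rel1
git commit -q --allow-empty -m "Rel1 work"

# master has not moved since Rel1 branched, so the merge is a
# fast-forward.
git checkout -q master
git merge -q --ff-only Rel1

# Both branches now point at the same commit, which has one parent.
master_tip=$(git rev-parse master)
rel1_tip=$(git rev-parse Rel1)
parent_count=$(git cat-file -p master | grep -c '^parent')
```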

Slide: Merging approaches: merging strategies

  cd ${BASE_DIR}/simpleRepo
  # Check out the master branch, merge the Rel2 branch and review the
  # new log.
  git checkout master
  # You'll be prompted for a commit message here.
  git merge Rel2
  git log --decorate --graph --oneline --all
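
When both branches have moved on, git cannot fast-forward and records a merge commit with two parents instead. A minimal sketch in a scratch repository follows; the file names and the Demo identity are assumptions, and --no-edit accepts the default commit message rather than prompting.

```shell
scratch=$(mktemp -d)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q "${scratch}/repo"
cd "${scratch}/repo"
git symbolic-ref HEAD refs/heads/master

echo base > information.md
git add information.md
git commit -q -m "base"

# Rel2 and master each gain a commit of their own, so they diverge.
git checkout -q -b Rel2
echo rel2 > rel2.md
git add rel2.md
git commit -q -m "Rel2 work"
git checkout -q master
echo master > master.md
git add master.md
git commit -q -m "master work"

# The merge now needs a commit of its own.
git merge -q --no-edit Rel2
parent_count=$(git cat-file -p HEAD | grep -c '^parent')
git log --graph --oneline --all
```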

Slide: Merging approaches: Rebase

  # Remove and refresh the simpleRepo
  ${BASE_DIR}/merge-setup.sh
  cd ${BASE_DIR}/simpleRepo

  # Check out master, review its log, merge Rel1 and review the log
  git checkout master
  git log --decorate --graph --oneline --all
  git merge Rel1
  git log --decorate --graph --oneline --all

  # Check out Rel2
  git checkout Rel2
  # Rebase Rel2 onto the now-new master
  git rebase master
  # Review the log
  git log --decorate --graph --oneline --all
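
What a rebase does to commit IDs can be checked directly. In this sketch (scratch repository, file names and Demo identity are all assumptions) Rel2 is replayed on top of master: its tip gets a new ID, and its parent becomes the master tip.

```shell
scratch=$(mktemp -d)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q "${scratch}/repo"
cd "${scratch}/repo"
git symbolic-ref HEAD refs/heads/master

echo base > information.md
git add information.md
git commit -q -m "base"

# Let Rel2 and master diverge by one commit each.
git checkout -q -b Rel2
echo rel2 > rel2.md
git add rel2.md
git commit -q -m "Rel2 work"
git checkout -q master
echo master > master.md
git add master.md
git commit -q -m "master work"

# Replay Rel2 on top of master and compare the IDs.
git checkout -q Rel2
before=$(git rev-parse Rel2)
git rebase -q master
after=$(git rev-parse Rel2)
parent_of_rel2=$(git rev-parse Rel2~1)
master_tip=$(git rev-parse master)
```

The history is now linear, so a later merge of Rel2 into master would be a fast-forward.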

Slide: refs

  # Remove and refresh the simpleRepo
  ${BASE_DIR}/merge-setup.sh
  cd ${BASE_DIR}/simpleRepo

  # Look at the contents of .git/refs/heads and one of the heads itself
  ls -l .git/refs/heads/
  cat .git/refs/heads/Rel2

  # Tag a branch, look at the branch ref and the tag ref
  git tag Rel1.0 Rel1
  cat .git/refs/heads/Rel1
  cat .git/refs/tags/Rel1.0

  # Look at the HEAD ref
  cat .git/HEAD

  # Go to another clone and look at the refs for the origin remote
  cd ${BASE_DIR}/devTeamDemo/javaBootcampNoEclipse.dev1
  ls -l .git/refs/remotes/origin/
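  # .git/packed-refs holds refs that are stored together in one file
  # rather than as individual files under .git/refs/; a ref gets its
  # own file again the next time it is updated.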
  cat .git/packed-refs
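
The relationship between loose ref files and .git/packed-refs can be sketched as follows. It assumes git's default file-based ref storage, and the scratch repository and Demo identity are made up for the example.

```shell
scratch=$(mktemp -d)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q "${scratch}/repo"
cd "${scratch}/repo"
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "first"

# A branch is a file holding a commit ID; HEAD names the branch.
loose_value=$(cat .git/refs/heads/master)
resolved=$(git rev-parse master)
head_contents=$(cat .git/HEAD)

# Packing moves the ref into .git/packed-refs and removes the file.
git pack-refs --all
if [ -f .git/refs/heads/master ]; then loose_after=yes; else loose_after=no; fi
packed_line=$(grep ' refs/heads/master$' .git/packed-refs)
```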

Slide: Annotated tags

  cd ${BASE_DIR}/simpleRepo

  # Look at a "lightweight" tag
  cat .git/refs/tags/Rel1.0
  # What type of object is it pointing to?
  git cat-file -t $(cat .git/refs/tags/Rel1.0)
  # What's in the object it's pointing to
  git cat-file -p $(cat .git/refs/tags/Rel1.0)

  # Create an annotated tag and look at *it*
  git tag -a -m "Formal release of 1.0" Rel1.0.prod Rel1
  cat .git/refs/tags/Rel1.0.prod
  # What type of object is it pointing to?
  git cat-file -t $(cat .git/refs/tags/Rel1.0.prod)
  # What's in the object it's pointing to
  git cat-file -p $(cat .git/refs/tags/Rel1.0.prod)
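
The difference between the two kinds of tag comes down to what the tag ref points at. A minimal sketch, using a scratch repository and hypothetical tag names (light and annotated):

```shell
scratch=$(mktemp -d)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q "${scratch}/repo"
cd "${scratch}/repo"
git commit -q --allow-empty -m "first"

git tag light
git tag -a -m "Annotated" annotated

# A lightweight tag points straight at the commit; an annotated tag
# points at a tag object, which in turn points at the commit.
light_type=$(git cat-file -t "$(git rev-parse light)")
annotated_type=$(git cat-file -t "$(git rev-parse annotated)")

# ^{commit} follows the tag object through to the commit.
peeled=$(git rev-parse 'annotated^{commit}')
head_commit=$(git rev-parse HEAD)
```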

Slide: blame

  # Look at the lines of the file on master
  git checkout master
  git blame information.md
  # .. and on Rel1
  git checkout Rel1
  git blame information.md
  # And on master after Rel2 has been merged in.
  git merge Rel2
  git blame information.md

Slide: Tag and commit signing

  # Look at your secret keys
  gpg --list-secret-keys

  # Check out master and make a change to information.md
  git checkout master
  cat <<EOF >> information.md
  An additional line for demonstrating commit-signing.

  EOF

  # Add and commit the change, signing the commit.
  git add information.md
  git commit -S<secretKeyID> -m "Update to information.md"

  # Look at the commit object
  git cat-file -p master

  # Create an annotated tag from the Rel2 branch, signing it.
  git tag -s -u <secretKeyID> -m "Release 2." Rel2.0 Rel2
  # Look at the tag object
  git cat-file -p Rel2.0

  # Verify the signed tag and the signed commit.
  git tag -v Rel2.0
  git log --show-signature -1

Slide: Git Objects blobs

  # Generate the SHA1 of the contents of the ~/.bash_history file as
  # git would compute it
  cat ~/.bash_history | git hash-object --stdin
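
The ID of a blob is not the SHA1 of the raw contents: git hashes a short header ("blob", the size in bytes, a NUL byte) followed by the contents. The sketch below reproduces that by hand. It assumes the default SHA-1 object format and that the sha1sum utility is available (on some systems the equivalent is shasum); the sample content is made up.

```shell
# Work outside any repository so the default object format applies.
cd "$(mktemp -d)"

content="hello"

# What git computes for the content.
git_hash=$(printf '%s' "${content}" | git hash-object --stdin)

# The same value by hand: header, NUL, then the content itself.
# ${#content} equals the byte count here because the content is ASCII.
manual_hash=$(printf 'blob %s\0%s' "${#content}" "${content}" | sha1sum | cut -d' ' -f1)

echo "${git_hash}"
echo "${manual_hash}"
```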

Slide: Git Objects trees

  cd ${BASE_DIR}/devTeamDemo/javaBootcampNoEclipse.dev1
  # Get the tree object for the latest version of the top-level of the
  # project
  git cat-file -p master | grep '^tree'
  # Look at the contents of that tree object:
  git cat-file -p $(git cat-file -p master | grep '^tree' | sed 's/^tree //')
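
As an alternative to the grep and sed pipeline, git can resolve a commit to its tree directly with the ^{tree} suffix. This sketch checks that both routes give the same ID; the scratch repository, file and Demo identity are assumptions made for the example.

```shell
scratch=$(mktemp -d)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q "${scratch}/repo"
cd "${scratch}/repo"
git symbolic-ref HEAD refs/heads/master
echo notes > information.md
git add information.md
git commit -q -m "Add information.md"

# The tree ID as read out of the commit object ...
tree_via_grep=$(git cat-file -p master | grep '^tree' | sed 's/^tree //')
# ... and as resolved by git itself.
tree_via_peel=$(git rev-parse 'master^{tree}')

# List the entries (mode, type, ID, name) in that tree.
git ls-tree "${tree_via_peel}"
```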

Slide: Git Objects commits

  cd ${BASE_DIR}/simpleRepo
  git log --graph --all --decorate --oneline
  git cat-file -p HEAD
  git cat-file -p <anyOtherCommitId>

Slide: Git Objects tags

  # Find all the tag objects
  git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize)' --batch-all-objects | grep tag
  git cat-file -p <anyOfTheTagObjects>

Slide: "Content Addressable Filesystem"

  # Find all the objects, and select one that refers to a file
  git rev-list --all --objects
  # Look at the contents of the selected object
  git cat-file -p <selectedBlobObject>
  # Use git to get the SHA1 of the contents of that object
  git cat-file -p <selectedBlobObject> | git hash-object --stdin
  # The name of the file for that object is based on the SHA1.
  ls -l .git/objects/...
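
The mapping from an object ID to a file name can be sketched with a freshly written object. This assumes the object is still "loose" (not yet packed); the scratch repository and sample content are made up for the illustration.

```shell
scratch=$(mktemp -d)
git init -q "${scratch}/repo"
cd "${scratch}/repo"

# Store some content in the object database and keep its ID.
id=$(echo "some content" | git hash-object -w --stdin)

# The first two characters of the ID name the directory, the
# remainder names the file.
object_path=".git/objects/$(echo "${id}" | cut -c1-2)/$(echo "${id}" | cut -c3-)"
ls -l "${object_path}"

# Hashing the stored content again yields the same ID.
roundtrip=$(git cat-file -p "${id}" | git hash-object --stdin)
```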

Slide: The reflog

  # Look at the reflog, then clear it completely and look at it again.
  git reflog
  git reflog expire --expire=now --expire-unreachable=now --verbose --all
  git reflog
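
Why the reflog matters can be seen by "losing" a commit and finding it again. A minimal sketch in a scratch repository (the commits and the Demo identity are assumptions):

```shell
scratch=$(mktemp -d)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q "${scratch}/repo"
cd "${scratch}/repo"
git commit -q --allow-empty -m "first"
git commit -q --allow-empty -m "second"

# Drop the latest commit from the branch.
dropped=$(git rev-parse HEAD)
git reset -q --hard HEAD~1

# The reflog still remembers where HEAD was before the reset.
recovered=$(git rev-parse 'HEAD@{1}')

# Once the reflog is expired, that safety net is gone.
git reflog expire --expire=now --expire-unreachable=now --all
remaining=$(git reflog | wc -l | tr -d ' ')
```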

Slide: fsck and gc

  git fsck
  git gc
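
To give fsck and gc something to report, the sketch below writes an object that nothing refers to. It is an illustration in a scratch repository under assumed names, not a step from the talk.

```shell
scratch=$(mktemp -d)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q "${scratch}/repo"
cd "${scratch}/repo"
git commit -q --allow-empty -m "first"

# Write a blob that no tree, commit or ref points to.
orphan=$(echo "never referenced" | git hash-object -w --stdin)

# fsck reports it as dangling.
dangling_report=$(git fsck 2>/dev/null | grep "dangling blob ${orphan}")
echo "${dangling_report}"

# gc with --prune=now discards unreachable objects immediately.
git gc -q --prune=now
if git cat-file -e "${orphan}" 2>/dev/null; then survived=yes; else survived=no; fi
```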

Slide: Useful commands

  # Go into the repo where the large file had been created
  cd ${BASE_DIR}/largeFileRepo
  # List all the blob objects in increasing order of object size.
  git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize)' --batch-all-objects | sed -n 's/^blob //p' | sort -n --key=2
  # Look for the file name associated with an object
  git rev-list --objects --all | grep <blobID>
  # Look for the commits that made changes to a specific file
  git log --follow -- "largeInformation.md"
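
The size listing and the name lookup can be combined into one pipeline by feeding the output of rev-list to cat-file, whose %(rest) placeholder echoes back whatever followed the object ID on the input line. The sketch builds its own scratch repository; the file names small.txt and big.bin are hypothetical.

```shell
scratch=$(mktemp -d)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q "${scratch}/repo"
cd "${scratch}/repo"
echo x > small.txt
head -c 100000 /dev/zero > big.bin
git add small.txt big.bin
git commit -q -m "Add a small and a large file"

# For every reachable object print type, size and path, keep only
# the blobs, and sort by size; the last line is the largest blob.
largest=$(git rev-list --objects --all \
  | git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' \
  | sed -n 's/^blob //p' \
  | sort -n \
  | tail -1)
largest_name=${largest#* }
echo "${largest}"
```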

Slide: Permanently removing a file from your git db [1/2]

  # Preserve information on the tags, as you may need this later.
  for tag in $(git tag)
  do
    echo "${tag},$(git log --format="%H,\"%cn\",\"%ci\",\"%s\"" ${tag} | head -1)"
  done | tee /tmp/tag_list.csv

  # Create a local branch for all the remote branches. This does nothing
  # in this demo as there is no remote.
  for branch in $(git branch -r | grep -v HEAD | sed 's/\ \ origin\///')
  do
    git branch ${branch} origin/${branch}
  done

  # Check out each branch and determine the amount of space it uses
  for branch in $(git branch | sed 's/^..//')
  do
    git checkout -q ${branch}
    du -sk . | sed "s/\./${branch}/"
  done | tee /tmp/branch_sizes.out

  # Check out the Rel1 branch
  git checkout Rel1
  # Use git-filter-branch to remove the file from all the commits on the
  # branch
  git filter-branch --tree-filter 'rm -f largeInformation.md' --prune-empty HEAD
  # Delete the backup refs that filter-branch leaves under
  # refs/original/ (will fail for some branches)
  git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d

  # Do the same for all the other branches
  for branch in Rel2 Rel3 Rel4 master
  do
    git checkout ${branch}
    git filter-branch --tree-filter 'rm -f largeInformation.md' --prune-empty HEAD
    git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
  done | tee /tmp/cleanup.out

  # Expire the reflog entries that are no longer reachable from the
  # current branch tips
  git reflog expire --expire-unreachable=now --all
  # Run the garbage collector on the repository
  git gc --prune=now
  # Run an FSCK on the repo
  git fsck --unreachable --no-reflogs

  # Check out each branch again and determine the amount of space it
  # uses
  for branch in $(git branch | sed 's/^..//')
  do
    git checkout -q ${branch}
    du -sk . | sed "s/\./${branch}/"
  done | tee /tmp/branch_sizes_post_process.out
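
Comparing directory sizes is indirect evidence. A more direct check is to record the ID of the large file's blob beforehand and confirm afterwards that the object no longer exists. The sketch below runs the same sequence on a single-commit scratch repository. The repository contents and Demo identity are assumptions, and FILTER_BRANCH_SQUELCH_WARNING is set only to skip the warning pause that newer versions of git print before filter-branch runs.

```shell
export FILTER_BRANCH_SQUELCH_WARNING=1
scratch=$(mktemp -d)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q "${scratch}/repo"
cd "${scratch}/repo"
echo notes > information.md
head -c 200000 /dev/urandom > largeInformation.md
git add information.md largeInformation.md
git commit -q -m "Add files"

# Remember which object holds the large file.
blob=$(git rev-parse HEAD:largeInformation.md)

# Rewrite history, drop the backup refs, expire the reflog, collect.
git filter-branch -f --tree-filter 'rm -f largeInformation.md' \
  --prune-empty HEAD >/dev/null 2>&1
git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
git reflog expire --expire=now --expire-unreachable=now --all
git gc -q --prune=now

# The blob should no longer be in the object database.
if git cat-file -e "${blob}" 2>/dev/null; then still_present=yes; else still_present=no; fi
echo "large blob still present: ${still_present}"
```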