Understanding Git
DVCS basics
What is Version Control
- Source Control, Version Control, Revision Control, Source Configuration Management, Software Configuration Management, Software Change and Configuration Management...
- "Software Configuration Management (SCM) is the task of tracking and controlling changes in the software." [Wikipedia]
- "Version control is a system that records changes to a file or set of files over time so that you can recall specific versions later." [Pro Git book]
- "Many people think of a version control system as a sort of time machine." [Version Control with Subversion book]
- "Distributed version control system (DVCS) keeps track of software revisions and allows many developers to work on a given project without necessarily being connected to a common network." [Wikipedia]
Evolution of version control
- Keeping multiple copies of the same file on the local disk
- Keeping multiple copies of the same file on a remote disk
- Keeping multiple copies of the same file on a shared remote server
- Hiding previous versions in a browsable history, keeping only the head visible by default (branches and HEADs)
Evolution of collaboration
- Sending papers back and forth
- Sending emails back and forth
- Storing files on a shared server
- Storing files in a Central Version Control System
- Now: Remove the Central aspect to get a Distributed Version Control System
The two main purposes of VCS
- Track changes over time
- Help people work collaboratively on the same project
- The Distributed aspect of DVCS tries to improve the second purpose
- but as a consequence also improves the first one by adding reliability through replication and cryptographic security to the codebase
- and also by allowing more parallelism, with no locks and fewer conflicts
Concepts (SVN)
Repository
- A database where the entire history of the project is stored
- Not necessarily in a human-readable format
- Existing versions can be read from the repository
- New versions can be added to the repository
Working Copy
- A local copy of a specific version from the repository
- Can be freely modified, but changes aren't automatically put in the repo
- Usually the latest version, but any previous version can be used
- a.k.a. Workspace, Working Tree
Checkout
- Copy a specific version from the repository into a local working copy
Commit
- Upload a new version based on the working copy to the repository
Update
- Re-fetch the latest version from the central server
Branches
- It is possible to keep parallel development histories
- For example, after releasing version 3.0 of the software, continue developing for the upcoming 4.0 version, but also maintain a 3.0.x branch for any critical bugfixes needed before 4.0 is ready
HEAD
- The most recent version on a specific branch
The D in DVCS
Repositories
- Instead of having just one central repository, everyone clones the entire repository locally
- A working copy is always right next to the local repository
- A checkout extracts files from the local repository
- No network transfer is involved
- A commit stores the new version in the local repository
- No network transfer is involved
Collaboration
- You can add as many remote repositories as needed to your local clone
- You can fetch versions from any of your registered remotes
- Fetch just adds new versions to your local repository without changing the current working tree or your local branches
- Pull fetches, then updates the currently checked-out local branch from the remote branch it tracks
- You can issue a pull request for someone else to fetch your changes and include them in their repository
- It is still possible to nominate an accessible repository as the central repository where committers can push directly
About Git
Git
- Distributed Version Control System
- Powerful branching
- Branches have a very low cost (tens of bytes)
- Versatile merging
- Efficient storage
- Data integrity assured by SHA-1 checksums of each file, tree or commit
- Many commands at high (porcelain) and low (plumbing) level
- Steep learning curve, but easy to master after the a-ha! moment
- Easy to make mistakes, easy to recover
- But hard to make irreparable mistakes
GitHub
- Git hosting with many enhancements
- Social interaction
- Organizations
- Following users and repositories
- Public Forks and Pull Requests
- Nice visualization for commits, versions, branches, forks
- Comments on commits
- Remote access APIs
- Also provides basic issue tracker, wiki, web hosting, download hosting
From Subversion to Git
Equivalent actions
Subversion | Git |
---|---|
checkout | clone |
update | fetch or pull |
status | status |
diff | diff |
commit | add + commit + push |
revert | checkout |
Git internals
The Object Database
Blobs and Objects
- Blob = A piece of data in the repository
- Object = blob + type + SHA-1 ID
- They have a SHA-1 validating their contents, thus objects are immutable; changing something means creating a new version of it
- Usually, the contents of a file
- This can be a symlink as well
- Trees, Commits, Tags
- git cat-file -p <SHA> pretty-prints the contents of an object
- Object types: blob, tree, commit, tag
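The "object = blob + type + SHA-1 ID" idea can be sketched in a few lines of Python. This is a minimal illustration, assuming a repository that uses SHA-1 object IDs; the function name git_object_id is made up for this sketch and is not part of Git.

```python
import hashlib

def git_object_id(obj_type: str, payload: bytes) -> str:
    """Compute the ID of an object: SHA-1 over '<type> <size>\\0' + payload."""
    header = f"{obj_type} {len(payload)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + payload).hexdigest()

# Same ID as: echo 'test content' | git hash-object --stdin
print(git_object_id("blob", b"test content\n"))
# d670460b4b4aece5915caf5c68d12f560a9fe3e4

# Changing a single byte yields a completely different ID,
# which is why objects are effectively immutable.
print(git_object_id("blob", b"Test content\n"))
```

Since the ID is derived only from the type and the contents, two files with identical contents are stored as a single blob.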
Blob (File)
- Object of type 'blob'
- Contains just the plain content of the file, be it text or binary
- For symlinks, contains the path to the linked file
- Does not contain a file name, or a file path, or a file mode
Tree (Directory)
- Object of type 'tree'
- Collection of other files and trees
- The blob contains a list of entries
- For each included entry, specifies:
- File mode
- File type (blob, link, tree)
- SHA-1 of the file or subtree object
- File name
- A file does not know its name
- A sub-tree does not know its parent
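As a rough sketch of how a tree stores its entries, the Python below serializes (mode, name, SHA-1) triples into the raw tree payload and computes the tree ID. The helper names tree_payload and tree_id are assumptions made for this example. Note that the raw form stores directory modes as 40000; the leading zero in 040000 only appears in pretty-printed output.

```python
import hashlib

def tree_payload(entries):
    """Serialize (mode, name, sha1_hex) entries as '<mode> <name>\\0<20 raw bytes>'."""
    def sort_key(entry):
        mode, name, _ = entry
        # Git orders entries by name, comparing directories as if they ended in '/'
        return name.encode("utf-8") + (b"/" if mode == "40000" else b"")
    out = b""
    for mode, name, sha_hex in sorted(entries, key=sort_key):
        out += mode.encode("ascii") + b" " + name.encode("utf-8") + b"\x00"
        out += bytes.fromhex(sha_hex)
    return out

def tree_id(entries):
    """SHA-1 over the 'tree <size>\\0' header followed by the payload."""
    payload = tree_payload(entries)
    header = f"tree {len(payload)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + payload).hexdigest()

entries = [
    ("100644", "pom.xml", "50deded834803b6e2530b9207c52e274177c4636"),
    ("40000", "wiki-ui", "3bd73e7f49c2828f412f0935bddfa1ce0e965ca0"),
]
print(tree_id(entries))
```

The name and mode live only in the parent tree entry, so renaming a file changes the tree but not the blob.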
Tree example
100644 blob 523a688c25b20dee0a9e0a0ebd2eed65545423d5 .gitattributes
100644 blob 53f9e9b271e54f46617ee0f52cfc9c370e5b01fe .gitignore
100644 blob d0f1cacc70bccbcd587c8c5119cf11438eca4563 README.markdown
120000 blob ec54964de17e52fc4ee652d92ade47f35c9fb77d README.symlink
040000 tree f663f5a69a6ca941381dd1c527cd506163113fbd jetty-resources
040000 tree 40b20f4f91d1d85f5a523944522c8f44525ea5c1 ncbieutils-access-service
040000 tree 0c09a04658435dcd43e4cee149bef5ad170aff77 obo2solr
040000 tree 25065bef9f647ffb0ba1c6c571f1afa317082e78 patient-tools
040000 tree 869a3a8cb5863cd6cff802c05a9d4ad71b21dc05 patient-update-listeners
040000 tree 8cae1b0cd809c4a1133f8a3cd8505c3bff3fb229 phenotype-mapping-service
100644 blob 50deded834803b6e2530b9207c52e274177c4636 pom.xml
040000 tree 7a0bdf97ffa40f3decec211229aae3e96e70e73a solr-access-service
040000 tree 329b8aceee152c46199f13773f58667e45b03f78 solr-configuration
040000 tree 513db7cae3ff0a61a83115445adf7e0a5d7d12fd standalone-distribution
040000 tree b7562687ef03b5a1cf47fed14feffdf4ed8e62d8 standalone-ui
040000 tree 3970df2a29c988309410e44699d7d923fe10b560 wiki-database
040000 tree be08365fa5e665ac428c6b450b9622b496ebf222 wiki-distribution
040000 tree 3bd73e7f49c2828f412f0935bddfa1ce0e965ca0 wiki-ui
Files do not know their names
Files do not know their directory
Files do not know they're files!
Commit
- Object of type 'commit'
- A link to a tree, with associated metadata
- Parent commit(s)
- Author
- Committer (can differ from the author when a commit is retouched, e.g. amended or rebased by someone else)
- Dates (one for the author, one for the committer)
- Commit message
- The diff between two versions (a "classical" SVN commit) is actually done by recursively comparing the trees of the two commit objects
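The recursive tree comparison can be sketched as follows. This is a simplified model, not Git's implementation: trees are plain dictionaries held in a store keyed by ID, and the short IDs (t1, b1...) are hypothetical labels standing in for SHA-1 values.

```python
def diff_trees(store, old_id, new_id, prefix=""):
    """Return (status, path) pairs for the files that differ between two trees."""
    if old_id == new_id:
        # Identical IDs mean identical contents: the whole subtree is skipped
        return []
    old = store[old_id] if old_id else {}
    new = store[new_id] if new_id else {}
    changes = []
    for name in sorted(set(old) | set(new)):
        o, n = old.get(name), new.get(name)   # each is (kind, id) or None
        if o == n:
            continue
        path = prefix + name
        o_tree = o[1] if o and o[0] == "tree" else None
        n_tree = n[1] if n and n[0] == "tree" else None
        if o_tree or n_tree:
            changes += diff_trees(store, o_tree, n_tree, path + "/")
        if o and n and o[0] == "blob" and n[0] == "blob":
            changes.append(("M", path))
        elif o and o[0] == "blob":
            changes.append(("D", path))
        elif n and n[0] == "blob":
            changes.append(("A", path))
    return changes

store = {
    "t1": {"README": ("blob", "b1"), "src": ("tree", "t2")},
    "t2": {"a.py": ("blob", "b2")},
    "t3": {"README": ("blob", "b1"), "src": ("tree", "t4"), "pom.xml": ("blob", "b5")},
    "t4": {"a.py": ("blob", "b3"), "b.py": ("blob", "b4")},
}
print(diff_trees(store, "t1", "t3"))
```

Because unchanged subtrees keep the same ID, the comparison only descends into directories that actually changed.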
Commit examples
tree ee5c210558e3ce9b8fc46568878562e76c5e5c98
parent 30303a2408f7911842e4cc5646f9541097c96fbe
author Sergiu Dumitriu <sergiu@xwiki.org> 1353364745 -0500
committer Sergiu Dumitriu <sergiu@xwiki.org> 1353364745 -0500

Issue #30: Autosave patient sheets
Done.

tree e4bace937c9b270874989d4e1e780ebdc31af64b
parent af7d5a6914693a73081fe0a220bc6cfb543f3ebf
parent 2076271fdd57dee7c66a6238e5d8d53c2ddfe8e5
author Sergiu Dumitriu <sergiu@xwiki.com> 1351284534 -0400
committer Sergiu Dumitriu <sergiu@xwiki.com> 1351284534 -0400

Merge branch 'new-pheno-displayer'
Tag
- Object of type 'tag'
- A link to a commit object, with additional metadata
- Tag name
- Tag author
- Optional strong signature (PGP) to certify the author
- This is an internal, invisible tag object; an actual tag, explained later, is a reference to this tag object
Tag example
object 31024788158cc45879bf15832fa38c4834d434df
type commit
tag xwiki-platform-3.5.1
tagger Sergiu Dumitriu <sergiu@xwiki.com> Sun Apr 29 22:04:28 2012 +0200
Tagging xwiki-platform-3.5.1
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
iQEcBAABAgAGBQJPnZ7MAAoJEDXWcnP59QNwmkQH/3763+OGMYE/VUfw5CP//elr
obmCSTYKWlVPf6GGn3kH6IQPD/iJGJLtoBziCU0uVBYPnOpRaKu/fo2jdi5LXR9/
nK6aosyoIwN7oqE+tDBwahujovXcIu93Dkf5vFtsrbSqg43EnB0dm54wWc26LR57
Q6Wp3s/hU6gdl7cHdoNDfGAbziTE6+e61WjkxuR18xdKwNN88G+aYYJAtEy21qaO
aMjyZMMSCTlj1KFSNQpOm5tu/49dzs8TQdZqbf5ykuOJ1WbJUgVirRzvmyisjzvC
HjjnUMkanH7BEZfI4nn9scZkiDTr+XwwoH6VzN81Ea8gu8LNy/fGBhK30KHBgMc=
=xnTk
-----END PGP SIGNATURE-----
Merge Commits
- Object of type 'commit' with more than one parent
- Reunites two (or more) parent branches
- Combines the two (or more) parent trees into a unified tree
- Whenever two or more branches diverge from a common parent, bringing them together requires a merge
- ...or a rebase of one of the branches
History example
| Author: Marta Girdea <marta.girdea@gmail.com>
|
| Issue #62: Sorting not supported by current implementation of the solr access service component
|
* commit 0a066433ce089f7ec8566f01a756d9aa227158f3
|\ Merge: 000ad9b 8816daf
| | Author: Marta Girdea <marta.girdea@gmail.com>
| |
| | Merge branch 'master' of github.com:marta-/cidb
| |
| * commit 8816daf88703189d6e88a5c1f949c01e2ccb1e9a
| | Author: Marta Girdea <marta.girdea@gmail.com>
| |
| | [misc] Fixed translation
| |
* | commit 369894b2ca95eba2660cb9c0d65ae8361951459b
| | Author: Marta Girdea <marta.girdea@gmail.com>
| |
| | [misc] Some disabled fields still show up in the form
| |
* | commit 0ffc9fd56fb8397b4c48265733eed10d13a4fc92
|/ Author: Marta Girdea <marta.girdea@gmail.com>
|
| Issue #61: Delete button fails on homepage
|
All the objects in a git repository form a Directed Acyclic Graph
Rebasing
- Rebase == rewrite a diverging branch of commits so that they appear in line after the “official” branch
- New blobs, trees and new commit objects are created!
- Useful for minimizing the number of merge commits, when the parallel nature of the rebased commits is not important
- Available as an option when pulling: git pull --rebase
- Also a standalone command for rewriting history
git rebase --interactive <older commit>
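Why a rebase has to create new commit objects can be seen from how a commit ID is computed: the parent ID is part of the hashed content. The sketch below assumes SHA-1 object IDs; the function commit_id is a name invented for this example, and the replacement parent ID is a made-up placeholder.

```python
import hashlib

def commit_id(tree, parents, author, committer, message):
    """Hash a commit the way it is stored: header lines, a blank line, the message."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {author}", f"committer {committer}", "", message]
    payload = ("\n".join(lines) + "\n").encode("utf-8")
    header = f"commit {len(payload)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + payload).hexdigest()

who = "Sergiu Dumitriu <sergiu@xwiki.org> 1353364745 -0500"
tree = "ee5c210558e3ce9b8fc46568878562e76c5e5c98"

original = commit_id(tree, ["30303a2408f7911842e4cc5646f9541097c96fbe"],
                     who, who, "Issue #30: Autosave patient sheets")
# Same tree, author and message, but replayed on top of a different
# (hypothetical) parent, as a rebase would do
rebased = commit_id(tree, ["0000000000000000000000000000000000000001"],
                    who, who, "Issue #30: Autosave patient sheets")

print(original != rebased)  # True: the rebased commit is a new object
```

The original commit is not modified; it simply stops being referenced once the branch points to the rewritten one.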
Rebase versus Merge
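Assuming the following commit structure, where X and Y stand for local commits:
      X---Y            local master
     /
A---B---C---D---E      upstream master
After git pull:
      X---Y-------M    local master
     /           /
A---B---C---D---E      upstream master
After git pull --rebase:
                  X'--Y'   local master
                 /
A---B---C---D---E          upstream master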
A git repository is a database of objects
The full repository
Repository contents
- The object database
- Repository configuration
- config, info/*, description, hooks/*
- References
- refs/* and packed-refs
- The current index
- The Working Tree, or the local checkout
- It is possible to have a bare repository, without the index and the checkout; this is for server repositories
- A stash, a list of saved patches, not part of the history
References (Heads)
- A link pointing to an existing object
- Just a reference name and the SHA-1 of the target object
- References are not objects!
- Files located in .git/refs/*
- Packed references inside .git/packed-refs
- References can be:
- Local and Remote branches
- Tags (tag = reference to a tag object)
- Stash
- HEAD (the branch or commit that is checked out)
Example references
# pack-refs with: peeled
cfce87925d25ad5cb33d5361bcdbbbdca517ded4 refs/heads/master
161cb5ebafa50395a442c3a1f96501f2607ac75b refs/heads/swizzle-upgrade
8177d7b5152171d7b892682a4023bf9de25f5f61 refs/remotes/origin/feature-solr-search
cfce87925d25ad5cb33d5361bcdbbbdca517ded4 refs/remotes/origin/master
2b232b899fb32c3f7d5be1f020c738cadb9596af refs/remotes/origin/stable-4.1.x
518fecb89bae8d1334341263c2f5bc049636959d refs/remotes/origin/stable-4.2.x
ae7de6e2993ffaffecec56415b76c64243fa1fa1 refs/remotes/origin/stable-4.3.x
584f14233004eae260d1bd3b8711f7f8051921b8 refs/stash
09a740e9c6e4183365130d58b0cd7017f7eace35 refs/tags/xwiki-platform-4.3-milestone-2
^7e8a6514d4e46260ad30551d0197db9ecbe581ab
3398ccdc8639aeb7c441b9dbdb7de417be1b7edd refs/tags/xwiki-platform-4.3-rc-1
^62337c8d3da2864ab147aaacdbc998534ebcff6f
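The packed-refs format shown above is simple enough to parse by hand. The following Python sketch (parse_packed_refs is a name chosen for this example) reads it into two dictionaries; a line starting with ^ is the "peeled" value of the preceding annotated tag, that is, the commit the tag object points to.

```python
def parse_packed_refs(text):
    """Return (refs, peeled): ref name -> object ID, and tag ref name -> commit ID."""
    refs, peeled = {}, {}
    last_name = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # header or blank line
        if line.startswith("^"):
            if last_name is not None:
                peeled[last_name] = line[1:]
            continue
        sha, name = line.split(" ", 1)
        refs[name] = sha
        last_name = name
    return refs, peeled

sample = """# pack-refs with: peeled
cfce87925d25ad5cb33d5361bcdbbbdca517ded4 refs/heads/master
09a740e9c6e4183365130d58b0cd7017f7eace35 refs/tags/xwiki-platform-4.3-milestone-2
^7e8a6514d4e46260ad30551d0197db9ecbe581ab
"""
refs, peeled = parse_packed_refs(sample)
print(refs["refs/heads/master"])
print(peeled["refs/tags/xwiki-platform-4.3-milestone-2"])
```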
Branches
- A branch is just a reference to a commit object
- A post-it, a label outside the object database
- It doesn't have to point to a commit with no descendants
- Creating a branch == creating a reference
- Deleting a branch == deleting a reference
- All the commits are still kept in the object database, until an explicit pruning is performed
- Local branches can track a remote branch
- Pulling will also try to update the current local branch, if it tracks a remote branch
- Pushing will try to update the remote tracked branch
HEAD
- A special type of reference: a symbolic reference
- Unlike branches, the HEAD usually points to another reference, and not directly to a commit object
- must be a local branch
- Checking out a branch means updating the HEAD reference to point to the branch reference
- Committing a new revision means that the reference pointed to by HEAD will be updated as well
- Checking out a commit will cause the HEAD to point directly to the commit object, which means that a new commit will not update any branch; this is called a detached HEAD
HEAD examples
$ cat .git/HEAD
ref: refs/heads/master
$ git checkout feature-docnaming
Switched to branch 'feature-docnaming'
$ cat .git/HEAD
ref: refs/heads/feature-docnaming
$ git checkout xwiki-manager-4.2
Note: checking out 'xwiki-manager-4.2'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.
[...]
HEAD is now at 0784c23... [maven-release-plugin] prepare release xwiki-manager-4.2
$ cat .git/HEAD
0784c2334be743ecc64d67f7c7b22f64fcb22044
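The two states of HEAD seen above can be modeled with a small Python sketch. It is a simplified model, assuming HEAD holds either "ref: <name>" or a raw commit ID; the function names resolve_head and advance_head and the ID new-commit-id are hypothetical.

```python
def resolve_head(head_text, refs):
    """Return (commit_id, branch_ref); branch_ref is None for a detached HEAD."""
    head_text = head_text.strip()
    if head_text.startswith("ref: "):
        branch = head_text[len("ref: "):]
        return refs.get(branch), branch
    return head_text, None

def advance_head(head_text, refs, new_commit):
    """Record a new commit: move the current branch, or HEAD itself when detached."""
    _, branch = resolve_head(head_text, refs)
    if branch is not None:
        refs[branch] = new_commit      # HEAD keeps pointing to the branch
        return head_text.strip()
    return new_commit                  # no branch is updated

refs = {"refs/heads/master": "cfce87925d25ad5cb33d5361bcdbbbdca517ded4"}

print(resolve_head("ref: refs/heads/master\n", refs))
print(resolve_head("0784c2334be743ecc64d67f7c7b22f64fcb22044\n", refs))

head = advance_head("ref: refs/heads/master\n", refs, "new-commit-id")
print(head, refs["refs/heads/master"])
```

In the detached case the new commit is reachable only through HEAD, which is why switching away without creating a branch makes it easy to lose track of it.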
Database, Index, Working Tree
- The object database contains all the history of the commits, but doesn't have an explicit HEAD
- It's just a DAG of objects, all equally important
- References identify certain nodes as the head of a branch, or as a tagged state
- The Working Tree is a local checkout of a given tree from the database, on which the user can work
- The Index is a buffer between the DB and the workspace, structured as a git tree ready for commit, and a reference against which to compare the working tree
- Also called staging tree, or commit cache
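The relationship between the three areas can be illustrated with a toy model in Python, where each area is just a mapping from path to content ID. The function name summarize_status and the sample IDs are assumptions for this sketch; real Git compares trees and file metadata, not plain dictionaries.

```python
def summarize_status(head, index, worktree):
    """Compare HEAD tree vs index (staged) and index vs working tree (unstaged)."""
    staged = sorted(p for p in set(head) | set(index)
                    if head.get(p) != index.get(p))
    unstaged = sorted(p for p in index
                      if index[p] != worktree.get(p))
    untracked = sorted(p for p in worktree if p not in index)
    return staged, unstaged, untracked

head = {"a.txt": "1", "b.txt": "2"}
index = {"a.txt": "1", "b.txt": "3", "c.txt": "4"}           # b modified and c added, both staged
worktree = {"a.txt": "9", "b.txt": "3", "c.txt": "4", "d.txt": "5"}

staged, unstaged, untracked = summarize_status(head, index, worktree)
print("staged:", staged)        # what a commit would record
print("unstaged:", unstaged)    # edits not yet added to the index
print("untracked:", untracked)  # files the index does not know about
```

In this model, the staged list corresponds to what git diff --cached reports, and the unstaged list to what a plain git diff reports.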
The stash
- A separate list of changes to be saved for later
- Work in progress not ready for a commit, but useful enough to be kept
- Stashing the current state saves both the index and the working tree, hard-resetting to the HEAD
- Any number of stash entries can be saved
- A stashed change can be popped at a later time
- Onto any index with a clean working tree
Code examples
Working with the index
# Check out a revision:
# - update the HEAD reference
# - copy the tree from the DB into the index
# - extract its files (blobs) into the workspace
$ git checkout [<SHA-1> | <ref>]
# Add changed files from the workspace to the staging tree
$ git add [<files> | --all | --interactive]
# Remove blobs from the index and the workspace
$ git rm <files>
# Move blobs to another subtree, index+workspace
# Also for renaming files/directories
$ git mv <original location> <new location>
Git can automatically detect a file move/rename, even if not explicitly specified with a mv operation
Working with the index: checking differences
# See changes between the working tree and the index
$ git diff [<files>]
# See changes between the index and the DB (HEAD)
$ git diff --cached
# See changes from another version, comparing the
# tree from the DB with the working tree
$ git diff <ref>
# See changes between any two trees in the DB
$ git diff <ref>..<ref>
Working with the index: dropping changes
# Reset the index from the DB (HEAD by default), unstaging changes
# Doesn't touch the working tree
$ git reset [<ref>] [<files>]
# Reset the working tree from the index, re-copying files into the working tree
# Doesn't touch the index
$ git checkout -- <files>
# Reset both the index and the working tree:
# - copy blobs from the database to the Index
# - extract blobs into workspace
# When currently on a branch, looks like "forgetting"/discarding commits
# by going back to another revision and moving the branch head to it
$ git reset --hard <ref>
# Still, untracked files are not discarded by any of these commands;
# to get a really clean state with no extra files, use the following
# (it also deletes ignored files and untracked directories):
$ git clean -dxf
Working with the index: commits
# Show the state of the working tree and the index
$ git status [-s -u]
# Commit the index:
# - copy the staging tree into the DB
# - create a new commit object in the DB
# - update the HEAD or the current branch referenced by it
# It's the index that gets committed, not the working tree;
# you must 'git add' the changes you want to commit first
$ git commit
# Re-unite with another branch by:
# - merging the staging tree with another tree
# - creating a commit with two parents
$ git merge <ref>
# Clone a "commit" into the current branch
$ git cherry-pick [-x] <ref>
Working with the stash
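# Stash the current changes
$ git stash
# Stash them with a description (newer Git versions prefer 'git stash push -m')
$ git stash save "a nice name for this changeset"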
# Show the contents of the stash
$ git stash list
# Show a particular stash entry
$ git stash show [-p] [stash@{N}]
# Drop a stashed entry, permanently forgetting it
$ git stash drop [stash@{N}]
# Apply a stashed changeset onto the working tree, keeping it in the stash
$ git stash apply [stash@{N}]
# Apply a changeset and drop the entry from the stash
$ git stash pop [stash@{N}]
Working with branches
# Checkout a specific commit or tag
# This creates a detached HEAD, not on a branch
$ git checkout <ref>
# Checkout an existing branch
$ git checkout <branch name>
# Create a new branch from the current HEAD; does not switch to the new branch
$ git branch <name>
# Checkout a commit and create a branch from it
$ git checkout -b <branch name> <ref>
# List existing local branches
$ git branch
# Delete a branch; only deletes the reference, commits will remain in the DB
# (refuses if the branch is not fully merged; -D forces the deletion)
$ git branch -d <name>
Working with tags
# Checkout a tag (creates a detached HEAD)
$ git checkout <tag name>
# Tag the current HEAD (a lightweight tag: just a reference, no tag object)
$ git tag <name>
# Create a signed tag; requires setting up GPG
$ git tag -s <name>
# Show existing tag names
$ git tag
# Delete a tag
$ git tag -d <name>
Working with the object database: reading objects
# Show an object in human-readable form
$ git show <SHA-1>
# Pretty-print the contents of any object (blob, tree, commit or tag)
$ git cat-file -p <SHA-1>
Working with the object database: browsing and searching for commits
# Show the commit history
$ git log [--graph]
# Show commits with a given message
$ git log --grep=illumina
# Show commits that added or removed a given text
$ git log '-Stext to search for'
# Show commits from a certain author
$ git log --author=Sergiu
# Show commits (with diff) from a certain date
$ git log -p '--since=three days ago'
# Show the log on a specific file, following renames
$ git log --follow -- <file>
# Show the repository in a nice GUI
# Some also allow to [un]stage and commit
$ gitx | gitk | gitg | gitview | git gui
Working with the object database: file history
# Show who last changed each line of a file
$ git blame <file>
# Blame the file as it was at a given revision
$ git blame <ref> <file>
# Show the contents of a file at a given revision
$ git show <ref>:<file>
prune and fsck
- References point to certain nodes in the DAG
- Most objects are transitively reachable from one of the references
- Unreachable objects are "invisible" if their ID is not known
- git prune can remove these objects
- git reflog can list recent revisions, reachable or unreachable
- git fsck can list unreachable objects
- and restore unreferenced tag objects
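Reachability is a plain graph traversal starting from the references. The sketch below models the object database as a dictionary mapping each object ID to the IDs it links to; the function name reachable_objects and the short IDs are hypothetical, and reflog entries, which also keep objects alive in real Git, are left out.

```python
def reachable_objects(objects, roots):
    """Collect every object reachable from the given reference targets."""
    seen = set()
    stack = list(roots)
    while stack:
        oid = stack.pop()
        if oid in seen or oid not in objects:
            continue
        seen.add(oid)
        stack.extend(objects[oid])   # follow links to parents, trees, blobs
    return seen

objects = {
    "c2": ["c1", "t2"],   # commit -> parent commit and tree
    "c1": ["t1"],
    "c9": ["c1", "t9"],   # commit left behind after a branch was deleted
    "t1": ["b1"],
    "t2": ["b1", "b2"],
    "t9": ["b9"],
    "b1": [], "b2": [], "b9": [],
}
refs = {"refs/heads/master": "c2"}

live = reachable_objects(objects, refs.values())
print(sorted(set(objects) - live))   # candidates for pruning
```

Here c9, t9 and b9 are unreachable: they remain in the database, but no reference leads to them.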
Collaboration: remotes
A remote repository is a source of new objects
Remotes
- Clones of the same git repository located in another place
- Another directory on the same machine
- A repository on another machine accessible via ssh
- An online repository accessible via HTTP or the special git communication protocol
- Or a foreign repository of another type (SVN, CVS...)
- == Related object databases located in other places
- A repository can have as many remotes as it wants
- origin is considered the main remote by convention
clone
- Creates a new local clone of a repository
git clone git@github.com:compbio-UofT/shrimp.git (for committers)
git clone git://github.com/compbio-UofT/shrimp.git (read-only)
- Creates a new git repository
- Clones some or all of the references from the origin repository
- Fetches the objects reachable from the cloned references from the remote repository into the local database
- Configures the remote as the origin repository
- Sets the local master branch to the commit at the remote's HEAD (refs/remotes/origin/HEAD)
- Checks out the master branch into the working tree
fetch and pull
- Fetches new values for the remote references
- Also fetches new remote references (new remote branches, new tags)
- unless only certain refs are specified
- Fetches new objects reachable from the updated references
- Does not update the local branches
- git pull also updates the currently checked out branch, if tracking a remote branch, to point to the new remote branch head
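- In the simple (fast-forward) case, the local branch just moves to the head of the remote-tracking branch (e.g. origin/master)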
- If the local and remote branch diverged, a merge commit will be created
- unless --rebase is specified
Working with fetch and pull
# Fetch from the default remote
$ git fetch
# Fetch from <remote>
$ git fetch <remote>
# Also fetch all new tags, even if not on followed branches
$ git fetch --tags
# Remove references to remote branches that no longer exist
$ git fetch --prune
# Only fetch the specified remote branch
$ git fetch origin stable-4.3.x
# Rebase local changes instead of merging
$ git pull --rebase
push
- Updates remote refs using local refs, while sending objects necessary to complete the given refs
- By default (since Git 2.0), pushes only the current branch to the remote branch it tracks; older versions tried to update all remote branches to point to the new heads of the local branches with the same name
- Optionally pushes tags
- Can also be used to delete remote references
- Note that push fails if there are new commits on the remote; pull first to resolve the conflict
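The reason a push is rejected can be expressed as an ancestry check: the update is accepted without forcing only when the remote branch head is an ancestor of the commit being pushed (a fast-forward). The sketch below is a simplified model; is_ancestor and can_fast_forward are names chosen for this example, and the commit labels follow the A---B---C style used earlier.

```python
def is_ancestor(parents, ancestor, descendant):
    """True if 'ancestor' can be reached from 'descendant' by following parents."""
    stack, seen = [descendant], set()
    while stack:
        commit = stack.pop()
        if commit == ancestor:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return False

def can_fast_forward(parents, remote_head, local_head):
    """A push is a fast-forward when the remote head is part of the local history."""
    return is_ancestor(parents, remote_head, local_head)

# A---B---C (local)   and   A---B---D (someone else pushed D)
parents = {"B": ["A"], "C": ["B"], "D": ["B"]}

print(can_fast_forward(parents, "B", "C"))   # remote still at B: push accepted
print(can_fast_forward(parents, "D", "C"))   # remote moved to D: push rejected

# After pulling, a merge commit M has both C and D as parents
parents["M"] = ["C", "D"]
print(can_fast_forward(parents, "D", "M"))   # push accepted again
```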
Working with push
# Push to the default remote
$ git push
# Push to <remote>
$ git push <remote>
# Also push all tags
$ git push --tags
# Push only a branch
$ git push origin <tracked remote branch name>
# Push a local branch to a new remote branch
$ git push origin <local branch name>:<remote branch name>
# Delete a remote branch
$ git push origin :<remote branch name>
Resolving conflicts
Conflicts
- When merging different branches of the DAG, conflicts may occur
- Also when applying stashed changes and cherry-picking
- Git is pretty good at automatically solving conflicts, but overlapping changes can't be automatically merged
- When a conflict occurs, git puts the working tree in a conflict state and waits for the user to solve the conflict
Resolving a conflict
- git status shows the current state: which files are dirty, which are prepared for commit
- The conflicting files have both changes inside them, delimited by the markers <<<<<<<, ======= and >>>>>>>
- After choosing one of the two variants or combining them, use git add to mark a file as resolved and prepare it for commit
- When all the files are resolved, git commit
- Usually git shows a command line to execute
Resolving a rebase conflict
- When the conflict appears during a rebase, git stores this information and lets you continue the rebase once the conflict is resolved
- As before, resolve all conflicts and add files to the index, then:
- git rebase --continue
- git rebase --abort stops the process
- git rebase --skip skips the current commit and continues with the next commit to rebase
When something is wrong, analyze the DAG and try to figure out where you are, and where you want to go
Debugging tips
- Look at the graphical log and try to see what happens
- git log --graph --decorate
- git reflog shows recent states, even if no longer reachable from references
- Compare local references with the remotes
- If all else fails, hard reset to a previous working state and continue from there, even if some changes will be lost
- You can stash them before resetting to preserve some data
- Don't forget to check on what branch you are before committing
- When panicking, ask for help from someone else