Carnegie Mellon
Computer Science Department
15-410 Git Quickstart
This document is a work in progress. It may not be complete.
To the best of our knowledge, the information that is here is correct. If
you have issues following the instructions in this document, or you have
suggestions to make this document clearer, please send e-mail to .
To make development of your projects easier, we've written
this quick-start guide to a modern and popular source-control system: the Git version control system. This document
will serve first as a user's reference, second as an explanation
of concepts (although you need not understand all of the concepts to use
Git), and third as evangelism for Git and other distributed version
control systems (although you need not drink my kool-aid to use Git). In
theory, each part should stand alone; you need not know of the concepts to
use the reference, and you need not know of the reference to be evangelized
to. In practice, you may find it useful to read all three parts to
get a deeper understanding of what Git is doing while you aren't looking.
Should you use Git, or something simpler?
On the one hand, other things might be simpler and faster to learn right now.
On the other hand, time spent
learning Git will pay off if you join a project that already uses Git.
Because there are so many revision-control systems currently in use,
there is no guarantee you won't have to learn something else,
but Git is among the more popular systems, so it's a plausible investment.
Quick-start
Obtaining/installing Git
On Andrew UNIX, Git is available for you in /usr/bin.
On non-Andrew Linux systems, Git is typically installed through the distribution's package manager. For Ubuntu, Git is installed by the command sudo apt-get install git-core or sudo apt-get install git depending on release; for Fedora, Git is installed by the command su -c "yum install git". For other distributions, refer to system documentation.
On other systems, or if no Git package is available, the latest version of Git can be obtained
from the official download site.
It is buildable with the traditional ./configure && make
&& sudo make install procedure.
Telling Git about you
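Git stamps every commit with an author name and e-mail address, so it helps to set these once before your first commit. A minimal sketch, assuming you want the same identity in every repository on this machine; the name and address below are placeholders, not real accounts:

```shell
# Placeholder identity: substitute your own name and e-mail address.
git config --global user.name "Pat Student"
git config --global user.email "pstudent@example.edu"

# Print the stored values to confirm what Git will record.
git config --global --get user.name
git config --global --get user.email
```

Dropping --global (inside a repository) sets the identity for that repository only.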
Getting your project set up
Traditionally, git's default branch was called "master".
Some people prefer to use a different name for the main branch,
e.g., "main" or "mainline".
If you prefer the traditional name, "master",
skip the commands below marked with "## main".
If you would prefer a name other than "master" or "main",
change the two commands marked "## main" to use your preferred
name instead of "main".
############################################################
# ONE PARTNER executes these commands exactly once
############################################################
$ cd ~/410/usr/$USER/mygroup/REPOSITORY
$ git init --bare p2
$ cd p2 && git symbolic-ref HEAD refs/heads/main ## main
############################################################
# BOTH YOU AND YOUR PARTNER do these
############################################################
$ cd ~/410/usr/$USER/scratch
$ git clone file://$HOME/410/usr/$USER/mygroup/REPOSITORY/p2
$ cd p2
############################################################
# ONE OF YOU adds the .gitignore via your personal repository
############################################################
$ cd ~/410/usr/$USER/scratch/p2
$ git checkout -b main ## main
$ cp ~/410/pub/gitignore .gitignore
$ git add .gitignore
$ git commit -m "Initial commit"
$ git push origin `git symbolic-ref --short HEAD` -u
############################################################
# BOTH YOU AND YOUR PARTNER do these
############################################################
$ git pull
$ git checkout main ## main
-
Creating the repository - You will create your repository
in your 410-provided REPOSITORY directory. It should be named something
sensible, like "p2" for p2. The command git init --bare p2
creates the directory, and makes it ready as a remote. You and your partner
will push and pull from this repository.
-
Cloning the repository - Now that the remote is set up,
you can clone it into a local work directory for yourself via the
git clone command. You can ignore the warning about cloning
an empty repository because you haven't had any commits yet.
The special "file://" syntax (instead of just specifying the pathname)
tells git to actually copy all the files instead of wasting time
trying to make hard links between AFS volumes (which doesn't work).
Actually, unless you are working with large projects, doing a full
copy gives you added protection against disk errors or accidental
repository corruption, so this is probably a good habit for you to
pick up even when AFS isn't part of the picture.
-
Making the first commit - After the first commit, the
warning about the repository being empty will go away. This is done by
adding a file, such as the .gitignore file, and then committing and
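pushing the file to the remote. Normally, you can type just git
push, but the first time you push you need to specify which branch you
will be pushing to on the remote via git push origin branchname -u, where branchname is your main branch (the command above uses git symbolic-ref --short HEAD to supply the current branch's name).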
-
Cloning when not on AFS - If you or your partner is not
working on AFS, then the clone command won't work, because it will be
trying to clone a path which doesn't exist. You can, however, change the URL
to look like you@unix.andrew.cmu.edu:~you/410/... and
then git will push/pull files over SSH.
It is highly recommended that you figure out the real path where your
repository is stored (without any symbolic links) and clone from that path
instead.
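If you would like to rehearse the setup before touching your real REPOSITORY directory, the same steps can be run in a throwaway location. This is only a sketch: the temporary directory, the scratch-p2 clone name, the placeholder identity, and the one-line .gitignore are assumptions standing in for your AFS paths and for ~/410/pub/gitignore.

```shell
# Rehearsal of the setup steps in a throwaway directory (not your AFS space).
work=$(mktemp -d)
cd "$work"

# ONE PARTNER: create the bare repository and name its main branch.
git init -q --bare p2
git -C p2 symbolic-ref HEAD refs/heads/main

# EACH PARTNER: clone it (the empty-repository warning is expected).
git clone -q "file://$work/p2" scratch-p2 2>/dev/null
cd scratch-p2
git config user.name "Pat Student"             # placeholder identity
git config user.email "pstudent@example.edu"

# ONE PARTNER: make and publish the first commit.
git checkout -q -b main
echo '*.o' > .gitignore                        # stand-in for ~/410/pub/gitignore
git add .gitignore
git commit -q -m "Initial commit"
git push -q -u origin "$(git symbolic-ref --short HEAD)"

# The bare repository now holds the commit.
git -C "$work/p2" log --oneline main
```

Because of the -u on the first push, later pushes and pulls from this clone can be plain git push and git pull.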
Working with Git on a day-to-day basis
These operations will become your new best friends. You will use them many
times per day; it will pay off to become familiar with their operation and
their quirks.
To record changes to every file that Git is tracking, run
git commit -a.
This is the easiest way to commit a bunch of changes,
though arguably it is the method you should use the least.
Git will bring up your editor
to prompt you
for a commit message
(if it brings up the
wrong editor, set your EDITOR environment variable by adding a line
like "export EDITOR=joe" to your ~/.bashrc).
To make the best use of some of Git's other features,
you should endeavor to make changes in an order that will make sure that
your project still compiles and runs when you commit. The commit operation
will only record
the change in your local repository, and not yet make your changes visible
to your partner; see the section on pushing and pulling later.
This command will not add new files to Git! If you add a new file
to your repository and use git commit -a without first running
git add, your partner will be very sad when you go to sleep, they
wake up to work, and the file is not there (and they will probably call you
and wake you up).
If you wish to add a short message to your commit on the command line,
you can do so with the -m "message" option.
Before you begin committing, you might wish to read our guidance below about
what makes a good commit!
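The warning about new files is easy to see for yourself. The sketch below uses a throwaway repository; the file names and the placeholder identity are made up for illustration.

```shell
# Throwaway repository with one tracked file.
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.name "Pat Student"             # placeholder identity
git config user.email "pstudent@example.edu"
echo "int x;" > tracked.c
git add tracked.c
git commit -q -m "Add tracked.c"

echo "int y;" >> tracked.c      # modify a file Git is tracking
echo "int z;" > new.c           # create a file Git has never heard of

# commit -a records the change to tracked.c only.
git commit -q -a -m "Declare y in tracked.c"

# new.c is still reported as untracked ("?? new.c").
git status --short
```

Running git add new.c before the commit would have included the new file.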
To record changes to selected entire files that Git is
tracking, run git commit file1 file2
..., where file1... are files that you'd like to record
changes to, and file2... are optional.
-
To record changes to a few parts of some of your changed
files (i.e., patch chunks), you have a few options.
- git commit --interactive. This
will bring up an interactive prompt that will allow you to choose what you'd
like to commit; to get started, try typing "status" at
the "What now?>" prompt to see what's changed, and
then "patch" to interactively make choices. (Resist the
urge to shout "whatnow?!" whenever you see the
prompt.)
- git add -i. This is like commit --interactive, but
it doesn't create any commits. This command just tells git that on the
next commit, include these changes.
- git add -p. This command allows you to add "patches",
meaning small changes made to one or more files.
This makes sense if you changed part of f1.c and part of f2.c to
fix one bug, then changed different parts of f1.c and f2.c to fix
a different bug, and now want to make one commit that fixes the first
bug and a different commit that fixes the second bug (this is a good
thing to want to do).
If you run git add -p once and select the changes to f1.c
and f2.c that fix the first bug, and then git commit,
and then do both steps again, you will end up with two single-purpose
commits.
When you run git add -p you will
be asked, for each chunk of diff in every file, whether it
should be added or not. Some useful responses are: y or n - yes or no
to adding the current chunk; and s - split the current chunk into
smaller chunks that you can accept or reject individually.
To add a new file to Git, run git add file1 file2
..., where file1... are files that you'd like to add, and
file2... are optional. Then, to record the newly added file, run
git commit -m "message". If you do this on a file
that already exists, Git will record the changed state at the time you
ran the add.
To delete a file from Git, run git rm file1 file2
..., where file1... are files that you'd like to delete, and
file2... are optional. Then, to record the newly deleted file, run
git commit -m "message". The delete is not
permanent; you can still check out older revisions with that file
intact.
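To see that a deleted file really does survive in history, here is a small sketch in a throwaway repository (scratch.txt and the identity are placeholders):

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.name "Pat Student"             # placeholder identity
git config user.email "pstudent@example.edu"

echo "old notes" > scratch.txt
git add scratch.txt
git commit -q -m "Add scratch.txt"

# Delete the file and record the deletion.
git rm -q scratch.txt
git commit -q -m "Remove scratch.txt"

ls                               # scratch.txt is gone from the working tree
git show HEAD~1:scratch.txt      # but the previous commit still has its contents
```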
To rename or move a file in Git, run git mv oldname
newname, where oldname and newname are the
obvious. (git mv also has similar semantics to the UNIX command
mv; this is just the most common usage.) To record the moved files,
run git commit.
To see what changes you've made to files that Git is tracking,
run git status to get a list of changed files (and at the same
time, a list of new files and a list of deleted files). To look at the
specific changes, run git diff. Git will produce
diff-formatted output about all of the current unrecorded changes
in your repository.
To make your changes visible to your partner, run git
push. This will "push" your changes into the bare
repository. If your repository is not already up to date with the bare
repository, then git push will fail with a message like remote
is not a strict subset of local, or comments about things not being
"fast-forward"s. Resist the urge to use the -f
option! You'll be sad if you do that. See the section on getting your
partner's changes.
To get your partner's changes, run git pull. This
will "pull" changes from the bare repository into your local
repository. If you and your partner have changed the same sections of the
same file in non-trivial ways that Git could not resolve, then the
pull will leave your repository in a "conflicted" state
with a message beginning with CONFLICT:. Edit the conflicted files
to resolve the conflicts, make sure your project builds, and then run
something like git commit -m "Fix merge conflict" to
record the fix. Your fix will not be visible to your partner until you
push.
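The push-then-pull cycle can be rehearsed with two clones of one bare repository on a single machine. This is a sketch, not your real setup: "alice" and "bob" are made-up partners, the paths are temporary, and the branch is named explicitly on every push and pull so that no upstream configuration is assumed. The --no-rebase option simply spells out that the pull should merge, which recent Git versions ask you to state when histories have diverged.

```shell
top=$(mktemp -d) && cd "$top"
git init -q --bare shared.git
git -C shared.git symbolic-ref HEAD refs/heads/main

# Two clones play the two partners ("alice" and "bob" are made-up names).
for who in alice bob; do
    git clone -q "file://$top/shared.git" "$who" 2>/dev/null
    git -C "$who" symbolic-ref HEAD refs/heads/main
    git -C "$who" config user.name "$who"
    git -C "$who" config user.email "$who@example.edu"
done

# alice publishes the first commit; bob picks it up.
echo "kernel" > alice/kern.c
git -C alice add kern.c
git -C alice commit -q -m "Add kern.c"
git -C alice push -q origin main
git -C bob pull -q origin main

# Both partners commit; alice pushes first.
echo "alice's change" >> alice/kern.c
git -C alice commit -q -a -m "Extend kern.c"
git -C alice push -q origin main
echo "user program" > bob/user.c
git -C bob add user.c
git -C bob commit -q -m "Add user.c"

# bob's push is refused because his history lacks alice's commit...
if ! git -C bob push -q origin main 2>/dev/null; then
    echo "push rejected; pulling first"
fi
# ...so he pulls (creating a merge commit) and then pushes again.
git -C bob pull -q --no-rebase --no-edit origin main
git -C bob push -q origin main
git -C bob log --oneline --graph
```

Here the two changes touch different files, so the merge needs no manual conflict resolution.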
It behooves you to try to make "good" commits, both for yourself
and for your partner. To see what a
"good" commit looks like, we should probably first look at what a
"bad" commit is. Here's a transcript of one of your TAs making
quite a few mistakes, all in one short command:
joshua@escape:~/school/15-410-ta/p3-s09p4$ git commit -a -m "Whee!"
[master 6946f54] Whee!
2 files changed, 2 insertions(+), 2 deletions(-)
joshua@escape:~/school/15-410-ta/p3-s09p4$
What's so wrong about this? Well, the most obvious is the message; the
message "Whee!" conveys absolutely no information to your
TA's partner (well, maybe it tells my partner that I was excited about this,
but not much more than that). But there are more substantial issues here.
Let's go back and do this again and see what your TA missed.
joshua@escape:~/school/15-410-ta/p3-s09p4$ git status
# On branch master
# Changed but not updated:
# (use "git add <file>..." to update what will be committed)
# (use "git checkout -- <file>..." to discard changes in working
# directory)
#
# modified: kern/mutex.c
# modified: user/progs/vm_explode.c
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# user/progs/mytest.c
no changes added to commit (use "git add" and/or "git commit -a")
joshua@escape:~/school/15-410-ta/p3-s09p4$ git diff
diff --git a/kern/mutex.c b/kern/mutex.c
index 4a13af1..55f4569 100644
--- a/kern/mutex.c
+++ b/kern/mutex.c
@@ -39,7 +39,7 @@ void mutex_init(mutex_t *mp)
void mutex_lock(mutex_t *mp)
{
- make_mutexes_work();
+ make_mutexes_not_work(); // XXX changed briefly to test my demo program
mutex_level++;
diff --git a/user/progs/vm_explode.c b/user/progs/vm_explode.c
index ee8c2b9..94c8c94 100644
--- a/user/progs/vm_explode.c
+++ b/user/progs/vm_explode.c
@@ -72,7 +72,7 @@ int main() {
vanish();
}
}
- printf("parent: all balls accounted for!\n");
+ printf("parent: all children accounted for!\n");
set_status(0);
vanish();
}
joshua@escape:~/school/15-410-ta/p3-s09p4$ git add user/progs/vm_explode.c user/progs/mytest.c
joshua@escape:~/school/15-410-ta/p3-s09p4$ git commit
in your TA's editor...
Modify vm_explode to more accurately describe what it's doing instead of punning
on the P2 test 'juggle', and create a spinoff, mytest.
mytest makes sure that the frubulator is frobbed in the mutexes; you can
make it fail by commenting out the call to make_mutexes_work() in kern/mutex.c.
But make sure not to commit that change! Otherwise we'll both be sad.
# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
#
# Committer: Joshua Wise <joshua@escape.joshuawise.com>
#
# On branch master
# Changes to be committed:
# (use "git reset HEAD <file>..." to unstage)
#
# new file: user/progs/mytest.c
# modified: user/progs/vm_explode.c
#
# Changed but not updated:
# (use "git add <file>..." to update what will be committed)
# (use "git checkout -- <file>..." to discard changes in working directory)
#
# modified: kern/mutex.c
and back at the shell...
".git/COMMIT_EDITMSG" 30L, 1112C written
[master 6ef906e] Modify vm_explode to more accurately describe what it's doing instead
of punning on the P2 test 'juggle', and create a spinoff, mytest.
1 files changed, 1 insertions(+), 1 deletions(-)
create mode 100644 user/progs/mytest.c
joshua@escape:~/school/15-410-ta/p3-s09p4$
Much better! This time, your TA checked to see what he was changing
before he committed it, added only the files he wanted to commit, and then
wrote a descriptive commit message so that his partner could test this for
himself. Importantly, your TA did not commit the change that would
break his kernel's mutexes while doing this, and hence did not get strangled
in his sleep by his partner.
Strive to emulate this workflow. You may find that you don't need quite
such verbose messages, and git commit -m will work fine for you.
That's OK; but try to make your commit messages at least somewhat
useful.
Time travel with Git
In an ideal world, we would make no errors while writing code. Sadly,
sometimes we wish to travel back to the past and determine what broke. It
is generally considered inadvisable to modify history; if you do, you run
the risk of killing one or more of your parents, and being in a paradoxical
state of existence. If you wish to modify history, you might wish to create
an alternate universe; in Git, we call these alternate universes
"branches". Luckily, branches aren't needed to just go back and
look. You may use these commands somewhat less frequently, but they are no
less important.
To get a graphical view of your repository's history, run
gitk. Marvel at all the pretty colors! Each time in the past that
you or your partner recorded changes using the commit command will
be represented by a line in the pane at the top. Select a line, and more
details about the change will show up in the bottom panes, including the
change's SHA1 ID.
To get a non-graphical list of all changes in a file's
history, run git log file, where file is the file
that you wish to get a change list for. Each change will start with a line
starting with commit, and ending in the change's SHA1 ID.
Some short information about the change will be given to you, including the
message that you specified with the -m option to git
commit.
To go back and view the repository's state as it was after a
change in the past, run git checkout sha1, where
sha1 consists of enough characters from the change's SHA1 ID to
disambiguate it from all other changes. For instance, if the change you're
looking for has ID 311b98a0a1c40ad176103ee8026131fcd0fcc919, then
you may only need to run git checkout 311b98 to get the change
you're looking for.
Do not make any changes when you are viewing the past like this. If you
wish to make changes from the past, use a branch. If you have outstanding
changes that you have not run git commit on when you attempt to
switch to viewing an old version of the repository, Git will give you an
error message like error: You have local changes..., and will
refuse to change what version you are viewing.
To view the most recent change in the repository (i.e.,
recover from viewing a change in the past), run git checkout
master (or git checkout main, if that is what you named your main
branch). Any changes that you may have committed while viewing the past
will be lost into the abyss (they are not irrecoverable, but recovering
them is beyond the scope of this document).
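A round trip into the past and back might look like the following sketch. It assumes a throwaway repository whose main branch is called main; notes.txt and the identity are placeholders.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main          # call the first branch "main"
git config user.name "Pat Student"             # placeholder identity
git config user.email "pstudent@example.edu"

echo "version 1" > notes.txt
git add notes.txt
git commit -q -m "First version"
echo "version 2" > notes.txt
git commit -q -a -m "Second version"

# An abbreviated SHA1 ID is enough to name the older commit.
old=$(git rev-parse --short HEAD~1)

git checkout -q "$old"           # view the repository as of the first commit
past=$(cat notes.txt)
git checkout -q main             # return to the most recent commit
present=$(cat notes.txt)
echo "then: $past / now: $present"
```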
To revert one or more files to the state in which they were
after you last committed changes or ran a checkout, run
git checkout file1 file2 ..., where file1... are
files that you'd like to revert changes to, and file2... are
optional.
To temporarily save what you're working on to do something
else, you can use the git stash command. When you run git
stash, then anything that you haven't committed gets saved as a
diff onto a "stash stack", and the repository takes on the
appearance of the last commit.
To restore something that you have stashed, run
git stash pop. This will apply the diff on the top of the stack to
whatever the current state of the repository may be. For more info on
git stash, run git stash --help; you may find it to be an
extremely useful tool!
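A short sketch of a stash round trip, again in a throwaway repository with placeholder file names and identity:

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.name "Pat Student"             # placeholder identity
git config user.email "pstudent@example.edu"

echo "stable line" > notes.txt
git add notes.txt
git commit -q -m "Add notes.txt"

echo "half-finished idea" >> notes.txt   # uncommitted work in progress

git stash -q                     # set the work aside
during=$(cat notes.txt)          # the file matches the last commit again
git stash pop -q                 # bring the work back
after=$(tail -n 1 notes.txt)
echo "while stashed: $during / after pop: $after"
```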
Splitting reality with Git
At some point, you may wish that you could make a change on a previous
version of your tree without affecting the current version (yet); or you
may wish to split reality in half, and work on an experimental side-project
without disrupting main development of your project. Branches in Git are
designed to allow you to do just those things; split away from the main view
of reality from some point in time (be that time now or the past).
To create a new branch from some point in the past, run
git checkout -b branchname sha1, where
branchname is what you want your new branch to be called (pick
something descriptive and without spaces; Experimental_COW might be
a good name if you're experimenting with copy-on-write), and sha1 is
the SHA1 ID of the change that you wish to branch from (see the section
go back and view above). Git will change you over to that branch,
and you can begin recording changes on it immediately.
To change to a different branch, run git checkout
branchname. The branch name that you started on was
master; so to return to the version that's in the bare repository,
run git checkout master. (The astute reader will note that this is
the same command as to recover from being in the past.)
To merge from one branch to another branch, first change to
the branch that you want to merge to, then run git pull .
branchname, where branchname is the branch that you want
to merge from. This can be done as many times as you like; there are
no negative consequences from merging repeatedly. (Git considers your other
branch as a 'virtual partner' to pull from.) To publish the pulled and
merged changes from your branch to your partner, you can just run a git
push when you are on the master branch, as normal.
To create a branch based on the current state of your
repository, run git checkout -b branchname. The
semantics are similar to creating a branch from some point in the
past.
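The branch commands above can be strung together as in this sketch. It assumes a throwaway repository whose main branch is called main; vm.c and the identity are placeholders, and Experimental_COW is the example branch name from the text.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main          # call the first branch "main"
git config user.name "Pat Student"             # placeholder identity
git config user.email "pstudent@example.edu"

echo "base" > vm.c
git add vm.c
git commit -q -m "Add vm.c"

# Create a branch from the current state and record a change on it.
git checkout -q -b Experimental_COW
echo "copy-on-write" >> vm.c
git commit -q -a -m "Sketch copy-on-write"

# Switch back and merge the experiment into the main branch.
git checkout -q main
git pull -q . Experimental_COW
git log --oneline
```

Since main did not change while the experiment was under way, this particular merge is a simple fast-forward; git merge Experimental_COW would have the same effect.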
To create a tag, run git tag tagname. By
convention, tag names are capitalized, but this is not enforced by Git. A
tag name can be used anywhere a SHA1 ID would otherwise be used; to go back
to the point at which you first got your shell running in Project 3, then
you might run git checkout SHELL_RUNNING. The usual rules apply if
you don't create a branch there; namely, recording changes would be a bad
idea unless you proceed to create a branch.
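As a sketch of tagging a milestone and returning to it later (throwaway repository, main branch assumed to be called main, placeholder file and identity):

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main          # call the first branch "main"
git config user.name "Pat Student"             # placeholder identity
git config user.email "pstudent@example.edu"

echo "shell prompt works" > status.txt
git add status.txt
git commit -q -m "Get the shell running"
git tag SHELL_RUNNING            # mark this commit

echo "shell broke again" > status.txt
git commit -q -a -m "Rework the shell"

# The tag name works wherever a SHA1 ID would.
git checkout -q SHELL_RUNNING
tagged=$(cat status.txt)
git checkout -q main
echo "at the tag: $tagged"
```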
To rewrite history to clean it up, stop! You might not
want to do this. You might have heard someone talk about using git
rebase to "clean up" history of branches, and you might have
heard someone say that "all git gurus know about
rebase!". rebase has its uses, to be sure, but it's
worth doing a lot of research before using it. To that end, here's the documentation
from the Git book; here's
an article arguing that rebase should never be used; here's
a more balanced article.
The choice is yours; rebase is a very powerful tool, but it is also
capable of making a pretty substantial mess.
Don't lose your data!
Git is meant to track versions of files, but that doesn't mean that you
can't lose data when working with git. There are multiple kinds of data
that might get lost if something goes wrong with git. For more information
about what data you might lose, see
How to use git to lose data.
You can protect yourself against some git-related data loss by
adding these settings to the config file of your shared central
repository (e.g., ~/410/usr/$USER/mygroup/REPOSITORY/p2/config).
[receive]
fsckObjects = true
denyDeletes = true
denyNonFastForwards = true
[gc]
reflogExpire = never
reflogExpireUnreachable = never
pruneExpire = never
rerereresolved = never
rerereunresolved = never
[core]
logAllRefUpdates = true
You can do this by editing the config file directly, or
by using these commands:
############################################################
# One person does these, once.
############################################################
$ cd ~/410/usr/$USER/mygroup/REPOSITORY/p2
$ git config receive.fsckObjects true
$ git config receive.denyDeletes true
$ git config receive.denyNonFastForwards true
$ git config gc.reflogExpire never
$ git config gc.reflogExpireUnreachable never
$ git config gc.pruneExpire never
$ git config gc.rerereresolved never
$ git config gc.rerereunresolved never
$ git config core.logAllRefUpdates true
You may also wish to apply some or all of these settings to your
personal repository, though this is less important because in theory
you are frequently pushing your work to a well-configured central
repository.
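To see one of these settings at work, the sketch below sets receive.denyNonFastForwards on a throwaway bare repository and then tries to force a rewritten commit onto it. The paths, file name, and identity are placeholders; nothing here touches your real repository.

```shell
top=$(mktemp -d) && cd "$top"
git init -q --bare p2
git -C p2 symbolic-ref HEAD refs/heads/main
git -C p2 config receive.denyNonFastForwards true

git clone -q "file://$top/p2" work 2>/dev/null
cd work
git symbolic-ref HEAD refs/heads/main
git config user.name "Pat Student"             # placeholder identity
git config user.email "pstudent@example.edu"

echo "v1" > kern.c
git add kern.c
git commit -q -m "Add kern.c"
git push -q origin main
published=$(git rev-parse HEAD)

# Rewrite the published commit locally, then try to force it onto the remote.
git commit -q --amend -m "Add kern.c (rewritten)"
if git push -q -f origin main 2>/dev/null; then
    echo "forced push accepted"
else
    echo "forced push refused by the central repository"
fi
```

With the setting in place the central repository keeps the originally published commit, even though -f was used.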
Special last-minute warning!
Git is a powerful system which includes the ability for other
people to do many
things that you, personally, should not do.
One particular thing you should not do is experiment
with certain dangerous, exciting, or fancy commands frantically in
the last few hours before an assignment is due.
Unless you already are completely expert in what these
commands do,
the last day is the wrong time to find out.
These commands include:
- git reset
- git revert
- git rebase
"The last minute" is also not a good time to use any
--hard or --force parameters.
Basically each one is designed to throw away data in
some situation, and you are now in a situation in which
you want to avoid throwing away data!
What should you do if you are in turmoil?
- First, store a copy of your personal repository somewhere safe,
e.g., a tarball. Do not skip this step.
- If your repository is not
in a clean state, commit. Do not skip this step.
- If your personal work is not yet pushed to your group's
central repository, push it.
If you think you can't, but you carefully made a copy of your
repository as indicated above,
you may be able to skip this step.
- It should be safe for you to git checkout a
previous commit. You will most likely want to specify
a branch name with the -b parameter,
e.g.,
% git checkout -b turmoil 3435c3f792
- If your commits are fine-grained, you may well be
able to use git cherry-pick to hoist particular
commits from one branch onto another.
Regardless, whatever you do on this branch should be
unable to corrupt your group's central repository,
and you should be able to go back to the saved copy
of your personal repository.
- If you end up with something you like, you can
push it to your common repository and create the
remote branch with
% git push -u origin turmoil
- Submit from this "turmoil" branch--of course,
only after having pushed it to your central repository
first!
Then you can get help
from a git expert at leisure, after submission,
on how to merge this branch onto the trunk or how to
replace the old trunk with this new branch.
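The backup, branch, and cherry-pick steps above can be practiced ahead of time. In this sketch everything is hypothetical: a throwaway repository with three small commits, the middle one standing in for the change you regret, and a tarball written next to it.

```shell
top=$(mktemp -d) && cd "$top"
git init -q repo && cd repo
git symbolic-ref HEAD refs/heads/main          # call the first branch "main"
git config user.name "Pat Student"             # placeholder identity
git config user.email "pstudent@example.edu"

echo "good" > a.c; git add a.c; git commit -q -m "Good change"
echo "bad"  > b.c; git add b.c; git commit -q -m "Regrettable change"
echo "fine" > c.c; git add c.c; git commit -q -m "Another good change"

# 1. Save a copy of the whole repository before touching anything.
tar -czf "$top/repo-backup.tar.gz" -C "$top" repo

# 2. Branch from the last commit known to be good (here, the first one).
good=$(git rev-parse --short HEAD~2)
keep=$(git rev-parse --short HEAD)
git checkout -q -b turmoil "$good"

# 3. Hoist the later commit worth keeping onto the new branch.
git cherry-pick "$keep" > /dev/null

ls                               # a.c and c.c, but not b.c
```

The main branch is untouched throughout, and the tarball lets you start over if the experiment goes wrong.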
Explanation of Concepts
The above involved some simplifications of the underlying concepts of Git
to keep this introduction readable and understandable. The simplifications
will not seriously harm your
comprehension of what Git is doing behind your back, but you may find it
helpful to know how Git stores data to better work with Git. Tommi
Virtanen's excellent page Git for
Computer Scientists may provide some insight as well, for those who like
to talk about DAGs and are big fans of arrows pointing every which way.
Commits
The basic unit of a point in time stored in Git is a commit.
Each time we spoke of recording changes earlier, it would have been more
correct to say "creating a commit"; I used the words
"recording changes" to distinguish the operation from pushing and
publishing your changes to your partner. A commit, by its nature, is
made up of a few pieces of information:
- A reference to a parent commit: Each commit except the very first has one or
more parent commits that refer to where the commit was derived from; you can
think of the parent commits as previous steps in time from this commit. The
very first commit you make (we called this the initial commit earlier) has
no parent at all, which Git takes to mean that the
commit is the start of the history.
- A description: This is the text that you enter in the -m
option to git commit.
- One or more changed files: When files are changed, Git records
either a delta -- a binary patch against a file's version in the
parent commit -- or a full version of the file in association with the
commit. The file is technically not stored in the commit; instead, it is
stored as a blob, and the commit contains a reference to the blob.
Each blob can be referenced by many commits, but for most purposes, blobs
behave as if they are "owned" by a commit.
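You can look inside a commit with git cat-file -p, which prints the stored fields: a tree line (the snapshot of files), any parent lines, the author and committer, and the description. The sketch below builds two commits in a throwaway repository (placeholder file and identity) and counts the parent lines of each.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.name "Pat Student"             # placeholder identity
git config user.email "pstudent@example.edu"

echo "first" > file.txt
git add file.txt
git commit -q -m "First commit"
echo "second" >> file.txt
git commit -q -a -m "Second commit"

# Show the raw contents of the newest commit.
git cat-file -p HEAD

# Count "parent" lines in each commit object.
second_parents=$(git cat-file -p HEAD | grep -c '^parent ')
first_parents=$(git cat-file -p HEAD~1 | grep -c '^parent ')
echo "second commit: $second_parents parent line(s); first commit: $first_parents"
```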
A commit is identified by the SHA1 hash of all of the information that it
contains. This hash is one common form of a refspec -- that is to
say, it is one common way to specify a single commit. Recall that when you
did a checkout to go back in time, you specified a SHA1 hash; in
that case, you were using the SHA1 hash as a refspec.
You may have inferred by now that commits exist in a sort of a tree. Each
commit after the first has one or more parent commits (a commit with more than one
parent is called a merge commit), and each commit may have zero or
more child commits. You can view the commit tree using gitk, as we
saw above; each commit was identified by a dot, and gitk drew lines
for us between each commit to explicitly show the branches of the tree.
This tree of cryptographic hashes gives Git a few very useful properties.
Git can assure you that nobody has changed the tree that you have based your
work on, because every element in the tree, down to the blobs, is identified
by its cryptographic hash (its SHA1). If a parent object has changed,
either by malicious intent or by disk corruption, Git simply will not be
able to find the parent object, instead of giving you the incorrect data.
This makes Git relatively immune to AFS corrupting its metadata.
Further, it makes it impossible to throw away history. Some version
control systems that we discussed in lecture have versions per file; so
deleting a file may delete its version history, or otherwise create a
discontinuity in how the file is linked in terms of time. Similarly,
renaming a file is not disastrous (although somewhat quirky); the only
changes happen locally in the commit object. If a delete required a
change of history, then the cryptographic hashes would change, and the
entire tree's parent hash would have to change. The cryptographic
hash system, then, makes Git resistant to inadvertent deletion of
history.
Branches, tags, and refspecs -- oh my!
In this section, until now, you've seen only one kind of refspec
-- a SHA1 hash of a commit. But in the quick-start above, you've worked
with more types of refspecs; when you checked out a branch, you used the
refspec that refers to the branch.
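One way to convince yourself that these names all lead to the same place is git rev-parse, which prints the full SHA1 ID that a name resolves to. A sketch in a throwaway repository (the tag name FIRST_COMMIT, the file, and the identity are made up):

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main          # call the first branch "main"
git config user.name "Pat Student"             # placeholder identity
git config user.email "pstudent@example.edu"

echo "hello" > file.txt
git add file.txt
git commit -q -m "Only commit"
git tag FIRST_COMMIT

# Four different ways to name the same commit.
by_branch=$(git rev-parse main)
by_tag=$(git rev-parse FIRST_COMMIT)
by_head=$(git rev-parse HEAD)
by_prefix=$(git rev-parse "$(git rev-parse --short HEAD)")
printf '%s\n' "$by_branch" "$by_tag" "$by_head" "$by_prefix"
```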
Further Reading
Here are some sources you might consult.