GitHub#
- This part of the tutorial will go over how to host your project on GitHub
- The virtue of using a
.gitignore
file - The GitHub workflow
- How to solve a merge conflict
What is GitHub?#
There are several popular providers --Bitbucket, GitLab -- that let you store your git repositories but the most widely used is GitHub.
Apart from storing a copy of your projects, GitHub comes with a lot of useful features. For example, you can use it to share your projects with your colleagues so they can see or modify your project. Is is also a collaboration tool that you can use to work as a part of a team.
Getting started on GitHub?#
-
Go to https://github.com and create a free account.
-
Create a new repository called NYC-311
-
The repository URL will then be https://github.com/username/NYC-311
-
Add the remote repository in your local repository
You can then see the remote repository with the following command> git remote add origin https://github.com/username/nyc-311.git
The remote is named "origin" which is a common choice for the primary repository.> git remote -v
Now we are going to push
the local changes on our repository
to the GitHub repository using the command
> git push --set-upstream origin master
pushed
it
to GitHub. When you push changes to GitHub in the future you just
need to use the command
git push origin master
To pull in changes in your repository done by yourself or another person you can use the following command
git pull origin master
To see this in action grab a friend and have them clone
your
repo with the command. First make sure they also have a GitHub
account and add them as a collaborator to your repository. If
you can't find a friend you can still do this part of the tutorial
just skip the parts about using git clone
and git pull
.
> git clone https://github.com/username/nyc-311.git
cd
into
the project folder and add a special file called .gitignore
.
Fire up a text editor and add the following
#Ignore pyc files
*.pyc
.gitignore
file is a special file that git looks at when trying
to make a commit. We added the entry '*.pyc'. This is an instruction
for git to not commit files that end in .pyc
. Files that end
in pyc are python bytecode files that will appear when you run your
python code. All lines prefixed with #
are comments and will be
ignored; they are meant to be read by people not computers.
Now lets add this file to our repository
git add .gitignore
git commit -m "Added .gitignore"
git push origin master
Now on your machine git pull
the new state of the repository
git pull origin master
ls -a
you should then see the .gitignore file.
And if we look at the log using 'git log` you should see the new commit.
GitHub Flow#
In this portion of the tutorial we will go over branching and the general GitHub workflow.
So far we have been doing the "solo" workflow which consists of the following:
mkdir my_working_directory
cd my_working_directory
git init
touch some_file.py
# hack
# hack
git add some_file.py
git commit -m "Working with some awesome idea"
git push origin master
# hack
# more hack
In the GitHub flow we never code anything unless there is a need to. When there is a need we then create an issue on the GitHub repository. Good issues should be
- Clear
- Defined output
- Actionable (written in the Imperative Voice)
- Could be completed at most in few days
- Examples
- Good: /Fix the bug in .../
- Good: /Add a method that does .../
- Bad: /Solve the project/
- Bad: /Some error happen/
Here is how to create a GitHub issue.
Once we have an issue we will then pull from the repo and create a branch. A branch is a copy of the code base separate from the main master branch where we can work on our issue (e.g, fixing a bug, adding a feature) without affecting the master branch during our work and then ultimately merge our change into the master branch.
The flow goes something like this:
## Pull from the repo
git pull
## Decide what do you want to do and create an issue
git checkout -b a-meaningful-name
git checkout -b
creates a new branch which in this case
is named "a-meaningful-name". We can see what branch we are on by using
the command git branch
which will show all the branches in the local
repository and place an * next to the branch we are currently on.
## hack, hack, hack, add/rm, commit
## Push to the repo and create a remote branch
git push
## Create a pull-request and describe your work (Suggest/add a reviewer)
## Someone then reviews your code
## The pull-request is closed and the remote branch is destroyed
## Switch to master locally
git checkout master
## Pull the most recent changes (including yours)
git pull
## Delete your local branch
git branch -d a-meaningful-name
Solving a merge conflict#
As you work on projects with others you will inevitably run into merge conflicts. A merge conflict is caused when you and another person edits the same line of a file. Git will not know which line is the correct one and create a conflict.
Let's make a conflict!
#create a branch called drama
> git checkout -b drama
#now modify the descriptive_stats.py file and change the top 10
#values to top 13
#commit your changes
> git add descriptive_stats.py
> git commit -m "Changed top 10 to top 13 in descriptive_stats.py
#switch back to master
> git checkout master
#now modify the descriptive_stats.py file and change the top 10
#values to top 3
# commit your changes
> git add descriptive_stats.py
> git commit -m "Changed the top 10 to the top 3 in descriptive_stats.py"
> git merge drama
Auto-merging descriptive_stats.py
CONFLICT (content): Merge conflict in descriptive_stats.py
Automatic merge failed; fix conflicts and then commit the result.
On branch master
You have unmerged paths.
(fix conflicts and run "git commit")
Unmerged paths:
(use "git add <file>..." to mark resolution)
both modified: descriptive_stats.py
descriptive_stats.py
, you should
see the following:
from __future__ import print_function
import pandas
fname_data = '311-service-requests.csv'
df_311_calls = pd.read_csv(fname_data)
<<<<<<< HEAD
print(df_311_calls['Complaint Type'].value_counts()[:3])
=======
print(df_311_calls['Complaint Type'].value_counts()[:13])
>>>>>>> drama
>>>>>>>
and <<<<<<<
denote the section of the conflicting code. HEAD
means the following line if from the maser branch while drama
shows that
the preceding line is from the drama branch. The lines of the two branches
are separated by the =======
. Given a merge conflict we have
three choices: either keep the line from the master branch, keep the line
from the drama branch or create an entirely new line. In this case we are
going to keep the line from the master branch. To do that we delete the
merge conflict markers and the line from the drama branch and then make a
commit.
Your code should look like this:
from __future__ import print_function
import pandas
fname_data = '311-service-requests.csv'
df_311_calls = pd.read_csv(fname_data)
print(df_311_calls['Complaint Type'].value_counts()[:3])
git add descriptive_stats.py
git commit -m "Fixed merge conflict in descriptive stats"
That concludes the tutorial on GitHub! Good Luck!
Acknowledgements, References and Further Resources#
This tutorial is derived from tutorials created by Eduardo Blancas Reyes, Benedict Kuester, Adolfo De Unanue, Software Carpentry, and ASU PHY-494.
Furthur resources for becoming a git master are: * Software Carpentry -- a more in-depth intro tutorial
-
15 minute tutorial to learn git - Intro tutorial.
-
git - the simple guide - A simple guide to get to know the most important concepts.
-
A successful git branching model - A model to work with git using branches. This model is widely used in the open source community.
-
Learn Git Branching - Understanding what branches and rebases are, in an amazing interactive tutorial.
-
Reset Demystified - A blog post on
git reset
which develops some useful concepts along the way. -
Understanding git for real by exploring the .git directory - A blog post on what's inside a commit.
Pro Git -- An in-depth discussion written by git masters.