Using the Git Log
In this portion of the tutorial we are going to explore the git log and how to go back to prior versions of our project. We left off after creating a README file. Now the time has come to actually write some code.
Create a Python script to output results of analyzing 311 calls
Let's download some data to work with. You can copy and paste the following command:
> curl -O https://raw.githubusercontent.com/avishekrk/pandas-cookbook/master/data/311-service-requests.csv
Fire up your favorite text editor and let's write a little program called descriptive_stats.py to get the most common complaints from 311 data in NYC. If you are using nano, invoke the command:
> nano descriptive_stats.py
from __future__ import print_function
import pandas as pd
fname_data = '311-service-requests.csv'
df_311_calls = pd.read_csv(fname_data)
print( df_311_calls['Complaint Type'].value_counts()[:5])
Remember, to save the program in nano the command is Ctrl-O, and the command to exit is Ctrl-X.
We have just created a Python program. We can run the program using the following syntax:
> python <program name>
In our case we run the following command:
> python descriptive_stats.py
HEATING 14200
GENERAL CONSTRUCTION 7471
Street Light Condition 7117
DOF Literature Request 5797
PLUMBING 5373
Name: Complaint Type, dtype: int64
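To make clear what value_counts()[:5] computes, here is a minimal sketch using only the Python standard library. It assumes a tiny hand-made list of complaint types (hypothetical data, not the real 311 file) and uses collections.Counter as a stand-in for the pandas value_counts method, so it illustrates the idea rather than the pandas implementation.

```python
from collections import Counter

# Hypothetical sample of the 'Complaint Type' column; not real 311 data.
complaints = ["HEATING", "PLUMBING", "HEATING", "Noise", "HEATING", "PLUMBING"]

# Count how often each value occurs and keep the two most frequent,
# analogous to value_counts()[:2] in pandas.
top_two = Counter(complaints).most_common(2)

for complaint, count in top_two:
    print(complaint, count)
```

As with value_counts, the results come back sorted from most to least frequent, so slicing the front of the list gives the "top N" values.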
Now that we have a working program, let's commit it to the repo, just like before.
> git add descriptive_stats.py # to the staging area
> git commit -m "Checking in descriptive_stats.py, output top 5 311 complaints"
We used the -m option to write an in-line commit message, since our change doesn't require a lengthy one. We should now have a committed version of descriptive_stats.py.
Let's now decide that we want the top 10 311 complaints and modify our program to output the top 10 results. Our current program should now be:
from __future__ import print_function
import pandas as pd
fname_data = '311-service-requests.csv'
df_311_calls = pd.read_csv('311-service-requests.csv')
print( df_311_calls['Complaint Type'].value_counts()[:10])
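Commit the change as before:
> git add descriptive_stats.py
> git commit -m "Changed the top 5 results to the top 10 results in descriptive_stats.py"
We can now review the history of the project with the git log command:
> git log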
commit 42c35933c4d52708c2562c1c05361b152a2b9230
Author: Clark Kent <clark.kent@dailyplanet.com>
Date: Sat Nov 12 16:55:29 2016 -0600
Changed the top 5 results to the top 10 results in descriptive_stats.py
commit ab85797b2c3d68fb0c97535080079138888b5556
Author: Clark Kent <clark.kent@dailyplanet.com>
Date: Sat Nov 12 16:52:52 2016 -0600
Checking in descriptive_stats.py, output top 5 311 complaints
commit aaf89fd77e9b43d99fe32823843a7519b2108c90
Author: Clark Kent <clark.kent@dailyplanet.com>
Date: Sat Nov 12 13:45:11 2016 -0600
Checking in README file
* Added short description of the project
* Added python3 as a dependency
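The log lists each commit with its identifier, author, date, and message, newest first. To see what actually changed between commits we can use the git diff command. The git diff command is important for seeing changes in your source code and comparing one commit against another.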
If we now invoke the command below, git compares our current files against the previous commit:
> git diff HEAD~1
diff --git a/descriptive_stats.py b/descriptive_stats.py
index 09b7168..c38d3e3 100644
--- a/descriptive_stats.py
+++ b/descriptive_stats.py
@@ -3,4 +3,4 @@ import pandas as pd
fname_data = '311-service-requests.csv'
df_311_calls = pd.read_csv('311-service-requests.csv')
-print( df_311_calls['Complaint Type'].value_counts()[:5] )
+print( df_311_calls['Complaint Type'].value_counts()[:10] )
HEAD is shorthand for the latest commit on the current branch. HEAD~1 is shorthand for the commit just before it. For instance, HEAD~20 refers to the commit 20 commits ago.
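The HEAD~n notation can be pictured as counting backwards through the history. The sketch below is a simplified illustration, not how git itself is implemented: it assumes a strictly linear history (no merges) stored newest first, uses the abbreviated hashes from the log above, and the resolve function is a hypothetical helper written for this example.

```python
# Toy history, newest first, using abbreviated hashes from the log above.
# Assumption: every commit has exactly one parent (no merge commits).
history = ["42c3593", "ab85797", "aaf89fd"]


def resolve(ref, history):
    """Resolve 'HEAD' or 'HEAD~n' to a commit in a linear, newest-first history."""
    if ref == "HEAD":
        steps = 0
    elif ref.startswith("HEAD~"):
        steps = int(ref[len("HEAD~"):])
    else:
        raise ValueError(f"unsupported reference: {ref}")
    if not 0 <= steps < len(history):
        raise LookupError(f"{ref} goes past the first commit")
    return history[steps]


print(resolve("HEAD", history))    # 42c3593
print(resolve("HEAD~1", history))  # ab85797
```

With three commits in the history, HEAD~2 is the README commit and HEAD~3 would raise an error, because there is nothing before the first commit.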
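The output of git diff reads as follows. The first line looks similar to a diff command and names the two versions of the file being compared. The second line, starting with index, shows abbreviated identifiers for the two versions of the file (not the commit identifiers), followed by the file mode. The next two lines name the old (---) and new (+++) files. The line starting with @@ gives the line ranges that the change covers. The interesting part is at the bottom. The line with the - sign is from our prior commit. The line with the + sign is from the current version. We can see that the only difference is the change from 5 to 10.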
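To get a feel for the - and + lines, the same kind of comparison can be reproduced with the difflib module from the Python standard library. This is only a sketch: the old and new lists are typed in by hand to mirror the two versions of the script, and difflib does not emit git-specific lines such as diff --git or index, so its output resembles git's without being identical.

```python
import difflib

# The script before and after the change, one list entry per line.
old = [
    "import pandas as pd",
    "fname_data = '311-service-requests.csv'",
    "df_311_calls = pd.read_csv('311-service-requests.csv')",
    "print( df_311_calls['Complaint Type'].value_counts()[:5] )",
]
new = old[:-1] + ["print( df_311_calls['Complaint Type'].value_counts()[:10] )"]

# unified_diff yields header lines, a hunk header starting with @@,
# unchanged context lines, and the removed (-) and added (+) lines.
diff_lines = list(
    difflib.unified_diff(
        old,
        new,
        fromfile="a/descriptive_stats.py",
        tofile="b/descriptive_stats.py",
        lineterm="",
    )
)
print("\n".join(diff_lines))
```

The unchanged lines appear as context, and only the print line shows up twice: once removed and once added.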
We can "check out" old versions of our files using the git checkout command. This is very handy when we break something and want to start again from a working copy, or when we want to restore an old feature that has since been discarded.
Let's now go back to our prior commit using the git checkout command:
> git checkout HEAD~1 descriptive_stats.py
If we now open descriptive_stats.py, we will see that it has reverted to our old version. To return to the latest version, check the file out from HEAD:
> git checkout HEAD descriptive_stats.py
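The round trip between the old and new versions can also be tried safely in a throwaway repository. The sketch below is one way to do that from Python, under several assumptions: git is installed and on the PATH, the file contents are short placeholders rather than the real script, and the identity passed with -c is made up for the demo. The helper names git and checkout_round_trip are hypothetical. The -- before the file name is optional here and only tells git that what follows is a path.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(repo, *args):
    """Run a git command inside repo using a throwaway identity (assumed values)."""
    base = [
        "git",
        "-c", "user.name=Demo",
        "-c", "user.email=demo@example.com",
        "-c", "commit.gpgsign=false",
    ]
    subprocess.run(base + list(args), cwd=repo, check=True, capture_output=True)


def checkout_round_trip():
    """Commit two versions of a file, then check out the old and the new one."""
    if shutil.which("git") is None:
        return None  # git is not installed; nothing to demonstrate
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        script = repo / "descriptive_stats.py"
        git(repo, "init")
        # Placeholder contents standing in for the two versions of the script.
        for text, message in [("top 5", "first version"), ("top 10", "second version")]:
            script.write_text(text)
            git(repo, "add", "descriptive_stats.py")
            git(repo, "commit", "-m", message)
        # Restore the file as it was one commit ago, then bring back the latest.
        git(repo, "checkout", "HEAD~1", "--", "descriptive_stats.py")
        old = script.read_text()
        git(repo, "checkout", "HEAD", "--", "descriptive_stats.py")
        new = script.read_text()
        return old, new


print(checkout_round_trip())
```

Because everything happens in a temporary directory, the experiment leaves the real project repository untouched.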
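Another useful git diff command is the one below, which shows the changes that have been added to the staging area but not yet committed:
> git diff --staged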
Next up we are going to go over how to host a project on GitHub.