
Using the Git Log

In this portion of the tutorial we are going to explore the git log and how to go back to prior parts of our project. We left off creating a README file. Now the time has come to actually write some code.

Create a Python script to output results of analyzing 311 calls

Let's download some data to work with. You can copy and paste the following command:

> curl -O https://raw.githubusercontent.com/avishekrk/pandas-cookbook/master/data/311-service-requests.csv

Fire up your favorite text editor and let's write a little program called descriptive_stats.py to get the most common complaints from 311 data in NYC.

If you are using nano, invoke the command

> nano descriptive_stats.py

Add the following text to the file. Don't worry, you don't need to know what this means right now. We are loading the data into a pandas DataFrame and finding the top 5 categories of complaints in the 311 data.

from __future__ import print_function
import pandas as pd

fname_data = '311-service-requests.csv'
df_311_calls = pd.read_csv(fname_data)
print( df_311_calls['Complaint Type'].value_counts()[:5])

Remember, to save the program in nano the command is Ctrl-O, and to exit it is Ctrl-X.

We have just created a python program. We can run the program using the following syntax.

> python <program name>

In our case we run the following command:

> python descriptive_stats.py

When we run the program, we should get the following output:

HEATING                   14200
GENERAL CONSTRUCTION       7471
Street Light Condition     7117
DOF Literature Request     5797
PLUMBING                   5373
Name: Complaint Type, dtype: int64

These are the top 5 311 complaints for 2010 in NYC.
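
To see what value_counts()[:5] is doing without downloading the full dataset, here is a minimal sketch that uses only the Python standard library. The complaint list is made-up toy data, not taken from the 311 file, and collections.Counter stands in for pandas purely as an analogy; it is not what pandas does internally.

```python
from collections import Counter

# Toy data (hypothetical): a handful of complaint types, not the real 311 file.
complaints = [
    "HEATING", "PLUMBING", "HEATING", "Street Light Condition",
    "HEATING", "PLUMBING", "GENERAL CONSTRUCTION",
]


def top_complaints(values, n):
    """Count each distinct value and return the n most frequent, largest first."""
    return Counter(values).most_common(n)


# Analogous to df['Complaint Type'].value_counts()[:2] on the toy list.
top_two = top_complaints(complaints, 2)
for name, count in top_two:
    print(name, count)
```

Asking for more entries than there are distinct values simply returns all of them, which is also how slicing with [:10] behaves on a shorter result.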

Now that we have a working program, let's commit it to the repo, just like before.

> git add descriptive_stats.py # to the staging area
> git commit -m "Checking in descriptive_stats.py, output top 5 311 complaints"

In this case, rather than launching a text editor to write a commit message, we used the -m option to write the commit message in-line. Our change doesn't require a lengthy commit message. Now we should have a committed version of descriptive_stats.py.

Let's now decide that we want the top 10 311 complaints and modify our program to output the top 10 results. Our current program should now be:

from __future__ import print_function
import pandas as pd

fname_data = '311-service-requests.csv'
df_311_calls = pd.read_csv('311-service-requests.csv')
print( df_311_calls['Complaint Type'].value_counts()[:10])

Let's commit that change:

> git add descriptive_stats.py
> git commit -m "Changed the top 5 results to the top 10 results in descriptive_stats.py"

If we look at our git log, we should now be able to see the history of our changes:

> git log


commit 42c35933c4d52708c2562c1c05361b152a2b9230
Author: Clark Kent <clark.kent@dailyplanet.com>
Date:   Sat Nov 12 16:55:29 2016 -0600

    Changed the top 5 results to the top 10 results in descriptive_stats.py

commit ab85797b2c3d68fb0c97535080079138888b5556
Author: Clark Kent <clark.kent@dailyplanet.com>
Date:   Sat Nov 12 16:52:52 2016 -0600

    Checking-in descriptive_stats.py outputs the top 5 311 complaints

commit aaf89fd77e9b43d99fe32823843a7519b2108c90
Author: Clark Kent <clark.kent@dailyplanet.com>
Date:   Sat Nov 12 13:45:11 2016 -0600

    Checking in README file

    * Added short description of the project
    * Added python3 as a dependency
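
When the history gets long, a condensed view is easier to scan. Git offers this with git log --oneline, which prints an abbreviated hash and the first line of each commit message. The sketch below imitates that view by parsing the text of the two most recent entries above. The summarize_log helper and the fixed 7-character abbreviation are assumptions made for illustration; Git picks the abbreviation length itself.

```python
# The two most recent entries from the log above, pasted in as plain text.
LOG_TEXT = """\
commit 42c35933c4d52708c2562c1c05361b152a2b9230
Author: Clark Kent <clark.kent@dailyplanet.com>
Date:   Sat Nov 12 16:55:29 2016 -0600

    Changed the top 5 results to the top 10 results in descriptive_stats.py

commit ab85797b2c3d68fb0c97535080079138888b5556
Author: Clark Kent <clark.kent@dailyplanet.com>
Date:   Sat Nov 12 16:52:52 2016 -0600

    Checking-in descriptive_stats.py outputs the top 5 311 complaints
"""


def summarize_log(log_text, abbrev=7):
    """Return one 'short-hash subject' string per commit (hypothetical helper)."""
    summaries = []
    current_hash = None
    for line in log_text.splitlines():
        if line.startswith("commit "):
            current_hash = line.split()[1]
        elif current_hash and line.startswith("    ") and line.strip():
            # The first indented, non-empty line after the header is the subject.
            summaries.append(f"{current_hash[:abbrev]} {line.strip()}")
            current_hash = None
    return summaries


summary_lines = summarize_log(LOG_TEXT)
print("\n".join(summary_lines))
```
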
Now let's look at the difference between the two commits in the log using the git diff command. The git diff command is important for seeing changes in your source code and comparing one commit against another.

If we now invoke the following command, Git prints the difference between the previous commit and our current files:

> git diff HEAD~1

diff --git a/descriptive_stats.py b/descriptive_stats.py
index 09b7168..c38d3e3 100644
--- a/descriptive_stats.py
+++ b/descriptive_stats.py
@@ -3,4 +3,4 @@ import pandas as pd

 fname_data = '311-service-requests.csv'
 df_311_calls = pd.read_csv('311-service-requests.csv')
-print( df_311_calls['Complaint Type'].value_counts()[:5] )
+print( df_311_calls['Complaint Type'].value_counts()[:10] )

Here, HEAD is shorthand for the latest commit on the branch we are working on, and HEAD~1 is shorthand for the commit just before it. More generally, HEAD~20 refers to the commit 20 commits back.
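
One way to picture HEAD~N is "start at HEAD and step back to the parent commit N times". The sketch below models that with a plain dictionary. The PARENTS table and the resolve helper are hypothetical, built from the abbreviated hashes of the three commits in the log above, and the model ignores merge commits, which have more than one parent.

```python
# Toy history (assumption): each commit maps to its single parent.
PARENTS = {
    "42c3593": "ab85797",  # top 10 change  -> top 5 version
    "ab85797": "aaf89fd",  # top 5 version  -> README commit
    "aaf89fd": None,       # the first commit has no parent
}
HEAD = "42c3593"


def resolve(revision):
    """Resolve 'HEAD', 'HEAD~' or 'HEAD~N' by following parent links."""
    name, sep, steps = revision.partition("~")
    if name != "HEAD":
        raise ValueError("this sketch only understands HEAD and HEAD~N")
    # A bare '~' means one step back, as in Git.
    count = int(steps) if steps else (1 if sep else 0)
    commit = HEAD
    for _ in range(count):
        commit = PARENTS[commit]
        if commit is None:
            raise ValueError(f"{revision} goes past the first commit")
    return commit


print(resolve("HEAD"), resolve("HEAD~1"), resolve("HEAD~2"))
```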

The output of git diff reads as follows. The first line looks similar to a diff command and names the two versions of the file being compared. The second line shows the abbreviated identifiers of the two file versions (not of the commits), followed by the file mode. The next two lines name the old (---) and new (+++) file, and the line beginning with @@ gives the range of lines the change covers. The interesting part is at the bottom. The line with the - sign is from our prior commit, and the line with the + sign is from the current version. We can see that the only difference is the change from 5 to 10.
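
The same style of output can be produced outside Git. As a rough illustration, Python's standard difflib module can build a unified diff of the two versions of the script's last lines. The file labels passed to it only mimic Git's a/ and b/ prefixes; nothing is read from disk, and difflib does not emit the diff --git or index lines that Git adds.

```python
import difflib

# The last three lines of the script before and after the change.
old_version = [
    "fname_data = '311-service-requests.csv'\n",
    "df_311_calls = pd.read_csv('311-service-requests.csv')\n",
    "print( df_311_calls['Complaint Type'].value_counts()[:5] )\n",
]
new_version = old_version[:2] + [
    "print( df_311_calls['Complaint Type'].value_counts()[:10] )\n",
]

# unified_diff yields header lines, a hunk marker, then context and changed lines.
diff_lines = list(difflib.unified_diff(
    old_version,
    new_version,
    fromfile="a/descriptive_stats.py",
    tofile="b/descriptive_stats.py",
))
print("".join(diff_lines), end="")
```

Unchanged lines are prefixed with a space, removed lines with -, and added lines with +, just as in the Git output.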

We can check out old versions of our files using the checkout command. This is a very handy feature when we break something and want to start again from a working copy, or when we want to restore an old feature that has since been discarded.

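Let's now go back to the prior version of our file using the git checkout command:

> git checkout HEAD~1 descriptive_stats.py

If we look at descriptive_stats.py, we will see that it has reverted to the old version, which prints the top 5 results. To get the latest committed version back, check the file out from HEAD:

> git checkout HEAD descriptive_stats.py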

Another useful git diff command is:

> git diff --staged

which shows the differences between the files that have been staged for commit and the last commit.
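
It can help to think of three snapshots of each file: the last commit (HEAD), the staging area, and the working directory. Plain git diff compares the working directory with the staging area, while git diff --staged compares the staging area with the last commit. The toy model below tracks only the slice used in descriptive_stats.py. Every name in it is hypothetical, the [:20] edit is invented for the example, and it says nothing about how Git actually stores files.

```python
# Toy model (assumption): each "tree" maps a file name to its content.
head = {"descriptive_stats.py": "value_counts()[:10]"}
staged = dict(head)    # the staging area starts out matching the last commit
working = dict(head)   # and so does the working directory


def changed_files(old, new):
    """Return the names of files whose content differs between two trees."""
    return sorted(name for name in old.keys() | new.keys()
                  if old.get(name) != new.get(name))


# Edit the file without staging it: only plain `git diff` would report it.
working["descriptive_stats.py"] = "value_counts()[:20]"
unstaged_before_add = changed_files(staged, working)  # like `git diff`
staged_before_add = changed_files(head, staged)       # like `git diff --staged`

# `git add` copies the working version into the staging area.
staged["descriptive_stats.py"] = working["descriptive_stats.py"]
unstaged_after_add = changed_files(staged, working)
staged_after_add = changed_files(head, staged)

print(unstaged_before_add, staged_before_add)
print(unstaged_after_add, staged_after_add)
```

Before staging, the change shows up only in the first comparison; after staging, it shows up only in the second.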

Next up we are going to go over how to host a project on GitHub.