How does Git store information?

To make common operations fast and minimize storage space, Git uses a multi-level structure to store data. In simplified form, this structure has three key parts:

  1. Every unique version of every file. (Git calls these blobs because they can contain data of any kind.)
  2. A tree that tracks the names and locations of a set of files.
  3. A commit that records the author, log message, and other properties of a particular commit.
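Here is a minimal sketch of how a blob gets its identity. The Python below mirrors what `git hash-object` computes in a default SHA-1 repository: the SHA-1 of a `blob <size>` header, a NUL byte, and the file's raw bytes. The helper name `blob_hash` is hypothetical. Repositories created with Git's newer SHA-256 object format would produce different IDs.

```python
import hashlib


def blob_hash(data: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to `data` stored as a blob."""
    # Git hashes a short header ("blob <size in bytes>" plus a NUL byte)
    # followed by the file's raw contents; the file name is not included.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


print(blob_hash(b""))  # → e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (the empty file)
print(blob_hash(b"hello\n") == blob_hash(b"hello\n"))  # → True
```

The name depends only on the bytes, so two files with identical contents map to the same blob no matter where they live.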


As the diagram shows, each blob is stored only once, and blobs are (frequently) shared between trees. While it may seem redundant to have both trees and commits, a later part of this lesson will show why the two have to be distinct.
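The sharing of blobs between trees can be sketched with a toy content-addressed store. This sketch assumes a plain Python dict in place of Git's object database. It also assumes flat name-to-hash mappings in place of real tree objects, which additionally record file modes and nest directories as subtrees. The file contents and the names `store` and `add_blob` are hypothetical.

```python
import hashlib


def blob_hash(data: bytes) -> str:
    """SHA-1 object ID of `data` as a Git blob (default SHA-1 repository)."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


# Hypothetical in-memory object store: object ID -> contents.
store: dict[str, bytes] = {}


def add_blob(data: bytes) -> str:
    """Store `data` under its hash (only once) and return the hash."""
    oid = blob_hash(data)
    store.setdefault(oid, data)  # identical contents are never stored twice
    return oid


# Two simplified "trees": file name -> blob hash (contents are made up).
tree_1 = {
    "report.txt": add_blob(b"# Seasonal Dental Surgeries 2017-18\n"),
    "data/northern.csv": add_blob(b"Date,Tooth\n2017-01-25,wisdom\n"),
}
tree_2 = {
    "report.txt": add_blob(b"# Seasonal Dental Surgeries 2017-18\n"),  # unchanged
    "data/northern.csv": add_blob(b"Date,Tooth\n2017-01-25,wisdom\n2017-02-19,canine\n"),
}

print(len(store))  # → 3 (four file entries, but one blob is shared)
```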

Looking at the diagram, which files changed in the last (bottom-most) commit to this repository?

Answer: data/northern.csv
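Answering that question mechanically comes down to comparing hashes. The sketch below assumes each tree is flattened into a dict from path to blob hash. The hashes are shortened placeholders, and `changed_files` is a hypothetical helper, not a Git command.

```python
def changed_files(old_tree: dict[str, str], new_tree: dict[str, str]) -> list[str]:
    """List paths whose blob hash differs between two trees.

    A path present in only one of the trees (added or deleted) counts as changed.
    """
    paths = set(old_tree) | set(new_tree)
    # Only the hashes are compared; file contents are never read.
    return sorted(p for p in paths if old_tree.get(p) != new_tree.get(p))


# Placeholder hashes, shortened for readability.
before = {"report.txt": "aaa111", "data/eastern.csv": "bbb222", "data/northern.csv": "ccc333"}
after = {"report.txt": "aaa111", "data/eastern.csv": "bbb222", "data/northern.csv": "ddd444"}

print(changed_files(before, after))  # → ['data/northern.csv']
```

Real Git trees are nested. An unchanged directory therefore has an unchanged subtree hash, and Git can skip it without looking inside.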

What is a hash?

Every commit to a repository has a unique identifier called a hash. The name comes from the way it is generated: the commit's contents are run through a hash function, which deterministically turns any input into a fixed-size value that looks random. This hash is normally written as a 40-character hexadecimal string like 7c35a3ce607a14953f070f0f83b5d74c2296ef93, but most of the time, you only have to give Git the first 6 or 8 characters to identify the commit you mean.
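Resolving an abbreviated hash amounts to finding the single known object ID that starts with the given prefix, and refusing if several match. This is a simplified sketch with a hypothetical `resolve_prefix` helper and a hard-coded list of IDs. Git itself searches its object database, and it likewise rejects ambiguous prefixes.

```python
def resolve_prefix(prefix: str, hashes: list[str]) -> str:
    """Return the one full hash that starts with `prefix`."""
    matches = [h for h in hashes if h.startswith(prefix.lower())]
    if not matches:
        raise KeyError(f"no object starts with {prefix!r}")
    if len(matches) > 1:
        raise ValueError(f"prefix {prefix!r} is ambiguous ({len(matches)} matches)")
    return matches[0]


known = [
    "7c35a3ce607a14953f070f0f83b5d74c2296ef93",
    "0430705487381195993bac9c21512ccfb511056d",
]

print(resolve_prefix("043070", known))  # → 0430705487381195993bac9c21512ccfb511056d
```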

Hashes are what enable Git to share data efficiently between repositories. If two files are the same, their hashes are guaranteed to be the same. Similarly, if two commits contain the same files, have the same ancestors, and record the same author, committer, timestamps, and log message, their hashes will be the same as well. Git can therefore tell which information needs to be stored or transferred by comparing hashes rather than comparing entire files.
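To see why ancestry matters, here is a sketch of how a commit ID can be derived. It assumes a default SHA-1 repository and a simplified commit object in which the author and committer are the same person. The commit text names its tree, its parents, the author and committer with timestamps, and the message, so changing any of them changes the hash. The names `object_hash` and `commit_hash`, as well as the input values, are illustrative. Only the empty-tree ID is a real, well-known Git value.

```python
import hashlib


def object_hash(kind: str, body: bytes) -> str:
    """SHA-1 of a Git-style object: '<kind> <size>' and a NUL byte, then the body."""
    header = f"{kind} {len(body)}\0".encode("ascii")
    return hashlib.sha1(header + body).hexdigest()


def commit_hash(tree: str, parents: list[str], person: str, when: str, message: str) -> str:
    """Hash a simplified commit object (author and committer are the same)."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]  # one line per ancestor
    lines.append(f"author {person} {when}")
    lines.append(f"committer {person} {when}")
    body = "\n".join(lines) + "\n\n" + message + "\n"
    return object_hash("commit", body.encode("utf-8"))


# Placeholder inputs: TREE is Git's empty tree; WHO and WHEN are made up.
TREE = "4b825dc642cb6eb9a060e54bf8d69288f1ee4904"
WHO = "A. Author <author@example.com>"
WHEN = "1505914946 +0000"

base = commit_hash(TREE, [], WHO, WHEN, "Initial commit")
same = commit_hash(TREE, [], WHO, WHEN, "Initial commit")
child = commit_hash(TREE, [base], WHO, WHEN, "Initial commit")
reworded = commit_hash(TREE, [], WHO, WHEN, "First commit")

print(base == same)      # → True: identical inputs
print(base == child)     # → False: different ancestry
print(base == reworded)  # → False: different message
```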

Use cd to go into the dental directory and then run git log. What are the first four characters of the hash of the most recent commit?

Answer: None of the above.

How can I view a specific commit?

To view the details of a specific commit, you use the command git show with the first few characters of the commit’s hash. For example, the command git show 043070 produces this:

commit 0430705487381195993bac9c21512ccfb511056d
Author: Rep Loop <>
Date:   Wed Sep 20 13:42:26 2017 +0000

    Added year to report title.

diff --git a/report.txt b/report.txt
index e713b17..4c0742a 100644
--- a/report.txt
+++ b/report.txt
@@ -1,4 +1,4 @@
-# Seasonal Dental Surgeries 2017-18
+# Seasonal Dental Surgeries (2017) 2017-18

 TODO: write executive summary.

The first part is the same as the log entry shown by git log. The second part shows the changes; as with git diff, lines that the change removed are prefixed with -, while lines that it added are prefixed with +.
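The +/- convention is the standard unified diff format, so it can be reproduced outside Git. This sketch uses Python's `difflib.unified_diff` on a hypothetical three-line version of `report.txt`. `difflib` uses its own matching algorithm rather than Git's, and it omits Git's `diff --git` and `index` header lines, but the - and + prefixes mean the same thing.

```python
import difflib

# Hypothetical before/after contents of report.txt (one string per line).
old = ["# Seasonal Dental Surgeries 2017-18\n", "\n", "TODO: write executive summary.\n"]
new = ["# Seasonal Dental Surgeries (2017) 2017-18\n", "\n", "TODO: write executive summary.\n"]

# Removed lines get a leading "-", added lines a leading "+",
# and unchanged context lines a leading space.
diff_lines = list(difflib.unified_diff(old, new, fromfile="a/report.txt", tofile="b/report.txt"))
print("".join(diff_lines), end="")
```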

You have been put in the dental directory. (We will now stop reminding you of this…) Use git log to see the hashes of recent commits, and then use git show with the first few characters of a hash to look at the most recent commit. How many files did it change?





October 2020