Git under the Hood, Part 3: the Git Directory

Amir Ebrahimi Fard
Data Management for Researchers
4 min read · Jul 26, 2021

*** This article is inspired by [1] ***

The Git directory is where all the objects, references, and configuration files of a repository are stored. When a Git repository is initialised, either by cloning a remote repository or by creating a new one, a folder called .git is created in the working directory. This folder follows the structure shown in Figure 1.

All Git objects, including blobs, trees, tags, and commits, are kept in the objects/ subfolder. As mentioned previously, every object is identified by a 40-character hexadecimal sequence, its SHA-1 hash. Git uses a two-level structure [2][3] to keep the number of entries in any one directory manageable. The first level consists of up to 256 sub-directories¹, each named after the first two characters of the identifiers of the objects it holds. On the second level, inside each sub-directory, every object is stored as a file named after the remaining 38 characters of its hash. Objects stored this way are often called loose (or unpacked) objects. For example, if the SHA-1 of a blob is 5ea69784291616a00310f869b2841dc84318fdca, the object is stored as .git/objects/5e/a69784291616a00310f869b2841dc84318fdca. To recover the full 40-character hash, prepend the name of the parent folder to the 38-character file name.

In addition to the folders that hold loose objects, objects/ contains another important folder called pack [4]. This folder holds Git objects bundled into compressed pack files. From time to time Git packs loose objects into read-only pack files and removes the corresponding loose copies.
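The two-level layout described above can be sketched in a few lines of Python. This is an illustration only, not how Git itself is implemented: the function names are hypothetical, and the sketch assumes the standard loose-object format (a zlib-compressed file whose content is a header of the form "<type> <size>", a NUL byte, and then the object data).

```python
import hashlib
import zlib
from pathlib import Path


def loose_object_path(git_dir, sha1):
    """Return where a loose object with this SHA-1 would live under git_dir."""
    if len(sha1) != 40 or any(c not in "0123456789abcdef" for c in sha1):
        raise ValueError("expected a 40-character lowercase hexadecimal SHA-1")
    # First two characters name the sub-directory, the other 38 name the file.
    return Path(git_dir) / "objects" / sha1[:2] / sha1[2:]


def sha1_from_path(path):
    """Rebuild the 40-character id: parent folder name + 38-character file name."""
    return path.parent.name + path.name


def blob_sha1(content):
    """Hash bytes the way Git hashes a blob: 'blob <size>', a NUL byte, then the content."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def read_loose_object(git_dir, sha1):
    """Decompress a loose object and return (type, data)."""
    raw = zlib.decompress(loose_object_path(git_dir, sha1).read_bytes())
    header, _, body = raw.partition(b"\0")
    kind, _, _size = header.partition(b" ")
    return kind.decode(), body


path = loose_object_path(".git", "5ea69784291616a00310f869b2841dc84318fdca")
print(path.as_posix())
print(sha1_from_path(path))
```

Pack files use a different, more involved format, so read_loose_object only works for objects that have not yet been packed.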

The other folder in the .git directory is refs/, which normally consists of three subdirectories: heads, remotes, and tags. These hold files that correspond to local branches, remote-tracking branches, and tags, respectively. For example, if you create a branch called features, a .git/refs/heads/features file is created that contains the SHA-1 of the latest commit on that branch. The hooks/ directory is where the hook programs reside [5][6][7]. These are executable scripts (the bundled samples are shell scripts) that Git runs at certain points in its execution, for example before a commit is created. The info/ directory records additional information about the repository. One important file in this folder is info/exclude, which has the same functionality as .gitignore²; however, because it lives inside the .git folder, it is not versioned [8][9]. The logs/ folder records the changes made to references. Changes to branch tips are recorded under logs/refs/heads/. Similarly, changes to tags are recorded under logs/refs/tags/ when logging is enabled for them: by default Git logs updates to branches and HEAD but not to tags, and setting core.logAllRefUpdates to always extends logging to every reference.
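Because each reference under refs/ is just a small text file, the mapping from branch and tag names to commits can be read directly. The following is a minimal sketch under that assumption; read_loose_refs and classify are hypothetical helper names, and references that Git has moved into the packed-refs file are not covered here.

```python
from pathlib import Path


def read_loose_refs(git_dir):
    """Map each ref name (e.g. 'refs/heads/features') to the content of its file."""
    root = Path(git_dir)
    refs = {}
    for path in sorted((root / "refs").rglob("*")):
        if path.is_file():
            name = path.relative_to(root).as_posix()
            # Usually a 40-character SHA-1; symbolic refs hold 'ref: <name>' instead.
            refs[name] = path.read_text().strip()
    return refs


def classify(ref_name):
    """Say which of the three refs/ subdirectories a ref name belongs to."""
    if ref_name.startswith("refs/heads/"):
        return "local branch"
    if ref_name.startswith("refs/remotes/"):
        return "remote-tracking branch"
    if ref_name.startswith("refs/tags/"):
        return "tag"
    return "other"


if __name__ == "__main__":
    # Assumes the script is run from the root of a working directory.
    if Path(".git/refs").is_dir():
        for name, value in read_loose_refs(".git").items():
            print(classify(name), name, value)
```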

There are also several files in the root of the .git directory. HEAD is a reference to the currently active branch. HEAD can also point directly to a commit object; this scenario is called a detached HEAD state (discussed in another article in this series, Git Under the Hood, Part 2: Referencing commits). Another important file is config, which holds repository-specific Git options and settings, such as remotes, push configuration, tracking branches, and more. The index is a binary file that captures the staging area, with metadata such as timestamps and file names as well as the SHA-1s of the corresponding Git objects. COMMIT_EDITMSG “contains the commit message of a commit in progress. If a git commit command in progress exits due to an error before execution, any commit message that has been provided by the user (e.g., in an editor session) will be available in this file, but will be overwritten by the next invocation of git commit”. The ORIG_HEAD file refers to the previous state of HEAD [10]; Git writes it before operations that move HEAD drastically, such as a reset, merge, or rebase. The packed-refs file records the same kind of information as the files under refs/heads/ and refs/tags/, but collects many references into a single file, one per line, which makes access more efficient in repositories with many references. The last file is description, which by default contains placeholder text marking the repository as unnamed; developers should place the actual project name and description in this file [11][12]. It is used by GitWeb³ (not GitHub or GitLab) to display the name of the repository; core Git commands do not rely on it.
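To make the roles of HEAD and packed-refs concrete, here is a small sketch that works out which commit HEAD currently refers to. It assumes the usual text formats: HEAD contains either "ref: refs/heads/<branch>" or a bare SHA-1, and packed-refs contains one "<sha1> <ref name>" pair per line, plus comment lines starting with "#" and peeled-tag lines starting with "^". The function names are hypothetical.

```python
from pathlib import Path


def read_head(git_dir):
    """Return ('branch', 'refs/heads/<name>') or ('detached', '<sha1>')."""
    text = (Path(git_dir) / "HEAD").read_text().strip()
    if text.startswith("ref:"):
        return "branch", text[len("ref:"):].strip()
    return "detached", text


def parse_packed_refs(git_dir):
    """Return {ref name: sha1} from packed-refs, skipping comments and peeled lines."""
    path = Path(git_dir) / "packed-refs"
    refs = {}
    if not path.exists():
        return refs
    for line in path.read_text().splitlines():
        if not line or line.startswith("#") or line.startswith("^"):
            continue
        sha1, _, name = line.partition(" ")
        refs[name] = sha1
    return refs


def resolve_head(git_dir):
    """Return the SHA-1 HEAD refers to, or None if the branch has no commits yet."""
    kind, value = read_head(git_dir)
    if kind == "detached":
        return value
    loose = Path(git_dir) / value
    if loose.is_file():
        # A loose ref file takes precedence over an entry in packed-refs.
        return loose.read_text().strip()
    return parse_packed_refs(git_dir).get(value)


if __name__ == "__main__":
    # Assumes the script is run from the root of a working directory.
    if Path(".git/HEAD").is_file():
        print(read_head(".git"), resolve_head(".git"))
```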

Figure 1: The structure of a Git directory.

Please note that a newly created repository does not yet contain all of the files mentioned above. Files such as index, COMMIT_EDITMSG, and ORIG_HEAD appear as the developer(s) add files and make commits.
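One way to observe this is to check which entries exist at a given moment. The sketch below is only an illustration: the LATER_ENTRIES list is an assumption based on the files named in this article, and the exact contents of a fresh .git directory can vary with the Git version and its templates.

```python
from pathlib import Path

# Assumed list of entries that typically appear only after some work has been done.
LATER_ENTRIES = ["index", "logs", "COMMIT_EDITMSG", "ORIG_HEAD", "packed-refs"]


def entries_present(git_dir):
    """Report, for each entry in LATER_ENTRIES, whether it exists under git_dir."""
    root = Path(git_dir)
    return {name: (root / name).exists() for name in LATER_ENTRIES}


if __name__ == "__main__":
    # Assumes the script is run from the root of a working directory.
    for name, present in entries_present(".git").items():
        print(name, "present" if present else "not yet created")
```

Running it right after creating a repository and again after the first commit shows entries such as index and COMMIT_EDITMSG appearing.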

Footnotes

  1. They are named after the first two characters of the SHA-1 sequence, which range from 00 to ff.
  2. .gitignore will be explained in depth in another article.
  3. GitWeb is a web interface for Git projects. It can be used to generate a web application with search functionality, an RSS feed and many other features. It can be seen as a native alternative to external services like GitHub.

Amir Ebrahimi Fard
Data Management for Researchers

Postdoc Researcher on AI Explainability - Interested in the intersection of data, algorithm, and society.