Philipp Heuberger

Roll your own git in go part 2: implementing `init`

Yes yes, I know. The last post was quite boring by Hollywood action movie standards. Nothing to execute, run or anything. But not this time. This time we’re going to create some magic.

Today is all about humble beginnings.

[Terminal recording: running gogito init in a working directory and seeing how the shell picks up the main branch]

Look ma! Isn’t it beautiful how my shell picks up the Git repo I just created using gogito and prints the default branch name main? FYI, I’m using asciinema + agg for the terminal recording, and for some reason it prints the command again when executing it. That’s not actually happening in the shell.

This commit adds all the code and tests for the init feature.

The repository abstraction

First things first, let’s build an abstraction for our repository that holds the path to our working directory, the .git directory, and the configuration.

I guess we won’t need the configuration all that often but it’s nice to have it in there. The package also has some neat helper functions for things like building internal Git paths and creating folders.

package repo

import (
 "fmt"
 "os"
 "path/filepath"

 "github.com/pheuberger/gogito/internal/paths"
)

type Repo struct {
 PathToWorkingDir string
 PathToGitDir     string
 Config           Config
}

// From initializes a new Repo struct from a given path.
//
// It will load .git/config and return the initialized repo. If the working
// directory is not a git repository, the function will return an error "not a
// git repository".
// If repositoryformatversion is not 0, the function will return an error as
// well since we're not equipped to deal with that repo type.
// See https://git-scm.com/docs/repository-version
func From(pathToWorkingDir string) (Repo, error) {
 config, err := readConfig(paths.GitDir(pathToWorkingDir))
 if err != nil {
  return Repo{}, err
 }
 if config.formatVersion != 0 {
  return Repo{}, fmt.Errorf("unsupported repositoryformatversion: %d", config.formatVersion)
 }
 return Repo{
  Config:           config,
  PathToWorkingDir: pathToWorkingDir,
  PathToGitDir:     paths.GitDir(pathToWorkingDir),
 }, nil
}

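// IsGitRepo reports whether the Git directory for pathToWorkingDir exists.
func IsGitRepo(pathToWorkingDir string) bool {
 _, err := os.Stat(paths.GitDir(pathToWorkingDir))
 return err == nil
}

// Path builds a path inside the Git directory from the given elements.
func (repo Repo) Path(pathElements ...string) string {
 combined := append([]string{repo.PathToGitDir}, pathElements...)
 return filepath.Join(combined...)
}

// EnsureDirs creates the directory at the given path inside the Git
// directory, including any missing parents. Mode 0777 applies before umask.
func (repo Repo) EnsureDirs(pathElements ...string) error {
 path := repo.Path(pathElements...)
 return os.MkdirAll(path, 0777)
}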

Creating text files

Git creates several configuration and metadata files. These files are plain text, so let’s create a file package and include a basic Write function. We will likely extend this package when creating compressed Git objects later on.

One exception is the config file. It’s in the INI file format, which we won’t write by hand. We’ll cover it in the next section.

package file

import (
 "os"
)

// Write creates (or truncates) the file at path and writes text to it.
func Write(path string, text string) error {
 f, err := os.Create(path)
 if err != nil {
  return err
 }
 defer f.Close()

 _, err = f.WriteString(text)
 return err
}

Creating the config with Viper

To read and write the config file I’m going to use Viper. It’s a powerful library that supports multiple configuration file formats and is easy to use.

Well, there’s not much more to say. Now, let’s let the code speak for itself.

package repo

import (
 "errors"
 "path/filepath"

 "github.com/spf13/viper"
)

type Config struct {
 formatVersion    int
 fileMode         bool
 bare             bool
 logAllRefUpdates bool
}

const ConfigName = "config"

func WriteDefaultConfig(pathToGitDir string) error {
 return writeConfig(defaultConfig(), pathToGitDir)
}

// readConfig reads the Git config file in the Git directory specified by pathToGitDir.
//
// Returns error "not a git repository" if the config file is not found or a
// more specific error if the config could not be read.
func readConfig(pathToGitDir string) (Config, error) {
 viper.SetConfigName(ConfigName)
 viper.SetConfigType("ini")
 viper.AddConfigPath(pathToGitDir)

 if err := viper.ReadInConfig(); err != nil {
  if _, ok := err.(viper.ConfigFileNotFoundError); ok {
   return Config{}, errors.New("not a git repository")
  }
  // Config file was found but another error was produced
  return Config{}, err
 }
 return Config{
  formatVersion:    viper.GetInt("core.repositoryformatversion"),
  fileMode:         viper.GetBool("core.filemode"),
  bare:             viper.GetBool("core.bare"),
  logAllRefUpdates: viper.GetBool("core.logallrefupdates"),
 }, nil
}

func writeConfig(config Config, pathToGitDir string) error {
 viper.SetConfigType("ini")
 viper.Set("core.repositoryformatversion", config.formatVersion)
 viper.Set("core.filemode", config.fileMode)
 viper.Set("core.bare", config.bare)
 viper.Set("core.logallrefupdates", config.logAllRefUpdates)
 return viper.WriteConfigAs(filepath.Join(pathToGitDir, ConfigName))
}

func defaultConfig() Config {
 return Config{
  formatVersion:    0,
  fileMode:         true,
  bare:             false,
  logAllRefUpdates: true,
 }
}
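For reference, the config that git init writes for a fresh, non-bare repository on Linux is a single [core] section with these four settings. The sketch below renders the same settings by hand with the standard library, purely to show the target file. gogito itself goes through Viper, renderConfig is a hypothetical helper, and the exact whitespace Viper emits may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// Config mirrors the four core settings from the struct above.
type Config struct {
	formatVersion    int
	fileMode         bool
	bare             bool
	logAllRefUpdates bool
}

// renderConfig is a hypothetical helper (not part of gogito) that formats
// the settings the way Git lays out its own config file.
func renderConfig(c Config) string {
	var b strings.Builder
	b.WriteString("[core]\n")
	fmt.Fprintf(&b, "\trepositoryformatversion = %d\n", c.formatVersion)
	fmt.Fprintf(&b, "\tfilemode = %t\n", c.fileMode)
	fmt.Fprintf(&b, "\tbare = %t\n", c.bare)
	fmt.Fprintf(&b, "\tlogallrefupdates = %t\n", c.logAllRefUpdates)
	return b.String()
}

func main() {
	defaults := Config{formatVersion: 0, fileMode: true, bare: false, logAllRefUpdates: true}
	fmt.Print(renderConfig(defaults))
}
```

Comparing this against the config file gogito produces is a quick way to check that Git will accept it.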

Finally, the init command

This command checks if a directory is already a Git repo, and if not, it sets up the necessary directories and files.

When you run git init, the following structure is created:

.git/
├── branches
├── config
├── description
├── HEAD
├── hooks
│   ├── ...
├── info
│   └── exclude
├── objects
│   ├── info
│   └── pack
└── refs
    ├── heads
    └── tags

FYI: Git usually puts a bunch of samples into the hooks/ directory. We’re just not going to bother with that.

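Quick overview of what all those files and directories are for:

  branches: a deprecated way to store URL shorthands for fetch, pull and push. Modern Git doesn’t use it.
  config: the repository-specific configuration.
  description: a free-text description of the repository, shown by GitWeb.
  HEAD: a reference to the branch that is currently checked out.
  hooks: scripts that Git runs at certain points, such as before a commit.
  info/exclude: ignore patterns that apply to this repository only and are not shared with others.
  objects: the object database where blobs, trees and commits are stored. Packfiles go into objects/pack.
  refs: the references. Branches live in refs/heads and tags in refs/tags.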

Wait a minute, what are blobs, trees and commits really?

If you have a minute or two, I’d very much recommend watching Git Internals - How Git Works - Fear Not The SHA! by Scott Chacon. Otherwise here’s a quick rundown:

  1. Blobs: Think of them as the raw content of your files. Every time you stage a file with git add, Git takes its content and stores it as a blob (binary large object). Each blob is identified by a hash computed from that content, so identical content always gets the same hash, and any change produces a new blob with a new hash. This is what’s known as content-addressable storage. More on that in one of the next articles.

  2. Trees: Trees are like directories. They’re Git objects that list the filenames and the hashes of the blobs (or other trees) they contain. This effectively maps out the structure of your files. Think of a tree as a snapshot of a directory at a specific point in time, showing how files are organized and interlinked.

  3. Commits: Commits are snapshots of your repository at a particular moment. They include metadata such as the author, commit message, and a reference to the root tree, linking the commit to the actual content and structure of your project at that time. Each commit also points to its parent commit(s), creating a chain of changes and updates over time.
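To make the content-addressing idea a bit more tangible, here’s a minimal sketch of how a blob’s hash is derived. It assumes a repository using Git’s default SHA-1 object format, where the ID is the SHA-1 of a short header ("blob", the content length, a NUL byte) followed by the raw content. The function name blobHash is made up for this example and is not part of gogito yet.

```go
package main

import (
	"crypto/sha1"
	"fmt"
)

// blobHash returns the object ID Git would assign to a blob with the given
// content, assuming the default SHA-1 object format.
func blobHash(content []byte) string {
	// Git prepends "blob <size>\x00" to the content before hashing.
	header := fmt.Sprintf("blob %d\x00", len(content))
	h := sha1.New()
	h.Write([]byte(header))
	h.Write(content)
	return fmt.Sprintf("%x", h.Sum(nil))
}

func main() {
	fmt.Println(blobHash([]byte("hello world\n")))
	fmt.Println(blobHash([]byte("hello world\n")))  // same content, same hash
	fmt.Println(blobHash([]byte("hello world!\n"))) // one changed byte, new hash
}
```

You can cross-check the first line against real Git by running printf 'hello world\n' | git hash-object --stdin, which should print the same ID.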

Without further ado, here’s the code:

package subcommands

import (
 "fmt"
 "os"

 "github.com/pheuberger/gogito/internal/file"
 "github.com/pheuberger/gogito/internal/paths"
 "github.com/pheuberger/gogito/internal/repo"
)

const DESCRIPTION_TEXT = "Unnamed repository; edit this file 'description' to name the repository.\n"

// Init creates a new Git repository in pathToWorkingDir.
// It straight up returns errors without polishing them.
// Since this is a learning project, this is fine.
func Init(pathToWorkingDir string) error {
 if repo.IsGitRepo(pathToWorkingDir) {
  fmt.Printf("%s is already a git repository. nothing to do\n", paths.AbsFrom(pathToWorkingDir))
  return nil
 }
 if err := createGitDirectory(pathToWorkingDir); err != nil {
  return err
 }
 if err := repo.WriteDefaultConfig(paths.GitDir(pathToWorkingDir)); err != nil {
  return err
 }

 // It's safe to instantiate a repo object since we just wrote the config.
 // Also, not expecting an error here because we just created the config
 // ourselves and know it to be sound. So ignore.
 repository, _ := repo.From(pathToWorkingDir)
 if err := createDirs(repository); err != nil {
  return err
 }
 if err := createFiles(repository); err != nil {
  return err
 }
 return nil
}

func createGitDirectory(pathToWorkingDir string) error {
 pathToGitDir := paths.GitDir(pathToWorkingDir)
 // mode 0777 before umask
 return os.Mkdir(pathToGitDir, 0777)
}

func createDirs(repository repo.Repo) error {
 dirs := [][]string{
  {"objects"},
  {"objects", "info"},
  {"objects", "pack"},
  {"branches"},
  {"info"},
  {"refs"},
  {"refs", "heads"},
  {"refs", "tags"},
  {"hooks"},
 }

 for _, dir := range dirs {
  if err := repository.EnsureDirs(dir...); err != nil {
   return fmt.Errorf("failed to create internal directory %v: %w", dir, err)
  }
 }
 return nil
}

func createFiles(repository repo.Repo) error {
 files := map[string]string{
  "description":  DESCRIPTION_TEXT,
  "HEAD":         "ref: refs/heads/main\n",
  "info/exclude": "# exclude file\n",
 }

 for path, content := range files {
  if err := file.Write(repository.Path(path), content); err != nil {
   return fmt.Errorf("failed to write internal file %s: %w", path, err)
  }
 }
 return nil
}
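The HEAD file is what lets a shell prompt show main right after init. Here’s a rough sketch of how a tool could pull the branch name out of it. The helper branchFromHead is hypothetical, and this is not how any particular prompt implements it.

```go
package main

import (
	"fmt"
	"strings"
)

// branchFromHead is a hypothetical helper that extracts the branch name from
// the content of a HEAD file. It returns false when HEAD is not a symbolic
// ref to a branch, for example when it holds a raw commit hash.
func branchFromHead(head string) (string, bool) {
	const prefix = "ref: refs/heads/"
	line := strings.TrimSpace(head)
	if !strings.HasPrefix(line, prefix) {
		return "", false
	}
	return strings.TrimPrefix(line, prefix), true
}

func main() {
	// The exact content createFiles writes into HEAD.
	branch, ok := branchFromHead("ref: refs/heads/main\n")
	fmt.Println(branch, ok)
}
```

Note that refs/heads/main doesn’t exist as a file yet at this point. The branch only comes into being with the first commit, but HEAD already names it.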

Testing

I wrote tests for the init subcommand to make sure the files and directories are created correctly. Quick heads up: I’ve decided I’m not going to cover tests in my articles because they’re not that interesting to talk about. You should definitely check them out, though. Head over to the GitHub repo.

Wrapping Up

Phew! That was quite the wall of text. In the next one, we’re going to implement the add command. That means talking more about content-addressing and how Git organizes your files.

I started a newsletter for this series, so if you want to get notifications when a new article drops, sign up here.

As always, you can reach me on Twitter, LinkedIn or email.