git - Git blobless repository

Question

I'm wondering if there's a way to get commit and tree objects only from a remote.

This may sound like a silly question; I'm not sure, since I'm new to git plumbing. I'm building an app that associates metadata with git commits, authorship, and file-system structure. My options are to build a kludgy in-database normalization of the data with some sort of hook-driven syncing mechanism, or to use git's powerful native tools for syncing, attaching metadata, and querying history.

However, since I don't actually need the blob objects, it'd save me a buck or two on hosting if I could shed them somehow. Is this or any incarnation of the concept possible?

score 1 · Accepted Answer

Technically, a commit object only names a tree object, and that tree object (once found) names more trees and blobs. So a git repository in which all the blob object files were deliberately "broken" (e.g., overwritten with an empty file, or even removed entirely) would work to some degree. In fact, it works to exactly the degree it does if you create such a thing manually:

$ chmod +w .git/objects/f7/0d6b139823ab30278db23bb547c61e0d4444fb
$ : > .git/objects/f7/0d6b139823ab30278db23bb547c61e0d4444fb
$ git status
# On branch master
nothing to commit, working directory clean
$ git cat-file -p HEAD:file
error: object file .git/objects/f7/0d6b139823ab30278db23bb547c61e0d4444fb is empty
fatal: Not a valid object name HEAD:file
$ git fsck
Checking object directories: 100% (256/256), done.
error: object file .git/objects/f7/0d6b139823ab30278db23bb547c61e0d4444fb is empty
error: sha1 mismatch f70d6b139823ab30278db23bb547c61e0d4444fb
error: f70d6b139823ab30278db23bb547c61e0d4444fb: object corrupt or missing
missing blob f70d6b139823ab30278db23bb547c61e0d4444fb

Clearly it sort of works. (In fact, git cat-file -p HEAD and git cat-file -p HEAD: also work here, as does git ls-tree -r HEAD, because none of them has to read the blob's contents.)
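To make the commit -> tree -> blob chain concrete, here is a minimal sketch that walks it with plumbing commands which only read commit and tree objects. It builds a throwaway repository so it runs anywhere; the directory and file names are made up for illustration.

```shell
#!/bin/sh
# Sketch: walk commit -> tree -> entries without reading any blob contents.
# The throwaway repo and its file names are hypothetical demo data.
set -e

repo="$(mktemp -d)/demo"
git init -q "$repo"
cd "$repo"
git config user.name "Demo User"
git config user.email "demo@example.com"

mkdir dir
echo hello > file
echo world > dir/other
git add file dir/other
git commit -q -m "demo commit"

# A commit object names one root tree (plus parents, author, committer, message).
git cat-file -p HEAD

# Resolve the commit's root tree and print its entries. Each entry only
# names a subtree or blob by its ID; the blob itself is never opened.
tree="$(git rev-parse 'HEAD^{tree}')"
git cat-file -p "$tree"

# Recursive listing of modes, types, IDs and paths (-t also shows subtrees).
git ls-tree -r -t HEAD
```

Everything printed here comes from commit and tree objects alone, which is why these commands keep working in the broken-blob repository from the transcript.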

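The problem you're going to run into immediately is that git prefers to store objects in packs (git gc repacks loose objects into them) and to transfer packs on fetch and push. Building a pack means reading every object that goes into it, so the corrupted (or, if you rm them, missing) objects will be noticed. It might not even save that much space, because objects in packs are zlib-compressed and delta-compressed against one another; how much you save depends on how well your blobs compress (it's been observed that the repo is sometimes smaller than the checked-out tree!).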

score 1 · Accepted Answer

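Today, git has "partial clone" support, which lets you download the commits and trees of a repository without its blobs: pass --filter=blob:none to git clone. The remote you clone from must run a git version recent enough to support the filter protocol, and a plain git server must also have filtering enabled (the uploadpack.allowFilter setting). Any blob git needs later is fetched from the remote on demand. Note that an ordinary clone still checks out the default branch, which fetches that commit's blobs, so add --no-checkout (or clone with --bare) if you want to skip them entirely.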
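A minimal sketch of a blobless clone, assuming a reasonably recent git on both ends. So that it runs offline, the "remote" is a throwaway local repository reached through a file:// URL; the temp paths and the meta.git directory name are hypothetical, and with a real server you would substitute its URL.

```shell
#!/bin/sh
# Sketch: blobless partial clone that keeps commits and trees only.
# The local "remote" and all paths below are hypothetical demo data.
set -e

work="$(mktemp -d)"
git init -q "$work/src"
cd "$work/src"
git config user.name "Demo User"
git config user.email "demo@example.com"
echo hello > file
git add file
git commit -q -m "first commit"
# The serving side has to allow filtered fetches.
git config uploadpack.allowFilter true

cd "$work"
# file:// forces the normal transport; a plain local path would take the
# local-clone shortcut, which ignores --filter. --bare skips the checkout,
# which would otherwise fetch the blobs of HEAD.
git clone -q --bare --filter=blob:none "file://$work/src" meta.git
cd meta.git

# Commit metadata and tree structure are available locally.
git log --format='%H %an %s'
git ls-tree -r HEAD

# Count the object types actually stored locally.
git cat-file --batch-all-objects --batch-check='%(objecttype)' | sort | uniq -c
```

The final count should list commit and tree objects but no blobs, while git log and git ls-tree still answer history and file-structure queries. The uploadpack.allowFilter line only matters when you run the server yourself; hosted remotes decide that setting on their side.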
