7

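I have a Git repository containing thousands of files, and I would like to get the date and time of the last commit for each individual file. Can this be done using Python (e.g. by using something like os.path.getmtime(path))?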

4

4 Answers

8

With GitPython, this will do the job:

import git
repo = git.Repo("./repo")
tree = repo.tree()
for blob in tree:
    commit = next(repo.iter_commits(paths=blob.path, max_count=1))
    print(blob.path, commit.committed_date)

Note that commit.committed_date is in "seconds since the epoch" format.
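
Because the value is a plain count of seconds since the epoch, the standard library can turn it into a readable timestamp. A minimal sketch, not part of the original answer; the helper name format_commit_time and the sample value are made up for illustration:

```python
from datetime import datetime, timezone


def format_commit_time(seconds):
    """Convert seconds since the epoch (such as commit.committed_date)
    into an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()


# Hypothetical sample value, standing in for commit.committed_date.
print(format_commit_time(1455731888))
```

Passing tz=timezone.utc keeps the result independent of the machine's local time zone.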

Answered 2016-02-17T17:58:08.017
4

An interesting question. Below is a quick and dirty implementation. I used multiprocessing.Pool.imap_unordered() to start the subprocesses because it is convenient.

#!/usr/bin/env python
# vim:fileencoding=utf-8:ft=python
#
# Author: R.F. Smith <rsmith@xs4all.nl>
# Last modified: 2015-05-24 12:28:45 +0200
#
# To the extent possible under law, Roland Smith has waived all
# copyright and related or neighboring rights to gitdates.py. This
# work is published from the Netherlands. See
# http://creativecommons.org/publicdomain/zero/1.0/

"""For each file in a directory managed by git, get the short hash and
date of the most recent commit of that file."""

from __future__ import print_function
from multiprocessing import Pool
import os
import subprocess
import sys
import time

# Suppress annoying command prompts on ms-windows.
startupinfo = None
if os.name == 'nt':
    startupinfo = subprocess.STARTUPINFO()
    startupinfo.dwFlags |= subprocess.STARTF_USESHOWWINDOW


def main():
    """
    Entry point for gitdates.
    """
    checkfor(['git', '--version'])
    # Get a list of all files
    allfiles = []
    # Get a list of excluded files.
    if '.git' not in os.listdir('.'):
        print('This directory is not managed by git.')
        sys.exit(0)
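    exargs = ['git', 'ls-files', '-i', '-o', '--exclude-standard']
    # Decode so the names compare equal to str paths on Python 3.
    exc = subprocess.check_output(exargs, startupinfo=startupinfo)
    exc = set(exc.decode().splitlines())
    for root, dirs, files in os.walk('.'):
        for d in ['.git', '__pycache__']:
            try:
                dirs.remove(d)
            except ValueError:
                pass
        # git reports paths relative to the repository root, using '/'.
        paths = [os.path.join(root, f) for f in files]
        tmp = [p for p in paths
               if os.path.relpath(p).replace(os.sep, '/') not in exc]
        allfiles += tmp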
    # Gather the files' data using a Pool.
    p = Pool()
    filedata = [res for res in p.imap_unordered(filecheck, allfiles)
                if res is not None]
    p.close()
    # Sort the data (latest modified first) and print it
    filedata.sort(key=lambda a: a[2], reverse=True)
    dfmt = '%Y-%m-%d %H:%M:%S %Z'
    for name, tag, date in filedata:
        print('{}|{}|{}'.format(name, tag, time.strftime(dfmt, date)))


def checkfor(args, rv=0):
    """
    Make sure that a program necessary for using this script is available.
    Calls sys.exit when this is not the case.

    Arguments:
        args: String or list of strings of commands. A single string may
            not contain spaces.
        rv: Expected return value from invoking the command.
    """
    if isinstance(args, str):
        if ' ' in args:
            raise ValueError('no spaces in single command allowed')
        args = [args]
    try:
        with open(os.devnull, 'w') as bb:
            rc = subprocess.call(args, stdout=bb, stderr=bb,
                                 startupinfo=startupinfo)
        if rc != rv:
            raise OSError
    except OSError as oops:
        outs = "Required program '{}' not found: {}."
        print(outs.format(args[0], oops.strerror))
        sys.exit(1)


def filecheck(fname):
    """
    Start a git process to get file info: the filename, the abbreviated
    commit hash and the author date (seconds since the epoch, converted
    to a UTC struct_time).

    Arguments:
        fname: Name of the file to check.

    Returns:
        A 3-tuple containing the file name, latest short hash and latest
        commit date.
    """
    args = ['git', '--no-pager', 'log', '-1', '--format=%h|%at', fname]
    try:
        b = subprocess.check_output(args, startupinfo=startupinfo)
        data = b.decode()[:-1]
        h, t = data.split('|')
        out = (fname[2:], h, time.gmtime(float(t)))
    except (subprocess.CalledProcessError, ValueError):
        return None
    return out


if __name__ == '__main__':
    main()

Sample output:

serve-git|8d92934|2012-08-31 21:21:38 +0200
setres|8d92934|2012-08-31 21:21:38 +0200
mydec|e711e27|2008-04-09 21:26:05 +0200
sync-iaudio|8d92934|2012-08-31 21:21:38 +0200
tarenc|8d92934|2012-08-31 21:21:38 +0200
keypress.sh|a5c0fb5|2009-09-29 00:00:51 +0200
tolower|8d92934|2012-08-31 21:21:38 +0200
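
The filecheck function above depends on git printing one line of the form <short hash>|<author timestamp>. A minimal sketch of just that parsing step, using a hypothetical helper parse_log_line and a sample line constructed for illustration (the timestamp is an assumption chosen to match the first line of the sample output):

```python
import time


def parse_log_line(raw):
    """Split b'<hash>|<epoch seconds>\\n' into (hash, UTC struct_time)."""
    text = raw.decode().strip()
    short_hash, stamp = text.split('|')
    return short_hash, time.gmtime(float(stamp))


# Illustrative input, shaped like the output of
# git log -1 --format=%h|%at <file>
short_hash, when = parse_log_line(b'8d92934|1346440898\n')
print(short_hash, time.strftime('%Y-%m-%d %H:%M:%S', when))
```

An empty result (for example, for an untracked file) makes split('|') raise ValueError, which is why the script catches that exception and returns None.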

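Edit: updated to use os.devnull (which also works on ms-windows) instead of /dev/null.

Edit2: used startupinfo to suppress command prompts popping up on ms-windows.

Edit3: used __future__ to make this compatible with both Python 2 and 3. Tested with 2.7.9 and 3.4.3. Now also available on github.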
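
The script starts one git process per file. A possible alternative, which is not what the script does, is a single git log call with --name-only that is parsed afterwards. The sketch below is only an illustration under stated assumptions: the names parse_log and latest_commit_times are hypothetical, the '@' prefix is an arbitrary marker (so it assumes no path starts with '@'), and it assumes git does not quote the paths (paths with unusual characters may be quoted by default).

```python
import subprocess


def parse_log(text):
    """Map each path to the timestamp of the newest commit listing it.

    Expects the text produced by: git log --format=@%at --name-only
    """
    latest = {}
    stamp = None
    for line in text.splitlines():
        if line.startswith('@'):
            # A marker line starts a new commit; remember its author time.
            stamp = int(line[1:])
        elif line and stamp is not None:
            # git log lists newest commits first, so keep the first hit.
            latest.setdefault(line, stamp)
    return latest


def latest_commit_times(repo_dir='.'):
    """Run git once in repo_dir and return {path: seconds since epoch}."""
    args = ['git', '--no-pager', 'log', '--format=@%at', '--name-only']
    out = subprocess.check_output(args, cwd=repo_dir)
    return parse_log(out.decode())
```

Note that the result also contains files that were deleted or renamed later, so it may need filtering against the current file list.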

Answered 2012-10-28T13:30:34.623
0

You can use the GitPython library.

Answered 2012-10-27T21:53:46.513
0

This works for me:

http://gitpython.readthedocs.io/en/stable/tutorial.html#the-tree-object

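According to the documentation, a tree only allows direct access to its immediate child entries, so use the traverse method to get an iterator that retrieves entries recursively.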

It creates a generator object that does the work:

print(tree.traverse())
<generator object traverse at 0x0000000004129DC8>

for blob in tree.traverse():
    commit = next(repo.iter_commits(paths=blob.path, max_count=1))
    print(blob.path, commit.committed_date)
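
Note that traverse() yields subtrees (directories) as well as files. A minimal sketch that keeps only blobs and collects the results in a dictionary, assuming GitPython is installed; the function name latest_commit_dates and the "./repo" path in the usage comment are placeholders:

```python
def latest_commit_dates(repo_path):
    """Return {file path: datetime of the most recent commit touching it}."""
    import git  # GitPython (third-party); imported here so the sketch loads without it

    repo = git.Repo(repo_path)
    dates = {}
    for item in repo.tree().traverse():
        if item.type != 'blob':
            # Skip directories (trees) and submodules; only files are wanted.
            continue
        commit = next(repo.iter_commits(paths=item.path, max_count=1))
        dates[item.path] = commit.committed_datetime
    return dates


# Usage, assuming a repository at ./repo:
#     for path, when in sorted(latest_commit_dates("./repo").items()):
#         print(path, when.isoformat())
```

This still walks the history once per file, so it may be slow for repositories with thousands of files.
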
Answered 2018-05-03T18:53:45.820