My Google Apps Script iterates through the user's Google Drive files, copying and sometimes moving files to other folders. The script always stops after 5 minutes with no error message in the log.
I am sorting tens or sometimes thousands of files in one run.
Are there any settings or workarounds?
One thing you could do (though this of course depends on what you are trying to accomplish) is the following.
This is not a one-size-fits-all solution, and if you post your code people will be better able to help you.
Here is a simplified code excerpt from a script that I use every day:
function runMe() {
  // Leave a safety margin below the 6-minute quota so there is time to schedule the next run.
  var MAX_RUNNING_TIME = 4.5 * 60 * 1000;      // milliseconds
  var REASONABLE_TIME_TO_WAIT = 5 * 60 * 1000; // see note #1 below
  var startTime = (new Date()).getTime();
  //do some work here
  var scriptProperties = PropertiesService.getScriptProperties();
  // Properties come back as strings, so convert back to a number (see note #4 below).
  var startRow = parseInt(scriptProperties.getProperty('start_row'), 10) || 1;
  var size = 1000; // total number of items to process; replace with your own count
  for (var ii = startRow; ii <= size; ii++) {
    var currTime = (new Date()).getTime();
    if (currTime - startTime >= MAX_RUNNING_TIME) {
      // Out of time: save our position and schedule a trigger to pick up from here.
      scriptProperties.setProperty("start_row", ii);
      ScriptApp.newTrigger("runMe")
          .timeBased()
          .at(new Date(currTime + REASONABLE_TIME_TO_WAIT))
          .create();
      break;
    } else {
      doSomeWork();
    }
  }
  //do some more work here
}
Note #1: The variable REASONABLE_TIME_TO_WAIT should be large enough for the new trigger to fire. (I set it to 5 minutes, but I think it could be less than that.)
Note #2: doSomeWork() must be a function that executes relatively quickly (I would say less than 1 minute).
Note #3: Google has deprecated Script Properties and introduced the Properties Service in its place; the function has been modified accordingly.
Note #4: When the function is called the second time, it gets the i-th value of the for loop back as a string, so it has to be converted back to an integer (the parseInt call above takes care of this).
Scripts have a maximum execution time of 6 minutes per execution
- https://developers.google.com/apps-script/guides/services/quotas
But there are other limitations to familiarize yourself with. For example, you're only allowed a total trigger runtime of 1 hour per day, so you can't just break up a long function into 12 different 5-minute blocks.
That said, there are very few reasons why you would actually need to take six minutes to execute. JavaScript should have no problem sorting thousands of rows of data in a couple of seconds. What is far more likely to hurt your performance are service calls to Google Apps itself.
You can write scripts to take maximum advantage of the built-in caching by minimizing the number of reads and writes. Alternating read and write commands is slow. To speed up a script, read all data into an array with one command, perform any operations on the data in the array, and write the data out with one command.
- https://developers.google.com/apps-script/best_practices
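To make that advice concrete, here is a minimal sketch that sorts an entire sheet in memory with a single read and a single write; sorting by the first column is an assumption for illustration:
function sortSheetInMemory() {
  var range = SpreadsheetApp.getActiveSheet().getDataRange();
  var values = range.getValues();   // one read for all the data
  values.sort(function(a, b) {      // in-memory sort handles thousands of rows in seconds
    return a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0; // by first column (assumes no header row)
  });
  range.setValues(values);          // one write for all the data
}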
The best thing you can do is reduce the number of service calls. Google facilitates this by allowing batch versions of most of their API calls.
As a trivial example, instead of this:
for (var i = 1; i <= 100; i++) {
  SpreadsheetApp.getActiveSheet().deleteRow(i);
}
do this:
SpreadsheetApp.getActiveSheet().deleteRows(1, 100);
In the first loop, not only did you need 100 calls to deleteRow on the sheet, you also needed to get the active sheet 100 times. The second variation should perform several orders of magnitude better than the first.
You should also be very careful not to go back and forth frequently between reading and writing. Not only will you lose the potential gains of batch operations, but Google won't be able to use its built-in caching.
Every time you do a read, we must first empty (commit) the write cache to ensure that you're reading the latest data (you can force a write of the cache by calling SpreadsheetApp.flush()). Likewise, every time you do a write, we have to throw away the read cache because it's no longer valid. Therefore if you can avoid interleaving reads and writes, you'll get the full benefit of the cache.
- http://googleappsscript.blogspot.com/2010/06/optimizing-spreadsheet-operations.html
So, for example, instead of this:
sheet.getRange("A1").setValue(1);
sheet.getRange("B1").setValue(2);
sheet.getRange("C1").setValue(3);
sheet.getRange("D1").setValue(4);
do this:
sheet.getRange("A1:D1").setValues([[1,2,3,4]]);
As a last resort, if your function really cannot finish within six minutes, you can chain together calls or break up your function to work on a smaller segment of data.
You can store data in the Cache Service (temporary) or the Properties Service (permanent) buckets for retrieval across executions (since Google Apps Script has a stateless execution model).
If you want to kick off another event, you can create your own trigger with the Trigger Builder class or set up a recurring trigger on a tight timetable.
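For illustration, here are two minimal Trigger Builder sketches; the handler name resumeWork is a hypothetical placeholder:
// One-off trigger: run resumeWork() once, roughly a minute from now.
ScriptApp.newTrigger('resumeWork')
    .timeBased()
    .after(60 * 1000)
    .create();

// Recurring trigger: run resumeWork() every 5 minutes until it is deleted.
ScriptApp.newTrigger('resumeWork')
    .timeBased()
    .everyMinutes(5)
    .create();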
Also, try to minimize the amount of calls to Google services. For example, if you want to change a range of cells in a spreadsheet, don't read each one, mutate it, and store it back. Instead, read the whole range into memory (using Range.getValues()), mutate it, and store all of it at once (using Range.setValues()).
This should save you a lot of execution time.
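A minimal sketch of that read-mutate-write pattern, assuming you want to double every number in A1:C10:
function doubleRange() {
  var range = SpreadsheetApp.getActiveSheet().getRange("A1:C10");
  var values = range.getValues();   // one service call to read everything
  for (var r = 0; r < values.length; r++) {
    for (var c = 0; c < values[r].length; c++) {
      values[r][c] = values[r][c] * 2;  // mutate in memory, no service calls
    }
  }
  range.setValues(values);          // one service call to write everything back
}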
Anton Soradoi's answer seems OK, but consider using the Cache Service instead of storing data in a temporary sheet.
function getRssFeed() {
  var cache = CacheService.getPublicCache();
  var cached = cache.get("rss-feed-contents");
  if (cached != null) {
    return cached;
  }
  var result = UrlFetchApp.fetch("http://example.com/my-slow-rss-feed.xml"); // takes 20 seconds
  var contents = result.getContentText();
  cache.put("rss-feed-contents", contents, 1500); // cache for 25 minutes
  return contents;
}
Also note that, as of April 2014, the limitation on script runtime is 6 minutes / execution.
G Suite Business / Enterprise / Education and Early Access users:
As of August 2018, the maximum script runtime for these users is now set to 30 minutes.
Figure out a way to split up your work so that each piece takes less than 6 minutes, as that is the limit for any script. On the first pass, you can iterate over the files and folders and store the list in a spreadsheet, and add a time-driven trigger for part 2.
In part 2, delete each entry in the list as you process it. When there are no items left in the list, delete the trigger.
This is how I process a sheet of about 1500 rows that gets spread out to about a dozen different spreadsheets. Because of the number of calls to spreadsheets, it times out, but continues when the trigger runs again.
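A minimal sketch of that part-2 pass, assuming part 1 has already written one file ID per row to a sheet named 'queue'; the sheet name and the processFile() helper are hypothetical placeholders:
function processQueue() {
  var sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('queue');
  var startTime = (new Date()).getTime();
  // Work from the bottom up so deleting a row doesn't shift the rows still pending.
  for (var row = sheet.getLastRow(); row >= 1; row--) {
    if ((new Date()).getTime() - startTime > 4.5 * 60 * 1000) {
      return; // out of time; the time-driven trigger will run this again
    }
    var fileId = sheet.getRange(row, 1).getValue();
    processFile(fileId);   // your per-file work
    sheet.deleteRow(row);  // entry done, remove it from the list
  }
  // No items left: delete the time-driven trigger(s) pointing at this function.
  ScriptApp.getProjectTriggers().forEach(function(trigger) {
    if (trigger.getHandlerFunction() === 'processQueue') {
      ScriptApp.deleteTrigger(trigger);
    }
  });
}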
If you are using G Suite Business or Enterprise edition, you can register for early access to App Maker, and after App Maker is enabled your script runtime will increase from 6 minutes to 30 minutes :)
More details about App Maker can be found here.
I have used ScriptDB to save my place while processing a large amount of information in a loop. The script can/does exceed the 5-minute limit. By updating the ScriptDb during each run, the script can read the state from the db and pick up where it left off until all processing is complete. Give this strategy a try and I think you'll be pleased with the results.
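ScriptDB has since been shut down, so here is a minimal sketch of the same save-your-place idea using the Properties Service instead; doWork() and the item count are hypothetical placeholders:
function resumableLoop() {
  var props = PropertiesService.getScriptProperties();
  var startTime = (new Date()).getTime();
  // Properties come back as strings (or null on the first run), hence parseInt.
  var i = parseInt(props.getProperty('loop_index'), 10) || 0;
  var total = 10000; // replace with however many items you need to process
  for (; i < total; i++) {
    doWork(i); // your per-item processing
    if ((new Date()).getTime() - startTime > 4.5 * 60 * 1000) {
      props.setProperty('loop_index', String(i + 1)); // save our place
      ScriptApp.newTrigger('resumableLoop')
          .timeBased()
          .after(60 * 1000) // resume in about a minute
          .create();
      return;
    }
  }
  props.deleteProperty('loop_index'); // finished: clear the saved state
}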
Here is an approach based very heavily on Dmitry Kostyuk's absolutely excellent article on the subject.
It differs in that it doesn't attempt to time execution and exit gracefully. Rather, it deliberately spawns a new thread every minute, and lets the threads run until they are timed out by Google. This gets around the maximum execution time limit, and speeds things up by running the processing in several threads in parallel. (This speeds things up even if you are not hitting execution time limits.)
It tracks the task status in script properties, plus a semaphore to ensure that no two threads are editing the task status at any one time. (It uses several properties, as each is limited to 9k.)
I have tried to mimic the Google Apps Script iterator.next() API, but I cannot use iterator.hasNext(), as that would not be thread-safe (see TOCTOU). It uses a couple of facade classes at the bottom.
I would be immensely grateful for any suggestions. This is working well for me, halving the processing time by spawning three parallel threads to run through a directory of documents. You could spawn 20 within quota, but this was ample for my use case.
The class is designed to be drop-in, usable for any purpose without modification. The only thing the user must do is, when processing a file, delete any outputs left over from previous, timed-out attempts. If a processing task was timed out by Google before it completed, the iterator will return a given fileId more than once.
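For instance, a minimal sketch of that cleanup step, assuming each source file yields one output file with a predictable name in a known destination folder (both assumptions for illustration):
// Trash any output left behind by a previous, timed-out attempt at this file.
const deleteStaleOutput = (destFolder, outputName) => {
  const stale = destFolder.getFilesByName(outputName)
  while (stale.hasNext()) {
    stale.next().setTrashed(true)
  }
}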
To make the logging easy to silence, it all goes through the log() function at the bottom.
This is how you use it:
const main = () => {
const srcFolder = DriveApp.getFoldersByName('source folder').next()
const processingMessage = processDocuments(srcFolder, 'spawnConverter')
log('main() finished with message', processingMessage)
}
const spawnConverter = e => {
const processingMessage = processDocuments()
log('spawnConverter() finished with message', processingMessage)
}
const processDocuments = (folder = null, spawnFunction = null) => {
// folder and spawnFunction are only passed the first time we trigger this function,
// threads spawned by triggers pass nothing.
// 10,000 is the maximum number of milliseconds a file can take to process.
const pfi = new ParallelFileIterator(10000, MimeType.GOOGLE_DOCS, folder, spawnFunction)
let fileId = pfi.nextId()
const doneDocs = []
while (fileId) {
const fileRelativePath = pfi.getFileRelativePath(fileId)
const doc = DocumentApp.openById(fileId)
const mc = MarkupConverter(doc)
// This is my time-consuming task:
const mdContent = mc.asMarkdown(doc)
pfi.completed(fileId)
doneDocs.push([...fileRelativePath, doc.getName() + '.md'].join('/'))
fileId = pfi.nextId()
}
return ('This thread did:\r' + doneDocs.join('\r'))
}
Here is the code:
const ParallelFileIterator = (function() {
/**
* Scans a folder, depth first, and returns a file at a time of the given mimeType.
* Uses ScriptProperties so that this class can be used to process files by many threads in parallel.
* It is the responsibility of the caller to tidy up artifacts left behind by processing threads that were timed out before completion.
* This class will repeatedly dispatch a file until .completed(fileId) is called.
* It will wait maxDurationOneFileMs before re-dispatching a file.
* Note that Google Apps kills scripts after 6 mins, or 30 mins if you're using a Workspace account, or 45 seconds for a simple trigger, and permits max 30
* scripts in parallel, 20 triggers per script, and 90 mins or 6hrs of total trigger runtime depending if you're using a Workspace account.
* Ref: https://developers.google.com/apps-script/guides/services/quotas
* @param {Number} maxDurationOneFileMs A generous estimate of the longest a file can take to process.
* @param {string} mimeType The mimeType of the files required.
* @param {Folder} parentFolder The top folder containing all the files to process. Only passed in by the first thread. Later spawned threads pass null (the files have already been listed and stored in properties).
* @param {string} spawnFunction The name of the function that will spawn new processing threads. Only passed in by the first thread. Later spawned threads pass null (a trigger can't create a trigger).
*/
class ParallelFileIterator {
constructor(
maxDurationOneFileMs,
mimeType,
parentFolder = null,
spawnFunction = null,
) {
log(
'Enter ParallelFileIterator constructor',
maxDurationOneFileMs,
mimeType,
spawnFunction,
parentFolder ? parentFolder.getName() : null,
)
// singleton
if (ParallelFileIterator.instance) return ParallelFileIterator.instance
if (parentFolder) {
_cleanUp()
const t0 = Now.asTimestamp()
_getPropsLock(maxDurationOneFileMs)
const t1 = Now.asTimestamp()
const { fileIds, fileRelativePaths } = _catalogFiles(
parentFolder,
mimeType,
)
const t2 = Now.asTimestamp()
_setQueues(fileIds, [])
const t3 = Now.asTimestamp()
this.fileRelativePaths = fileRelativePaths
ScriptProps.setAsJson(_propsKeyFileRelativePaths, fileRelativePaths)
const t4 = Now.asTimestamp()
_releasePropsLock()
const t5 = Now.asTimestamp()
if (spawnFunction) {
// only triggered on the first thread
const trigger = Trigger.create(spawnFunction, 1)
log(
`Trigger once per minute: UniqueId: ${trigger.getUniqueId()}, EventType: ${trigger.getEventType()}, HandlerFunction: ${trigger.getHandlerFunction()}, TriggerSource: ${trigger.getTriggerSource()}, TriggerSourceId: ${trigger.getTriggerSourceId()}.`,
)
}
log(
`PFI instantiated for the first time, has found ${
fileIds.length
} documents to process. getPropsLock took ${t1 -
t0}ms, _catalogFiles took ${t2 - t1}ms, setQueues took ${t3 -
t2}ms, setAsJson took ${t4 - t3}ms, releasePropsLock took ${t5 -
t4}ms, trigger creation took ${Now.asTimestamp() - t5}ms.`,
)
} else {
const t0 = Now.asTimestamp()
// wait for first thread to set up Properties
while (!ScriptProps.getJson(_propsKeyFileRelativePaths)) {
Utilities.sleep(250)
}
this.fileRelativePaths = ScriptProps.getJson(_propsKeyFileRelativePaths)
const t1 = Now.asTimestamp()
log(
`PFI instantiated again to run in parallel. getJson(paths) took ${t1 -
t0}ms`,
)
}
_internals.set(this, { maxDurationOneFileMs: maxDurationOneFileMs })
// to get: _internal(this, 'maxDurationOneFileMs')
ParallelFileIterator.instance = this
return ParallelFileIterator.instance
}
nextId() {
// returns false if there are no more documents
const maxDurationOneFileMs = _internals.get(this).maxDurationOneFileMs
_getPropsLock(maxDurationOneFileMs)
let { pending, dispatched } = _getQueues()
log(
`PFI.nextId: ${pending.length} files pending, ${
dispatched.length
} dispatched, ${Object.keys(this.fileRelativePaths).length -
pending.length -
dispatched.length} completed.`,
)
if (pending.length) {
// get first pending Id, (ie, deepest first)
const nextId = pending.shift()
dispatched.push([nextId, Now.asTimestamp()])
_setQueues(pending, dispatched)
_releasePropsLock()
return nextId
} else if (dispatched.length) {
log(`PFI.nextId: Get first dispatched Id, (ie, oldest first)`)
let startTime = dispatched[0][1]
let timeToTimeout = startTime + maxDurationOneFileMs - Now.asTimestamp()
while (dispatched.length && timeToTimeout > 0) {
log(
`PFI.nextId: None are pending, and the oldest dispatched one hasn't yet timed out, so wait ${timeToTimeout}ms to see if it will`,
)
_releasePropsLock()
Utilities.sleep(timeToTimeout + 500)
_getPropsLock(maxDurationOneFileMs)
;({ pending, dispatched } = _getQueues())
if (pending && dispatched) {
if (dispatched.length) {
startTime = dispatched[0][1]
timeToTimeout =
startTime + maxDurationOneFileMs - Now.asTimestamp()
}
}
}
// We currently still have the PropsLock
if (dispatched.length) {
const nextId = dispatched.shift()[0]
log(
`PFI.nextId: Document id ${nextId} has timed out; reset start time, move to back of queue, and re-dispatch`,
)
dispatched.push([nextId, Now.asTimestamp()])
_setQueues(pending, dispatched)
_releasePropsLock()
return nextId
}
}
log(`PFI.nextId: Both queues empty, all done!`)
;({ pending, dispatched } = _getQueues())
if (pending.length || dispatched.length) {
log(
"ERROR: All documents should be completed, but they're not. Giving up.",
pending,
dispatched,
)
}
_cleanUp()
return false
}
completed(fileId) {
_getPropsLock(_internals.get(this).maxDurationOneFileMs)
const { pending, dispatched } = _getQueues()
const newDispatched = dispatched.filter(el => el[0] !== fileId)
if (dispatched.length !== newDispatched.length + 1) {
log(
'ERROR: A document was completed, but not found in the dispatched list.',
fileId,
pending,
dispatched,
)
}
if (pending.length || newDispatched.length) {
_setQueues(pending, newDispatched)
_releasePropsLock()
} else {
log(`PFI.completed: Both queues empty, all done!`)
_cleanUp()
}
}
getFileRelativePath(fileId) {
return this.fileRelativePaths[fileId]
}
}
// ============= PRIVATE MEMBERS ============= //
const _propsKeyLock = 'PropertiesLock'
const _propsKeyDispatched = 'Dispatched'
const _propsKeyPending = 'Pending'
const _propsKeyFileRelativePaths = 'FileRelativePaths'
// Not really necessary for a singleton, but in case code is changed later
var _internals = new WeakMap()
const _cleanUp = (exceptProp = null) => {
log('Enter _cleanUp', exceptProp)
Trigger.deleteAll()
if (exceptProp) {
ScriptProps.deleteAllExcept(exceptProp)
} else {
ScriptProps.deleteAll()
}
}
const _catalogFiles = (folder, mimeType, relativePath = []) => {
// returns IDs of all matching files in folder, depth first
log(
'Enter _catalogFiles',
folder.getName(),
mimeType,
relativePath.join('/'),
)
let fileIds = []
let fileRelativePaths = {}
const folders = folder.getFolders()
let subFolder
while (folders.hasNext()) {
subFolder = folders.next()
const results = _catalogFiles(subFolder, mimeType, [
...relativePath,
subFolder.getName(),
])
fileIds = fileIds.concat(results.fileIds)
fileRelativePaths = { ...fileRelativePaths, ...results.fileRelativePaths }
}
const files = folder.getFilesByType(mimeType)
while (files.hasNext()) {
const fileId = files.next().getId()
fileIds.push(fileId)
fileRelativePaths[fileId] = relativePath
}
return { fileIds: fileIds, fileRelativePaths: fileRelativePaths }
}
const _getQueues = () => {
const pending = ScriptProps.getJson(_propsKeyPending)
const dispatched = ScriptProps.getJson(_propsKeyDispatched)
log('Exit _getQueues', pending, dispatched)
// Note: Empty lists in Javascript are truthy, but if Properties have been deleted by another thread they'll be null here, which are falsey
return { pending: pending || [], dispatched: dispatched || [] }
}
const _setQueues = (pending, dispatched) => {
log('Enter _setQueues', pending, dispatched)
ScriptProps.setAsJson(_propsKeyPending, pending)
ScriptProps.setAsJson(_propsKeyDispatched, dispatched)
}
const _getPropsLock = maxDurationOneFileMs => {
// will block until lock available or lock times out (because a script may be killed while holding a lock)
const t0 = Now.asTimestamp()
while (
ScriptProps.getNum(_propsKeyLock) + maxDurationOneFileMs >
Now.asTimestamp()
) {
Utilities.sleep(2000)
}
ScriptProps.set(_propsKeyLock, Now.asTimestamp())
log(`Exit _getPropsLock: took ${Now.asTimestamp() - t0}ms`)
}
const _releasePropsLock = () => {
ScriptProps.delete(_propsKeyLock)
log('Exit _releasePropsLock')
}
return ParallelFileIterator
})()
const log = (...args) => {
// easier to turn off, json harder to read but easier to hack with
console.log(args.map(arg => JSON.stringify(arg)).join(';'))
}
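// NOTE: The listing references a Now facade that is not shown above; this is a
// minimal assumed implementation so the code runs as-is.
class Now {
  static asTimestamp() {
    return new Date().getTime()
  }
}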
class Trigger {
// Script triggering facade
static create(functionName, everyMinutes) {
return ScriptApp.newTrigger(functionName)
.timeBased()
.everyMinutes(everyMinutes)
.create()
}
static delete(e) {
if (typeof e !== 'object') return log(`${e} is not an event object`)
if (!e.triggerUid)
return log(`${JSON.stringify(e)} doesn't have a triggerUid`)
ScriptApp.getProjectTriggers().forEach(trigger => {
if (trigger.getUniqueId() === e.triggerUid) {
log('deleting trigger', e.triggerUid)
return ScriptApp.deleteTrigger(trigger)
}
})
}
static deleteAll() {
// Deletes all triggers in the current project.
var triggers = ScriptApp.getProjectTriggers()
for (var i = 0; i < triggers.length; i++) {
ScriptApp.deleteTrigger(triggers[i])
}
}
}
class ScriptProps {
// properties facade
static set(key, value) {
if (value === null || value === undefined) {
ScriptProps.delete(key)
} else {
PropertiesService.getScriptProperties().setProperty(key, value)
}
}
static getStr(key) {
return PropertiesService.getScriptProperties().getProperty(key)
}
static getNum(key) {
// missing key returns Number(null), ie, 0
return Number(ScriptProps.getStr(key))
}
static setAsJson(key, value) {
return ScriptProps.set(key, JSON.stringify(value))
}
static getJson(key) {
return JSON.parse(ScriptProps.getStr(key))
}
static delete(key) {
PropertiesService.getScriptProperties().deleteProperty(key)
}
static deleteAll() {
PropertiesService.getScriptProperties().deleteAllProperties()
}
static deleteAllExcept(key) {
PropertiesService.getScriptProperties()
.getKeys()
.forEach(curKey => {
if (curKey !== key) ScriptProps.delete(curKey)
})
}
}
If you are using G Suite as a Business, Enterprise or EDU customer, the execution time for running scripts is set to:
30 min / execution
See: https://developers.google.com/apps-script/guides/services/quotas
The idea is to exit gracefully from the script, save your progress, create a trigger to start again from where you left off, repeat as many times as necessary, and then, once finished, clean up the trigger and any temporary files.
Here is a detailed article on this very topic.
As many people have mentioned, the generic solution to this problem is to execute your method across multiple sessions. I found it to be a common problem that I have a bunch of iterations I need to loop over, and I don't want the hassle of writing/maintaining the boilerplate of creating new sessions.
Therefore I created a general-purpose solution:
/**
* Executes the given function across multiple sessions to ensure there are no timeouts.
*
* See https://stackoverflow.com/a/71089403.
*
* @param {Array} items - The items to iterate over.
* @param {function(Int)} fn - The function to execute each time. Takes in an item from `items`.
* @param {String} resumeFunctionName - The name of the function (without arguments) to run between sessions. Typically this is the same name of the function that called this method.
* @param {Int} maxRunningTimeInSecs - The maximum number of seconds a script should be able to run. After this amount, it will start a new session. Note: This must be set to less than the actual timeout as defined in https://developers.google.com/apps-script/guides/services/quotas (e.g. 6 minutes), otherwise it can't set up the next call.
* @param {Int} timeBetweenIterationsInSeconds - The amount of time between iterations of sessions. Note that Google Apps Script won't honor this 100%, as if you choose a 1 second delay, it may actually take a minute or two before it actually executes.
*/
function iterateAcrossSessions(items, fn, resumeFunctionName, maxRunningTimeInSeconds = 5 * 60, timeBetweenIterationsInSeconds = 1) {
const PROPERTY_NAME = 'iterateAcrossSessions_index';
let scriptProperties = PropertiesService.getScriptProperties();
let startTime = (new Date()).getTime();
let startIndex = parseInt(scriptProperties.getProperty(PROPERTY_NAME));
if (Number.isNaN(startIndex)) {
startIndex = 0;
}
for (let i = startIndex; i < items.length; i++) {
console.info(`[iterateAcrossSessions] Executing for i = ${i}.`)
fn(items[i]);
let currentTime = (new Date()).getTime();
let elapsedTime = currentTime - startTime;
let maxRunningTimeInMilliseconds = maxRunningTimeInSeconds * 1000;
if (maxRunningTimeInMilliseconds <= elapsedTime) {
let newTime = new Date(currentTime + timeBetweenIterationsInSeconds * 1000);
console.info(`[iterateAcrossSessions] Creating new session for i = ${i+1} at ${newTime}, since elapsed time was ${elapsedTime}.`);
scriptProperties.setProperty(PROPERTY_NAME, i+1);
ScriptApp.newTrigger(resumeFunctionName).timeBased().at(newTime).create();
return;
}
}
console.log(`[iterateAcrossSessions] Done iterating over items.`);
// Reset the property here to ensure that the execution loop could be restarted.
scriptProperties.deleteProperty(PROPERTY_NAME);
}
You can now use it quite easily, like so:
let ITEMS = ['A', 'B', 'C'];
function execute() {
iterateAcrossSessions(
ITEMS,
(item) => {
console.log(`Hello world ${item}`);
},
"execute");
}
It will automatically execute the inner lambda for each value in ITEMS, seamlessly spreading across sessions as needed.
For example, if you use a maxRunningTime of 0 seconds, it will run across 4 sessions with the following outputs:
[iterateAcrossSessions] Executing for i = 0.
Hello world A
[iterateAcrossSessions] Creating new session for i = 1.
[iterateAcrossSessions] Executing for i = 1.
Hello world B
[iterateAcrossSessions] Creating new session for i = 2.
[iterateAcrossSessions] Executing for i = 2.
Hello world C
[iterateAcrossSessions] Creating new session for i = 3.
[iterateAcrossSessions] Done iterating over items.