Given the following document structure:

{
    "id": "123",
    "traits":
     {
        "abc": 6.5,
        "def": 66
     }
}

I need to iterate over the documents and remove some of the traits based on criteria. A document with all traits removed should be removed as well.

Finally, I need to keep track of how many traits and documents were removed.
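To make the intended semantics concrete, here is a minimal in-memory sketch in plain Python, with no database involved. The names `prune_traits` and `should_remove` and the threshold criterion in the usage note are hypothetical stand-ins for my real criteria, not part of python-arango or ArangoDB.

```python
from typing import Any, Callable, Dict, List, Tuple

Document = Dict[str, Any]


def prune_traits(
    docs: List[Document],
    should_remove: Callable[[str, Any], bool],
) -> Tuple[List[Document], int, int]:
    """Return (kept_docs, traits_removed, docs_removed) without mutating the input."""
    kept: List[Document] = []
    traits_removed = 0
    docs_removed = 0
    for doc in docs:
        traits = doc.get("traits", {})
        # Keep only the traits that do NOT match the removal criterion.
        remaining = {k: v for k, v in traits.items() if not should_remove(k, v)}
        traits_removed += len(traits) - len(remaining)
        if remaining:
            kept.append({**doc, "traits": remaining})
        else:
            # A document left with no traits is dropped entirely.
            docs_removed += 1
    return kept, traits_removed, docs_removed
```

For example, `prune_traits(docs, lambda name, value: value < 10)` applied to the document above would drop "abc", keep "def", and report one removed trait and zero removed documents.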

The update and removal operations should not be blocking and may be executed while these documents are being updated or queried.

I have implemented this in Python with python-arango: an update/replace query removes the traits, and a separate remove query then deletes the documents left without traits. The two queries are:

FOR some_doc IN some_collection
    FILTER <some filter>
    LET updated_doc = ...
    REPLACE some_doc WITH updated_doc IN some_collection OPTIONS { ignoreRevs: false, ignoreErrors: true }

FOR some_doc IN some_collection
    FILTER LENGTH(some_doc.traits) == 0
    REMOVE some_doc IN some_collection OPTIONS { ignoreRevs: false, ignoreErrors: true }

I then pull statistics from each returned cursor:

cursor = db.aql.execute(remove_traits_query)
stats = cursor.statistics()
modified = stats['modified']

The problem is that a lookup query started while this process is running can return a document with an empty traits object, because the first query has already stripped its traits but the second (remove) query has not yet completed. I need to prevent that.

I tried implementing a transaction then pulling job cursor stats post commit like this:

trx_db = db.begin_transaction(write=collection)
traits_removal_job = trx_db.aql.execute(remove_traits_query)
doc_deletion_job = trx_db.aql.execute(delete_query)
trx_db.commit()
stats = traits_removal_job.result().statistics()

but the cursors of the transaction jobs are empty. I suppose that's because ArangoDB executes a transaction as a single JavaScript function.

I could filter out empty traits on all lookup queries, but it would be better if I could execute the above removal/update operations either in a single query (impossible in ArangoDB per documentation), or in a transaction (no execution stats?).
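For reference, the lookup-side workaround would look roughly like the sketch below. The helper name `build_lookup_query` and the lookup by the `id` field are hypothetical; only the `LENGTH(doc.traits) > 0` filter is the point.

```python
from typing import Any, Dict, Tuple


def build_lookup_query(collection_name: str, doc_id: str) -> Tuple[str, Dict[str, Any]]:
    """Build an AQL lookup that skips documents whose traits object is empty."""
    query = """
    FOR doc IN @@collection
        FILTER doc.id == @doc_id
        FILTER LENGTH(doc.traits) > 0
        RETURN doc
    """
    # "@collection" binds the collection name referenced as @@collection in AQL.
    bind_vars = {"@collection": collection_name, "doc_id": doc_id}
    return query, bind_vars
```

The result would be passed to python-arango as `db.aql.execute(query, bind_vars=bind_vars)`. The drawback is that every lookup query in the code base has to carry this extra filter.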

Any suggestions?

Thanks in advance!
