When you have a large dataset in CSV or JSON format that you would like to import into an AuraDB Instance, you can always use Cypher's LOAD CSV statement to import the data.
A more efficient way is to use a very popular APOC procedure, `apoc.periodic.iterate()`. It accepts several useful configuration parameters (`batchSize`, `parallel`, `retries`, etc.) that you can tune to make your job run faster and more efficiently.
When you run this procedure from a browser or cypher-shell on your local desktop against an AuraDB Instance, the task has to travel from your local environment over the internet to the AuraDB Instance running in the public cloud, which means network stability becomes another key factor to consider.
One of the best practices here is to use `apoc.periodic.submit()` to submit your long-running job to the AuraDB Instance, so that network stability is no longer a concern and everything runs on the AuraDB Instance in the background.
Normally, we would use `apoc.periodic.iterate()` to import a CSV file:
CALL apoc.periodic.iterate(
  // Driving query: stream the rows from the CSV file
  "LOAD CSV WITH HEADERS FROM 'https://storage.googleapis.com/basics.csv' AS row FIELDTERMINATOR '\t'
   RETURN row",
  // Update query: run once per row, committed in batches
  "MERGE (m:Movie {id: row.tconst})
   ON CREATE SET
     m.title = row.primaryTitle,
     m.releaseYear = row.startYear,
     m.genres = row.genres",
  // Commit every 1,000 rows and retry a failed batch up to 3 times
  {batchSize: 1000, retries: 3})
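
`apoc.periodic.iterate()` also returns summary columns such as `batches`, `total`, `failedBatches`, and `errorMessages`, so it is worth yielding them to confirm that every batch committed. A minimal sketch, reusing the same statement:

CALL apoc.periodic.iterate(
  "LOAD CSV WITH HEADERS FROM 'https://storage.googleapis.com/basics.csv' AS row FIELDTERMINATOR '\t'
   RETURN row",
  "MERGE (m:Movie {id: row.tconst})
   ON CREATE SET
     m.title = row.primaryTitle,
     m.releaseYear = row.startYear,
     m.genres = row.genres",
  {batchSize: 1000, retries: 3})
// Summary columns reported by the procedure
YIELD batches, total, failedBatches, errorMessages
RETURN batches, total, failedBatches, errorMessages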
When we need to import a large dataset into an AuraDB Instance, we can wrap this statement in `apoc.periodic.submit()`:
CALL apoc.periodic.submit(
  // A unique name for the background job
  "insert nodes",
  // The statement to run in the background; note the extra escaping
  // needed for the quotes nested inside the outer string
  "CALL apoc.periodic.iterate(
    'LOAD CSV WITH HEADERS FROM \"https://storage.googleapis.com/basics.csv\" AS row FIELDTERMINATOR \\'\t\\' RETURN row',
    'MERGE (m:Movie {id: row.tconst})
     ON CREATE SET
       m.title = row.primaryTitle,
       m.releaseYear = row.startYear,
       m.genres = row.genres',
    {batchSize: 1000, retries: 3})"
)
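
`apoc.periodic.submit()` returns immediately and the job keeps running on the AuraDB Instance in the background, so you need a way to check on it. A minimal sketch using the companion APOC procedures, with the job name "insert nodes" from the example above:

// List background jobs, including whether each one is done or was cancelled
CALL apoc.periodic.list();

// Cancel the job by the name it was submitted under, if necessary
CALL apoc.periodic.cancel('insert nodes');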