When loading a large amount of data into Neo4j AuraDB, you will typically use LOAD CSV or some other batch script, with MATCH/CREATE or MERGE statements to insert the nodes and relationships into the AuraDB Instance.
If you have just created an empty AuraDB Instance and started your data loading job, you may soon notice that the load becomes unbearably slow, and that it keeps getting slower as more data is loaded. This is especially noticeable with a large data set.
The reason is that each MERGE statement must first check whether the node or relationship already exists, so that it does not create a duplicate. If you have not created indexes or constraints on the properties used in the MERGE pattern, that check cannot use an index: every MERGE has to scan all nodes with the given label (or all nodes in the database when no label is specified). As the AuraDB Instance fills up, each scan covers more nodes, which is why the load keeps slowing down.
Long story short, you need to create indexes or constraints on the properties that your data loading Cypher statements use as search criteria. This ensures that every lookup uses an index search, and the data loading job will run a lot faster.
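As a minimal sketch of this step, assume the load merges `Person` nodes on an `id` property (both names are hypothetical and stand in for your own labels and keys). The statements use Neo4j 5 syntax; a uniqueness constraint is backed by an index, so it also speeds up the MERGE lookup.

```cypher
// Hypothetical label and property: replace Person/id with the key your MERGE uses.
// A uniqueness constraint also creates a backing index on the property.
CREATE CONSTRAINT person_id_unique IF NOT EXISTS
FOR (p:Person) REQUIRE p.id IS UNIQUE;

// If the property is not unique, a plain index serves the same lookup.
CREATE INDEX person_email_index IF NOT EXISTS
FOR (p:Person) ON (p.email);

// Check the query plan without running the statement.
EXPLAIN
MERGE (p:Person {id: 'example-id'})
RETURN p;
```

With the constraint in place, the plan shown by EXPLAIN should contain an index seek operator (such as NodeUniqueIndexSeek) rather than NodeByLabelScan or AllNodesScan.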
Every index that exists while data is being loaded has to be updated on each write. So it is better NOT to create the other indexes (those the loading queries do not need) until the data loading job is done.
To summarize the best approach for data loading:
- Create all necessary indexes or constraints for data loading queries.
- Run the data loading job.
- Create all other indexes and constraints.
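The three steps above could look roughly like the following sketch. The CSV URLs, the `Person` label, the `KNOWS` relationship type, and all column and property names are hypothetical placeholders, and the syntax assumes Neo4j 5. Each statement is meant to be run on its own, in order.

```cypher
// Step 1: create only the constraint that the loading queries rely on.
CREATE CONSTRAINT person_id_unique IF NOT EXISTS
FOR (p:Person) REQUIRE p.id IS UNIQUE;

// Step 2a: load nodes. MERGE looks up p.id through the constraint's index.
LOAD CSV WITH HEADERS FROM 'https://example.com/people.csv' AS row
MERGE (p:Person {id: row.id})
SET p.name = row.name;

// Step 2b: load relationships. Both endpoints are found by indexed lookups
// before the relationship itself is merged.
LOAD CSV WITH HEADERS FROM 'https://example.com/friendships.csv' AS row
MATCH (a:Person {id: row.from_id})
MATCH (b:Person {id: row.to_id})
MERGE (a)-[:KNOWS]->(b);

// Step 3: after the load has finished, add the indexes that only
// the application's read queries need.
CREATE INDEX person_name_index IF NOT EXISTS
FOR (p:Person) ON (p.name);

// Optional: confirm that the new indexes have finished populating.
SHOW INDEXES YIELD name, state, populationPercent;
```

Indexes created in step 3 are populated in the background, so it is worth waiting until they report the ONLINE state before relying on them.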