When loading data into Neo4j Aura from CSV files, the simplest approach is usually to run Cypher's `LOAD CSV` command in the Neo4j Browser. However, there is another option that can be much faster, though it requires some preparation.
By combining the Neo4j bulk import tool with the push-to-cloud tool, you can load data directly from source CSV files. The bulk import tool runs through the neo4j-admin command, which is not available on Neo4j AuraDB Instances, so this method takes a few extra steps. In short, you import the data into a local Neo4j database and then use push-to-cloud to load that database into your AuraDB Instance. We'll walk through those steps in this tutorial.
Before we begin, you'll need the following, if you don't have them already:
- Neo4j Desktop. This will be used to install your local database and work with it.
- A copy of our Northwind database ZIP package (attached to this tutorial). We'll use this collection of CSV files in the example steps below. While this collection of CSV files is based on our published Northwind samples, it has some modifications for use in importing. We encourage you to review those changes so you can use them as a guide when working with your own data.
- Your own AuraDB Instance.
Starting your local database
After downloading Neo4j Desktop, click on the Add Graph button and choose Create a Local Graph. Give the graph a descriptive name, assign a password, and choose the version. For this tutorial, we'll assume you've chosen the latest patch release of 3.5 (3.5.14 at the time of writing). Don't start the graph after creating it, as the import tool needs to work on a stopped database.
Importing the data locally
Download the attached Northwind ZIP file. Then, in Neo4j Desktop, find the box for the graph you created previously and click the Manage button. Click Terminal to open a terminal window in the graph's install directory. In that terminal window, copy the downloaded ZIP archive to the import directory and unpack it. For example, on macOS you would run:
cp ~/Downloads/northwind.zip import/
cd import && unzip northwind.zip && cd ..
Each of the node CSV files has an ID field identified in its header line. For instance, the column in orders.csv named OrderID has been renamed to orderID:ID(OrderID). This is because neo4j-admin import needs to know which field holds the ID for each node.
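As a quick way to review those header changes, the sketch below prints the header fields of a CSV file that are marked as a node ID. The function name `id_fields` is hypothetical and not part of any Neo4j tooling; it only splits the first line on commas, so it assumes no header field contains a quoted comma.

```shell
# id_fields CSV_FILE
# Print every header field of CSV_FILE that carries an :ID marker,
# for example "orderID:ID(OrderID)". Prints nothing if there is none.
id_fields() {
  head -n 1 "$1" | tr ',' '\n' | awk '/:ID/ { print }'
}

# Possible usage, assuming the archive was unpacked into import/northwind:
# for f in import/northwind/*.csv; do echo "$f: $(id_fields "$f")"; done
```

A file that prints nothing here has no node ID field in its header, which is worth checking before running the import.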
This version of the Northwind CSV files includes additional files for the relationships between the types of nodes. For instance, looking again at orders.csv, each order recorded in the file has various ProductIDs associated with it. We'll want to connect these to the PRODUCT nodes that will be created from the products.csv file. To do so, the header of a relationship file must contain two mandatory fields: :START_ID and :END_ID. The orders_products.csv file does this for us:
:START_ID(OrderID),:END_ID(ProductID),UnitPrice,Quantity
10248,11,14,12
10248,42,9.8,10
10248,72,34.8,5
10249,14,18.6,9
10249,51,42.4,40
This tells the import tool that each relationship between ORDER and PRODUCT nodes goes from the order to the product, as orders contain products. It also creates UnitPrice and Quantity properties on that relationship. All of the other relationships we'll need are represented in other CSV files.
The import command itself tells Neo4j exactly what all of this data means and how to process it. A complete explanation of every option to the neo4j-admin import command is out of scope for this tutorial. However, we encourage you to review the documentation on the command for more details.
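Because a relationship can only be created between nodes that exist, it can help to check a relationship file against its node file before importing. The following is a minimal sketch, not part of the Neo4j tooling: the function name `missing_ids` and its column-number arguments are hypothetical, and the plain comma splitting assumes no field contains a quoted comma.

```shell
# missing_ids NODE_CSV NODE_ID_COLUMN REL_CSV REL_ID_COLUMN
# Print each distinct ID found in column REL_ID_COLUMN of REL_CSV that does
# not appear in column NODE_ID_COLUMN of NODE_CSV. Columns are 1-based.
missing_ids() {
  awk -F',' -v ncol="$2" -v rcol="$4" '
    FNR == 1  { next }                    # skip the header line of each file
    NR == FNR { seen[$ncol] = 1; next }   # first file: remember the node IDs
    !($rcol in seen) && !reported[$rcol]++ { print $rcol }
  ' "$1" "$3"
}
```

For example, `missing_ids import/northwind/products.csv 1 import/northwind/orders_products.csv 2` would list any ProductID used in orders_products.csv but absent from products.csv, assuming the product ID is the first column of products.csv (check the header line to confirm).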
Go back to your Neo4j Desktop terminal window and run:
bin/neo4j-admin import --ignore-missing-nodes --ignore-duplicate-nodes \
--nodes:CUSTOMER=import/northwind/customers.csv \
--nodes:PRODUCT=import/northwind/products.csv \
--nodes:SUPPLIER=import/northwind/suppliers.csv \
--nodes:EMPLOYEE=import/northwind/employees.csv \
--nodes:CATEGORY=import/northwind/categories.csv \
--nodes:ORDER=import/northwind/orders.csv \
--relationships:PRODUCT=import/northwind/orders_products.csv \
--relationships:SOLD=import/northwind/employee_sold.csv \
--relationships:PURCHASED=import/northwind/customer_orders.csv \
--relationships:SUPPLIES=import/northwind/supplier_products.csv \
--relationships:PART_OF=import/northwind/product_categories.csv \
--relationships:REPORTS_TO=import/northwind/employee_reports_to.csv
This tells the import tool which nodes to create, which CSV files to use for those nodes, and which relationships to create between them. The --ignore-missing-nodes option skips relationships that refer to nodes that don't exist, and --ignore-duplicate-nodes skips nodes whose ID has already been imported.
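To sanity-check the result, you can compare the number of data rows in each CSV file with the totals the import tool reports when it finishes. The helper below is a rough sketch (the name `csv_row_count` is hypothetical); it counts non-blank lines after the header, so it will overcount if a quoted field spans several lines.

```shell
# csv_row_count CSV_FILE
# Print the number of non-blank data lines in CSV_FILE, excluding the header.
csv_row_count() {
  awk 'NR > 1 && NF { n++ } END { print n + 0 }' "$1"
}

# Possible usage, assuming the archive was unpacked into import/northwind:
# for f in import/northwind/*.csv; do echo "$f: $(csv_row_count "$f")"; done
```

Because of the two ignore options, the imported totals may legitimately be lower than these counts; a large gap is a hint that IDs in the files don't line up.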
Push-to-cloud
The import should only take a few seconds. Once it completes, you can start the push-to-cloud. Using the Bolt URI for your AuraDB Instance, run the following command:
bin/neo4j-admin push-to-cloud --bolt-uri=<your bolt uri> --overwrite=true --database=graph.db --verbose=true
The push-to-cloud will almost certainly take longer than the initial import. We recommend you take this time to refresh your coffee, tea, or soda.
Once the push-to-cloud completes, you can use the freshly imported data to run reports, or run the :play northwind-graph command in the Aura browser to learn more about Cypher through the built-in tutorial. Because you've already loaded the data, you can skip the first few screens of the tutorial.
Using the steps above and referring to our documentation on neo4j-admin import, you should have no trouble importing your own large datasets into an AuraDB Instance. But if you do run into any problems, or have any questions, we'll be happy to help.