This article describes the steps needed to compact (compress) the data store of an AuraDB instance.
The main tool to use is neo4j-admin copy. This tool cannot be run against an AuraDB instance directly.
To proceed, you will need to get the backup snapshot from AuraDB (in dump file format) and place it on an EC2 instance in AWS. There, run neo4j-admin copy to compact the data store, and then push the resulting compacted data store back to Aura.
Considerations for selecting the EC2 instance:
There are a few hardware specifications to take into account when choosing the EC2 instance on which to run the neo4j-admin copy command.
- Identify the size of the data store that you will be compacting. You will need disk space of at least twice the size of the expanded dump file, because the source and target data stores both sit on disk.
- Choose an EC2 instance with enough memory (a memory-intensive type) to hold the data store entirely, or with as much memory as you are able to use.
- Choose fast disks (EBS volumes). The recommendation is to use a gp3 EBS volume.
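The disk sizing rule in the first bullet can be checked mechanically once a volume is attached. The following is a minimal sketch and not part of the original procedure: has_room_for_copy and free_kb are hypothetical helper names, and the store size passed in is your own estimate of the expanded dump.

```shell
# Hypothetical helpers (not part of neo4j-admin) for the "twice the store size" rule.

# Succeeds if avail_kb of free space can hold two copies of a store of store_kb.
has_room_for_copy() {
  local store_kb=$1 avail_kb=$2
  [ "$avail_kb" -ge $(( store_kb * 2 )) ]
}

# Prints the free space, in KB, on the filesystem that holds the given path.
free_kb() {
  df -Pk "$1" | awk 'NR==2 {print $4}'
}

# Example (assumes an estimated 107 GB store and data under /var/lib/neo4j/data):
#   has_room_for_copy $(( 107 * 1024 * 1024 )) "$(free_kb /var/lib/neo4j/data)" \
#     && echo "enough space" || echo "attach a larger volume"
```

The check is deliberately conservative: it asks for room for both the source and the target store before anything has been loaded.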
The procedure for compacting an Aura data store is summarized below.
Set up EC2 on AWS
- Create an AWS EC2 instance with the Amazon Linux 2 AMI.
- Choose enough CPU and RAM to run neo4j-admin copy, and enough disk space to accommodate both the source and target data stores.
Install Browser to access Aura from the EC2 instance
Once the EC2 instance is set up, the first step is to install a desktop GUI and a browser in order to access the Aura Console. Currently, the way to download the backup snapshot directly to the EC2 instance is through a browser, from the Aura Console.
This link describes installing GUI on the EC2 Linux environment.
You will need to access the GUI of the Linux EC2 instance using VNC (for instance, TigerVNC).
To connect using VNC, you will need to create a tunnel over SSH.
After setup, enter the following in a terminal:
ssh -L 5901:localhost:5901 -i <pemfile.pem> email@example.com
Then open TigerVNC and connect to `localhost:1`.
Also, make sure to install a browser. Chromium works well and is easy to install.
Download the Datastore
Connect to the AuraDB instance using the browser (Chromium, as installed above) and download the snapshot, which is saved as a neo4j.dump file.
The downloaded dump file will be in the Downloads folder on the EC2 instance.
Install and Setup Neo4j
Neo4j must be installed in order to run the neo4j-admin copy command. There is no need to start the Neo4j process; the installation is only needed to have the scripts in place. Please refer to the documentation to install Neo4j.
As a prerequisite to installing Neo4j, Java must be installed on the EC2 instance.
Run neo4j-admin load to extract the dump file before running neo4j-admin copy
Now that everything is set up and the dump file is downloaded, the next step is to extract the AuraDB database from the dump file.
neo4j-admin copy cannot accept a dump file as input, so the dump file needs to be extracted first.
Run neo4j-admin load on the dump file and provide an appropriate name for the source database.
sudo neo4j-admin load --from=/home/ec2-user/Downloads/neo4j.dump --database=devprod
Run the neo4j-admin copy command
Once the contents of the dump file have been extracted using neo4j-admin load, the next step is to perform a neo4j-admin copy.
There are a few important points to consider before running neo4j-admin copy to get optimum performance.
- Set --from-pagecache as close as possible to the size of the source data store. For example, if the data store is about 107 GB after running neo4j-admin load, set the page cache to around 100 GB.
- Set --to-pagecache to a small value such as 1-2 GB. The writes are sequential, so allocating more page cache to the target will not help.
- To compact the data store, specify --compact-node-store. This rewrites the data store completely in the target database.
An important point to note is that this will rewrite the node ids. If the application relies on node ids (not a recommended practice), it will not work correctly after this operation. Omitting --compact-node-store retains the node ids but does not do a full compaction; only some of the space is reclaimed (sub-optimal).
- Run the command as a background process and redirect its messages to a file, as the operation can take some time depending on the data store size.
- The neo4j-admin copy command will wipe out the indexes and constraints. However, as part of the neo4j-admin copy, the indexes and constraints that existed in the source database are written to the output file as Cypher statements (CREATE INDEX and CREATE CONSTRAINT statements).
Retain this file, as its contents will be used to create the indexes and constraints in the target database.
- The target database obtained after the copy should be smaller on disk.
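The page cache guidance above can be turned into a small calculation. This is only a sketch under stated assumptions: suggest_from_pagecache is a hypothetical helper, and the 4 GB reserve left for the operating system and JVM is an illustrative default rather than a documented recommendation.

```shell
# Hypothetical helper: suggests a --from-pagecache value in whole GB.
# It uses the store size, capped at total RAM minus a reserve (default 4 GB,
# an assumed figure), and never goes below 1 GB.
suggest_from_pagecache() {
  local store_kb=$1 mem_kb=$2 reserve_gb=${3:-4}
  local store_gb=$(( store_kb / 1024 / 1024 ))
  local cap_gb=$(( mem_kb / 1024 / 1024 - reserve_gb ))
  local gb=$store_gb
  if [ "$cap_gb" -lt "$gb" ]; then gb=$cap_gb; fi
  if [ "$gb" -lt 1 ]; then gb=1; fi
  echo "${gb}G"
}

# Example (the databases/devprod path assumes the default package layout):
#   store_kb=$(sudo du -sk /var/lib/neo4j/data/databases/devprod | cut -f1)
#   mem_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
#   suggest_from_pagecache "$store_kb" "$mem_kb"
```

If the instance has less memory than the store needs, the suggestion falls back to what the machine can offer, which matches the advice to use the maximum memory available.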
The full command:
sudo neo4j-admin copy --from-database=devprod --to-database=devprodcomp --compact-node-store --from-pagecache=100G --to-pagecache=2G > /tmp/devprodcopy.txt &
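The command above redirects only standard output, and a job started with & may be stopped if the SSH session ends. As a hedged alternative, the sketch below wraps any command in nohup and captures both output streams in one log file. run_logged and LAST_PID are hypothetical names introduced for illustration, and the example assumes sudo does not prompt for a password.

```shell
# Hypothetical helper: runs a command in the background, immune to hangups,
# with stdout and stderr both written to the given log file.
# The background process id is stored in LAST_PID.
run_logged() {
  local log_file=$1
  shift
  nohup "$@" > "$log_file" 2>&1 &
  LAST_PID=$!
}

# Example, using the same options as the full command above:
#   run_logged /tmp/devprodcopy.txt sudo neo4j-admin copy --from-database=devprod \
#     --to-database=devprodcomp --compact-node-store \
#     --from-pagecache=100G --to-pagecache=2G
#   tail -f /tmp/devprodcopy.txt   # follow progress
#   wait "$LAST_PID"               # block until the copy finishes
```

Keeping stderr in the same file means warnings and errors end up next to the index and constraint statements that are needed later.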
Run the neo4j-admin dump command on the target database
Once the neo4j-admin copy command is complete, perform a neo4j-admin dump of the target database.
This step is necessary because the resulting dump file is the input for the push-to-cloud upload to AuraDB.
sudo neo4j-admin dump --database=devprodcomp --to=/var/lib/neo4j/data/devprodcomp.dump
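To confirm that the copy reclaimed space, the new dump can be compared with the one downloaded from Aura. This is a rough sketch: percent_saved and file_bytes are hypothetical helpers, and because dump files are themselves compressed, the figure is only an indirect indication of the on-disk saving.

```shell
# Hypothetical helper: prints the whole-number percentage by which
# new_bytes is smaller than old_bytes (old_bytes must be non-zero).
percent_saved() {
  local old_bytes=$1 new_bytes=$2
  echo $(( (old_bytes - new_bytes) * 100 / old_bytes ))
}

# Prints the size of a file in bytes.
file_bytes() {
  wc -c < "$1" | tr -d ' '
}

# Example, using the paths from this article:
#   old=$(file_bytes /home/ec2-user/Downloads/neo4j.dump)
#   new=$(file_bytes /var/lib/neo4j/data/devprodcomp.dump)
#   echo "Dump is $(percent_saved "$old" "$new")% smaller"
```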
Use push-to-cloud to upload to AuraDB
Before pushing the dump file, make sure that you have either created a new database in AuraDB or have an existing one that you are ready to overwrite with the new content.
Push the dump file to AuraDB using the following command:
neo4j-admin push-to-cloud --bolt-uri=neo4j+s://auradb.databases.neo4j.io --dump=/var/lib/neo4j/data/devprodcomp.dump --overwrite
Create the Indexes and Constraints in the Aura target DB
The CREATE statements for the indexes and constraints that existed in the source database before the neo4j-admin copy are written to the output file produced by the neo4j-admin copy command. Extract the CREATE INDEX and CREATE CONSTRAINT statements and run them on the new compacted AuraDB instance.
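Pulling the statements out of the output file can be scripted. The sketch below assumes that each statement sits on a single line containing CREATE INDEX or CREATE CONSTRAINT, possibly after a log prefix; check this against your own output file, since the exact format may differ between versions. extract_schema_statements is a hypothetical helper name.

```shell
# Hypothetical helper: prints every CREATE INDEX / CREATE CONSTRAINT statement
# found in the given file, one per line, each ending in a single semicolon.
# Assumes one statement per line; anything before "CREATE" is dropped.
extract_schema_statements() {
  grep -Eo 'CREATE (INDEX|CONSTRAINT) .*' "$1" \
    | sed 's/[[:space:]]*;*[[:space:]]*$/;/'
}

# Example, using the output file from the copy step:
#   extract_schema_statements /tmp/devprodcopy.txt > /tmp/schema.cypher
```

Review the resulting file before running its statements against the new AuraDB instance, for example by pasting them into the query editor.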