This article describes the steps needed to compact (compress) the data store of an AuraDB instance.
The main tool is neo4j-admin copy. This tool cannot be run on AuraDB instances directly.
To proceed, download a backup snapshot of the AuraDB instance (in dump file format) and transfer it to an EC2 instance in AWS. Then run neo4j-admin copy on the EC2 instance to compact the datastore, and push the compacted datastore back to Aura.
Considerations when selecting the EC2 instance:
There are a few important hardware specifications to consider when choosing the EC2 instance that will run the neo4j-admin copy command.
- Identify the size of the datastore you will be compacting. You will need free disk space of at least twice the size of the expanded dump file, because the source and target datastores exist side by side during the copy.
- Choose an EC2 instance with enough memory (a memory-optimized instance type) to hold the entire datastore, or as much of it as possible.
- Use fast disks (EBS volumes). A gp3 EBS volume is recommended.
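The disk-space rule above can be checked before starting. The following is a minimal sketch, assuming you already have an estimate of the expanded datastore size in GB; the function name enough_disk is hypothetical, and the check only compares free space reported by df against twice that estimate.

```shell
# Hypothetical helper: succeed only if a directory's filesystem has at least
# twice the expanded datastore size free (source + target stores side by side).
enough_disk() {
  # $1 = expanded datastore size in GB, $2 = directory that will hold the stores
  need_kb=$(( $1 * 2 * 1024 * 1024 ))
  # df -P prints POSIX output; field 4 of line 2 is available space in 1K blocks
  free_kb=$(df -Pk "$2" | awk 'NR==2 {print $4}')
  if [ "$free_kb" -ge "$need_kb" ]; then
    echo "OK: ${free_kb} KB free, ${need_kb} KB needed"
    return 0
  fi
  echo "Not enough space: ${free_kb} KB free, ${need_kb} KB needed" >&2
  return 1
}
```

For example, enough_disk 110 /var/lib/neo4j/data checks the volume holding the default package data directory for roughly 220 GB of free space.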
Procedure
The procedure for compacting an Aura data store is summarized below.
Set up EC2 on AWS
- Create an AWS EC2 instance using the Amazon Linux 2 AMI.
- Choose enough CPU and RAM to run neo4j-admin copy, and enough disk space to hold both the source and target datastores.
Install a browser to access Aura from the EC2 instance
Once the EC2 instance is set up, the first step is to install a desktop GUI and a browser so that you can reach the Aura console. Currently, the backup snapshot is downloaded directly to the EC2 instance through a browser logged in to the Aura Console.
This link describes how to install a GUI on an EC2 Linux instance.
You will need to access the GUI of the Linux EC2 instance using VNC (for instance, TigerVNC).
To connect using VNC, you need to create an SSH tunnel. Once the VNC server is set up, enter the following in a terminal on your local machine:
ssh -L 5901:localhost:5901 -i <pemfile.pem> ec2-user@ec2-instance.eu-west-1.compute.amazonaws.com
Then open TigerVNC and connect to `localhost:1`.
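The port in the tunnel and the display in TigerVNC are linked: by VNC convention, display :N listens on TCP port 5900 + N, so localhost:1 goes through the forwarded port 5901. The following is a small sketch that derives the tunnel command for a given display; the function names, key file, and host are hypothetical placeholders, and the command is printed rather than run.

```shell
# VNC display :N listens on TCP port 5900 + N (display :1 -> port 5901).
vnc_port() {
  echo $(( 5900 + $1 ))
}

# Print (do not run) the ssh command that forwards the local VNC port
# to the same port on the EC2 instance.
vnc_tunnel_cmd() {
  # $1 = display number, $2 = pem key file, $3 = EC2 public host name
  port=$(vnc_port "$1")
  echo "ssh -L ${port}:localhost:${port} -i $2 ec2-user@$3"
}

# Placeholder key and host, mirroring the command above
vnc_tunnel_cmd 1 my-key.pem ec2-instance.eu-west-1.compute.amazonaws.com
```

If your VNC server runs on a different display, pass that number instead and connect TigerVNC to localhost:<display>.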
Also install a browser on the instance. Chromium works well and is easy to install.
Download the Datastore
Using the browser on the EC2 instance (for example, Chromium), connect to the AuraDB instance in the Aura console and download the snapshot. It is downloaded as a neo4j.dump file.
The downloaded dump file will be available in the Downloads folder on the EC2 instance.
Install and Set Up Neo4j
Neo4j must be installed to run the neo4j-admin copy command. There is no need to start the Neo4j process; installing Neo4j simply puts the scripts in place. Please refer to the documentation to install Neo4j.
As a prerequisite, Java must be installed on the EC2 instance before installing Neo4j.
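Before moving on, it can help to confirm that both prerequisites are on the PATH. This is a minimal sketch; the helper name missing_cmds is hypothetical, and it assumes a package install where neo4j-admin is on the PATH (with a tarball install, use bin/neo4j-admin from the Neo4j directory instead). Check the Neo4j documentation for the Java version your Neo4j release requires.

```shell
# Hypothetical helper: print each named command that is not found on PATH.
missing_cmds() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
  done
}

missing=$(missing_cmds java neo4j-admin)
if [ -n "$missing" ]; then
  echo "Missing: $missing"
else
  echo "java and neo4j-admin found"
  java -version   # prints the installed Java version (to stderr)
fi
```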
Run neo4j-admin load to extract the dump file before running neo4j-admin copy
With everything set up and the dump file downloaded, the next step is to extract the AuraDB database from the dump file. neo4j-admin copy cannot accept a dump file as input, so the dump file must be extracted first.
Run neo4j-admin load on the dump file and give the source database an appropriate name (devprod in this example):
sudo neo4j-admin load --from=/home/ec2-user/Downloads/neo4j.dump --database=devprod
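The page cache settings in the next step depend on the size of the datastore after the load. The following is a minimal sketch that reports that size in whole GB, rounded up. The function name is hypothetical, and the path in the usage comment assumes the default package layout, where databases live under /var/lib/neo4j/data/databases.

```shell
# Hypothetical helper: on-disk size of a directory in GB, rounded up.
store_size_gb() {
  # du -sk prints "<size in KB><TAB><path>"; keep the first field
  kb=$(du -sk "$1" | cut -f1)
  # 1 GB = 1048576 KB; add 1048575 before dividing to round up
  echo $(( (kb + 1048575) / 1048576 ))
}

# Usage (the data directory usually needs root to read):
# sudo sh -c "$(declare -f store_size_gb 2>/dev/null); store_size_gb /var/lib/neo4j/data/databases/devprod"
# or simply: sudo du -sh /var/lib/neo4j/data/databases/devprod
```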
Run the neo4j-admin copy command
Once the contents of the dump file have been extracted with neo4j-admin load, the next step is to run neo4j-admin copy.
There are a few important points to consider before running neo4j-admin copy to get optimal performance:
- Set --from-pagecache as close as possible to the size of the source datastore. For example, if the datastore is about 107 GB after running neo4j-admin load, set the page cache to around 100 GB.
- Set --to-pagecache to a small value such as 1-2 GB. Writes to the target are sequential, so allocating more page cache to the target does not help.
- To compact the datastore, specify --compact-node-store. This rewrites the datastore completely in the target database. An important point to note is that this rewrites the node IDs. If your application relies on node IDs (not a recommended practice), it will not work correctly after this operation. Omitting --compact-node-store preserves the node IDs but performs only a partial compaction, reclaiming just some of the space (sub-optimal).
- Run the command as a background process and redirect its messages to a file, as the operation can take a long time depending on the datastore size.
- The target database created by neo4j-admin copy does not contain the indexes and constraints of the source. Instead, neo4j-admin copy writes the indexes and constraints that existed in the source AuraDB instance as Cypher commands (statements that create the indexes and constraints) to its output, captured here in the file /tmp/devprodcopy.txt. Retain this file, as its contents are used later to create the indexes and constraints in the target database.
- The target database produced by the copy should be smaller on disk.
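The page cache guidance above can be turned into a small calculation. This is a sketch under stated assumptions: HEADROOM_GB is a hypothetical reserve for the operating system and the neo4j-admin JVM, the source page cache is capped by RAM minus that reserve, and the target page cache is fixed at 2 GB as recommended.

```shell
# ASSUMPTION: hypothetical amount of RAM (GB) left for the OS and the JVM heap.
HEADROOM_GB=8

# --from-pagecache: the store size, capped at (RAM - headroom), never below 1 GB.
from_pagecache_gb() {
  # $1 = source store size in GB, $2 = total RAM in GB
  cap=$(( $2 - HEADROOM_GB ))
  if [ "$cap" -lt 1 ]; then cap=1; fi
  if [ "$1" -lt "$cap" ]; then echo "$1"; else echo "$cap"; fi
}

# Build both page cache flags for neo4j-admin copy.
pagecache_flags() {
  echo "--from-pagecache=$(from_pagecache_gb "$1" "$2")G --to-pagecache=2G"
}

pagecache_flags 107 128   # 107 GB store on a 128 GB instance
```

On Linux, the total RAM in GB can be read with awk '/^MemTotal:/ {print int($2/1048576)}' /proc/meminfo.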
The full command (2>&1 also captures error messages in the output file):
sudo neo4j-admin copy --from-database=devprod --to-database=devprodcomp --compact-node-store --from-pagecache=100G --to-pagecache=2G > /tmp/devprodcopy.txt 2>&1 &
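Because the copy can run for hours, it helps if it survives a dropped SSH session. The following is a minimal sketch that wraps any command with nohup, sends stdout and stderr to a log file, and records the process ID. The helper name run_bg and the variable BG_PID are hypothetical. Running sudo in the background assumes sudo does not prompt for a password, which is typically the case for ec2-user on Amazon Linux.

```shell
# Hypothetical helper: run a command in the background, immune to hangups,
# with stdout and stderr written to a log file. Sets BG_PID in the caller.
run_bg() {
  # $1 = log file, remaining arguments = command to run
  log=$1
  shift
  nohup "$@" > "$log" 2>&1 &
  BG_PID=$!
  echo "Started PID $BG_PID, logging to $log"
}

# Usage, mirroring the command above:
# run_bg /tmp/devprodcopy.txt sudo neo4j-admin copy --from-database=devprod \
#   --to-database=devprodcomp --compact-node-store --from-pagecache=100G --to-pagecache=2G
# tail -f /tmp/devprodcopy.txt               # follow progress
# wait "$BG_PID"; echo "exit status: $?"     # works only from the same shell session
```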
Run the neo4j-admin dump command on the target database
Once the neo4j-admin copy command is complete, run neo4j-admin dump on the target database.
This step produces the dump file used as input to the push-to-cloud upload feature of AuraDB. If you are using a 5.x version of Neo4j, please refer to the Using 'neo4j-admin database upload' in Neo4j 5.x to load a database dump to Neo4j Aura article instead of using push-to-cloud.
sudo neo4j-admin dump --database=devprodcomp --to=/var/lib/neo4j/data/devprodcomp.dump
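Before uploading, you may want to confirm that the compaction paid off. This sketch compares the size of the dump downloaded from Aura with the new dump and prints the reduction as a whole percentage. The function names are hypothetical, and because dump files are compressed, the result only approximates the savings in the on-disk store.

```shell
# Hypothetical helper: file size in bytes (tr strips padding some wc versions add).
file_bytes() {
  wc -c < "$1" | tr -d ' '
}

# Percentage by which the compacted dump is smaller than the original.
dump_savings_pct() {
  # $1 = original dump, $2 = compacted dump
  orig=$(file_bytes "$1")
  new=$(file_bytes "$2")
  echo $(( (orig - new) * 100 / orig ))
}

# Usage with the paths from this article:
# dump_savings_pct /home/ec2-user/Downloads/neo4j.dump /var/lib/neo4j/data/devprodcomp.dump
```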
Using push-to-cloud to upload to AuraDB
Before pushing the dump file, make sure that you have either created a new database in AuraDB or you have one you are ready to overwrite with the new content.
Push the dump file to AuraDB using neo4j-admin push-to-cloud.
neo4j-admin push-to-cloud --bolt-uri=neo4j+s://auradb.databases.neo4j.io --dump=/var/lib/neo4j/data/devprodcomp.dump --overwrite
If you are using Neo4j 5.x or later, note that this command has been replaced by the neo4j-admin database upload command:
bin/neo4j-admin database upload \
--verbose \
--overwrite-destination=true \
--from-path=/var/lib/neo4j/data \
--to-uri=neo4j+s://123456.databases.neo4j.io \
--to-user=neo4j \
--to-password=<myTopSecret> \
devprodcomp
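In 5.x, the final argument of neo4j-admin database upload is the database name, and the dump is expected in the --from-path directory as <database>.dump. If your dump has a different name (for example, neo4j.dump), a sketch like the following stages a copy under the expected name in a dedicated directory. The function name and the staging path are hypothetical.

```shell
# Hypothetical helper: copy a dump into a staging directory as <database>.dump,
# the file name neo4j-admin database upload looks for in --from-path.
stage_dump() {
  # $1 = existing dump file, $2 = database name, $3 = staging directory
  mkdir -p "$3"
  cp "$1" "$3/$2.dump"
  echo "$3/$2.dump"
}

# Usage:
# stage_dump /home/ec2-user/Downloads/neo4j.dump devprodcomp /home/ec2-user/upload
# then pass --from-path=/home/ec2-user/upload and devprodcomp to the upload command.
```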
For a more in-depth look at this command and process, please review the following KBA: Using 'neo4j-admin database upload' in Neo4j 5.x to load a database dump to Neo4j Aura
Create the Indexes and Constraints in the Aura target DB
The statements that create the indexes and constraints that existed in the source database before running neo4j-admin copy are all written to the output file of the neo4j-admin copy command. Extract these statements and run them on the new, compacted AuraDB instance.
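Extracting the statements can be scripted. The following is a sketch under a loud assumption: each statement sits on a single line of the output file and begins with CREATE or CALL db.create. Inspect your /tmp/devprodcopy.txt and adjust the patterns if its format differs. The function name and the output file name are hypothetical.

```shell
# Hypothetical helper: pull index/constraint statements out of the copy output
# and write them, one per line and each ending in exactly one ';', to a script.
# ASSUMPTION: statements are single-line and start with CREATE or CALL db.create.
extract_schema() {
  # $1 = neo4j-admin copy output file, $2 = Cypher script to write
  grep -Eo '(CREATE|CALL db\.create).*' "$1" \
    | grep -E 'INDEX|CONSTRAINT|db\.create' \
    | sed -e 's/[[:space:]]*$//' -e 's/;*$/;/' > "$2"
  # Report how many statements were extracted
  wc -l < "$2" | tr -d ' '
}

# Usage:
# extract_schema /tmp/devprodcopy.txt /tmp/schema.cypher
```

Review the resulting file, then run it against the new instance, for example with cypher-shell -a neo4j+s://<instance>.databases.neo4j.io -u neo4j -f /tmp/schema.cypher, or paste the statements into the query interface for the instance.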