- If you have loaded a lot of data and then deleted it, your Aura instance's disk usage may not be reflected accurately in the metrics displayed in the Monitoring tabs.
- This behaviour is normal and results from a datastore optimization: node ID reuse.
- When a node is deleted, it essentially becomes unreferenced; instead of a write operation (which would consume IOPS and involve a system call), the deletion only affects metadata.
- As a result, the datastore size on disk does not shrink accordingly; the store becomes essentially "uncompacted", with a lot of disk space reserved but unused.
See more information on space reuse
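As a hypothetical illustration of node ID reuse on a local (on-prem) installation, the sketch below prints a `cypher-shell` invocation that creates a node, deletes it, and creates another; when IDs are reused, the second node often receives the freed ID. The URI, credentials, and the `:Tmp` label are assumptions, and the command is echoed rather than executed so the snippet is safe to run anywhere:

```shell
# Dry-run sketch: prints the cypher-shell command instead of executing it.
# The connection URI, user, password, and the :Tmp label are placeholders.
CYPHER='CREATE (n:Tmp) RETURN id(n);
MATCH (n:Tmp) DELETE n;
CREATE (m:Tmp) RETURN id(m);'   # the second id is often the same as the first
echo cypher-shell -a neo4j://localhost:7687 -u neo4j -p '<password>' "$CYPHER"
```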
Currently, Aura does not offer compaction as a standard feature.
While we work to deliver this important management feature, follow these steps as the current approach to datastore compaction:
- (on-prem) Set up a Neo4j installation with the same minor version as your Aura instance (e.g. version 4.3.5 for Aura 4.3). We recommend the Enterprise edition to ensure access to the database administration tools needed for the following operations. You can also perform these operations through Neo4j Desktop, which makes selecting the version and administering the database easy.
- (Aura) Download a dump file (Export) from your Aura instance.
- (on-prem) Load the dump file with `neo4j-admin load`. This creates the store and should require the same disk space as your Aura instance (i.e. before compaction). See details here. For Neo4j version 5.x, the command is `neo4j-admin database load`; see details here.
- (on-prem) Start the database to validate that it was imported properly (from Desktop).
- (on-prem) Stop the database (from Desktop).
- (on-prem) Run `neo4j-admin copy --compact-node-store` to copy and compact the datastore. See details here. For Neo4j version 5.x, the command is `neo4j-admin database copy --compact-node-store`; see details here.
Note that you should consider the following:
- Indexes are not copied, but a file containing the commands to recreate them later is generated.
- Your machine needs a lot of memory and excellent disk performance (see the note on IOPS).
- If your datastore is large, allocate more memory so that large chunks can be processed at a time, e.g. `--to-pagecache=4G --from-pagecache=16G` (with `--from-pagecache` as close as possible to the source datastore size).
- Disk space: you will need the original store size plus the compacted size (or use different disks/partitions).
- (on-prem) Start the compacted database to validate it starts and that the copy operation worked successfully.
- (on-prem) Stop the database.
- (on-prem) Create a dump file with `neo4j-admin dump` (it will be based on the compacted data). See details here.
- (on-prem) Run `neo4j-admin push-to-cloud`; the size validation that happens on the client side should succeed. See details here. For Neo4j version 5.x, the command is `neo4j-admin database upload`; see details here.
- (Aura) Once the dump is loaded, verify the Metrics -> Storage Used (%) chart; the size change should be reflected (note that you may need to select the 6-hour time range to see it).
- Connect to your AuraDB instance and run the Cypher commands to re-create the indexes, based on the output file from Step #6.1.
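The on-prem steps above can be sketched end to end as a shell script. This is a dry-run sketch, not a definitive implementation: it echoes each `neo4j-admin` invocation (Neo4j 4.x command names) instead of executing it, and the paths, database names, and Bolt URI are assumptions to adapt to your environment:

```shell
#!/bin/sh
# Dry-run sketch of the compaction workflow (Neo4j 4.x command names).
# Paths, database names, and the Bolt URI below are illustrative assumptions.
DUMP=/backups/aura.dump          # dump downloaded from the Aura console
NEO4J_HOME=/opt/neo4j            # local Neo4j Enterprise installation

run() { echo "+ $*"; }           # replace 'echo' with the real command to execute

# 1. Load the Aura dump into the local installation
run "$NEO4J_HOME/bin/neo4j-admin" load --from="$DUMP" --database=neo4j --force

# 2. Copy and compact the store (the source database must be stopped)
run "$NEO4J_HOME/bin/neo4j-admin" copy --from-database=neo4j --to-database=compacted \
    --compact-node-store --from-pagecache=16G --to-pagecache=4G

# 3. Dump the compacted database
run "$NEO4J_HOME/bin/neo4j-admin" dump --database=compacted --to=/backups/compacted.dump

# 4. Push the compacted dump back to Aura (placeholder Bolt URI)
run "$NEO4J_HOME/bin/neo4j-admin" push-to-cloud --dump=/backups/compacted.dump \
    --bolt-uri=neo4j+s://xxxxxxxx.databases.neo4j.io
```

For Neo4j 5.x, the corresponding commands are `neo4j-admin database load`, `neo4j-admin database copy`, `neo4j-admin database dump`, and `neo4j-admin database upload`.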