Removing an unnecessary label from a Cypher query can reduce the number of DB (database) hits, resulting in faster query execution.
It is therefore worth evaluating whether each label in a query is actually necessary.
We will use the default Movie Dataset, available when creating an Aura instance, to demonstrate this. You can see how to use this dataset here: https://neo4j.com/developer/example-data/
We will optimize the following query as an example:
MATCH (p:Person)-[r:ACTED_IN]->(m:Movie) RETURN p, r, m
Here is what a profile for this query shows:
PROFILE MATCH (p:Person)-[r:ACTED_IN]->(m:Movie) RETURN p, r, m

We can see 2215 total DB hits.
The important part is the Filter@neo4j block, which checks if a node is a 'Person' node.
When this query executes,
- It will first find an anchor node, which in this case is the Movie node.
- It will then check the relationships and traverse the ACTED_IN relationships and the nodes attached to them, reaching the Person nodes.
- Finally, the query will check if the related nodes have the Person label, as shown in the screenshot above.
By looking at the data model, we understand that if an ACTED_IN relationship is traversed from a Movie node, the related node will always be a node with the Person label. Hence this final check is not necessary.
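One way to confirm this assumption before dropping the label is to count the start nodes of ACTED_IN relationships that lack the Person label. This is only a sketch against the Movie dataset; the alias nonPersonActors is a name chosen here for illustration. A result of 0 suggests the label check is redundant for this pattern.

```cypher
// Count distinct start nodes of ACTED_IN relationships into Movie nodes
// that do NOT carry the Person label.
MATCH (n)-[:ACTED_IN]->(:Movie)
WHERE NOT n:Person
RETURN count(DISTINCT n) AS nonPersonActors
```

If the count is greater than 0, removing the label would change the query results, so the label should stay.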

We can change the query as below. Notice that we are using just p, the variable for the node and omitting the ':Person' label:
MATCH (p)-[r:ACTED_IN]->(m:Movie) RETURN p, r, m
Here is what a profile for this query shows:
PROFILE MATCH (p)-[r:ACTED_IN]->(m:Movie) RETURN p, r, m

We still retrieve and return the same Person and Movie nodes, but the execution will be faster because we removed the need to check each node for the Person label.
The Filter@neo4j block is no longer there, and there are fewer DB hits: 1871 compared to the initial 2215.
This kind of optimization can be applied in many cases and can help improve the performance of your queries. Naturally, the larger the number of nodes a query traverses and returns, the more beneficial this technique will be.
You should always use the PROFILE or EXPLAIN keyword before your query when evaluating performance. PROFILE executes the query and shows the plan together with the actual rows and DB hits for each operator. EXPLAIN shows the plan with estimated rows only, without executing the query.
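For example, to inspect the plan of the optimized query without running it, EXPLAIN can be prefixed in the same way as PROFILE:

```cypher
// Shows the planned operators and estimated rows; the query itself is not executed.
EXPLAIN MATCH (p)-[r:ACTED_IN]->(m:Movie) RETURN p, r, m
```

Because nothing is executed, EXPLAIN is a convenient first check on large graphs, where running PROFILE on an unoptimized query may be expensive.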
In this example, the graph is very small, so variability in latency means that any particular run of the optimized query may take longer than a run of the less performant version.
This will not be the case as an instance scales up.
Please be aware that any changes should be tested and verified, as omitting critical labels can have a negative performance impact, depending on your graph's data model. It is also worth noting that when a new query is executed for the first time, it will take longer because an execution plan needs to be created and cached for it.
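As a rough illustration of a label that should not be omitted, consider looking up a node by a property value. The sketch below assumes the Movie dataset contains a Person node whose name property is 'Tom Hanks'. Without the label, the planner has no label or index to start from and has to scan every node in the graph.

```cypher
// Label omitted: the plan starts with a scan of all nodes, then filters on name.
PROFILE MATCH (p {name: 'Tom Hanks'}) RETURN p

// Label kept: the plan can start from Person nodes only
// (or from an index on :Person(name), if one exists).
PROFILE MATCH (p:Person {name: 'Tom Hanks'}) RETURN p
```

Run the two statements separately and compare the DB hits in each profile. Here the label narrows the starting set of nodes instead of adding a redundant check.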