As in SQL, if you do not properly connect the parts of your query, the result is a cross (Cartesian) product, which is seldom what you want. Take the following example:
MATCH (p:Person), (m:Movie)
RETURN p, m;
In Cypher, p contains all of the nodes in the graph with the :Person label, and m contains all of the nodes in the graph with the :Movie label. Because nothing in the pattern connects p to m, returning both pairs each node p with each node m. For example:
If there are three nodes with label Person:
- Neo,
- Trinity, and
- Morpheus

and three nodes with label Movie:
- The Matrix,
- The Matrix Reloaded, and
- The Matrix Revolutions

then the result of the above query would be:
p | m |
---|---|
Neo | The Matrix |
Neo | The Matrix Reloaded |
Neo | The Matrix Revolutions |
Trinity | The Matrix |
Trinity | The Matrix Reloaded |
Trinity | The Matrix Revolutions |
Morpheus | The Matrix |
Morpheus | The Matrix Reloaded |
Morpheus | The Matrix Revolutions |
Keep in mind that this is a simple example, so the result set is small: 3 × 3 = 9 rows. Because the row count is the product of the two node counts, the same query on a production-size graph would return a very large result set and could be memory-intensive.
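For contrast, here is a minimal sketch of the same query with its two parts connected. The ACTED_IN relationship type is an assumption made for illustration; the example above does not say how Person and Movie nodes are related in the graph.

```cypher
// Assumption: Person nodes are linked to Movie nodes by a hypothetical
// ACTED_IN relationship; substitute the relationship type your graph uses.
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
RETURN p, m;
```

Because p and m now belong to a single pattern, only pairs that are actually connected are returned, rather than every combination.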
In general, inadvertent cross products happen in more complex queries. They are common in queries that contain many WITH clauses, and a close look at the query is needed to find the issue. Following general performance best practices helps avoid them:
- Be as specific with your query as possible.
- Make sure to use identifiers to properly tie parts of the query together.
- Only return the data you need.
- Profile slow queries so that you can see where the time and effort are spent.
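As an illustration of how a WITH clause can hide a cross product, consider the sketch below. The ACTED_IN relationship type and the movieCount alias are assumptions made for this example and do not come from the original query.

```cypher
// Assumption: hypothetical ACTED_IN relationship between Person and Movie.
// Problem: after the WITH, only p and movieCount remain in scope, so the m
// in the second MATCH is a new variable that is not tied to p. Every p is
// therefore paired with every Movie node.
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WITH p, count(m) AS movieCount
MATCH (m:Movie)
RETURN p, movieCount, m;

// Possible fix (run as a separate query): reuse p in the second pattern
// so that the two parts of the query are connected.
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WITH p, count(m) AS movieCount
MATCH (p)-[:ACTED_IN]->(m:Movie)
RETURN p, movieCount, m;
```

The only difference between the two queries is that the second MATCH in the fixed version starts from the identifier p, which is what ties the parts of the query together.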
Note: From Neo4j 2.3 onward, a warning that highlights this issue is shown in Neo4j Browser and when you run your query with EXPLAIN.
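A minimal sketch of inspecting the example query with EXPLAIN and PROFILE follows. The operator name mentioned in the comment is how Neo4j plans typically label this step; treat it as an assumption and check the plan that your version prints.

```cypher
// EXPLAIN shows the execution plan without running the query.
// A disconnected pattern typically appears as a CartesianProduct operator.
EXPLAIN
MATCH (p:Person), (m:Movie)
RETURN p, m;

// PROFILE runs the query and reports rows and database hits per operator,
// so use it with care on large graphs.
PROFILE
MATCH (p:Person), (m:Movie)
RETURN p, m;
```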