If strings containing special characters come out garbled when you extract them via cypher-shell, verify that the locale or regional settings used for the output stream specify UTF-8 or a superset of UTF-8.
When you export the content of an AuraDB instance as a stream redirected to a file, the file is written using the local system's default encoding.
call apoc.export.cypher.all(null,{format:'cypher-shell', useOptimizations: {type: 'unwind_batch', unwindBatchSize: 500}, batchSize:500})
This call returns a Cypher stream that can be redirected to a file. However, in nodes that contain certain special characters, such as "'" (single quote) or "-" (hyphen), those characters are replaced by "?".
Example: the node text "Bob's bike" becomes "Bob?s bike" in the exported file.
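One way to check whether an existing export file was affected is to scan it for a "?" sitting directly between letters or digits, which is the pattern the substitution leaves behind (as in "Bob?s bike"). The sketch below is only a heuristic: the helper name find_suspect_chars and the sample statements are hypothetical, and a legitimate "?" inside a word would also match.

```shell
# Hypothetical helper: print lines (with line numbers) where "?" sits
# directly between two letters or digits, e.g. "Bob?s bike".
find_suspect_chars() {
  grep -n -E '[[:alnum:]][?][[:alnum:]]' "$1"
}

# Example with a throwaway file standing in for a real export.
sample="$(mktemp)"
printf '%s\n' \
  'CREATE (:Item {name:"Bob?s bike"});' \
  'CREATE (:Item {name:"Plain text"});' > "$sample"
find_suspect_chars "$sample"
rm -f "$sample"
```

Against a real export, pass the path of the redirected file instead of the sample. Any line reported is worth comparing with the source data before re-running the export with a UTF-8 locale.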
Resolution:
To correct this, make sure the output stream stays UTF-8 encoded by changing the default locale settings.
Linux/macOS:
export LC_ALL="en_US.UTF-8"
Then check the result with the locale command.
Windows (from PowerShell):
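The check can also be scripted so that an export is only started when the active locale really is UTF-8. This is a minimal sketch: the helper name is_utf8_charmap is hypothetical, and it relies on the standard "locale charmap" command, which prints the character encoding of the current locale.

```shell
# Hypothetical helper: succeed only when the given charmap name is UTF-8.
is_utf8_charmap() {
  case "$1" in
    UTF-8|utf-8|UTF8|utf8) return 0 ;;
    *) return 1 ;;
  esac
}

# "locale charmap" prints the character encoding of the active locale.
if is_utf8_charmap "$(locale charmap 2>/dev/null)"; then
  echo "Locale uses UTF-8; safe to redirect the export to a file."
else
  echo "Locale is not UTF-8; set LC_ALL before exporting." >&2
fi
```

Running this in the same shell session that will launch cypher-shell matters, because the LC_ALL setting only applies to that session and its child processes.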
PS C:\> Set-WinSystemLocale -SystemLocale <<code>>
Then set file redirection to default to UTF-8:
$PSDefaultParameterValues['Out-File:Encoding'] = 'utf8'
Reference: https://docs.microsoft.com/en-us/powershell/module/international/get-winsystemlocale?view=win10-ps
Note: for large volumes, you may also want to increase the heap available to cypher-shell by setting JAVA_OPTS (see https://neo4j.com/labs/apoc/4.1/export/cypher/).
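As a rough illustration of the heap adjustment, JAVA_OPTS can be exported in the same shell before starting cypher-shell. The 4 GB value is an assumption chosen for illustration, not a figure from this article; -Xmx is the standard JVM option for the maximum heap size.

```shell
# Assumed value: 4 GB maximum heap; adjust to the export size and available memory.
export JAVA_OPTS="-Xmx4g"

# Confirm the variable is set in the shell that will launch cypher-shell.
echo "JAVA_OPTS=$JAVA_OPTS"
```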