Sergey.Bochkanov wrote:
My current hypothesis is still that the data fit into the multidimensional xy array but do not fit into the 1D structure used to store them. Can you tell me the sizes of the tree.innerobj.xy and tree.innerobj.splits fields? These should be the largest arrays in the structure.
P.S. If you want, you can submit a request for a trial of the commercial version. ALGLIB for C# with the native computational core should not be prone to such 32-bit index limitations: on 64-bit systems it uses 64-bit indexes internally.
Here is the debugger readout of my tree:
_Tree.innerobj {alglib.nearestneighbor.kdtree} alglib.nearestneighbor.kdtree
approxf 0 double
boxmax {double[300]} double[]
boxmin {double[300]} double[]
buf {double[3000000]} double[]
curboxmax {double[300]} double[]
curboxmin {double[300]} double[]
curdist 0 double
debugcounter 0 int
idx {int[3000000]} int[]
kcur 0 int
kneeded 0 int
n 3000000 int
nodes {int[36000000]} int[]
normtype 2 int
nx 300 int
ny 0 int
r {double[3000000]} double[]
rneeded 0 double
selfmatch false bool
splits {double[6000000]} double[]
tags {int[3000000]} int[]
x {double[300]} double[]
xy {double[3000000, 600]} double[,]
xy appears to be 3,000,000 x 600, while splits is a 6,000,000-element one-dimensional array. My employer already holds a full commercial license for ALGLIB; I simply hadn't obtained that edition because the free downloadable version is more readily available. I'll have to give the commercial edition a try.
For reference, the maximum index in any single dimension of an array of doubles appears to be 2,146,435,071, according to the documentation for gcAllowVeryLargeObjects at https://msdn.microsoft.com/en-us/library/hh285054%28v=vs.110%29.aspx?f=255&MSPPError=-2147217396
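A quick check of the xy numbers from the readout against that limit. This assumes, purely for illustration, that xy were copied row-major into a single 1D array; it is not a description of how ALGLIB actually lays out its internal storage:

```latex
\[
\begin{aligned}
N_{\texttt{xy}} &= 3\,000\,000 \times 600 = 1.8 \times 10^{9} \text{ elements} \\
N_{\texttt{xy}} &< 2\,146\,435\,071 \;(\texttt{0x7FEFFFFF}) < 2\,147\,483\,647 \;(2^{31}-1) \\
\text{bytes}(\texttt{xy}) &= 1.8 \times 10^{9} \times 8 = 1.44 \times 10^{10} \text{ B} \approx 13.4 \text{ GiB}
\end{aligned}
\]
```

By this arithmetic the element count of xy, even flattened, stays below both the documented per-dimension limit and Int32.MaxValue; what exceeds the default 2 GB object-size limit is its byte size, which is why gcAllowVeryLargeObjects is needed in the first place.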
Update: I tried the commercial version (alglib64_hpc.dll) and got:
System.AccessViolationException was unhandled
HResult=-2147467261
Message=Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
The stack trace is useless (for some reason it only goes as deep as my own code), but the exception occurs on line 2334 of alglib_hpc.cs:
int _error_code = _i_x_kdtreeserialize(&_error_msg, &_x, &_out);
In context:
public static unsafe void kdtreeserialize(kdtree obj, out string s_out)
{
    byte *_error_msg = null;
    byte *_out = null;
    void *_x = obj.ptr;
    int _error_code = _i_x_kdtreeserialize(&_error_msg, &_x, &_out);
So I'm not sure exactly how to debug or fix this. One potentially relevant detail: the process currently has nearly 28 GB of memory allocated.
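For scale, here is a rough tally of the largest arrays in the debugger readout above, counting 8 bytes per double and 4 bytes per int and ignoring object headers. Note the assumption: the readout was captured with the free, fully managed build, so these figures are only a rough guide to what the commercial build's native tree holds:

```latex
\[
\begin{aligned}
\texttt{xy} &: 3\,000\,000 \times 600 \times 8 = 14\,400\,000\,000 \text{ B} \approx 14.4 \text{ GB} \\
\texttt{nodes} &: 36\,000\,000 \times 4 = 144\,000\,000 \text{ B} = 144 \text{ MB} \\
\texttt{splits} &: 6\,000\,000 \times 8 = 48\,000\,000 \text{ B} = 48 \text{ MB} \\
\texttt{buf},\ \texttt{r} &: 2 \times 3\,000\,000 \times 8 = 48\,000\,000 \text{ B} = 48 \text{ MB} \\
\texttt{idx},\ \texttt{tags} &: 2 \times 3\,000\,000 \times 4 = 24\,000\,000 \text{ B} = 24 \text{ MB}
\end{aligned}
\]
```

On these numbers xy accounts for about 14.4 GB, roughly half of the 28 GB reported, while the other listed arrays together come to only about 264 MB.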