|
Post by ezmoney on Feb 18, 2013 4:04:21 GMT -5
I have been running some general data processing.
2 billion plus bytes, about 18 million records...
But the program has halted the last few times with a server error, and I have to either rerun or take what is done.
There is no way to restart, since it is a merge of two files. It would be a hard problem, and I would spend a lot of time finding the last record posted to the new file if I tried.
This worked flawlessly until the file became rather large. Most of it is just read, compare, and write.
Comments?
Suggestions?
Other?
|
|
|
Post by roguelantern on Feb 21, 2013 2:18:20 GMT -5
Are you reading the data file into an array for merging?
For starters, you might want to check what is going on with your system resources: is the program running stable, or is it slowly eating up memory or HD space? I have not done big data file merges with Runbasic, but have done that occasionally with Processing, and it is quite common to run out of memory.
Another thing to look for is whether the memory allowance for Runbasic can be increased. Unfortunately, I am not sure what the limit is now, or how to find it.
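One cheap way to check for a slow leak is to log memory usage every so many records. A minimal sketch in Python (the loop here is a stand-in for the real record-processing loop, not the original program):

```python
import tracemalloc

# Start tracking allocations made by this process.
tracemalloc.start()

buffer = []
for i in range(100_000):           # stand-in for the record-processing loop
    buffer.append(str(i))          # simulated per-record work
    if i % 25_000 == 0:
        current, peak = tracemalloc.get_traced_memory()
        print(f"record {i}: current={current} bytes, peak={peak} bytes")

tracemalloc.stop()
```

If the "current" figure climbs steadily from checkpoint to checkpoint, something is being retained per record; if it stays flat, the failure is probably not a memory leak in your own loop.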
|
|
|
Post by meerkat on Feb 21, 2013 9:06:11 GMT -5
You may also consider using SQLite. When starting your program you could look at your last record and process from there. Finding the last entry would be something like:

SELECT * FROM yourFile WHERE rowid = (SELECT max(a.rowid) FROM yourFile AS a)

To merge files you could:

SELECT * FROM yourFilea LEFT JOIN yourFileb ON yourFilea.something = yourFileb.something AND yourFilea.something1 = yourFileb.something1 ....

and continue with the merged data.
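A small Python sketch of that idea, using the standard sqlite3 module (the table and column names here are made up for illustration, not from the original files): find the last posted row via max(rowid) as a resume point, then LEFT JOIN the two tables.

```python
import sqlite3

# Illustrative tables only; real table/column names would differ.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE filea (key TEXT, val TEXT)")
con.execute("CREATE TABLE fileb (key TEXT, val TEXT)")
con.executemany("INSERT INTO filea VALUES (?, ?)",
                [("a", "1"), ("b", "2"), ("c", "3")])
con.executemany("INSERT INTO fileb VALUES (?, ?)",
                [("b", "2"), ("c", "9")])

# Resume point: the last record posted to filea.
last = con.execute(
    "SELECT * FROM filea WHERE rowid = "
    "(SELECT max(a.rowid) FROM filea AS a)"
).fetchone()

# Merge: LEFT JOIN keeps every filea row, matched or not.
rows = con.execute(
    "SELECT filea.key, filea.val, fileb.val FROM filea "
    "LEFT JOIN fileb ON filea.key = fileb.key "
    "AND filea.val = fileb.val"
).fetchall()
```

After a crash you would reopen the database, run the max(rowid) query, and continue inserting from the record after `last` instead of starting over.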
HTH.. Dan
|
|
|
Post by ezmoney on Feb 21, 2013 21:01:29 GMT -5
Because the file was large, I avoided any arrays...
just a read, compare, and print operation..
The two records can only be in one of three states:
state 1 ... one record is less than the other.
state 2 ... one record matches the other, indicating a duplicate record.
state 3 ... one record is greater than the other.
The lower value is always pushed first, as the data is in alphabetical order.
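That three-state compare is the classic sorted-merge pattern. A minimal sketch in Python (file names are hypothetical, and this assumes one record per line, already in alphabetical order):

```python
def merge_sorted(master_path, new_path, out_path):
    """Merge two sorted files, writing the lower record first.

    At each step the two current records are in one of three states:
    a < b (push a), a == b (duplicate, write once), a > b (push b).
    """
    dupes = 0
    with open(master_path) as fa, open(new_path) as fb, \
         open(out_path, "w") as out:
        a, b = fa.readline(), fb.readline()
        while a and b:
            if a < b:            # state 1: master record is lower
                out.write(a)
                a = fa.readline()
            elif a == b:         # state 2: duplicate record
                out.write(a)
                dupes += 1
                a, b = fa.readline(), fb.readline()
            else:                # state 3: new record is lower
                out.write(b)
                b = fb.readline()
        # One file is exhausted; copy the remainder of the other.
        while a:
            out.write(a)
            a = fa.readline()
        while b:
            out.write(b)
            b = fb.readline()
    return dupes
```

Since this holds only one record from each file at a time, memory should stay flat regardless of file size, which is what makes the end-of-run failure puzzling.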
Works smoothly, and runs through the data nicely.. it just has a tendency to hit the server error at or near the end. When I print the data file, it shows it went through the ZZ's...
Thus the program must have failed after that, but all that remains at that point is a few lines printing the record counts: 1. last merged as input, 2. new records input, 3. duplicates, 4. new merged file output, 5. records not in master.
All of this worked with smaller files as the master grew in size. It has me stumped as to why it does this...
I think I may take out the few print statements just to see what happens.
|
|