8. Re: A join in proc sql takes 5 hours and consumes a lot of disk space
SergioSanchez Jun 4, 2014 7:34 AM (in response to PGStats)The data are in an Oracle Server and I make a copy of the datasets connecting to the server through a libname statement and afterdata a;set b (where = (var1<= date and var2>date and var3>date); where "b" is the dataset on the Oracle serverI do a left join, the target in most of the cases is to obtain the surrogate key of the second table and one or two variables more.Something like thisproc sql;create table x asselect a.*, b.var1, b.var2, b.var3from dataset1 as a left join dataset2 as bon (a.var1 = b.var1);quit;There is no index at all in any of the datasets and no message appears in the log about an issue in the performanceHelpful Answer9. Re: A join in proc sql takes 5 hours and consumes a lot of disk space
KurtBremser Jun 4, 2014 8:01 AM (in response to SergioSanchez)Try this:proc sortdata=b /* this is your original oracle data set */(where = (var1<= date and var2>date and var3>date)out=dataset1;by var1;run;proc sortdata=dataset2 (keep=var1 var2 var3)out=data2x;by var1;run;data x;mergedataset1 (in=a)data2x;by var1;if a;run;Compare this method and the SQL method using options fullstimer;Also watch the disks while the jobs are running; you may be surprised by the disk usage(s).I remember when I first came across a piece of code done by a SAS consultant that had > 100 lines. I quickly saw that I could do the same in one create table with ~ 10 lines in PROC SQL, so why bother with all that code? Then I had to wait 5 hours for my SQL to finish, while his code took about 20 minutes to produce the same result. With less than half the disk space.
No comments:
Post a Comment