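Is it possible to merge the following two tables using hash objects in SAS 9.1, as in the example below? The main difficulty seems to be creating the value variable in the result dataset. Each payment can cover more than one charge, and sometimes more than one payment is needed to cover a single charge; both situations can occur at the same time. Is there a general name for this problem? http://support.sas.com/rnd/base/datastep/dot/hash-getting-started.pdf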

data TABLE1;
input ID_client   ID_commodity    Charge;
datalines;
1             111111111      100
1             222222222      200
2             333333333      300    
2             444444444      400
2             555555555      500
;;;;
run;


data TABLE2;
input ID_client_hash     ID_ofpayment  paymentValue;
datalines;
1             11              50    
1             12              50    
1             13              100   
1             14              50    
1             15              50    
2             21              500   
2             22              200   
2             23              100   
2             24              200   
2             25              200
;;;;
run;

data OUT;
input ID_client     ID_commodity    ID_ofpayment    value;
datalines;
1               111111111             11    50
1               111111111             12    50
1               222222222             13    100
1               222222222             14    50
1               222222222             15    50
2               333333333             21    300
2               444444444             21    200
2               444444444             22    200
2               555555555             23    100
2               555555555             24    200
2               555555555             25    200
;;;;
run;

1 Answer

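This might work for you. I have 9.2, and 9.2 has some significant hash improvements, but I think I have behaved and only used what is available in 9.1. You might try cross-posting this to SAS-L (the SAS listserv), as I believe Paul Dorfman (aka The Hash Guru) still reads it.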

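I assumed you wanted the "leftovers" output as well. If that part does not work the way you want, you may need to adjust it. This is not terribly well tested, but it works for your example dataset. I set the commodity to missing for payments 24 and 25, since they are not applied to any charge.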

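I'm pretty sure there is a cleaner way to do the iteration than what I do here, but since I use 9.2+ and the multidata option is available there, I have always used that rather than a hash iterator, so I don't know the cleaner ways.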
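For readers on 9.2 or later, here is a minimal sketch of the multidata approach mentioned above, shown only for comparison since it is not available in 9.1. It assumes the for_hash dataset defined below and simply lists all payments for one client; the client ID 1 and the data _null_ step are illustrative choices, not part of the solution.

```sas
/* Sketch only: requires SAS 9.2+ (multidata is not available in 9.1). */
data _null_;
  if 0 then set for_hash;                 /* define host variables at compile time */
  declare hash h(dataset:'for_hash', multidata:'y');
  h.defineKey('ID_client_hash');          /* one key, many payments per key */
  h.defineData('ID_ofpayment','paymentValue');
  h.defineDone();

  ID_client_hash = 1;                     /* illustrative client to look up */
  rc = h.find();                          /* first payment for this client, if any */
  do while (rc = 0);
    put ID_client_hash= ID_ofpayment= paymentValue=;
    rc = h.find_next();                   /* next payment with the same key */
  end;
  stop;                                   /* no executing SET, so end the step explicitly */
run;
```

With a single-key multidata hash, find() and find_next() walk only the items for the requested client, which avoids scanning from the start of the hash with an iterator.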

data have;
input ID_client   ID_commodity    Charge;
datalines;
1             111111111      100
1             222222222      200
2             333333333      300    
2             444444444      400
2             555555555      50
;;;;
run;


data for_hash;
input ID_client_hash     ID_ofpayment  paymentValue;
datalines;
1             11              50    
1             12              50    
1             13              100   
1             14              50    
1             15              50    
2             21              500   
2             22              200   
2             23              100   
2             24              200   
2             25              200
;;;;
run;

data want;
*Create hash and hash iterator - must use iterator since 9.1 does not allow multidata option;
if _n_ = 1 then do;
  format id_client_hash paymentValue id_ofpayment BEST12.;
  declare hash h(dataset:'for_hash' , ordered: 'a');
  h.defineKey('ID_client_hash','id_ofpayment'); *note I put id_client_hash, renaming the id - want to be able to compare them;
  h.defineData('id_client_hash','id_ofpayment','paymentValue');
  call missing(id_ofpayment,paymentValue, id_client_hash);
  h.defineDone();
  declare hiter hi('h');
end;

do _t = 1 by 1 until (last.id_client);
 set have;
 by id_client;

 *Iterate through the hash and find the first record with the same ID_client;
 do rc = hi.first() by 0 while (rc eq 0 and ID_client ne ID_client_hash);
   rc = hi.next();
 end;

 *For the current charge record, iterate through the payment (hash) until all paid up.;
 do while (charge gt 0 and rc eq 0 and ID_client=ID_client_hash);
   if charge ge paymentValue then do; *If charge >= paymentvalue, use up the payment value;
     value = paymentValue; *so whole paymentValue is value;
     charge = charge - paymentValue; *charge is decremented by paymentValue;
     output; *output row;
     _id=ID_client_hash; 
     _pay=id_ofpayment;
     rc = hi.next();
     h.remove(key:_id,key:_pay); *remove payment row from hash now that it has been used up;
   end;
   else do; *this is if (remaining) charge is less than payment - we will not use all of the payment;
     value = charge; *value is the remainder of the charge, ie, how much of payment was actually used;
     paymentValue = paymentValue - charge; *paymentValue is the remainder of paymentValue;
     charge= 0; *charge is zero now;
     output; *output a row;
     h.replace(); *replace paymentValue in the hash with the new value of paymentValue, minus charge;
   end;
 end; *end of iteration through hash - at this point, either charge = 0 or we have run out of payments with that ID;
 if charge gt 0 then do;
   value=-1*charge;
   call missing(id_ofpayment);
   output; *output a row for the charge, which is not paid; 
 end;
 if last.id_client then do;  *this is cleanup, checking to see if we have any leftover payments;
   do while (rc=0); *iterate through the remaining hash;
     do rc = hi.first() by 0 while (rc eq 0 and ID_client ne ID_client_hash);
       rc = hi.next();
     end;
     if rc=0 then do;
       call missing(id_commodity); *to make it clear this is a leftover payment;
       value=paymentValue; *update the value;
       output; *output the payment;
       _id=ID_client_hash;
       _pay=id_ofpayment;
       rc = hi.next();
       if rc= 0 then h.remove(key:_id,key:_pay); *remove the payment just output;
     end;
   end;
 end;
end;
keep id_client id_ofpayment id_commodity value;
run;

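Beyond that, this is not terribly fast: I do a lot of iterating that is probably wasteful. It will be relatively faster if there are no payment records whose ID_client does not appear in the charge records. Any such records have to be skipped over on every pass, so it can eventually get quite slow.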

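I'm not convinced that a hash is the superior solution here, at least before 9.2; a keyed UPDATE might be better. UPDATE is pretty much intended for transactional database structures, which this seems to be close to.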
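As a rough illustration of what the UPDATE statement does, and not a solution to the allocation problem itself, here is a minimal sketch that applies one transaction record to a master dataset by key. The dataset names master, trans and master_new and the balance variable are hypothetical, made up for this example.

```sas
/* Hypothetical master table: one row per client (unique, sorted key). */
data master;
  input ID_client balance;
  datalines;
1 300
2 1200
;
run;

/* Hypothetical transaction table, sorted by the same key. */
data trans;
  input ID_client balance;
  datalines;
1 250
;
run;

/* UPDATE overlays nonmissing transaction values onto the matching master row. */
data master_new;
  update master trans;
  by ID_client;
run;
```

Both inputs need to be sorted by the BY variable, and the master dataset should have one row per key value. Clients with no matching transaction are carried through unchanged.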

answered 2013-02-08T20:31:08.740