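I have a dataframe with 19 million rows covering about 10K customers and their daily consumption over different date ranges. I have resampled this data to weekly consumption, and the resulting dataframe has 2M rows. I want to find the continuous date ranges for each customer and select the longest one. Any ideas? Thanks!
It would be great if you could post some sample code so that replies can be more specific.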
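You probably want something like the following: earliest holds the earliest consumption date for each customer, latest holds the latest one, and their difference time_span gives the time span for each customer. Selecting the Consumption_date column before aggregating avoids computing min and max over every other column.
# earliest and latest consumption date per customer
earliest = df.groupby('Customer_ID')['Consumption_date'].min()
latest = df.groupby('Customer_ID')['Consumption_date'].max()
# overall time span per customer (a Timedelta Series indexed by Customer_ID)
time_span = latest - earliest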
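Note that the span between the earliest and latest date ignores any gaps in between. If "continuous" means consecutive weeks with no missing week, one possible approach is to label each unbroken run and keep the longest run per customer. The following is only a minimal sketch under assumptions: the sample data is made up, the column names Customer_ID and Consumption_date are taken from the answer above, the helper name longest_runs is hypothetical, and the weekly dates are assumed to be exactly 7 days apart after resampling.

```python
import pandas as pd

# Hypothetical weekly data; customer A has a missing week (2021-01-17).
df = pd.DataFrame({
    "Customer_ID": ["A"] * 5 + ["B"] * 3,
    "Consumption_date": pd.to_datetime([
        "2021-01-03", "2021-01-10", "2021-01-24", "2021-01-31", "2021-02-07",
        "2021-01-03", "2021-01-10", "2021-01-17",
    ]),
})

def longest_runs(df, step=pd.Timedelta(weeks=1)):
    """Return the longest gap-free run of dates for each customer."""
    df = df.sort_values(["Customer_ID", "Consumption_date"])
    # Gap to the previous row of the same customer (NaT on each customer's first row).
    gap = df.groupby("Customer_ID")["Consumption_date"].diff()
    # A new run starts wherever the gap is not exactly one step.
    run_id = (gap != step).cumsum().rename("run")
    runs = (
        df.groupby(["Customer_ID", run_id])["Consumption_date"]
          .agg(start="min", end="max", weeks="count")
          .reset_index()
    )
    # Keep the run with the most weeks for each customer (first one on ties).
    idx = runs.groupby("Customer_ID")["weeks"].idxmax()
    return runs.loc[idx].drop(columns="run").reset_index(drop=True)

result = longest_runs(df)
print(result)
```

Because everything is done with groupby, diff, and cumsum rather than a Python loop over customers, this should scale reasonably to a couple of million rows, although that has not been measured here.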
Knowing the exact df and variable names would be great.