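I want to resample my DataFrame so that its resolution changes from minutes to hours. The resampling should only cover business hours. However, I get a ValueError when I use pandas' built-in BusinessHour offset. This is odd, because the same call works with the Hour offset object. See the steps below.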
>>> import yfinance as yf
>>> import pandas as pd
>>> from pandas.tseries.offsets import Hour, BusinessHour
# Example data with 5 minute interval
>>> mydata = yf.Ticker('ABN.AS').history(interval='5m', period='1mo', start='2020-06-01')
Resampling mydata with pandas' Hour offset object works as expected:
>>> mydatahour = mydata.resample(Hour()).mean()
Open High Low Close Volume Dividends Stock Splits
Datetime
2020-06-01 09:00:00+02:00 7.610833 7.648333 7.588333 7.625000 235704.416667 0.0 0.0
2020-06-01 10:00:00+02:00 7.580000 7.602500 7.551667 7.580000 139965.166667 0.0 0.0
2020-06-01 11:00:00+02:00 7.621667 7.634167 7.608333 7.620833 88396.833333 0.0 0.0
2020-06-01 12:00:00+02:00 7.581667 7.595000 7.570000 7.581667 49044.166667 0.0 0.0
2020-06-01 13:00:00+02:00 7.622500 7.628333 7.615833 7.621667 44186.666667 0.0 0.0
... ... ... ... ... ... ... ...
2020-07-09 09:00:00+02:00 8.000000 8.015000 7.973333 7.989167 59975.333333 0.0 0.0
2020-07-09 10:00:00+02:00 7.970000 7.976667 7.952500 7.965000 26809.250000 0.0 0.0
2020-07-09 11:00:00+02:00 7.920000 7.927500 7.909167 7.915833 18170.166667 0.0 0.0
2020-07-09 12:00:00+02:00 7.892500 7.900000 7.885000 7.891667 22465.583333 0.0 0.0
2020-07-09 13:00:00+02:00 7.887692 7.893846 7.880769 7.889231 18324.153846 0.0 0.0
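One side effect of the working Hour() call is worth making visible: because Hour is a fixed-width offset, resample creates a bin for every clock hour, including nights and weekends, and those empty bins come out as all-NaN rows. The sketch below assumes a small synthetic 5-minute frame (a hypothetical stand-in for the yfinance download) and drops the empty bins afterwards.

```python
import numpy as np
import pandas as pd
from pandas.tseries.offsets import Hour

# Hypothetical stand-in for the yfinance data: 5-minute bars from
# 09:00-12:55 on one day and 09:00-09:55 on the next day.
tz = "Europe/Amsterdam"
idx = pd.date_range("2020-06-01 09:00", "2020-06-01 12:55", freq="5min", tz=tz).append(
    pd.date_range("2020-06-02 09:00", "2020-06-02 09:55", freq="5min", tz=tz)
)
df = pd.DataFrame({"Close": np.arange(len(idx), dtype=float)}, index=idx)

# Fixed-width hourly bins: one bin per clock hour between the first and last bar.
hourly = df.resample(Hour()).mean()
print(len(hourly))                   # 25 bins (09:00 day 1 up to 09:00 day 2)
print(hourly["Close"].isna().sum())  # 20 of them are empty overnight bins

# Keep only the hours that actually contain data.
trading_hours = hourly.dropna(how="all")
print(len(trading_hours))            # 5
```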
Resampling mydata with pandas' BusinessHour object raises the following exception:
>>> mydatahour = mydata.resample(BusinessHour()).mean()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\bramb\Documents\codes\fanalysis\.venv\lib\site-packages\pandas\core\resample.py", line 937, in g
return self._downsample(_method)
File "C:\Users\bramb\Documents\codes\fanalysis\.venv\lib\site-packages\pandas\core\resample.py", line 1020, in _downsample
self._set_binner()
File "C:\Users\bramb\Documents\codes\fanalysis\.venv\lib\site-packages\pandas\core\resample.py", line 178, in _set_binner
self.binner, self.grouper = self._get_binner()
File "C:\Users\bramb\Documents\codes\fanalysis\.venv\lib\site-packages\pandas\core\resample.py", line 186, in _get_binner
binner, bins, binlabels = self._get_binner_for_time()
File "C:\Users\bramb\Documents\codes\fanalysis\.venv\lib\site-packages\pandas\core\resample.py", line 1009, in _get_binner_for_time
return self.groupby._get_time_bins(self.ax)
File "C:\Users\bramb\Documents\codes\fanalysis\.venv\lib\site-packages\pandas\core\resample.py", line 1449, in _get_time_bins
ax_values, bin_edges, self.closed, hasnans=ax.hasnans
File "pandas\_libs\lib.pyx", line 673, in pandas._libs.lib.generate_bins_dt64
ValueError: Values falls after last bin
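Unlike Hour, BusinessHour is not a fixed-width step. With its defaults it only counts 09:00-17:00 on weekdays, so stepping forward jumps from 16:00 straight to 09:00 on the next business day. The short sketch below makes that visible with pd.date_range; the dates are arbitrary, and holidays are not taken into account.

```python
import pandas as pd
from pandas.tseries.offsets import BusinessHour

# 20 consecutive BusinessHour steps starting Monday 2020-06-01 09:00.
steps = pd.date_range("2020-06-01 09:00", periods=20, freq=BusinessHour())
print(steps[6:10])  # ... 15:00, 16:00, then 09:00 and 10:00 on 2020-06-02

# Adding one business hour to Friday 16:00 lands on Monday 09:00.
print(pd.Timestamp("2020-06-05 16:00") + BusinessHour())  # 2020-06-08 09:00:00
```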
Using pandas' groupby on mydata with a Grouper object at business-hour frequency ('BH') gives me the same error as above:
>>> mydatahour2 = mydata.groupby(pd.Grouper(freq='BH')).mean()
Why does this exception occur, and how can I work around it? I tried to subclass pandas' Hour offset object, but it is read-only, so I cannot adjust it that way.
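One possible workaround, offered as a sketch rather than a fix for BusinessHour itself: filter the rows to weekday business hours first, then resample with the fixed-width Hour() offset that already works, and drop the empty overnight and weekend bins. The helper name resample_business_hours and its start_hour/end_hour parameters are hypothetical, and exchange holidays are not handled.

```python
import numpy as np
import pandas as pd
from pandas.tseries.offsets import Hour


def resample_business_hours(df, start_hour=9, end_hour=17):
    """Hypothetical helper: hourly means restricted to weekday business hours.

    Rows outside [start_hour, end_hour) or on Saturday/Sunday are dropped,
    the rest is resampled with Hour(), and empty bins are removed.
    """
    idx = df.index
    in_hours = (idx.dayofweek < 5) & (idx.hour >= start_hour) & (idx.hour < end_hour)
    return df[in_hours].resample(Hour()).mean().dropna(how="all")


# Synthetic 5-minute data from Friday 08:00 to Monday 18:00 (includes a weekend).
idx = pd.date_range("2020-06-05 08:00", "2020-06-08 18:00", freq="5min", tz="Europe/Amsterdam")
df = pd.DataFrame({"Close": np.arange(len(idx), dtype=float)}, index=idx)

hourly = resample_business_hours(df)
print(len(hourly))  # 16: 09:00-16:00 on Friday and on Monday
```

Filtering before resampling keeps the bin edges on the plain Hour grid, so the labels stay identical to the ones produced by the working resample(Hour()) call.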