r - Upper limit of the interval in the function "cut"

Question

I want to classify some data in R in a certain way.
Suppose we have the following data (an integer vector):

> data = sample(1:500, 5000, replace = TRUE)

To classify this data, I create these classes:

> data.cl = cut(data, breaks = c(seq(0,100,by=10), 200, 350, 480, 500))
> table(data.cl)
data.cl
   (0,10]   (10,20]   (20,30]   (30,40]   (40,50] 
      102        80        87       113       117 
  (50,60]   (60,70]   (70,80]   (80,90]  (90,100] 
      101        89        95       106       104 
(100,200] (200,350] (350,480] (480,500] 
     1002      1492      1318       194

If I want 0 to be included, I just need to add include.lowest = TRUE:

> data.cl = cut(data, breaks = c(seq(0,100,by=10), 200, 350, 480, 500),
+ include.lowest = TRUE)
> table(data.cl)
data.cl
   [0,10]   (10,20]   (20,30]   (30,40]   (40,50] 
      102        80        87       113       117 
  (50,60]   (60,70]   (70,80]   (80,90]  (90,100] 
      101        89        95       106       104 
(100,200] (200,350] (350,480] (480,500] 
     1002      1492      1318       194

In this example it makes no difference, because 0 does not occur in the data at all. But if it did, say 4 times, class [0,10] would contain 106 elements instead of 102:

> data.cl = cut(data, breaks = c(seq(0,100,by=10), 200, 350, 480, 500),
+ include.lowest = TRUE)
> table(data.cl)
data.cl
   [0,10]   (10,20]   (20,30]   (30,40]   (40,50] 
      106        80        87       113       117 
  (50,60]   (60,70]   (70,80]   (80,90]  (90,100] 
      101        89        95       106       104 
(100,200] (200,350] (350,480] (480,500] 
     1002      1492      1318       194
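
A minimal sketch of the effect described above, using a small made-up vector x (not the sampled data) that actually contains the lowest break value. With the default settings the zeros fall outside every class and become NA; with include.lowest = TRUE they land in the first class.

```r
# Hypothetical toy data that contains the lowest break value (0) twice
x <- c(0, 0, 5, 10, 15, 500)
brks <- c(0, 10, 20, 500)

# Default: intervals are (a, b], so 0 belongs to no class and becomes NA
default_cl <- cut(x, breaks = brks)
table(default_cl, useNA = "ifany")

# include.lowest = TRUE closes the first interval, giving [0,10]
closed_cl <- cut(x, breaks = brks, include.lowest = TRUE)
table(closed_cl, useNA = "ifany")
```

Passing useNA = "ifany" to table() makes the dropped values visible; a plain table() call silently omits them.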

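There is another option that changes the class limits. The default in cut() is right = TRUE, which gives intervals closed on the right. If you change it to right = FALSE you get: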

> data.cl = cut(data, breaks = c(seq(0,100,by=10), 200, 350, 480, 500),
+ include.lowest = TRUE, right = FALSE)
> table(data.cl)
data.cl
   [0,10)   [10,20)   [20,30)   [30,40)   [40,50) 
       92        81        87       111       118 
  [50,60)   [60,70)   [70,80)   [80,90)  [90,100) 
      103        89        94       103       103 
[100,200) [200,350) [350,480) [480,500] 
     1003      1497      1320       199

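include.lowest now acts as "include.highest", at the cost of shifting the class limits: some classes return a different number of members because their boundaries have changed slightly.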
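
To see this behaviour in isolation, here is a small sketch with made-up values (x and brks are illustrative only). With right = FALSE the intervals are [a, b), so the top break is dropped unless include.lowest = TRUE closes the last interval.

```r
# Hypothetical toy data that hits several break values exactly
x <- c(0, 10, 20, 499.5, 500)
brks <- c(0, 10, 20, 500)

# right = FALSE alone: every interval is [a, b), so 500 becomes NA
open_top <- cut(x, breaks = brks, right = FALSE)
levels(open_top)

# Adding include.lowest = TRUE closes the *last* interval instead of the first
closed_top <- cut(x, breaks = brks, right = FALSE, include.lowest = TRUE)
levels(closed_top)
```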
But suppose I want the classes to look like this:

> data.cl = cut(data, breaks = c(seq(0,100,by=10), 200, 350, 480, 500))
> table(data.cl)
data.cl
   (0,10]   (10,20]   (20,30]   (30,40]   (40,50] 
      102        80        87       113       117 
  (50,60]   (60,70]   (70,80]   (80,90]  (90,100] 
      101        89        95       106       104 
(100,200] (200,350] (350,480] (480,500) 
     1002      1492      1318       194

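that is, with 500 excluded as well. What should I do?
Of course, one could say: "Just write data.cl = cut(data, breaks = c(seq(0,100,by=10), 200, 350, 480, 499)) instead of data.cl = cut(data, breaks = c(seq(0,100,by=10), 200, 350, 480, 500)), because you are dealing with integers."
That is true, but what if that were not the case and I had floating-point values instead? How would I exclude 500 then?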
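
One possible workaround, sketched under assumptions rather than taken from the cut() documentation: as far as I know cut() has no argument that opens only the last interval while keeping the others as (a, b], so the sketch keeps the default classes, sets values equal to the top break to NA afterwards, and relabels the last level by hand. The vector x and the name top are made up for illustration.

```r
# Hypothetical floating-point data, including one value exactly at the top break
x <- c(0.5, 99.9, 480.1, 499.999, 500)
brks <- c(seq(0, 100, by = 10), 200, 350, 480, 500)
top <- max(brks)

# Classify with the default (a, b] intervals
cl <- cut(x, breaks = brks)

# Drop values sitting exactly on the upper bound
cl[x == top] <- NA

# Relabel the last level so the label reflects the open upper end
last <- nlevels(cl)
levels(cl)[last] <- sub("\\]$", ")", levels(cl)[last])

table(cl, useNA = "ifany")
```

The exact comparison x == top only catches values stored as exactly 500; whether a tolerance is needed depends on how the data were produced.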
