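A histogram is not the same as a bar chart, even though both are drawn with rectangular bars. A histogram first groups a variable into bins, counts the observations that fall into each bin (the frequency), and then draws each bar with a height equal to that count. In ggplot2 a histogram is drawn with geom_histogram, while a bar chart is drawn with geom_bar or geom_col. The bins of a continuous variable are adjacent, with no gaps between them, whereas the bars of geom_bar and geom_col may be separated by gaps. A histogram of a continuous variable uses the stat_bin statistical transformation; a discrete variable uses stat_count. geom_bar uses stat_count, and geom_col uses stat_identity.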
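The mapping between geoms and stats described above can be checked directly. The following is a minimal sketch, assuming ggplot2 is installed and using its bundled diamonds data; the object names (p_hist, p_bar, p_col, cut_counts, stat_names) are chosen here for illustration only.

```r
library(ggplot2)

# Continuous variable: geom_histogram() bins the values with stat_bin.
p_hist <- ggplot(diamonds, aes(carat)) + geom_histogram(binwidth = 0.1)

# Discrete variable: geom_bar() counts the rows in each category with stat_count.
p_bar <- ggplot(diamonds, aes(cut)) + geom_bar()

# Precomputed heights: geom_col() plots the y values unchanged (stat_identity).
cut_counts <- as.data.frame(table(cut = diamonds$cut))
p_col <- ggplot(cut_counts, aes(cut, Freq)) + geom_col()

# Look up which stat object each layer carries.
stat_names <- c(
  hist = class(p_hist$layers[[1]]$stat)[1],
  bar  = class(p_bar$layers[[1]]$stat)[1],
  col  = class(p_col$layers[[1]]$stat)[1]
)
print(stat_names)
```

Printing p_hist, p_bar, or p_col draws the corresponding chart; p_bar and p_col should look the same, because cut_counts holds the counts that stat_count would otherwise compute.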
A frequency polygon, geom_freqpoly, represents the counts with a line, whereas geom_histogram represents them with bars.
geom_freqpoly(mapping = NULL, data = NULL, stat = "bin", position = "identity", ..., na.rm = FALSE, show.legend = NA, inherit.aes = TRUE)
geom_histogram(mapping = NULL, data = NULL, stat = "bin", position = "stack", ..., binwidth = NULL, bins = NULL, na.rm = FALSE, show.legend = NA, inherit.aes = TRUE)
stat_bin(mapping = NULL, data = NULL, geom = "bar", position = "stack", ..., binwidth = NULL, bins = NULL, center = NULL, boundary = NULL, breaks = NULL, closed = c("right", "left"), pad = FALSE, na.rm = FALSE, show.legend = NA, inherit.aes = TRUE)
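The boundary and closed arguments of stat_bin decide which bin receives a value that lies exactly on a bin edge. The sketch below uses a small made-up data frame (toy, not part of the original text) with two such values, and reads back the computed bins with layer_data().

```r
library(ggplot2)

# Hypothetical toy data: the values 1 and 2 sit exactly on bin edges.
toy <- data.frame(x = c(0.5, 1, 1.5, 2, 2.5))

# binwidth = 1 with boundary = 0 puts the bin edges on whole numbers;
# the two plots differ only in which side of each bin is closed.
p_right <- ggplot(toy, aes(x)) +
  stat_bin(binwidth = 1, boundary = 0, closed = "right")
p_left <- ggplot(toy, aes(x)) +
  stat_bin(binwidth = 1, boundary = 0, closed = "left")

# layer_data() returns the data frame that the stat computed for a layer.
bins_right <- layer_data(p_right)[, c("xmin", "xmax", "count")]
bins_left  <- layer_data(p_left)[, c("xmin", "xmax", "count")]

print(bins_right)
print(bins_left)
```

Both tables account for all five observations, but the edge values are assigned to different bins, so the per-bin counts differ.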
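The simplest case (with neither binwidth nor bins given, ggplot2 falls back to 30 bins and prints a message suggesting you choose a value yourself):
ggplot(diamonds, aes(carat)) +
  geom_histogram()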
Setting binwidth, the width of each bin in the units of the variable:
ggplot(diamonds, aes(carat)) +
  geom_histogram(binwidth = 0.01)
Setting bins, the number of bins:
ggplot(diamonds, aes(carat)) +
  geom_histogram(bins = 200)
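binwidth and bins are two ways of controlling the same thing: one fixes the width of each bin, the other fixes how many bins there are. As a rough sketch (the variable names are illustrative), the number of bins produced by each setting can be counted with layer_data():

```r
library(ggplot2)

p_bins  <- ggplot(diamonds, aes(carat)) + geom_histogram(bins = 200)
p_width <- ggplot(diamonds, aes(carat)) + geom_histogram(binwidth = 0.01)

# Each row of the computed layer data is one bin.
d_bins  <- layer_data(p_bins)
d_width <- layer_data(p_width)

n_bins  <- nrow(d_bins)
n_width <- nrow(d_width)

# A binwidth of 0.01 carat over the full carat range yields more than 200 bins.
print(c(bins_200 = n_bins, binwidth_0.01 = n_width))
```

Whichever argument is used, every observation is counted exactly once, so the counts in both cases sum to the number of rows in diamonds.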
Grouping by a variable, with the groups stacked:
ggplot(diamonds, aes(price, fill = cut)) +
  geom_histogram(binwidth = 500)
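The stacking comes from the default position = "stack" in the geom_histogram signature. As a hedged illustration, the sketch below contrasts it with position = "identity", where every group starts from zero; the alpha value of 0.4 is an arbitrary choice to keep the overlapping bars visible.

```r
library(ggplot2)

base <- ggplot(diamonds, aes(price, fill = cut))

# Default: the bars for each cut are stacked on top of one another.
p_stack <- base + geom_histogram(binwidth = 500)

# Alternative: each group is drawn from zero and the bars overlap.
p_overlay <- base +
  geom_histogram(binwidth = 500, position = "identity", alpha = 0.4)

# Confirm which position adjustment each layer uses.
position_names <- c(
  stack   = class(p_stack$layers[[1]]$position)[1],
  overlay = class(p_overlay$layers[[1]]$position)[1]
)
print(position_names)
```

In the stacked version the total height of a bar is the overall count for that bin; in the overlaid version each colour can be read against the axis on its own.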
Drawing the counts as lines instead of bars:
ggplot(diamonds, aes(price, colour = cut)) +
  geom_freqpoly(binwidth = 500)