欢迎投稿(荐稿)计量经济圈,计量相关都行

箱:econometrics666@sina.cn

编辑整理: @计量经济圈(ID: econometrics666); 来源: 熊彼特的厨房 公众号

RDD断点回归设计越来越来越流行,在因果识别中的地位很高,因为是在一个断点处的两端进行比较,那么这个treatment效应能够较好的识别,下面这篇文章将会领着计量经济圈的圈友去认识RDD并且知道操作的步骤和Stata代码。想要进一步了解RDD这种方法以及其他因果推断技巧的,进入计量经济圈社群交流(文后)

目前常见的用于RDD计算的R包分别有rdd, rdrobust, rddtools。Thoemmes等 (2017)的论文比较了这三个包的异同。

Thoemmes(2017)从实践的角度,进一步将RD的步骤分为三类:Assumption Checks,Estimation, Sensitivity Checks。我认为这里的简要归纳要比Lee和Lemieux(2010)更清楚简洁。

Assumption Checks实际上对应了RD的三个前提假设,分别是配置变量存在断点,对应着McCrary(2008)的断点检验;实验效应发生在我们假定的断点处,而不是其他断点处,这对应着安慰剂检验(Placebo Test),在上篇第二部分有展示;实验变量仅仅对我们关心的结果变量有影响,但是对其他前定变量无影响,其实就是检验断点附近前定变量的连续性。

Estimation是对因果效应的估计,又分为参数法(Global Regression)和非参数法(Local Linear Regression),后者对应着不同带宽的选择。

Sensitivity Checks是排除结果对的模型依赖性(Model Dependency)主要是对不同估计方法、带宽选择以及模型中多项式次数的鲁棒性检查,详见上篇推文的第二部分。

当前有三个R语言包可以实现RDD,Thoemmes详细比较了三个包在以上三方面的异同,并归纳如下表。

综合比较而言,rdd的上手成本是最低的也是最基本的选择,如果你熟悉AER,sandwich这几个包的话,那几乎零成本。 rdrobust 在带宽选择和画图方面具备优势,是唯一个输出各类带宽选择及其置信区间的包,当然也是因为这个包CCT开发的。rddtools的强项在于可以自动进行预设与敏感性检查,它可以输出不同带宽选择的估计结果,还可以自动进行安稳剂检验,这是其他包不具备的。如果对于R语言比较陌生的新手,可以选择rddapp这个带有界面的包进行操作。

作者使用一个数据实例展示了三个包联合使用的效果,这里简单归纳一下。

首先展示一下配置变量与结果变量散点图,然后是三步实际操作。

1 第一步:预设检查

(1)断点检验

(2)安慰剂检测

(3)其他前定变量检测

没有画图,文中直接汇报了检验的p值

2 第二步:估计

3 第三步:敏感测试

借着这个图多说一句,带宽的选择实际上是Bias与显著性的权衡取舍,BW越小越无偏,但是也意味着SE会更大,从而导致越发不显著

当然目前还有很多是这几个R包实现不了的,作者列举了两个,一是多个配置变量的估计,二是关于RDD统计解释力(Statistical Power)的估计。

上述步骤实现的R代码如下:

#### PREPERATION ####

library(ggplot2)

library(dplyr)

library(rdd)

library(rddtools)

library(rdrobust)

# set working directory to file location

setwd("C:\\Users\\fjt36-admin\\Dropbox\\WORK\\2016 Winter Cornell\\rdd software jebs")

# read Stata Dataset generated by "./ICPSR_04091/DS0001/04091-0001-Setup.do"

dat = foreign::read.dta(’ICPSR_04091/ICPSR_04091.dta’)

#distribution of treatment, and treatment effect from randomized experiment

xtabs( ~ DC_TRT, dat)

lm_rct1 = lm(SBIQ48 ~ DC_TRT, dat)

summary(lm_rct1)

#treatment effect of randomized experiment interacted with mother’s IQ at time 0

#IQ is centered to provide average treatment effect

lm_rct2 = lm(SBIQ48 ~ DC_TRT * scale(MOMWAIS0), dat)

summary(lm_rct2)

#subset data to create artifical sharp RDD

#here assuming that only mothers with an IQ lower than the median IQ (85) are

#selected for the treament

dat_s = filter(dat,

(MOMWAIS0 >= median(MOMWAIS0) & !DC_TRT) |

(MOMWAIS0 < median(MOMWAIS0) & DC_TRT),!is.na(SBIQ48))

#table to confirm RDD

table(dat_s$DC_TRT, dat_s$MOMWAIS0)

table(dat_s$DC_TRT)

#graphical inspection of all data and then data of only the sharp RDD, here using

#loess smoother with no binning

dat$DC_TRT_LAB <- factor(dat$DC_TRT, labels = c("Control", "Treatment"))

dat_s$DC_TRT_LAB <-

factor(dat_s$DC_TRT, labels = c("Control", "Treatment"))

p1 <- ggplot(dat) +

aes(

x = MOMWAIS0,

y = SBIQ48,

group = DC_TRT_LAB,

color = DC_TRT_LAB,

shape = DC_TRT_LAB

) +

geom_point(aes(shape = DC_TRT_LAB), size = 2) +

geom_smooth(method = ’loess’,

formula = y ~ x,

aes(linetype = DC_TRT_LAB)) + theme_bw() + ylab("Child IQ at age 2") +

xlab("Mother’s IQ\n\n(a)") +

xlim(c(60, 110)) + ylim(c(60, 130)) +

scale_colour_manual(values = c("darkgrey", "black")) +

theme(legend.position = "none")

p2 <- ggplot(dat_s) +

aes(

x = MOMWAIS0,

y = SBIQ48,

group = DC_TRT_LAB,

color = DC_TRT_LAB,

shape = DC_TRT_LAB

) +

geom_point(aes(shape = DC_TRT_LAB), size = 2) +

geom_smooth(method = ’loess’,

formula = y ~ x,

aes(linetype = DC_TRT_LAB)) + theme_bw() + ylab("Child IQ at age 2") +

xlab("Mother’s IQ\n\n(b)") + xlim(c(60, 110)) + ylim(c(60, 130)) +

scale_colour_manual(values = c("darkgrey", "black")) +

theme(legend.title = element_blank()) +

scale_shape_discrete(

name = "DC_TRT",

breaks = c("0", "1"),

labels = c("Treatment", "Control")

)

multiplot(p1, p2, cols = 2)

#naive treatment effect from RDD data

lm_naive = lm(SBIQ48 ~ DC_TRT, dat_s)

summary(lm_naive)

#RDD effect using global, parametric approach "by hand" without any package

lm_rct3 = lm(SBIQ48 ~ DC_TRT * scale(MOMWAIS0), dat_s)

summary(lm_rct3)

#### "rdd" package ####

#McCrary sorting test as implemented in RDD

rdd::DCdensity(dat_s$MOMWAIS0, median(dat$MOMWAIS0), ext.out = TRUE)

# M1, using defaults

lm_rdd = rdd::RDestimate(SBIQ48 ~ MOMWAIS0, dat_s,

cutpoint = median(dat$MOMWAIS0))

summary(lm_rdd)

#plot of RDD as implemented in RDD

plot(lm_rdd)

#single non-outcome covariate test

lm_rdd_nonoutcome = rdd::RDestimate(APGAR5 ~ MOMWAIS0, dat_s,

cutpoint = median(dat$MOMWAIS0))

summary(lm_rdd_nonoutcome)

#### "rdrobust" package ####

# M2

llm_rdrobust1 = rdrobust::rdrobust(dat_s$SBIQ48, dat_s$MOMWAIS0,

c = median(dat$MOMWAIS0))

print(llm_rdrobust1)

#plot of the RDD as implemented in rdrobust, with addition of confidence interval

rdrobust::rdplot(dat_s$SBIQ48,

dat_s$MOMWAIS0,

c = median(dat$MOMWAIS0),

ci = 95)

# M3

bw = rdrobust::rdbwselect_2014(dat_s$SBIQ48,

dat_s$MOMWAIS0,

c = 85,

bwselect = ’IK’)

llm_rdrobust2 = rdrobust::rdrobust(

dat_s$SBIQ48,

dat_s$MOMWAIS0,

c = median(dat$MOMWAIS0),

h = bw$bws[1],

b = bw$bws[2]

)

print(llm_rdrobust2)

# M4

llm_rdrobust3 = rdrobust::rdrobust(dat_s$SBIQ48,

dat_s$MOMWAIS0,

c = median(dat$MOMWAIS0),

h = lm_rdd$bw[1])

print(llm_rdrobust3)

#### "rddtools" package ####

dat_rddtools = rddtools::rdd_data(

y = SBIQ48,

x = MOMWAIS0,

data = dat_s,

cutpoint = median(dat$MOMWAIS0)

)

# M5

llm_rddtools = rddtools::rdd_reg_np(dat_rddtools)

print(llm_rddtools)

# rddtools uses the same density test and plot as rdd

# (call DCDensity from rdd package)

rddtools::dens_test(llm_rddtools)

#default plot of rddtools package

plot(llm_rddtools)

#the default behavior of plotPlacebo() triggers a error, due to NA returned from

# the recalculations of bw by rddtools::rdd_bw_ik()

rddtools::plotPlacebo(llm_rddtools,

same_bw = T,

from = .25,

to = .75)

#sensitivity to bandwidth choice

rddtools::plotSensi(llm_rddtools,

from = 50,

to = 2,

by = -1)

# M6

lm_rddtools1 = rddtools::rdd_reg_lm(dat_rddtools)

print(lm_rddtools1)

# M7

lm_rddtools2 = rddtools::rdd_reg_lm(dat_rddtools,

bw = rddtools::rdd_bw_ik(dat_rddtools))

print(lm_rddtools2)

# M8

lm_rddtools3 = rddtools::rdd_reg_lm(dat_rddtools, bw = lm_rdd$bw[1])

print(lm_rddtools3)

#### "rddapp" package ####

#M9

#this is assuming that the beta version of rddapp is installed - request from authors

lm_rddapp = rddapp::RDestimate(SBIQ48 ~ MOMWAIS0, dat_s,

cutpoint = median(dat$MOMWAIS0))

summary(lm_rddapp)

计量经济圈是中国计量第一大社区,我们致力于推动中国计量理论和实证技能的提升,圈子以海内外高校研究生和教师为主。我们已经建立了计量经济圈社群,受到海内外高校研究生群体的欢迎,大家在里面分享前沿计量资料和热烈讨论各种计量方法。只要能够沉下心来积淀一段时间,我们相信这是对你的研究工作是大有裨益的,欢迎的你加入到咱们这个大家庭戳这里)。进去之后一定要看“群公告”,不然接收不了群息,也不知道怎么进入咱们的微信群和计量论坛。

打开网易新闻 查看精彩图片

帮点击一下下面的小广告,谢谢支持!