
JCS Focus

— The Journal of Chinese Sociology —

This week, JCS Focus continues to bring you the latest table of contents and abstracts from the leading international sociology journal Sociological Methods & Research.

About the Journal

Sociological Methods & Research


About SMR

Sociological Methods & Research (SMR) is dedicated to advancing sociology as a cumulative empirical science. It welcomes research on a wide range of topics, but emphasizes articles that deepen understanding of substantive problems through the systematic treatment of methodological issues and that integrate existing scholarship. The journal publishes review articles, especially those offering critical analysis, and also welcomes well-argued original research that delivers new findings. Overall, SMR has a distinctive profile: it pays close attention to assessing the scientific standing of sociology, its scope is broad and flexible, and it strongly encourages authors to consult the editors about a manuscript's suitability.

This Issue

SMR is a quarterly. The latest issue (Volume 53, Issue 3, August 2024) is divided into an "Articles" section and a "Corrigendum" section, comprising 16 pieces in total, detailed below.

Original Table of Contents

[Image: table of contents of Sociological Methods & Research, Volume 53, Issue 3]

Abstracts

Sociological Methods & Research

Do Quantitative and Qualitative Research Reflect Two Distinct Cultures? An Empirical Analysis of 180 Articles Suggests “No”

David Kuehn, Ingo Rohlfing

The debate about the characteristics and advantages of quantitative and qualitative methods is decades old. In their seminal monograph, A Tale of Two Cultures (2012, ATTC), Gary Goertz and James Mahoney argue that methods and research design practices for causal inference can be distinguished as two cultures that systematically differ from each other along 25 specific characteristics. ATTC’s stated goal is a description of empirical patterns in quantitative and qualitative research. Yet, it does not include a systematic empirical evaluation as to whether the 25 characteristics are relevant and valid descriptors of applied research. In this paper, we derive five observable implications from ATTC and test them against a stratified random sample of 90 qualitative and 90 quantitative articles published in six journals between 1990 and 2012. Our analysis provides little support for the two-cultures hypothesis. Quantitative methods are largely implemented as described in ATTC, whereas qualitative methods are much more diverse than ATTC suggests. While some practices do indeed conform to the qualitative culture, many others are implemented in a manner that ATTC characterizes as constitutive of the quantitative culture. We find very little evidence for ATTC's anchoring of qualitative research in set-theoretic approaches to empirical social science research. The set-theoretic template only applies to a fraction of the qualitative research that we reviewed, with the majority of qualitative work incorporating different method choices.

A Crash Course in Good and Bad Controls

Carlos Cinelli, Andrew Forney, Judea Pearl

Many students of statistics and econometrics express frustration with the way a problem known as “bad control” is treated in the traditional literature. The issue arises when the addition of a variable to a regression equation produces an unintended discrepancy between the regression coefficient and the effect that the coefficient is intended to represent. Avoiding such discrepancies presents a challenge to all analysts in the data intensive sciences. This note describes graphical tools for understanding, visualizing, and resolving the problem through a series of illustrative examples. By making this “crash course” accessible to instructors and practitioners, we hope to avail these tools to a broader community of scientists concerned with the causal interpretation of regression models.
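The abstract's "bad control" problem is easy to see in a simulation. Below is a minimal sketch (mine, not the paper's; all variable names are illustrative) of one canonical case: controlling for a collider, a variable caused by both treatment and outcome, distorts the coefficient that an unadjusted regression recovers correctly.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)             # treatment
y = 2.0 * x + rng.normal(size=n)   # outcome; the true effect of x is 2.0
z = x + y + rng.normal(size=n)     # collider: a common effect of x and y

# Regressing y on x alone recovers the true effect (~2.0).
good = np.linalg.lstsq(np.column_stack([np.ones(n), x]), y, rcond=None)[0]

# Adding the collider z as a "control" opens a spurious path and
# biases the coefficient on x (here toward ~0.5).
bad = np.linalg.lstsq(np.column_stack([np.ones(n), x, z]), y, rcond=None)[0]

print(f"without z: {good[1]:.2f}   with z: {bad[1]:.2f}")
```

The graphical criteria in the paper generalize this point: whether a control is good or bad depends on where the variable sits in the causal graph, not on its predictive strength.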

A Comparison of Three Popular Methods for Handling Missing Data: Complete-Case Analysis, Inverse Probability Weighting, and Multiple Imputation

Roderick J. Little, James R. Carpenter, Katherine J. Lee

Missing data are a pervasive problem in data analysis. Three common methods for addressing the problem are (a) complete-case analysis, where only units that are complete on the variables in an analysis are included; (b) weighting, where the complete cases are weighted by the inverse of an estimate of the probability of being complete; and (c) multiple imputation (MI), where missing values of the variables in the analysis are imputed as draws from their predictive distribution under an implicit or explicit statistical model, the imputation process is repeated to create multiple filled-in data sets, and analysis is carried out using simple MI combining rules. This article provides a non-technical discussion of the strengths and weaknesses of these approaches, and when each of the methods might be adopted over the others. The methods are illustrated on data from the Youth Cohort (Time) Series (YCS) for England, Wales and Scotland, 1984–2002.
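For readers who want to see the three methods side by side, here is a hedged toy simulation (not from the article; the response model and data-generating process are invented for illustration). Values of y are missing at random given x, so the complete-case mean is biased while IPW and a simple MI both recover the truth.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
x = rng.normal(size=n)
y = 1.0 + x + rng.normal(size=n)     # true E[y] = 1.0

# MAR: the probability that y is observed depends only on x.
p_obs = 1 / (1 + np.exp(-x))          # logistic response model
observed = rng.random(n) < p_obs

# (a) Complete-case mean: biased, because responders have higher x.
cca = y[observed].mean()

# (b) IPW: weight complete cases by the inverse response probability.
ipw = np.average(y[observed], weights=1 / p_obs[observed])

# (c) A toy MI: draw missing y from the regression of y on x fit to the
# complete cases, repeat M times, average the estimates. (Proper MI
# would also draw the regression parameters anew each time.)
xo, yo = x[observed], y[observed]
beta = np.polyfit(xo, yo, 1)
sigma = np.std(yo - np.polyval(beta, xo))
n_mis = (~observed).sum()
mi = np.mean([
    np.concatenate([yo, np.polyval(beta, x[~observed])
                    + rng.normal(scale=sigma, size=n_mis)]).mean()
    for _ in range(20)
])

print(f"true 1.00 | CCA {cca:.2f} | IPW {ipw:.2f} | MI {mi:.2f}")
```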

Attendance, Completion, and Heterogeneous Returns to College: A Causal Mediation Approach

Xiang Zhou

A growing body of social science research investigates whether the economic payoff to a college education is heterogeneous — in particular, whether disadvantaged youth can benefit more from attending and completing college relative to their more advantaged peers. Scholars, however, have employed different analytical strategies and reported mixed findings. To shed light on this literature, I propose a causal mediation approach to conceptualizing, evaluating, and unpacking the causal effects of college on earnings. By decomposing the total effect of attending a four-year college into several direct and indirect components, this approach not only clarifies the mechanisms through which college attendance boosts earnings, but also illuminates the ways in which the postsecondary system may be both an equalizer and a stratifier. The total effect of college attendance, its direct and indirect components, and their heterogeneity across different subpopulations are all identified under the assumption of sequential ignorability. I introduce a debiased machine learning (DML) method for estimating all quantities of interest, along with a set of bias formulas for sensitivity analysis. I illustrate the proposed framework and methodology using data from the National Longitudinal Survey of Youth, 1997 cohort.
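The decomposition described here follows the standard causal mediation template. In rough notation (mine, not necessarily the paper's): write $Y(a, m)$ for potential earnings under attendance status $a$ and mediator value $m$ (say, college completion), and $M(a)$ for the mediator under attendance $a$. The total effect then telescopes into an indirect component operating through completion and a direct component:

$$
\underbrace{E[Y(1, M(1))] - E[Y(0, M(0))]}_{\text{total effect of attendance}}
= \underbrace{E[Y(1, M(1))] - E[Y(1, M(0))]}_{\text{indirect, via completion}}
+ \underbrace{E[Y(1, M(0))] - E[Y(0, M(0))]}_{\text{direct}}
$$

Sequential ignorability (no unmeasured confounding of either the attendance-earnings or the completion-earnings relationship, given observed covariates) is what licenses estimating each counterfactual mean from data.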

The Augmented Social Scientist: Using Sequential Transfer Learning to Annotate Millions of Texts with Human-Level Accuracy

Salomé Do, Étienne Ollion, Rubing Shen

The last decade witnessed a spectacular rise in the volume of available textual data. With this new abundance came the question of how to analyze it. In the social sciences, scholars mostly resorted to two well-established approaches, human annotation on sampled data on the one hand (either performed by the researcher, or outsourced to microworkers), and quantitative methods on the other. Each approach has its own merits - a potentially very fine-grained analysis for the former, a very scalable one for the latter - but the combination of these two properties has not yielded highly accurate results so far. Leveraging recent advances in sequential transfer learning, we demonstrate via an experiment that an expert can train a precise, efficient automatic classifier in a very limited amount of time. We also show that, under certain conditions, expert-trained models produce better annotations than humans themselves. We demonstrate these points using a classic research question in the sociology of journalism, the rise of a “horse race” coverage of politics. We conclude that recent advances in transfer learning help us augment ourselves when analyzing unstructured data.
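As a rough stand-in for the paper's workflow: the authors fine-tune a pretrained language model on a small expert-annotated sample; the sketch below instead freezes a pretrained sentence encoder and trains only a logistic head, which conveys the same idea in a few lines. The libraries (sentence-transformers, scikit-learn), the model name, and the toy texts are my assumptions, not the paper's setup.

```python
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

# A small expert-annotated sample (1 = "horse race" framing).
texts = ["Polls show the challenger pulling ahead of the incumbent.",
         "The bill would expand rural access to health coverage."]
labels = [1, 0]

encoder = SentenceTransformer("all-MiniLM-L6-v2")   # pretrained encoder
clf = LogisticRegression().fit(encoder.encode(texts), labels)

# The cheap-to-run classifier can now annotate texts at scale.
new_docs = ["A new survey puts the incumbent five points up."]
print(clf.predict(encoder.encode(new_docs)))
```

In practice the expert sample would contain hundreds of examples, and fine-tuning the full encoder, as the paper does, typically yields higher accuracy than a frozen-features head.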

A Bayesian Semi-Parametric Approach for Modeling Memory Decay in Dynamic Social Networks

Giuseppe Arena, Joris Mulder, Roger Th. A.J. Leenders

In relational event networks, the tendency for actors to interact with each other depends greatly on the past interactions between the actors in a social network. Both the volume of past interactions and the time that has elapsed since the past interactions affect the actors’ decision-making to interact with other actors in the network. Recently occurred events may have a stronger influence on current interaction behavior than past events that occurred a long time ago, a phenomenon known as “memory decay”. Previous studies either predefined a short-run and long-run memory or fixed a parametric exponential memory decay using a predefined half-life period. In real-life relational event networks, however, it is generally unknown how the influence of past events fades as time goes by. For this reason, it is not advisable to fix memory decay in an ad hoc manner; instead, we should learn the shape of memory decay from the observed data. In this paper, a novel semi-parametric approach based on Bayesian Model Averaging is proposed for learning the shape of the memory decay without requiring any parametric assumptions. The method is applied to relational event history data among socio-political actors in India and a comparison with other relational event models based on predefined memory decays is provided.
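To make "learning the shape of memory decay" concrete, here is a hedged numerical sketch (all numbers invented). Each candidate exponential kernel implies a different weighting of past events into an "endowment", the kernel-weighted volume of past interaction that enters the model as a statistic; rather than fixing one half-life ad hoc, Bayesian model averaging blends the kernels by their posterior model probabilities.

```python
import numpy as np

# Days elapsed since each past interaction of one dyad.
elapsed = np.array([1.0, 3.0, 10.0, 40.0, 200.0])

# A family of exponential memory kernels indexed by half-life.
def kernel_weight(t, half_life):
    return 0.5 ** (t / half_life)

half_lives = np.array([2.0, 10.0, 50.0])
endowments = np.array([kernel_weight(elapsed, h).sum() for h in half_lives])

# Illustrative posterior model probabilities; in the paper these come
# from fitting the relational event model under each candidate kernel.
post_prob = np.array([0.1, 0.7, 0.2])
bma_endowment = post_prob @ endowments
print(f"per-kernel endowments: {endowments.round(2)}  BMA: {bma_endowment:.2f}")
```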

A Sample Size Formula for Network Scale-up Studies

Nathaniel Josephs, Dennis M. Feehan, Forrest W. Crawford

The network scale-up method (NSUM) is a survey-based method for estimating the number of individuals in a hidden or hard-to-reach subgroup of a general population. In NSUM surveys, sampled individuals report how many others they know in the subpopulation of interest (e.g. “How many sex workers do you know?”) and how many others they know in subpopulations of the general population (e.g. “How many bus drivers do you know?”). NSUM is widely used to estimate the size of important sociological and epidemiological risk groups, including men who have sex with men, sex workers, HIV+ individuals, and drug users. Unlike several other methods for population size estimation, NSUM requires only a single random sample and the estimator has a conveniently simple form. Despite its popularity, there are no published guidelines for the minimum sample size calculation to achieve a desired statistical precision. Here, we provide a sample size formula that can be employed in any NSUM survey. We show analytically and by simulation that the sample size controls error at the nominal rate and is robust to some forms of network model mis-specification. We apply this methodology to study the minimum sample size and relative error properties of several published NSUM surveys.
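The basic scale-up logic is compact enough to simulate. The sketch below (mine; the group sizes and degree distribution are invented) computes the classic NSUM estimator; the paper's contribution, the sample size formula itself, is in the article and is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 1_000_000    # general population size
n = 500          # survey sample size

known_sizes = np.array([5_000, 20_000, 8_000])  # e.g., bus drivers, ...
true_hidden = 10_000                            # hidden-group size
d = rng.poisson(300, size=n)                    # respondents' degrees

# Reported acquaintance counts in each known group and the hidden group.
y_known = rng.poisson(np.outer(d, known_sizes / N))
y_hidden = rng.poisson(d * true_hidden / N)

# Classic scale-up: estimate each respondent's degree from the known
# groups, then scale the hidden-group reports up to the population.
d_hat = N * y_known.sum(axis=1) / known_sizes.sum()
N_hat = N * y_hidden.sum() / d_hat.sum()
print(f"estimated hidden-group size: {N_hat:,.0f}")   # close to 10,000
```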

Comparing Egocentric and Sociocentric Centrality Measures in Directed Networks

Weihua An

Egocentric networks represent a popular research design for network research. However, to what extent and under what conditions egocentric centrality measures can serve as reasonable substitutes for their sociocentric counterparts are important questions to study. The answers to these questions are uncertain simply because of the large variety of networks. Hence, this paper aims to provide exploratory answers to these questions by analyzing both empirical and simulated data. Through analyses of various empirical networks (including some classic albeit small ones), this paper shows that egocentric betweenness approximates sociocentric betweenness quite well (the correlation is high across almost all the networks being examined) while egocentric closeness approximates sociocentric closeness only reasonably well (the correlation is a bit lower on average with a larger variance across networks). Simulations also confirm this finding. Analyses further show that egocentric approximations of betweenness and closeness seem to work well in different types of networks (as featured by network size, density, centralization, reciprocity, transitivity, and geodistance). Lastly, the paper briefly presents three ideas to help improve egocentric approximations of centrality measures.
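The comparison is easy to replicate on a classic network. A small sketch (not the author's code; the paper works with directed networks, while the karate club graph below is undirected, which keeps the example short): egocentric betweenness is a node's betweenness computed inside its own ego network, and we correlate it with the sociocentric value from the full graph.

```python
import networkx as nx
import numpy as np

G = nx.karate_club_graph()   # a classic small network

socio = nx.betweenness_centrality(G)
ego = {v: nx.betweenness_centrality(nx.ego_graph(G, v))[v] for v in G}

nodes = list(G)
r = np.corrcoef([socio[v] for v in nodes], [ego[v] for v in nodes])[0, 1]
print(f"ego vs. socio betweenness correlation: r = {r:.2f}")
```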

The Design and Optimality of Survey Counts: A Unified Framework Via the Fisher Information Maximizer

Xin Guo, Qiang Fu

Grouped and right-censored (GRC) counts have been used in a wide range of attitudinal and behavioural surveys yet they cannot be readily analyzed or assessed by conventional statistical models. This study develops a unified regression framework for the design and optimality of GRC counts in surveys. To process infinitely many grouping schemes for the optimum design, we propose a new two-stage algorithm, the Fisher Information Maximizer (FIM), which utilizes estimates from generalized linear models to find a global optimal grouping scheme among all possible N-group schemes. After we define, decompose, and calculate different types of regressor-specific design errors, our analyses from both simulation and empirical examples suggest that: 1) the optimum design of GRC counts is able to reduce the grouping error to zero, 2) the performance of modified Poisson estimators using GRC counts can be comparable to that of Poisson regression, and 3) the optimum design is usually able to achieve the same estimation efficiency with a smaller sample size.
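To make the design problem concrete, here is a hedged toy version (mine, not the FIM algorithm): for a single Poisson rate with no regressors, the Fisher information of a grouping scheme can be computed directly, and candidate schemes can be ranked by it. The paper's two-stage FIM does this for regression models and searches over all possible N-group schemes.

```python
import numpy as np
from scipy.stats import poisson

def group_probs(lam, cuts):
    """Probabilities of the groups [0..c1], (c1..c2], ..., (ck, inf)."""
    cdf = poisson.cdf(cuts, lam)
    return np.diff(np.concatenate([[0.0], cdf, [1.0]]))

def fisher_info(lam, cuts, eps=1e-5):
    """I(lam) = sum_j p_j'(lam)^2 / p_j(lam), with numeric derivatives."""
    p = group_probs(lam, cuts)
    dp = (group_probs(lam + eps, cuts) - group_probs(lam - eps, cuts)) / (2 * eps)
    return float(np.sum(dp ** 2 / p))

lam = 2.5   # working estimate of the rate (the paper gets this from a GLM)
schemes = {"0 / 1 / 2 / 3+": [0, 1, 2],
           "0-1 / 2-4 / 5+": [1, 4],
           "0 / 1-5 / 6+":   [0, 5]}
for name, cuts in schemes.items():
    print(f"{name:>15}: I = {fisher_info(lam, np.array(cuts)):.3f}")
```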

The Additional Effects of Adaptive Survey Design Beyond Post-Survey Adjustment: An Experimental Evaluation

Shiyu Zhang, James Wagner

Adaptive survey design refers to using targeted procedures to recruit different sampled cases. This technique strives to reduce bias and variance of survey estimates by trying to recruit a larger and more balanced set of respondents. However, it is not well understood how adaptive design can improve data and survey estimates beyond the well-established post-survey adjustment. This paper reports the results of an experiment that evaluated the additional effect of adaptive design beyond post-survey adjustments. The experiment was conducted in the Detroit Metro Area Communities Study in 2021. We evaluated the adaptive design on five outcomes: 1) response rates, 2) demographic composition of respondents, 3) bias and variance of key survey estimates, 4) changes in significant results of regression models, and 5) costs. The most significant benefit of the adaptive design was its ability to generate more efficient survey estimates with smaller variances and smaller design effects.

Sequential On-Device Multitasking within Online Surveys: A Data Quality and Response Behavior Perspective

Jean Philippe Décieux

The risk of multitasking is high in online surveys. However, knowledge of the effects of multitasking on answer quality is sparse and based on suboptimal approaches. Research reports inconclusive results concerning the consequences of multitasking on task performance. However, studies suggest that sequential-multitasking activities in particular are expected to be critical. Therefore, this study focuses on sequential on-device multitasking activities (SODM) and their consequences for data quality. Using probability-based data, this study aims to reveal the prevalence of SODM by means of the JavaScript OnBlur function, to examine its determinants, and to assess the consequences for data quality. Results show that SODM was detected for 25% of all respondents and that respondent attributes and the device used to answer the survey are related to SODM. Moreover, it becomes apparent that SODM is significantly correlated with data quality measures. Therefore, I propose SODM behavior as a new instrument for researching suboptimal response behavior.

Measuring Class Hierarchies in Postindustrial Societies: A Criterion and Construct Validation of EGP and ESEC Across 31 Countries

Oscar Smallenbroek, Florian R. Hertel, Carlo Barone

In social stratification research, the most frequently used social class schemas are based on employment relations (EGP and ESEC). These schemes have become paradigms for research on social mobility and educational inequalities and are applied in cross-national research for both genders. Using the European Working Conditions Survey, we examine their criterion and construct validity across 31 countries and for both genders. We investigate whether classes are well delineated by the theoretically assumed dimensions of employment relations and we assess how several measures of occupational advantage differ across classes. We find broad similarity in the criterion validity of EGP and ESEC across genders and countries as well as satisfactory levels of construct validity. However, the salariat classes are too heterogeneous and their boundaries with the intermediate classes are blurred. To improve the measurement of social class, we propose to differentiate managerial and professional occupations within the lower and higher salariat respectively. We show that implementing these distinctions in ESEC and EGP improves their criterion validity and makes it possible to better identify privileged positions.

Assessing the Impact of the Great Recession on the Transition to Adulthood

Guanglei Hong, Ha-Joon Chung

The impact of a major historical event on child and youth development has been of great interest in the study of the life course. This study is focused on assessing the causal effect of the Great Recession on youth disconnection from school and work. Building on the insights offered by the age-period-cohort research, econometric methods, and developmental psychology, we innovatively develop a causal inference strategy that takes advantage of the multiple successive birth cohorts in the National Longitudinal Study of Youth 1997. The causal effect of the Great Recession is defined in terms of counterfactual developmental trajectories and can be identified under the assumption of short-term stable differences between the birth cohorts in the absence of the Great Recession. A meta-analysis aggregates the estimated effects over six between-cohort comparisons. Furthermore, we conduct a sensitivity analysis to assess the potential consequences if the identification assumption is violated. The findings contribute new evidence on how precipitous and pervasive economic hardship may disrupt youth development by gender and class of origin.

Improving Estimates Accuracy of Voter Transitions. Two New Algorithms for Ecological Inference Based on Linear Programming

Jose M. Pavía, Rafael Romero

The estimation of RxC ecological inference contingency tables from aggregate data is one of the most salient and challenging problems in the field of quantitative social sciences, with major solutions proposed from both the ecological regression and the mathematical programming frameworks. In recent decades, there has been a drive to find solutions stemming from the former, with the latter being less active. From the mathematical programming framework, this paper suggests a new direction for tackling this problem. For the first time in the literature, a procedure based on linear programming is proposed to attain estimates of local contingency tables. Based on this and the homogeneity hypothesis, we suggest two new ecological inference algorithms. These two new algorithms represent an important step forward in the ecological inference mathematical programming literature. In addition to generating estimates for local ecological inference contingency tables and amending the tendency to produce extreme transfer probability estimates previously observed in other mathematical programming procedures, these two new algorithms prove to be quite competitive and more accurate than the current linear programming baseline algorithm. Their accuracy is assessed using a unique dataset with almost 500 elections, where the real transfer matrices are known, and their sensitivity to assumptions and limitations is gauged through an extensive simulation study. The new algorithms place the linear programming approach once again in a prominent position in the ecological inference toolkit. Interested readers can use these new algorithms easily with the aid of the R package lphom.
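The underlying linear-programming idea can be sketched in a few lines. This is the generic formulation under the homogeneity hypothesis (one transfer matrix shared by all units), not the authors' lphom algorithms: choose a row-stochastic matrix P minimizing the total absolute gap between predicted and observed destination votes, which becomes a linear program after adding one error variable per unit and destination option. All data below are simulated.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(3)
I, R, C = 60, 3, 3                        # units, origin options, destinations
P_true = np.array([[.80, .10, .10],
                   [.10, .70, .20],
                   [.05, .15, .80]])
X = rng.integers(100, 1000, size=(I, R)).astype(float)   # origin votes
Y = X @ P_true                                           # destination votes

nP, nE = R * C, I * C                     # vec(P) entries + error terms
cost = np.concatenate([np.zeros(nP), np.ones(nE)])

A_ub, b_ub = [], []
for i in range(I):
    for c in range(C):
        xp = np.zeros(nP)
        xp[np.arange(R) * C + c] = X[i]   # (X[i] @ P)[c] in vec(P) form
        e = np.zeros(nE)
        e[i * C + c] = 1.0
        A_ub.append(np.concatenate([xp, -e]));  b_ub.append(Y[i, c])
        A_ub.append(np.concatenate([-xp, -e])); b_ub.append(-Y[i, c])

A_eq = np.zeros((R, nP + nE))             # each row of P sums to one
for r in range(R):
    A_eq[r, r * C:(r + 1) * C] = 1.0

res = linprog(cost, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              A_eq=A_eq, b_eq=np.ones(R),
              bounds=[(0, 1)] * nP + [(0, None)] * nE)
print(np.round(res.x[:nP].reshape(R, C), 2))   # ~= P_true in this noiseless toy
```

Interested readers should use the lphom package itself, which implements the authors' refined algorithms rather than this bare-bones formulation.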

Image Clustering: An Unsupervised Approach to Categorize Visual Data in Social Science Research

Han Zhang, Yilang Peng

Automated image analysis has received increasing attention in social scientific research, yet existing scholarship has mostly covered the application of supervised learning to classify images into predefined categories. This study focuses on the task of unsupervised image clustering, which aims to automatically discover categories from unlabelled image data. We first review the steps to perform image clustering and then focus on one key challenge in this task—finding intermediate representations of images. We present several methods of extracting intermediate image representations, including the bag-of-visual-words model, self-supervised learning, and transfer learning (in particular, feature extraction with pretrained models). We compare these methods using various visual datasets, including images related to protests in China from Weibo, images about climate change on Instagram, and profile images of the Russian Internet Research Agency on Twitter. In addition, we propose a systematic way to interpret and validate clustering solutions. Results show that transfer learning significantly outperforms the other methods. The dataset used in the pretrained model critically determines what categories the algorithms can discover.
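A condensed version of the best-performing pipeline, pretrained-CNN feature extraction followed by k-means, might look like the sketch below (my stand-in: the image folder, the ResNet-18 backbone, and k = 8 are illustrative choices, not the paper's exact setup).

```python
import glob
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image
from sklearn.cluster import KMeans

# Pretrained ResNet-18 with its classification head removed, so the
# forward pass returns a 512-dim feature vector per image.
resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
resnet.fc = torch.nn.Identity()
resnet.eval()

preprocess = T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor(),
                        T.Normalize(mean=[0.485, 0.456, 0.406],
                                    std=[0.229, 0.224, 0.225])])

paths = sorted(glob.glob("images/*.jpg"))   # hypothetical image folder
with torch.no_grad():
    feats = torch.stack([
        resnet(preprocess(Image.open(p).convert("RGB")).unsqueeze(0)).squeeze(0)
        for p in paths
    ])

labels = KMeans(n_clusters=8, n_init=10).fit_predict(feats.numpy())
for p, lab in zip(paths[:5], labels[:5]):
    print(lab, p)
```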

Corrigendum

Corrigendum to “Individual Components of Three Inequality Measures for Analyzing Shapes of Inequality”

Liao, T. F. (2022). Individual Components of Three Inequality Measures for Analyzing Shapes of Inequality. Sociological Methods & Research, 51(3), 1325-1356. https://doi.org/10.1177/0049124119875961

In this article by Tim Futing Liao, a summation sign inadvertently slipped into Equations (8) and (9). The correct equations are as follows:

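The corrected equations appeared as an image in the original post and are not reproduced here. As a hedged reconstruction (mine, inferred from the fact that iTheilT and iTheilL compute individual, per-observation components of Theil's T and Theil's L; consult the corrigendum for the authoritative forms), dropping the stray summation gives

$$
T_i = \frac{1}{n}\,\frac{x_i}{\bar{x}}\,\ln\frac{x_i}{\bar{x}},
\qquad
L_i = \frac{1}{n}\,\ln\frac{\bar{x}}{x_i},
$$

so that summing the individual components over $i$ returns the familiar aggregate Theil indices.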

The errors in the equations did not affect the iIneq R package, available on CRAN and referred to in the Author’s Note of the paper. Both the iTheilT and iTheilL commands in the package compute the correct results based on the correct formulae.

That's all for this issue of JCS Focus!

Journals / Fun Reads / Hot Topics / Musings

On the academic road,

JCS grows with you!


About JCS

The Journal of Chinese Sociology (JCS) was founded in October 2014 by the Institute of Sociology, Chinese Academy of Social Sciences. As the first English-language sociology journal in mainland China, JCS is committed to building a world-class academic platform for scholarly exchange and collaboration between Chinese sociologists and their international peers. JCS is published by Springer Nature, the world's largest science journal publishing group; its strong editorial board comprises top sociologists from China and abroad; and it uses double-blind peer review and an open-access publishing model. JCS has been indexed in ESCI since May 2021. In 2022, JCS received a CiteScore of 2.0 (Q2), ranking 94th among 262 journals in the social sciences category, within the top 36% of comparable journals. In 2023, JCS received its first impact factor, 1.5 (Q3), in Clarivate's 2023 Journal Citation Reports (JCR).


▉ Submissions to The Journal of Chinese Sociology are welcome!

Please consider submitting to

The Journal of Chinese Sociology!

▉ Official website:

https://journalofchinesesociology.springeropen.com