Research Program Archives
Upcoming Research Program meetings are now listed on the Seminar page.
April 22, 2008
Diana Miglioretti
Group Health Center for Health Studies, Seattle
Misleading marginal analyses in settings where random effect variances depend on covariates
Clustered data are commonly collected in medical studies and typically analyzed using either marginal or conditional modeling approaches to account for potential correlation induced by unmeasured heterogeneity among clusters. In many cases, it is reasonable to expect that the magnitude of this heterogeneity may depend on a covariate that varies either between or within clusters. We show that when a covariate influences both the conditional mean and the random effect variance, marginal analyses may provide misleading results, suggesting there is no covariate effect or even an effect in the opposite direction of the conditional effect. Conditional models that falsely assume a constant random effect variance may also provide biased estimates. We use simulations to show that this bias decreases as the cluster sizes get larger when the random effect variance depends on a between-cluster covariate, but that the bias remains regardless of cluster size when the random effect variance depends on a within-cluster covariate. For conditional models we show how to accommodate non-constant (either between or within cluster) random effect variances. We illustrate our findings using data from the Breast Cancer Surveillance Consortium to examine the effect of radiologist experience on the interpretive performance of mammography.
Joint work with Sebastien JPA Haneuse, Charles McCulloch and John Neuhaus.
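The sign reversal described in the abstract above can be illustrated with a minimal sketch. It assumes a probit random-intercept model, chosen only because its marginal probability has a closed form; the abstract does not say which link the authors used. The coefficients and the variance function below are hypothetical values picked to make the effect visible.

```python
import math


def std_normal_cdf(z):
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical conditional (cluster-specific) coefficients: the covariate
# effect BETA1 is positive within every cluster.
BETA0, BETA1 = 1.0, 0.5


def random_effect_variance(x):
    """Hypothetical random-intercept variance that grows with the covariate."""
    return 8.0 * x


def conditional_prob(x, b=0.0):
    """P(Y = 1 | x, b) for a cluster with random intercept b."""
    return std_normal_cdf(BETA0 + BETA1 * x + b)


def marginal_prob(x):
    """P(Y = 1 | x) after integrating out b ~ N(0, variance(x)).

    For a probit link the integral is exact: the linear predictor is
    shrunk by sqrt(1 + variance(x)).
    """
    shrink = math.sqrt(1.0 + random_effect_variance(x))
    return std_normal_cdf((BETA0 + BETA1 * x) / shrink)


for x in (0.0, 1.0):
    print(x, round(conditional_prob(x), 4), round(marginal_prob(x), 4))
```

Under these assumed values the conditional probability for a typical cluster (b = 0) rises as x goes from 0 to 1, while the marginal probability falls from about 0.841 to about 0.691, so a marginal analysis would report an effect in the opposite direction.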
March 18, 2008
Ru-Fang Yeh
Division of Biostatistics and CBMB, UCSF
Statistical Inference of Gene regulatory modules from linked multi-level molecular profiling data
As high-throughput biotechnology matures and becomes ubiquitous for monitoring various molecular changes, many studies have begun to generate linked genomic data that simultaneously profile events during the multi-step process of gene expression. There have been well-established analytic methods for typical problems using one microarray platform at a time, such as differential expression analysis of transcriptome arrays and DNA aberration detection by array comparative genomic hybridization. However, integrative analysis of linked genomic data is largely under-explored and limited to anecdotal successes. Using brain tumor as an example, I will discuss statistical challenges and open problems arising from the hierarchical nature and networked dependency of such data, and present some preliminary results of identifying microRNA-regulated gene modules from mRNA expression and array CGH data using gene set tests and simple linear models.
February 19, 2008
Steve Gregorich
Department of Medicine and CAPS, UCSF
REPEATED MEASURES MODELS WITH MULTIPLE, CORRELATED RANDOM EFFECTS
I will discuss two types of random effects models for repeated measures, with example applications to data from a prospective study of women with non-cancerous uterine conditions. First, I will describe an associative latent growth model, which allows for estimation of covariation between temporal changes (trajectories) in multiple dimensions. Such models are illustrated using longitudinal assessments of women's self-reported sexual, physical, and mental health (e.g., are changes in sexual functioning associated with changes in mental health status?). Second, I will describe spline models of pre- and post-hysterectomy health-related quality of life (HRQOL) trajectories as well as the 'instantaneous' HRQOL change (or 'bump') attributable to the surgical intervention. These models include correlated random intercept, trajectory, and 'bump' effects, which address interesting research questions: e.g., are women's pre-surgical HRQOL trajectories associated with the 'instantaneous' HRQOL changes that are attributable to the surgical intervention?
January 29, 2008
Kevin Delucchi
Department of Psychiatry, UCSF
Capturing Group Membership via Growth Mixture Models: A Simulation Study
This talk presents the results of a series of simulations examining the ability of latent growth mixture models (GMM) to capture group membership and intercept and slope parameters. The basic model was that of a common design: a longitudinal study with four equally-spaced assessment points. Underlying the observed data were two known populations from which the observed data were generated. We then fit a two-class GMM to the observed data, assigned subjects to their most likely class and compared that assignment to their true membership and the estimates of the intercept and slope to population values. A total of 56 conditions were simulated from a 7 x 2 x 2 x 2 factorial design with 1000 samples per condition: 7 levels of degree of imbalance of sample sizes, 2 differences in intercept means, 2 differences in slope means, and 2 levels of residual variance. This was conducted for total Ns of 300 and 900 with uneven numbers per group.
Focusing on the extreme effect-size conditions, the percentage correctly classified ranged from 58% to 88%. When the effects for slope and intercept were large, the percentage correctly classified for the larger group increased as the N in that group increased. For the smaller group, as the N declined from 420 to 300, the correct classification rate dropped but then increased as the N declined further, reaching a high of only 72%. As the sample size of the larger of the two groups increased, the mean estimated intercept approached the true value of 0. The mean estimates for the second class under the large effect approached the true value of 1.0 until the N declined to 240; then, as the sample size declined further, the estimate dropped back away from the true value. For the large effect case, the mean estimates of slope approached the true value of 0.5 as the N increased for the larger group. For the smaller group, the mean estimated slope initially approached the true value of -0.5 but never reached it and moved away from it in the smallest sample size condition.
These results raise concerns about the quality of results based on GMMs. This brings into question results from applied analyses in which study participants are assigned to their most likely latent group and then are compared on covariates. Further details, implications and future plans will be presented.
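As a rough sketch, the simulation grid described in this abstract can be enumerated as follows. Only the number of levels per factor and the 1000 samples per condition come from the abstract; the level labels ("small", "large", "low", "high", and the imbalance indices 1 to 7) are hypothetical placeholders.

```python
from itertools import product

# Factor levels: the counts follow the abstract, the labels are placeholders.
imbalance_levels = range(1, 8)           # 7 degrees of sample-size imbalance
intercept_diffs = ("small", "large")     # 2 differences in intercept means
slope_diffs = ("small", "large")         # 2 differences in slope means
residual_variances = ("low", "high")     # 2 levels of residual variance

SAMPLES_PER_CONDITION = 1000

# Full factorial crossing of the four factors.
conditions = list(
    product(imbalance_levels, intercept_diffs, slope_diffs, residual_variances)
)

n_conditions = len(conditions)
datasets_per_total_n = n_conditions * SAMPLES_PER_CONDITION

print(n_conditions, datasets_per_total_n)
```

The crossing gives 7 x 2 x 2 x 2 = 56 conditions, hence 56,000 simulated data sets for each total N.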
January 8, 2008
Dennis Osmond
Department of Epidemiology & Biostatistics, UCSF
An overview of the uses of propensity scores, propensity score weights, and instrumental variables
December 4, 2007
Mei Polley
Department of Neurosurgery, UCSF
Two-Stage Designs for Dose-Finding Trials with a Biologic Endpoint Using Stepwise Tests
We tackle the problem of early phase dose-finding trials with monotone biologic endpoints such as biologic measurements and laboratory values. A specific aim of this type of trial is to identify the minimum dose that exhibits adequate drug activity and shifts the mean of the endpoint from a zero dose, the so-called minimum effective dose. Stepwise tests for dose-finding have been well studied in the context of non-human studies where the sampling plan is done in one stage. We extend the notion of stepwise tests to a two-stage setting in an attempt to reduce the sample size requirement by shutting down unpromising doses in a futility interim. Specifically, we examine four two-stage designs and apply them to design a statin trial with four doses and a placebo in patients with Hodgkin's disease. We discuss the calibration of the design parameters and the implementation of these methods.
Joint work with Ken Cheung, Department of Biostatistics, Columbia University.
November 6, 2007
Rebecca Scherzer
Metabolism Section, VAMC
Closed testing procedures for group sequential clinical trials with multiple survival endpoints
Clinical trials often involve multiple survival endpoints with group sequential monitoring, but most studies specify a primary outcome, following a univariate approach and ignoring multiplicity. This research gives methods for such data. We illustrate the use of marginal proportional hazards models with a Lan and DeMets (1983) type alpha spending function to test multiple survival endpoints at K interim analyses. To adjust for multiplicity at each interim analysis, we consider and extend methods developed by Tang and Geller (1999) and Follmann et al. (1994). These methods are motivated using survival data from a clinical study of primary biliary cirrhosis. Type I error and power are examined using simulation studies.
October 16, 2007
Mark Segal
Division of Biostatistics and CBMB, UCSF
Re-Cracking the Second Genetic Code
In a recent, widely celebrated computational biology paper, Segal et al. (Nature, 2007) provide extensive evidence supporting the existence of a second genetic code embodied in DNA. This second code pertains to the positioning of nucleosomes (the fundamental repeating subunits of all eukaryotic chromatin), which are responsible for packaging DNA into chromosomes inside the cell nucleus and controlling gene expression. Here, we re-evaluate both the basis for, and performance of, the proposed nucleosome positioning code. Tools employed in this process include the spectral envelope and discriminatory motif finding.
May 8, 2007
Mary Lesperance
Department of Mathematics & Statistics, University of Victoria
GRAPHICAL TECHNIQUES FOR GENE EXPRESSION STUDIES
Correspondence analysis (CA) is a descriptive technique designed for investigating the association between row and column variables by graphically displaying the patterns in the data. It has been widely applied to categorical data. We explore and develop variations of CA techniques to identify differentially expressed genes and to assess the quality of replicate DNA arrays.
Multiple correspondence analysis (MCA) and a related technique called joint correspondence analysis (JCA) are methods for visualizing the joint features of 2 or more categorical variables. We have been working with the Genetic Pathology Evaluation Centre (GPEC) at UBC and the Breast Cancer Outcomes Unit (BCOU) at the B.C. Cancer Agency (BCCA) to study relationships between molecular markers and outcomes for breast cancer. Molecular markers and diagnostic variables are typically categorized as positive/negative by pathologists and oncologists, whereas outcome measures such as time to recurrence or breast cancer specific survival time are continuous and possibly censored. We consider fuzzy coding methods to display survival information in an MCA analysis of molecular markers.
March 13, 2007
Chuck McCulloch
Professor and Division Head of Biostatistics, UCSF
The good, the bad, and the ugly of joint modeling using shared and correlated random effects
Multiple outcomes, often with very different marginal distributions, are common in studies in a variety of scientific fields. But direct specification of joint distributions is difficult and has led many to consider building correlated data models through conditional independence along with shared or correlated random effects. After a brief review of these models and some motivating examples, I elaborate the consequences, good and bad, of this type of model specification. An example of a joint model is illustrated using data from the Osteoarthritis Initiative.
February 22, 2007
Yu Shen
Department of Biostatistics, M. D. Anderson Cancer Center
Inference of Tamoxifen's Effects on Prevention of Breast Cancer from a Randomized Controlled Trial
Breast cancer is the most common non-skin cancer among women in the United States, and continues to be an important cause of morbidity and mortality for women at high risk of developing the disease. The advent of preventive intervention and early detection of cancer brings greater hope to the control of breast cancer, while also posing significant challenges to researchers and public health policy makers. To provide quantitative frameworks to describe the natural history of breast cancer and assess the impact of the primary preventive intervention on the natural progression of the disease, we propose a flexible semiparametric model to assess the effects of a preventive agent on the incidence of breast cancer as well as time to the diagnosis of the disease, separately, in the framework of a cure-rate model. We used an estimating equation approach to estimate the unknown parameters, and assessed the semiparametric model assumption with a test based on the area between two survival curves. This is joint work with Qin and Costantino.
February 13, 2007
Michael Rosenblum
Division of Biostatistics, UCB
Latex Diaphragms In Preventing HIV Among Women:
Statistical Issues in the Methods for Improving Reproductive Health in Africa (MIRA) trial, Conducted Jointly by UCSF and University of Zimbabwe
The MIRA trial is a randomized, controlled trial with two arms, in which a primary intervention (diaphragms and gel) is given only to the treatment arm, and a secondary intervention (condom provision and counseling) is given to both treatment and control arms. In this setting, the standard intent to treat analysis, which compares the mean outcomes in the treatment and control arms, gives an unbiased estimate of the causal effect of assignment to the diaphragm and gel arm, in the presence of condom counseling. However, it may be of more public health interest to estimate the effectiveness of the primary intervention in the absence of the secondary intervention. We attempt to estimate a related parameter: what the causal effect of assignment to the diaphragm and gel arm would be if condom use were set at a fixed level. We describe how we implement a direct effects analysis to estimate this parameter from data collected in the MIRA trial.
January 16, 2007
Ying Lu and Caixia Li
Department of Radiology, UCSF
The added utility of a variable in the presence of other covariates
September 22, 2005
Su-Chun Cheng
Associate Professor of Biostatistics, UCSF
COMBINING MULTIPLE DIAGNOSTIC TESTS WITH NONPARAMETRIC TRANSFORMATION MODELS FOR CLASSIFYING CENSORED EVENT TIMES
November 17, 2004
Kevin Delucchi
Associate Adjunct Professor of Psychiatry, UCSF
LATENT PATTERNS OF CHANGE IN LONGITUDINAL DRUG ABUSE RESEARCH
November 3, 2004
Saunak Sen
Assistant Professor of Biostatistics, UCSF
QUANTITATIVE TRAIT MAPPING STUDY DESIGNS FROM AN INFORMATION PERSPECTIVE
June 2, 2004
John Witte
Professor of Epidemiology & Biostatistics, and Urology, UCSF
STATISTICAL ISSUES IN FAMILY STUDIES
May 5, 2004
John Kornak
UCSF Department of Radiology and Department of Epidemiology & Biostatistics / VA Medical Center Magnetic Resonance Unit
IMPROVING THE EFFECTIVE RESOLUTION OF LOW SIGNAL MAGNETIC RESONANCE IMAGING MODALITIES BY INCORPORATING HIGH RESOLUTION STRUCTURAL INFORMATION
April 21, 2004
Tor Tosteson
Associate Professor of Community and Family Medicine (Biostatistics), Biostatistics Director, MCRC/SPORT, Dartmouth Medical School
CHANGES IN FUNCTIONAL HEALTH STATUS ASSOCIATED WITH TREATMENT FOR LUMBAR SPINE DISORDERS: Methods for longitudinal analysis
Statistical methods for longitudinal data analysis are proposed to provide estimates of change in patient outcomes associated with treatment for disc herniation and spinal stenosis in two observational studies with longitudinal follow-up. Special statistical issues include nonlinear trends, unequal timing of visits, variable timing for surgical treatment, strong regression to the mean and potentially biased follow-up rates. Surgically treated patients show greater functional health status gains than non-surgically treated patients, although some inconsistencies are noted between the two studies. The methods and results are discussed in the context of the ongoing Spine Patient Outcomes Research Trial (SPORT).
April 7, 2004
Su-Chun Cheng
Associate Professor of Biostatistics, UCSF
JOURNAL CLUB
Margaret S. Pepe, Tianxi Cai, and Zheng Zhang, "Robust Binary Regression for Optimally Combining Predictors" (April 18, 2003). UW Biostatistics Working Paper Series. Working Paper 198.
March 24, 2004
Chuck McCulloch
Professor and Head of Biostatistics, UCSF
REPEATED MEASURES LOGISTIC REGRESSION: MARGINAL AND CONDITIONAL MODELS
March 10, 2004
Eric Vittinghoff
Associate Adjunct Professor of Biostatistics, UCSF
JOURNAL CLUB
NP Jewell and MJ van der Laan (2002). Current Status Data: Review, Recent Developments and Open Problems.
February 25, 2004
Nancy Hills
Department of Epidemiology, UC Berkeley
STATISTICAL ISSUES IN THE ANALYSIS OF DATA FROM STUDIES OF HUMAN PAPILLOMA VIRUS
January 28, 2004
Peter Gilbert
Statistical Center for HIV/AIDS Research & Prevention (SCHARP)
SENSITIVITY ANALYSES COMPARING OUTCOMES MEASURED ONLY IN A SUBSET SELECTED POST-RANDOMIZATION, WITH APPLICATION TO HIV VACCINE TRIALS
January 14, 2004
John Neuhaus
Professor of Biostatistics, UCSF
JOURNAL CLUB DISCUSSION OF TWO PAPERS ON RESPONSE-SELECTIVE SAMPLING DESIGNS
A. J. Scott and C. J. Wild. (1997). Fitting Regression Models to Case-Control Data by Maximum Likelihood. Biometrika 84:57-71.
J. F. Lawless, J. D. Kalbfleisch, C. J. Wild. (1999). Semiparametric methods for response-selective and missing data problems in regression. Journal of the Royal Statistical Society, Series B 61:413-438.
December 3, 2003
Ying Lu
Associate Adjunct Professor of Radiology, UCSF
THE OPTIMAL COMBINATION OF MULTIPLE DIAGNOSTIC VARIABLES
November 19, 2003
John Neuhaus
Professor of Biostatistics, UCSF
THE ANALYSIS OF CLUSTERED DATA WITH RESPONSE DEPENDENT CLUSTER SIZES
November 5, 2003
John Kornak
UCSF/VA Medical Center Magnetic Resonance Unit
ISSUES IN THE STATISTICAL ANALYSIS OF fMRI DATA
October 22, 2003
Steve Gregorich
Assistant Adjunct Professor of Medicine/DGIM, UCSF
FITTING MIXED LOGIT MODELS VIA ADAPTIVE QUADRATURE (PART II): BIAS AND COVERAGE OF FIXED AND RANDOM PARAMETER ESTIMATES
October 8, 2003
Eric Vittinghoff
Associate Adjunct Professor of Biostatistics, UCSF
A COST-EFFICIENT CASE-ONLY METHOD FOR EXPLORATORY ANALYSIS OF TREATMENT-COVARIATE INTERACTIONS IN RANDOMIZED TRIALS WITH FAILURE-TIME ENDPOINTS
September 24, 2003
Su-Chun Cheng
Associate Professor of Biostatistics, UCSF
SEMIPARAMETRIC REGRESSION ANALYSIS OF MEAN RESIDUAL LIFE WITH CENSORED SURVIVAL DATA
September 10, 2003
Hua Jin
UCSF Department of Radiology
TREE STRUCTURED SURVIVAL ANALYSIS
May 21, 2003
Saunak Sen
Assistant Professor, CBMB, UCSF
A DISCUSSION ON FALSE DISCOVERY RATES
based on the following three papers:
Yoav Benjamini, Yosef Hochberg. (1995) Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society. Series B (Methodological) 57(1):289-300.
Joel Ira Weller, Jiu Zhou Song, David W. Heyen, Harris A. Lewin, and Micha Ron. (1998) A New Approach to the Problem of Multiple Comparisons in the Genetic Dissection of Complex Traits. Genetics 150:1699-1706.
John D. Storey. (2002) A direct approach to false discovery rates. Journal of the Royal Statistical Society, Series B 64(3):479-498.
April 30, 2003
Peter Bacchetti
Adjunct Professor of Biostatistics, UCSF
SURVIVAL REGRESSION METHODS FOR VERY SMALL STUDIES
April 2, 2003
Steve Gregorich
Assistant Adjunct Professor, DGIM, UCSF
AN ASSESSMENT OF THE PERFORMANCE OF MIXED-EFFECT LOGISTIC MODELS WITH AN EMPHASIS ON CONVERGENCE, BIAS, AND COVERAGE
February 19, 2003
Joan Hilton
Associate Professor of Biostatistics, UCSF
USE OF BASELINE VALUES IN LINEAR MIXED EFFECTS MODELS
February 5, 2003
Peter Bacchetti
Adjunct Professor of Biostatistics, UCSF

