Tweetomics: an omics seminar summarized via Twitter

July 19, 2012

[my tweets from a seminar I went to today, presented in reverse chronological order]

The middle-author problem: if you get little credit for your role in a study, where’s the incentive to check & vouch for the whole project?

Larry Kessler: As part of their training, grad students can re-run publicly available code and see whether they get the same results.

One Duke prof falsified a line of his CV, saying he got a Rhodes Scholarship … to Australia. Not really Rhodes Scholar material.

Kessler: Also, letters of concern to journals should not just be forwarded back to the authors of the article under suspicion.

Witten: How can 2 reviewers fairly evaluate a multidisciplinary 30-person paper? No way to do that, plus it isn’t their job.

Question from audience: role of journal editors and peer review in Duke case?

Kessler: Duke scientists wanted fame, promotions, grants … but weren’t aiming for royalties on diagnostic test. Wasn’t about $$ per se.

@smfullerton asks about conflict of interest, the elephant in the room. Duke scientists had vested interests in clinical trials?

Funders support discovery of exciting new approaches but are reluctant to support validation, so validation naturally gets neglected.

FDA needs to develop a guide on how to bring an omics test to the clinic. It’s been “putzing around” for a decade, Kessler says.

Institutions don’t always structure multidisciplinary work to ensure proper credit and accountability. What about middle authors?

There is a “bright line” between test discovery/validation on one side and evaluation for clinical utility/use on the other.

Kessler says we think we know how to do discovery and test validation, but lots of mistakes get made. Don’t validate on already-used data!
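[A hypothetical Python sketch of my own, not from the seminar, of why validating on already-used data misleads: in pure noise, pick the gene that best “predicts” the outcome, then compare “re-validating” on the same patients versus on an independent cohort. All names and numbers are invented.]

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 20, 1000
X_disc = rng.normal(size=(n, p))   # discovery cohort: pure-noise "expression"
y_disc = rng.normal(size=n)        # pure-noise "outcome"

# Discovery: pick the gene most correlated with the outcome.
corrs = np.array([np.corrcoef(X_disc[:, j], y_disc)[0, 1] for j in range(p)])
best = int(np.abs(corrs).argmax())

# Mistake: "validating" on the discovery data just re-reads the same noise.
same_data_r = corrs[best]

# Honest validation: an independent cohort, where the signal vanishes.
X_new, y_new = rng.normal(size=(n, p)), rng.normal(size=n)
fresh_r = np.corrcoef(X_new[:, best], y_new)[0, 1]

print(f"same-data r = {same_data_r:.2f}, independent r = {fresh_r:.2f}")
```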

IOM report explains how to “correctly” translate omics diagnostics from bench to bedside, and lists responsibilities of various parties.

Institute of Medicine report was released in March. Larry Kessler (who served on the IOM committee with Witten) will tell us about it now.

Aftermath of Duke debacle: dozens of papers retracted, careers of 162 coauthors jeopardized; public faith undermined.

The Duke problem was not just an academic one. The Duke papers, with their mistakes, were being used to guide clinical trials!

Keith Baggerly & Kevin Coombes of MD Anderson reported mistakes in publicly available Duke data (Annals of Applied Statistics 2009).

Concerns about Duke work were initially brushed off as “squabbles among statisticians.”

2006: Duke people started publishing high-profile papers on using omics to predict cancer outcomes. Others had trouble replicating results.

Challenges of omics data: (6) multidisciplinary analyses require cooperation and trust.

Challenges of omics data: (5) it’s hard to apply intuition to models with thousands of genes. Hard to tell what the data *should* look like.

Challenges of omics data: (4) expense of experiments and limited samples mean that results aren’t always validated.

Challenges of omics data: (3) complicated experiments and analysis increase the likelihood of errors.

Challenges of omics data: (2) Batch effects may cause results in one lab not to generalize to other labs.

Challenges of omics data: (1) many variables (e.g., genes) vs. number of observations (e.g., patients) leads to high risk of overfitting.
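[To make challenge #1 concrete, here’s a hypothetical Python sketch of my own — not from the seminar, with invented names and sizes — showing how, with far more genes than patients, pure noise reliably produces a strong-looking correlation:]

```python
import numpy as np

rng = np.random.default_rng(0)
n_patients, n_genes = 20, 1000  # far more variables than observations

expression = rng.normal(size=(n_patients, n_genes))  # random "expression"
outcome = rng.normal(size=n_patients)                # random "outcome"

# Correlate every gene with the outcome; every one of these is noise.
corrs = np.array([np.corrcoef(expression[:, g], outcome)[0, 1]
                  for g in range(n_genes)])

# The best of 1000 noise correlations still looks like a biomarker.
print(f"strongest |r| among {n_genes} noise genes: {np.abs(corrs).max():.2f}")
```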

Recent @nytimes series “Genetic Gamble” highlights promise and difficulty of using genomics to guide cancer treatment.

Daniela Witten’s definition of omics: characterization of global sets of biological molecules (DNA, RNA, protein, metabolites, etc.).

Today’s topic: recommendations of Institute of Medicine panel on responsible use of omics data in clinical research.

Website for UW Biomedical Research Integrity program: http://depts.washington.edu/uwbri/

BRI seminar will feature Larry Kessler and Daniela Witten, with moderation by @smfullerton.

Now attending the UW Biomedical Research Integrity seminar “Responsible Research in the Era of Omics: Past, Present, & Future.”


  1. One quick thought: The statistical issues raised by these sorts of data are not really new, at least not qualitatively. Yes, “omics” researchers have massive numbers of response variables (e.g., expression levels of thousands of genes), and often have small sample sizes. But it’s not as if “omics” researchers are the first people in history ever to study multivariate data, engage in multiple comparisons, or use exploratory analyses to develop hypotheses for subsequent testing. Mistakes like using the same data to first generate and then test hypotheses are literally undergraduate-level mistakes, which do not arise from lack of intuition about what these supposedly complex data “should” look like. “Omics” researchers love to bang on about how huge and complex their datasets are, as if that somehow excuses or explains making really basic statistical errors. Yes, the typical features of “omics” datasets (many response variables, small sample sizes, etc.) will dictate one’s choice of statistical approach and constrain the precise questions one wants to or can ask. But that’s not at all the same thing as a license to do silly things like conflate exploratory and hypothesis-testing analyses.

    I only vaguely remember reading about the case in question, and of course there were many more problems with this work, including ethical problems. But in terms of the statistical issues, am I totally off-base here?

    • Nobody at the seminar was asking for a license to do silly statistical things. Quite the opposite, actually; they were arguing for stricter oversight, in part because results are used to guide decisions about treating patients. That aside, the presenters seemed to be arguing that mistakes are easier to make and harder to catch in a clinical omics context because of the confluence of factors #1-6 (even though these may not individually be unique to clinical genome stuff). I worry especially about #5 (inability to apply intuition as a reality check when there are hundreds of associations and perhaps no hypothesis) and #6 (giant research teams in which middle authors receive minimal academic credit and thus may not be highly motivated to worry about parts other than their own).

      • Oh, I know they don’t *want* to do silly statistical things. But it sounds to me like they unwittingly *did* do silly statistical things. That is, it still sounds to me like an unmentioned contributing factor here was “no one involved had a firm grasp of basic statistical principles”. Maybe I’m misunderstanding and the errors they made were quite subtle and technical, but so far it doesn’t sound like it to me…

        Perhaps if these people had read my blog ;-)

        http://dynamicecology.wordpress.com/2011/12/12/darwin-in-space-or-spurious-correlation-exemplified/ ;-)

  2. While it’s theoretically possible that nobody involved in the Duke studies had a clue about statistics, I think a more likely explanation is that the stats-savvy people on the team were not empowered to check the work of the others, or weren’t sufficiently committed to the project to do so. That is, irrespective of ignorance, sloppiness, or even fraud by individuals, I think a more general and perhaps more interesting point is that weird things can happen to large datasets collected by large groups of researchers, and that special precautions may be needed to address those.
