Archive for September, 2010

This recent post reminded me of something that I’ve thought about before.  First, an excerpt:

Fifteen thousand years ago, our ancestors bred dogs to serve man. In merely 150 centuries, we shaped collies to herd our sheep and pekingese to sit in our emperor’s sleeves. Wild wolves can’t understand us, but we teach their domesticated counterparts tricks for fun. And, most importantly of all, dogs get emotional pleasure out of serving their master. When my family’s terrier runs to the kennel, she does so with blissful, self-reinforcing obedience.

When I hear amateur philosophers ponder the meaning of life, I worry humans suffer from the same embarrassing shortcoming.

I’d expect us to shout “life is without mandated meaning!” with lungs full of joy. There are no rules we have to follow, only the consequences we choose for us and our fellow humans. Huzzah!

But most humans want nothing more than to surrender to a powerful force. …

Suppose we did not have any evidence for or against the existence of God.  But, tomorrow the answer is going to be revealed.  What should we be rooting for?  What would make us happier?  To find out that we were created by God, or to find out there is no God?

Do answers to these questions correlate with belief, and, if so, what’s the direction of the causality?  If we prefer God, does that affect how we see the evidence (motivated cognition)?  Or do we first establish a belief about God and then start rationalizing to convince ourselves that the way things are is the way we prefer them?  I have heard a few agnostics and atheists say they wish there were a God.  I don’t recall theists saying they wish God didn’t exist (although that kind of thinking is forbidden, right?).

Personally, I LOVE the freedom and uncertainty that comes with not having a conscious designer.  Why do some people prefer God?   Is immortality the only selling point?  (I’m sincerely curious)



I have had the following situation happen several times during my research career: I write code to analyze data; there is some expectation about what the results will be; after running the program, the results are not what was expected; I go back and carefully check the code to make sure there are no errors; sometimes I find an error.

No matter how careful you are when it comes to writing computer code, I think you are more likely to find a mistake if you think there is one.  Unexpected results lead one to suspect a coding error more than expected results do.

Researchers usually have general expectations about what they will find (e.g., the drug will not increase risk of the disease; the toxin will not decrease risk of cancer).

Consider the following graphic:

Here, the green region is consistent with our expectations.  For example, if we expect a relative risk (RR) of about 1.5, we might not be too surprised if the estimated RR is between, say, 0.9 and 2.0.  Anything above 2.0 or below 0.9 might make us highly suspicious of an error; that’s the red region.  Estimates in the red region are likely to trigger a serious investigation for coding errors.  Obviously, if no coding error is found, the paper will get submitted with the surprising results.

Error scenarios

Let’s assume that there is a coding error that causes the estimated effect to differ from the true effect (assume the sample size is large enough to ignore sampling variability).

Consider the following scenario:

Type A. Here, the estimated value is biased, but it’s within the expected range.  In this scenario, error checking is probably more casual and less likely to be successful.

Next, consider this scenario:

Type B. In this case, the estimated value is in the red zone.  This triggers aggressive error checking of the type that has a higher success rate.


Type C. In this case it’s the true value that differs from our expectations.  However, the estimated value is about what we would expect.  This triggers casual error checking of the less-likely-to-be-successful variety.

If this line of reasoning holds, we should expect journal articles to contain errors at a higher rate when the results are consistent with the authors’ prior expectations. This could be viewed as a type of confirmation bias.
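
To make the mechanism concrete, here is a minimal simulation sketch. It is not from any real analysis: the spread of true and estimated RRs and the two catch probabilities (`p_catch_casual`, `p_catch_aggressive`) are made-up assumptions, and only the 0.9 to 2.0 expected range comes from the example above. Every simulated analysis contains a coding error, and the code asks what fraction of those errors survives checking in each scenario. It says nothing about how often errors occur in the first place, so it illustrates the survival step of the argument, not the overall error rate among published papers.

```python
import math
import random

GREEN = (0.9, 2.0)  # expected range for the estimated RR, from the example in the text


def in_green(rr):
    """True if the value falls inside the expected (green) range."""
    return GREEN[0] <= rr <= GREEN[1]


def classify(true_rr, est_rr):
    """Label an erroneous analysis as Type A, B, or C."""
    if not in_green(est_rr):
        return "B"  # surprising estimate, so checking is aggressive
    return "A" if in_green(true_rr) else "C"


def simulate(n=20000, p_catch_casual=0.2, p_catch_aggressive=0.8, seed=1):
    """Return the fraction of coding errors that survive checking, by type.

    All distributions and probabilities here are hypothetical.
    """
    rng = random.Random(seed)
    made = {"A": 0, "B": 0, "C": 0}
    survived = {"A": 0, "B": 0, "C": 0}
    for _ in range(n):
        # Assumed spread of true effects around the expected RR of 1.5.
        true_rr = rng.lognormvariate(math.log(1.5), 0.4)
        # Assumed multiplicative bias introduced by the coding error.
        est_rr = true_rr * rng.lognormvariate(0.0, 0.5)
        kind = classify(true_rr, est_rr)
        made[kind] += 1
        p_catch = p_catch_aggressive if kind == "B" else p_catch_casual
        if rng.random() >= p_catch:  # the error was not caught
            survived[kind] += 1
    return {k: survived[k] / made[k] for k in made}


if __name__ == "__main__":
    for kind, rate in sorted(simulate().items()):
        print(f"Type {kind}: {rate:.2f} of errors survive checking")
```

Under these assumptions, errors of Type A and Type C survive far more often than errors of Type B, simply because they never trigger the aggressive check.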

How common are coding errors in research?

There are many opportunities for hard-to-detect errors to occur.  For large studies, there might be hundreds of lines of code related to database creation, data cleaning, etc., plus many more lines of code for data analysis.  Studies also typically involve multiple programmers.  I would not be surprised if at least 20% of  published studies include results that were affected by at least one coding error.  Many of these errors probably had a trivial effect, but I am sure others did not.
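
As a rough back-of-the-envelope check on that guess, here is a small sketch. The per-line error probability is purely hypothetical, and treating lines as independent is a simplifying assumption. The point is only that a tiny chance of an undetected error per line adds up quickly over hundreds of lines.

```python
def p_at_least_one_error(n_lines, p_per_line):
    """Probability that at least one of n_lines contains an undetected error,
    assuming each line errs independently with probability p_per_line."""
    return 1.0 - (1.0 - p_per_line) ** n_lines


if __name__ == "__main__":
    # 0.0005 (1 in 2,000 lines) is an illustrative value, not an estimate.
    for n in (100, 500, 2000):
        print(n, round(p_at_least_one_error(n, 0.0005), 3))
```

With 500 lines and a 1-in-2,000 chance of an undetected error per line, the probability of at least one error is about 22%.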

Cross-posted at LessWrong (where you will find many interesting comments).



A study with ‘negative’ findings (e.g., no statistically significant exposure effect) is less likely to get published than a study with a positive result.  Publication bias is likely due to several factors:

  • editors and referees are less likely to give a favorable review if it’s a negative study
    • journals have a financial incentive to publish ‘sexy’ findings (sexy findings get media attention); negative studies are typically not as sexy
    • even if editors are consciously trying not to reject negative studies simply because they’re negative, it’s possible that they are more critical of other aspects of the paper than they would have been if the findings were positive.  That is, they could reject the paper while deceiving themselves that they’re rejecting it for other reasons
  • researchers are less likely to submit an article to a journal if it’s a negative study (file drawer effect).  I’ve seen this many times in my own collaborations
    • this is probably largely due to researchers knowing that negative studies are less likely to get published

Registering trials in a public database is a step in the right direction, but what about observational studies?


Thankfully, there is a very simple solution.  Journal articles should be submitted with the results left out. The paper would contain all of the usual things: introduction/background; description of data (sample sizes; summary statistics (excluding the outcome)); variable definitions; analytical methods; and even shells of tables and figures.  The only things not included would be the actual results.

Reviewers would make a publication decision based on the following questions: (a) Are the authors asking important research questions that have not yet been definitively answered? (b) Is the research topic one that the journal would be interested in publishing? (c) Are the data appropriate for addressing the research topic? (d) Are the analytic methods appropriate? (e) Will the results be presented in an appropriate fashion?

Once the publication decision has been made, the authors would insert the results into the paper.

This has several advantages:

  • reviewers would not be biased against negative studies because they would not know if it was a negative study
  • the file drawer effect would be reduced (I suspect substantially) because researchers would know that their study would be judged on its merits as a study, not on the results
  • this approach would require very little extra work on the part of the authors
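
To illustrate what is at stake, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the true effect, the standard error, the acceptance rate, and especially the crude rule that results-based review publishes only statistically significant estimates. It compares the average published estimate under that rule with the average under results-blind review, where acceptance does not depend on the estimate.

```python
import random
import statistics


def simulate_estimates(n_studies=5000, true_effect=0.1, se=0.1, seed=0):
    """Draw one effect estimate per study around a common true effect."""
    rng = random.Random(seed)
    return [rng.gauss(true_effect, se) for _ in range(n_studies)]


def mean_published(estimates, se=0.1, results_blind=False, accept_rate=0.3, seed=1):
    """Mean estimate among published studies.

    results_blind=False: publish only if |estimate / se| > 1.96
                         (a crude stand-in for publication bias).
    results_blind=True:  publish a random accept_rate share of studies,
                         regardless of what they found.
    """
    rng = random.Random(seed)
    if results_blind:
        kept = [e for e in estimates if rng.random() < accept_rate]
    else:
        kept = [e for e in estimates if abs(e) / se > 1.96]
    return statistics.mean(kept)


if __name__ == "__main__":
    estimates = simulate_estimates()
    print("true effect:          0.1")
    print("results-based review:", round(mean_published(estimates), 3))
    print("results-blind review:", round(mean_published(estimates, results_blind=True), 3))
```

Under these assumptions, the literature produced by results-based review overstates the true effect by a wide margin, while the results-blind literature averages close to it.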

This solution is so simple and obvious.  So why doesn’t every journal have this policy?


After coming up with this idea I googled to see if anyone else had proposed it.  I did find this.  However, I don’t like Robin’s proposal for authors to write several versions of the paper with different conclusions.  I think submitting conclusion-free (as opposed to multi-conclusion) papers is better.
