
Archive for the ‘statistics’ Category

At the end of a year, people like to make lists of top movies, books, etc.  What I plan to do instead is write about the things I learned each year. So, here are some brief highlights of things I learned in 2011:

  • Epigenetics, toolkit genes, genetic switches and how most conversations about heritability are flawed.  I learned a lot about imprinted genes from Charlene Lewis (especially BDNF), about toolkit genes from reading Sean Carroll’s Endless Forms Most Beautiful (which I highly recommend) and about all of these topics from (some of) Robert Sapolsky’s lectures on human behavioral biology (which are fantastic, and free on youtube and itunes).
  • Social belonging sits atop the hierarchy of needs.  Sister Y introduced this idea with her blog here: “the need for social belonging is more pressing than the need for food.”  I have noticed that people are far more likely to want to kill (themselves or someone else) when they have been socially shamed, rejected, or ostracized.  NYU Psychology Professor James Gilligan noted: “The emotional cause that I have found just universal among people who commit serious violence, lethal violence is the phenomenon of feeling overwhelmed by feelings of shame and humiliation. I’ve worked with the most violent people our society produces who tend to wind up in our prisons. I’ve been astonished by how almost always I get the same answer when I ask the question—why did you assault or even kill that person? And the answer I would get back in one set of words or another but almost always meaning exactly the same thing would be, ‘Because he disrespected me,’ or ‘He disrespected my mother,’ or my wife, my girlfriend, whatever.”

    In the same program, Pieter Spierenburg pointed out that murder in defense of your reputation used to be viewed as a pretty minor offense: “Originally around 1300 the regular punishment for an honourable killing would be a fine or perhaps a banishment, whereas punishment for a treacherous murder would be execution.”

  • Evidence in favor of our promiscuous past, the most interesting of which is sperm competition.  I was introduced to this topic in Sex at Dawn.
  • Life cycles of parasites.  I learned about this from Robert Sapolsky and This Week in Parasitism.  I particularly love Toxoplasma and fish tapeworm.
  • Lead and crime.  There are a lot of theories about why crime has declined since the 1990s.  These theories include:  legalization of abortion, tougher sentencing, end of crack epidemic, etc.  But I think the most interesting one is the reduction in lead exposure.  Total lead exposure was a non-decreasing function  from 1900 to 1970.  Lead exposure from gasoline increased sharply from 1930 to 1970.   We know that lead exposure, especially chronic exposure, has neurotoxic effects.  It can be particularly damaging to the frontal lobe.  Thus, we would expect that kids who were exposed to lead would be more likely to engage in impulse crimes when they are young adults.   Jessica Reyes documented the link between lead exposure and crime in the US in this paper.   The graph below, taken from her paper, overlays the lead exposure curve and crime rate curve (with a 22 year lag for lead exposure, because 22 is the average age at which violent crimes are committed, so we would expect childhood exposure to lead to have the largest impact approximately 20 years later):

    I think this is pretty compelling, and a fascinating story.  The League of Nations banned lead paint in 1922, but the US failed to adopt the measure.  The US didn’t take serious action until the 1970s.  To this day, lead paint exposure is a serious problem for people living in old homes in large cities.  I would love to see the lead exposure / crime link investigated using data from other countries.
  • Religion. I learned about the history of god, its relation to changes in civilization (how transitions from polytheism to monotheism paralleled changes from foraging to farming, egalitarianism to hierarchy), lots of cool, related neuroscience, etc.  This is work in progress.  Hopefully I will have more to say about it next year.
  • I found Sister Y’s views on nature very insightful.

Read Full Post »

Tink Thompson on The Umbrella Man in Errol Morris’ short film:

The only person under any umbrella in all of Dallas standing right at the location where all the shots come into the limousine.  Can anyone come up with a non-sinister explanation for this?

It does seem weird.  People will naturally ask themselves informal questions such as “could that just be a coincidence?”  We can make the question increasingly formal:

What is the probability that the only person in Dallas holding an umbrella would be standing where the President was shot?

Or even better:

If we were to randomly place the Umbrella Man in one of the locations where a person was standing along the parade route in Dallas, what is the probability that he would end up right where the President was shot?

You will notice that our minds turn a retrospective observation (“hey, there was a guy with an umbrella standing next to the limo.  That’s weird.”) into a prospective randomization question (“if we randomly place the Umbrella Man…”).

We are only asking about the probability of the Umbrella Man standing there, because we already observed that he was standing there.  The observation drove the question.

I suspect that had the President been shot in a different location, we would have identified someone in the crowd that did something that seemed too weird to just be a coincidence.  That’s part of the reason why conspiracy theories are so seductive — there is always some observation that is hard to explain with chance.

I have made this point before, but this example is better than the ones I came up with.

Read Full Post »

As discussed previously, participants in randomized trials are typically blinded to treatment assignment.  This differs from the non-trial setting, where blinding patients to treatment would be considered unethical.  The extent to which uncertainty about treatment assignment affects outcomes is unclear.  Most randomized trials are not designed to deal with this issue.

Informed consent laws prevent researchers from lying to patients about treatment assignment.  However, we can, to a large extent, affect what people believe about treatment assignment via the allocation probability.   For example, if subjects are informed that there is a 50% chance they will receive a placebo, they should believe that they have about a 50% chance of receiving placebo.  Alternatively, if we tell them that 99.999% of subjects will receive the active drug, they should be pretty confident that they will receive the active drug.  In the latter example, we will obtain something pretty close to the counterfactual we want (Y0,100%) on 0.001% of subjects.   Of course, we would need an enormous sample size to observe many people like that.  Thus, there are the usual tradeoffs between bias and efficiency.

My suggestion is to randomize subjects to one of several arms that have different allocation probabilities.  Assuming the causal effects are a smooth function of the allocation probability, we could extrapolate to obtain estimates of E(Y1,100% -Y0,100%).
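A minimal sketch of the extrapolation idea, under loudly stated assumptions: the outcome model, the three allocation probabilities, and the function names (`arm_contrast`, `extrapolate_linear`, `run_design`) are all hypothetical and chosen only for illustration. It assumes subjects' belief equals the stated allocation probability and that the treated-minus-control contrast is linear in that probability, then fits a straight line through the arm-level contrasts and reads it off at 100%.

```python
import random
from statistics import mean


def arm_contrast(p_active, n, rng):
    """Observed treated-minus-control mean in one arm with allocation probability p_active."""
    treated, control = [], []
    for _ in range(n):
        t = 1 if rng.random() < p_active else 0
        # Hypothetical outcome model (an assumption): belief raises everyone's outcome,
        # and the drug effect itself grows with belief in active treatment.
        y = 1.0 * t + 2.0 * p_active + 0.5 * t * p_active + rng.gauss(0, 1)
        (treated if t else control).append(y)
    return mean(treated) - mean(control)


def extrapolate_linear(points, target):
    """Least-squares line through (x, y) points, evaluated at x = target."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return y_bar + (sxy / sxx) * (target - x_bar)


def run_design(probs=(0.2, 0.5, 0.8), n_per_arm=20000, seed=1):
    """Simulate several arms and extrapolate the contrast to 100% belief."""
    rng = random.Random(seed)
    points = [(p, arm_contrast(p, n_per_arm, rng)) for p in probs]
    return extrapolate_linear(points, 1.0)


if __name__ == "__main__":
    # Under the assumed model the contrast at 100% belief is 1.0 + 0.5 = 1.5.
    print(round(run_design(), 2))
```

In this toy model the extrapolated value can be compared with the known contrast at 100% belief. With real data, the linearity assumption would need to be checked, and a different smooth function could be swapped in.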

For details, see here, or email for reprint (nequal1@gmail.com).

Read Full Post »

Consider the situation where there are two treatments, T=0 or 1.  Let the variable B denote the subject’s confidence (as a percentage) that they have been assigned treatment T=1.  Finally, let the potential outcome Yt,b be the outcome that would be observed if the subject were actually assigned treatment t and were b% confident that they were assigned T=1.

For example, Y1,100% is the outcome that would be observed if the subject was assigned treatment 1 and was sure that they were assigned treatment 1.  Similarly,   Y0,0% is the outcome that would be observed if the subject was assigned treatment 0 and was sure that they were not assigned treatment 1.

I would argue that the causal effect we are most often interested in is Y1,100% - Y0,100%.  That is, the potential outcome if the subject was assigned treatment 1 and was sure they were assigned treatment 1, minus the potential outcome if the subject was assigned treatment 0 but falsely believed they were assigned treatment 1.

To illustrate the idea, imagine that treatment 1 is an active drug and treatment 0 is a placebo.  We are interested in what would happen if the subject believed they were assigned the active drug and did receive the active drug, versus the case where they were assigned placebo but believe it was the active drug.  The difference in these potential outcomes should tell us the effect of the active drug that is not strictly due to knowing that they are taking an active drug.

Using this notation, we can also formally define the placebo effect as Y0,100% - Y0,0% (the difference in potential outcomes for a subject given a placebo who, on the one hand, believes it’s an active drug and, on the other hand, knows that it’s a placebo).

The problem is that informed consent laws prevent us from directly observing Y1,0% or Y0,100% (because it would require lying to subjects about what treatments they are given).  Typically in randomized trials, only one of the following two potential outcomes is observed for each subject: Y1,50% or Y0,50%.  It is unclear how similar a contrast such as Y1,50% - Y0,50% will be to the contrast we want, Y1,100% - Y0,100%.

Thus, most randomized trials with human subjects are not even designed to obtain the variables that we are most interested in.

Read Full Post »

The primary criticism of observational studies is that there is no way to know the extent of unmeasured confounding.

Randomized controlled trials (RCTs) have their own limitations.  They often exclude patients with co-morbid conditions and select the most adherent patients using a pre-randomization run-in phase.

However, there is another problem with RCTs, one that is not widely recognized.  Quoting myself in a forthcoming paper (link to abstract (email me for reprint)):

In RCTs patients have uncertainty about what treatment they are receiving. A patient receiving an active drug or therapy might falsely believe that they are receiving the placebo or sham therapy. Outside of the RCT environment, a patient who is prescribed a drug by their physician will be sure that they are receiving the active drug. We would expect placebo effects to be stronger if patients were unaware that they might be given a placebo. Similarly, we might expect active treatments to be more effective if there was no uncertainty about treatment receipt. While there has been great emphasis about the importance of concealing treatment assignment, this concealment creates uncertainty within the patient about treatment assignment.

Treatment uncertainty could affect subjects’ behavior (such as adherence) and subjective well-being.  Given the evidence about placebo effects, it’s not unreasonable to speculate that these uncertainty effects could be substantial.  Further, treatment uncertainty might also affect who is willing to participate in the studies.  For example, patients who want the newest therapy might be unwilling to risk getting randomized to placebo.

In the next post, I will formalize these ideas.  In the final post of this series, I will propose a solution.

Read Full Post »

Racial resentment

Political science professor Alan Abramowitz of Emory University wrote a paper which, among other things, claims that ‘racial hostility’ is a significant predictor of Tea Party support.   His conclusion was based on results from a survey.  The information on attitudes about race came from a  four-item ‘racial resentment’ scale.    Here is the first item that makes up the racial resentment scale:

Do you agree strongly, agree somewhat, neither agree nor disagree, disagree somewhat, or disagree strongly with this statement?

Irish, Italians, Jewish and many other minorities overcame prejudice and worked their way up. Blacks should do the same without any special favors.

How can one answer something like this?  What if you reject the premise of the statement?  For example, the two sentences imply that many minority groups have overcome prejudice without any special favors (whatever that means).  They also imply that blacks have not worked their way up so far, with or without special favors.  What if you think that Italians and Jews worked their way up with special favors?  What if you think that Irish have not worked their way up?  What if you think that blacks already have worked their way up (they’ve come a long way since the days of slavery)?  Plus, there is a possible false equivalence here.  Not all obstacles are of the same size.  If blacks ‘work their way up’ without ‘special favors,’ that does not mean that that is ‘the same’ as what other minority groups accomplished.

Special favors

The phrase ‘special favors’ makes it sound like blacks would be getting some extra goodies from the government that are only available to them (it’s ‘special’).  Well, a lot of people with very little racial resentment would object to any minority group getting special favors.  That’s especially true for people who prefer small government (like Tea Party folks).  As Bob Somerby pointed out, there is no shortage of blacks who would agree that blacks should not get special favors (apparently resenting themselves).

Racial resentment=resentment of blacks

All four items making up the racial resentment scale have to do with resentment of blacks.  For example, the fourth item is:

Do you agree strongly, agree somewhat, neither agree nor disagree, disagree somewhat, or disagree strongly with this statement?

It’s really a matter of some people not trying hard enough; if blacks would only try harder they could be just as well off as whites.

I suppose it’s not surprising that the Tea Party looked like they had a lot of racial resentment when you measure it in this way (since people who identify with them tend to be white conservatives).

What if some of the questions tried to measure resentment of whites?  For example:

Do you agree strongly, agree somewhat, neither agree nor disagree, disagree somewhat, or disagree strongly with this statement?

Part of the reason that income inequality in this country is so extreme is because there is still an exclusive club among rich white people.

Based on responses to that question, perhaps Democrats would look like they have a lot of ‘racial resentment.’

I’m not saying that racism is not more prevalent among people who identify with the Tea Party, but let’s do a better job of measuring these things.

Read Full Post »

“Heritability is the proportion of phenotypic variation in a population that is due to genetic variation between individuals.” -wikipedia

Not all genetic variation is inherited.  We used to think that essentially genotype=phenotype (i.e., having a gene meant you had whatever trait goes along with it).  But now we know that all kinds of more complicated stuff is going on that determines whether genes are activated.  So, there isn’t the direct correspondence between genes and proteins like we previously thought.

Further, heritability is typically estimated from twin studies, by comparing DZ and MZ twins.   However, MZ twins likely have a more similar environment than do DZ twins.  MZ twins often share a placenta.  In addition, I have observed that they are often treated differently than DZ twins.  For example, I suspect it’s more common to dress MZ twins alike, to assume they like the same things, etc.  MZ twins also look more alike, and there is no doubt that appearance affects how people are treated.  If it’s true that MZ twins are treated more similarly than DZ twins, we might expect estimates of heritability to increase with age, which is exactly what we find in practice.

Read Full Post »

Exaggerated effect plots

The following graph appeared in Dunbar, R. I. M. and Shultz, S. (2007). Evolution in the Social Brain. Science 317, 1344 (link to abstract).  I recommend clicking on it to see the full version.

There are many problems with the graph and the paper*, but I will focus on the presentation in the graph.  The point estimates are at the top or bottom of each box.  They also have a SE bar.  In almost all cases, if the estimate is positive, then the SE bar extends from the top of the box.  If the estimate is negative, then the SE bar extends down below the box.  Visually, this makes differences look bigger than they are.  For example, if you look at bats, the two rectangles have no overlap, and the one for pairbonded species extends all the way up past 0.4, and the one for ‘other mating strategies’ extends down past -0.3.

Compare this with my graph below, where I get rid of the boxes, and just have 95% confidence intervals (i.e., +/- 2 SEs from point estimate).  Again, I recommend clicking on the graph.

From this graph, we see a huge difference for birds, and possibly a small difference for other groups (besides primates).  But overall, it looks like birds are the only group that we can really say anything about (assuming the bird study was carefully done).
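To make the contrast between the two presentations concrete, here is a small sketch. The estimates and standard errors are hypothetical numbers picked for illustration; they are not read off the paper's figure. The function names are also my own.

```python
def confidence_interval(estimate, se, multiplier=2.0):
    """Approximate 95% interval: estimate +/- multiplier * SE."""
    return (estimate - multiplier * se, estimate + multiplier * se)


def intervals_overlap(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical values for two groups within one taxon (assumptions, not real data).
pairbonded = confidence_interval(0.30, 0.12)
other = confidence_interval(-0.20, 0.15)

print(pairbonded, other, intervals_overlap(pairbonded, other))
```

With a box-plus-one-SE display, these two hypothetical groups would look cleanly separated on either side of zone zero's axis line, yet their +/- 2 SE intervals overlap. Overlap is only a rough visual guide, not a formal test of the difference.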

_____________________________________________________________________________________

*Other things wrong with the graph & paper:

1.  where did these data come from?

2. which species do they consider pairbonded in each group?

3.  How much data from each type of species? How was this decided?  You could get small confidence intervals by obtaining a lot of data from one species in a given group.  That might not tell us much about the group as a whole.

4. What methodology was used to pool data from multiple species within a group? This is a complex problem.  It’s unclear what the population parameter is here.

Read Full Post »

I have had the following situation happen several times during my research career:  I write code to analyze data; there is some expectation about what the results will be; after running the program, the results are not what was expected; I go back and carefully check the code to make sure there are no errors; sometimes I find an error.

No matter how careful you are when it comes to writing computer code, I think you are more likely to find a mistake if you think there is one.  Unexpected results lead one to suspect a coding error more than expected results do.

In general, researchers usually do have general expectations about what they will find (e.g., the drug will not increase risk of the disease; the toxin will not decrease risk of cancer).

Consider the following graphic:

Here, the green region is consistent with what our expectations are.  For example, if we expect a relative risk (RR) of about 1.5, we might not be too surprised if the estimated RR is between (e.g.) 0.9 and 2.0.  Anything above 2.0 or below 0.9 might make us highly suspicious of an error — that’s the red region.  Estimates in the red region are likely to trigger serious coding error investigation.  Obviously, if there is no coding error then the paper will get submitted with the surprising results.

Error scenarios

Let’s assume that there is a coding error that causes the estimated effect to differ from the true effect (assume sample size large enough to ignore sampling variability).

Consider the following scenario:

Type A. Here, the estimated value is biased, but it’s within the expected range.  In this scenario, error checking is probably more casual and less likely to be successful.

Next, consider this scenario:

Type B. In this case, the estimated value is in the red zone.  This triggers aggressive error checking of the type that has a higher success rate.

Finally:

Type C. In this case it’s the true value that differs from our expectations.  However, the estimated value is about what we would expect.  This triggers casual error checking of the less-likely-to-be-successful variety.

If this line of reasoning holds, we should expect journal articles to contain errors at a higher rate when the results are consistent with the authors’ prior expectations. This could be viewed as a type of confirmation bias.
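The reasoning above can be sketched as a small calculation. Every input below is an assumed, made-up probability (how often code has an error, how often an erroneous estimate still lands in the expected range, how often casual versus aggressive checking catches the error), and the function name is hypothetical. The sketch also assumes that a caught error is fixed and the corrected result lands wherever the truth is.

```python
def residual_error_rates(p_error=0.2, p_true_expected=0.8, p_err_expected=0.5,
                         catch_casual=0.2, catch_aggressive=0.95):
    """Share of published results still containing an error, by zone.

    Returns (rate among results in the expected zone,
             rate among results in the surprising zone).
    All default probabilities are illustrative assumptions.
    """
    # Errors that survive checking, split by where the wrong estimate landed.
    err_expected = p_error * p_err_expected * (1 - catch_casual)
    err_surprising = p_error * (1 - p_err_expected) * (1 - catch_aggressive)
    # Caught errors are fixed, so those results join the error-free pool.
    caught = p_error - err_expected - err_surprising
    clean = (1 - p_error) + caught
    # Error-free results land in the expected zone when the truth does.
    total_expected = clean * p_true_expected + err_expected
    total_surprising = clean * (1 - p_true_expected) + err_surprising
    return err_expected / total_expected, err_surprising / total_surprising


expected_rate, surprising_rate = residual_error_rates()
print(round(expected_rate, 3), round(surprising_rate, 3))
```

With these assumed inputs, results that match expectations carry errors at a higher rate than surprising ones. The numbers are sensitive to the inputs, so this illustrates the mechanism rather than estimating its size.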

How common are coding errors in research?

There are many opportunities for hard-to-detect errors to occur.  For large studies, there might be hundreds of lines of code related to database creation, data cleaning, etc., plus many more lines of code for data analysis.  Studies also typically involve multiple programmers.  I would not be surprised if at least 20% of  published studies include results that were affected by at least one coding error.  Many of these errors probably had a trivial effect, but I am sure others did not.

cross-posted at lesswrong (where you will find many interesting comments)

Read Full Post »

Suppose 50% of people in a population have an asymptomatic form of cancer. None of them know if they have it. One of them is randomly selected and a diagnostic test is carried out (the result is not disclosed to them). If they don’t have cancer, they are woken up once. If they do have it, they are woken up 9 times (with amnesia-inducing drug administered each time, blah blah blah). Each time they are woken up, they are asked their credence (subjective probability) for cancer.

Imagine we do this repeatedly, randomly selecting people from a population that has 50% cancer prevalence.

World A: Everyone uses thirder logic

Someone without cancer will say: “I’m 90% sure I have cancer”

Someone with cancer will say: “I’m 90% sure I have cancer.” “I’m 90% sure I have cancer.” “I’m 90% sure I have cancer.” “I’m 90% sure I have cancer.” “I’m 90% sure I have cancer.” “I’m 90% sure I have cancer.” “I’m 90% sure I have cancer.” “I’m 90% sure I have cancer.” “I’m 90% sure I have cancer.”

Notice, everyone says they are 90% sure they have cancer, even though only 50% of them actually do.

Sure, the people who have cancer say it more often, but does that matter? At an awakening (you can pick one), people with cancer and people without are saying the same thing.

World B: Everyone uses halfer logic

Someone without cancer will say: “I’m 50% sure I have cancer”

Someone with cancer will say: “I’m 50% sure I have cancer.” “I’m 50% sure I have cancer.” “I’m 50% sure I have cancer.” “I’m 50% sure I have cancer.” “I’m 50% sure I have cancer.” “I’m 50% sure I have cancer.” “I’m 50% sure I have cancer.” “I’m 50% sure I have cancer.” “I’m 50% sure I have cancer.”

Here, half of the people have cancer, and all of them say they are 50% sure they have cancer.

My question: which world contains the more rational people?
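The two worlds correspond to two ways of counting, which a quick simulation makes explicit. This is only a sketch of the setup described above (50% prevalence, 1 awakening without cancer, 9 with); the function name and the number of simulated people are my own choices.

```python
import random


def simulate(n_people=100000, prevalence=0.5, wakings_if_cancer=9, seed=0):
    """Return (fraction of people with cancer, fraction of awakenings with cancer)."""
    rng = random.Random(seed)
    people_with_cancer = 0
    awakenings = 0
    awakenings_with_cancer = 0
    for _ in range(n_people):
        has_cancer = rng.random() < prevalence
        k = wakings_if_cancer if has_cancer else 1
        people_with_cancer += 1 if has_cancer else 0
        awakenings += k
        awakenings_with_cancer += k if has_cancer else 0
    return people_with_cancer / n_people, awakenings_with_cancer / awakenings


per_person, per_awakening = simulate()
print(round(per_person, 2), round(per_awakening, 2))
```

Counting per person gives a frequency near the halfer's 50%, while counting per awakening gives a frequency near the thirder's 90%. The simulation does not settle which count a rational person should report; it only shows where each number comes from.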

Read Full Post »
