Can parents affect their kids’ personalities, work ethic, degree of conscientiousness, happiness, future income, intelligence, aggressiveness, and so on, through their actions as parents? Or are these things almost entirely determined from genes and other (non-parental (whatever that means)) environmental factors?
This is clearly a causal question (‘affect,’ ‘determined from’). Thus, it requires causal thinking and causal assumptions. I will first give some background on how causal effects are defined. I will show simple examples to gain intuition, and then show situations in which it becomes an extremely difficult problem. I will then return to the parenting example. If you are very familiar with causal inference, you could skip section 1. Also, while I am focused on parenting, much of this could be viewed as a general critique of behavioral genetics discussions.
1 Causality: A primer
In order to understand some of the challenges with answering questions like the one above, it will be helpful to define and discuss estimation of causal effects for much simpler problems.
It is useful to begin by defining specific populations, exposures, and outcomes. You would like to infer the causal effect of an exposure on some outcome. Which exposures do you want to compare? What is the outcome?
1.1 A simple case — a one-time binary intervention
Perhaps the simplest case is a one-time binary exposure. For example, suppose the population is people who were newly diagnosed with pneumonia. We would like to know whether being prescribed a 3 day supply of some antibiotic is more effective than being prescribed a 3 day supply of some placebo. In this example, the exposure is the prescription (not the actual taking of the pills). ‘More effective’ needs to refer to a specific outcome. We could define it as being symptom free in a week.
Notation: Z is exposure (1=antibiotic, 0=placebo), Y is outcome. Let Y[Z=z] be the outcome that would have occurred if the person had been prescribed z. Thus, everyone has two potential outcomes, Y[Z=1] and Y[Z=0].
We are interested in comparing Y[Z=1] and Y[Z=0] in the population. That is, we would like to compare the average outcome if everyone was prescribed z=0 with the average outcome if everyone was prescribed z=1. If we randomize Z, then we can essentially substitute Y[Z=1] with Y|Z=1 and so on (the vertical line means ‘conditional on’ or ‘given’). If we do not randomize Z, we could still estimate the causal effect if we can achieve randomization by stratifying on (or controlling for) the full set of confounding variables X (loosely speaking, variables that affect both treatment assignment and the outcome). Note that the variables in X do not include post-exposure variables, since they need to affect Z, but not be affected by Z.
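To make the last paragraph concrete, here is a minimal simulation sketch. Everything in it is invented for illustration: a single binary confounder X (severe illness at diagnosis), a true effect of +0.20 on the probability of being symptom free, and the function names. It compares the difference in means under a randomized prescription, the naive difference in means when sicker people are more likely to be prescribed the antibiotic, and the same observational data after stratifying on X.

```python
import random


def simulate(n=100_000, seed=0):
    """Toy pneumonia data. All probabilities are made up for illustration."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        severe = rng.random() < 0.4                  # X: severe at diagnosis
        p0 = 0.30 if severe else 0.60                # P(Y[Z=0] = 1)
        u = rng.random()
        y0 = u < p0                                  # potential outcome under placebo
        y1 = u < p0 + 0.20                           # potential outcome under antibiotic
        z_random = rng.random() < 0.5                # randomized prescription
        z_observed = rng.random() < (0.8 if severe else 0.2)  # depends on X
        rows.append({"x": severe, "y0": y0, "y1": y1,
                     "z_random": z_random, "z_observed": z_observed})
    return rows


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def difference_in_means(rows, z):
    # We only ever see Y[Z=1] for the treated and Y[Z=0] for the untreated.
    treated = mean(r["y1"] for r in rows if r[z])
    control = mean(r["y0"] for r in rows if not r[z])
    return treated - control


def stratified_difference(rows, z):
    # Average the within-stratum contrasts over the distribution of X.
    total = 0.0
    for level in (False, True):
        stratum = [r for r in rows if r["x"] == level]
        total += difference_in_means(stratum, z) * len(stratum) / len(rows)
    return total


rows = simulate()
print("true average effect:     ", mean(r["y1"] for r in rows) - mean(r["y0"] for r in rows))
print("randomized, unadjusted:  ", difference_in_means(rows, "z_random"))
print("observational, naive:    ", difference_in_means(rows, "z_observed"))
print("observational, stratified:", stratified_difference(rows, "z_observed"))
```

In this toy setup the naive observational contrast is pulled toward zero because the sicker patients are the ones who get the antibiotic, while stratifying on X recovers roughly the same answer as randomization. That only works because X is, by construction, the full set of confounders.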
It’s not too hard to picture having a data set that could answer this question. While pneumonia is imperfectly identified, we have defined our population by the diagnosis of pneumonia, regardless of whether the diagnosis is accurate. Similarly, the outcome ‘symptom-free’ is a little subjective (and potentially could be influenced by knowledge about treatment — if there was unsuccessful blinding or no blinding). One could easily imagine a ‘harder’ outcome, such as ‘hospitalization within 30 days after treatment.’ It’s not hard to picture a randomized trial here. It’s also not hard to picture an observational study, where we control for variables such as age, sex, comorbidities, prior medication use, clinical variables such as blood pressure, and laboratory measures.
The point is, it’s not too difficult to define the exposure, outcome, causal effect of interest, and come up with a reasonable study design.
1.2 A more complicated case — repeated binary exposures
In the previous example we focused on a one-time intervention, such as ‘prescribed antibiotic or placebo.’ However, one could imagine caring more about the medication that was actually taken, rather than prescribed. For simplicity, let us imagine that the prescription was for 3 pills (one pill per day). Also, for simplicity, assume that people prescribed placebo will not have access to the antibiotic, and vice versa. We will see that even with these simplifying assumptions, it is a complicated problem.
People could take 0, 1, 2, or 3 pills. They could take them all on the same day, one each day, two on one day and one a week later, and so on. Precisely defining the exposure would mean knowing exactly when each person took each pill. Of course, no two people would do exactly the same thing (unless the experimenter controlled it, which is quite impractical). Thankfully, in most cases, we are satisfied with learning causal effects at higher levels of abstraction. For example, we might just define the exposure as ‘the total number of pills taken from their 3-day supply within a week after the prescription was written.’ In that case, we would lose details about the timing, but might keep enough information to determine whether antibiotics are more effective for the outcome of interest. Alternatively, one could define the exposure repeatedly over time: complied with prescribed treatment on day 1 (y/n); complied with prescribed treatment on day 2 (y/n); complied with prescribed treatment on day 3 (y/n).
Now we have many possible causal effects of interest. We could compare antibiotics on all 3 days with placebo on all 3 days, or antibiotics on 1 of 3 days with placebo on 2 of 3 days, and so on. Some of these contrasts are likely not of interest. Further, if there are too many of them we might need to make additional structural assumptions in order to identify the parameters (for example, we might impose an assumption of a linear dose effect).
One could imagine a full randomized trial here, but that would involve the experimenter controlling on which day people took which pill (so maybe it is not so easy to imagine). A more feasible design is to randomize the prescription (placebo or antibiotic) and encourage subjects to comply with the recommended use. Thus, treatment offered is randomly assigned but treatment received is not. How would one obtain information about how many pills were taken? There are devices for pill bottles that let investigators know the times the bottle was opened, but that does not guarantee that the person took the pills then or at all. You could ask the subjects how many pills they took, but this is not a guarantee of accuracy. So, the exposure will likely be measured with some error. We can hope that the measurement error distribution is the same in the two treatment arms, but measurement error still can lead to attenuated effects.
The larger challenge here has to do with confounding. How do people who take their medication differ from people who do not? If all of the confounding is pre-treatment assignment (i.e., variables like sex, age, pneumonia severity at the time of diagnosis), then we can control for these variables and use standard longitudinal data analysis methods. If, however, there are post-treatment-assignment confounders, things get more complicated. Imagine, for example, that people who are asymptomatic on day 3 are less likely to take their medication on day 3 than are people who are symptomatic on day 3. Further, assume that your compliance with treatment on the first 2 days affected whether you were symptomatic on day 3. One could imagine scenarios where taking the better drug on all 3 days looks worse than taking the less effective drug on the first 2 days only. The problem here is that there exist time-varying variables that are both outcomes of treatment and confounders.
To estimate the causal effects of interest, you would need to collect the time-varying confounding information, and then adjust for it, without adjusting away the effect of treatment (there are statistical methods that can handle this problem).
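The day-3 scenario can be sketched as a small simulation. The numbers and names below are all hypothetical, and to keep it short I assume everyone was prescribed the antibiotic and that the day 1 and day 2 pills are taken at random (so only the day-3 pill is confounded). The sketch shows the two halves of the problem: ignoring day-3 symptoms makes the day-3 pill look harmful, while conditioning on day-3 symptoms wipes out the real benefit of the early pills.

```python
import random


def simulate(n=200_000, seed=1):
    """Toy 3-day antibiotic course with a time-varying confounder (day-3 symptoms)."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        early = sum(rng.random() < 0.8 for _ in range(2))      # pills taken on days 1-2
        symptomatic3 = rng.random() < 0.9 - 0.2 * early         # early pills reduce day-3 symptoms
        took3 = rng.random() < (0.9 if symptomatic3 else 0.3)   # feeling fine -> often skip day 3
        if symptomatic3:
            p_recover = 0.4 + (0.2 if took3 else 0.0)           # day-3 pill helps those still sick
        else:
            p_recover = 0.95
        recovered = rng.random() < p_recover                    # symptom free at one week
        people.append((early, symptomatic3, took3, recovered))
    return people


def recovery_rate(people, keep):
    """Share recovered among the people selected by the predicate `keep`."""
    kept = [p for p in people if keep(p)]
    return sum(p[3] for p in kept) / len(kept)


def contrasts(people):
    return {
        # Day-3 pill, ignoring day-3 symptoms: confounded, comes out negative.
        "day3_naive": recovery_rate(people, lambda p: p[2])
                      - recovery_rate(people, lambda p: not p[2]),
        # Day-3 pill among those still symptomatic: about +0.2 by construction.
        "day3_symptomatic": recovery_rate(people, lambda p: p[1] and p[2])
                            - recovery_rate(people, lambda p: p[1] and not p[2]),
        # Early pills (2 vs 0), unadjusted: positive total effect.
        "early_total": recovery_rate(people, lambda p: p[0] == 2)
                       - recovery_rate(people, lambda p: p[0] == 0),
        # Early pills after conditioning on day-3 symptoms: the effect vanishes.
        "early_given_symptomatic": recovery_rate(people, lambda p: p[0] == 2 and p[1])
                                   - recovery_rate(people, lambda p: p[0] == 0 and p[1]),
    }


for name, value in contrasts(simulate()).items():
    print(f"{name:>24}: {value:+.3f}")
```

Day-3 symptoms are a confounder for the day-3 pill and, at the same time, an outcome of the earlier pills, so neither "always adjust" nor "never adjust" gives the right answer for the whole treatment sequence.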
1.3 Diet: now things get ugly
Medications are easy when it comes to causal thinking, relative to the kinds of exposures that we are often interested in. Consider dietary research as an alternative. People would love to know which diets would increase their chances of losing weight, living longer, preventing diseases, etc. Yet, it seems like science-informed dietary advice changes almost as much as clothing fashion. Why?
There are many challenges here. What kind of causal contrast are we interested in? Defining the population might not be hard. Defining the outcome isn’t necessarily hard (it could be something like weight change over a given period of time, or time until death). But how to define the exposure? In section 1.2, we saw that, even for the simple case of 3 pills, the number of possible exposures (if we considered all possible times that the person could take the pills) would be too large. Diet is much more complicated than that.
One thing you could do is randomize people to one of several groups, where each group is encouraged to eat a specific type of diet. The problem here is that compliance will likely be poor, unless there are strong incentives with strong monitoring. Still, this is probably the best approach, and we have learned some things (especially about short-term outcomes) from studies like this.
Observational studies are extremely challenging. One person might eat a bowl of cereal (what kind? how much? what kind of milk? how much?) for breakfast (what time?), a peanut butter sandwich for lunch (what type of bread? how many calories? how much peanut butter?), a candy bar for a snack (how much?), coffee throughout the day (how much? what type? how much cream? what type? how much sugar or artificial sweetener?) and a dinner (you get the idea). Obviously, we need to simplify. We need to pick some aspects of diet to focus on. It could be total calories, or total fat, or total carbs, or something else. Any of those features will likely be measured with a lot of error. Further, controlling for a full set of confounding variables might not be possible (what people eat is likely more dependent on an extremely complicated history of factors than whether or not someone complies with their doctor’s advice about a medication).
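As a rough illustration of what measurement error alone can do, here is a sketch with a standardized exposure (think of it as some summary of diet), an outcome with a true slope of 1.0, and a self-reported version of the exposure with independent random error added. The variable names and the numbers are assumptions, and real dietary error is unlikely to be this well behaved. Under this classical error model the fitted slope shrinks by the factor Var(X) / (Var(X) + Var(error)).

```python
import random


def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def simulate(n=100_000, error_sd=1.0, seed=2):
    """Return (slope using the true exposure, slope using the noisy report)."""
    rng = random.Random(seed)
    true_x = [rng.gauss(0, 1) for _ in range(n)]               # true exposure
    y = [1.0 * x + rng.gauss(0, 1) for x in true_x]            # true slope is 1.0
    reported_x = [x + rng.gauss(0, error_sd) for x in true_x]  # noisy self-report
    return slope(true_x, y), slope(reported_x, y)


slope_true, slope_reported = simulate()
print("slope with true exposure:    ", round(slope_true, 3))
print("slope with reported exposure:", round(slope_reported, 3))
```

With the error variance equal to the exposure variance, about half of the effect disappears, and that is before any confounding is considered.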
The best studies tend to compare specific types of diets (such as low carb versus low fat). These studies are very difficult to carry out, and have a lot of limitations. But if successful, we can at least have some information about how one type of diet tends to compare to another on specific (typically) short term outcomes.
1.4 Why studying the effects of parenting is even harder
‘Parenting,’ as something whose effects we are interested in, has many features similar to those of diet. However, I think studying the effects of parenting is even more difficult.
‘Parenting’ as the exposure of interest
What constitutes parenting is very complex. With diet, at least it is clear what and when you are doing it (eating and drinking). With parenting, it is much more subtle. All of these are (potentially) examples of parenting: What you do during pregnancy (diet; stress levels; exposure to toxins; etc); who your friends are; where you choose to live; how often you are home with your kids; what you try to encourage; how you try to encourage it; how you treat other people in front of your kids; how much work you require that they do; how you look at them; how often you touch them; what you feed them; what you discuss with them; how open you are with them; how often you lose your temper; what you do when you lose your temper; how you dress them; what you do to help ensure they get enough sleep; how assertive you are with them; the degree to which you apologize or make excuses for them; how often you exercise; how the people in your life interact with your kids; if you have more than one child, how you treat them (including relative to each other), what you do to try and get them to be close to each other, etc.
Parenting is practically 24 hours a day. Almost all parents vary on most of these dimensions at different times during their kids’ lives (and often within the same day). How would we go about studying it? Again, if we are interested in causality, we would really need to narrow the question dramatically. The easiest thing would be to focus on a particular parenting strategy that one could learn with some instruction. Some parents could try the new strategy while others serve as the control group. Of course, there would be many problems with a study like that, but we would likely learn something (about a specific parenting strategy for a particular situation on a particular outcome).
Another possible study design is to survey parents about how often they do various things (e.g., hit their kids; yell at their kids; read to their kids; etc). Such surveys rely on self-report, and are probably not especially reliable. More importantly, these parenting behaviors are not randomly assigned, and will likely be confounded, even after controlling for a lot of variables. To get more reliable measures of parental behavior, parents could be monitored. However, this is an expensive type of study, and the monitoring likely affects behavior. Also, reducing measurement error in the exposure does not solve the confounding problem.
I think the best we can hope for (right now) is to learn about the effectiveness of interventions that encourage certain parenting behavior. The questions really need to be quite narrow. Even with that, there are potentially problems with the outcomes.
What outcomes do parents hope to influence?
Some outcomes are not especially difficult to measure, such as the income of your offspring when they are adults or the score on some intelligence test. However, many of the things that parents hope to influence are extremely difficult to measure. For example, many parents hope that their kids will be polite and respectful. Some parents want their kids to feel appreciative and not entitled. Parents might hope that their children are very inclusive, and never engage in any type of bullying (whether direct or covert). How would one measure these things? Researchers do attempt to quantify these types of variables (often by adding items from a questionnaire), but it is a rough approximation at best.
Just like parenting is an exposure that is high dimensional and highly variable, even within a person, the range of things that parents hope to affect is large as well. It would be important to link the specific exposures to the specific outcomes, if one wishes to make claims about the effectiveness of parenting.
2 Generality and confidence
You might assume that people would be the most confident in evidence that came from studies of the type in Section 1.1, and the least confident when it comes to making broad conclusions about things that are difficult to study (like diet and parenting). Unfortunately, I often read extremely confident, far-reaching claims related to behavioral genetics. For example:
“[...]personality resemblances between biological relatives are due almost entirely to heredity, rather than environment.” -Judith Rich Harris
“[...]parenting in general (across the broad range that constitutes “normal” parenting) has no impact on how children turn out.” -JayMan
“With a few exceptions, the effect of parenting on adult outcomes ranges from small to zero.” -Bryan Caplan
“A handy summary of the three laws is this: Genes 50 percent, Shared Environment 0 percent, Unique Environment 50 percent” -Steven Pinker (The Blank Slate)
Notice that these statements are very general and display extreme confidence. For example, 2 of the above quotes refer to effects of parenting (not comparing 1 type of parenting style with another, but ALL (normal) parenting styles). Also, notice the confidence. Judith Rich Harris just tells us that it’s true. It’s a fact. JayMan says “has no impact” (none; without a doubt; this question has been answered). Bryan Caplan at least says “with a few exceptions,” but his statement also displays a confidence that few people could get away with even for quality studies of the type in Section 1.1 above. Finally, Steven Pinker puts ‘shared environment’ (as if we can measure that) at 0 percent. It is very common in behavioral genetics circles to quote attributable percentages (e.g., “it’s 50% genetic”) with confidence, as if we know what percent of something is from genes (and that’s assuming it’s even a reasonable question).
3 How do we learn about the effect of genes on behavior?
To a large degree, the answer is heritability studies (such as twin and adoption studies), which are laden with problems.
Like ‘biological’ and ‘genetic,’ the term ‘heritability’ can have many possible meanings. In an abstract sense, we inherit everything (genes, environment, gene-environment interaction, and so on). However, in this post I will refer to heritability in the technical sense. Wikipedia does a nice job of explaining it (including fairly accurate caveats): “The heritability of a trait within a population is the proportion of observable differences in a trait between individuals within a population that is due to genetic differences.” Here, ‘genetic’ refers to variations in DNA sequences on coding regions, not gene expression. A key point to realize is that heritability (which is a number between 0 and 1) depends on the population in question: “heritability can change without any genetic change occurring (e.g. when the environment starts contributing to more variation).” If I study a population that has a lot of variation in environmental features that are relevant to the emergence or degree of the trait of interest, then heritability will be smaller than in a population with a more stable environment. Thus, for heritability to be meaningful, we need to have an understanding of the underlying population about which we are making heritability statements. “What proportion of this trait is genetic?” “It depends on the environment.”
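A small worked example may help with the "it depends on the environment" point. The sketch below assumes the simplest possible model, in which the phenotype P is a sum of a genetic part G and an environmental part E that are uncorrelated with each other and do not interact. That assumption is mine, is made only for the arithmetic, and is exactly the kind of assumption questioned later in this post.

```latex
\begin{aligned}
P &= G + E, \qquad \operatorname{Cov}(G, E) = 0
  \;\Longrightarrow\; \operatorname{Var}(P) = \operatorname{Var}(G) + \operatorname{Var}(E), \\[4pt]
H^2 &= \frac{\operatorname{Var}(G)}{\operatorname{Var}(G) + \operatorname{Var}(E)}, \\[4pt]
\text{Population A: } & \operatorname{Var}(G) = 1,\ \operatorname{Var}(E) = 1
  \;\Longrightarrow\; H^2 = \tfrac{1}{1 + 1} = 0.50, \\
\text{Population B: } & \operatorname{Var}(G) = 1,\ \operatorname{Var}(E) = 3
  \;\Longrightarrow\; H^2 = \tfrac{1}{1 + 3} = 0.25 .
\end{aligned}
```

The genetic variance is identical in the two hypothetical populations. Only the spread of environments differs, and heritability is cut in half.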
Ideally, how would one design a heritability study?
Think of genotype and environment as two factors. What we would like to do is randomly and independently vary the two and record outcomes. Then, at a given environmental condition, we could see how much variation in genes affects the outcome, and vice versa. Almost certainly there would be gene-environment interaction, in the sense that holding the environment fixed in one state and varying genotype might have different outcomes (beyond additive) than if we held environment at a different state and varied genotype.
You cannot (at least, we can’t right now) vary genes and environment independently. Here are a few reasons:
1. The more genetically related two people are, the more likely they are to have similar physical features: height, skinniness, hair color, attractiveness etc. As a result, other people will likely interact more similarly with two close genetic relatives than with two more distant relatives, on average. Thus, there will be correlations in parenting, even if the parents themselves were randomly assigned (e.g., if MZ twins were separated at birth and randomly assigned to different parents). Based on that alone, we’d expect MZ twins to have more similar outcomes than DZ twins, even if genes had no direct effect on the outcomes. Further, if genes do directly affect behavior, that will again likely lead to more similar parenting styles the more genetically related the kids are. Suppose, for example, there was a strong genetic component to some personality type, like shyness or aggressiveness. It is easy to imagine that shyness or aggressiveness in a child might cause a parent to interact differently with them than they would if the kid had a different personality type. Essentially, genes recruit the environment. Or at least it seems likely that they do. Thus, unless children are raised by AI that were programmed to treat all children exactly the same, there is just no way to make the environment independent from genes.
2. Genes generally don’t have one single function independent from environment. For example, promoters take cues from the environment, and determine to what degree a gene expresses. As I discussed in the previous paragraph, genes recruit the environment. Thus, not only would it be extremely difficult to make environment independent from genes, but the varying environments affect gene expression (gene-environment interaction). Much of the role of epigenetics is probably happening during early development — making it likely that MZ twins look more ‘genetically’ alike than DZ twins in twin studies, or adopted kids more like their birth parents in adoption studies (because gene-environment interaction took place before separation).
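Point 1 can be sketched as a toy simulation. Suppose genes affect only appearance, appearance affects how a child is treated, and the outcome depends only on that treatment. Twins are reared apart by unrelated parents. To turn twin correlations into a heritability number I use Falconer's formula, h² = 2(r_MZ − r_DZ), a standard textbook estimator that is my choice here and is not taken from any particular study discussed in this post. All effect sizes and names are made up.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def twin_correlation(genetic_share, n_pairs=50_000, seed=3):
    """Outcome correlation for twins reared apart.

    genetic_share is the twins' correlation on a heritable appearance trait:
    1.0 for MZ pairs, 0.5 for DZ pairs.
    """
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n_pairs):
        common = rng.gauss(0, 1)
        pair = []
        for _twin in range(2):
            looks = (math.sqrt(genetic_share) * common
                     + math.sqrt(1 - genetic_share) * rng.gauss(0, 1))
            treatment = looks + rng.gauss(0, 0.5)    # how others treat the child
            outcome = treatment + rng.gauss(0, 0.5)  # depends on treatment only
            pair.append(outcome)
        first.append(pair[0])
        second.append(pair[1])
    return pearson(first, second)


r_mz = twin_correlation(1.0, seed=3)
r_dz = twin_correlation(0.5, seed=4)
falconer_h2 = 2 * (r_mz - r_dz)
print("r_MZ:", round(r_mz, 3), " r_DZ:", round(r_dz, 3),
      " Falconer h2:", round(falconer_h2, 3))
```

In this toy world the twin design reports substantial heritability even though every bit of the effect on the outcome passes through how the child is treated, so changing the treatment would change the outcome.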
So what do people actually do?
Rather than answering truly causal questions, behavioral genetics researchers instead tend to estimate the proportion of variation of outcomes that is due to variation in genes (and possibly variation in shared environment and unique environment) and then infer causality from that. Consider, for example, twins raised apart. They share either 100% or 50% of their genes, but (supposedly) have different environments. As mentioned above, they do not actually have truly different environments (they spent their first 9 months together; genes recruit environment). But let’s imagine that we truly can separate genes from environment. These types of studies (twin; adoption; and others) focus on estimation of variation of outcomes between and within pairs. The idea is that genetically identical people should have less variation in outcomes due to genes. How much less tells us about the role of genes. If we estimate that 50% of an outcome is explained by genes and 0% is explained by shared environment, then it appears that parenting doesn’t matter (although it’s hard to argue that parents do not affect unique environment as well).
Notice, however, that what parents actually do is ignored! We have pretty much no idea of the ways in which parenting varied in these studies. Imagine, for simplicity, that there exist just 26 unique parenting styles: a, b, c,…,z. If what parents do does not matter, then, on average, Y[Z=a] is approximately Y[Z=b] … is approximately Y[Z=z] (again, Y is the outcome, and Z is the parenting style variable, which we set to whatever we want). In reality there are far more parenting styles. Yet, we made no actual comparisons between any of these. Instead, we just figured “well, I’m sure parenting varies in this population, and if parenting matters, outcomes should vary accordingly.” Suppose, for example, the optimal parenting style is Z=k. Perhaps only a small percentage of the population does Z=k, but more would if they knew it was optimal. If parenting matters, some parenting styles will be better than average and some will be worse (it has to be this way). So, what do we actually learn from these variability studies? I’m not sure. Are you?
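Here is a sketch of the Z=k scenario, with invented numbers. Only 2% of parents use style k, style k raises the (standardized) outcome by a full standard deviation, and the other 25 styles are equivalent to one another. The code compares the share of outcome variance associated with parenting style to the contrast between style k and style a.

```python
import random
import string


def simulate(n=200_000, seed=5):
    """Return (style, outcome) pairs; only the rare style 'k' changes the outcome."""
    rng = random.Random(seed)
    other_styles = [s for s in string.ascii_lowercase if s != "k"]
    data = []
    for _ in range(n):
        style = "k" if rng.random() < 0.02 else rng.choice(other_styles)
        effect = 1.0 if style == "k" else 0.0
        data.append((style, effect + rng.gauss(0, 1)))
    return data


def by_style(data):
    groups = {}
    for style, y in data:
        groups.setdefault(style, []).append(y)
    return groups


def variance_share(data):
    """Between-style sum of squares divided by total sum of squares."""
    grand_mean = sum(y for _, y in data) / len(data)
    total = sum((y - grand_mean) ** 2 for _, y in data)
    between = sum(len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2
                  for ys in by_style(data).values())
    return between / total


def contrast(data, style_a, style_b):
    """Difference in mean outcome between two parenting styles."""
    groups = by_style(data)
    return (sum(groups[style_a]) / len(groups[style_a])
            - sum(groups[style_b]) / len(groups[style_b]))


data = simulate()
print("share of variance tied to parenting style:", round(variance_share(data), 3))
print("mean outcome, style k minus style a:      ", round(contrast(data, "k", "a"), 3))
```

Parenting style accounts for only about 2% of the variance here, which a variance decomposition would round to "doesn't matter", yet a parent who switched to style k would shift their child's expected outcome by about one standard deviation. Styles are assigned at random in this sketch, so the contrast is causal by construction; in real data it would not be.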
Parenting has a strong social component. People tend to conform (I suspect). So, maybe parenting does matter, and most people are doing it right. Or, maybe it does matter, and most are doing it wrong. Or maybe it doesn’t matter. Or maybe they are doing well for some outcomes and poorly for others. We don’t know.
I don’t doubt that behaviors have genetic influence. I also think these heritability studies can be useful. In many cases I think they can give us an idea about which phenotypes are most canalized (for the environments we investigate). My objections have to do with how the results from these studies are interpreted. My objections are: (1) none of the human heritability studies that I am aware of come close to separating genetic effects from environmental effects (assuming that this is even a meaningful distinction, considering that there is (probably) always gene-environment interaction), so we should not make statements about the percentage of variation that is due to genes; (2) for environmental features like parenting, causal statements should be avoided completely unless specific parenting features are investigated using appropriate causal methods; (3) even though many people are aware of some of the limitations of these studies, that doesn’t prevent them from making extreme claims with extreme confidence.
A brief summary of some of my criticisms:
- you cannot test a super global null hypothesis like ‘parenting doesn’t matter’ without actually measuring parenting;
- early development is when (probably) most gene-environment interaction happens, and popular heritability study designs cannot account for that;
- genes recruit environment (parenting is partially an outcome of genes);
- parenting is really hard to study, so one should be highly suspicious of claims about its lack of impact.
 I’m focused here on population average causal effects. The basic ideas hold for other types of causal effects.
 The assumption that treatment is randomly assigned given covariates X is often called the ignorability assumption. Z being randomized given X means that it is independent of the potential outcomes Y[Z=1] and Y[Z=0] within every level of the set of covariates X.
 It is important to note that I am not talking about two correlated exposures that were measured at the same time, and affect an outcome later on. What we want is something like this:
Even this is greatly simplified. In reality, we’d have genes affecting expression, both affecting parenting, possibly all of them affecting early version of the outcome, early versions of the outcome affecting gene expression and parenting, etc. Thus, we see that parenting is, in part, an outcome of genes. It is also a confounder. So to properly estimate causal effects of genes and parenting, we would need to use causal inference methods that can accommodate this situation.
 I call it ‘super global’ because neither the exposure (the specific parenting styles) nor the outcome nor the population are mentioned. But even when the outcome and population are described, it doesn’t get around the fact that you cannot test the null of no parenting effect if you don’t measure parenting.