A Replication of Karlan and List (2007)

Author

Jerry Wu

Published

April 23, 2025

Introduction

Dean Karlan at Yale and John List at the University of Chicago conducted a field experiment to test the effectiveness of different fundraising letters. They sent out 50,000 fundraising letters to potential donors, randomly assigning each letter to one of three treatments: a standard letter, a matching grant letter, or a challenge grant letter. They published the results of this experiment in the American Economic Review in 2007. The article and supporting data are available from the AEA website and from Innovations for Poverty Action as part of Harvard’s Dataverse.

The study was a natural field experiment involving 50,083 past donors to a U.S.-based civil liberties nonprofit. Participants were randomly assigned to either a control group (receiving a standard fundraising appeal) or a treatment group (receiving a letter mentioning a matching grant offer). Within the treatment group, participants were further randomly assigned to sub-treatments varying the matching ratio ($1:$1, $2:$1, $3:$1), the maximum match amount ($25,000, $50,000, $100,000, or unstated), and the suggested donation (“ask amount”) (equal to prior gift, 1.25×, or 1.5×). The experiment tested whether these pricing manipulations influenced donor behavior. While offering any match increased response rates and revenue per solicitation, larger match ratios did not produce statistically significant differences in giving. The study also explored how effects varied by geography and found greater responsiveness in “red” states (which had voted for George W. Bush in 2004). This nuanced field experiment contributed robust evidence to the demand-side economics of charitable giving.

This project seeks to replicate their results.

Data

Description

import pandas as pd
df = pd.read_stata('karlan_list_2007.dta')
df.describe()
treatment control ratio2 ratio3 size25 size50 size100 sizeno askd1 askd2 ... redcty bluecty pwhite pblack page18_39 ave_hh_sz median_hhincome powner psch_atlstba pop_propurban
count 50083.000000 50083.000000 50083.000000 50083.000000 50083.000000 50083.000000 50083.000000 50083.000000 50083.000000 50083.000000 ... 49978.000000 49978.000000 48217.000000 48047.000000 48217.000000 48221.000000 48209.000000 48214.000000 48215.000000 48217.000000
mean 0.666813 0.333187 0.222311 0.222211 0.166723 0.166623 0.166723 0.166743 0.222311 0.222291 ... 0.510245 0.488715 0.819599 0.086710 0.321694 2.429012 54815.700533 0.669418 0.391661 0.871968
std 0.471357 0.471357 0.415803 0.415736 0.372732 0.372643 0.372732 0.372750 0.415803 0.415790 ... 0.499900 0.499878 0.168561 0.135868 0.103039 0.378115 22027.316665 0.193405 0.186599 0.258654
min 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.009418 0.000000 0.000000 0.000000 5000.000000 0.000000 0.000000 0.000000
25% 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.755845 0.014729 0.258311 2.210000 39181.000000 0.560222 0.235647 0.884929
50% 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 1.000000 0.000000 0.872797 0.036554 0.305534 2.440000 50673.000000 0.712296 0.373744 1.000000
75% 1.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 1.000000 1.000000 0.938827 0.090882 0.369132 2.660000 66005.000000 0.816798 0.530036 1.000000
max 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 ... 1.000000 1.000000 1.000000 0.989622 0.997544 5.270000 200001.000000 1.000000 1.000000 1.000000

8 rows × 48 columns

Variable Description
treatment Treatment
control Control
ratio Match ratio
ratio2 2:1 match ratio
ratio3 3:1 match ratio
size Match threshold
size25 $25,000 match threshold
size50 $50,000 match threshold
size100 $100,000 match threshold
sizeno Unstated match threshold
ask Suggested donation amount
askd1 Suggested donation was highest previous contribution
askd2 Suggested donation was 1.25 x highest previous contribution
askd3 Suggested donation was 1.50 x highest previous contribution
ask1 Highest previous contribution (for suggestion)
ask2 1.25 x highest previous contribution (for suggestion)
ask3 1.50 x highest previous contribution (for suggestion)
amount Dollars given
gave Gave anything
amountchange Change in amount given
hpa Highest previous contribution
ltmedmra Small prior donor: last gift was less than median $35
freq Number of prior donations
years Number of years since initial donation
year5 At least 5 years since initial donation
mrm2 Number of months since last donation
dormant Already donated in 2005
female Female
couple Couple
state50one State tag: 1 for one observation of each of 50 states; 0 otherwise
nonlit Nonlitigation
cases Court cases from state in 2004-5 in which organization was involved
statecnt Percent of sample from state
stateresponse Proportion of sample from the state who gave
stateresponset Proportion of treated sample from the state who gave
stateresponsec Proportion of control sample from the state who gave
stateresponsetminc stateresponset - stateresponsec
perbush State vote share for Bush
close25 State vote share for Bush between 47.5% and 52.5%
red0 Red state
blue0 Blue state
redcty Red county
bluecty Blue county
pwhite Proportion white within zip code
pblack Proportion black within zip code
page18_39 Proportion age 18-39 within zip code
ave_hh_sz Average household size within zip code
median_hhincome Median household income within zip code
powner Proportion house owner within zip code
psch_atlstba Proportion who finished college within zip code
pop_propurban Proportion of population urban within zip code

Balance Test

As an ad hoc test of the randomization mechanism, I provide a series of tests that compare aspects of the treatment and control groups to assess whether they are statistically significantly different from one another.

import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression
from scipy import stats

test_vars = ['mrm2', 'couple', 'female', 'ave_hh_sz']
results = {}

for var in test_vars:
    df_clean = df[['treatment', 'control', var]].dropna()

    treatment_group = df_clean[df_clean['treatment'] == 1][var]
    control_group = df_clean[df_clean['control'] == 1][var]

    # Manual t-test
    mean_diff = treatment_group.mean() - control_group.mean()
    n1, n2 = len(treatment_group), len(control_group)
    var1, var2 = treatment_group.var(ddof=1), control_group.var(ddof=1)
    se = np.sqrt(var1 / n1 + var2 / n2)
    t_stat = mean_diff / se
    df_denom = (var1 / n1 + var2 / n2) ** 2
    df_num = (var1**2) / (n1**2 * (n1 - 1)) + (var2**2) / (n2**2 * (n2 - 1))
    df_ttest = df_denom / df_num
    p_value_ttest = 2 * (1 - stats.t.cdf(np.abs(t_stat), df_ttest))
    # Linear regression
    X = sm.add_constant(df_clean['treatment'])
    y = df_clean[var]
    model = sm.OLS(y, X).fit()
    coef = model.params['treatment']
    p_value_reg = model.pvalues['treatment']
    print("================================================")
    print(f'{var} Analysis: \n')
    print(f'{var} Treatment mean: {treatment_group.mean()}')
    print(f'{var} Control mean: {control_group.mean()}')
    print(f'{var} All Mean: {df_clean[var].mean()}')
    print('________________________________________________')
    print('t-test: \n')
    print(f't-statistic: {t_stat}')
    print(f'p-value: {p_value_ttest}')
    print('________________________________________________')
    print('Linear Regression: \n')
    print(f'Coefficient on Treatment: {coef}')
    print(f'p-value: {p_value_reg}\n')
================================================
mrm2 Analysis: 

mrm2 Treatment mean: 13.011828117981734
mrm2 Control mean: 12.99814226643495
mrm2 All Mean: 13.00726808034823
________________________________________________
t-test: 

t-statistic: 0.1195315522817725
p-value: 0.9048549631450833
________________________________________________
Linear Regression: 

Coefficient on Treatment: 0.013685851546779986
p-value: 0.904885973177816

================================================
couple Analysis: 

couple Treatment mean: 0.09135794896957802
couple Control mean: 0.0929748269737245
couple All Mean: 0.0918974149381833
________________________________________________
t-test: 

t-statistic: -0.5822577486767693
p-value: 0.5603971270058028
________________________________________________
Linear Regression: 

Coefficient on Treatment: -0.0016168780041463048
p-value: 0.5593646446996638

================================================
female Analysis: 

female Treatment mean: 0.2751509208469954
female Control mean: 0.2826978395250627
female All Mean: 0.27766887200849466
________________________________________________
t-test: 

t-statistic: -1.7535132542519636
p-value: 0.07952338672686232
________________________________________________
Linear Regression: 

Coefficient on Treatment: -0.007546918678066679
p-value: 0.07869095826986866

================================================
ave_hh_sz Analysis: 

ave_hh_sz Treatment mean: 2.4300146102905273
ave_hh_sz Control mean: 2.427002429962158
ave_hh_sz All Mean: 2.4290122985839844
________________________________________________
t-test: 

t-statistic: 0.8233500123023987
p-value: 0.4103151242417935
________________________________________________
Linear Regression: 

Coefficient on Treatment: 0.003012174284715988
p-value: 0.409801160289328

To assess the randomization, I tested several baseline variables (e.g., months since last donation, gender, couple status, average household size within zip) using both t-tests and linear regressions and in every case the results from the two methods were nearly identical. None of the variables showed statistically significant differences at the 95% level, confirming balance between treatment and control groups. This supports the validity of the randomization and mirrors the role of Table 1 in the paper, which demonstrates baseline equivalence.

Experimental Results

Charitable Contribution Made

First, I analyze whether matched donations lead to an increased response rate of making a donation.

import matplotlib.pyplot as plt
df_bar = df[['treatment', 'control', 'gave']].dropna()

df_bar['group'] = df_bar.apply(lambda row: 'Treatment' if row['treatment'] == 1 else 'Control', axis=1)

donation_rates = df_bar.groupby('group')['gave'].mean()

plt.figure(figsize=(6, 5))
ax = donation_rates.plot(kind='bar')

for i, value in enumerate(donation_rates):
    ax.text(i, value + 0.0001, f'{value:.3f}', ha='center', va='bottom')
plt.ylabel('Proportion Donated')
plt.title('Donation Rate by Group')
plt.ylim(0, 0.03)
plt.xticks(rotation=0)
plt.tight_layout()
plt.show()

The bar plot compares donation rates between the treatment and control groups. The treatment group, which received a matching donation offer, had a higher donation rate (2.2%) than the control group (1.8%). This visual evidence suggests that the presence of a matching grant increased the likelihood of donating, consistent with the main findings in the paper.

import numpy as np
import statsmodels.api as sm
from scipy import stats

df_binary = df[['treatment', 'control', 'gave']].dropna()

treatment_group = df_binary[df_binary['treatment'] == 1]['gave']
control_group = df_binary[df_binary['control'] == 1]['gave']

# Manual t-test calculation
mean_diff = treatment_group.mean() - control_group.mean()
n1, n2 = len(treatment_group), len(control_group)
var1, var2 = treatment_group.var(ddof=1), control_group.var(ddof=1)
se = np.sqrt(var1 / n1 + var2 / n2)
t_stat = mean_diff / se
df_denom = (var1 / n1 + var2 / n2) ** 2
df_num = (var1**2) / (n1**2 * (n1 - 1)) + (var2**2) / (n2**2 * (n2 - 1))
df_ttest = df_denom / df_num
p_value_ttest = 2 * (1 - stats.t.cdf(np.abs(t_stat), df_ttest))

# Linear regression:
X = sm.add_constant(df_binary['treatment'])
y = df_binary['gave']
model = sm.OLS(y, X).fit()
coef = model.params['treatment']
p_value_reg = model.pvalues['treatment']

print("================================================")
print(f' \'gave\' Analysis: \n')
print(f'\'gave\' Treatment mean: {treatment_group.mean()}')
print(f'\'gave\' Control mean: {control_group.mean()}')
print(f'Mean Difference: {mean_diff}')
print(f"'gave' All mean: {df_binary['gave'].mean()}")
print('________________________________________________')
print('t-test: \n')
print(f't-statistic: {t_stat}')
print(f'p-value: {p_value_ttest}')
print('________________________________________________')
print('Linear Regression: \n')
print(f'Coefficient on Treatment: {coef}')
print(f'p-value: {p_value_reg}')
================================================
 'gave' Analysis: 

'gave' Treatment mean: 0.02203856749311295
'gave' Control mean: 0.017858212980164198
Mean Difference: 0.00418035451294875
'gave' All mean: 0.020645728091368328
________________________________________________
t-test: 

t-statistic: 3.2094621908279835
p-value: 0.001330982345091547
________________________________________________
Linear Regression: 

Coefficient on Treatment: 0.004180354512949377
p-value: 0.001927402594901797

To test whether matched donations increase giving, I compared donation rates between the treatment and control groups using a t-test and a bivariate regression. The treatment group had a slightly higher donation rate (2.2% vs. 1.8%), and the difference was statistically significant in both tests. This matches results in Table 2A, Panel A of the original study and suggests that even a modest match offer can meaningfully boost donation rates. The finding highlights how small psychological nudges like matching gifts can influence charitable behavior.

import statsmodels.formula.api as smf

df_probit = df[['gave', 'treatment']].dropna()

probit_model = smf.probit('gave ~ treatment', data=df_probit).fit(disp=False)

probit_summary = probit_model.summary2().as_text()

marginal_effects = probit_model.get_margeff().summary().as_text()

print(marginal_effects)
       Probit Marginal Effects       
=====================================
Dep. Variable:                   gave
Method:                          dydx
At:                           overall
==============================================================================
                dy/dx    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
treatment      0.0043      0.001      3.104      0.002       0.002       0.007
==============================================================================

To replicate Table 3, Column 1 of Karlan and List (2007), I ran a probit regression with a binary outcome for donation and treatment assignment as the sole predictor. The marginal effect of treatment was 0.0043, closely matching the 0.004 reported in the paper. This confirms that the presence of a matching grant increased the probability of donating by roughly 0.4 percentage points, a statistically significant effect. While small in magnitude, the result reinforces the finding that subtle changes in perceived impact, such as matching gifts, can meaningfully influence donation behavior.

Differences between Match Rates

Next, I assess the effectiveness of different sizes of matched donations on the response rate.

from scipy.stats import ttest_ind
df_match = df[df['treatment'] == 1][['gave', 'ratio2', 'ratio3']].dropna()

# Create labels for ratio group (1:1, 2:1, 3:1)
def classify_ratio(row):
    if row['ratio2'] == 1:
        return '2:1'
    elif row['ratio3'] == 1:
        return '3:1'
    else:
        return '1:1'

df_match['match_ratio'] = df_match.apply(classify_ratio, axis=1)

# Pairwise t-tests between ratios
ratios = ['1:1', '2:1', '3:1']
pairwise_results = {}

for i in range(len(ratios)):
    for j in range(i + 1, len(ratios)):
        group1 = df_match[df_match['match_ratio'] == ratios[i]]['gave']
        group2 = df_match[df_match['match_ratio'] == ratios[j]]['gave']
        t_stat, p_value = ttest_ind(group1, group2, equal_var=False)
        print("================================================")
        print(f'{ratios[i]} vs {ratios[j]}\n')
        print(f't-statistic: {t_stat}')
        print(f'p-value: {p_value}\n')
================================================
1:1 vs 2:1

t-statistic: -0.965048975142932
p-value: 0.33453078237183076

================================================
1:1 vs 3:1

t-statistic: -1.0150174470156275
p-value: 0.31010856527625774

================================================
2:1 vs 3:1

t-statistic: -0.05011581369764474
p-value: 0.9600305476940865

To test whether the size of the match ratio influenced donation behavior, I conducted a series of pairwise t-tests comparing response rates between the 1:1, 2:1, and 3:1 match groups. None of the differences were statistically significant at the 95% level. For example, the difference between the 2:1 and 1:1 groups yielded a p-value of 0.33, and the difference between the 3:1 and 2:1 groups had a p-value of 0.96. These results support the authors’ statement in Table 2A and on page 8 of the paper: while match offers increase giving relative to no match, larger match ratios do not provide additional benefit in terms of increasing the likelihood of donating.

# Alternative: use ratio as a categorical variable
model2 = smf.ols('gave ~ ratio', data=df).fit()
model2_summary = model2.summary2().as_text()
print(model2_summary)
                  Results: Ordinary least squares
====================================================================
Model:              OLS              Adj. R-squared:     0.000      
Dependent Variable: gave             AIC:                -53252.8233
Date:               2025-04-23 16:43 BIC:                -53217.5376
No. Observations:   50083            Log-Likelihood:     26630.     
Df Model:           3                F-statistic:        3.665      
Df Residuals:       50079            Prob (F-statistic): 0.0118     
R-squared:          0.000            Scale:              0.020217   
----------------------------------------------------------------------
               Coef.    Std.Err.      t      P>|t|     [0.025   0.975]
----------------------------------------------------------------------
Intercept      0.0179     0.0011   16.2245   0.0000    0.0157   0.0200
ratio[T.1]     0.0029     0.0017    1.6615   0.0966   -0.0005   0.0063
ratio[T.2]     0.0048     0.0017    2.7445   0.0061    0.0014   0.0082
ratio[T.3]     0.0049     0.0017    2.8016   0.0051    0.0015   0.0083
--------------------------------------------------------------------
Omnibus:             59812.754     Durbin-Watson:        2.005      
Prob(Omnibus):       0.000         Jarque-Bera (JB):     4316693.217
Skew:                6.740         Prob(JB):             0.000      
Kurtosis:            46.438        Condition No.:        4          
====================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors
is correctly specified.

To test whether the match ratio affects donation behavior, I regressed the binary outcome gave on ratio as a categorical variable. Using the 1:1 match as the reference group, I found that the 2:1 and 3:1 match ratios had slightly higher donation rates, with coefficients of 0.0048 and 0.0049 respectively, both statistically significant at the 1% level. The 1:1 coefficient was smaller and not statistically significant. These results suggest that higher match ratios may slightly increase the likelihood of donating, although the effect is small in magnitude and inconsistent with earlier t-test results.

response_rates = df_match.groupby('match_ratio')['gave'].mean()

diff_2v1_direct = response_rates['2:1'] - response_rates['1:1']
diff_3v2_direct = response_rates['3:1'] - response_rates['2:1']

coef_2v1_reg = model2.params['ratio[T.2]'] - model2.params['ratio[T.1]']
coef_3v2_reg = model2.params['ratio[T.3]'] - model2.params['ratio[T.2]']

print("Direct from data: \n")
print(f"2:1 vs 1:1: {diff_2v1_direct}")
print(f"3:1 vs 2:1: {diff_3v2_direct}")
print("================================================")
print("From regression coefficients: \n")
print(f"2:1 vs 1:1: {coef_2v1_reg}")
print(f"3:1 vs 2:1: {coef_3v2_reg}")
Direct from data: 

2:1 vs 1:1: 0.0018842510217149944
3:1 vs 2:1: 0.00010002398025293902
================================================
From regression coefficients: 

2:1 vs 1:1: 0.0018842510217151158
3:1 vs 2:1: 0.00010002398025313504

To assess whether larger match ratios increase the likelihood of donating, I calculated the differences in response rates both directly from the data and from regression coefficients. The results were nearly identical across both methods:

  • The difference between 2:1 and 1:1 was about 0.19 percentage points.

  • The difference between 3:1 and 2:1 was effectively zero.

  • The difference between 3:1 and 1:1 was again about 0.20 percentage points.

These findings confirm that while moving from a 1:1 to a 2:1 or 3:1 match may result in a small increase in donation likelihood, the differences are minimal and statistically weak. This supports the paper’s conclusion that larger match ratios do not meaningfully improve response rates beyond the effect of having a match at all.

Size of Charitable Contribution

In this subsection, I analyze the effect of the size of matched donation on the size of the charitable contribution.

df_amount = df[['amount', 'treatment', 'control']].dropna()

treatment = df_amount[df_amount['treatment'] == 1]['amount']
control = df_amount[df_amount['control'] == 1]['amount']
t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)
print('T-test Results: ')
print('_______________________________')
print(f'T-statistic: {t_stat}\nP-Value: {p_value}')
T-test Results: 
_______________________________
T-statistic: 1.9182618934467577
P-Value: 0.055085665289183336

I conducted a t-test to compare average donation amounts between the treatment and control groups. The test produced a t-statistic of 1.92 and a p-value of 0.055, which is just above the conventional 5 percent significance threshold. This suggests a weak, but not statistically significant, indication that the treatment may have increased donation amounts. While the result hints at a possible effect, it is not strong enough to draw a firm conclusion about the impact of matched donations on how much people give.

df_positive = df[df['amount'] > 0]

treatment = df_positive[df_positive['treatment'] == 1]['amount']
control = df_positive[df_positive['control'] == 1]['amount']
t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)
print('T-test Results: ')
print('_______________________________')
print(f'T-statistic: {t_stat}\nP-Value: {p_value}')
T-test Results: 
_______________________________
T-statistic: -0.5846089794983359
P-Value: 0.5590471865673547

To analyze how much people donate conditional on giving, I restricted the data to respondents who made a donation and ran a t-test comparing donation amounts between treatment and control groups. The t-test produced a t-statistic of -0.58 and a p-value of 0.56, indicating no statistically significant difference in donation amounts. This suggests that while matched donations may influence whether someone gives, they do not affect how much donors give once they’ve decided to contribute. Because treatment was randomly assigned, the coefficient has a causal interpretation, but in this case, the effect size is negligible.

df_donated = df[df['amount'] > 0]

treatment_donors = df_donated[df_donated['treatment'] == 1]['amount']
control_donors = df_donated[df_donated['control'] == 1]['amount']

treatment_mean = treatment_donors.mean()
control_mean = control_donors.mean()

fig, axes = plt.subplots(1, 2, figsize=(12, 5), sharey=True)

# Treatment histogram
axes[0].hist(treatment_donors, bins=30, edgecolor='black')
axes[0].axvline(treatment_mean, color='red', linestyle='dashed', linewidth=2)
axes[0].set_title('Treatment Group')
axes[0].set_xlabel('Donation Amount')
axes[0].set_ylabel('Frequency')
axes[0].annotate(f'Mean = ${treatment_mean:.2f}', xy=(treatment_mean, 10),
                 xytext=(treatment_mean + 10, 20), arrowprops=dict(facecolor='red', arrowstyle='->'))

# Control histogram
axes[1].hist(control_donors, bins=30, edgecolor='black', color='orange')
axes[1].axvline(control_mean, color='red', linestyle='dashed', linewidth=2)
axes[1].set_title('Control Group')
axes[1].set_xlabel('Donation Amount')
axes[1].annotate(f'Mean = ${control_mean:.2f}', xy=(control_mean, 10),
                 xytext=(control_mean + 10, 20), arrowprops=dict(facecolor='red', arrowstyle='->'))

plt.tight_layout()
plt.show()

The histograms show the distribution of donation amounts among individuals who gave, separated by treatment group. Both distributions are right-skewed, with most donations concentrated at lower amounts. The red dashed lines mark the mean donation in each group: $43.87 for treatment and $45.54 for control. The similarity in means visually confirms earlier statistical results, indicating that while the presence of a match may influence whether someone donates, it does not significantly affect how much they give once they’ve decided to contribute.

Simulation Experiment

As a reminder of how the t-statistic “works,” in this section I use simulation to demonstrate the Law of Large Numbers and the Central Limit Theorem.

Suppose the true distribution of respondents who do not get a charitable donation match is Bernoulli with probability p=0.018 that a donation is made.

Further suppose that the true distribution of respondents who do get a charitable donation match of any size is Bernoulli with probability p=0.022 that a donation is made.

Law of Large Numbers

np.random.seed(42)

# Control group: Bernoulli(p=0.018), 100,000 draws
control_sim = np.random.binomial(n=1, p=0.018, size=100000)

# Treatment group: Bernoulli(p=0.022), 10,000 draws
treatment_sim = np.random.binomial(n=1, p=0.022, size=10000)

# diff_vector = treatment_sim - np.random.choice(control_sim, size=10000)
diff_vector = treatment_sim - control_sim[:10000]

cumulative_avg = np.cumsum(diff_vector) / np.arange(1, len(diff_vector) + 1)

true_diff = 0.022 - 0.018

plt.figure(figsize=(10, 5))
plt.plot(cumulative_avg, label='Cumulative Average of Differences')
plt.axhline(y=true_diff, color='red', linestyle='--', label=f'True Difference = {true_diff:.3f}')
plt.title('Law of Large Numbers: Cumulative Avg of Bernoulli Differences (Treatment - Control)')
plt.xlabel('Number of Simulated Samples')
plt.ylabel('Cumulative Average Difference')
plt.legend()
plt.tight_layout()
plt.show()

This plot illustrates the Law of Large Numbers using simulated donation data. I calculated the cumulative average difference in donation rates between treatment (2.2 percent) and control (1.8 percent) groups across 10,000 simulated comparisons. The blue line shows how the average difference stabilizes, while the red dashed line marks the true difference (0.004). As more samples accumulate, the cumulative average converges to the true value, confirming that larger samples yield more reliable estimates.

Central Limit Theorem

sample_sizes = [50, 200, 500, 1000]
simulations = 1000
p_control = 0.018
p_treatment = 0.022

diff_distributions = {}

np.random.seed(42)

for n in sample_sizes:
    diffs = []
    for _ in range(simulations):
        control_draw = np.random.binomial(1, p_control, n)
        treatment_draw = np.random.binomial(1, p_treatment, n)
        mean_diff = treatment_draw.mean() - control_draw.mean()
        diffs.append(mean_diff)
    diff_distributions[n] = diffs

fig, axes = plt.subplots(2, 2, figsize=(12, 8))
axes = axes.flatten()

for i, n in enumerate(sample_sizes):
    axes[i].hist(diff_distributions[n], bins=30, edgecolor='black', alpha=0.7)
    axes[i].axvline(0, color='black', linestyle='--', label='Zero')
    axes[i].set_title(f"Sample Size = {n}")
    axes[i].set_xlabel("Mean Difference (Treatment - Control)")
    axes[i].axvline(p_treatment - p_control, color='red', linestyle='--', label='True Difference')
    axes[i].set_ylabel("Frequency")
    axes[i].legend()

plt.tight_layout()
plt.show()

These histograms show the distribution of average differences in donation rates between treatment and control groups across 1,000 simulations at sample sizes of 50, 200, 500, and 1000. At smaller sizes, the distributions are wide and zero (red) is near the center, reflecting high uncertainty. As the sample size grows, the distributions narrow and more centered around the true difference (black) and zero (red) moves toward the tail, making it less likely. This demonstrates the Central Limit Theorem and shows that larger samples improve our ability to detect small treatment effects.