Max Planck Institute for Political and Social Science
Published
May 17, 2026
In the following, I demonstrate how to estimate multinomial logistic regressions on survey-weighted data and compute average marginal effects and predictive margins in R. The aim is to produce results that closely match the corresponding results in Stata. While R provides excellent tools for regression analysis, working with survey-weighted multinomial models and reporting results in terms of average marginal effects requires combining several packages in a way that can be difficult to figure out. I hope this walkthrough proves useful both for those who wish to estimate multinomial logistic regressions with survey data but do not have access to Stata, and for those who have previously used R only for data visualization and simpler models and now seek to extend their use of R to additional model types.
1 Data
I employ a simulated dataset comprising the categorical variables voting intention (vote), gender (gender), and perceived fairness of one’s received share of societal prosperity (fairshare), as well as the continuous variable age (age). Voting intention serves as the dependent variable, while the remaining characteristics function as explanatory variables.
Contains data from ../data/test_voting.dta
Observations: 1,586
Variables: 5 15 Feb 2026 16:11
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Variable Storage Display Value
name type format label Variable label
-----------------------------------------------------------------------------------------------------------------------------------------------------------
age float %11.0g alter Age
weight double %10.0g Weight
vote float %9.0g vote Vote intention
gender float %9.0g gender Gender
fairshare float %14.0g fairshare
Get fair share?
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Sorted by:
vote -- Vote intention
---------------------------------------------------------------
| Freq. Percent Valid Cum.
------------------+--------------------------------------------
Valid 1 SPD | 318 20.05 20.05 20.05
2 CDU/CSU | 280 17.65 17.65 37.70
3 AfD | 173 10.91 10.91 48.61
4 Greens | 594 37.45 37.45 86.07
5 Other | 221 13.93 13.93 100.00
Total | 1586 100.00 100.00
---------------------------------------------------------------
gender -- Gender
--------------------------------------------------------------
| Freq. Percent Valid Cum.
-----------------+--------------------------------------------
Valid 1 Male | 837 52.77 52.77 52.77
2 Female | 749 47.23 47.23 100.00
Total | 1586 100.00 100.00
--------------------------------------------------------------
fairshare -- Get fair share?
----------------------------------------------------------------------
| Freq. Percent Valid Cum.
-------------------------+--------------------------------------------
Valid 1 More than fair | 315 19.86 19.86 19.86
2 Fair | 899 56.68 56.68 76.54
3 Less than fair | 372 23.46 23.46 100.00
Total | 1586 100.00 100.00
----------------------------------------------------------------------
Age
-------------------------------------------------------------
Percentiles Smallest
1% 18 16
5% 25 16
10% 32 16 Obs 1,586
25% 44 16 Sum of wgt. 1,586
50% 59 Mean 56.3285
Largest Std. dev. 16.88207
75% 69 90
90% 78 91 Variance 285.0043
95% 82 95 Skewness -.3369928
99% 87 96 Kurtosis 2.436615
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
weight | 1,586 .9181104 1.169748 .0658024 15.05075
2 Workflow in Stata
2.1 Model
Using these variables, I estimate a multinomial logistic regression model with survey weights. The model yields coefficients in the metric of logits (log odds), which in the multinomial case express log odds relative to the reference category of the dependent variable. Coefficients for categorical predictors are additionally expressed relative to their respective reference category.
use"../data/test_voting.dta", clearmlogit vote age i.fairshare i.gender [pweight = weight], base(1)fitstat
Since logit coefficients are difficult to interpret for a general audience — and often for researchers as well — it is good practice to convert them into predicted probabilities. This yields results in an intuitive metric and provides estimates for all categories of the dependent variable, including the reference category.
One common approach to converting results into the metric of probabilities is to compute average marginal effects, which express the effect of each independent variable as a change in the predicted probability of each outcome of the dependent variable. In Stata, average marginal effects can be obtained using the margins command with the dydx option. Since the corresponding code must be specified separately for each category of the dependent variable, I employ a loop.
The resulting coefficients for gender, for instance, indicate how much the predicted probability of voting for the SPD differs between women and men. In our case, women are 2.7 percentage points less likely than men to vote for the SPD, though this difference does not reach conventional levels of statistical significance.
use"../data/test_voting.dta", cleareststo clearforval k = 1/5 { quie: mlogit vote age i.fairshare i.gender [pweight = weight], base(1) quie: margins, dydx(*) predict(outcome(`k')) postest sto m`k'}esttab m1 m2 m3 m4 m5, compress nogap label b(4) se(4) /// mtitle(SPD CDU AfD Greens Other)
(1) (2) (3) (4) (5)
SPD CDU AfD Greens Other
---------------------------------------------------------------------------------
Age 0.0045*** 0.0032*** -0.0022** -0.0013 -0.0041***
(0.0007) (0.0008) (0.0007) (0.0009) (0.0010)
More than fair 0.0000 0.0000 0.0000 0.0000 0.0000
(.) (.) (.) (.) (.)
Fair -0.0412 0.0390 0.0614* -0.1281** 0.0689*
(0.0381) (0.0396) (0.0297) (0.0468) (0.0337)
Less than fair -0.0256 -0.0155 0.1137** -0.2471*** 0.1744***
(0.0450) (0.0393) (0.0388) (0.0509) (0.0517)
Male 0.0000 0.0000 0.0000 0.0000 0.0000
(.) (.) (.) (.) (.)
Female -0.0272 -0.0434 -0.0090 0.0996** -0.0200
(0.0271) (0.0282) (0.0309) (0.0326) (0.0368)
---------------------------------------------------------------------------------
Observations 1586 1586 1586 1586 1586
---------------------------------------------------------------------------------
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001
2.3 Predictive Margins
To additionally obtain the predicted probabilities of voting for each party across subgroups, one can likewise use the margins command. These values are particularly useful for plotting the model results.
For the analyses in R, we require several packages. The CMAverse package estimates weighted multinomial logistic regressions, while additional packages are needed for computing marginal effects (marginaleffects), predictive margins (modelbased), and fit measures (performance).
Survey-weighted multinomial logistic regression models can be estimated using the svymultinom function, which accepts R’s standard formula notation. Note that the weight variable must be referenced using base-R notation and must have the same (number of) cases as the full model. The svymultinom function computes the weighted logits and obtains the variance–covariance matrix via a jackknife procedure, in which the model is re-estimated repeatedly by systematically excluding individual observations and adjusting the weights accordingly.1 We will need both the coefficient estimates and the variance–covariance matrix separately later on, so they are stored in separate objects here.
# read datadata<-read_dta(here("data", "test_voting.dta"))|>unlabelled()# fit modelfit_svy<-svymultinom(vote~age+fairshare+gender, weights =data$weight, data =data)fit_coefs<-fit_svy|>pluck("NAIVEreg")fit_vcov<-fit_svy|>pluck("vcov")# standard summarysummary(fit_svy)
As in Stata, we now convert the logit coefficients into average marginal effects. This is accomplished via the avg_slopes function, which takes the weighted coefficient estimates and the variance–covariance matrix as separate arguments. Survey weights must likewise be applied at this stage. (In Stata, margins automatically inherits the weighting from the preceding mlogit call; in R, the weights must be passed to avg_slopes explicitly.)
# estimate AMEsame<-avg_slopes(fit_coefs, type ="probs", vcov =vcov(fit_svy), wts =data$weight)# show first columnsame|>select(1:5)
Term Group Contrast Estimate Std. Error
age SPD dY/dX 0.00448 0.000716
age CDU/CSU dY/dX 0.00318 0.000771
age AfD dY/dX -0.00219 0.000734
age Greens dY/dX -0.00134 0.000918
age Other dY/dX -0.00413 0.001092
fairshare SPD Fair - More than fair -0.04114 0.039175
fairshare SPD Less than fair - More than fair -0.02557 0.046425
fairshare CDU/CSU Fair - More than fair 0.03896 0.041617
fairshare CDU/CSU Less than fair - More than fair -0.01547 0.041352
fairshare AfD Fair - More than fair 0.06138 0.030927
fairshare AfD Less than fair - More than fair 0.11374 0.040463
fairshare Greens Fair - More than fair -0.12812 0.048472
fairshare Greens Less than fair - More than fair -0.24708 0.053121
fairshare Other Fair - More than fair 0.06892 0.034977
fairshare Other Less than fair - More than fair 0.17438 0.054638
gender SPD Female - Male -0.02716 0.027560
gender CDU/CSU Female - Male -0.04341 0.028853
gender AfD Female - Male -0.00902 0.032137
gender Greens Female - Male 0.09960 0.033525
gender Other Female - Male -0.02000 0.038451
To arrange these values in a table, some additional cleaning of the output is required. Specifically, we need one column for the parties and one for the categories of the independent variables. The group variable already contains correctly labeled party names. The contrast variable, however, is labeled dY/dX for continuous variables and includes the reference category alongside the levels of each factor variable.2 Both must be corrected before proceeding.
# get clean contrast column:# delete reference category (regex)# term name for metric variableame_clean<-ame|>mutate( contrast =str_remove(contrast, " - .*"), contrast =replace_when(contrast,contrast=="dY/dX"~str_to_title(term)), term ="")# show first columnsame_clean|>select(1:4)
Group Contrast Estimate
SPD Age 0.00448
CDU/CSU Age 0.00318
AfD Age -0.00219
Greens Age -0.00134
Other Age -0.00413
SPD Fair -0.04114
SPD Less than fair -0.02557
CDU/CSU Fair 0.03896
CDU/CSU Less than fair -0.01547
AfD Fair 0.06138
AfD Less than fair 0.11374
Greens Fair -0.12812
Greens Less than fair -0.24708
Other Fair 0.06892
Other Less than fair 0.17438
SPD Female -0.02716
CDU/CSU Female -0.04341
AfD Female -0.00902
Greens Female 0.09960
Other Female -0.02000
Term:
Having obtained and formatted the average marginal effects, we must additionally extract the number of observations and goodness-of-fit statistics, which are to be reported in the table notes.
The predictive margins can be obtained with the estimate_means function. Its output can easily be transformed into a tibble for further processing and for plotting. The plot supports the same modifications as any ggplot graphic.
margins<-estimate_means(fit_coefs, by ="gender", estimate ="population", vcov =vcov(fit_svy), wts =data$weight)margins
Average Counterfactual Predictions
gender | Response | Probability | SE | 95% CI | t(1566)
---------------------------------------------------------------
Male | SPD | 0.20 | 0.02 | [0.16, 0.25] | 9.55
Male | CDU/CSU | 0.20 | 0.02 | [0.16, 0.24] | 9.39
Male | AfD | 0.15 | 0.02 | [0.11, 0.19] | 6.88
Male | Greens | 0.24 | 0.02 | [0.20, 0.28] | 12.03
Male | Other | 0.20 | 0.03 | [0.15, 0.25] | 7.99
Female | SPD | 0.18 | 0.02 | [0.14, 0.21] | 9.93
Female | CDU/CSU | 0.16 | 0.02 | [0.12, 0.19] | 7.79
Female | AfD | 0.14 | 0.02 | [0.09, 0.19] | 5.76
Female | Greens | 0.34 | 0.03 | [0.29, 0.40] | 12.90
Female | Other | 0.18 | 0.03 | [0.13, 0.24] | 6.34
Variable predicted: vote
Predictors modulated: gender
Predictors averaged: age (56), fairshare
Predictions are on the probs-scale.
The following tables compare the results from Stata and R side by side, reporting point estimates and standard errors for each. The comparison shows that R produces slightly larger, i.e. more conservative, standard errors, possibly reflecting differences in how the variance–covariance matrix is computed (jackknife in R versus the robust sandwich estimator in Stata). Point estimates are virtually identical.
4.1 Logits
Stata
R
Estimate
SE
Estimate
SE
CDU_CSU
Age
−0.006
0.008
−0.006
0.008
Female
−0.100
0.240
−0.100
0.245
Fair
0.428
0.333
0.428
0.347
Less than fair
0.036
0.374
0.036
0.390
Constant
0.071
0.542
0.071
0.565
AfD
Age
−0.044
0.007
−0.044
0.007
Female
0.092
0.291
0.092
0.300
Fair
0.819
0.386
0.819
0.400
Less than fair
1.108
0.424
1.108
0.439
Constant
1.162
0.505
1.162
0.521
Greens
Age
−0.031
0.006
−0.031
0.006
Female
0.512
0.206
0.512
0.210
Fair
−0.128
0.262
−0.128
0.269
Less than fair
−0.675
0.328
−0.675
0.339
Constant
2.075
0.420
2.075
0.432
Other
Age
−0.051
0.008
−0.051
0.009
Female
0.052
0.286
0.052
0.296
Fair
0.749
0.370
0.749
0.383
Less than fair
1.201
0.431
1.200
0.449
Constant
1.774
0.496
1.774
0.514
4.2 Marginal Effects
Stata
R
Estimate
SE
Estimate
SE
SPD
Age
0.004
0.001
0.004
0.001
Female
−0.027
0.027
−0.027
0.028
Fair
−0.041
0.038
−0.041
0.039
Less than fair
−0.026
0.045
−0.026
0.046
CDU/CSU
Age
0.003
0.001
0.003
0.001
Female
−0.043
0.028
−0.043
0.029
Fair
0.039
0.040
0.039
0.042
Less than fair
−0.015
0.039
−0.015
0.041
AfD
Age
−0.002
0.001
−0.002
0.001
Female
−0.009
0.031
−0.009
0.032
Fair
0.061
0.030
0.061
0.031
Less than fair
0.114
0.039
0.114
0.040
Greens
Age
−0.001
0.001
−0.001
0.001
Female
0.100
0.033
0.100
0.034
Fair
−0.128
0.047
−0.128
0.048
Less than fair
−0.247
0.051
−0.247
0.053
Other
Age
−0.004
0.001
−0.004
0.001
Female
−0.020
0.037
−0.020
0.038
Fair
0.069
0.034
0.069
0.035
Less than fair
0.174
0.052
0.174
0.055
4.3 Predictive Margins
Stata
R
Estimate
SE
Estimate
SE
SPD
Male
0.204
0.021
0.204
0.021
Female
0.177
0.017
0.177
0.018
CDU/CSU
Male
0.199
0.021
0.199
0.021
Female
0.156
0.019
0.156
0.020
AfD
Male
0.149
0.021
0.149
0.022
Female
0.140
0.023
0.140
0.024
Greens
Male
0.244
0.020
0.244
0.020
Female
0.344
0.026
0.344
0.027
Other
Male
0.203
0.025
0.203
0.025
Female
0.183
0.027
0.183
0.029
Footnotes
Under the hood, the function uses the svydesign and as.svrepdesign functions from the survey package.↩︎
R, like Stata, computes the average marginal effect (dY/dX) for continuous variables and the average discrete change for categorical variables. In practice, both are commonly referred to as average marginal effects.↩︎