Summarizing Data in SAS – PROC SUMMARY
Welcome back to Mindful Data Minds! In this session, we’ll explore PROC SUMMARY, a powerful SAS procedure used to calculate descriptive statistics across all observations or within specific groups. It’s very similar to PROC MEANS, but with a few key differences.
What You Will Learn
- How to use PROC SUMMARY for descriptive statistics.
- How it differs from PROC MEANS (no printed output by default).
- How to select specific variables and statistics.
- The difference between CLASS and BY.
- How to save results into a dataset with OUTPUT.
- How to interpret the _TYPE_ variable in grouped outputs.
General Syntax
proc summary data=<dataset> print;
var <numeric variables>;
class <grouping variables>;
by <grouping variables>;
output out=<dataset> <statistics> / autoname;
run;
- DATA= → Input dataset
- PRINT → Required if you want results displayed (PROC SUMMARY does not print by default)
- VAR → Numeric variables to analyze
- CLASS / BY → Grouping variables
- OUTPUT → Save results into a dataset (statistics are requested in keyword= form, e.g. mean=)
Default Behavior
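By default, PROC SUMMARY does not print anything, so you must specify the PRINT option or an OUTPUT statement. With neither, the step stops with an error instead of producing results.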
proc summary data=sashelp.cars print;
run;
Because there is no VAR statement, this prints only the number of observations.
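The same step can write to a dataset instead of printing. A minimal sketch, assuming an illustrative output dataset name (cars_n is not from the tutorial):

```sas
/* OUTPUT instead of PRINT: PROC SUMMARY itself prints nothing here */
proc summary data=sashelp.cars;
  output out=cars_n;   /* cars_n is a hypothetical dataset name */
run;

/* With no VAR statement, cars_n holds a single row with
   _TYPE_=0 and _FREQ_ (the number of observations read) */
proc print data=cars_n;
run;
```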
Selecting Specific Variables
You can specify which numeric variables to summarize:
proc summary data=sashelp.cars print;
var msrp invoice enginesize;
run;
Displays N, Mean, Std Dev, Min, and Max for the selected variables.
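To print statistics other than the five defaults, list the statistic keywords on the PROC statement. The sketch below is one possible selection; the choice of N, MEAN, and MEDIAN and the MAXDEC=2 rounding are illustrative, not part of the original tutorial:

```sas
/* Statistic keywords on the PROC statement replace the default set
   (N, Mean, Std Dev, Min, Max) in the printed output */
proc summary data=sashelp.cars print n mean median maxdec=2;
  var msrp invoice;   /* only these two variables are summarized */
run;
```

MAXDEC= only affects how many decimal places are displayed, not the values themselves.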
Grouping Data: CLASS vs. BY
CLASS Statement
proc summary data=sashelp.cars print;
class origin;
var msrp invoice enginesize;
run;
Produces one table with statistics grouped by Origin.
BY Statement
proc sort data=sashelp.cars out=a;
by origin;
run;
proc summary data=a print;
by origin;
var msrp invoice enginesize;
run;
Produces separate tables for each Origin.
Note: Data must be sorted before using BY.
Difference:
- CLASS → One combined table.
- BY → Multiple separate tables.
Saving Results with OUTPUT
You can save results into a dataset:
proc summary data=sashelp.cars;
class origin;
var msrp invoice enginesize;
output out=cars_class n= mean= median= min= max= / autoname;
run;
Creates a dataset cars_class with statistics for each variable, automatically named (e.g., msrp_mean, invoice_min).
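If you would rather choose the output variable names yourself than rely on AUTONAME, each statistic can name its own analysis variable and result. A short sketch, where cars_named, avg_msrp, and max_invoice are hypothetical names picked for illustration:

```sas
proc summary data=sashelp.cars;
  class origin;
  var msrp invoice;
  /* statistic(variable)=new-name: one explicit name per requested result */
  output out=cars_named
         mean(msrp)=avg_msrp
         max(invoice)=max_invoice;
run;

/* Inspect the saved dataset, including _TYPE_ and _FREQ_ */
proc print data=cars_named;
run;
```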
Understanding the _TYPE_ Variable
When using CLASS:
- _TYPE_=0 → Overall statistics (no grouping).
- _TYPE_=1 → Grouped statistics (by Origin).
With BY alone (no CLASS statement), only grouped statistics are produced: the output has one row per BY group, and _TYPE_ is 0 on every row.
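With more than one CLASS variable, _TYPE_ takes more values, one for each combination of grouping variables. The sketch below adds Type as a second CLASS variable; the dataset names cars_types and cars_nway are illustrative assumptions:

```sas
proc summary data=sashelp.cars;
  class origin type;
  var msrp;
  output out=cars_types mean= / autoname;
run;

/* _TYPE_ values for CLASS origin type:
   0 = overall, 1 = by Type, 2 = by Origin, 3 = by Origin and Type */
proc print data=cars_types;
  where _type_ = 3;   /* keep only the Origin-by-Type rows */
run;

/* The NWAY option writes only the highest _TYPE_ (all CLASS variables
   combined), so no filtering is needed afterwards */
proc summary data=sashelp.cars nway;
  class origin type;
  var msrp;
  output out=cars_nway mean= / autoname;
run;
```

Filtering on _TYPE_ is handy when you need several grouping levels from one pass, while NWAY is the simpler choice when only the fully grouped rows matter.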
