Summarizing Data in SAS – PROC SUMMARY

Welcome back to Mindful Data Minds! In this session, we’ll explore PROC SUMMARY, a powerful SAS procedure used to calculate descriptive statistics across all observations or within specific groups. It’s nearly identical to PROC MEANS; the main difference is that PROC SUMMARY displays nothing unless you ask it to.

What You Will Learn

  • How to use PROC SUMMARY for descriptive statistics.
  • How it differs from PROC MEANS (no default print).
  • How to select specific variables and statistics.
  • The difference between CLASS and BY.
  • How to save results into a dataset with OUTPUT.
  • How to interpret the _TYPE_ variable in grouped outputs.

General Syntax

proc summary data=<dataset> print;
   var <numeric variables>;
   class <grouping variables>;
   by <grouping variables>;
   output out=<dataset> <statistic-keyword>= ... / autoname;
run;
  • DATA= → Input dataset
  • PRINT → Required if you want results displayed (PROC SUMMARY does not print by default)
  • VAR → Numeric variables to analyze
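  • CLASS / BY → Grouping variables (you normally use one or the other; BY requires sorted input)
  • OUTPUT → Save results into a dataset; each statistic keyword is followed by an equals sign (e.g., mean=)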

Default Behavior

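By default, PROC SUMMARY does not print anything. You must specify the PRINT option, an OUTPUT statement, or both; with neither, the step has nowhere to send its results and SAS reports an error.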

proc summary data=sashelp.cars print;
run;

Because there is no VAR statement, this produces only the number of observations.
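
If you would rather capture that count in a dataset than print it, the same call can use OUTPUT instead of PRINT. This is a minimal sketch; the dataset name car_count is just an illustrative choice.

```sas
/* No PRINT option: nothing is displayed, the result goes to a dataset */
proc summary data=sashelp.cars;
   output out=car_count;   /* car_count is a hypothetical name */
run;

/* With no VAR statement, the dataset holds only _TYPE_ and _FREQ_ */
proc print data=car_count;
run;
```

Here _FREQ_ carries the number of observations that were read.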

Selecting Specific Variables

You can specify which numeric variables to summarize:

proc summary data=sashelp.cars print;
   var msrp invoice enginesize;
run;

Displays the default statistics (N, Mean, Std Dev, Minimum, and Maximum) for the selected variables.
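
To choose the statistics yourself, list their keywords on the PROC statement. The sketch below assumes you want N, mean, median, minimum, and maximum, rounded to two decimals with MAXDEC=; swap in whichever keywords you need.

```sas
/* Statistic keywords on the PROC statement replace the default set */
proc summary data=sashelp.cars print n mean median min max maxdec=2;
   var msrp invoice;
run;
```

Once any keyword is listed, only the listed statistics appear, so Std Dev is dropped here.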

Grouping Data: CLASS vs. BY

CLASS Statement

proc summary data=sashelp.cars print;
   class origin;
   var msrp invoice enginesize;
run;

Produces one table with statistics grouped by Origin.

BY Statement

proc sort data=sashelp.cars out=a;
   by origin;
run;

proc summary data=a print;
   by origin;
   var msrp invoice enginesize;
run;

Produces separate tables for each Origin.

Note: The data must be sorted (or indexed) by the BY variables before you use BY, which is why PROC SORT runs first.

Difference:

  • CLASS → One combined table; no sorting required.
  • BY → Multiple separate tables, one per group; input must be sorted.

Saving Results with OUTPUT

You can save results into a dataset:

proc summary data=sashelp.cars;
   class origin;
   var msrp invoice enginesize;
   output out=cars_class n= mean= median= min= max= / autoname;
run;

Creates a dataset cars_class with one column per variable and statistic, automatically named in the form variable_statistic (e.g., MSRP_Mean, Invoice_Min), plus the automatic variables _TYPE_ and _FREQ_.
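
If you prefer to choose the column names yourself, you can leave out AUTONAME and name each statistic explicitly. A minimal sketch, where cars_named, avg_msrp, and max_invoice are hypothetical names picked for illustration:

```sas
proc summary data=sashelp.cars;
   class origin;
   var msrp invoice;
   /* statistic(variable)=new_name assigns the output column name */
   output out=cars_named
          mean(msrp)=avg_msrp
          max(invoice)=max_invoice;
run;

proc print data=cars_named;
run;
```

This form is handy when you only need one or two statistics per variable and want names that read well in later steps.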

Understanding the _TYPE_ Variable

When using CLASS with a single grouping variable (Origin in the example above), the output dataset contains:

  • _TYPE_=0 → Overall statistics (no grouping).
  • _TYPE_=1 → Grouped statistics (by Origin).
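
If you only want the per-group rows, one option is to filter on _TYPE_ as the dataset is written. The sketch below uses the WHERE= dataset option on OUT=; the name cars_by_origin is an assumption for illustration.

```sas
proc summary data=sashelp.cars;
   class origin;
   var msrp;
   /* Keep only the rows grouped by Origin; drop the overall _TYPE_=0 row */
   output out=cars_by_origin (where=(_type_=1)) mean= / autoname;
run;
```

With a single CLASS variable, adding the NWAY option to the PROC statement has the same effect.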

With BY, only grouped statistics are produced: there is no overall row, and because no CLASS variable is involved, _TYPE_ is 0 in every row.
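
To see the BY behavior in an output dataset, you could run something like the following. It is a sketch only; cars_sorted and cars_by are hypothetical dataset names.

```sas
/* BY needs sorted input */
proc sort data=sashelp.cars out=cars_sorted;
   by origin;
run;

proc summary data=cars_sorted;
   by origin;
   var msrp;
   output out=cars_by mean= / autoname;
run;

/* One row per Origin, each with _TYPE_=0, and no overall row */
proc print data=cars_by;
run;
```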

Next Step

Continue learning by exploring the next tutorial in this series. Also subscribe to get notified about new lessons.

Have a Question?

Drop your doubts in the comments below or contact us.
