Subsetting in SAS – Part 3 (KEEP Statement)
Welcome back to Mindful Data Minds! In the last two sessions, we learned about horizontal subsetting using IF and WHERE. Now, let’s move to vertical subsetting — which means selecting specific columns instead of all of them.
In SAS, vertical subsetting is done using two statements:
- KEEP → Keeps only the columns you specify.
- DROP → Removes the columns you specify.
In this session, we’ll focus on the KEEP statement.
Watch the Full Tutorial
What You Will Learn
- How to use KEEP to select specific columns.
- How to keep all numeric or character variables.
- How to use ranges (–) and sequences (-) for column selection.
- How to use KEEP as a dataset option for efficiency.
- How to apply KEEP when creating new columns or multiple datasets.
Starting Point
We’ll use the built-in dataset SASHELP.CARS, which has 428 rows and 15 columns. Some of the columns are: Make, Model, Type, Origin, Drivetrain, MSRP, Invoice, EngineSize, Cylinders, Horsepower, MPG_City, MPG_Highway, Weight, Wheelbase, Length
Example 1: Keep Specific Columns
Suppose we only want Make and Model:
data c1;
set sashelp.cars;
keep make model;
run;
Result: Dataset c1 with only 2 columns (Make and Model) and 428 rows.
Example 2: Keep All Numeric Columns
SAS provides shortcuts to select all numeric or character variables.
data c2a;
set sashelp.cars;
keep _numeric_;
run;
Result: Dataset with all 10 numeric columns.
Similarly, for character variables:
data c2b;
set sashelp.cars;
keep _character_;
run;
Result: Dataset with 5 character columns (Make, Model, Type, Origin, Drivetrain).
Example 3: Keep a Range of Columns
You can use double dash (–) to keep all columns between two variables.
data c3;
set sashelp.cars;
keep make--origin;
run;
Result: Dataset with columns Make, Model, Type, and Origin.
Example 4: Keep Sequentially Named Columns
data c4;
set sashelp.appliance;
keep units_1-units_13 cycle;
run;
Result: Dataset with Units_1 through Units_13 plus Cycle.
Example 5: Keep Columns by Name Pattern
You can keep all variables starting with a letter. For example, keep all variables starting with M:
data c5;
set sashelp.cars;
keep m:;
run;
Result: Columns like Make, Model, MSRP, MPG_City, MPG_Highway are kept.
Example 6: Using KEEP as a Dataset Option
Instead of writing KEEP inside the Data Step, you can use it as a dataset option. This is more efficient because SAS only loads the required columns into memory.
Example:
data c6;
set sashelp.cars(keep=make model);
run;
Result: Dataset with only Make and Model, using less memory and time.
Example 7: Keep While Creating New Columns
You can also use KEEP when creating new columns:
data c7;
set sashelp.cars;
new_col = MSRP * 10;
keep make new_col;
run;
Result: Dataset with Make and the new calculated column.
Example 8: Different KEEP Options for Different Outputs
If you’re creating multiple datasets in one step, you can apply KEEP differently:
data c8(keep=make new_col) c9(keep=make _numeric_);
set sashelp.cars;
new_col = MSRP * 10;
run;
Result:
- c8 → Make and new_col.
- c9 → Make and all numeric columns.
Next Step
Continue learning by exploring the next tutorial in this series. Also subscribe to get notified about new lessons.
Have a Question?
Drop your doubts in the comments below or contact us.
