Subsetting in SAS – Part 2 (WHERE Statement)
Welcome back to Mindful Data Minds! In the previous session, we learned how to subset datasets using the IF statement. In this session, we’ll explore the WHERE statement and understand how it differs from IF.
Watch the Full Tutorial
What You Will Learn
- How to use the WHERE statement for filtering.
- The difference between IF and WHERE.
- Why WHERE can’t filter on newly created columns.
- How to combine conditions with AND and OR.
- A simple analogy to remember the difference between IF and WHERE.
Starting Point
We’ll again use the built-in dataset SASHELP.CARS.
data b1;
set sashelp.cars;
run;
Dataset b1 has 428 rows and 15 columns.
Using WHERE Statement
Filter by Car Make
data b2;
set sashelp.cars;
where lowcase(strip(make))='acura';
run;
Result: Only 7 rows for Acura cars.
Filter by MSRP
Now let’s filter cars with MSRP > 30000:
data b3;
set sashelp.cars;
where lowcase(strip(make))='acura';
where MSRP > 30000;
run;
Result: 190 rows.
Notice the difference: IF would have allowed multiple conditions separately, but WHERE applies only the last condition if you try to stack them.
Difference Between IF and WHERE
- IF Statement → Can be used multiple times in the same Data Step. It works after data is read, so you can filter based on new columns you create.
- WHERE Statement → Can only be used once. It works before data is read, so it can filter only on existing columns in the dataset
Combining Conditions
You can combine conditions in WHERE using AND or OR.
Example: Acura cars with MSRP > 30000
data b4;
set sashelp.cars;
where lowcase(strip(make))='acura' and msrp>30000;
run;
Result: Only Acura cars priced above 30,000.
Example: Acura OR cars priced above 30000
data b5;
set sashelp.cars;
where lowcase(strip(make))='acura' or msrp>30000;
run;
Result: All Acura cars plus any car with MSRP greater than 30,000.
Key Limitation of WHERE
If you create a new column in your Data Step, WHERE cannot filter on it because WHERE runs before the data is read.
Example:
data b6a;
set sashelp.cars;
new_col=msrp*10;
if new_col>300000;
run;
Works fine with IF (190 records).
But with WHERE:
data b6b;
set sashelp.cars;
new_col=msrp*10;
where new_col>300000;
run;
❌ Error: Variable NEW_COL is not on file SASHELP.CARS.
Because WHERE can only filter on existing columns, not newly created ones.
Next Step
Continue learning by exploring the next tutorial in this series. Also subscribe to get notified about new lessons.
Have a Question?
Drop your doubts in the comments below or contact us.
