Sorting Data in SAS – PROC SORT and First./Last. Concept
Welcome back to Mindful Data Minds! In this session, we’ll learn how to sort datasets using PROC SORT and explore the powerful FIRST. and LAST. automatic variables in SAS.
Watch the Full Tutorial
What You Will Learn
- How to sort datasets using PROC SORT.
- How to create new sorted datasets.
- How to remove duplicates with NODUPKEY.
- How to use FIRST. and LAST. automatic variables.
- How to select first, last, or nth records within groups.
Importing Data
We start with an Excel file containing:
- Employee Details → Employee ID, First Name, Last Name, Date of Birth.
- Employee Designation → Employee ID, Designation (with duplicates like Recruiter → Senior Recruiter → Lead Recruiter).
We import both sheets using PROC IMPORT:
proc import datafile="sample data.xlsx"
out=emp_data
dbms=xlsx replace;
sheet="Emp_details";
run;
proc import datafile="sample data.xlsx"
out=emp_des
dbms=xlsx replace;
sheet="Emp_des";
run;
Sorting Data with PROC SORT
Basic syntax:
proc sort data=emp_data;
by employeeID;
run;
Result: Sorts data by Employee ID in ascending order.
Creating a New Sorted Dataset
proc sort data=emp_data out=s_emp_data;
by employeeID;
run;
Keeps the original dataset intact and creates a new sorted dataset.
Sorting in Descending Order
proc sort data=emp_data out=s_emp_data;
by employeeID descending firstName;
run;
Result: Sorts by Employee ID (ascending) and First Name (descending).
Removing Duplicates
PROC SORT can also remove duplicates.
Remove Duplicates by Key
proc sort data=emp_des out=s_emp_des nodupkey;
by employeeid;
run;
Result: Keeps the first occurrence of each Employee ID, removes duplicates.
Output Duplicate Records
proc sort data=emp_des out=s_emp_des dupout=d_emp_des nodupkey;
by employeeid;
run;
Result: Creates one dataset with unique records and another with duplicates.
Note: NODUP (removes entire duplicate rows) is deprecated in SAS 9.4. Use NODUPKEY instead.
FIRST. and LAST. Automatic Variables
When sorting with BY, SAS creates two automatic binary variables:
- FIRST.variable → 1 for the first occurrence of a BY group.
- LAST.variable → 1 for the last occurrence of a BY group.
data check;
set s_emp_des;
by employeeID;
first_flag = first.employeeID;
last_flag = last.employeeID;
run;
For Employee ID with multiple records:
- FIRST. = 1 for the first record.
- LAST. = 1 for the last record.
- Middle records = 0.
Selecting Specific Records
Last Record Only:
data last_rec;
set s_emp_des;
by employeeID;
if last.employeeID;
run;
Nth Record (e.g., 2nd):
data nth_rec;
set s_emp_des;
by employeeID;
if first.employeeID then count=1;
else count+1;
if count=2;
run;
Useful for selecting second, third, or any specific record within a group.
Download Section
Next Step
Continue learning by exploring the next tutorial in this series. Also subscribe to get notified about new lessons.
Have a Question?
Drop your doubts in the comments below or contact us.
