Functions in SAS – Part 1 (Character Functions)
Welcome back to Mindful Data Minds! In this session, we’ll explore character functions in SAS. Since SAS has only two data types — character and numeric — functions are grouped accordingly. Dates are treated as numeric, so we’ll cover them later. Today, let’s focus on character functions using the SASHELP.BASEBALL dataset.
Watch the Full Tutorial
What You Will Learn
- How to extract substrings with SUBSTR.
- How to split text using SCAN.
- How to clean strings with COMPRESS.
- The difference between INDEX and FIND.
- The difference between TRANSLATE and TRANWRD.
- How to change case with LOWCASE/UPCASE.
- How to join strings with CAT, CATX, CATT, CATS.
Preparing the Dataset
We split the dataset into two parts:
- BASEBALL_CHAR → All character variables (Name, Team, League, Division, Position, Div).
- BASEBALL_NUMERIC → All numeric variables.
data baseball_char(keep= _character_) baseball_numeric(keep= _numeric_);
set sashelp.baseball;
run;
This helps us practice character functions separately.
Substring Function
Extracts part of a string.
data a;
set baseball_char;
short_division = substr(division,1,1);
run;
Result: First character of Division (E for East, W for West).
Substring function has three parts:
- First argument → variable name.
- Second → starting position.
- Third → number of characters to extract.
Scan Function
Splits a string into chunks based on a delimiter.
data b;
set baseball_char;
first_name = scan(name,-1,",");
last_name = scan(name,1,",");
run;
Result: Separates first and last names from the Name variable.
Scan function has three parts:
- First argument → variable name.
- Second → Chunk position that needs to be extracted
- Third → Delimiter on the basis of which split will be done. Delimiter here is a comma.
Note:
- Negative positions pick from the end.
Compress Function
Removes unwanted characters.
data c;
set baseball_char;
comp_name = compress(name,",");
comp_pos = compress(position,"","d");
comp_pos_ch = compress(position,"","kd");
run;
Result:
- COMP_NAME = Removes commas from name
- COMP_POS = Removes digits from position
- COMP_POS_CH = keep only digits in position
Compress function has three parts:
- First argument → variable name.
- Second → Unwanted charater that needs to be removed
- Third → Modifier (optional)
Modifiers:
- d → remove digits.
- kd → keep digits (remove characters).
Index vs. Find
Both locate substrings, but with differences:
data d;
set baseball_char;
i=index(name,",");
f=find(name,", a","i");
run;
- INDEX → Finds position of a character (no modifiers).
- FIND → Finds substring with modifiers (e.g.,
ifor case-insensitive).
Translate vs. Tranwrd
Replace characters or substrings.
data e;
set baseball_char;
lg_t_late = translate(league,"ABC","Nat");
lg_t_wrd = tranwrd(league,"Nat","XYZ");
run;
Example: “NAT” replaced with “XYZ” using TRANWRD.
- TRANSLATE → Works character by character.
- TRANWRD → Works on substrings.
Case Conversion
data f;
set baseball_char;
l_name = lowcase(name);
u_name = upcase(name);
run;
- LOWCASE(name) → Converts to lowercase.
- UPCASE(name) → Converts to uppercase.
Concatenation Functions
Join strings together.
data g;
set b;
c_nm = cat(first_name,last_name);
cx_nm = catx(",", first_name,last_name);
ct_nm = catT(first_name,last_name);
cs_nm = catS(first_name,last_name);
run;
- CAT → Simple concatenation.
- CATX → Concatenation with delimiter.
- CATT → Trims trailing spaces.
- CATS → Strips leading and trailing spaces.
Example: “Andy Elson” joined as “Andy,Elson” with CATX.
Next Step
Continue learning by exploring the next tutorial in this series. Also subscribe to get notified about new lessons.
Have a Question?
Drop your doubts in the comments below or contact us.
