Wednesday, November 18, 2009

COMPRESS with third argument

To remove all non-printable characters from your string i.e all ASCII's below 32 and above 126, you can do by providing "KW" (keep writable) as THIRD argument to compress function.
Example
data eject_a_page;
name = "Vikas" || '12'x || " Gaddu";
newname = compress(name,,"kw");
run;

do leave loop and continue !!!

There are two statements in SAS, leave and continue. Difference between two is seldom not understood.

data leave;
do i = 1 to 5;
do j = 1 to 5;
if j = 2 or j = 4 then leave;
output;
end;
end;
run;

Here SAS will leave inner do loop completely once the first condition j=2 gets satisfied. So in output you will never have j = 2 or 3 or 4 or 5.

OUTPUT
i j
1 1
2 1
3 1
4 1
5 1

Same program with continue statement.


data continue;
do i = 1 to 5;
do j = 1 to 5;
if j = 2 or j = 4 then continue;
output;
end;
end;
run;

Here SAS will skip the inner loop for both conditions j = 2 and j=4, but it will not completely leave the loop. So ouput for this will look like

i j
1 1
1 3
1 5
2 1
2 3
2 5
. .
. .
. .
5 3
5 5

Thursday, June 25, 2009

SUBSTR function can be used on both side of assignment statment

data name;
name = "vikus";
substr(name,4,1) = "a";
run;

proc print ;
run;

This will change u to a in name variable.

data name;
name = "Vikas Gaddu";
firstname = substr(name,1,5);
run;

proc print;
run;

Firstname variable will contain Vikas.

Wednesday, June 24, 2009

Creating Special Missing Values

data student_score;
student = 1; score=50; output;
student = 2; score =.a; output;
student = 3; socre=.n; output;
run;

proc print;
run;

Here special numeric value a means that student was absent and n means incomplete test. So single missing values can have multiple meaning in SAS.

Tuesday, June 16, 2009

What is _n_?

Internal SAS counter to represent number of observation being read.

In the flow of DATA step processing, what is the first action in a typical DATA Step?

Read data into PDV.

Name statements that are execution only.

IF, run, title, footnotes

Name statements that function at both compile and execution time.

DATA statement at compile time will tell the compiler name of the dataset to be filled in dataset descriptor portion. At run time it will tell compiler when to re-initialize user created variable.

Identify statements whose placement in the DATA step is critical.

1> if restriction_criteria; statement
2> output; statement
3> assignment statments eg
if minage < 20 ;
minage = min(age); /* This statment should be above IF statment*/

Name statements that are recognized at compile time only?

KEEP, DROP, LENGHT, ATTRIB, Syntax errors, BY, RETAIN, WHERE, FORMAT, INFORMAT, ARRAY

At compile time when a SAS data set is read, what items are created?

PDV is intialized, INPUT buffer size, descriptor portion of dataset is set like name of dataset, engine, host operating system etc.

Does SAS ‘Translate’ (compile) or does it ‘Interpret’? Explain.

SAS is a compiler. SAS does not read and execute one statement at a time. It will either wait for a RUN, QUIT, next DATA or PROC statement before executing its statment. This process is similar to a compiler than a interpreter.

What is the Program Data Vector (PDV)? What are its functions?

PDV is internal temparary memory which SAS uses to store one observation at one time from input dataset. It also has some automatic system flag variables like _ERROR_ or _N_ etc.

How would you code a merge that will keep only the observations that have matches from both sets.

data c;
merge a(in=ina) b(in=inb);
by commonvar;
if ina and inb;
run;

How would you delete observations with duplicate keys?

PROC SORT with NODUPKEY option

How would you delete duplicate observations?

PROC SORT option with NODUPREC

Code a PROC SORT on a data set containing State, District and County as the primary variables, along with several numeric variables.

proc sort data = mydata;
by state district county _numeric_;
run;

If you have a data set that contains 100 variables, but you need only five of those, what is the code to force SAS to use only those variable?

KEEP statement

When looking for data contained in a character string of 150 bytes, which function is the best to locate that data: scan, index, or indexc?

SCAN will scan for word.
INDEX will search for the pattern so will be the best.
INDEXC will search for the character, so not that useful in this case.

How would you include common or reuse code to be processed along with your statements?

Create a autocall macro with reuse code to be used anywhere.

Have you ever linked SAS code? If so, describe the link and any required statements used to either process the code or the step itself.

%INCLUDE statment

What is the one statement to set the criteria of data that can be coded in any step?

%LET condition = where age > 70;

data age70;
set demog;
&condition;
run;

%LET statement criteria can be used in any step!!!

If you’re not wanting any SAS output from a data step, how would you code the data statement to prevent SAS from producing a set?

DATA _NULL_; statment will do that.

If reading an external file to produce an external file, what is the shortcut to write that record without coding every single variable on the record?

Not Sure. Maybe we can use _INFILE_ automatic variable.

What statement do you code to tell SAS that it is to write to an external file? What statement do you code to write the record to the file?

PUT statement

Under what circumstances would you code a SELECT construct instead of IF statements?

If I am comparing numeric variable then it is faster to use SELECT than IF statment.

What is the purpose of the trailing @? The @@? How would you use them?

@ to input multiple lines in rawdata to single observation in a dataset
@@ to input single line in rawdata to multiple observations in a dataset

How would you code the criteria to restrict the output to be produced?

IF restriction_criteria ;

Name and describe three SAS functions that you have used, if any?

PROPCASE : Converts first alphabet of every our upcase.
SCAN: searches for a word in a string.
SMALLEST: Find smallest non-missing variable from an array or list of variables.

What is the difference between an informat and a format? Name three informats or formats

INFORMAT is how you want to read a data and FORMAT is how you want to write a data.

User created FORMATS and INFORMATS can be created using PROC FORMAT. Some defaults INFORMATS are date9., dollar8.2, comma8.2 and these can also be used as FORMATS.

If reading a variable length file with fixed input, how would you prevent SAS from reading the next record if the last variable didn’t have a value?

You need to use MISSOVER option in INFILE statment.

Are you familiar with special input delimiters? How are they used?

On INFILE statement you can use DLM option to specify any special input delimiters.

How do you read in the variables that you need?

Using KEEP or DROP statement

What SAS statements would you code to read an external raw data file to a DATA step?

INPUT statement

Thursday, April 30, 2009

CALL SYMPUT and CALL SYMPUTX

CALL SYMPUT will convert data step variable value into marco variable value and it will store value of in nearest non-missing symbol table. In case below nearest non-missing symbol table is local symbol table of marco name. When we try to access name macro variable outside the macro SAS issues a warning.

%macro name(string=);
data _null_;
 name = "&string";
 call symput("name",name);
run;
%put &name;
%mend;
%name (string= Vikas);
%put &name;
/*WARNING: Apparent symbolic reference NAME not resolved.*/
CALL SYMPUTX new in SAS v9 can have an extra third argument where you can specify which symbol table you want SAS to store value of this macro variable. Code below is same as above except we are using call symputx and third argument saying we want macro variable to be saved in global symbol table. So this time around we don't get the Warning message.

%macro name(string=);
data _null_;
 name = "&string";
 call symputx("name",name,'G');
run;
%put &name;
%mend;
%name (string= Vikas);
%put &name;

SAS to keep only valid characters in a String

/* These are the only valid characters */
%let validchars = 'a' 'b' '1' '2' '_' '$';


data validchars;
string = "abc123_*";
str_len = length(string );


do while( str_len > 0);
if substr(string, str_len,1) not in (&validchars )then substr(string,str_len,1) = " ";
str_len = str_len -1;
end;
string = compbl(string);
/*Removing multiple blanks */
run;
proc print;
run;

Assigning multiple formats to one variable depending on its value using PROC FORMAT


proc format;
value mydates
'1Jan2009'd - '31jan2009'd = [mmddyy8.]
'1feb2009'd - '28feb2009'd = [date7.]
;
run;


data dates;
first = '15jan2009'd ; output;
first = '15feb2009'd; output;
format first mydates.;
run;
proc print;
run;