Thursday, June 25, 2009

SUBSTR function can be used on both side of assignment statment

data name;
name = "vikus";
substr(name,4,1) = "a";
run;

proc print ;
run;

This will change u to a in name variable.

data name;
name = "Vikas Gaddu";
firstname = substr(name,1,5);
run;

proc print;
run;

Firstname variable will contain Vikas.

Wednesday, June 24, 2009

Creating Special Missing Values

data student_score;
student = 1; score=50; output;
student = 2; score =.a; output;
student = 3; socre=.n; output;
run;

proc print;
run;

Here special numeric value a means that student was absent and n means incomplete test. So single missing values can have multiple meaning in SAS.

Tuesday, June 16, 2009

What is _n_?

Internal SAS counter to represent number of observation being read.

In the flow of DATA step processing, what is the first action in a typical DATA Step?

Read data into PDV.

Name statements that are execution only.

IF, run, title, footnotes

Name statements that function at both compile and execution time.

DATA statement at compile time will tell the compiler name of the dataset to be filled in dataset descriptor portion. At run time it will tell compiler when to re-initialize user created variable.

Identify statements whose placement in the DATA step is critical.

1> if restriction_criteria; statement
2> output; statement
3> assignment statments eg
if minage < 20 ;
minage = min(age); /* This statment should be above IF statment*/

Name statements that are recognized at compile time only?

KEEP, DROP, LENGHT, ATTRIB, Syntax errors, BY, RETAIN, WHERE, FORMAT, INFORMAT, ARRAY

At compile time when a SAS data set is read, what items are created?

PDV is intialized, INPUT buffer size, descriptor portion of dataset is set like name of dataset, engine, host operating system etc.

Does SAS ‘Translate’ (compile) or does it ‘Interpret’? Explain.

SAS is a compiler. SAS does not read and execute one statement at a time. It will either wait for a RUN, QUIT, next DATA or PROC statement before executing its statment. This process is similar to a compiler than a interpreter.

What is the Program Data Vector (PDV)? What are its functions?

PDV is internal temparary memory which SAS uses to store one observation at one time from input dataset. It also has some automatic system flag variables like _ERROR_ or _N_ etc.

How would you code a merge that will keep only the observations that have matches from both sets.

data c;
merge a(in=ina) b(in=inb);
by commonvar;
if ina and inb;
run;

How would you delete observations with duplicate keys?

PROC SORT with NODUPKEY option

How would you delete duplicate observations?

PROC SORT option with NODUPREC

Code a PROC SORT on a data set containing State, District and County as the primary variables, along with several numeric variables.

proc sort data = mydata;
by state district county _numeric_;
run;

If you have a data set that contains 100 variables, but you need only five of those, what is the code to force SAS to use only those variable?

KEEP statement

When looking for data contained in a character string of 150 bytes, which function is the best to locate that data: scan, index, or indexc?

SCAN will scan for word.
INDEX will search for the pattern so will be the best.
INDEXC will search for the character, so not that useful in this case.

How would you include common or reuse code to be processed along with your statements?

Create a autocall macro with reuse code to be used anywhere.

Have you ever linked SAS code? If so, describe the link and any required statements used to either process the code or the step itself.

%INCLUDE statment

What is the one statement to set the criteria of data that can be coded in any step?

%LET condition = where age > 70;

data age70;
set demog;
&condition;
run;

%LET statement criteria can be used in any step!!!

If you’re not wanting any SAS output from a data step, how would you code the data statement to prevent SAS from producing a set?

DATA _NULL_; statment will do that.

If reading an external file to produce an external file, what is the shortcut to write that record without coding every single variable on the record?

Not Sure. Maybe we can use _INFILE_ automatic variable.

What statement do you code to tell SAS that it is to write to an external file? What statement do you code to write the record to the file?

PUT statement

Under what circumstances would you code a SELECT construct instead of IF statements?

If I am comparing numeric variable then it is faster to use SELECT than IF statment.

What is the purpose of the trailing @? The @@? How would you use them?

@ to input multiple lines in rawdata to single observation in a dataset
@@ to input single line in rawdata to multiple observations in a dataset

How would you code the criteria to restrict the output to be produced?

IF restriction_criteria ;

Name and describe three SAS functions that you have used, if any?

PROPCASE : Converts first alphabet of every our upcase.
SCAN: searches for a word in a string.
SMALLEST: Find smallest non-missing variable from an array or list of variables.

What is the difference between an informat and a format? Name three informats or formats

INFORMAT is how you want to read a data and FORMAT is how you want to write a data.

User created FORMATS and INFORMATS can be created using PROC FORMAT. Some defaults INFORMATS are date9., dollar8.2, comma8.2 and these can also be used as FORMATS.

If reading a variable length file with fixed input, how would you prevent SAS from reading the next record if the last variable didn’t have a value?

You need to use MISSOVER option in INFILE statment.

Are you familiar with special input delimiters? How are they used?

On INFILE statement you can use DLM option to specify any special input delimiters.

How do you read in the variables that you need?

Using KEEP or DROP statement

What SAS statements would you code to read an external raw data file to a DATA step?

INPUT statement