The aim is to document code chunks that are likely to be re-used for fast searching and indexing.
PROC SORT to remove duplicatesThere are three options that might be helpful: DUPOUT=, NODUPRECS, and NODUPKEYS.Code example are from this article:
NODUPKEYS (or NODUPKEY) option with PROC SORT removes observations with duplicate keys. Specify the keys, that uniquely identify a observation, in the by statement. In the example below, variable title uniquely identifies a movie.PROC SORT DATA=Movies
DUPOUT=Movies_Sorted_Dupout_NoDupkey
NODUPKEY;
BY Title;
RUN ;
NODUPRECS option identifies observations with identical values for all columns.PROC SORT DATA=Movies
OUT=Movies_Sorted_without_DupRecs
NODUPRECS ;
BY Title ;
RUN ;
Input() and put() for variable type conversioninput(char,4.) or input(char,datatime20.12) : Char -> Numeric(/Char)put(numeric,$4.) or put(numeric, datetime19.) : Numeric(/Char) -> CharStacking multiple datasets into 1 dataset with variables in different length can be tricky. Here is the solution to resolve it. You need to:
length before set statement;format _character_.data stacked_ds;
length id $20 age 8 comment $200 ;
set ds1-ds5;
format _character_ ;
run;