The aim is to document code chunks that are likely to be reused, so that they are quick to search and index.
PROC SORT to remove duplicates

There are three options that might be helpful: DUPOUT=, NODUPRECS, and NODUPKEYS.
Code examples are from this article:
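
The NODUPKEYS (or NODUPKEY) option with PROC SORT removes observations with duplicate keys. Specify the keys that uniquely identify an observation in the BY statement. In the example below, the variable Title uniquely identifies a movie, and the observations removed as duplicates are written to the DUPOUT= dataset.

PROC SORT DATA=Movies                      /* no OUT=, so Movies itself is replaced */
    DUPOUT=Movies_Sorted_Dupout_NoDupkey   /* removed duplicates are written here */
    NODUPKEY;                              /* keep the first observation per key */
    BY Title;
RUN;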

The NODUPRECS option removes observations that duplicate the preceding observation in every variable.

PROC SORT DATA=Movies
    OUT=Movies_Sorted_without_DupRecs      /* Movies itself is left unchanged */
    NODUPRECS;
    BY Title;
RUN;
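
NODUPRECS compares each observation only with the one written just before it, so identical rows can survive when other rows with the same Title sit between them after the sort. A common workaround, sketched here under assumptions, is to sort by every variable with BY _ALL_ and use NODUPKEY instead. The Movies_Demo dataset and its Year and Rating variables are hypothetical, made up for illustration.

```sas
/* Hypothetical sample data: Year and Rating are assumed variables */
data Movies_Demo;
    length Title $30 Rating $5;
    input Title $ Year Rating $;
    datalines;
Alien 1979 R
Alien 1986 R
Brazil 1985 R
Alien 1979 R
;
run;

/* Sorting by all variables makes identical rows adjacent,
   so NODUPKEY drops every exact duplicate */
proc sort data=Movies_Demo
          out=Movies_Demo_Unique
          dupout=Movies_Demo_Dups
          nodupkey;
    by _all_;
run;
```

With this sample, the repeated Alien 1979 row should be the one written to Movies_Demo_Dups.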

input() and put() for variable type conversion

input(char, 4.) or input(char, datetime20.): Char -> Numeric (or Char, with a character informat)
put(numeric, 4.) or put(numeric, datetime19.): Numeric (or Char, with a character format such as $4.) -> Char

Stacking multiple datasets into one dataset can be tricky when the same variables have different lengths. To resolve it, you need to:
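1. declare the variable lengths with a length statement before the set statement;
2. clear the formats of character variables with format _character_.

data stacked_ds;
    length id $20 age 8 comment $200;   /* final lengths, declared before SET */
    set ds1-ds5;
    format _character_;                 /* no format named: removes formats from character variables */
run;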
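
The two recipes above, type conversion and stacking, can be tried together in a small self-contained sketch. Everything here is hypothetical: the datasets ds_a, ds_b, ds_b_fixed, and stacked_demo, and the variables age_txt and age_label, are invented for illustration. It assumes one input stores age as text, so the text is converted with input() before stacking.

```sas
/* Hypothetical input 1: short id carrying a $5. format, numeric age */
data ds_a;
    length id $5;
    format id $5.;
    id = 'A0001';
    age = 34;
run;

/* Hypothetical input 2: longer id, age stored as text */
data ds_b;
    length id $20 age_txt $3;
    id = 'B0000000000000000002';
    age_txt = '41';
run;

/* Convert the text age to numeric so both inputs agree on type */
data ds_b_fixed;
    set ds_b;
    age = input(age_txt, 3.);    /* char -> numeric */
    drop age_txt;
run;

/* Declare the final lengths first, stack, then clear character formats */
data stacked_demo;
    length id $20 age 8;
    set ds_a ds_b_fixed;
    format _character_;          /* drops the $5. format inherited from ds_a */
    age_label = put(age, 3.);    /* numeric -> char */
run;
```

Without the length statement, id would take its length from ds_a, the first dataset read, and the longer value from ds_b_fixed would be truncated.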