Batch files - How To ... Validate Input From SET /P

Batch files are extremely "weakly typed" (to use the understatement of the millennium): everything is a string, and a string can be everything, command as well as data, or even both, and there is no way to distinguish between the two.
This makes batch code insertion more than just a potential threat.

It all comes down to input validation, and the good news is: input validation turns out to be remarkably simple, once you know how.

Safely using SET /P to prompt for input

The Problem: Code Insertion

The command SET /P Input=Please type anything and press Enter: will prompt for input (Please type anything and press Enter: ), accept all keyboard input until Enter is pressed, and store the input in environment variable Input.

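Type abc&ping ::1 at the prompt, then enter the command SET Input to see if your input was stored in the environment variable Input; it should show Input=abc&ping ::1, so the string itself was stored intact.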

However, when you enter the command ECHO %Input%, the output is just abc, followed by the output of a ping to ::1 (the IPv6 loopback address).

The explanation is that the ampersand acts as a "command separator": everything following a (single) ampersand will be interpreted as a new command.
ECHO %Input% is evaluated to ECHO abc&ping ::1, which is in turn interpreted as two separate commands: ECHO abc, followed by ping ::1.
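The effect can be reproduced without typing anything. The following is a minimal sketch, not part of the original text: the value is assigned directly instead of prompted for, and a harmless ECHO stands in for the ping command (both are assumptions made for illustration).

```batch
@ECHO OFF
REM Hypothetical stand-in for typed input; the quotes around the whole
REM assignment keep the ampersand literal while the value is stored.
SET "Input=abc&ECHO injected"

REM The stored string is intact, ampersand included.
SET Input

REM Percent expansion happens before the line is split into commands,
REM so the ampersand now separates two commands.
ECHO %Input%
```

With ECHO OFF this should display Input=abc&ECHO injected, then abc, then injected on a line of its own: the second ECHO was executed as a command instead of being printed as data.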

For the unquoted %CD% vulnerability, the solution is to put the variable between doublequotes: "%CD%"; this works because %CD% will never contain doublequotes itself (unless of course this dynamic variable has been set to a static value).
With SET /P however, the user can type anything when prompted, including ampersands and stray doublequotes.

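With the earlier input abc&ping ::1, the command ECHO "%Input%" simply prints "abc&ping ::1", because the ampersand is now inside a quoted string.
Looking good so far...
But now, repeat the SET /P command, enter abc"&ping ::1&echo "oops and see what happens with ECHO "%Input%": the stray doublequotes pair up with the ones we added, so both ampersands end up outside the quoted strings and act as command separators again. "abc" is echoed, ping ::1 is executed, and "oops" is echoed.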
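The quoted variant can be sketched in the same way. This is an illustration under assumptions, not the original listing: the value is assigned directly, carets are used only to get the doublequotes and ampersands into the variable, and ECHO injected stands in for the ping command.

```batch
@ECHO OFF
REM Hypothetical stand-in for typed input. Each caret makes the next
REM character literal while the value is being stored.
SET Input=abc^"^&ECHO injected^&ECHO ^"oops

REM After percent expansion the stray doublequotes pair up with the
REM added ones, which leaves both ampersands unquoted and active.
ECHO "%Input%"
```

If the quoting really protected the input, this would print a single line. Instead the line is expected to split into three commands, with the middle one being the injected ECHO.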

The Solution: Delayed Variable Expansion

Until recently I believed that safely parsing the input is a cat and mouse game we were never going to win.

Then Kang-Che Sung provided an ingenious and remarkably simple solution to the problem: use delayed variable expansion!
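A minimal sketch of the technique might look like this; the prompt text and the "You typed:" prefix are only illustrative choices:

```batch
@ECHO OFF
REM Enable !variable! syntax for the rest of this script.
SETLOCAL EnableDelayedExpansion

REM Clear any previous value, then prompt.
SET "Input="
SET /P "Input=Please type anything and press Enter: "

REM !Input! is resolved after the line has been split into commands,
REM so ampersands and doublequotes in the input are treated as text.
ECHO You typed: !Input!

ENDLOCAL
```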

Try it, try to insert code using ampersand, percent signs, exclamation marks, doublequotes... I could not make this code fail, so far.

To understand why it works, we need to understand how the command interpreter handles command lines.

Quoting Timothy Hill in his book Windows NT Shell Scripting (text marked red added by yours truly):

1. All parameter and variable references are resolved [...].

2. Compound commands are split into individual commands and each is then individually processed according to the following steps [...]. Continuation lines are also processed at this step.

3. Delayed variable references are resolved. ⁽¹⁾

4. The command is split into the command name and any arguments.

5. If the command name does not specify a path, the shell attempts to match the command name against the list of internal shell commands. If a match is found, the internal command executes. Otherwise, the shell continues to step 5.

6. If the command name specifies a path, the shell searches the specified path for an executable file matching the command name. If a match is found, the external command (the executable file) executes. If no match is found, the shell reports an error and command processing completes.

7. If the command name does not specify a path, the shell searches the current directory for an executable file matching the command name. If a match is found, the external command (the executable file) executes. If no match is found, the shell continues to step 7.

8. The shell now searches each directory specified by the PATH environment variable, in the order listed, for an executable file matching the command name. If a match is found, the external command (the executable file) executes. If no match is found, the shell reports an error and command processing completes.

Delayed variable expansion takes place in step 3 (marked red), whereas code insertion can only take place in step 2.

Notes:	1:	Delayed variable expansion (step 3, marked red) is not mentioned in Tim Hill's book; this step was explained to me by Kang-Che Sung.
	2:	Steps 5..8 describe how the executable is found for the specified command; these steps are not relevant for this explanation of code insertion.

Quoting from Kang-Che Sung's explanation:

1. Percent expansion %var% and %1 ... %9
   (This step briefly explains why FOR tokens needs double-percent signs in scripts. Because this step will be handled differently if CMD is in interactive mode, i.e. the mode that you type commands)

2. Command splitting. Handles "(" ")" "&" "|" "<" ">" (and newline) delimiters. (This is where command injection takes place.)

3. Delayed Expansion. At this time ( ) & | < > is no longer special and so won't split into additional commands or pipes.

4. Word splitting. Technically it just identify which part is the command (%0) and which part are arguments (%*). Unlike Unix scripting, where all pieces of arguments ($1 $2 $3 ...) are split at this point, Windows shells just pass the whole string of arguments to the external (callee) program and let it handle by itself. (This explains why Unix shell scripts allows "$@" in addition to "$*".)

The rest, steps 5 to 8, are about resolving paths to command, and so are irrelevant for us.
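The difference in ordering can be made visible by expanding the same variable both ways. This is a small sketch under assumptions: the values are assigned directly rather than typed, and ECHO injected is a harmless placeholder for whatever command an attacker might insert.

```batch
@ECHO OFF
SETLOCAL EnableDelayedExpansion

REM Hypothetical input containing a command separator.
SET "Input=abc&ECHO injected"

REM Percent expansion comes first and command splitting second, so the
REM ampersand splits this line into two commands.
ECHO %Input%

REM Delayed expansion comes after command splitting, so the ampersand
REM arrives too late to split anything and is echoed as text.
ECHO !Input!

REM The same holds for the other delimiters.
SET "Input=(a|b)<c>d"
ECHO !Input!

ENDLOCAL
```

The first ECHO should produce two lines (abc and injected), while the two delayed expansions should print abc&ECHO injected and (a|b)<c>d unchanged.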

Before processing the acquired input, it may be wise to reject input that contains "questionable" characters.

The following code will check if %Input% contains doublequotes, and if so, wipe the input:
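A sketch of such a check, assuming the input is in a variable named Input as in the examples above (the prompt text and the final status message are illustrative additions):

```batch
@ECHO OFF
SET "Input="
SET /P "Input=Please type anything and press Enter: "

REM SET Input prints the variable without expanding it on a command
REM line; FIND """" searches that output for one literal doublequote.
REM 2>NUL hides the error message SET gives when Input is empty.
SET Input 2>NUL | FIND """" >NUL

REM FIND returns 0 when it found a match: wipe the input in that case.
IF NOT ERRORLEVEL 1 SET "Input="

IF DEFINED Input (ECHO Input accepted) ELSE (ECHO Input rejected or empty)
```

Note that SET Input lists every variable whose name starts with Input, so a doublequote in, say, a variable named InputFile would also trigger the wipe. Choosing a variable name that is not a prefix of any other variable avoids this.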

Likewise, you can use FIND "&" to test for the occurrence of ampersands, or FINDSTR /R /C:"[&""|()]" to test for all "questionable" characters:
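The FINDSTR variant could be sketched as follows. The pattern is copied from the sentence above, and the two-line form with IF NOT ERRORLEVEL 1 follows the advice in note 4; I have not verified the behavior on every Windows version, so treat this as a starting point for your own tests.

```batch
@ECHO OFF
SET "Input="
SET /P "Input=Please type anything and press Enter: "

REM Pattern taken from the text above: ampersand, doublequote, pipe and
REM parentheses. FINDSTR output is deliberately not redirected here,
REM see note 3, so a matching line is displayed.
SET Input 2>NUL | FINDSTR /R /C:"[&""|()]"

REM Return code 0 means a questionable character was found.
IF NOT ERRORLEVEL 1 SET "Input="
```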

Alternatives for SET /P

You don't always need "free" input, often you only want the user to select from a list of choices.
In that case, consider using CHOICE instead of SET /P
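As an illustration, a three-way prompt with CHOICE might look like this. The key letters, message text and label names are arbitrary examples, and the /C, /N and /M switches assume the CHOICE.EXE that ships with Windows Vista and later.

```batch
@ECHO OFF
REM /C lists the accepted keys, /N hides the default key list,
REM /M sets the prompt text. No free text can be entered at all.
CHOICE /C YNC /N /M "Continue? [Y]es, [N]o, [C]ancel: "

REM CHOICE returns the position of the pressed key in the /C list.
REM IF ERRORLEVEL n means "n or higher", so test the highest first.
IF ERRORLEVEL 3 GOTO Cancel
IF ERRORLEVEL 2 GOTO No
ECHO You chose Yes
GOTO :EOF

:No
ECHO You chose No
GOTO :EOF

:Cancel
ECHO Cancelled
```

Because the result is a number rather than a string typed by the user, there is nothing to validate and nothing to inject.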

If you do need "free" input, either use the routine above, or a different scripting language, or my InputBox.exe (which removes doublequotes, ampersands and redirection characters from the input).

Notes:	3:	Scott Sumner noted that, when redirecting `FINDSTR`'s output to `NUL` in a batch file, the return code ("ErrorLevel") will always be 0. That is why, for `FINDSTR`, redirection to `NUL` is done after checking the return code first.
	4:	The shorter code `SET Input | FINDSTR /R /C:"[&""|()]" && SET Input=` does not work in Windows 7 (not tested in other Windows versions). Though `(SET Input | FINDSTR /R /C:"[&""|()]") && SET Input=` does work, the nested parentheses required for redirection to NUL might lead to new issues. My advice is to keep it simple and safe, and use the `IF NOT ERRORLEVEL 1` test.
