Word2Txt, Version 1.05
Extract plain text from a Word document and send it to the screen
Usage:  Word2Txt "wordfile" [ encoding | /D ]
   or:  Word2Txt /E

Where:  wordfile  is the path of the Word document to be read
                  (no wildcards allowed)
        encoding  force use of alternative encoding for plain
                  text, e.g. UTF-8 to preserve accented characters
                  or IBM437 to convert Unicode quotes to ASCII
        /D        use the encoding specified in the document file
                  (for .DOCX and .ODT only, if Word isn't available)
        /E        list all available encodings
Notes: If a "regular" (MSI-based) Microsoft Word installation (2007 or
later) is detected, this program will use Word to read the text from
the Word file, which may be in ANY file format recognized by Word.
If Word was already active when this program is started, any other
opened document(s) will be left alone, and only the document opened
by this program will be closed.
If Word is not available, or if it encounters unreadable content
(i.e. the file is corrupted), the text can still be extracted, but
only from .DOC, .DOCX, .ODT, .RTF and .WPD files.
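The help text does not say how the fallback extraction works. As a rough illustration for the .DOCX case only, here is a minimal Python sketch that relies on the fact that a .DOCX file is a ZIP archive whose main body is stored in word/document.xml. The function name docx_to_text is hypothetical, and this is not necessarily how Word2Txt itself does it.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(source):
    """Return the plain text of a .docx file (hypothetical helper).

    `source` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    # Each w:p element is a paragraph; its visible text sits in w:t elements.
    for para in root.iter(W_NS + "p"):
        runs = [node.text or "" for node in para.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)
```

This sketch ignores headers, footers, footnotes, tabs and line breaks, and paragraphs nested inside text boxes may be reported more than once, so a real extractor would need more care.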
If the specified encoding does not match any available encoding name,
the program will try again, ignoring dashes; if that does not provide
a match, the program will try matching the specified encoding with
the available encodings' codepages.
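The three-step lookup described above could be sketched roughly as follows in Python. The function name resolve_encoding, the sample table, the case-insensitive comparison, and the assumption that a codepage is given as a bare number such as 1252 are all illustrative choices, not taken from the program's source.

```python
# Small sample of (name, codepage) pairs; a real list would come from
# whatever the /E switch prints.
SAMPLE_ENCODINGS = [
    ("utf-8", 65001),
    ("IBM437", 437),
    ("windows-1252", 1252),
]


def resolve_encoding(requested, available):
    """Return the name of the first matching encoding, or None."""
    wanted = requested.strip().lower()

    # Pass 1: match the encoding name as given (case-insensitive here,
    # which is an assumption).
    for name, _codepage in available:
        if name.lower() == wanted:
            return name

    # Pass 2: try again, ignoring dashes on both sides.
    wanted_nodash = wanted.replace("-", "")
    for name, _codepage in available:
        if name.lower().replace("-", "") == wanted_nodash:
            return name

    # Pass 3: compare the request with the codepage numbers.
    for name, codepage in available:
        if wanted == str(codepage):
            return name

    return None
```

With this sample table, "UTF-8" matches in the first pass, "utf8" only in the second, and "1252" only in the third.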
This program requires .NET 4.5.
Return code ("errorlevel") 0 means Word encountered no errors and
some text was extracted from the file; 1 means Word is not available
or the file was corrupted; 2 means either command line errors or the
program failed to extract any text.
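A calling script might interpret these return codes along the following lines. This is a minimal Python sketch: the executable name Word2Txt.exe and the helper names run_word2txt and describe_exit_code are assumptions made for illustration.

```python
import subprocess

# Meanings as documented in the help text above.
EXIT_MEANINGS = {
    0: "Word reported no errors and some text was extracted",
    1: "Word is not available or the file was corrupted",
    2: "command line error, or no text could be extracted",
}


def describe_exit_code(code):
    """Translate a Word2Txt return code into a short description."""
    return EXIT_MEANINGS.get(code, "undocumented return code %d" % code)


def run_word2txt(wordfile, encoding=None, exe="Word2Txt.exe"):
    """Run Word2Txt and return (return code, raw output bytes).

    The executable name is an assumption; adjust `exe` as needed.
    """
    args = [exe, wordfile]
    if encoding:
        args.append(encoding)  # an encoding name or /D
    result = subprocess.run(args, capture_output=True)
    # The output is left as bytes because its encoding depends on the
    # encoding argument passed to the program.
    return result.returncode, result.stdout
```

Since code 2 is the one that covers failure to extract any text, a caller may still want to inspect the output when the code is 1.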
Written by Rob van der Woude
https://www.robvanderwoude.com
page last uploaded: 2025-10-23