Office automation: Converting doc to docx
With the advent of Office 2007, Microsoft switched over to its OpenXML standard for office documents - which is quite a subject in itself, one which I will blog about sometime in the future.
This post, however, is about converting older Word documents to the new format. I've seen a few sites that offer a conversion service for a fee, which makes me wonder whether that's even legal, seeing as Microsoft provides a free tool (ofc.exe) as part of its Office Migration Planning Manager, which is available from this link.
It's a rather funny utility, which works in conjunction with the Office 2007 compatibility pack.
The compatibility pack mainly lets us open OpenXML documents in older versions of Office, minus the functionality that is new in Office 2007.
Getting back to the ofc tool: it ships with a file called ofc.ini, which contains a number of settings you will need to set, most notably the following options in the [ConversionOptions] and [FoldersToConvert] sections:
[ConversionOptions]
; FullUpgradeOnOpen: if set to 1, Word documents will be fully converted to the OpenXML format
; if set to 0 (default), Word documents will be saved in the OpenXML format in compatibility mode
; Not applicable to Excel or PowerPoint files.
FullUpgradeOnOpen=0
[FoldersToConvert]
; The Converter will attempt to convert all supported files in the specified folders
; (do not include if specifying FileListFolder)
;fldr=C:\Documents and Settings\Administrator\My Documents
fldr=c:\abc
Alternatively, we can do this programmatically using the Office 2007 Interop assemblies (available here), which is useful if we want to do a bit more than merely convert documents to the new standard.
In this example, we simply convert a folder of older documents to the new docx format:
using System.IO;
using System.Reflection;
using Word = Microsoft.Office.Interop.Word;

class Program
{
    static void Main(string[] args)
    {
        Word._Application application = new Word.Application();
        object missing = Missing.Value;
        object fileformat = Word.WdSaveFormat.wdFormatXMLDocument;
        DirectoryInfo directory = new DirectoryInfo(@"c:\abc");

        foreach (FileInfo file in directory.GetFiles("*.doc", SearchOption.AllDirectories))
        {
            // GetFiles("*.doc") can also return files with longer extensions
            // such as .docx, so check for an exact match.
            if (file.Extension.ToLower() == ".doc")
            {
                object filename = file.FullName;
                // Swap only the extension; the rest of the path is left untouched.
                object newfilename = Path.ChangeExtension(file.FullName, ".docx");

                Word._Document document = application.Documents.Open(ref filename,
                    ref missing, ref missing, ref missing, ref missing, ref missing,
                    ref missing, ref missing, ref missing, ref missing, ref missing,
                    ref missing, ref missing, ref missing, ref missing, ref missing);

                // Fully convert the document to the new OpenXML format.
                document.Convert();
                document.SaveAs(ref newfilename, ref fileformat, ref missing, ref missing,
                    ref missing, ref missing, ref missing, ref missing,
                    ref missing, ref missing, ref missing, ref missing,
                    ref missing, ref missing, ref missing, ref missing);
                document.Close(ref missing, ref missing, ref missing);
                document = null;
            }
        }

        application.Quit(ref missing, ref missing, ref missing);
        application = null;
    }
}
Notice the call to document.Convert(): this method tells the interop assembly that the document needs to be fully converted to the new OpenXML format. You might want to omit it if you're planning to support previous versions of Office using the compatibility pack.
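Deriving the target file name is the one part of the loop that can be checked without Word installed, and it is easy to get subtly wrong (a folder name containing ".doc", or a stray .docx picked up by the search). Here is a minimal sketch of how that step might be isolated. The class and method names (DocPaths, IsLegacyDoc, ToDocxPath) are made up for illustration and are not part of the interop assemblies; only the standard System.IO.Path calls are real.

```csharp
using System;
using System.IO;

// Hypothetical helper, for illustration only.
static class DocPaths
{
    // True only when the extension is exactly ".doc" (ignoring case),
    // so ".docx", ".docm" and ".dot" files are skipped.
    public static bool IsLegacyDoc(string path)
    {
        return string.Equals(Path.GetExtension(path), ".doc",
            StringComparison.OrdinalIgnoreCase);
    }

    // Replaces only the final extension; folder names and the casing
    // of the rest of the path are left as they were.
    public static string ToDocxPath(string path)
    {
        return Path.ChangeExtension(path, ".docx");
    }
}
```

Inside the loop this would be used as `if (DocPaths.IsLegacyDoc(file.FullName))` followed by `object newfilename = DocPaths.ToDocxPath(file.FullName);`.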
Update 2010/09/18
C# 4.0 brought a number of improvements to COM interop, such as optional parameters and the ability to omit ref on COM calls. Thanks to these, the preceding snippet can be rewritten like this:
static void Main(string[] args)
{
    Word._Application application = new Word.Application();
    object fileformat = Word.WdSaveFormat.wdFormatXMLDocument;
    DirectoryInfo directory = new DirectoryInfo(@"c:\abc");

    foreach (FileInfo file in directory.GetFiles("*.doc", SearchOption.AllDirectories))
    {
        // GetFiles("*.doc") can also return files with longer extensions
        // such as .docx, so check for an exact match.
        if (file.Extension.ToLower() == ".doc")
        {
            object filename = file.FullName;
            // Swap only the extension; the rest of the path is left untouched.
            object newfilename = Path.ChangeExtension(file.FullName, ".docx");

            // Optional parameters no longer need to be passed explicitly.
            Word._Document document = application.Documents.Open(filename);
            document.Convert();
            document.SaveAs(newfilename, fileformat);
            document.Close();
            document = null;
        }
    }

    application.Quit();
    application = null;
}
Posted by - Christoff Truter
Date - 2008-08-18 11:36:53