Showing posts with label OpenXML.

Wednesday, July 16, 2014

Using OpenXML to load an Excel Worksheet into a DataTable (or just how different OpenXML is from the old Excel API we're used to)

dotnet thoughts - Read Excel as DataTable using OpenXML and C#

In the current project we were using OpenXML extensively for reading Excel files. Here is the code snippet, which will help you to read / convert Excel files to DataTable.

image

..."

You've heard me whine about how, while OpenXML is cool and it's nice that we can access Office 2007+ files without Office or third-party apps, the API is pretty darn different for traditional Office Object Model users? This screenshot shows why... Parts, SharedStringTables, oh my... It's not hard, it just takes a while to wrap your head around.
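To make the "Parts, SharedStringTables" point concrete, here's a rough sketch of the same idea (worksheet in, rows out) using only Python's standard library rather than the Open XML SDK, so it is not the code from the linked post. The demo workbook, the hard-coded part paths and the `read_rows` helper are all assumptions for illustration; a real reader would find the sheet part through the package relationships and would handle blank cells, dates and styles.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
NS = {"m": MAIN}

# Hypothetical demo parts; a real workbook contains many more parts than these two.
SHARED_STRINGS = (
    f'<sst xmlns="{MAIN}">'
    "<si><t>Name</t></si><si><t>Qty</t></si><si><t>Widget</t></si>"
    "</sst>"
)
SHEET = (
    f'<worksheet xmlns="{MAIN}"><sheetData>'
    '<row r="1"><c r="A1" t="s"><v>0</v></c><c r="B1" t="s"><v>1</v></c></row>'
    '<row r="2"><c r="A2" t="s"><v>2</v></c><c r="B2"><v>42</v></c></row>'
    "</sheetData></worksheet>"
)


def build_demo_workbook():
    """Zip the two demo parts in memory (enough for this sketch, not for Excel)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("xl/sharedStrings.xml", SHARED_STRINGS)
        z.writestr("xl/worksheets/sheet1.xml", SHEET)
    return buf.getvalue()


def read_rows(data, sheet_part="xl/worksheets/sheet1.xml"):
    """Return the sheet as a list of rows, resolving shared-string indexes."""
    with zipfile.ZipFile(io.BytesIO(data)) as z:
        sst = ET.fromstring(z.read("xl/sharedStrings.xml"))
        sheet = ET.fromstring(z.read(sheet_part))
    # Each <si> is one shared string; join its <t> runs into plain text.
    strings = [
        "".join(t.text or "" for t in si.iter("{%s}t" % MAIN))
        for si in sst.findall("m:si", NS)
    ]
    rows = []
    for row in sheet.findall("m:sheetData/m:row", NS):
        values = []
        for cell in row.findall("m:c", NS):
            v = cell.find("m:v", NS)
            text = v.text if v is not None else ""
            if cell.get("t") == "s":  # the value is an index into the string table
                text = strings[int(text)]
            values.append(text)
        rows.append(values)
    return rows


rows = read_rows(build_demo_workbook())
print(rows)
```

The thing to notice is that text cells don't hold their text: they hold an index into a separate shared strings part, which is exactly the kind of indirection the old Object Model hid from us.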

Thursday, June 26, 2014

Being open to opening OpenXML documents in Visual Studio with the now open source Open XML Package Editor for VS 2012/2013

OpenXML Developer - Open XML Package Editor Released for VS2012 and VS2013


Chris Rae recently announced on his blog that we have released a new version of the Open XML Package Editor, which now works on Visual Studio 2012 and 2013!

As anyone knows who has seen any of my screen-casts, the Open XML Package Editor is my go-to tool for opening and editing Open XML documents. It is a vital tool for Open XML Developers. After installing, you can drag and drop Open XML documents onto Visual Studio, navigate through the various parts, open parts for editing in the very excellent XML editor that is in Visual Studio, and modify any relationship in the package. Unfortunately, until this release, you had to keep a copy of Visual Studio 2010 around in order to use the tool, a pain to say the least. Well, no more. Now it works with the latest versions of Visual Studio, and furthermore, we will never get into the situation again where it only works for previous versions of Visual Studio. Since it is open source, you, I, or anyone else can quickly do the port to new versions of VS. It now supports Visio's new VSDX format and has some other minor fixes and enhancements.

We have published the code on GitHub under the Apache 2.0 license. If you just want to download the new version of the Package Editor, it's here on the Visual Studio Gallery. [GD: Post Leached in Full]

We all know that OpenXML documents (DocX, XlsX, PptX, *X, etc, etc) are really just zip file containers with standardized manifests, contents and packaging, right? (Don't believe me? Rename a .DocX to .zip and see).
And sure, you can open and spelunk the unzipped contents of the document, but it's not the easiest way. Instead you've got to use an OpenXML explorer, one like this one, the Open XML Package Editor. And hey you can even stay in your favorite tool of choice (Visual Studio of course!). And now that it's open source, it's even cooler!
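If you'd rather see the "it's just a zip" claim in code than rename a file, here's a minimal sketch using Python's standard `zipfile` module. The in-memory package and the `list_parts` helper are made up for illustration (the package has the shape of a DocX but is not a valid Word document); point the same function at the bytes of a real .docx and you'll get its real part names.

```python
import io
import zipfile


def build_demo_package():
    """Build a tiny stand-in package in memory (illustrative, not a valid document)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
        z.writestr("[Content_Types].xml", "<Types/>")
        z.writestr("_rels/.rels", "<Relationships/>")
        z.writestr("word/document.xml", "<document/>")
    return buf.getvalue()


def list_parts(package_bytes):
    """Return the entry names inside the package, in stored order."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as z:
        return z.namelist()


parts = list_parts(build_demo_package())
for name in parts:
    print(name)
```

That works for a quick look, but you still have to read raw XML and follow the relationships by hand, which is where a package editor earns its keep.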

 

Related Past Post XRef:
Open Sesame - Open XML SDK is now open source

Using OpenXML SDK to generate Word documents via templates (and without Word being installed)
Checking for Microsoft Word DocX/DocM Revisions/Track Changes without using Word... (via OpenXML SDK, LINQ to XML or XML DOM)
LINQ to XlsX... Using VB.Net, LINQ, the OpenXML SDK and a little C# helper, to query an Excel XlsX
Using native OpenXML to create an XlsX (Which provides an example of why I highlight tools that make OpenXML easier...)
Generating Xlsx's on the Server? You're using OpenXML, right? With help from the PowerTools for OpenXML?

Official boat-load, as in supertanker, sized OpenXML content list (Insert "One OpenXML content list to rule them all" here)
So how do I get from here to OpenXML? Got a map for you, an Open XML SDK Blog Map…
Where to go to scratch your OpenXML dev info itch…
"Open XML Explained" Free eBook (PDF)
The Noob's Guide to Open XML Dev (If you know how to spell OpenXML but that's about it, this is your Getting Started guide...)

Reusing the PowerShell PowerTools for Open XML in your C# or VB.Net world
PowerShell, OpenXML, WMI and the PowerTools for OpenXML = Doc generation for our inner geek
Because it’s a PowerShell kind of day… PowerTools for Open XML V1.1 Released
OpenXML PowerTools updated – Cell your Excel via PowerShell
Powering into OpenXML with PowerShell

Open XML SDK 2.0 for Microsoft Office Released – Automate Office documents without Office

Open XML 2.0 Code Snippets for VS2010 (and VS2008 too)
Open XML Format SDK 2.0 Code Snippets for Visual Studio 2008 – 52 C#/VB Code Snippets to help ease your Open XML coding
Open XML File Format Code Snippets for Visual Studio 2005 (Office 2007 NOT required)

Open XML SDK v1 Released

OpenXML Viewer 1.0 Released – Open source DocX to HTML conversion, with IE, Firefox and Opera (and/or command line) support

Wednesday, June 25, 2014

Open Sesame - Open XML SDK is now open source

Open XML SDK goes open source

Brian Jones is the principal GPM of the Office Development Platform.

Today is an exciting day for Office developers—we’re open sourcing the Open XML SDK on GitHub! We’re eager to work with the community on continual improvements to the SDK’s functionality and scalability, and to explore new platforms and technologies to support developer platforms such as Mono, an open source implementation of .NET Framework. It’s been over seven years since we released the initial preview of the Open XML SDK, and over that time it’s been one of the key tools developers have used for building solutions that consume, create, and modify Office documents.

I encourage you to head over to GitHub and take a look at the project. We’d love your participation! We posted it under the .NET Foundation. In addition to the SDK itself, we opened all of the Open XML conceptual documentation in MSDN for public review/contributions. A living copy of the docs is now in GitHub for you to edit and review. Pull requests welcome!

The Open XML SDK is a key piece of our overall developer platform. The trends around mobile apps connected to the cloud have expanded the role that Office documents can play in solutions. Many of our Fortune 100 customers have built solutions leveraging the SDK, especially in the banking and health care sectors. We average over 10,000 downloads a month, and the SDK is also widely distributed in other software packages, such as accounting tools.

...

In another post, we provided a great drilldown into the architecture of the SDK and a ton of great examples.

As you’ve probably noticed lately, we’re making a big push to open a lot of our developer technologies to the community. We have a few really cool projects already in GitHub, like the Office 365 SDK for Android Preview, as well as the Open XML package editor. We’ve shifted the Office extensibility model to use open standards like HTML and JavaScript, and we’re exposing Office 365 data (documents, mail, and calendars) through RESTful APIs leveraging oAuth. You’ll see us continue to do more of this, and we’d love to hear any feedback you might have on our UserVoice.

If you’re already an Open XML developer, this is definitely an exciting day. If you haven’t built solutions yet on Open XML, I strongly encourage you to go take a look and try out some of the examples. You’ll be surprised by what you can build.

..."

The Microsoft open source wagon just keeps on rolling! The OpenXML spec has been open for a while and now the SDK is too. Heck I wonder what else is going to be opened up? The Fluid UI? Windows Live Writer (please, please, please)? Guess 2014 is going to officially be "The Year Microsoft Opened"...

 

Related Past Post XRef:

Using OpenXML SDK to generate Word documents via templates (and without Word being installed)
Checking for Microsoft Word DocX/DocM Revisions/Track Changes without using Word... (via OpenXML SDK, LINQ to XML or XML DOM)
LINQ to XlsX... Using VB.Net, LINQ, the OpenXML SDK and a little C# helper, to query an Excel XlsX
Using native OpenXML to create an XlsX (Which provides an example of why I highlight tools that make OpenXML easier...)
Generating Xlsx's on the Server? You're using OpenXML, right? With help from the PowerTools for OpenXML?

Official boat-load, as in supertanker, sized OpenXML content list (Insert "One OpenXML content list to rule them all" here)
So how do I get from here to OpenXML? Got a map for you, an Open XML SDK Blog Map…
Where to go to scratch your OpenXML dev info itch…
"Open XML Explained" Free eBook (PDF)
The Noob's Guide to Open XML Dev (If you know how to spell OpenXML but that's about it, this is your Getting Started guide...)

Reusing the PowerShell PowerTools for Open XML in your C# or VB.Net world
PowerShell, OpenXML, WMI and the PowerTools for OpenXML = Doc generation for our inner geek
Because it’s a PowerShell kind of day… PowerTools for Open XML V1.1 Released
OpenXML PowerTools updated – Cell your Excel via PowerShell
Powering into OpenXML with PowerShell

Open XML SDK 2.0 for Microsoft Office Released – Automate Office documents without Office

Open XML 2.0 Code Snippets for VS2010 (and VS2008 too)
Open XML Format SDK 2.0 Code Snippets for Visual Studio 2008 – 52 C#/VB Code Snippets to help ease your Open XML coding
Open XML File Format Code Snippets for Visual Studio 2005 (Office 2007 NOT required)

Open XML SDK v1 Released

OpenXML Viewer 1.0 Released – Open source DocX to HTML conversion, with IE, Firefox and Opera (and/or command line) support

Friday, February 21, 2014

Microsoft Open Specifications Posters v2 released (Think "Wow, that's a lot of specs" Posters)

Microsoft Downloads - Open Specifications Posters

The Open Specifications Posters (PDF format) make it easy for interoperability developers to explore the Open Specifications overview documents for Office client, Lync, SharePoint, Office file formats, Exchange Server, SQL Server, and Windows.

Version: 5.0

Date Published: 2/21/2014

ExchangeOpenSpecPoster.pdf, 556 KB

MicrosoftOpenSpecPoster - Accessiblility Version.pdf, 336 KB

OfficeLyncOpenSpecPoster.pdf, 669 KB

SharePointOpenSpecPoster.pdf, 606 KB

SQLOpenSpecPoster.pdf, 1,011 KB

WindowsOpenSpecPoster.pdf, 1.0 MB

The posters display, by functional area, the protocols, file formats, and related technologies, as described in each overview document. A high-contrast poster that contains listings for all functional areas is also provided for those with visual accessibility needs.

Some cube art to help when you get visited by the "Microsoft is closed and the devil" guy (I know you know that guy...)

Here's a snap of the Windows PDF;

image

 

Related Past Post XRef:
Office/Exchange File Format,Specification and Protocol Documentation refreshed
Microsoft Format and Specification Documentation 0712 Refresh (Think Office 2013 CP update). Oh and some SharePoint Doc's too
Microsoft Format and Specification Documentation Refresh ("Significantly changed technical content") [Updated: Includes updates for Office 15 Technical Preview ]
Microsoft Office File Formats and Microsoft Office Protocols Documentation Refreshed
Microsoft Office File Formats and Protocols documentation updated for Office 2010 (Think “Now with added ‘X’ flavor… DocX, PptX, XlsX, etc”)

Microsoft Open Specifications Poster

XAML Language Specification (as in the full XAML, WPF and Silverlight XAML Specs)

"Microsoft SQL Server Data Portability Documentation"

MS-PST file format specification released. Yep, the full and complete specification for Outlook PST’s is now just a download away.
Microsoft Office (DOC, XLS, PPT) Binary File Format Specifications Released – We’re talking the full technical specification… (The [MS-DOC].pdf alone is 553 pages of very dense specification information)
DOC, XLS and PPT Binary File Format Specifications Released (plus WMF, Windows Compound File [aka OLE 2.0 Structured Storage] and Ink Serialized Format Specifications and Translator to XML news)

Thursday, January 16, 2014

Third Party Office Library or OpenXML?

CodePlex - Aspose for OpenXML

The Open XML SDK for Office simplifies the task of manipulating Open XML packages and the underlying Open XML schema elements within a package. The classes in the Open XML SDK encapsulate many common tasks that developers perform on Open XML packages, so that you can perform complex operations with just a few lines of code.

Using the classes in the Open XML SDK 2.5 is simple. When you have installed the Open XML SDK 2.5, open your existing project or application in Visual Studio, or create a new project or application. Then, in your project or application, add references to the following components:

  • DocumentFormat.OpenXml
  • WindowsBase

To add a reference in a Microsoft Visual Studio project

  • In Solution Explorer, right-click References and then click Add Reference. If the References node is not visible, click Project and then click Show All Files.
  • In the Add Reference dialog box, click .NET.
  • In the Component Name column, select the components (scroll if you need to), and then click OK.

This project covers the following topics:

What is the use of Aspose .NET Products?

Aspose are file format experts and provide APIs and components for various file formats including MS Office, OpenOffice, PDF and Image formats. These APIs are available on a number of development platforms including .NET frameworks – the .NET frameworks starting from version 2.0 are supported. If you are a .NET developer, you can use Aspose’s native .NET APIs in your .NET applications to process various file formats in just a few lines of codes. All the Aspose APIs don’t have any dependency over any other engine. For example, you don’t need to have MS Office installed on the server to process MS Office files. Below is a list of products we support for .NET developers:

..."

I've mentioned OpenXML in the past and that it's cool that you can use it to get all the deep deep data in Office *x files? Then you've also heard me say what a pain it can be if you're used to a more traditional Office Object Model. It's a completely different way of thinking about your documents... And doing that hurts my brain. So I go out of my way to find libraries that make it easier. One such, that we've bought in my day job, is Aspose. If you've read any MS dev mag, you've seen the ads for them.

I ran across this and sure, it's sales-ware, but it's still useful to OpenXML devs and does a good job of showing the differences between the two approaches...

OpenXML SDK Word Processing Code Snippets - Create a word processing document

image

IMHO, if you can, use a third party library, free or commercial. OpenXML might get the job done and it is free, but the time you spend on it isn't. (And remember, friends don't let friends use Office interop!)

Thursday, December 05, 2013

Free Export DataSet/DataTable/List<T> to Excel (without using or even having Excel installed)

Code Project - A free "Export to Excel" C# class, using OpenXML

It's amazing that even now, in 2013, there are so many developers still asking for help on how to write C# and VB.Net code, to export their data to Excel.

Even worse, a lot of them will stumble on articles suggesting that they should write their data to a comma-separated file, but to give the file an .xls extension.

So, today, I'm going to walkthrough how to use my C# "Export to Excel" class, which you can add to your C# WinForms / WPF / ASP.Net application, using one line of code.

Depending on whether your data is stored in a DataSet, DataTable or List<>, you simply need to call one of these three functions, and tell them what (Excel) filename you want to write to.

  • public static bool CreateExcelDocument<T>(List<T> list, string ExcelFilename)
  • public static bool CreateExcelDocument(DataTable dt, string ExcelFilename)
  • public static bool CreateExcelDocument(DataSet ds, string ExcelFilename)

...

And that's all you have to do. The CreateExcelDocument function will create a "real" Excel file for you.

For example, if you had created a DataSet containing three DataTables called

  • Drivers
  • Vehicles
  • Vehicle Owners

..then here's what your Excel file would look like. The class would create one worksheet per DataTable, and each worksheet would contain the data from that DataTable.

image

...

Look, friends don't let friends use Office InterOp... (omg, especially for server/automated ops!). There are any number of options now available, many free or reasonably priced. Just... don't.... do... it...
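And about that "CSV with an .xls extension" trick the article warns about: you can tell a real workbook from a renamed text file by sniffing the first bytes. This is a small sketch in Python's standard library; `sniff_format` and its return labels are names made up for this example. It relies on two facts: the legacy binary formats (XLS, DOC, PPT) are OLE compound files that start with the signature D0 CF 11 E0 A1 B1 1A E1, and the OpenXML formats are zip files that start with "PK".

```python
import io
import zipfile

# Signature at the start of every OLE compound file (legacy .xls/.doc/.ppt).
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def sniff_format(data):
    """Classify raw file bytes by container type, ignoring the file extension."""
    if data.startswith(OLE_MAGIC):
        return "ole-binary"
    if data.startswith(b"PK") and zipfile.is_zipfile(io.BytesIO(data)):
        return "zip-package"  # .xlsx/.docx/.pptx, or any other zip file
    return "other"  # e.g. comma-separated text renamed to .xls


def demo_zip():
    """Hypothetical one-entry zip used only to exercise the check."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("[Content_Types].xml", "<Types/>")
    return buf.getvalue()


fake_xls = b"Name,Qty\nWidget,42\n"
print(sniff_format(fake_xls))
print(sniff_format(demo_zip()))
```

Note that "zip-package" only tells you the container is a zip; confirming it's really a workbook means looking inside for the expected parts.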

Thursday, February 21, 2013

Excel with Excel without Excel... Seven Excel/XLS Libraries

Ginktage - 7 Libraries for Reading and Writing from/to Excel File in C#

A few months back, I was researching the possibilities of reading from and writing to Excel files from .NET (C#). At that time, I came across various libraries and SDKs available for reading and writing Excel files in C#.

In this blog post, I will list some of the libraries used for reading from and writing to Excel sheets using C#. Note that some of the libraries are free/open source and a few are commercial ones.

...

..."

If you've ever Automated/Inter-op'd Excel, you know that it can be "fun." The primary issue isn't the Object Model, it's that it's COM, and with .Net it can be a challenge to get the releasing of COM objects right. Then there's the license requirements, versions, etc.

So in short, if you don't need it, don't use it. As shown above, there's a number of libraries that you can use to read/write Excel files without having Excel installed...

Friday, October 19, 2012

Do you DSOFile? Tips for using it on an x64 OS and with Open XML (Office 2007+ XML formats)

Visual Studio Office Development (VSOD) Support Team - Considerations for using Dsofile on 64 bit OS

"There is a well-documented sample program, DSOFile, that enables reading and writing Office document properties (both old format files like *.xls, *.doc and *.ppt, as well as the new open xml formats like *.xlsx, *.docx and *.pptx). The DSOFile sample is compiled as 32 bit.

If you are using this sample in a 32 bit application on a machine with Office 2007 SP2 and 64 bit Windows, then DsoFile will not be able to fetch the properties of Open XML format files. This is because Office 2007 SP2 did not ship the 32 bit version of msoshext.dll (shell extension handler), which is the component DSOFile uses to read/write properties from Open XML files.

This issue is fixed in hotfix KB 2483216. This is also included in the Office 2007 Cumulative Update for February 2011 and subsequently in Office 2007 SP3.

The Office 2010 version of the hotfix is KB 2483230. This is also included in Office 2010 Cumulative Update for February 2011 and subsequently in Office 2010 SP1.

If you wish to use DSOFile from a 64 bit program, then you should recompile the DSOFile to target for 64 bit.

An alternative approach to using Dsofile would be to use Open Xml SDK (or System.IO.Packaging). A sample that demonstrates this is given below :-..."

Can you believe I've been using DSOFile for 8+ years? I first blogged about it in 2004... I find it interesting that it's still around and kind of, sort of, mostly supported. One thing I didn't know was that it supported (kind of, sort of) Open XML doc formats (if Office 2007/2010 is installed or the given msoshext.dll is available, which kind of defeats the purpose of a "no Office needed" COM component to access Doc Properties, but still...).

In any case, if you need access to the COM Doc/Summary Properties from Office binary doc's and/or the Open XML doc's, this COM component has been pretty rock solid for me for many years...
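For the Open XML side of things, the properties are just XML inside the package, which is why the quoted post can suggest the Open XML SDK or System.IO.Packaging as an alternative. Its sample is elided above, so what follows is not that sample but a rough stand-in using Python's standard library. The demo package, the property values and the `read_core_properties` helper are assumptions for illustration, and the part path `docProps/core.xml` is the conventional location that Office writes; a stricter reader would find the part through the package relationships instead of hard-coding it.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CP = "http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical core-properties part with made-up values.
CORE_XML = (
    f'<cp:coreProperties xmlns:cp="{CP}" xmlns:dc="{DC}">'
    "<dc:title>Budget</dc:title><dc:creator>Demo Author</dc:creator>"
    "</cp:coreProperties>"
)


def build_demo_docx():
    """Zip a single core-properties part in memory (illustrative only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("docProps/core.xml", CORE_XML)
    return buf.getvalue()


def read_core_properties(data, part="docProps/core.xml"):
    """Return the core properties as a dict keyed by local element name."""
    with zipfile.ZipFile(io.BytesIO(data)) as z:
        root = ET.fromstring(z.read(part))
    props = {}
    for child in root:
        local_name = child.tag.split("}", 1)[-1]  # drop the "{namespace}" prefix
        props[local_name] = child.text or ""
    return props


props = read_core_properties(build_demo_docx())
print(props)
```

This only covers the Open XML formats; for the old binary DOC/XLS/PPT files the properties live in OLE property sets, which is the job DSOFile was built for.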

What is DSO file again? The Dsofile.dll files lets you edit Office document properties when you do not have Office installed

The Dsofile.dll sample file is an in-process ActiveX component for programmers that use Microsoft Visual Basic .NET or the Microsoft .NET Framework. You can use this in your custom applications to read and to edit the OLE document properties that are associated with Microsoft Office files, such as the following:

  • Microsoft Excel workbooks
  • Microsoft PowerPoint presentations
  • Microsoft Word documents
  • Microsoft Project projects
  • Microsoft Visio drawings
  • Other files that are saved in the OLE Structured Storage format

The Dsofile.dll sample file is written in Microsoft Visual C++. The Dsofile.dll sample file demonstrates how to use the OLE32 IPropertyStorage interface to access the extended properties of OLE structured storage files. The component converts the data to Automation friendly data types for easier use by high level programming languages such as Visual Basic 6.0, Visual Basic .NET, and C#. The Dsofile.dll sample file is given with full source code and includes sample clients written in Visual Basic 6.0 and Visual Basic .NET 2003 (7.1).

...

Information about OLE document properties
Every OLE compound document can store additional information about the document in persistent property sets. These are collectively called the "Document Summary Properties." These property sets are managed by "COM/OLE" so that third-party clients can read this information without the aid of the main application that is responsible for the file.
To help developers that are interested in reading document properties, we have provided the following two interfaces to manage property sets:
  • IPropertySetStorage
  • IPropertyStorage
However, some high-level programming languages may have trouble using these interfaces because the interfaces are not Automation-compatible. To resolve this problem, developers can use an ActiveX DLL, such as the "DsoFile sample", to read and to write the most common properties that are used in OLE compound documents. This applies particularly to those that are used by Microsoft Office applications.
Use the DsoFile component from your custom application
The Dsofile.dll sample file reads and writes to both the standard properties and the custom properties from any "OLE Structured Storage" file. This includes, but is not limited to, the following:
  • Word documents
  • Excel workbooks
  • PowerPoint presentations

Because of the size and the speed of the Dsofile.dll sample file, the DLL can be much more efficient than trying to Automate Office to read document properties

...

 

Related Past Post XRef:
Download details: Developer Support OLE File Property Sample (DSOFILE) (DSOFile.DLL 2.0)
DSOFile.dll 2.0

Monday, April 23, 2012

Using native OpenXML to create an XlsX (Which provides an example of why I highlight tools that make OpenXML easier...)

CodeProject - Creating basic Excel workbook with Open XML

The purpose of this article is to describe how to create an Excel workbook using solely DocumentFormat.OpenXml.dll (namespace is DocumentFormat.OpenXml).

In order to test the samples you have to download and install the Open XML SDK 2.0 from Download Center.

The demo is created for both C# and Visual Basic.

These standards define the structure and the elements for the Office files. The Office files (like xlsx for Excel) themselves are zipped files that contain a specific directory and file structure. The files that hold the content of a spreadsheet are xml files like any other xml files.

In case of Excel files a basic xlsx file contains for example following files:

  • /[Content_Types].xml: Defines parts and extensions for the spreadsheet
  • /xl/workbook.xml: For example sheets that are included in the workbook
  • /xl/styles.xml: Styles used in the worksheets
  • /xl/sharedStrings.xml: Strings that are shared among cells
  • /xl/worksheets/sheet1.xml...: The actual worksheets

The actual package contains more files but in the scope of this article these are the most interesting ones. The demo projects included show few operations that are done to produce and modify these files.

About the project

The project itself is very simple. It consists of two classes: MainWindow class and a static Excel class. The Excel class is responsible for all the operations done against the Excel spreadsheet. It's kinda utility class, but note that it's nowhere near ready. It's supposed to be used as a learning tool or a seed to an actual implementation.

When writing this demo I found out that Excel is very picky on the XML files. One surprise was that the order of the elements in XML files is very important. For example elements in style sheet such as fonts, fills, borders, cellStyleXfs, cellXfs etc must be in specific order. Otherwise the document is interpreted as corrupted.

Another observation was that the indexes of the elements are quite often used (for example the index of a shared string). However there is no support in the library to fetch the indexes so the collections have to be looped in order to calculate the index of a desired element.

So one of the best tools when building this was a utility to extract data from the xlsx (=zip) file to see what is the actual content.

image

Yes, that creates an XlsX. Stuff that will cause a minor brain explosion in anyone who has used the Excel Object Model. Parts, Packaging, SharedStringTables, oh my...

This is a great example of why I keep my eyes open for examples and wrappers that make the promise of Open XML a little more accessible to mere mortals.
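As one small taste of that, here's the "loop the collection to work out the index" chore the article describes for shared strings, sketched with Python's standard `xml.etree.ElementTree` instead of the Open XML SDK. The `shared_string_index` helper and the demo string table are made up for illustration, and the sketch ignores things a real writer has to care about (rich text runs, the count/uniqueCount attributes).

```python
import xml.etree.ElementTree as ET

MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
SI = "{%s}si" % MAIN  # one shared string item
T = "{%s}t" % MAIN    # its text


def shared_string_index(sst, text):
    """Return the index of text in the string table, appending it if missing."""
    items = sst.findall(SI)
    # Nothing maps text to index for us, so walk the items in document order.
    for index, item in enumerate(items):
        t = item.find(T)
        if t is not None and t.text == text:
            return index
    new_item = ET.SubElement(sst, SI)
    ET.SubElement(new_item, T).text = text
    return len(items)  # the new item sits right after the existing ones


# Hypothetical table that already holds two strings.
sst = ET.fromstring(
    f'<sst xmlns="{MAIN}"><si><t>Name</t></si><si><t>Qty</t></si></sst>'
)
print(shared_string_index(sst, "Qty"))
print(shared_string_index(sst, "Widget"))
```

The returned index is what goes into a cell's value when its type is marked as a shared string. The linear scan is fine for a demo; with lots of strings you'd keep a dictionary on the side.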

 

Related Past Post XRef:
Generating Xlsx's on the Server? You're using OpenXML, right? With help from the PowerTools for OpenXML?

Wednesday, March 28, 2012

Generating Xlsx's on the Server? You're using OpenXML, right? With help from the PowerTools for OpenXML?

OpenXML Developer - Quick Generation of Spreadsheet Data and Cell Styles

"This example looks at a couple of OpenXML spreadsheet topics. I have been working with the cell styles a lot lately and this is a first example showing how to add some of the named styles to a spreadsheet cell. I plan to include even more style options in my next example and blog post. Also, after I posted my example for generating a pivot table, some very helpful people mentioned that it was quite slow with large amounts of data. This example also shows an alternative method of generating large amounts of data in a worksheet.

The example code can be found at PowerTools for OpenXML on Codeplex. Look for the 2.2.3 release of the PowerTools Core.

The screen-cast is divided into two parts. The first half introduces the example and shows how to use the methods in the PowerTools Core library. The second half shows the details of how the XML is generated. If you are only interested in using the code as is, then you can skip the second part.

image

PowerTools for Open XML - PowerTools for OpenXML 2.2 (Note you want 2.2.3...)

image

Here's a snap of all you need to create an XlsX. If you've ever used OpenXML, you'll know just how much time this can save you (and how much more sense this makes). OpenXML is great, but it's NOT the kind of Office API you're used to. This kind of library makes it that much easier to use...

image

No Excel, no Office, all Net.

image

 

Related Past Post XRef:
Reusing the PowerShell PowerTools for Open XML in your C# or VB.Net world
PowerShell, OpenXML, WMI and the PowerTools for OpenXML = Doc generation for our inner geek
Because it’s a PowerShell kind of day… PowerTools for Open XML V1.1 Released
OpenXML PowerTools updated – Cell your Excel via PowerShell

Monday, January 02, 2012

Using OpenXML SDK to generate Word documents via templates (and without Word being installed)

Application design and programming in .NET - Utility to generate Word documents from templates using Visual Studio 2010 and Open Xml 2.0 SDK

This utility generates Word documents from templates using Content controls. The utility will be enhanced later as per feedback and source code is available for download at http://worddocgenerator.codeplex.com/. It has been created in Visual Studio 2010 and uses Open Xml 2.0 SDK which can be downloaded from http://www.microsoft.com/download/en/details.aspx?id=5124.

The purpose of creating this utility was to use the Open Xml 2.0 SDK to generate Word documents based on predefined templates using minimum code changes. These documents can either be refreshable or non-refreshable. I’ll explain this difference later. Also there is no dependency that Word should be installed.

A few samples for generating Word 2010 documents have been provided. More samples can be added later as per feedback. The screenshots below display the sample template and the document generated out of this template using this utility.

...

..."

In short, generate Word documents on servers, in automated processes, etc, without Word being installed. [Insert lame "Friends don't let Friends use the Word Automation on Servers" statement here]

Thursday, October 13, 2011

The Noob's Guide to Open XML Dev (If you know how to spell OpenXML but that's about it, this is your Getting Started guide...)

OpenXML Developer - Getting Started with Open XML Development

"This blog post introduces the first in a series of screen-casts that are specifically for a developer starting development with Open XML for the first time. It is a project that I've been meaning to work on for some time, and I recently received the mandate that this should get done, so this is the start of it. In this video, I discuss the Open XML standard from a high level, discuss the resources that helped me get started, and point you to places to find additional resources. I've already recorded the second video, in which I discuss the various tools that you will want to be familiar with in order to do Open XML development. In the third video, I'll discuss the various typical development scenarios for Open XML. In the fourth video, I'll discuss platforms, languages, and libraries, and in the fifth, I'm going to discuss my current thoughts on development approaches. (At least, this is my current plan. We'll see how it proceeds.)

If you are an experienced Open XML developer, this first video in the series is probably not for you. This first video is targeted towards developers who know Open XML is a document format based on XML, and maybe not much more. Experienced developers may get something from subsequent videos, though.

..."

There's some things I love about the OpenXML SDK/format and some I hate (mostly how different the SDK API's are from the Office API's) but the like easily overrides the dis-like. Having an open format that's fully documented and easy(er) to spelunk is a night and day difference over trying to work directly with the Office binary formats.

So if you're a dev interested in seeing what this OpenXML SDK thing is, or you need to programmatically read/write OpenXML files (DocX, XlsX, etc), this video and series might be just the thing you need...

Wednesday, September 14, 2011

ODF/OpenXML, an update on the story...

Eric White's Blog - New Paper published by Peter O’Kelly – Revisiting Open Document Format and Office Open XML: The Quiet Revolution Continues

"Three years ago Peter O’Kelly wrote a paper titled, “What’s Up, .DOC? Open XML Formats, OpenDocument Format, and the Revolutionary Implications of XML in Productivity Applications.” That paper was a part of an industry-wide debate about Open XML and ODF. He has recently published a new paper that analyzes the current state of document formats.

This new paper, Revisiting Open Document Format and Office Open XML: The Quiet Revolution Continues [GD: click through for the actual link], discusses:

  • The business value of standardized, XML based document formats
  • A brief history of Open XML and ODF
  • The 2008 Open XML ISO controversy, and the response to Peter’s “What’s Up, .DOC?” Paper
  • An assessment of current Open XML and ODF market dynamics
  • Current standards activity
  • Projections into the future

..."

Here's a snip from the document;

"Synopsis
It has been several years since the lively and highly polarized market debate about the relative merits and standards significance of the Open Document Format (ODF) and Office Open XML (OOXML) file format standards. Although ODF and OOXML have since largely faded from the mainstream technology industry press and blogosphere radar, both standards have continued to evolve and gain market support, with significant benefits for all organizations seeking to optimize their use of information contained in documents created with productivity applications.

This document provides an overview of the status and significance of ODF and OOXML. It starts with a summary of the business value of open and XML-based document formats, along with a review of the ODF/OOXML historical debate, including a recap of a widely-discussed January 2008 Burton Group report which included what were, at that time, considered provocative conclusions and market projections.

The document continues with a summary of some of the most impactful ODF- and OOXML-related industry changes during recent years, including Microsoft’s (surprising, to many market observers) commitment to support and contribute to both ODF and OOXML, as well as Oracle’s acquisition of Sun Microsystems, and the acquisition’s ramifications for OpenOffice.org (which served as the starting point for ODF, in 2000).

The analysis concludes with some market projections about likely next steps, as both ODF and OOXML continue to evolve.

..."

The story of the battle has been pretty quiet recently, with OpenXML seemingly making its way, slowly, deeper into the business world and being used natively there (think DocX, XlsX, etc.).

Monday, August 29, 2011

1 page, 101 Office 2010 Code Samples

Office Developer Center - Office 101 Code Samples

"Microsoft Office 2010 gives you the tools needed to create powerful applications. These Visual Basic for Applications (VBA) code samples can assist you in creating your own applications that perform specific functions or as a starting point to create more complex solutions.

Each code sample consists of approximately 5 to 50 lines of code demonstrating a distinct feature or feature set in VBA. Each sample includes comments describing the sample, and setup code so that you can run the code with expected results or the comments will explain how to set up the environment so that the sample code runs.

..."

That's an official boat load of Office 2010 code samples.... :)

Thursday, July 28, 2011

Open XML Opens Office Document Metadata (without Office)

Microsoft has been releasing a number of Open XML SDK v2 samples on the MSDN Code Gallery in the past few days, samples that interest me professionally, so I wanted to round them up for easy reference.

The key point is that these provide examples of doing things that, for the old binary formats, required Office COM automation, whereas now we can do it all without installing Office. Just another reason to love the more modern approach taken with the Open XML file format.

Here's an example of one of the above links:

Retrieving Comments from Word 2010 Documents by Using the Open XML SDK 2.0

"The sample provided with this article includes the code necessary to retrieve the XML block that contains all the comments from a Word 2007 (or later) document. The following sections walk you through the code, in explicit detail. When you use the sample code to retrieve the comments, the procedure returns an XML element, named w:comments, which contains the XML block of information from the original document. It's up to you (and your application) to interpret the results of retrieving the comments.

..."
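The retrieval described in that excerpt can also be sketched without the SDK. This is a minimal Python sketch, not the article's sample: it assumes the comments part sits at Word's conventional `word/comments.xml` path (a stricter reader would resolve it through the package relationships), and the function names `get_comments_element` and `comment_texts` are made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Assumption: the comments part lives at Word's conventional location.
COMMENTS_PART = "word/comments.xml"


def get_comments_element(docx):
    """Return the w:comments element, or None if the package has none.

    `docx` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as package:
        if COMMENTS_PART not in package.namelist():
            return None
        root = ET.fromstring(package.read(COMMENTS_PART))
    if root.tag != f"{{{W_NS}}}comments":
        return None
    return root


def comment_texts(comments):
    """Flatten each w:comment into (author, text) by joining its w:t runs."""
    result = []
    for comment in comments.findall(f"{{{W_NS}}}comment"):
        author = comment.get(f"{{{W_NS}}}author", "")
        text = "".join(t.text or "" for t in comment.iter(f"{{{W_NS}}}t"))
        result.append((author, text))
    return result
```

As the excerpt says, interpreting the returned element is up to the caller; `comment_texts` is just one simple interpretation.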

Related Past Post XRef:
Checking for Microsoft Word DocX/DocM Revisions/Track Changes without using Word... (via OpenXML SDK, LINQ to XML or XML DOM)
LINQ to XlsX... Using VB.Net, LINQ, the OpenXML SDK and a little C# helper, to query an Excel XlsX

Official boat-load, as in supertanker, sized OpenXML content list (Insert "One OpenXML content list to rule them all" here)
So how do I get from here to OpenXML? Got a map for you, an Open XML SDK Blog Map…
Where to go to scratch your OpenXML dev info itch…
"Open XML Explained" Free eBook (PDF)

Open XML SDK 2.0 for Microsoft Office Released – Automate Office documents without Office
Opening OpenXML, the Open XML Package Editor Power Tool for Visual Studio 2010
Open XML 2.0 Code Snippets for VS2010 (and VS2008 too)
Open XML Format SDK 2.0 Code Snippets for Visual Studio 2008 – 52 C#/VB Code Snippets to help ease your Open XML coding

OpenXML Viewer 1.0 Released – Open source DocX to HTML conversion, with IE, Firefox and Opera (and/or command line) support

Powering into OpenXML with PowerShell

Microsoft Office File Formats and Microsoft Office Protocols Documentation Refreshed
Microsoft Office File Formats and Protocols documentation updated for Office 2010 (Think “Now with added ‘X’ flavor… DocX, PptX, XlsX, etc”)

Monday, July 18, 2011

LINQ to XlsX... Using VB.Net, LINQ, the OpenXML SDK and a little C# helper, to query an Excel XlsX

OpenXML Developer - Query Open XML Spreadsheets in VB.NET using LINQ

"When working with SpreadsheetML, one of the most common needs is to retrieve the data from a worksheet or a table in as easy a fashion as possible. There has been a fair amount written for C# developers to do this, but not nearly as much for VB.NET. Some time ago, I wrote a blog post, Using LINQ to Query Excel Tables, which introduced a few C# classes and extension methods that make it easy to query SpreadsheetML. This post presents a super-easy way to use that code from VB.NET.

To make it as easy as possible to get going using LINQ with VB to access SpreadsheetML, I've recorded the following screen-cast that walks through the process of building a VB.NET application that uses the code from that blog post. Here is the video:

..."

Nice example code and video that I've not seen elsewhere...
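For a feel of what "retrieving the data from a worksheet" involves underneath, here is a rough Python sketch (not the VB.NET/LINQ code from the post). It assumes the conventional part names `xl/worksheets/sheet1.xml` and `xl/sharedStrings.xml`, skips inline strings, and returns every value as text; `load_shared_strings` and `read_rows` are illustrative names only.

```python
import zipfile
import xml.etree.ElementTree as ET

S_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
NS = {"s": S_NS}

# Assumptions: the conventional part names Excel writes for a simple workbook.
SHEET_PART = "xl/worksheets/sheet1.xml"
STRINGS_PART = "xl/sharedStrings.xml"


def load_shared_strings(package):
    """Return the shared string table as a list (empty if the part is absent)."""
    if STRINGS_PART not in package.namelist():
        return []
    root = ET.fromstring(package.read(STRINGS_PART))
    # An s:si entry may hold several s:t runs (rich text); join them.
    return ["".join(t.text or "" for t in si.iter(f"{{{S_NS}}}t"))
            for si in root.findall("s:si", NS)]


def read_rows(xlsx):
    """Yield each row as a dict mapping cell reference (e.g. 'A1') to text."""
    with zipfile.ZipFile(xlsx) as package:
        strings = load_shared_strings(package)
        sheet = ET.fromstring(package.read(SHEET_PART))
    for row in sheet.iterfind("s:sheetData/s:row", NS):
        cells = {}
        for c in row.findall("s:c", NS):
            v = c.find("s:v", NS)
            if v is None or v.text is None:
                continue  # empty or inline-string cell; not handled here
            # t="s" means the value is an index into the shared string table.
            value = strings[int(v.text)] if c.get("t") == "s" else v.text
            cells[c.get("r")] = value
        yield cells
```

The shared-string indirection is the part that tends to surprise people coming from the Office object model: text cells store an index, not the text itself.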

Wednesday, June 15, 2011

Official boat-load, as in supertanker, sized OpenXML content list (Insert "One OpenXML content list to rule them all" here)

OpenXMLDeveloper.org - New list of Open XML content now available

"There is a whole lot of content (articles, screen-casts, and blog posts) on Open XML. Over the last couple of months, I have been making a list of as much Open XML content as I can find. I have found and categorized some 380 pieces of content, and organized them by keyword and author. Keywords (and author names) in the content list are hyperlinks, so it is easy to navigate around and, for instance, look at the list of Open XML content that pertains to SharePoint, or all of the content associated with content controls. I have made this list so it is very easy to maintain, so if you are an author of some good Open XML content, please drop me an email (eric at ericwhite.com) and I'll be happy to include it in the list. And of course, as I discover (or write) new content, I'll be adding it to the list. If you need to find an article or screen-cast that explains how to accomplish some particular task, this should be the first place you start. Click on the following link to see the content list. This link will remain the same as I update the list, so feel free to bookmark it.

Open XML Content List

The way that I created this list (and keep it updated) is interesting in and of itself. I'm maintaining this list as a table in an Open XML spreadsheet. I used the code that I presented in the post on Using LINQ to Query Excel Tables. I didn't have to modify that code at all to make it work with my table in my worksheet. Then, I wrote some LINQ code to transform the results of the query into HTML tables, with all appropriate links in the right place, and so on. To update the pages in the wiki on OpenXMLDeveloper.org, I need to simply paste two sets of generated HTML code into two pages in the wiki.

I'll be writing an article on the approach I took (published here on OpenXMLDeveloper.org of course).

..."

OpenXMLDeveloper.org - Open XML Content List

We're talking one massive list... a 126-printed-page-long list. Meaning, pretty much, if you're looking for OpenXML information, start here.

Wednesday, May 11, 2011

Checking for Microsoft Word DocX/DocM Revisions/Track Changes without using Word... (via OpenXML SDK, LINQ to XML or XML DOM)

Eric White's Blog - Using XML DOM to Detect Tracked Revisions in an Open XML WordprocessingML Document

"I’ve written a short article at OpenXMLDeveloper.org that shows how to detect tracked revisions using XmlDocument. Previously, I wrote an article on detecting tracked revisions using LINQ to XML or the strongly-typed object model of the Open XML SDK 2.0. However, some developers do not have the option of using LINQ, and instead must use one of a variety of XML DOM Document implementations. ..."

MSDN - Identifying Open XML Word-Processing Documents with Tracked Revisions

"Determining whether an Open XML WordprocessingML document contains tracked revisions is important. You can significantly simplify your code to process Open XML WordprocessingML if you know that the document does not contain tracked revisions. This article describes how to determine whether a document contains tracked revisions.

...

Introduction

Processing tracked changes (also known as tracked revisions) is an important task that you should fully understand when you write Open XML applications. If you accept all tracked revisions first, your job of processing or transforming the WordprocessingML is made significantly easier.

Accepting Revisions by Using PowerTools for Open XML

To review the semantics of the elements and attributes of WordprocessingML that hold tracked changes information in detail, see Accepting Revisions in Open XML Word-Processing Documents. In addition, you can download the code sample, RevisionAccepter.zip from the following project on CodePlex, CodePlex.com/PowerTools. To download, go to the Downloads tab, and then click RevisionAccepter.zip.

Determining Existence of Tracked Changes

There are other scenarios where you want to process documents that you know do not contain tracked changes, and because of certain business requirements, you do not want to automatically accept tracked changes. For example, perhaps you have a SharePoint document library that contains no documents that contain tracked changes. Before users add the document to that document library, you want them to consciously and intentionally address and accept all tracked revisions. Accepting revisions as part of checking the document into the document library circumvents this manual process, where you want each person to examine their documents and resolve any issues.

As an alternative, instead of accepting revisions with the RevisionAccepter class, you can validate that the document contains no tracked revisions, and refuse to let the document be checked into the document library without tracked changes being accepted.

The code is not complex. It defines an array of revision tracking element names, and if any of these elements occur in any of the parts that can contain tracked revisions, then the document contains tracked revisions. We can use a LINQ query to determine if any of the revision tracking elements exist in the markup. This article presents four versions of the code to determine whether a document contains tracked revisions.

  • Using C# and LINQ to XML.
  • Using C# and the Open XML SDK strongly-typed object model.
  • Using Visual Basic and LINQ to XML.
  • Using Visual Basic and the Open XML SDK strongly-typed object model.

..."

OpenXMLDeveloper.org - Using XML DOM to Detect Tracked Revisions in an Open XML WordprocessingML Document

"Tracked revisions are one of the more involved features of Open XML WordprocessingML. There are 28 elements associated with tracked revisions, each with their own semantics. In some cases, such as with content controls and deleted paragraph marks, the semantics for tracked revisions are (of necessity) very involved.

Some time ago, I wrote an article, Accepting Revisions in Open XML Word-Processing Documents, which details the exact semantics for each of the elements that comprise revision tracking.

...  However, many developers do not have the option of using LINQ to process XML, and instead must use one of a variety of implementations of XML DOM, such as System.Xml.XmlDocument in the .NET framework, or an implementation of XML DOM for php. This post presents a bit of XmlDocument code to detect tracked revisions. The important parts are those that show which Open XML parts to process, and the XPath expression to detect tracked revision markup. Because the semantics of XPath and XML DOM Document are carefully defined, it is pretty easy to translate this code to another language and implementation of XML DOM Document.

..."
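To show how portable the idea is, here is a hedged Python take on the same check. It is a sketch, not Eric's code: the element set below is only an illustrative subset of the 28 revision elements the article covers, the part-name pattern assumes Word's conventional paths, and it walks the element tree rather than using the XPath expression from the post.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Illustrative subset only; the article details 28 revision-related elements.
REVISION_ELEMENTS = {"ins", "del", "moveFrom", "moveTo", "rPrChange", "pPrChange"}
REVISION_TAGS = {f"{{{W_NS}}}{name}" for name in REVISION_ELEMENTS}

# Assumption: parts that can carry revisions, at Word's conventional paths.
PART_PATTERN = re.compile(
    r"word/(document|footnotes|endnotes|header\d*|footer\d*)\.xml$"
)


def has_tracked_revisions(docx):
    """Return True if any checked part contains a revision-tracking element."""
    with zipfile.ZipFile(docx) as package:
        for name in package.namelist():
            if not PART_PATTERN.match(name):
                continue
            root = ET.fromstring(package.read(name))
            if any(el.tag in REVISION_TAGS for el in root.iter()):
                return True
    return False
```

A gatekeeper like the SharePoint scenario in the MSDN excerpt could call `has_tracked_revisions` and refuse the check-in when it returns `True`.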

Being in the biz that I'm in, Revisions/Track Changes in Word docs are a big deal to me. I can't tell you the number of times I've seen documents where unaccepted revisions revealed something that might have been best left unrevealed. From jokes to contract negotiations, Track Changes/Revisions can be a big deal. In many cases this "hidden" metadata (though in newer versions of Word, by default this hidden data is much more in your face... which is a good thing) should be scrubbed, removed, or at least acknowledged prior to the release of any Word document.

With the old binary version of Word documents (DOCs), the common means of doing this was automating Word. Yes, from all the cringing and grimacing, I can see how much you liked doing that, especially in an automated, batch, or server scenario. In a word, ouch.

The beauty of the Open XML DocX/DocM format is that it's much easier to spelunk the documents now without using Word. DocX is just a Zip file with XML (and stuff). Sure, it's not super-duper easy to walk a Word doc via its raw DocX XML, but have you ever looked at the binary Doc specification? In that respect, it's only about... um... 1,000 times easier.
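To make the "just a Zip file" point concrete, here is a tiny Python sketch using only the standard library. The helper name `list_parts` is made up for illustration; any Zip tool would show the same thing.

```python
import zipfile


def list_parts(package):
    """Return (part name, uncompressed size) for every entry in the package.

    `package` may be a path to a .docx/.xlsx/.pptx or a binary file object.
    """
    with zipfile.ZipFile(package) as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]
```

Point it at a DocX and you would typically see entries such as `[Content_Types].xml` and `word/document.xml`, each of which is plain XML you can read with any parser.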

Anyway...

Eric's posts show how you can inspect Word DocX/DocM (DocM is a DocX with macros) for Revisions/Track Changes using either the Open XML SDK or just POX techniques.

Related Past Post XRef:
So how do I get from here to OpenXML? Got a map for you, an Open XML SDK Blog Map…
Where to go to scratch your OpenXML dev info itch…
"Open XML Explained" Free eBook (PDF)

Open XML SDK 2.0 for Microsoft Office Released – Automate Office documents without Office
Open XML SDK v1 Released

Opening OpenXML, the Open XML Package Editor Power Tool for Visual Studio 2010
Open XML 2.0 Code Snippets for VS2010 (and VS2008 too)
Open XML Format SDK 2.0 Code Snippets for Visual Studio 2008 – 52 C#/VB Code Snippets to help ease your Open XML coding
Open XML File Format Code Snippets for Visual Studio 2005 (Office 2007 NOT required)

OpenXML Viewer 1.0 Released – Open source DocX to HTML conversion, with IE, Firefox and Opera (and/or command line) support
Powering into OpenXML with PowerShell

Microsoft Office File Formats and Microsoft Office Protocols Documentation Refreshed
Microsoft Office File Formats and Protocols documentation updated for Office 2010 (Think “Now with added ‘X’ flavor… DocX, PptX, XlsX, etc”)
MS-PST file format specification released. Yep, the full and complete specification for Outlook PST’s is now just a download away.
Microsoft Office (DOC, XLS, PPT) Binary File Format Specifications Released – We’re talking the full technical specification… (The [MS-DOC].pdf alone is 553 pages of very dense specification information)
DOC, XLS and PPT Binary File Format Specifications Released (plus WMF, Windows Compound File [aka OLE 2.0 Structured Storage] and Ink Serialized Format Specifications and Translator to XML news)

Tuesday, February 22, 2011

Open XML 2.0 Code Snippets for VS2010 (and VS2008 too)

Microsoft Downloads - Office 2010 Sample: Open XML SDK 2.0 Code Snippets for Visual Studio 2010

"Download this package to install the Open XML SDK 2.0 code snippets for use with Visual Studio 2010.

Office2007OpenXML20Snippets.msi  <- VS 2008 Snippets
417KB

OpenXMLSDKSnippetsForVS2010.msi
396KB

Version: 02/11

Date Published: 2/21/2011

...

The snippets in this download use the Open XML SDK 2.0 to accomplish many tasks involving Microsoft Excel, Microsoft PowerPoint, and Microsoft Word 2007 and 2010 documents. You can use the enclosed code snippets with the Microsoft Visual Studio 2010 Code Snippet Manager. Each snippet provides unique functionality that you can reuse within an application. This download provides snippets written in Microsoft Visual Basic and Microsoft Visual C# development languages. Download and install these snippets to your Visual Studio code snippet folder and use them with the Visual Studio Code Snippets Manager.

You can still download the code snippets file, Office2007OpenXML20Snippets.msi, for use with the Microsoft Visual Studio 2008 Code Snippet Manager. That download provides snippets written in Microsoft Visual Basic and Microsoft C# development languages.

...

The download file, OpenXMLSDKSnippetsForVS2010.msi, works with documents or files produced with the following Office applications:

  • Microsoft Office 2010
  • Microsoft Office Excel 2010
  • Microsoft Office PowerPoint 2010
  • Microsoft Office Word 2010
  • 2007 Microsoft Office System
  • Microsoft Office Excel 2007
  • Microsoft Office PowerPoint 2007
  • Microsoft Office Word 2007

...

By default, this download installs files to the following locations:

  • Visual C# snippets. PersonalFolder\Visual Studio 2010\Code Snippets\Visual C#\Open XML SDK 2.0 for Microsoft Office 2010
  • Visual Basic snippets. PersonalFolder\Visual Studio 2010\Code Snippets\Visual Basic\Open XML SDK 2.0 for Microsoft Office 2010

After you install this download, use the Visual Studio 2010 Code Snippets Manager as you normally would.  ..."

Sometimes snippets are just what you need to get started. Programming against OpenXML files is very different from using the Office APIs. It's much closer to the metal, so having these snippets will not only save you some time but also give you a leg up on learning other parts of the Open XML SDK 2.0.

At first glance, it looks like the same 52 snippets... meaning the details in my Open XML Format SDK 2.0 Code Snippets for Visual Studio 2008 – 52 C#/VB Code Snippets to help ease your Open XML coding post likely apply to the 2010 version too.

Related Past Post XRef:
Open XML Format SDK 2.0 Code Snippets for Visual Studio 2008 – 52 C#/VB Code Snippets to help ease your Open XML coding

Where to go to scratch your OpenXML dev info itch…
Open XML File Format Code Snippets for Visual Studio 2005 (Office 2007 NOT required)

Tuesday, June 22, 2010

Opening OpenXML, the Open XML Package Editor Power Tool for Visual Studio 2010

Office Development with Visual Studio - Open XML Package Editor Power Tool for Visual Studio 2010 (Navneet Gupta)

“We are happy to announce that today we are releasing the Open XML Package Editor Power Tool for Visual Studio 2010 on Visual Studio Gallery. This Power Tool is a Visual Studio add-in that provides an easy way to parse and edit Open Packaging Conventions files, including Word, Excel and PowerPoint documents. This Power Tool enables you to do the following tasks:

  • Open any Open XML Package file or XPS Package file directly in Visual Studio 2010.
  • Browse the contents of Package files in a tree view.
  • Open any XML part directly in Visual Studio's rich XML editor.
  • Add or remove parts and relationships directly in the user interface.
  • Import and export part contents to and from files.
  • Detect when a Package file that is opened in Visual Studio is modified externally. The Power Tool prompts user to reload the file without having to close any open XML part editors.
  • Create new Office Packages from a set of templates using Visual Studio's File > New dialog.

This Power Tool was originally shipped for Visual Studio 2008 as part of VSTO Power Tools v1.0.0.0. This new version for Visual Studio 2010 contains all the original features from the previous version and it works the same.

image …”

What I like about the Office 2007 “X” formats, (DocX, XlsX, etcX.. and “M”’s too.. DocM, etc) is that the documents are spelunk'able. That we can open them and see/tweak their insides without magic spells and/or 20+ intelligence. Same goes for XPS (have you ever tried to look at the guts of a PDF? Let alone tweak its content? It’s not pretty).

Sure Open XML isn’t for the casual document diver, but it IS do’able. Thinking about digging into the binary office formats, directly into them without a wrapper/utility/library/etc just gives me hives.

Yeah, yeah, there’s ODF and yeah, yeah, there’s all the past “political” crud about the ISO process and Office’s support for the “official” standard… yeah, yeah, whatever… LOL ;)

I’m just a LOB Dev and just want to build solutions that make my users and my business happy. And if OpenXML does that for us, then it’s cool in my book…

[Gee, I guess I’m a little uptight today, aren't I? lol]

Related Past Post XRef:
Microsoft Visual Studio Tools for the Office (VSTO) Power Tools v1 Released

So how do I get from here to OpenXML? Got a map for you, an Open XML SDK Blog Map…
Where to go to scratch your OpenXML dev info itch…
"Open XML Explained" Free eBook (PDF)

Open XML SDK 2.0 for Microsoft Office Released – Automate Office documents without Office

Open XML Format SDK 2.0 Code Snippets for Visual Studio 2008 – 52 C#/VB Code Snippets to help ease your Open XML coding
Open XML File Format Code Snippets for Visual Studio 2005 (Office 2007 NOT required)

Open XML SDK v1 Released

OpenXML Viewer 1.0 Released – Open source DocX to HTML conversion, with IE, Firefox and Opera (and/or command line) support

Powering into OpenXML with PowerShell