Paul Kiel's Data Integration Blog
Data integration using Xml / Xslt and anything else...







Monday, February 25, 2008

the mighty spreadsheet

Working in technology for a while, I find it interesting that with all the tech available to folks, the simplest approach is so often used.

Take data integration. In the process of creating requirements to be modeled (whether in Xml Schema or UML), mapping data to internal processes, or mapping data between systems, folks often gravitate to the simple spreadsheet as the primary workhorse.

I've done this with many clients, using variations of similar spreadsheets to do mappings and requirements gathering. I've seen standards folks do the same in HR-XML, OAGi, UBL, and CCTS. They all invariably use the spreadsheet over much more sophisticated tools.
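To make the idea concrete, here is a minimal sketch of a mapping spreadsheet driving a field-by-field translation between two systems. The column names, field names, and the `load_mapping` and `apply_mapping` helpers are all hypothetical, made up for illustration; a real mapping sheet would carry more columns (data types, notes, code lists).

```ruby
# A mapping spreadsheet exported as CSV text: one row per field.
# All column and field names here are hypothetical.
MAPPING_CSV = <<~CSV
  source_field,target_field,required
  emp_no,EmployeeID,yes
  surname,FamilyName,yes
  dept,DepartmentCode,no
CSV

# Turn the rows into hashes keyed by the header row.
# Naive split: assumes no quoted commas; a real export calls for a CSV parser.
def load_mapping(text)
  header, *rows = text.lines.map { |line| line.chomp.split(",") }
  rows.map { |row| header.zip(row).to_h }
end

# Apply the mapping to one source record.
# Returns the translated record and the list of missing required source fields.
def apply_mapping(mapping, record)
  target = {}
  missing = []
  mapping.each do |rule|
    value = record[rule["source_field"]]
    if value.nil?
      missing << rule["source_field"] if rule["required"] == "yes"
    else
      target[rule["target_field"]] = value
    end
  end
  [target, missing]
end

mapping = load_mapping(MAPPING_CSV)
target, missing = apply_mapping(mapping, { "emp_no" => "1042", "dept" => "HR" })
puts target.inspect   # the translated record
puts missing.inspect  # required source fields that were absent
```

The appeal is that the business folks can keep editing the rows while the code that reads them stays the same.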

Hail the humble spreadsheet!

See 2007 post archives.

Friday, February 22, 2008

great Ruby podcast online

Great podcast on Ruby with author Russ Olsen, who wrote Design Patterns in Ruby. Have not yet read the book, but will surely do so as I get more experience. The podcast is free and I found it to be very good.


Friday, February 15, 2008

programming in Ruby

I've been hearing so much about Ruby that I finally went out and bought a book. It's called Learning Ruby by Michael Fitzgerald. I found the book pretty good at laying out the gist of Ruby. The most interesting thing about it is that everything is an object. Even numbers are objects! This takes object orientation to its logical conclusion.
I also like the "dynamic typing", wherein the system determines the data type of a variable at run time from the value it holds. While I wonder if this leads to casting problems, it is nice to have someone else take care of the details.
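A small sketch of both points, written against a current Ruby rather than the 2008 release (older versions report `Fixnum` where newer ones report `Integer`). The `describe` helper and the variable names are only for illustration.

```ruby
# Numbers are objects: they have a class and respond to methods.
def describe(value)
  "#{value.inspect} is an instance of #{value.class}"
end

number_is_object = 42.is_a?(Object)   # integers descend from Object
doubled = 21.send(:*, 2)              # even multiplication is a method call

# Dynamic typing: the variable has no declared type; the object it
# currently refers to carries the type.
item = 10
first_kind = item.class
item = "ten"
second_kind = item.class

# Ruby does not silently cast between unrelated types; mixing them
# raises an error at run time instead.
mixing_error =
  begin
    10 + "ten"
  rescue TypeError => e
    e.class
  end

puts describe(42)
puts describe("ten")
puts "same variable, two classes: #{first_kind} then #{second_kind}"
puts "10 + \"ten\" raised #{mixing_error}"
```

On the casting worry: in this sketch the mismatch does not pass quietly, it surfaces as a `TypeError` when the line runs, so the cost of dynamic typing is that such mistakes show up at run time rather than at compile time.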
Will post on a Ruby podcast next.


Xml arrives at age 10

Hard to believe, but Xml is now 10 years old. It has been an amazing decade of development. Those of us with a love (illness?) for this technology are happy to see it so widely implemented. I remember when it was first published in 1998; I was trying to implement it with a text editor and the "Panorama" plugin viewer. I had no way to validate at the time. Then within a year, I was implementing it again with the new Xml support in Internet Explorer 5. Tools are so much better now.

Tim Bray has just written a great piece called "Xml People" about Xml's development in the early days and the people who were there.

For those who want to understand the original spec, Tim also wrote this great annotated version of the spec. I've referred to this numerous times over the years.

Relatedly, the spec is currently at Xml 1.0 fourth edition. There is a proposal to make the unsuccessful version 1.1 into an Xml 1.0 fifth edition. But as I've blogged, it has its detractors.

Happy birthday Xml!


Monday, February 11, 2008

Impact of XML on Data Modeling

An interesting thread on the xmlschema-dev list about the impact of xml on data modeling.

Scott Tsao starts off talking about how to define different kinds of data models: the Conceptual, Logical, and Physical. He says that Xml Schema is the logical model and the binding to a database would be the physical one.

Essam Mansour says: "Conceptual: the model at this level is platform-independent and also could be logically modeled using any modeling technique, ERD, OO model, or XML Schema. Logical: is also platform-independent but it is based on specific modeling technique. Physical: is platform dependent."
This strikes me as pretty close to right. Xml Schema can express any of these models; it is the semantic concepts that are different.

Anthony B. Coates takes a view that I think is very common and that I have come across many times. He says Xml Schema is the physical and UML is the conceptual. This is in fact how they each tend to be used. But I have also found strong support for Xml Schema at the conceptual level. Anthony goes on to say that the conceptual (UML) is the business representation and the physical (XSD) is the technicians' view. This again is what I have come across with my clients. But it is not rigid. The line between business and technical is often blurry, and reality creeps in. But the broad separation of these is often seen as an "ideal".

Michael Kay, who is everywhere and one of the smartest folks I've read, says the concepts are separate. A conceptual model gives you abstraction that can benefit data modeling. So while the concepts are separate, there is an underlying standardized data model.

This is where already standardized data models such as OAGIS and HR-XML come into play. They can give you a head start on creating a new model (whether in UML or in XSD) or they can be considered an open source data sharing model for B2B transactions.

With some support for the conceptual=UML and physical=XSD, the discussion then goes on to ask "can xsd also model the conceptual?" See Scott Tsao.

I totally agree with Scott in that it "can" play that role. And I've seen it done that way. The bottom line is to use the modeling tool that is most usable and comfortable to you. If it is UML, then use it. If XSD, then use it. I've seen it done both ways and I don't think there are any musts here.

The big problem with UML and XSD is translating between them. But I'll blog on that topic separately.


OAGIS 9.2 released today

Just announced: OAGIS version 9.2 is out today. Go grab a copy while supplies last...


Saturday, February 9, 2008

Is XML 1.0 (5th ed) backwardly compatible?

Been reading a lot about the Xml 1.0 fifth edition, now a "Proposed Recommendation". The reasoning goes like this: there are no Xml 1.1 docs because parser writers believe no one will use it, and no one will use it because parser writers don't support it. So why not make Xml 1.1 into an erratum version of Xml and call it the 5th edition of 1.0? That seems to be what has been done.

Someone recently wrote about a problem of backward compatibility. The issue is around Unicode. Originally Xml was explicitly tied to Unicode 2. But Unicode has not stood still, moving on to version 4 and beyond. So what if someone wants to use Unicode 4 in an Xml setting?

The theory is a relaxation of constraints. Xml currently says Unicode 2 and anything not explicitly allowed is prohibited. The Xml 1.0 5th edition says anything not explicitly prohibited is allowed. So following the logic, the 5th edition is a superset of the previous editions in terms of allowed characters.
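A rough sketch of what that superset relationship looks like for the first character of a name. The fifth-edition ranges below follow the NameStartChar production of the 5th edition grammar. The earlier-edition list is a deliberately tiny, hypothetical stand-in (the real one enumerates hundreds of ranges drawn from Unicode 2), and the helper names are made up. Ethiopic, used as the test character, was added to Unicode in version 3.0.

```ruby
# NameStartChar ranges from the XML 1.0 fifth edition grammar:
# broad blocks are allowed, with specific gaps excluded.
NAME_START_5TH = [
  0x3A..0x3A,                      # ":"
  0x41..0x5A,                      # A-Z
  0x5F..0x5F,                      # "_"
  0x61..0x7A,                      # a-z
  0xC0..0xD6, 0xD8..0xF6, 0xF8..0x2FF,
  0x370..0x37D, 0x37F..0x1FFF,
  0x200C..0x200D, 0x2070..0x218F,
  0x2C00..0x2FEF, 0x3001..0xD7FF,
  0xF900..0xFDCF, 0xFDF0..0xFFFD,
  0x10000..0xEFFFF
].freeze

# Hypothetical, heavily abbreviated stand-in for the earlier editions'
# explicit allow-list. NOT the real list, just enough to show the idea.
NAME_START_EARLIER_SAMPLE = [
  0x3A..0x3A, 0x41..0x5A, 0x5F..0x5F, 0x61..0x7A,
  0xC0..0xD6, 0xD8..0xF6
].freeze

# True if the character's code point falls in any of the given ranges.
def name_start?(ranges, char)
  ranges.any? { |range| range.cover?(char.ord) }
end

# True if every code point in the given ranges is also a 5th edition name start.
def subset_of_fifth?(ranges)
  ranges.all? do |range|
    range.all? { |code| NAME_START_5TH.any? { |r| r.cover?(code) } }
  end
end

ethiopic = "\u1200"  # ETHIOPIC SYLLABLE HA
puts name_start?(NAME_START_5TH, ethiopic)             # allowed under the 5th edition rules
puts name_start?(NAME_START_EARLIER_SAMPLE, ethiopic)  # not in the sample allow-list
puts subset_of_fifth?(NAME_START_EARLIER_SAMPLE)       # the sample sits inside the 5th edition set
puts name_start?(NAME_START_5TH, "1")                  # a digit still cannot start a name
```

If the full earlier list also passes the subset check, then every name that was legal before stays legal, which is the backward compatibility question in a nutshell.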

Michael Rys says that 1.1 becoming a recommendation was a day for mourning. I assume he would say the same for 1.0 5th edition.

Norm Walsh likes it a lot. "The fifth edition does not change the status of any existing XML 1.0 document with respect to well-formedness or validity. Nor does it introduce any of the backwards-incompatible changes introduced in XML 1.1."

So Norm's comment makes me think it is fully backwardly compatible (the 5th edition, that is).

John Cowan describes the characters that are not allowed in the 5th edition.

David Carlisle says the change is a good one, but it should be called explicitly a version and not passed off as an erratum.

Mark Nottingham describes the problem that most seem concerned about, namely "implementation Z (of, say, the 3rd edition) coming across a 5th edition document and blowing up".
This is a logical concern. But even in this scenario the 5th edition is still backwardly compatible, in my reading, as long as the Unicode standard expands in supersets and does not alter existing material. This is what I understand to be the case.

So I still haven't found out what makes Xml 1.0 5th edition not backwardly compatible. For sure it would lead to things not being forwardly compatible. But that is a well understood issue in software development and doesn't prevent forward progress.


© Copyright Paul Kiel.
