The project I am working on at the moment involves loading a lot of the information that our clients publish into our system so that they can apply metadata to it. All of this published information is structured hierarchically, although it isn't obvious from the printed versions what these hierarchies are. So we have spent the last few weeks moving this information into an XML format that we can load into the CMS.
Some of this work had been done before, so that information was easily exported from another copy of our CMS, but that is only a small part. What we have been finding is that the only machine-readable versions of much of this data are held in Quark files, and our clients can only offer us Word document exports of them. There is no semantic markup, and the WordML structure involves lots of tables that are used purely for layout. In the end, most of this information has had to be cut and pasted by hand.
I have to admit that I am a little shocked by all of this. The information in question is the lifeblood of this organisation; in fact, it is pretty much the only reason that the department we are dealing with exists. Having this data locked into a format that is strictly for layout and print publishing seems absolutely crazy. From what I can see, all editing is done directly in this format. I think the reason this has happened is that our clients have always seen the end product, i.e. the printed versions, as sacred, without ever thinking that somewhere in there is pure data, which is what they should actually be concerned about.
The true extent of this problem came to light on a related project with the same client and data set. The printed versions have marginal notes which provide cross-references between parts of the information and we needed to know if there were any reciprocal links in there. Our client couldn't tell us this without checking through all the relevant parts of a printed copy!
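To illustrate why this question becomes trivial once the data is structured, here is a minimal sketch in Python. The XML layout is purely hypothetical: the `section` and `xref` element names and the `id` and `target` attributes are my own assumptions for illustration, not our actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: each section may carry cross-references
# (the marginal notes) pointing at other sections by id.
SAMPLE = """
<document>
  <section id="a"><xref target="b"/></section>
  <section id="b"><xref target="a"/><xref target="c"/></section>
  <section id="c"/>
</document>
"""


def find_links(root):
    """Collect (source id, target id) pairs for every cross-reference."""
    links = set()
    for section in root.iter("section"):
        # findall() only looks at direct children, so a nested
        # section's references are not attributed to its parent.
        for xref in section.findall("xref"):
            links.add((section.get("id"), xref.get("target")))
    return links


def reciprocal_pairs(links):
    """Return sorted pairs of sections that reference each other."""
    return sorted(
        {tuple(sorted(link)) for link in links
         if link[0] != link[1] and (link[1], link[0]) in links}
    )


def one_way_links(links):
    """Return sorted links that have no matching link back."""
    return sorted(link for link in links if (link[1], link[0]) not in links)


if __name__ == "__main__":
    links = find_links(ET.fromstring(SAMPLE))
    print("Reciprocal:", reciprocal_pairs(links))
    print("One-way:", one_way_links(links))
```

With the data in this shape, the check that would otherwise mean reading through a printed copy is a couple of set lookups.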
It is easy for us developers to forget that a client's perspective on something can be very different to our own. We had assumed that there would be a way to get at least some of this information in an electronic format that we could begin to use and transform. I think this has been something of a learning curve for our client, and we are helping them to understand the implications. Of course, the great outcome of this project is that they will have these pure data versions in XML. They are now seeing the possible benefits of this change in perspective: from thinking of the Quark files as their only precious commodity to seeing the underlying data as more valuable.
Reminder:
We have a position open for a Java/XML/XSLT developer at our offices in London (£26-£30K + benefits). If you are interested there is more information available on our website.