Matt Large: December 2004

Tuesday, December 28, 2004

Christmas survived...bring on New Year...

Wow, Christmas just snuck up on me and then dashed past. I went back to the old homestead this past weekend, and now I'm here in the big smoke again. It's been a great Christmas: my nephew's first (he drooled a lot) and my friends' 26th (they got drunk and drooled a lot).

So next on the traditional things-to-do list is New Year's, which will be a quiet one in with the infinitely better half. With that in mind, here comes the ubiquitous "Year in Review" post...

Work

So we'll start with work. There has been a lot going on in some respects and not too much in others. I might get a project to completion in the New Year, but I will not have finished one this year. That kinda sucks, but then I've worked on some exciting things.

We have just got a new CEO, and early in 2005 there will be more new faces around. We also said goodbye to some great people. It's going to be a year of change in many ways.

The development team did manage to get our technology open sourced successfully (http://www.openharmonise.org), that was very satisfying.

Personally

While work was a bit of a mixed bag, personally I had an amazing year. I ended 2003 and started 2004 by telling anyone within earshot that 2004 was going to be the best year ever, and although none of them would believe me it did turn out that way.

I met and fell completely for the above mentioned "better half".

I traveled, an amazing feat for those that don't know me. Until 18 months ago I had taken one holiday in 4 years (for my brother-in-law's stag party). Since then I have been back to Munich for the Oktoberfest and to Paris for the first time. I had a great time on an abortive attempt at camping in Wales during gale season. A great friend and I went to Rome together, making me appreciate his friendship even more. Finally, my girlfriend and I recently went to New York together, and during that trip I realised that not only are we great together in our relationship but we are traveling soul mates.

My predictions for a great 2004 extended to those around me as well, and that also worked out pretty well.

One friend finally made the move to London and is now in a great flat, a good job and generally having a good time, no matter what he says sometimes.
Another friend found a new flat in an area he loves and got the promotion that he richly deserved but refused to believe he would get.
Friend number 3 (in no particular order) moved in with his girlfriend and seems to be happier than I've ever seen him.
One friend has a book deal for 2006-11!
Another friend was published for the first time.
And finally, although there were other great things that happened, we all went out and had a great time over Christmas.

On and on to 2005...

One of my friends is now championing 2005 as the next great year, and I am willing to go with that, although I think that for me it will be a hard year. I am going to be focusing on moving forward with a lot of things that are going to require a lot of work from me, but I truly believe that they are worth it. People that know me well will know what I'm talking about, those that don't will find out in due course.

I am looking forward to the next year, and I hope it turns out well for all of you. This week I am going to be working damn hard to get my current project out of the door and then celebrating the New Year, so I will be back with you all next week. Till then...

Thursday, December 23, 2004

Lessons Learned #4: Using XSLT key function for lookups...

I use XSLT all the time, more and more in fact. Obviously we use it as part of the publishing engine of Open Harmonise, but we put it to far more uses than that, mostly data preparation or presentation. For example, a goal in many of our projects is to end up with XML data sets that our clients can give out to industry, but very few (actually none) of our clients are willing to look at the XML documents to QA them. When this happens, one quick XSLT later they have HTML documents to do the data QA on. If any changes have to be made to the original XML, all we do is regenerate the HTML.

One of the features of XSLT that I've known about for a long time and not really used is the "key" element/function set. I haven't used it because it has been quite badly supported in XSLT processors. This doesn't matter so much for data preparation work, as the only place I need it supported is the Sonic Stylus Studio processor, which has excellent support for all of XSLT. I haven't checked the support in most Java processors for a while; I imagine it is much better now.

Well, today I finally found a reason for trying the "key" stuff out again. I am preparing some data, a rather large single XML data set from a client, for import into a system we are building in Open Harmonise. Some of the data associated with parts of the XML contains Value Codes which need to be converted into the correct Paths to the Values within the Harmonise repository. I published a set of files which provided mappings from Codes to Paths and used these in the main XSLT. Unfortunately these are rather large files with several thousand mappings in them, and they are referenced several thousand times as the main XSLT is run. This meant a 9 minute wait for the processing to finish.

XSL "keys" are designed specifically for this purpose: to make lookups very, very fast. There is only one problem: you cannot directly reference an external XML document (using the document() function) from a "key" element. After some hunting around I found the solution, and here it is.

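<!-- Root element of the external lookup document -->
<xsl:variable name="externalDoc" select="document('doc.xml')/child::ROOT"/>

<!-- Declared as normal; a key applies to whichever document holds the context node -->
<xsl:key name="lookupKey" match="CHILD" use="@id"/>

<xsl:template name="lookupID">
  <xsl:param name="lookup"/>
  <xsl:param name="ID"/>
  <!-- Switch the context to the external document so key() searches it -->
  <xsl:for-each select="$lookup">
    <xsl:value-of select="key('lookupKey',$ID)/child::NAME"/>
  </xsl:for-each>
</xsl:template>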

The variable at the start contains the external XML document's root element, but the key is set up as normal, as if it were being used in the main context document. The named template is there so that we don't have to repeat this XSL code every time we want to use the key.

The template takes 2 parameters. The first, "lookup", is the variable that we set up at the start pointing to the root of the external document; I made it a parameter because my XSL used several external documents. The other, "ID", is the value we are going to pass into the key.

We then use a "for-each" element on the lookup variable, which, remember, is the root element of the external XML document. Using a "for-each" on a root element may seem silly, since there will of course be only one of them, but it is done to set the current context to that XML document. Keys always operate on the document that contains the current context node, so the key will now work with the external XML.

There are much simpler ways of doing this, such as simply navigating the external XML without having to change the context to use keys. However, this solution brought my 9 minute XSL processing down to a more manageable 30 seconds.
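For anyone wanting to try the template end to end, here is a minimal Java sketch that runs it through the JDK's standard javax.xml.transform API. The mapping file contents, the DATA/ITEM input, the "code" attribute and the "lookupUri" stylesheet parameter are all made up for illustration (the snippet above uses a fixed 'doc.xml'), and the timings quoted above came from Stylus Studio, not from this code.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class ExternalKeyLookup {

    // Hypothetical mapping file: each CHILD maps a code (@id) to a path (NAME).
    static final String LOOKUP_XML =
        "<ROOT>"
        + "<CHILD id='A1'><NAME>/values/alpha</NAME></CHILD>"
        + "<CHILD id='B2'><NAME>/values/beta</NAME></CHILD>"
        + "</ROOT>";

    // Hypothetical main data set whose codes need converting to paths.
    static final String DATA_XML = "<DATA><ITEM code='B2'/><ITEM code='A1'/></DATA>";

    // The key, variable and named template from the post, plus a root template
    // that calls lookupID once per ITEM. The document URI is passed in as a
    // parameter (an assumption made so the sketch can use a temporary file).
    static final String STYLESHEET = String.join("\n",
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>",
        "  <xsl:output method='text'/>",
        "  <xsl:param name='lookupUri'/>",
        "  <xsl:variable name='externalDoc' select='document($lookupUri)/child::ROOT'/>",
        "  <xsl:key name='lookupKey' match='CHILD' use='@id'/>",
        "  <xsl:template name='lookupID'>",
        "    <xsl:param name='lookup'/>",
        "    <xsl:param name='ID'/>",
        "    <xsl:for-each select='$lookup'>",
        "      <xsl:value-of select=\"key('lookupKey',$ID)/child::NAME\"/>",
        "    </xsl:for-each>",
        "  </xsl:template>",
        "  <xsl:template match='/'>",
        "    <xsl:for-each select='DATA/ITEM'>",
        "      <xsl:call-template name='lookupID'>",
        "        <xsl:with-param name='lookup' select='$externalDoc'/>",
        "        <xsl:with-param name='ID' select='string(@code)'/>",
        "      </xsl:call-template>",
        "      <xsl:text>;</xsl:text>",
        "    </xsl:for-each>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    /** Writes the mapping file to a temp file, runs the stylesheet, returns the text output. */
    static String run() {
        try {
            Path lookupFile = Files.createTempFile("lookup", ".xml");
            try {
                Files.write(lookupFile, LOOKUP_XML.getBytes(StandardCharsets.UTF_8));
                Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
                // An absolute file: URI, so document() does not depend on a stylesheet base URI.
                transformer.setParameter("lookupUri", lookupFile.toUri().toString());
                StringWriter out = new StringWriter();
                transformer.transform(
                    new StreamSource(new StringReader(DATA_XML)), new StreamResult(out));
                return out.toString();
            } finally {
                Files.deleteIfExists(lookupFile);
            }
        } catch (Exception e) {
            throw new IllegalStateException("XSLT key lookup failed", e);
        }
    }
}
```

Calling ExternalKeyLookup.run() should give back one path per ITEM, each followed by a semicolon. Passing the document in as the "lookup" parameter is what lets the same template serve several mapping files.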

Saturday, December 18, 2004

Did HP just try to screw me?...

Some time ago I told you that I had ordered an HP dx2000 Linux machine direct from HP. It was on two week availability, so I sat back and waited. It should have timed perfectly with the turning on of my broadband access, which by the way is on and perfect. After a while HP took the money out of my account, so I figured that the machine was on the way. When it didn't arrive after a little while, I thought I would give them a call to check on the delivery status.

When I called, as well as being treated to some really cheesy Christmas music (which was great), I was told that they'd stopped supplying that machine. They said I could talk to the sales team and order something else or just get a complete refund.

So my question is, exactly how long after taking my money were they thinking of telling me that I wasn't going to get anything? I took the refund and immediately ordered something from Scan, which for only £100 more is actually a much better machine, and I still didn't have to buy Windows, so all is good.

Thursday, December 09, 2004

Using our tech to build a paper based system...

I had a meeting with the clients for my current project to work on their requirements. This project is to review a set of sector specific controlled vocabularies and their application to thousands of resources. We are providing a web based system on Open Harmonise that will contain the master copies of the vocabularies, the resources and their metadata, allow for the edits, and output machine readable copies in XML. So we are providing the technology and the client is providing a small army of sector specialists to do the editorial work.

All well and good, not an unusual project for us, except that I found out in this meeting that the sector specialists doing the editorial work have decided that they want to do this on paper! Granted they are not technical people, but a web based system isn't that hard to deal with.

Well, our client is willing to go along with this, and will provide staff to take the paper forms, once filled in, and input the data into the HTML forms. So I'm building a lot of pages to publish the forms into PDF files to be printed off.

I can understand the reasons for this, but that doesn't stop the project from going against almost everything I believe in as a developer. Should be an interesting project at the least.

Wednesday, December 08, 2004

Lessons Learned #3: Unicode in SQL...

We are continuing our adventures in ensuring that Unicode is fully supported throughout our application, and a colleague of mine solved the last of our issues. We were losing the Unicode characters during the round trip to the database, which in this case is MS SQLServer 2000.

SQLServer 2000 is fully Unicode compliant, so it all should have worked. All the columns that might contain Unicode characters were set to the n* types (nchar, nvarchar and ntext), and yet we were still losing the information.

Apparently the solution is to prefix the string literal with an "N" character, as in:

update bob set surname=N'unicode surname'

This prefix is SQL92 Intermediate and SQL99 & SQL2003 Optional. Here is an extract from the only place in the SQLServer 2000 help files that I could find this vital information:

"Unicode strings have a format similar to character strings but are preceded by an N identifier (N stands for National Language in the SQL-92 standard). The N prefix must be uppercase. For example, 'Michél' is a character constant while N'Michél' is a Unicode constant. Unicode constants are interpreted as Unicode data, and are not evaluated using a code page. Unicode constants do have a collation, which primarily controls comparisons and case sensitivity. Unicode constants are assigned the default collation of the current database, unless the COLLATE clause is used to specify a collation. Unicode data is stored using two bytes per character, as opposed to one byte per character for character data."
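To see why a literal without the prefix loses characters, here is a rough Java sketch of what "evaluated using a code page" means. ISO-8859-1 is only a stand-in for whatever code page the database collation really uses, and both helper names are made up for this example. The nationalLiteral helper just shows the shape of the literal; for real queries a parameterised statement is the safer route.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CodePageLoss {

    /**
     * Pushes text through a single-byte code page and back again, roughly what
     * happens to a plain '...' literal. Characters the code page cannot
     * represent come back as '?'.
     */
    static String throughCodePage(String text, Charset codePage) {
        return new String(text.getBytes(codePage), codePage);
    }

    /** Builds an N-prefixed literal, doubling embedded single quotes. Illustration only. */
    static String nationalLiteral(String text) {
        return "N'" + text.replace("'", "''") + "'";
    }

    public static void main(String[] args) {
        // "Mich\u00e9l" fits in ISO-8859-1; the two Japanese characters do not.
        System.out.println(throughCodePage("Mich\u00e9l", StandardCharsets.ISO_8859_1));
        System.out.println(throughCodePage("\u65e5\u672c", StandardCharsets.ISO_8859_1));
        System.out.println("update bob set surname=" + nationalLiteral("O'Brien"));
    }
}
```

With this stand-in code page, "Michél" survives the trip because é exists in ISO-8859-1, while the Japanese text comes back as question marks, which matches the kind of loss we were seeing.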

Hope this helps, I know that we're very glad we found it.

Monday, December 06, 2004

Lessons Learned #2: Character Encoding

Since I've spent the last few work days dealing with character encoding issues I thought I would write up some useful points to remember.

1) Internally Java is excellent at maintaining character encoding, so you don't have to worry about how you deal with Strings and Characters within your application.

2) The points where character encoding will become an issue are at the boundaries, where you are sending or receiving data.

- When using streams to read/write data to/from a network connection or a file you should declare the encoding you want to use, e.g. "UTF-8".
- Ensure that any databases you interact with are correctly configured for the character encoding that you want to use. For example in SQLServer you will need to make sure that all String fields are either nvarchar or ntext, not the usual varchar and text types. You may also have to configure the database as a whole to ensure that connections use the correct encoding type.

3) Swing components all support Unicode; however, you will need to ensure that the font you are using supports the Unicode ranges that you need. The default fonts delivered with Java support most of the Unicode set, but this does not include Japanese, Chinese and Korean glyphs. MS Windows comes with "Arial Unicode MS"; however, you may be better off using code to find a font that supports the Unicode ranges that you need.
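Points 2 and 3 can be sketched in a few lines of Java. This is only an illustration: the class and method names are made up, and the temporary file stands in for whatever stream you are really reading from or writing to.

```java
import java.awt.Font;
import java.awt.GraphicsEnvironment;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

final class Utf8Streams {

    /** Writes text with an explicit encoding instead of the platform default. */
    static void write(Path file, String text) throws IOException {
        try (Writer out = new OutputStreamWriter(
                Files.newOutputStream(file), StandardCharsets.UTF_8)) {
            out.write(text);
        }
    }

    /** Reads text back, declaring the same encoding that was used to write it. */
    static String read(Path file) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader in = new InputStreamReader(
                Files.newInputStream(file), StandardCharsets.UTF_8)) {
            char[] buffer = new char[1024];
            int count;
            while ((count = in.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
        }
        return text.toString();
    }

    /** Writes to a temporary file and reads it back; the result should equal the input. */
    static String roundTrip(String text) {
        try {
            Path file = Files.createTempFile("utf8", ".txt");
            try {
                write(file, text);
                return read(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

final class FontFinder {

    /** Returns the first candidate font able to display every character in the sample. */
    static Optional<Font> firstCovering(Font[] candidates, String sample) {
        for (Font font : candidates) {
            // canDisplayUpTo returns -1 when the whole string can be displayed.
            if (font.canDisplayUpTo(sample) == -1) {
                return Optional.of(font);
            }
        }
        return Optional.empty();
    }

    /** Searches the fonts installed on this machine; the result depends on the system. */
    static Optional<Font> firstInstalledCovering(String sample) {
        Font[] installed = GraphicsEnvironment.getLocalGraphicsEnvironment().getAllFonts();
        return firstCovering(installed, sample);
    }
}
```

Something like FontFinder.firstInstalledCovering("\u65e5\u672c\u8a9e") can then pick a font for a Swing component; the fonts returned by getAllFonts() are 1 point in size, so call deriveFont with a sensible size before using one.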

Sunday, December 05, 2004

Things are a changing here at LargeHQ...

The ongoing saga of computer changes here at home continues with the news that I am finally giving up on this machine (far too old and slow to use as a main workstation) and have ordered a new one: a really simple and cheap machine from HP, which comes without Windows.

I also finally ordered broadband, 1Mb line with Bulldog, which should be in place before Christmas. So very soon I should be back up to full swing here.

If I could go back in time...

The first thing that I would do is ensure that all developers at the start of computing agreed to Unicode 2.0, or something else. I don't really care what standard they chose, as long as it works and that they all used it. As time goes past this should lead to me being able to tell my boss that foreign character support in our application is a given and therefore save us several days of testing and head scratching wondering where our Japanese characters have gone.

Character encoding and date formatting are the two things that I hate more than anything in programming. Why can't this be simpler? Why am I restricted to using a specific font to support all the characters, when it can't be too hard for OS vendors to make the font subsystem fall back to a generic font with full Unicode 2.0 support for individual characters that the desired font doesn't support?