2007-12-04 20:50:31
At work we are migrating an old site to a new CMS.
Unfortunately the content is a mess. Owing to people pasting text in from Word and various other accidents, one fragment of HTML can be a mixture of UTF-8 and Latin-1 and cp1252 and goodness knows what else. When you’ve been a good boy and coded all your templates to declare “I am UTF-8, honest guv” it’s a bit trying. Especially when the client complains.
The markup is pretty broken too. It’s littered with cruft left over from Word and is generally non-compliant.
So far I’m having good results from a pipeline of various tricks.
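To give a flavour of the encoding side, here is a minimal sketch of one such trick, written in Python purely for illustration (it is not necessarily what the real pipeline uses). It assumes that any byte which fails to decode as UTF-8 is really cp1252, which covers the usual Word smart quotes and accented Latin-1 letters. The names `decode_mixed` and `cp1252_fallback` are made up for this example.

```python
import codecs


def _cp1252_fallback(err):
    """Error handler: reinterpret bytes that are not valid UTF-8 as cp1252."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    # A few bytes (e.g. 0x81) are undefined in cp1252; those become U+FFFD.
    return bad.decode("cp1252", errors="replace"), err.end


# Register the handler under a hypothetical name so decode() can use it.
codecs.register_error("cp1252_fallback", _cp1252_fallback)


def decode_mixed(raw: bytes) -> str:
    """Decode as UTF-8 where possible, falling back to cp1252 byte by byte."""
    return raw.decode("utf-8", errors="cp1252_fallback")


# UTF-8 "é", cp1252 smart quotes and a Latin-1 "ï" in one fragment.
sample = b"caf\xc3\xa9 \x93quoted\x94 na\xefve"
print(decode_mixed(sample))  # café “quoted” naïve
```

This is a heuristic, not a cure: a run of Latin-1 bytes that happens to form a valid UTF-8 sequence will be read as UTF-8, so the output still wants a look by eye.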
The only downside is that over thousands of items, this is pretty slow. But it’s the price you pay to be beautiful, I guess.