When Links Attack

As those of you who have read this somewhat poor attempt at writing for some time will know, I pull the links list from del.icio.us using some (heavily) re-purposed RSS import code. It’s a pretty ugly hack and has been tweaked somewhat to ensure it plays nicely with the XML feed in use.

The result, however, is an entirely portable way of handling the linked list, giving me access to my discoveries and providing a means of sharing them with others. It’s a simple concept that continues to work remarkably well. Given that del.icio.us limits the character count of the description field, it forces a certain brevity, which is ideal.

When importing data from potentially uncontrolled sources, it’s always wise to have some form of ‘tick box’ approach to approving posts. A human-based sanity check, if you will. This rather simple, yet potentially time-consuming, step is easy to skip in favour of importing straight to a live, published state.

Obviously if you’re a control freak [1] then that’s not on the cards, but with the likes of tumblr and similar systems making one-click re-posting an entirely simple affair, it is becoming a more common phenomenon.

Time is money, as they say, so spending additional time gently sculpting content here and there is not something one will always have time for [2]. Which is going to be fine, ninety-nine point nine-one-nine percent of the time.

So what happens when you do not control your content, do not sanity check the data coming in to ensure it makes the grade, and publish live?

[Image: malformed content in an XML feed can lead to unexpected results]

The above is the result, snapped after quite some pruning. Some 31 copies of the same link had been imported whilst I slept. Which, as we all know, is the absolutely perfect time for the proverbial goopy nasty stuff to collide with fast-spinning blades.

Had I allowed the import to publish ‘live’, I would have had a bunch of readers asking quite why this link was so very important that it had to be published, repeatedly, for several hours. It appears a stray quote mark, or other unescaped entity, made it into the XML, and the import went ahead successfully but left out the malformed character.
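
As a rough illustration of how a single unescaped character upsets an XML feed, here is a minimal Python sketch that builds a tiny item with and without escaping. It is not the import code discussed here: `build_item` and `is_well_formed` are hypothetical helpers, and the ampersand is just one example of a troublesome character. Note also that the standard library parser used below is strict, so it rejects the bad input outright rather than quietly dropping the character as my importer appears to have done.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def build_item(title):
    """Wrap a title in a minimal, made-up <item> element (illustration only)."""
    return "<item><title>{}</title></item>".format(title)


def is_well_formed(xml_text):
    """Return True if the strict standard library parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    raw_title = "Fish & Chips"  # a bare ampersand is not allowed in XML text
    print(is_well_formed(build_item(raw_title)))          # False
    print(is_well_formed(build_item(escape(raw_title))))  # True
```

Escaping on the way in means the title survives the round trip intact, so a later comparison against the feed has a fair chance of matching.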

Obviously, when the next “sweep” of the XML feed occurred, the stored data did not match the feed 1:1, so the script did what it is supposed to do and re-imported the data for me. Again and again. Each import does a quick check to see whether the data already exists, but any failure to match means the link is imported as new.

This is by design, in a way: it is entirely possible that I will at some point import more than one link with the same title, or other potentially identical data, so in order for all linked items to be checked and posted, I need them to actually import regardless.
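
The re-import loop described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the actual script: `Link`, `sweep`, `exact_key`, `url_key` and `strip_bad_chars` are all hypothetical names, the "malformed character" is modelled as a stray double quote, and the alternative shown (matching on the URL alone) assumes each feed item carries a URL that stays constant between sweeps.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    url: str
    title: str
    description: str


def strip_bad_chars(link):
    """Stand-in for the importer dropping a malformed character (here, a quote)."""
    return Link(link.url, link.title.replace('"', ""), link.description)


def exact_key(link):
    """Fingerprint of every field: one stray character gives a different key."""
    raw = "\x1f".join((link.url, link.title, link.description))
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()


def url_key(link):
    """Key on the URL alone, assumed to stay constant between sweeps."""
    return link.url.strip()


def sweep(feed, stored, key_fn):
    """Import feed items whose key is not already stored; return the count added."""
    existing = {key_fn(item) for item in stored}
    added = 0
    for raw in feed:
        if key_fn(raw) in existing:
            continue
        item = strip_bad_chars(raw)  # what gets stored differs from the feed
        stored.append(item)
        existing.add(key_fn(item))
        added += 1
    return added
```

With `exact_key`, a feed item whose title contains a stray quote never matches its own stored copy, so every sweep adds another duplicate. With `url_key`, the same item is imported once, while two different links that happen to share a title are still both imported.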

The most reliable solution thus far, as mentioned above, has been to import to a ‘draft’ status. This affords me as much time as needed to ensure I am happy with the link, well before it is published. It provides an avenue to add additional relevant hyper-links, touch up the content and so on prior to ‘go live’.
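
For the sake of illustration, the ‘draft first’ approach boils down to something like the following Python sketch. The names (`Post`, `import_as_drafts`, `approve`, `visible_to_readers`) and the status strings are assumptions made for the example, not the blog platform’s actual API.

```python
from dataclasses import dataclass

DRAFT = "draft"
PUBLISHED = "published"


@dataclass
class Post:
    title: str
    url: str
    status: str = DRAFT  # nothing goes live by default


def import_as_drafts(items):
    """Turn (title, url) pairs into posts that start life as drafts."""
    return [Post(title=title, url=url) for title, url in items]


def approve(post):
    """The human 'tick box': only an explicit call publishes a post."""
    post.status = PUBLISHED
    return post


def visible_to_readers(posts):
    """Readers only ever see posts that have been approved."""
    return [post for post in posts if post.status == PUBLISHED]
```

The point of the design is that a misbehaving import can fill the draft queue with as many duplicates as it likes; none of them reach readers until someone approves them.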

If I just don’t think it’s right after final editing, I’ll drop a tag from its metadata at del.icio.us and delete it from the blog, all without your noticing. This link-related content is one of the many reasons I am, in fact, looking at other platforms. The above flaw exposes a weak point in WordPress that truly frustrates this author.

As a basic blog, WordPress is entirely brilliant. It’s remarkably easy to drive and offers an array of basic features that work well. But the moment one wishes to extend, to incorporate other content and to push the boundaries, I’m afraid it simply isn’t up to the challenge. One requires a host of plugins and a great deal of template hacking to open up the platform.

Conversely, a number of existing platforms have aggregation built right in, yet WordPress has yet to provide anything more than a rudimentary one-off RSS import function. And of late I’ve read an increasing number of people commenting that “It shouldn’t be this hard”.

You’re right — it shouldn’t be.

Today’s exercise in having to re-work my hack, again, and the time spent cleaning up after an entirely valid XML feed borked on import, is time that I could have spent on content creation instead.

Perhaps I have simply outgrown the platform, which is perhaps not as far-fetched as it may sound. Document control, ease of content management and an ability to import and self-publish, in a well-constructed and aesthetically pleasing manner, are all increasingly important to this author.

Today I was reminded of both the advantages and disadvantages of the platforms I use. Perhaps, tomorrow, you’ll benefit from that understanding as a result.

  1. although... I have become one of late
  2. again, I find I continue to make the time required to ensure I am, at least in part, happy with the content

Brendan Borlase is a Systems and Network Administrator living in Adelaide, Australia, having lived, worked and breathed Information Technology for over 12 years.
