Archive for the 'Uncategorized' Category

Converting relative URLs to absolute ones.

Tuesday, November 30th, 2010

On a number of occasions I have found that I need to convert all URLs in a webpage from relative ones to absolute ones.  This usually comes up when a page sourced from one domain is being displayed from a different domain.

Note that this doesn’t really provide full emulation or proxying of a site; you need to take into account cookies, etc. to do that correctly.

Here is how I most recently cracked it (in C# .NET).  This isn’t foolproof; however, it should work with most HTML:
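As a rough illustration of the general idea (written in Java here, and not the C# code referred to above), the sketch below finds `href` and `src` attributes with a regular expression and resolves each value against the page's base URL using `java.net.URI`.  The class name `AbsoluteUrlSketch` and the method `absolutize` are made up for this example, and the regex only handles quoted attribute values, so treat it as a starting point rather than a robust HTML rewriter.

```java
import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not the original C# implementation.
final class AbsoluteUrlSketch {
    // Matches href="..." or src="..." using either single or double quotes.
    private static final Pattern URL_ATTR = Pattern.compile(
        "(?i)\\b(href|src)\\s*=\\s*([\"'])(.*?)\\2");

    static String absolutize(String html, String baseUrl) {
        URI base = URI.create(baseUrl);
        Matcher m = URL_ATTR.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String original = m.group(3);
            String resolved;
            try {
                // URI.resolve leaves absolute URLs alone and resolves relative ones.
                resolved = base.resolve(original.trim()).toString();
            } catch (IllegalArgumentException e) {
                // Assumption: values that do not parse as URIs are left untouched.
                resolved = original;
            }
            String replacement = m.group(1) + "=" + m.group(2) + resolved + m.group(2);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<a href=\"images/a.png\">x</a><img src='/css/site.css'>";
        System.out.println(absolutize(html, "http://example.com/blog/post.html"));
    }
}
```

A regex keeps the sketch short, but it will miss unquoted attributes, URLs inside CSS or scripts, and pages that set a `<base>` tag; a real HTML parser would cope better with those.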

GAPP CMS

Saturday, June 26th, 2010

Okay, so I’ve written a CMS.

Not a huge deal I know, but there are a couple of things about this CMS that make it a bit different.  Check it out.

The reasoning behind creating it was that I had recently moved to London and was looking for work.  I thought it might be a good way to up-skill a little and have something to show a little bit of Java experience when it comes to the web.

I’ve also kinda wanted to do some work with the Google App Engine since it was released, and I figured that this was a good opportunity.

Turns out I got a job doing .NET!  Come to think of it, I’ve actually got more Java here than PHP… hmmm.  Just to clarify, I have done more PHP than anything else, but I do like to have experience across a range of platforms.

So, I decided to finish it off and publish it.  It’s based on the Google App Engine, which gives it a number of advantages (with a couple of drawbacks).  Working with a NoSQL database was an interesting experience, but given how difficult some of the simple things can be (aggregates anyone?) I think there would need to be some strong reasoning behind the decision to use one.

Anyway, check out the website, have a play, give me feedback.

java-libpst & pst2gmail

Tuesday, January 26th, 2010

I like Gmail.

I like the huge storage, the reliability (I know they’ve had the occasional issue, but it’s nothing compared to some other services), the interface with the vim-like shortcuts and I simply love the searching capabilities.

When Google finally enabled IMAP support for Gmail I decided it might be worthwhile to move my work email over to it (people may recall my love for IMAP).  This meant I could still use Outlook with IMAP and get all of the benefits, such as being able to read calendar invitations, RTF-formatted emails and the horrendous HTML generated by clients with Outlook, but it also gave me the added features of Gmail.

Unfortunately I had about 4GB of email stored in my old Outlook PST.  I really wanted to migrate this over to Gmail as well to take advantage of those sweet searching features to help me navigate through the vast minefield which is my email archive.

So! I tried drag-dropping the emails from my PST file over to the Gmail IMAP store through Outlook, and started the migration.  My Outlook almost instantly hung while each email was uploaded to Gmail at what seemed a glacial pace.  Indeed, I left it for a weekend and managed to get a couple of gig up, but Outlook had completely crashed by the time I walked in on Monday.

So, I thought to myself “there has to be a better way”, and started searching for a migration tool.  Unfortunately I couldn’t find one.  No probs I thought, I’ll just write my own… only there wasn’t much in the way of libraries that could read PST files.

First I tried libpst, which is a cute little C library.  I hadn’t really done any development in straight C before, so I decided I’d try to make a little ncurses app to read my PST files to test out the library.  I proceeded with moderate levels of success.  Unfortunately I ran into a few issues, the first being my deep-seated and intense hatred for C as a language.  The second and slightly more fatal one was that the library didn’t work particularly well for me; it would crash when reading any emails under the Inbox and it would simply hang when trying to read my larger 4GB PST file.

I then came across libpff.  This is another C library that was built from libpst. By this time however, I had had a gutful of programming in C.  I also suspected it may have some of the scalability issues present in the original libpst.  It was also a little difficult for me to work out how exactly to use the library as C isn’t one of my strong points and there isn’t much in the way of usage / API documentation.

So, I did the only reasonable thing that a self-respecting geek would do: started to write my own library in Java.  Thankfully, Joachim Metz of Hoffmann Investigations, who was responsible for the vast majority of the work in libpff, also studiously documented the file format as he went, and it looked like there was more than enough there to make a start.

The library (java-libpst):

Okay, so I’ll admit that I was biting off a bit more than I could effectively chew, let alone swallow.  It turns out that PST files are all little-endian, and Java, of course, reads everything big-endian by default.  Oh, did I mention that PST files are also encrypted by default using a substitution cipher?  But… not completely encrypted… just enough of them to make it difficult.  Oh! All the RTF items are compressed as well.
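The byte-order mismatch is easy to show in isolation.  The sketch below is not taken from java-libpst; `LittleEndianSketch` and its method names are invented for this example.  It reads a 32-bit little-endian integer from a byte array in two ways: with `ByteBuffer` switched to `ByteOrder.LITTLE_ENDIAN`, and by shifting the bytes together by hand.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only; not code from java-libpst.
final class LittleEndianSketch {
    // ByteBuffer defaults to big-endian, so the order has to be set explicitly.
    static int readInt32(byte[] data, int offset) {
        return ByteBuffer.wrap(data, offset, 4)
                         .order(ByteOrder.LITTLE_ENDIAN)
                         .getInt();
    }

    // The same read done manually: the least significant byte comes first.
    static int readInt32Manual(byte[] data, int offset) {
        return (data[offset] & 0xFF)
             | (data[offset + 1] & 0xFF) << 8
             | (data[offset + 2] & 0xFF) << 16
             | (data[offset + 3] & 0xFF) << 24;
    }

    public static void main(String[] args) {
        byte[] raw = {0x78, 0x56, 0x34, 0x12};
        System.out.println(Integer.toHexString(readInt32(raw, 0)));
        System.out.println(Integer.toHexString(readInt32Manual(raw, 0)));
    }
}
```

The `& 0xFF` masks matter because Java bytes are signed; without them, any byte of 0x80 or above would sign-extend and corrupt the result.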

There is also the matter of the reasonably important unknown bits of the file format which were not yet documented.  For example, there is this B-Tree in the file format that allows you to quickly locate an item within the file.  Which is great!  Only, there didn’t seem to be a way to find out what child items may be attached to a parent item (like a folder).  Oh! You can find out who your parent is easily enough, the id of the parent is sitting right there… but the child items… no sir!  This was a bit of a problem, you see, because as with most tree-structured files, you start from the top and work your way down.

I managed to work around this by reading the entire contents of the B-Tree and creating my own inverted map of parent to child objects.  It turns out that this wasn’t really that bad of a solution and only takes a few seconds to read through them all and create the map even on huge PST files.
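The inverted map is simple enough to sketch.  This is not the code from java-libpst: the `Node` class and its `id` and `parentId` fields are hypothetical stand-ins for whatever the B-Tree entries actually contain, and the sketch assumes all entries have already been read into a list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not code from java-libpst.
final class ChildMapSketch {
    /** Stand-in for one B-Tree entry: its own id and the id of its parent. */
    static final class Node {
        final int id;
        final int parentId;

        Node(int id, int parentId) {
            this.id = id;
            this.parentId = parentId;
        }
    }

    /** One pass over every node, grouping child ids under their parent's id. */
    static Map<Integer, List<Integer>> buildChildMap(List<Node> allNodes) {
        Map<Integer, List<Integer>> children = new HashMap<>();
        for (Node node : allNodes) {
            children.computeIfAbsent(node.parentId, k -> new ArrayList<>())
                    .add(node.id);
        }
        return children;
    }

    public static void main(String[] args) {
        List<Node> nodes = Arrays.asList(
            new Node(2, 1), new Node(3, 1), new Node(4, 2));
        // Children of node 1 are looked up directly instead of scanning the tree.
        System.out.println(buildChildMap(nodes).get(1));
    }
}
```

Building the map is a single linear pass, which fits with it taking only a few seconds on large files; after that, listing the contents of a folder is a plain map lookup.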

Frustratingly, soon after I had managed to get this part of the library to work, a new section of the file format was documented describing “related” nodes, which contain, among other things, a list of all the child items.  Sigh.

I also had a few difficulties understanding some of the finer points of the documentation, like when data tables fall across different blocks with external lookup tables.  There was also a fun point when I started to hit “named items”, which are basically keys that change meaning from file to file.  The identifier that denotes a recipient’s email address may, for example, be a different identifier in a different PST file!  Once again, this is only in some instances (yay for consistency!).

After much head scratching, reading, and staring at massive amounts of hex-encoded data, I managed to get the library working well enough to navigate the PST structure and extract items including emails, attachments, contacts, calendar items, etc. relatively quickly without crashing too horribly.

The result is available on Google Code here:  http://code.google.com/p/java-libpst/

But my work was not yet done.

The migration app (pst2gmail):

Of course, the real goal was to create an application that could migrate all of your email from your legacy PST file to Gmail.

With the library working, this part was a lot more straightforward, and a mostly working alpha can be downloaded from here: http://code.google.com/p/pst2gmail/

I did have grand plans of having the system migrate your contacts and calendar items as well, which wouldn’t actually be that difficult to do.  However, my personal development focus has once again shifted, and this functionality will probably remain disabled and unimplemented until people start to howl for it.

Until then, please enjoy.  The emails are uploaded through IMAP, flagged messages are passed through as starred items, folders are automatically created as tags, and attachments and all that work.  You can even resume your upload through the generated log files if you need to stop the process midway through!