Let's start with some things you don't see in APEXBlogs: the code that pulls all the information from the blogs and aggregates it.
This is how the flow worked in version 1 of APEXBlogs (currently still in use):
APEXBlogs itself is just the UI. In the backend there are packages (using XMLDB features) that connect to all the blogs and search for changes. If they find changes, they merge them into the tables APEXBlogs is built on. There are a couple of issues with this method:
- The more blogs you have, the slower it gets to look for changes as it needs to connect to the blog, read, search for changes, disconnect, connect to the next blog, read, search for changes etc.
- It not only became slower, it also used a lot of resources (CPU and memory)
- There are different kinds of blogs e.g. WordPress, Blogger (Blogspot), WindowsLive etc. You would expect the RSS format is universal, but it isn't, so I ended up with different code for the different kinds of blogs.
- The package was quite sophisticated as it could recognise the type of blog, but I got into trouble when people started to use their own URLs (.com), because the address no longer revealed which kind of blog it was.
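To make the format problem concrete, here is a minimal sketch in Python (not the XMLDB packages APEXBlogs actually uses). It assumes a WordPress-style blog serving RSS 2.0 and a Blogger-style blog serving Atom; the sample feeds and the `parse_feed` function are made up for illustration. Note that it looks at the feed content rather than the URL to decide which branch to take.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"

# Hypothetical sample feeds, trimmed to the elements used below.
RSS_SAMPLE = """<rss version="2.0"><channel><title>Example WordPress blog</title>
<item><title>Hello RSS</title><link>http://example.com/rss-post</link></item>
</channel></rss>"""

ATOM_SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"><title>Example Blogger blog</title>
<entry><title>Hello Atom</title><link rel="alternate" href="http://example.com/atom-post"/></entry>
</feed>"""


def parse_feed(xml_text):
    """Return a list of (title, link) tuples for an RSS 2.0 or Atom feed."""
    root = ET.fromstring(xml_text)
    posts = []
    if root.tag == "rss":
        # RSS 2.0: posts are <item> elements, the link is element text.
        for item in root.iter("item"):
            posts.append((item.findtext("title"), item.findtext("link")))
    elif root.tag == ATOM_NS + "feed":
        # Atom: posts are namespaced <entry> elements, the link is an attribute.
        for entry in root.iter(ATOM_NS + "entry"):
            link = entry.find(ATOM_NS + "link")
            href = link.get("href") if link is not None else None
            posts.append((entry.findtext(ATOM_NS + "title"), href))
    else:
        raise ValueError("unrecognised feed format: " + root.tag)
    return posts


for sample in (RSS_SAMPLE, ATOM_SAMPLE):
    print(parse_feed(sample))
```

Even in this tiny example the two formats need separate branches, and every additional blog platform with its own quirks adds another one.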
So I did a complete redesign of the backend code, and the flow in version 2 now looks like this:
In version 2 of APEXBlogs only one connection is needed to update all the blogs at once.
The reason is that Google Reader now sits in the middle and basically does the hard work. I set up a dedicated Google Reader account for APEXBlogs, which now holds the blogs to aggregate. I just access the Google Reader API to retrieve the posts, search for changes, and merge that stream into the backend tables.
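The merge step can be sketched roughly as follows. This is only an illustration in Python with SQLite, not the real backend code: the `posts` table, its columns and the `merge_posts` function are hypothetical, and the stream is assumed to be already retrieved and parsed into dictionaries (the Google Reader call itself is not shown).

```python
import sqlite3


def merge_posts(conn, posts):
    """Insert new posts and update changed ones; return (inserted, updated)."""
    # Hypothetical table layout: one row per post, keyed on its link.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts ("
        "link TEXT PRIMARY KEY, blog TEXT, title TEXT)"
    )
    inserted = updated = 0
    for post in posts:
        row = conn.execute(
            "SELECT blog, title FROM posts WHERE link = ?", (post["link"],)
        ).fetchone()
        if row is None:
            # Post not seen before: add it.
            conn.execute(
                "INSERT INTO posts (link, blog, title) VALUES (?, ?, ?)",
                (post["link"], post["blog"], post["title"]),
            )
            inserted += 1
        elif row != (post["blog"], post["title"]):
            # Post already stored but its details changed: update it.
            conn.execute(
                "UPDATE posts SET blog = ?, title = ? WHERE link = ?",
                (post["blog"], post["title"], post["link"]),
            )
            updated += 1
    conn.commit()
    return inserted, updated


conn = sqlite3.connect(":memory:")
stream = [
    {"link": "http://example.com/a", "blog": "Blog A", "title": "First post"},
    {"link": "http://example.com/b", "blog": "Blog B", "title": "Second post"},
]
print(merge_posts(conn, stream))  # first run: both posts are new
stream[1]["title"] = "Second post (edited)"
print(merge_posts(conn, stream))  # second run: only the edited post changes
```

Because all blogs arrive in one stream with one format, a single routine like this covers every blog, whatever platform it runs on.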
Where the synchronisation of APEXBlogs version 1 took a couple of minutes, the synchronisation in version 2 is done in a couple of seconds. Database resource usage and network traffic have also dropped a lot. My code is a lot simpler too, as I only need to maintain one code base: the one for Google Reader.
So now you know how things work behind the scenes... tomorrow I'll focus on how I show the blogs in APEXBlogs v2.
Hello Dimitri,
Using Google Reader is the right way to go. That's what I discovered back in 2006 when I set up OraNA.info especially when the number of blogs gets bigger.
Today, OraNA aggregates hundreds of blogs with ease because, as you said, the heavy lifting is done by Google Reader. It even aggregates APEX blogs :)
Cheers!
Dear Dimitri,
I like APEXBlogs and look forward to v2.
I am just wondering if you could write a blog post about how to get the Google Reader stream and merge that stream into the backend tables.
Thanks
David Ren