
The Bumpy Rolling Out of Kaplak Stream – And What Not To Do To Piss Off Google

Kaplak is changing its course again. Since the inception of the first Kaplak idea, we’ve come a long, humbling way, only to realize over and over again how much we still have to learn. But slowly, we also realize what kind of know-how we have and are building, and how Kaplak can help crack the problems and meet the challenges we originally set out to tackle. Hence we also begin to understand what kind of value we add – and just as importantly, what we don’t add. Among many other things, this is key to learning what kind of business model we want to build – and, just as importantly, what kind of business we don’t want.

Let’s take a look at what happened with our traffic since the somewhat bumpy rolling-out of Kaplak Stream in 2008, from November 1st last year to February 1st this year :

The above is a screenshot from the Google Analytics Dashboard for Kaplak.com including subdomains. Following the launch of Kaplak Stream, sometime in November our traffic started to take off. Kaplak Stream basically consists of the present WordPress MU installation of which the Kaplak Blog is also part, along with a handful of customized plugins, of which the most important one is FeedWordPress. The idea (as sketched out in this previous blog post) is that items in the stream can be “fed out” from the stream again, which will reveal new contexts, which didn’t exist before. When two separate items which are both tagged “Barack Obama” are fed from the stream, they create a new “Barack Obama” context, even though the original items may have been produced and published in wildly different contexts.
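To make the idea of tag-based “contexts” a bit more concrete, here is a minimal Python sketch. It is not the FeedWordPress or WordPress MU code we actually run; the item structure, the `build_contexts` helper and the example sources are all made up for illustration. It only shows how two items from unrelated sources end up in the same new context once they share a tag.

```python
from collections import defaultdict


def build_contexts(items):
    """Group stream items by tag; every tag becomes its own 'context'.

    `items` is a list of dicts with hypothetical keys "title", "source", "tags".
    """
    contexts = defaultdict(list)
    for item in items:
        for tag in item["tags"]:
            contexts[tag].append(item["title"])
    return dict(contexts)


# Two items published in wildly different places (made-up examples).
stream = [
    {"title": "Post from blog A", "source": "blog-a.example",
     "tags": ["barack-obama", "election"]},
    {"title": "Video from site B", "source": "site-b.example",
     "tags": ["barack-obama"]},
]

contexts = build_contexts(stream)
# Both items now sit side by side in the shared "barack-obama" context.
print(contexts["barack-obama"])
```

Each key in `contexts` could then be published as a feed of its own, which is roughly what “feeding out” from the stream means here.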

The first installment of Kaplak Stream came with just about fifteen feeds, of which a handful were submitted by owners of niche websites. Others were feeds from sites such as YouTube, Amazon.com, Twitter (tracking particular subjects or keywords) and Boing Boing. Enough to provide the stream with some variety and “head” which would also test the autotagging performed by Open Calais via a modified version of Dan Grossman’s WordPress plugin.

Kaplak Stream managed to aggregate well over 15,000 items, i.e. about 1,000 items from each feed on average. Far more tweets than regular blog posts were aggregated, but posts attracted the greater amount of traffic, given that they worked much better with the autotagging functionality in place. Since they had more text, the tagging tended to be more precise – although sometimes tags were wildly misleading and out of place. Room for lots of improvement. Most of the traffic, about 90-95%, came from search, notably Google. Visitors tended not to stay long, but were quickly on their way again. This could seem to suggest that only a few found what they were looking for. However, reports also came in from feed owners that our traffic managed to produce a meaningful sample of visits on the actual sites aggregated. This was really good news, as it suggests that a sample of our visitors actually found what they were looking for, or were curious enough to click through.

So what pulled the rise in traffic? No subject in particular, but the variety of subjects covered. What attracted users were more often than not pretty obscure pages and topics. For example, the top result was the “tag page” for the tag “university-of-illinois-arctic-climate-research-center” with 641 views, and there was absolutely no recognizable pattern in the rest of the more popular pages reached by visitors. I have not given our sample here substantial analysis, but my guess would be that there would be a neat power law graph if one plotted the number of visits to each page in Kaplak Stream and ranked them beside each other. But there is no discernible pattern as to what determined which aggregated items were more popular than others.
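A rough way to check the power law guess would be to rank pages by views and look at the slope on a log-log scale. The Python sketch below is only an illustration: the `loglog_slope` helper is hypothetical, and the view counts are synthetic (only the top figure of 641 comes from our data), so it says nothing about what the real Kaplak Stream numbers would show.

```python
import math


def loglog_slope(view_counts):
    """Least-squares slope of log(views) against log(rank).

    Needs at least two pages, all with more than zero views. A roughly
    straight line with a negative slope hints at a power law.
    """
    ranked = sorted(view_counts, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(views) for views in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Synthetic counts: the top page has 641 views, the page at rank r has 641 / r.
synthetic_views = [641 / rank for rank in range(1, 51)]
slope = loglog_slope(synthetic_views)
print(round(slope, 3))
```

A slope alone does not prove a power law, of course, but it would be a cheap first look before any more sustained analysis.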

While some things seem to work, albeit still just barely, there are also problems. One of these is that apparently something happened on January 26th, which made our traffic drop drastically to pre-Kaplak Stream levels. Presumably this drop was caused by a Google penalty for duplicate content, which Google has been known to give websites which carry identical content across different domains. While Kaplak’s goals are somewhat aligned with Google’s, although not completely, I’m not sure the penalty (if there was one) wasn’t “right”, in the sense that there were clearly limits to how informative and appropriate the search results which led visitors to our site were. At least they were not good enough to justify the dramatically beneficial position we gained by aggregating just 15 feeds.

Another problem is the “noise” level, in our tagging, and in the combinations of feed items tagged with similar tags. Tags can be and mostly are very local. A post only remotely connected with a person and a piece which is solely about that person are usually tagged identically. My instinct tells me we need to use automated tools for what they are good for, and let filtering be more in the hands of expert users, in the contexts where it matters.

Clearly, more experiments are needed, and we need much more sustained analysis and methods to analyze our data. All this takes time and costs money. Right now Kaplak has no business model except what we can put into it out of our own pockets (meaning mine) – and these are rapidly emptying. This means that for the time being, i.e. for several months now – and several months (and perhaps even years) ahead – I will not be able to work on and develop Kaplak full time. Thanks to the benevolence of our host, we can keep and continue to work on all Kaplak’s sites and projects, but we’ll make some changes which best prepare us to run Kaplak as a part-time operation.

We’ll convert the Kaplak setup to a setup more similar to that of the UMW Edublogs set up by Jim Groom at the University of Mary Washington. Among other things, this means we’ll focus more on building each smaller site in the network, and keep each site focused on its subject or theme. We’ll focus more on aggregating what happens within the Kaplak network of sites than what is going on outside the Kaplak WPMU install. We’ll still use aggregation tools to track very particular subjects, keywords and tags, but each different subject will be treated in a site of its own, to make things more manageable (it’s a mess cleaning up a large site based on aggregated items). In other words, we’ll run a network of small, very low-maintenance sites, and delay bigger experiments and improvements for a while. Meanwhile, Kaplak Stream will still be able to track tags across all sites and offer feeds from particular tags used in the network.

Reducing the amount of my time which goes into actual development of Kaplak also means I can focus better on building a new constellation of resourceful people and (real) investors, which we will need to come back stronger with a revived Kaplak at a later time. This is what I hope to achieve, while I work simultaneously on other things, making a living.

However, there is also a risk that we don’t. That our ways may go in other directions. This is not necessarily all bad. See this video with Tim O’Reilly in a previous post to see why. I will try very hard to keep an open mind and attitude and not get stuck in ideas I ought better to leave behind. That said, I can’t see any companies or services which presently really crack the problems we set out to – and this means we still need to fill that space, one way or the other. And more than anything, I can’t stay away.

The Grey Zone of Syndication

As I mentioned in an earlier post, syndicating stuff is also one huge grey area of legal hassle. I stumbled over this discussion from a couple of years back (as well as this one), which airs not at all uncommon concerns. You risk being called a scraper, a spammer and a splogger, if you pursue the path of syndication.

Pariah S. Burke wrote :

RSS feeds are published for individual, private consumption; they are not a blanket license to, or waiver of, reprint rights. Taking and republishing content, no matter how much or how little, without the original author’s permission is a violation of U.S. and international Copyright laws. There are exceptions, of course, detailed in the Fair Use doctrine, but such exceptions are very specific and do not apply to the vast majority of sites using FeedWordPress, Autoblog, and the like. In fact, Charles Johnson, the creator of FeedWordPress is in constant and frequent violation of copyright law because the apparent majority of his blog’s content is stolen without the original authors’ permission.

In that case, Google, which enables users to very easily tag and share (i.e. republish) feeds they find interesting via its popular service Google Reader, is guilty of the same constant and frequent violation of copyright law, or at least of willfully assisting infringement. The same of course goes for YouTube and any web service which allows anyone to embed its videos, images and games on their own local site.

Who says a tool has to be used in one way only? Let’s get creative! That’s how problems are solved and new business models are developed!

Here’s another POV, from a guide on setting up an automatic blog which automatically generates a ‘shitload of traffic’ and is ‘just about hands free’ :

To be honest, I’m not a big fan of people scraping content that people have sweated over. However, one thing I don’t mind doing is thieving from thieves.

You’re on the hunt for “disposable” content – generally not text based. Think along the lines of Flash games, funny videos, funny pictures, hypnomagical-optical-illusions – that kind of thing. The Internet is awash with blogs that showcase this stuff. Check out Google blogsearch and try a search like funny pictures blog. There’s hundreds of the leeching bastards showcasing other peoples pictures, videos, games and hypnomagical-optical-illusions for their website. They can hardly call it “their” content. With this ethical pebble tossed aside, we can go and grab some content.

There’s loads of ways you can hunt down potential content. You’re on the lookout for RSS feeds with this rich media. So you could try; Google Blogsearch, Technorati, MyBlogLog – basically any site that lets you search the blogosphere.

My personal point of view (this is also Kaplak’s stand) is that the problem of visibility for sites and products is larger than the largely fictional problem of “theft”. If you make syndicated feeds publicly available, you implicitly want and ask for syndication, because you want your message out. Syndication will help your site or product become visible in places and contexts it would not otherwise be seen in, and that’s why you use it and why you should use it. If you do not want your message out in other contexts and do not want to see your articles appear on other websites in a syndicated format, you can simply choose not to make articles available for syndication. The benefits however, in the Google Juice and traffic which syndication brings back to your sites and products, are in most cases much greater than the disadvantages.

Accusing syndication sites and services of theft and copyright infringement is IMHO ridiculous at best, as these services actually help your site become seen and achieve better rankings in search engines. They help your interested readers and users find you in the first place. And if you don’t want to be read – why publish to the web?

At worst, these allegations are harmful, as they instill an atmosphere of fear and create distrust of using RSS, feeds and aggregation tools. Instead, we need to urge and encourage syndication and use of syndicated feeds, as it enables rich web contexts, which would otherwise not be possible, and makes it easier to direct interest and relevant traffic to sites and subjects of interest. It is above all a tool, which can be used for our mutual benefit – or for spamming and creating yet more “get rich quick” mentality kind of sites filled with stuff the world could care less about (but apparently doesn’t). I am of the opinion that these types of sites may provide their owners with short-term rewards, but ultimately will give way to authentic sites of much stronger lasting value. How to build lasting value, and help these sites and products build lasting value, is what we’re interested in here.

The Structure of Kaplak Stream : Our Goal

I’m in the process of setting up Kaplak Stream (working title), a project we (partly) deliberately have been pretty silent about – at least in its deeper ramifications, even though we did touch upon the wider picture of feeds and aggregators when I discussed Clay Shirky’s book Here Comes Everybody in a recent post.

Kaplak Stream is a network of websites; in fact, it is a network of Planet-like websites, each dedicated to a particular niche. Using automatically and semi-automatically fed RSS feeds as our vehicle, Kaplak Stream consists of an ever-growing pile of niche websites, which are all part of our new WordPress MU install. These sites can be homegrown and consist of anything from one to several articles, or they can be houses of RSS feeds, fed from our customers’ own sites and preferred services and related web sites of interest, which offer publicly accessible feeds.

The feeds from each subsite are then fed back into the main channel (the great “planet” site), as well as all the external sites, which tap whatever is interesting to them. We’ll also tap into the greater Kaplak Stream from the Kaplak Wiki, where pages will be fed relevant items based on categories and tags used.
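As a rough picture of how feeds from the subsites could flow back into the main channel, here is a small Python sketch. It is not how FeedWordPress does it internally; the `merge_feeds` helper, the item fields and the site names are all hypothetical. It also skips items whose link has already been seen, which is an assumption on my part and not something the setup described here does by itself.

```python
from datetime import datetime


def merge_feeds(subsite_feeds):
    """Merge per-site item lists into one main channel, newest first.

    Items with a link that was already seen are skipped (assumed behaviour),
    so the same article fed into two niche sites shows up only once.
    """
    seen_links = set()
    merged = []
    for site, items in subsite_feeds.items():
        for item in items:
            if item["link"] in seen_links:
                continue
            seen_links.add(item["link"])
            merged.append({**item, "site": site})
    merged.sort(key=lambda item: item["published"], reverse=True)
    return merged


# Made-up subsites and items, for illustration only.
subsite_feeds = {
    "niche-a": [
        {"title": "A1", "link": "http://a.example/1",
         "published": datetime(2008, 11, 3)},
        {"title": "Shared", "link": "http://shared.example/x",
         "published": datetime(2008, 11, 5)},
    ],
    "niche-b": [
        {"title": "Shared", "link": "http://shared.example/x",
         "published": datetime(2008, 11, 5)},
        {"title": "B1", "link": "http://b.example/1",
         "published": datetime(2008, 11, 9)},
    ],
}

main_channel = merge_feeds(subsite_feeds)
print([item["title"] for item in main_channel])
```

External sites and the Kaplak Wiki would then tap this merged channel, or a tag-filtered slice of it, instead of each subsite separately.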

Here’s an illustration of the feed traffic and link love created by Kaplak Stream :

What’s important is that this network of niche sites helps build context for the niche products offered by our customers. We aim to create very low-maintenance sites, which will help sell some of the “slim end of the long tail” products we mean to help our customers sell.

These marginal products only sell the occasional copy, so each site cannot cost too much to maintain. This is where syndication comes into the picture. With syndicated sites, we can maintain rich contexts easily and we don’t need lots and lots of traffic for each site individually to pay the bills.

How does this help me sell my product?

So how do you sell with Kaplak Stream? You opt in for a site in the stream, free of charge, with a subject and RSS content of your own choosing. For now, your product must use an external affiliate program and a shopping cart provided by third party services. Products/widgets must also support a revenue sharing model, which shares revenue with publishers.

Each site is focused on one product or few related products only. The widgets for these can be placed at site-level in the sidebar. In this case, Kaplak will be an affiliate publisher of your product.

Alternatively, products may be sold at post-level, i.e. from widgets included in posts in a feed. For these sales, you (or anyone else responsible for the feed) will be the publisher. If unused, the sidebar will be utilized to sell another related product in the Kaplak household, if applicable, or house our usual ads and other stuff circulated among the sites. It’s also in this space we’ll begin to introduce our URLsale widgets when we get that far.

Once the site has been created, you can nurse it and cultivate it – or simply leave it alone and forget about it. Until it makes the occasional sale. A site can be a silent sleeper for years, until someone re-discovers its existence and makes a purchase. In Kaplak Stream, this is not a problem.

Only when your product makes a sale do you earn a dime, which in turn is shared with the publisher. Making the sale is not the only benefit of using Kaplak Stream however. The greatest benefit may be the improved targeted visibility created by the linking activity in the stream. Feeds from Kaplak’s niche sites may easily be pulled back into niche sites everywhere, which adds context and value to these sites, to the advantage of their owners and communities. The links across the network and pingbacks in WordPress MU make it easier to connect the dots between “separated” islands of niche contexts. Kaplak Stream could be the first step in our ‘making the world’s ends meet’.

As with everything we do, this project may be subject to change – any time. Much in the setup depends on further testing and development, particularly of the plugins we use.

To Disqus or not to Disqus

The pros and cons of commenting service Disqus

The waters are divided these days on the blog commenting service Disqus, which we’ve also installed here on the Kaplak Blog. Personally I was impressed with it when I first saw it on the How To Split The Atom blog, and decided it could do great work for the Kaplak Blog too. So when we moved the blog, it was a natural step to install their WordPress plugin.

What Disqus does is deliver a cross-blog and cross-platform commenting plugin for blogs, which hosts and connects comments, and feeds them back in different ways to the blogs. There are several great advantages from this ‘fragmentation of blog comments’, and so far about 4000 blogs (according to Disqus) think so too – and there are some apparent drawbacks, at least for the time being.

I’ve been trying to gather the pros and cons of Disqus as it looks right now, and ultimately I am pretty undecided. Robin Good, blogger and new media reporter (who, among other things, did a remix of Steal This Film) sums the undecidedness up pretty well in this video :

To sum up as they’ve been put by Robin and others recently :

Pros

  • Users who comment on different blogs can easily find their comments again and organize their discussions.
  • Users are much more able to interact with other bloggers and commenters, independently of the blogs they comment on.
  • Bloggers can easily reply to comments via Disqus email, which saves a lot of ‘logging in/out’ hassle if you receive many comments.
  • Discussions can be fed easily from Disqus into other services, such as FriendFeed, drawing other people into following discussions and commenting.

Cons

  • Bloggers potentially lose out on the Google juice provided by comments, while Disqus gets the juice – at least if they use the JavaScript based plugin.
  • Bloggers potentially lose out on the income from ads, if too much commenting activity is moved from “their blog” to Disqus.
  • No support for trackbacks or pingbacks, which is a pain, since these play a vital role in the blogging “if I link to you, you link to me too” ecology. Daniel Ha of Disqus says they’re working on something big in this department. One can’t help but wonder, though, if they foresaw what kind of a dealbreaker not including this to begin with could be?

You can find Kaplak’s Disqus Community page here. I’m curious to learn more, as I am still pretty undecided. All things balanced out, for now we keep Disqus on the blog – even though we might use a temporary hack to enable WordPress trackbacks. In my current estimate the social benefits and effects of using Disqus are greater than the Google juice we get from comments (we don’t get a lot of comments yet), although it is a difficult estimate, since we are a young blog and need to attract readers. I guess it adds up to this : why can’t we have both the Google juice and the trackbacks, as well as the great social functionality and effects that Disqus can give us?

How does the balance look for you and your blog or commenting habits? What are the scores, advantages and benefits? What is the dealbreaker?