Posted by Morten Blaabjerg, April 6th, 2009 in Copyfight, Information filtering
Torrent index sites like The Pirate Bay are often compared to search engines such as Google in that both offer vast indexes of information, and both give easy access to unauthorized copies of copyrighted material.
One thing which surfaced during the Pirate Bay trial in late February was IFPI’s cooperation with Google and other search services in their battle against copyright infringement. When IFPI’s representative John Kennedy was asked why they sued The Pirate Bay and not Google (as in “or any other major information filtering service using the internet”), the answer was that Google cooperated, and The Pirate Bay didn’t:
When asked about the differences between TPB and Google, Kennedy said there is no comparison. “We talk to Google all the time about preventing piracy. If you go to Google and type in Coldplay you get 40 million results – press stories, legal Coldplay music, review, appraisals of concerts/records. If you go to Pirate Bay you will get less than 1000 results, all of which give you access to illegal music or videos. Unfortunately The Pirate Bay does what it says in its description and its main aim is to make available unauthorized material. It filters fake material, it authorizes, it induces.”
(…) Kennedy was asked why they haven’t sued Google the same way as TPB. He said that Google said they would partner IFPI in fighting piracy and he has a team of 10 people working with Google every day, and if Google hadn’t announced they were a partner, IFPI would have sued them too.
I think the truth of the matter is that Google’s business has been based on copyright infringement from the start. When Brin and Page started Google, they began by downloading the entire internet and offering their index of it online. In the words of Larry Page himself, in David Vise’s The Google Story:
Google was started when Sergey and I were Ph.D. students at Stanford University in computer science, and we didn’t know exactly what we wanted to do. I got this crazy idea that I was going to download the entire Web onto my computer. I told my advisor that it would only take a week. After about a year or so, I had some portion of it.
In order to offer search of their index to the world, they had to keep all the internet’s content on their own servers; otherwise their results wouldn’t have been very fast. Did they ask every single website owner or administrator for permission to use said material? No. Did they need to? No; in fact, they couldn’t have. The cost of asking alone would have been prohibitive for what Google was doing, if they even knew themselves what that was.
However, was what they did beneficial to the world? Yes, one may very well say so, to the degree that Google is now a hugely successful business whose operations span the globe and benefit millions, if not billions, of people on a daily basis. What Google did was transformative: it defined the web.
What Google added was their filtering index of the web. On their servers, the content of sites was analyzed and ranked according to PageRank, an algorithm which rewards heavily linked-to sites with better placement in search results than sites which attract fewer links.
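To make that intuition concrete, here is a minimal Python sketch of PageRank-style ranking. The toy link graph, damping factor and iteration count are my own illustrative assumptions, not Google’s production algorithm:

```python
# page -> pages it links to
links = {
    "arts":    ["history", "news"],
    "history": ["arts"],
    "news":    ["arts", "history"],
    "spam":    ["spam"],  # links to itself, but nobody links to it
}

def pagerank(links, damping=0.85, iterations=50):
    """Iteratively distribute each page's rank across its outgoing links."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, outgoing in links.items():
            share = damping * rank[page] / len(outgoing)
            for target in outgoing:
                new_rank[target] += share
        rank = new_rank
    return rank

# Heavily linked-to pages float to the top; the unlinked "spam" page sinks.
for page, score in sorted(pagerank(links).items(), key=lambda kv: -kv[1]):
    print(f"{page:8s} {score:.3f}")
```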
But for this to work they needed the data to work with. Google has done a lot to give users the impression that when one is using their core product (search), one has instant access to all of the World Wide Web. This is a brilliant illusion, but no matter how good it is, one is still only surfing around on Google’s own servers, which store terabyte after terabyte of unauthorized copies of copyrighted material. The fact remains that Google took this data without asking anyone for permission. Perhaps they didn’t need to; perhaps they didn’t deem it necessary. What Google did was one of the greatest things that could have happened to the web at the time, and it was what everyone else in the search industry was doing too: throwing data around without paying any kind of homage to copyright owners. To the great benefit of every one of us today, most will say.
What The Pirate Bay and other sites are doing today is no less transformative. But they’re not cooperating.
After Google introduced their filters to the world, the “war on piracy” intensified greatly. Napster and peer-to-peer networks threatened the monopolies of first the record industry, then the Hollywood-based entertainment industry. Google and other services which offer online metadata – i.e. access to “other people’s” information via the internet – got trapped in that battle. Some felt they had to choose sides. And most chose to cooperate with the entertainment industries – over what was right or true or just. Whether this line of business was born out of the pragmatism of doing “business” and avoiding expensive lawsuits, or out of a mission to “do no evil”, doesn’t matter. Google and like-minded companies will do a lot to cover up the fact that what they are doing is based on massive copyright infringement – including cooperating with IFPI to filter online information, every day. Which in my humble opinion is very creepy.
I say this as a big fan of Google, as a daily user of countless Google products, which I would hate to live without.
It’s a pretty good fraud. Cooperate with IFPI and other copyright holders to only slightly cover up the fact that the whole thing is based on copying other people’s material. Blur the distinctions to the extent that it even confuses the courts as to what they should believe. What is really the difference between Google and similar search filters and a service such as The Pirate Bay? Both store and provide access to metadata. But while the first stores everything on its own servers, from where it provides access to local copies of sites and material, The Pirate Bay and others employ a superior technology which offers nothing but hyperlinks pointing directly to material stored on their users’ own machines. So why should The Pirate Bay lose the case which is going on right now in Sweden? Because they do not cooperate. They do not care about anyone’s material. What they’re interested in is developing a new technology to the benefit of all of us. They do what Google did in 1998, except they do not commit any copyright infringement at all.
On a curious note, Google also ranks websites according to how “unique” their contents are. This means that if you run an aggregation site, i.e. a site which harvests and provides access to the content of other websites – just like Google did, and still does – Google assigns you penalty points, and your site will be harder to find using Google’s search. Your site will rank lower if you do what Google does: copy the content of other websites.
What’s really scary, however, is the degree to which we rely on proprietary filtering services such as Google’s search, which are influenced by interests we don’t know about. Google presents itself as an almost universally neutral service which can give us an instant answer to almost every problem we face. The truth is that Google is a highly weighted information filtering service, influenced by the special interests of organizations such as IFPI on no legal grounds except what does and does not please Google, and completely dependent on Google’s choice to cooperate. We don’t know what other special interests Google chooses to cooperate with, and we have absolutely no say in whether they do, or in how they let their search results be influenced. I can only conclude that while a few young people in Sweden are willing to stand up for our freedom of speech (for this is what I consider the “freedom to link” to be), it is shameful to realize again and again that the world’s information filtering superpower is not.
In my view there is no other way out of this misery than to create and help build new sets of truly decentralized information filtering tools and services, based on free software, which cannot be influenced, manipulated or dominated by any particular third party. Tools which enable better, faster and more precise connections between someone who wants a message or query out – and those who wish to receive and answer it. We’re still throwing rocks around in our information stone age when playing with proprietary services and tools such as Twitter, YouTube and the many, many others we use on a daily basis.
Tags: copyright, David Vise, free software, Google, Larry Page, Pirate Bay, search
Posted by Morten Blaabjerg, December 18th, 2008 in Copyfight
I was called up on the phone today by a guy from Jyllands-Posten, which is, as some will know, a major Danish newspaper. The guy on the other end told me they’d just converted to a tabloid format and wanted to sell me and Kaplak a one-year subscription, and he would throw in a free laptop too, if I took his offer. I thought the offer sounded a bit suspicious but didn’t decline, as I figured a free laptop might always come in handy. So I said I couldn’t decide right now, and went online to search for more information on the offer using Google.
I didn’t find much on the laptop offer. Instead, this turned out to be an interesting case of search, and of what one finds when looking for something else. I stumbled upon this article in Jyllands-Posten (now also quoted in this space), which I found sufficiently interesting to spend a few minutes hunting down the object of the article: this video, which was first posted to YouTube, but then taken down by the service:
The story goes like this: Bilka, a large Danish supermarket chain, sells computers. One of their customers had an unusual agenda. Unnoticed, he used a demo laptop showcased in one of their stores (in Holstebro) to play a porn movie for customers who happened to be passing by. Meanwhile, their reactions were captured with the laptop’s built-in webcam. Apparently the plan was put into action using a USB stick to get the movie onto the showcased laptop. Somehow the perpetrator managed to get the footage of the customers’ reactions edited and uploaded to YouTube, from where it was later removed (by YouTube). It is now an entry in the collection of YouTomb, an MIT project dedicated to studying takedown patterns on YouTube (and other online services).
Fortunately, someone also uploaded this obvious case of consumer-producer convergence to other online spaces, from where it may be seen and redistributed. The days when YouTube could take down a video and have it cease to exist on the internet have passed.
This story is a good case of PageRank generating activity for all the wrong reasons. The video didn’t appear in Google’s own video results, for obvious reasons: YouTube took down the video, so Google wouldn’t include it, and Google’s video search is not (yet) very good at locating video from third-party sources. Clearly Bilka doesn’t like getting all this attention from a story like this and has likely been trying to shut it down. And I wasn’t even looking for something about Bilka or for cases like this. So why did I find it?
Thanks to PageRank, it is easy to find these kinds of cases, as they are typically linked to from a number of places. But PageRank also makes it difficult to find stuff, because the kinds of stories it prioritizes were deeply irrelevant to the information I was seeking: my original search for something on the laptop offer. I didn’t find what I was really looking for. But in this case, my original errand wasn’t important enough to keep me from being swayed off my path.
To find the story, the only keywords I had to use were “Jyllands-Posten” and “laptop”, originally searching for something on the laptop offer I had received. Subsequently, after I found the article on the paper’s website, I tracked down the video in question, mostly out of pure stubbornness and a refusal to let YouTube decide what’s good for me. It also seems strange to me to have a news story on the internet about a video, but not display the video. I wanted to see it for myself. To find it, I searched for “video” along with the other keywords “Bilka”, “porno” (Danish for “porn”) and “Holstebro”, and dug up the title it had on YouTube before the takedown. After the takedown, someone had posted the title of the video, “Electronic Harassment #1 – Porno on Laptop”, along with the YouTube user name of the user who uploaded it. After that it was easy to locate it somewhere else. I was lucky it still carried the same title.
I can’t help but find this story incredibly funny. In all its comical simplicity, pulling this stunt showcases the shift in power, voice and authority which distributed computing and online media enable – away from large respectable companies, channels and filters, and towards every one of us, independent of filters, disrespectfully engaging, with limits imposed only by the audacity of our creativity. Let’s continue our work to find and build filters which are independent of YouTube, Facebook and other such services, which so ridiculously lie flat on their stomachs for yesterday’s norms and masters, and which have so little concern for the individual voice of experimental producers that it’s just sickening. And let’s spread stuff like this far and wide, to let executives everywhere know that we know that their unquestioned power is about to end – if it hasn’t already. “Everybody fucks”.
Tags: Bilka, consumer-producer convergence, electronic harassment, Jyllandsposten, laptop, metacafe, search, video, YouTube
Posted by Morten Blaabjerg, October 28th, 2008 in Kaplak on the web
Sometimes I prefer to visualize an idea using nothing but Notepad – or preferably just pen and paper, whatever I have in front of me. The ‘back of the napkin‘ philosophy suits me well. In fact, when I tidy up old stacks of paper once in a while, I always find ideas sketched on the backs of envelopes and in impossible places such as the back of letters from the tax office. Do I archive it under that particular idea and project – or does it go into the tax papers stack?
The Kaplak Stream napkin model
Here’s an updated napkin model for Kaplak Stream which I recently created in Notepad:
This model shows the very basic idea of Kaplak Stream. The Arts and History websites are different sites, but they have some tags or categories in common, such as ‘knights’ and ‘romantic’. But each site has no way of knowing this; they may not even be aware of the other site’s existence. They’re separate systems, islands of information. A visitor clicking on a tag on the Arts site won’t see the items tagged the same way on the History site. Now, when the feeds of both sites are fed into the Kaplak Stream, it allows new types of long tail sites to be created.
By pooling our feeds, we allow new contexts to be created. This can happen when feeds are extracted from the stream for particular tags or categories. When feeds are pooled, even tags and categories that are not used much on any individual website may spawn new, rich web contexts, which are capable of sending traffic back to the original publishers but, more importantly, enable the distribution of products (via affiliate models) which are otherwise hard to sell in a mainstream context.
In this case a Knights site and a Romantic site can easily be created, as sketched below. Neither of these new sites could exist within the History or the Arts sites alone, but because we pool and channel the information from a wider range of sources, they now can.
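Here is a minimal Python sketch of how such pooling and extraction might work. The feed items, tags and `extract` function are my own illustrative assumptions, not the actual Kaplak Stream implementation:

```python
arts_feed = [
    {"title": "Romantic painting of a knight", "link": "http://arts.example/1",
     "tags": ["knights", "romantic", "painting"]},
    {"title": "Impressionism revisited", "link": "http://arts.example/2",
     "tags": ["painting"]},
]
history_feed = [
    {"title": "Knights of the 12th century", "link": "http://history.example/1",
     "tags": ["knights", "medieval"]},
    {"title": "The Romantic era in Denmark", "link": "http://history.example/2",
     "tags": ["romantic", "denmark"]},
]

# Pool both sites' feeds into one stream.
stream = arts_feed + history_feed

def extract(stream, tag):
    """Pull a new niche feed out of the pooled stream for a single tag."""
    return [item for item in stream if tag in item["tags"]]

# Neither source site could have produced these feeds on its own:
knights_site = extract(stream, "knights")    # items from both Arts and History
romantic_site = extract(stream, "romantic")  # likewise
```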
Here’s the expanded version of the above model (which is also an improvement over the model I previously posted on Kaplak Blog):
As this model shows, linking back to feed publishers for increased visibility of their sites and contexts is a key feature of the network. Submit your feed and gain greater visibility, because more sites “on the way” will link back to your site. This is key for publishers to actually want in and be part of what we’re doing. However, these are just the short-term benefits.
Connecting the disconnected
When feeds are extracted from Kaplak Stream into other niche contexts, publishers will connect more easily with those contexts and communities, empowering both publishers and communities who would otherwise not know of each other. Anything may arise from these new connections: meetups, exchanges of ideas, products, etc. It is in this new context that the sales of niche products are more easily arranged, most likely via the use of affiliate programs.
As we have previously learned, attributing value to the context of finding information, rather than to any particular piece of information, is the more effective route to Kaplak’s goal in an environment such as the web, which literally explodes with new information every day. Creating very finely segmented sites will enable passionate users to more easily reference interesting niche material, i.e. create recommendations socially, for interesting information items as well as for products sold in these niche domains. Simply because there are now rich niche domains and contexts which are worth the link, contrary to the situation before the aggregation and filtering, where the niche items were spread out all over the web – and very difficult and time-consuming to find using search, bookmarking services, Wikipedia, StumbleUpon or Digg-type sites.
With time, some of these new niche sites and contexts may connect otherwise disconnected communities with each other, and possibly even grow their own small communities, which will enrich those contexts even further. The value of these new contexts does not depend on the short-term Google juice of linking back to sources that I mentioned earlier. Instead, it thrives and builds on the social connections and recommendations, which can now rest on increasingly bona fide points of reference – and (probably, with time) even better tools for sharing than what we have right now.
What’s important for this project to succeed is to tag/categorize incoming items conveniently and precisely. We’ll continue to work and experiment with autotagging, but the best bet is (with time) to make tagging a social process which can take place for each item along the whole of its ‘journey’. For the time being, however, we rely heavily on feed items being richly tagged by their source publishers. This is one challenge we face right now.
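For illustration, here is a naive autotagging sketch – my own assumption of one possible approach, not Kaplak’s actual autotagger – which suggests tags for an item by matching a known tag vocabulary against its title and summary:

```python
import re

KNOWN_TAGS = {"knights", "romantic", "medieval", "painting", "denmark"}

def autotag(item, known_tags=KNOWN_TAGS):
    """Suggest tags by intersecting known tags with the item's own words."""
    text = (item["title"] + " " + item.get("summary", "")).lower()
    words = set(re.findall(r"[a-z]+", text))
    return sorted(known_tags & words)

item = {"title": "Romantic painting of a knight",
        "summary": "Medieval motifs in Danish art"}
# Note the limits of naive matching: "knight" does not match the tag "knights".
print(autotag(item))  # ['medieval', 'painting', 'romantic']
```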
Because it’s so critical to what we do to thoroughly understand what’s at stake, it’s also vital that we invite input every step of the way. If nothing else, we want to give you the opportunity to read, think about and absorb our ideas, and go out and implement your own tools and architectures – at every step of our way. And when you’ve done that, come back and tell us about it. We’d love to learn more.
We have yet to set up proper forms for receiving feed submissions, but we’ve begun to receive them anyway. For the time being, please submit your feeds to The Kaplak Team or directly to me via Twitter or Identi.ca. Remember to give us a few keywords on the contents of your feed (just the most important ones).
Tags: back of the napkin, community, feed imports, Kaplak Stream, long tail, peer production, rss, search, tagging
Posted by Morten Blaabjerg, July 17th, 2008 in Kaplak on the web, The mainstream problem, What is kaplak?
This early sketch illustrates how a product/widget from a niche producer is made visible in a niche context somewhere else on the web:
A web user and niche producer (A) encounters a Kaplak widget on a website he knows and trusts (B). The producer finds that Kaplak can be used to distribute a product of his own. He decides to sign up, and subsequently uploads a product and submits basic product information.
The Kaplak interface (C) spits out a widget, a.k.a. a “kaplaklink”, for the product. The widget is also published to the Kaplak market network, from where it may be fed, via RSS or other means, to subscribers within particular channels or categories.
A website owner (D) runs what we may term a “filtersite” (E). D feeds or filters widgets from the Kaplak network from a range of categories or tags, in order to capitalize on sales, i.e. earn a share of kaplak from each sale made on E. His motive is primarily commercial. Among the widgets filtered is the widget for A’s new product.
In order to avoid what we term the mainstream problem, i.e. that just a handful of “hits” are prominently displayed and amplified, Kaplak depends on filtering sites of all kinds, i.e. index websites which seek to filter Kaplak’s feeds according to particular specialized interests or criteria. We have a lot of these kinds of websites in the online landscape today, many of which are financed by advertising. Kaplak will offer one more type of income for index-type sites, and one which may allow a sharper edge in filtering, because the size of income streams may not always be proportional to the amount of traffic generated by a site. A large site may have greater problems making the “slim end of the long tail” presentable than a smaller, more well-defined, niche-friendly site will. Both may be filtering sites, though, basically performing the same task of feeding and filtering.
The widget from A on D’s site is now discovered by (F), who puts the link into her blog, because she finds the product interesting and relevant to the article she’s about to publish. F’s blog is visited by a much more select crowd than D’s site, which relies mainly on search as a source of traffic. F gains a lot of attention through a social networking site popular within her field of expertise (G). Motives here weigh more heavily towards the professional, contextual, idealist side than the money side. F earns a fair share from her Kaplak widgets though, as her choice of widgets is much more fine-tuned to her readers than the bulk filtering of D, which earns from a few sales of a lot of products (the “pure” Chris Anderson model).
Finally, a friend from G alerts another friend, who happens to be the owner of a niche site (H) which deals particularly with A’s subject, and who finds the new product intensely interesting. The regulars of H know the deal and can instantly see the value of A’s product. A’s product finds a potential market here which he otherwise wouldn’t have found.
None of H’s users would have discovered A’s product without Kaplak, even if it was accessible via Google or filesharing networks. First, none of them would know about the product. Had one of them actively searched for it, she would have had to pick very delicate keywords and endure the time-consuming process of browsing search results to page 7 or 8, only to discover a dead link to a torrent which may once have been alive and kicking, but which has no seeders.
The owner of website H publishes A’s widget from both professional and financial motives. The professional, interested motives weigh in heaviest, but since the site engages A’s target group, the collective sales pay off decently in kaplak, which contributes to financing the site. H’s traffic may be slight – if the group of “regulars” is sufficiently interested and the price is right, then H need not care greatly about the amount of traffic.
Producer A expands his market with H’s users and anyone who made a transaction along the widget’s “route” who wouldn’t otherwise know about the product. The process then repeats itself, this time with one of H’s users in the role of producer A, as she discovers she may use Kaplak to distribute one of her own products. This process happens across Kaplak’s entire global network, with an intensity dependent on the demand for the products offered by users, and on the ease or difficulty with which a product/widget can gain entrance into the niche environments and markets “at the other end”.
The sketch illustrates what Kaplak’s primary product is. As we’re on the web, all sites and actors in the above diagram are accessible to everyone all the time, from anywhere in the world. The problem is knowing that a product exists and, next, finding where it is. Search engines such as Google offer one model; filesharing index sites such as The Pirate Bay offer another. Both, however, are primarily based on an active search for information from the buyer’s end.
Kaplak offers a third model, which brings the product to the target group through the web services and communities the target group uses every day. When Kaplak works, web users will find interesting links/widgets on sites and services they regularly visit and trust, before they even know they want the particular product – and long before anyone even thought of using Google or something else to go look for it. Finally, the Kaplak model can be fully financed by the market which is opened up, rather than relying on upfront payments from our niche producer before he or she knows if there is a market.
Tags: blog, filesharing, Google, long tail, Pirate Bay, search, visibility
Posted by Morten Blaabjerg, July 12th, 2008 in Copyfight, Kaplak on the web, What is kaplak?
Over the next handful of articles I’m going to dive into what Kaplak is and how it works, as far as I can at the present time. This first article is a slightly modified re-run of the background article from our old main site:
Originally, kaplak is an old maritime judicial term of Dutch origin. For bringing a shipment of stores safely to port, a skipper could be paid a bonus, i.e. káplak, calculated as a percentage of the shipment’s value. This served as financial compensation for the risks taken and hazards overcome at sea. Káplak literally means ‘fabric for a cap’, with a reference to the incentive it provided to stay on deck even in bad weather.
The internet is like an ocean, travelled by data packages. It is happening all the time, everywhere, at the same time. It is a global network of instant communication, of conversations, information and knowledge. Of human experience, artworks and products in all kinds and forms. As long as it can be digitized, i.e. made understandable and transportable by computers and cables, it can be made accessible on the internet.
In a global world of ‘unlimited shelf space’, as Chris Anderson coined it, there’s a market even for products on the very slim end of the long tail. If you can approach your market precisely enough using the internet, you’ll be able to reach the unknown destinations which will make your product meet its niche customers. This is one of the great promises of the internet, but it doesn’t come without problems.
How do you get noticed – and, more importantly, noticed by your target audience – on an internet which grows by millions of new websites every month?
How do you get paid? How do you get safe and fast transfers of your digital goods and digital money, which will allow you to keep doing what you do best, without the hassle of setting up and running your own e-business and marketing networks?
The World Wide Web alone grew by a staggering 4.4 million websites from April to May 2007, and this number is increasing. Paradoxically, while all this information is made available and accessible all the time, to everyone, at the same time, it also becomes difficult to find a particular piece of information if you don’t know where to look. We come to depend on recommendations, from people and companies we trust, to find what we’re looking for. Search engines deliver such recommendations. Your friends, colleagues and social networks provide others.
One method of communicating our preferences and recommendations is to create hyperlinks on the World Wide Web, which point others to interesting files, information and communities. As the number of hyperlinks on the internet increases, however, we also need methods to filter them; to select certain criteria for collecting, ordering and presenting them.
At Kaplak, we don’t believe in re-inventing the wheel. Search engines and web indexes are doing a great job of filtering information, answering queries and creating visibility on the World Wide Web. But we recognize a few significant problems with search as the only method of filtering and finding information.
In order to search for something, you need to know what you’re looking for, at least generally. You need to be motivated enough to take your time to use a search engine, type in your query and sort your results according to your preferences. For some queries and products, this process can take hours, as the most interesting results (typically niche-oriented results) remain buried deep down the results pages. And of course, you can’t search for information or products you don’t know about.
Even peer-to-peer filesharing technologies such as BitTorrent, which otherwise hold great promise, have difficulty tackling files of less-than-mainstream interest. One has to be something of a hero to keep one’s BitTorrent client open all night, in order to seed one’s work for the lone leecher who stumbles upon it by chance.
A large amount of information and products remains unseen by potential customers and markets. You come to depend on marketing agencies and banner advertisements in order to be seen. Most marketing schemes, however, are not precise enough to reach very delicate groups and environments. And you need to have established your business model in order to use them.
Making your ends meet
Cheaper hardware, internet connections and free software make it economically feasible today for almost anyone to create a business model using the internet. This has so far led to a tremendous growth of thriving web-based businesses, whose economic and social ramifications have possibly not yet been fully understood or recognized.
Business models on the web, however, have mostly been conceived in terms of luring customers away from whatever they were otherwise doing on the web, into ‘visiting’ a specific website. This website typically offers particular ‘webshop’ software, handling inventory presentation and customers’ monetary transactions. Alternatively, the website offers all its contents for free, relying instead on income from advertisements, of which some of the least intrusive are the popular text ads from Google and others.
In either case, if you want to sell something using the web, you’ve also been left with the task of maintaining a website and administering online transactions, taking time from what you do best: creating new products. If you’re successful, you soon face the choice of hiring help to administer your growing online business, or cutting back on the hours spent creating products. This makes you a manager, which is great if this is what you want, but not so great if you want to focus on creating and working within your field of expertise.
If you sell very little or receive only slight traffic, none of this is feasible. Your time will be spent optimizing your website, and your traffic will be too insignificant to bring you any income from your advertisements. Perhaps you will be tempted to make your products more ‘mainstream’ to attract more customers, in order to make an income from your ads. If you receive great amounts of traffic, but still sell very little or otherwise fail to monetize your traffic, you will be hit with bandwidth and bottleneck problems too.
So, apart from tools which help your products ‘be seen’ by your target customers, as a niche producer you also need tools which give you an income without the time consumption of necessarily running your own webshop. At the same time, it can’t hurt if your product can help others finance their websites and internet businesses.
We’re cultural niche producers ourselves. We know what it means to make a living on the slim end of the long tail. Kaplak was launched when we realized that no other market or non-market actors on the internet today seemed to offer distribution tools which could help us meet our present challenges. Sure, there are distribution tools if you want to give away your work for free, but none which solves the problem at its core: making money while doing what you do best.
As niche producers, our products have often targeted audiences and markets so slim that setting up and running a website and e-business, along with the ads or other methods required to market and sell, is impractical and often deemed inefficient and unprofitable from the very beginning.
Kaplak is a tool which will seek to remedy these problems for our customers. What Kaplak is about is creating economically sound distribution methods and tools for these kinds of products, which may not sell much, but still do find their markets.
How it works
Using Kaplak can be boiled down to these three steps:
1. Provide your product (or a link to it) and a few details of information.
2. Pick your price.
3. Determine how much of your earnings you’re willing to part with in Kaplak.
Kaplak will then spit out a widget, i.e. a small piece of code which can easily be inserted on a website. You can use the widget yourself, on your own website, and you can distribute it to others. You can even just leave it on the Kaplak network for others to find and redistribute, if and when your product is in demand.
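To give a rough idea of what “spitting out a widget” could look like, here is a hypothetical Python sketch. The embed markup, the kaplak.example URL scheme and the function itself are invented for illustration; they are not Kaplak’s actual widget format:

```python
def make_widget(product_id, title, price_dkk, share):
    """Return an embed snippet a site owner could paste into their own page."""
    return (
        f'<a href="http://kaplak.example/buy/{product_id}" '
        f'class="kaplaklink" data-share="{share}">'
        f'{title} &ndash; {price_dkk} DKK</a>'
    )

# The three steps above: product details, a price, and the kaplak share
# the producer is willing to part with.
print(make_widget("a42", "My niche documentary", 49, "30%"))
```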
Your product is made visible and sold by local “skippers” (i.e. website owners, admins, forum visitors etc.) on the niche websites and networks your potential customers use. They help bring your product safely to harbour, across the oceans of the internet, and in turn earn their share of kaplak. Your product helps them finance their work, while you sell your product in a place you wouldn’t otherwise have reached.
You don’t need ads for your product sprinkled all over the internet or on mainstream media websites, visited by masses of people who couldn’t care less about your not-so-mainstream product. What you need is well-placed and precise recommendations in those niche environments and web communities your customers visit.
Company and financing
Kaplak is owned and developed by Morten Blaabjerg. A number of partners have acquired warrants for b-shares in Kaplak, including our hosting partner MC Solutions.
Kaplak’s first goals are:
1. To present a public online platform, which presents the project and invites initial customers and collaborators.
2. To create a company capable of building a first, early version of our service and sell this to our first customers.
3. To document this process and generate income streams to finance further development.
4. To create a publicly accessible workspace in the form of a wiki. The Kaplak Wiki will host our growing information base and invite participation from all interested in developing Kaplak.
5. To present a thorough second edition of the Kaplak business plan aimed at venture capital, and to spend at least 10% of our time actively developing and sustaining durable investor relations.
Please sign up if you’re interested in Kaplak as a future user and customer, or if you simply would like to know more and follow our demos and online events. We will be happy for your support. It helps us that we can tell our investors we have interested customers waiting. We’d also like to ask you to take our online surveys, when we get around to that. We believe we can create a product which is most useful to you as a niche producer or consumer by inviting your input and participation in the process at a very early stage.
We also welcome you to follow our blog, which is also available via RSS. Our RSS feed makes it possible for you to post the latest Kaplak headlines on your own website, blog or online profile, to tell others about this project, or simply enjoy our latest articles with your favourite RSS reader.
Kaplak issues warrants for shares in Kaplak to interested parties. Please contact us for further information, if you are interested in joining Kaplak as an investor. We’ll be happy to help you with further details.
Tags: long tail, search
Posted by Morten Blaabjerg, June 30th, 2008 in Information filtering, Kaplak on the web
I’ve previously written about the merits of attributing value to the context of finding information, rather than to any particular piece of information. This makes sense in an environment which literally explodes with new information and shows no signs of stopping in any foreseeable future.
Google seems to think so too. After all, this is what Google does, and does really well. But it’s no less true of a somewhat overlooked product of Google’s: Custom Search. This service allows anyone to assemble their own search engine and place it on their own website. More accurately, your custom search engine filters Google’s index of webpages. Say you want the search engine on your site about your niche subject to return only results which relate to your site. It’s simple: type in your site name, and allow Google to show results from your site as well as all the sites your site links to. Or you can be even more specific and list a range of sites you want results to be taken from. Or, if you’d like Google to still show results from the whole web but emphasize results from your own site, that is also easily doable.
The only problem so far with Google’s Custom Search has been, on the one hand, that Google’s crawlers don’t seem to index every website very deeply or very frequently, and on the other, that results are still based on PageRank. Say you want your users to find a great piece on your blog about a particular subject when they search for that subject, but that piece isn’t linked to much by other sites or articles. Chances are that Custom Search will show a largely irrelevant but heavily linked-to article from another site, or simply not show your post at all if it hasn’t been properly indexed. Your built-in blog search, such as WordPress’s search, will find that article very fast, because it searches your database directly. For smaller sites, local search as we know it is still much more effective.
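Here is a minimal sketch of why built-in search wins on small sites: it queries the posts database directly, roughly the way WordPress’s search runs a LIKE query against its own tables. The schema and data below are illustrative assumptions:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
db.execute("INSERT INTO posts (title, body) VALUES (?, ?)",
           ("Summize and realtime search", "Searching Twitter conversations..."))

def local_search(term):
    # Every post is findable the moment it is in the database --
    # no crawl, no inbound links, no PageRank required.
    like = f"%{term}%"
    return db.execute(
        "SELECT title FROM posts WHERE title LIKE ? OR body LIKE ?",
        (like, like)).fetchall()

print(local_search("Summize"))  # [('Summize and realtime search',)]
```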
However, as sites grow and we as internet users and bloggers spread our activities over many sites and platforms, platform-specific search is too limited. We begin to look for more tailormade solutions. Google’s Custom Search is one, but there are others who want a piece of the action.
New kid on the block
Lijit is an internet startup based in Boulder, Colorado, which offers a promising version of “local” or “contextualized” search: it searches your blog, your “content” (on sites such as YouTube, Flickr and many others) and the network of sites and “friends” your online activities connect you to. We’ve already created a Kaplak search engine powered by Lijit, and the Lijit widget is featured in the outer right column on this blog. I think Lijit could potentially be a very useful addition to the Kaplak toolbox. I plan to expand this search engine with further feeds and sites as our network and activities grow.
When I first tried Lijit, I wasn’t satisfied with the search results. I searched for the exact title of one of our blog posts, and it didn’t come up. Being the impatient web customer I am, not hesitant to make a fuss about my problems with a free online service – on another free online service – I posted my quibbles on Twitter. It turns out Lijit is on Twitter too, and so is Micah Baldwin, who works for Lijit and took time out to answer my quibbles.
It turned out Lijit had based their first version on Google’s Custom Search while developing their own web crawler. Switching Kaplak’s search to Lijit’s own crawler was a huge improvement over Google’s occasional crawl, and made me look much more enthusiastically at what this small team of extremely talented people is doing. I take my hat off to a company which acts so swiftly in response to “customer” sentiments and makes it a priority to help their users along with such friendliness. There are a lot of companies who could learn a great deal from Lijit. Micah and Lijit give the expression “listening to the groundswell” a whole new meaning.
I like the freshness of Lijit, and I like the results after being switched to their own crawler. I have only a few quibbles with it now. It has what I’d call some weaknesses in the versatility department, because I can’t control and fine-tune texts, messages and included sites/webpages as much as I’d like to, and as much as I was quickly getting accustomed to in my short time using Google’s Custom Search. For instance, I found my entire del.icio.us network automatically included in the search engine, where I’d like the opportunity to handpick whose links get included. Lijit’s search engine also wants to categorize results very neatly into “my blog” (even though the Kaplak Blog is not precisely “mine” – it’s the company blog, maintained by me, but not “mine”), “my content” and “my network”. What if we put the widget on our wiki (which we’re probably going to)? That’s not exactly “mine” either. Our Kaplak universe is not so neatly organized, and while I do like the “Lijit picks” category, I’d prefer being able to scrap all categorization schemes altogether, get our own AdSense stuff on the search results, and just get on with fine-tuning and putting in more sites and feeds to give our visitors the best possible experience.
Lijit can potentially be a great key to tying together the many different platforms we operate on in Kaplak – and one we’d even pay for, if they included premium options we needed. As a company, we still do need search, and if Lijit could eventually crawl user and product profile pages on our upcoming Kaplak Marketplace, we’d have something here which we’d probably like to pay good human money for.
You can find most of my conversation with Micah via Summize, an online service which has built a search engine on top of Twitter, searching conversations on Twitter in realtime.
Imagine a service which has taken upon itself the daunting task of searching all things on Twitter instantly, and which is capable of threading and translating posts to and from numerous languages – globally. Then you have Summize.
Having used Twitter a lot these last few months, I’ve found Summize indispensable for keeping track of tweets, users and subjects. I’ve also used it for market research, i.e. “listening” to what other users are twittering. I find this stuff utterly incredible. There’s a lot happening in the search business these days.
I’m sure this is only the beginning.
[EDIT: Twitter's acquisition of Summize has broken the above link to the Summize search with my conversation with Micah. Here's a similar search on the new http://search.twitter.com which supposedly replaces Summize...]
Tags: contextualized search, delicious, Google, groundswell, Lijit, Micah Baldwin, search, speed, summize, Tara Anderson, Twitter
Posted by Morten Blaabjerg, April 16th, 2008 in What is kaplak?
A local department of a political party of which I am a member (never mind which party) had a discussion rolling some time ago concerning spam mail, which led our web editor Rasmus Larsen to ask a few questions about Kaplak:
Let’s say that I am organizing an independently financed political forum on the web, with a range of interesting articles by a mixed group of connected people – some writing small newspaper pieces, others longer dissertations. [...] The purpose is not to generate a profit, but 1) to create attention around the webforum as a supplier of meaningful political articles, and 2) to inspire and influence the activities of the target groups, as a kind of prolonged think tank activity.
If there’s an article which supplies something innovative on integration policies, it needs to get out in some way to all the relevant people who are already occupied with the subject and active online. It could be high-profile debaters and thinkers, people within different political parties who lead particular workgroups, certain students and researchers at universities, etc.
Now I face these challenges:
- How do I get this article out to the target groups described here, without a firm grasp of who and where they are?
- How do I make sure they won’t consider it spam or unimportant?
- How do I get it out to the target groups, which I haven’t even considered exists?
These questions hit the nail on the head concerning a crucial challenge for Kaplak as well: “search” presupposes pre-knowledge, a core of conscious information which makes someone able to search for something. How do we reach the other someones who are interested in what we do, without knowing who they are, and without them knowing who we are?
The answer is deceptively simple, yet incredibly hard work. The answer is hyperlinks. Most people don’t realize how important they are. A search engine, for instance, is really nothing but a very advanced index of hyperlinks and hyperlinked webpages. So to be visible to the someones who do search for you – if they know who you are or what your “product” is; let’s imagine you manage to get that information to them by some other means – you have to build a strong interlinked system of hyperlinks pointing to your site from related sites, networks, communities, blogs etc., which will help search engines pick up your site and rank it correctly and appropriately.
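As a toy illustration of the claim that a search engine is “an advanced index of hyperlinks”, here is a Python sketch building an inverted index from anchor text to link targets. The pages and anchors are my own illustrative assumptions:

```python
from collections import defaultdict

# (source page, anchor text, target url)
hyperlinks = [
    ("blog.example", "great integration article", "forum.example/integration"),
    ("partyblog.example", "integration policy ideas", "forum.example/integration"),
    ("random.example", "cat pictures", "cats.example"),
]

index = defaultdict(set)
for source, anchor, target in hyperlinks:
    for word in anchor.lower().split():
        index[word].add(target)

# The more related sites link to you with relevant anchor text,
# the more queries will surface your site:
print(sorted(index["integration"]))  # ['forum.example/integration']
```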
You can use special techniques, often referred to as Search Engine Optimization (SEO), to optimize your visibility for people who search for you on the web. But your efforts will be most efficient if 1) your target group knows beforehand that you exist or is already looking for what you offer, or 2) you can define precisely, or nearly precisely, who your target group is and what they will search for.
The bottom line is still, however, links, links, links. A well placed link in a good spot will direct the right kind of people to your product or message.
So what is ‘a good spot’? I’ll discuss this in a second.
Concerning the questions about how recipients won’t consider your message or product ‘spam’, and how to reach groups you haven’t considered existed, I’d like to flip the questions around a bit. In other words, what will recipients consider spam? And how do you reach people, who haven’t even considered your existence?
To the first: clearly, most people consider unsolicited mail spam. Information they seek out themselves, are motivated for and have accepted to receive is not spam. If you want your message out, you have to find a way to reach people where this is the case, or seems to be the case. The second question has really the same answer. In order to reach people who don’t know you exist, you have to create a situation where they seek you out themselves. So how do you do this?
You find out what the good spots are. Get your hyperlinks to the places where you find people who are potentially interested in your message and won’t consider it irrelevant.
At first, this may seem like an impossible task, especially considering the enormous growth of the web. But on reconsideration, there’s good reason to praise this further atomization of the internet. The growth in the number of sites in the world also means that there’s a much greater variety and a finer segmentation of target groups. If you can define them correctly and precisely enough, that is. Segments can be anything on the web which has an audience or some point of communication. They can be website communities, but need not be. They need not even be on the web – they could also be mailing lists, intranets or darknets, i.e. closed p2p networks. A segment can be as small as the group of friends around a Facebook profile, or as large as Barack Obama’s following on Twitter.
Describing and predicting segments like these is exceedingly, and increasingly, difficult to do from any central point of view (although any SEO will certainly try). This is why we need to utilize local forces and filters (by means of “peer production”), which will help decide for us what kinds of segments will find what pieces of information. Local peers have what we don’t: the expertise of knowing their communities and segments much more precisely than we do. Connecting with these mediators is how we find ‘the good spots’.
In a way, this is already happening all over the internet: an atomization followed by more precise forms of segmenting and reaching audiences and markets. There are a lot of affiliate programs and products, such as Google AdWords/AdSense, which help mediators make these connections. But at Kaplak we don’t see really efficient solutions which help the producers on the very slim end of the long tail, because these customers are not really the concern of most market players operating on the web.
What we propose at Kaplak is (among other things) to introduce a capital bonus (i.e. kaplak) to those peers who successfully connect a niche product with a niche market. This, supported by other tools, will help speed up the connecting of products with their markets in online niche contexts and generate larger margins for our customers – as well as for the mediators.
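For a concrete feel of the incentive, here is one conceivable way a kaplak bonus could be split along a widget’s “route” of mediators. This is purely my own illustrative assumption; the post does not specify how Kaplak actually divides the share:

```python
def split_kaplak(sale_price, share, mediators):
    """Divide the producer's chosen kaplak share equally among the
    mediators who carried the widget to the buyer."""
    pool = sale_price * share
    cut = pool / len(mediators)
    return {m: round(cut, 2) for m in mediators}

# A 100 DKK sale where the producer gave up 30% as kaplak,
# carried by filtersite D, blogger F and niche site H:
print(split_kaplak(100.0, 0.30, ["D", "F", "H"]))
# {'D': 10.0, 'F': 10.0, 'H': 10.0}
```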
Tags: Barack Obama, Google, Google ads, hyperlinks, hyperlinks as value, niche producer case study, peer production, search, seo, the good spots, Twitter
Posted by Morten Blaabjerg, March 10th, 2008 in Information filtering, The mainstream problem
Or does algorithmic search really scale better, work faster and ensure better quality than ‘socially produced’ services? A few days ago, I had an interesting exchange of POVs with the Danish SEO Mikkel deMib Svendsen, known among other things from the SEO radio show Strikepoint.
I replied to Mikkel’s blog post on ‘Search – before, now and in the future’, where I tried to make the point that search as a communications solution suffers from key preconditions which are far from optimal. Among them is the fact that in order to search for something, you need to know what to look for before you search, and you need to deliberately and consciously use a search engine to look for it.
In other words, search is a deliberate and conscious affair. This makes it difficult, for instance, to use search to market products, which are not well known, such as niche products, or to address problems or needs, which are not yet consciously thought or expressed.
Add to this the present growth rates of information on the web. With each new website added to the web, the risk increases that your information will never reach the queries seeking it. We’re talking exponential growth, with several million new websites added to the web each month. As search faces these ever greater amounts of information, this problem, which we’ve so far dubbed the mainstream problem, will only become more apparent.
Mikkel, however, firmly believes in the future of algorithmic search, so these claims didn’t go uncommented. First, he argues that machines will always work faster and scale better than social services, which have great filtering and quality challenges:
I am completely in line with Louis Monier [founder of Altavista], and am 100% certain that algorithmic search will remain dominant. Manual data processing, like in the [online] social services, simply suffers from too many scaling and quality assessment-issues to compete in the long run. Only machines are scalable on the necessary scale and with a continued central quality assessment. … [my own translation, MB]
I have a few problems with these arguments for the quality of search as a communications method. I wanted to analyze them a bit here, in order to make them part of our process of finding out more about the effectiveness of online communication and its niche/long-tail effects. Over the past months, I’ve come to question the widespread naturalization of search as ‘the best’ and ‘natural’ method of making information available and visible online. True, right now search is the dominant method of obtaining information online. It is also a billion-dollar business, although this is mostly due to the success of Google AdWords/AdSense, which is quite a different product. However, this may not be so in the future.
The part of Mikkel’s argument which distinguishes between social services on one hand, which are manually created and processed (by humans), and search on the other, which uses machines and algorithms and therefore scales better etc., is fundamentally flawed.
First, search results are ‘peer production’ almost as much as any online social bookmarking service is, i.e. they are socially produced. Peer production is a term coined by Harvard professor Yochai Benkler (in The Wealth of Networks (2006), which can be downloaded freely here). That search results are peer production means that they create value by putting together websites from different peers (i.e. companies, organizations or individuals) in order to respond to a search query. Google does this without even asking the peers first (it’s an opt-out, not an opt-in system), so the peers used to create value may not even know that they contribute value to the system. This doesn’t detract from the fact, however, that it is the sourcing and pooling together of the work of different peers in response to a human search query which creates the value in a search result. A search result is socially produced, even though the work of filtering and presenting it in a few seconds is done by advanced programming, software, hardware – and cables.
These advanced architectures, however, are also created by humans, which means there is someone sitting and using their human capabilities to decide which categories and which variables should factor in, and with how much weight, in the algorithms which control the process of finding and delivering information when somebody searches for something in particular.
The problem with this is not that the process is just as human-influenced as any online social bookmarking service, for instance, but that the someone deciding which variables should factor in is (most likely) not an expert on what the someone at the other end typing in a search query is looking for. The other someone is. It’s one size fits all: one architecture (in principle) for all queries in the world.
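A toy sketch of this “one size fits all” point, with features and weights that are purely my own illustrative assumptions rather than any real engine’s:

```python
# Someone fixes the weights centrally; every query in the world is then
# scored with the same formula.
WEIGHTS = {"keyword_match": 0.5, "inbound_links": 0.4, "freshness": 0.1}

def score(page_features, weights=WEIGHTS):
    # The engineer who chose `weights` never sees the individual query;
    # the searcher who knows what she wants never gets to change them.
    return sum(weights[f] * page_features.get(f, 0.0) for f in weights)

pages = {
    "popular but off-topic":      {"keyword_match": 0.2, "inbound_links": 0.9, "freshness": 0.5},
    "obscure but exactly right":  {"keyword_match": 0.9, "inbound_links": 0.05, "freshness": 0.2},
}
# The popular page outranks the one the searcher actually wanted:
for name, feats in sorted(pages.items(), key=lambda kv: -score(kv[1])):
    print(f"{score(feats):.2f}  {name}")
```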
I tried asking Mikkel how he could be sure that a query actually met with a usable result. Even if a query is answered by a number of search results, this doesn’t mean that those results are actually usable and deliver the answer to the query. If the user experience is bad, search fails to deliver an answer, even if there are a million hits on the query.
Let’s take a look at something completely different: this page at Wikipedia. Notice the edits happening to the article “Tsunami” in December 2004? A page which before December 2004 had minimal contributions and edits literally exploded with new information when a tsunami devastated the coasts of Sri Lanka and Thailand that month. Everything was frequently updated as events rolled along and people in different parts of the world found out new things about what had happened, complete with a small animation to go along with it.
Wikipedia aims to make knowledge freely accessible to anyone on the planet. Like the providers of algorithmic search, Wikipedia uses lots of machinery to deliver its information, as well as an advanced complex of software architectures. Wikipedia’s articles are peer produced, but much more directly and consciously so than the algorithmically created search result we saw earlier. Even the software is peer produced: MediaWiki is free software, which can be copied and worked on by anyone who wishes to do so, and any changes may be adopted by the main package.
A second point is the difference in the value created. With an example from his own work, Mikkel illustrates how Search Engine Optimization (SEO) done right directly creates a great surplus of value for the companies he and other SEOs work for. Regular SEO maneuvers help direct lots of relevant traffic to the corporate websites.
That SEO helps create value by more directly targeting traffic at corporate websites, however, is not an argument for the quality of search as a communications solution in and of itself, but rather for the quality of Mikkel’s and his colleagues’ work. There’s a lot of money in SEO, and that’s not because search is a brilliant solution to a communications problem. It is rather because search is inherently insufficient as a solution to the problem of connecting a query/demand with an answer/product, especially for a company which wants to stay alive and gain a competitive edge. And this problem will grow a lot bigger. I predict that Mikkel and his SEO colleagues will be paid even better in the years to come.
It is first and foremost a problem of visibility, not of search in particular. We need to create better ways to make information accessible to the people who need it, without swamping those who don’t. Second, it is a problem of speed, because we need information fast, to better meet the challenges we face as individuals, organizations and societies.
As a non-profit, Wikipedia doesn’t make any money on the processes involved in creating and building a quality article, but the value that an improved Wikipedia article (such as the tsunami article) provides for millions of journalists, for instance, and the newspapers and media companies which employ them, is indispensable. I know for a fact that reporters use Wikipedia a lot, and with good reason. It is beyond any doubt the fastest and most scalable source of information online. And when as many contributors come together as on the tsunami article, it also proves a highly reliable and credible source. It beats the crap out of trawling search results pages without finding what you’re looking for. But it is only a small example of what peer production is capable of, given the right architectures and tools.
Tags : peer production, search, seo, speed, time, tsunami, video, visibility, wikipedia
Posted by Morten Blaabjerg, February 25th, 2008 in Identify challenges, The mainstream problem
I’ve had a few days these past weeks where I’ve been knocked out by a fever and a sore throat. When you’re sick, you’re not up to much. And when your 8-month-old daughter is sick too, as if that weren’t enough, it’s really no fun at all being sick.
On the bright side, this gave me some well-deserved time to finally get into Carsten Jensen’s epic Vi, de druknede (in English, We, the Drowned, appearing later this year). I’ve been looking forward to reading this novel ever since it was first published in 2006, and I am thoroughly enjoying it.
It is an epic covering a hundred years of history, from 1848 to 1945, seen not through the eyes of kings or generals, but from the perspective of the sailor, the adventurer, the flogged, the fugitive, the runaway, the outcast, the drowned (in every sense of that word), and the wives and children who were left behind without being asked, all native to Marstal, a Danish port town on the island of Ærø – with the obvious exception of the shrunken head of James Cook, which figures prominently in the book. The novel leaves one with a few interesting perspectives on things global and local, which is inspiring, not least in the context of the global internet, and in the context of Kaplak.
In a Danish context the novel is not exactly marginal. It received rave reviews, has been extensively marketed by its publisher and by booksellers, and has sold well. In this sense, it is an industrial product, mass produced and sold via traditional bookselling channels. The book’s IP (i.e. translation and distribution rights etc.) has been sold to more than a handful of other countries.
On the global web, however, the novel is a marginal niche product. It exists at the mercy of search and of the exponential growth of information on the web. In this sense it faces precisely the same challenges as a completely unknown novel by a completely unknown author, if it wants to move beyond the local context and use the internet as a marketing or distribution channel. Like many similar products, Vi, de druknede has its own website, but to find it one almost has to search for the book’s exact title. At least, one doesn’t find it by searching for the author’s name, which was my first choice, because apparently Carsten Jensen doesn’t have his own website! The first hit is to an architect of the same name, and another to a LinkedIn profile for a CEO with the same name. And Carsten Jensen is even supposedly the most prominent “Carsten Jensen” in a Danish context, which would lead one to think that he had a greater number of links pointing to information about him, and thus a higher Google PageRank. The most authoritative (international) source on Carsten Jensen remains a stub on Wikipedia.
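For readers unfamiliar with how this works, here is a toy illustration of the published PageRank idea: a page ranks higher the more links point to it, weighted by the rank of the linking pages. This is a sketch of the original 1998 formulation, not Google’s production system, and the four-page link graph is invented for illustration:

```
# A toy illustration of the published PageRank idea (Brin & Page, 1998):
# a page ranks higher the more links point to it, weighted by the rank of
# the linking pages. The link graph below is hypothetical.
DAMPING = 0.85  # probability of following a link rather than jumping at random

links = {  # page -> pages it links out to (invented graph)
    "novelist_profile": ["wikipedia_stub"],
    "architect_site": ["novelist_profile"],
    "ceo_linkedin": ["novelist_profile"],
    "wikipedia_stub": ["novelist_profile", "architect_site"],
}

n = len(links)
ranks = {page: 1.0 / n for page in links}
for _ in range(50):  # power iteration until the ranks stabilize
    ranks = {
        page: (1 - DAMPING) / n + DAMPING * sum(
            ranks[src] / len(outs) for src, outs in links.items() if page in outs
        )
        for page in links
    }

# The page with the most (and best-ranked) inbound links comes out on top.
for page in sorted(ranks, key=lambda p: -ranks[p]):
    print(page, round(ranks[page], 3))
```

In this invented graph the novelist’s page collects the most inbound links and therefore the highest rank – exactly the effect one would have expected to lift the real Carsten Jensen above his namesakes.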
Even if one does manage to find the book’s website, one will find that it is only available in Danish. Apparently the publisher has thought only of using the global internet to target the book at a Danish audience, even though the book’s rights have long since been sold to a number of other countries. This of course just underscores the status of the novel as an industrial product which seeks to appeal to a national, mainstream audience. An English reader will learn more from this article, which appears as the second hit when one searches for keywords such as marstal + sailors.
As a niche product, Carsten Jensen’s novel doesn’t fare much better on the web than most niche products, despite local rave reviews and traditional marketing campaigns via conventional channels. It is seen as little or as much as it has customers who search for it. Making this easier for potential readers has apparently been of very little concern to the publisher, if one takes this superficial analysis at face value.
We, the Drowned is an obvious metaphor for all the unanswered queries of the web. When writing this article I had to find out what a “shrunken head” was in English. It is easy when you know it, but how do you show a search engine what you mean? I knew the Danish word, “skrumpehoved”, but finding the English term was pretty tricky. Kaplak doesn’t have any ambitions of creating new or more intelligent ways to search, but we do think the activity of our network will help generate more relevant and context-rich web results, which will more likely cover a much longer tail of niche interests and pursuits than is the case today.
Tags : books, Carsten Jensen, Google, industrial schisma, search, Vi de druknede
Posted by Morten Blaabjerg, February 14th, 2008 in Kaplak on the web
If Kaplak is to succeed in ‘making the world’s ends meet’, we need to get in touch with potential customers globally. This is a daunting task, to say the least, and not something you do from one day to the next. Kaplak’s product may depend on technology, but we can build the best solution in the world technology-wise, and if no one uses it or knows about it, it doesn’t matter. This is where this website, and this blog in particular, comes into the picture.
We need to connect niche producers with new markets, document that we are able to do this, and show that our efforts pay off. More importantly, we have to do this simultaneously with our product development, not after we’ve spent millions building the product, only to find out things didn’t look exactly the way we imagined.
In this respect, it is interesting to take a small peek at some of the traffic data we’ve collected so far.
This model shows an early tendency which is very reassuring. After the first two months of this website’s uptime, there’s already a clear indication that Kaplak will not simply remain an obscure Danish project. We’re capable of reaching out and building a larger global base. The important question to ask at this point is why this is possible.
The model illustrates the effect of something I find incredibly important for the Kaplak Project, but which is often difficult to describe and communicate even to people involved in the project. This is why I wanted to show it here.
We have thought about and contemplated Kaplak in a Danish context, in one particular local spot of the world. We’ve made no marketing efforts at all, besides spreading the word through our local off- and online networks. Our initial traffic therefore consists mostly of our friends and colleagues, and their friends and colleagues. But it may very well be that our best, most motivated first customers are in Buenos Aires or in Kuala Lumpur, and not in our local spot. We don’t know yet.
But we do know that the only thing which has made it possible for us to attract visitors from places as far away as Uruguay, Viet Nam or Ireland until now is a mixture of hyperlinks pointing to our site and of texts, images and links which make it possible for search engines to index our site appropriately. This is why I make such a great fuss about cultivating as much activity on the blog (among other things) as we possibly can, including real case studies with real answers from real people and companies who feel what life is about on the slim end of the long tail. Because every time someone searches for terms in a unique way which matches the way our site has been indexed, we gain contact with someone who may share the aches, challenges and opportunities we describe.
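A minimal sketch of what this means in practice: search engines build an inverted index from terms to pages, so the more distinct text a site publishes, the more unique term combinations can match it. The documents and queries below are invented for illustration, and real engines are vastly more sophisticated:

```
# A minimal sketch of why more indexed text means more matchable queries:
# an inverted index maps each term to the pages containing it, so a unique
# combination of terms matches the niche page that covers all of them.
# Documents and queries are hypothetical.
from collections import defaultdict

documents = {
    "kaplak_case_study": "niche producer aches on the slim end of the long tail",
    "generic_shop": "buy cheap popular products in our online shop",
}

index = defaultdict(set)  # term -> set of pages containing that term
for page, text in documents.items():
    for term in text.split():
        index[term].add(page)

def search(query):
    """Return the pages that contain every term in the query (AND search)."""
    return set.intersection(*(index.get(term, set()) for term in query.split()))

print(search("niche long tail"))  # {'kaplak_case_study'}
print(search("cheap shop"))       # {'generic_shop'}
```

Every new blog entry or case study adds rows to that index, and with them new unique ways for the right someone to find us.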
The model also shows how much more work we still need to do in order to accomplish what we’ve set out to do. It’s an uphill struggle for each blog entry, each reference, each visitor, each comment, each link which may connect us with someone who really feels the niche producer’s ache.
We don’t need or want massive amounts of traffic to our website, at least not for the time being, but we’re very interested in seeing a healthy growth and composition of our traffic evolve over time, making it possible for us eventually to reach someone who is motivated enough to single our site out from the millions and sign up for our mailing list, if he or she is interested in Kaplak.
Tags : attention, kaplak.com, search, statistics