Searching for the Moon

Shannon Clark's rambles and conversations on food, geeks, San Francisco and occasionally economics

Archive for October, 2006

Sunday Brunch

Posted by shannonclark on October 25, 2006

I will have a bunch of people over this Sunday for brunch at my new place – if you are reading this blog and would like to join us, leave me a comment or send me an email. I’ll be cooking from ingredients I pick up at the Farmer’s Market on Saturday (Ferry Building probably) so it should be a lot of fun.

Posted in geeks | 2 Comments »

Switching from my PC to a Mac – keeping iTunes ratings

Posted by shannonclark on October 25, 2006

This is the first of what will likely be a series of posts documenting what I discover as I seek to make my new iMac 24″ system functional and make it and my trusty ThinkPad T40 running Windows XP play nicely together – and together make me more productive.

Yesterday I started (it ended today) the full backup process from my PC to my new iMac – migrating over the nearly 30 gbs of data (though not yet my ~8gb of mail archives) including my full media library.

Today I tried (and finally succeeded) to make iTunes on the iMac reflect the current state of iTunes on my PC. I first tried importing the files, which copied them to another directory (default behavior on the mac) but did not keep my many playlists or, even more crucially, my ratings and play counts.

I found a great resource online on how to migrate iTunes libraries in a way that at least preserves my ratings and playlists. However it does not, unfortunately, appear to preserve play counts – a field I use extensively when managing my listening, especially when managing my shuffle.

Further, since Apple does not (at least last I heard) allow an iPod shuffle to work with two machines (they do let regular iPods), I will probably have to keep on using my ThinkPad as the sync machine for my Shuffle.

I looked at many podcatcher clients for the mac and so far, I am not at all impressed. They look nice and have some “features” but none of them that I have found so far have the features of Doppler Radio which I have come to really appreciate and love. Specifically Doppler Radio allows me to manage each podcast feed separately and transparently (though mac podcatchers do a much better job of integrating the rss feeds and the full posts – so they are very nice as rss readers). Each feed gets a new playlist in iTunes (something I can not imagine forgoing ever again – as I rarely want to just listen to all my podcasts – rather I want to easily select between the individual episodes). One of the mac clients has the “feature” of adding podcasts to iTunes as a podcast.

However for a shuffle user in particular this feature is nearly utterly broken. By default (and I really truly do not understand why) all podcasts go into iTunes with the “do not shuffle” checkbox selected. There is no way, at least that I can tell – and I have looked, to change this default. And it has the impact of not allowing those files to be added to a shuffle. So as a result I am very, very reluctant to use iTunes to subscribe to ANY podcasts as I have to manually make that change for each and every podcast episode when they come in if I want them to get into my shuffle automagically when I next sync it. Very frustrating and nearly useless. (keep in mind that I subscribe to about 60 podcasts or so).

Also I haven’t found a podcatcher for the mac that is as simple as Doppler Radio is when it comes to managing and auto-deleting files.  On the PC I simply set for each feed what criteria I want to use (I usually say anything that is rated 1 star) and then when I decide I do not want to keep that episode I rate it one star and the next time I sync via Doppler it gets cleaned up for me automatically.
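The per-feed cleanup rule described above can be sketched roughly as follows. This is only my illustration of the idea, not Doppler's actual code: the `Episode` class, the `episodes_to_delete` function and the per-feed threshold dictionary are all hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    feed: str
    title: str
    rating: int  # 0 means unrated, 1-5 are stars (assumed convention)


def episodes_to_delete(episodes, delete_at_or_below):
    """Return the episodes whose star rating triggers cleanup for their feed.

    delete_at_or_below maps a feed name to the highest rating that should
    be cleaned up (e.g. 1 for "delete anything rated 1 star"). Feeds
    without an entry, and unrated episodes, are never touched.
    """
    doomed = []
    for ep in episodes:
        threshold = delete_at_or_below.get(ep.feed)
        if threshold is not None and 1 <= ep.rating <= threshold:
            doomed.append(ep)
    return doomed


library = [
    Episode("Coverville", "Episode A", 1),
    Episode("Coverville", "Episode B", 4),
    Episode("Some Other Feed", "Episode C", 1),
    Episode("Coverville", "Episode D", 0),
]
rules = {"Coverville": 1}
to_delete = [ep.title for ep in episodes_to_delete(library, rules)]
```

The point of the sketch is that the rule lives with the feed, and the only thing I do by hand is rate the episode; the next sync does the rest.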

On the mac that cleanup process appears to always be separate from the process of subscribing. And further, I can’t get the very basic stuff like “update all feeds” to work smoothly.

So my plan is to migrate all my non-podcast media to the iMac and use it into the future as where I keep most of my media (non-podcast) files. This will free about 12 gbs of data from my ThinkPad (I may keep a few gb of music I listen to most often for the occasional times I’ll want it when traveling). I probably will remove most of my podcasts from the iMac – with the exception of the ones I have rated highly and want to retain. I’ll sync my shuffle with the ThinkPad and keep using Doppler until something better comes out for the mac.

Posted in mac, podcasts | Leave a Comment »

JetBlue and the iPod Shuffle

Posted by shannonclark on October 21, 2006

I flew JetBlue this past weekend for the first time. As I boarded in Oakland I selected and paid $1.00 for the earbuds that JetBlue sells at their gates for you to buy and use inflight (and to then take home).

This evening as I was walking out the door listening to Coverville on my iPod Shuffle I decided to try the earbuds from JetBlue on my shuffle – they were sitting there on the bookcase where I had unpacked them. I plugged them in and wow, the difference from the standard, basic earbuds from Apple was like night and day – the sound which had been muted and muffled became rich and resonant. I’m still getting used to them, but I can tell that they will make a big difference in my listening enjoyment.

Well worth the $1.00 investment, so the next time you are flying JetBlue be sure to pick yourself up a pair.

Posted in geeks | Leave a Comment »

Using IRC – after 15 years of mostly avoiding it

Posted by shannonclark on October 21, 2006

Okay, so in the past few weeks I have finally started making active use of IRC. I say finally because I first encountered IRC over 15 years ago when I connected to the web for the first time in my college dorm room as a Freshman at the University of Chicago. I had been on the Internet even before that, when I was a summer intern at Argonne National Laboratory, but I hadn’t used IRC there, just email and some early web research tools (this was long before www).

So anyway since then I have made occasional attempts to use IRC, mostly for a local IRC backchannel at a conference I was attending, usually downloading a new IRC client for the occasion or using a browser based client. However at a recent event we decided to make more ongoing use of an IRC channel as a means of staying in contact after the event. I discovered that Trillian includes a pretty good IRC client so I have been happily using that.

I know about #joiito and plan on lurking there (lurking at least until I figure out more about how to use the #jibot so my nick is persistent etc – still haven’t figured out some of the basics like how to do pm’s via Trillian etc).

But my question/post is to ask “where else should I lurk?” and what else should I be looking to IRC as a channel to do? I’ve missed out on using IRC for a long time, which is somewhat silly but also because I have been a rather slow adopter for the alpha geek I am more typically seen as being – sure I do know and hang out with the founders of many a web 2.0 company, and yes, I have been online since 1991 (with my own servers for most of that time) and sure, I can and have programmed in nearly 10 different languages, have built apps etc, but that doesn’t mean that I use every tool I should be using or use them to their fullest extent.

Posted in geeks, internet | Leave a Comment »

Traveling this week to NYC, DC, NYC again

Posted by shannonclark on October 11, 2006

I will be in NYC on Thursday (leave a comment/email me if you read this and want to meet up, I’ll also be in NYC on Sunday – Tuesday of next week) and earlier today while reading my blog subscriptions on Bloglines I noted that Andrew Sullivan will be signing his new book “The Conservative Soul” at a Barnes and Noble in NYC on Thursday. If I can, I’ll try to go by and say hi.

Over the weekend I’ll be in the DC area but will have only limited time as I’ll be there for an event (non-public).

On Monday, my friend Ori Brafman, author of The Starfish and the Spider: The Unstoppable Power of Leaderless Organizations, will be launching his book with an event at the British Consulate in NYC, which I will be attending.

Posted in geeks, reading | Leave a Comment »

Equipping the entrepreneur/researcher – buying a new computer

Posted by shannonclark on October 10, 2006

Later this month I will buy at least one new computer, possibly a couple of systems. These systems will be for work purposes (but also some fun) so while price matters, it is not my only or even primary concern. Rather, I am looking for the right mix of components to support a wide variety of future needs – all while, I hope, helping make me more productive and successful in the future.

What my uses will be

For the desktop system(s) I will be using this as my primary hub, a repository for lots of data, a testbed for visualization techniques, a development/staging server for some light programming (see my recent post on the Netflix Prize for one such use). I plan on running multiple OSes (either via dual boot or tools such as Parallels). In addition to research, writing and blogging from this system, I also anticipate doing some light audio recording/editing, some video editing, and much increased use of digital photos. And while it won’t be my primary use, I also do hope to catch up on the games I have missed playing for many years and perhaps dip into Second Life and WoW.

From a technology standpoint I have a few, relatively simple requirements. I will run this system as a dual monitor system, probably with dual 23″+ monitors (either Apple Cinema Displays or the Dell 24″ display) – not the cheapest option but very sharp, very high resolution, and will finally give me more than enough pixels to really use all the tools and capabilities of my system.

At a minimum I want 500GB of storage, I’m currently downloading about 400mb+ of new podcasts every day (currently deleting them after listening in most cases), most of my relatively small cd collection remains unripped for lack of space, and I have only barely touched on digital video – almost never using bittorrent etc to download videos, let alone editing or making my own.

For future proofing and the programming I’ll be doing, I’m planning on at least 2gb of relatively fast RAM, and I expect to upgrade this in a relatively short timeframe to more like 4gb or even more.

Since my apartment does not have ethernet drops but does have wifi, I plan on primarily using this system via wifi (perhaps running ethernet when I need to do serious uploads though my wifi speeds are probably faster than my DSL upload speeds).

Some of my programming work will involve visualizations but more critically I want to be able to run the full versions of newer OSes such as Vista (at least for testing, unlikely as my primary OS) and I’d like to be able to run modern games in their full visual capacity. So I’m thinking at least 256mb of dedicated video ram on a fast graphics card, and possibly a dual graphics card system to support the dual monitors very well.

My decision

I’ve decided to go with an Apple iMac + additional 23″ Apple Display. As of this morning, Apple has them refurbished and available at the Apple Store. They also have some really great deals on a MacPro if that is what you are looking for. I was tempted, but while I do anticipate needing to upgrade what I bought (with more memory and perhaps a bigger HD), I decided I’d be paying a $500 premium for that privilege and, more critically, I’d be getting capabilities I may not need (quad Xeons are pretty much overkill for much of what I need to do, I suspect). Though I was (am) tempted.

[UPDATE – the iMac I ordered wouldn’t arrive until Nov 30th at the earliest, so I have canceled that order and ordered a customized iMac 24″ with 2gb memory and a 500gb HD which will arrive later this month instead, a bit more but I’ll have the machine over a month sooner and with two of the upgrades I otherwise would have to spend about $500 or so on anyway – so for $100 or so more I’ll have the machine when I need it and without the added delays of buying memory and a larger hd and installing them]

I also picked up a wireless keyboard/mouse, copy of iWork, and a copy of Parallels (so I can run Windows XP or Vista and Windows native apps without rebooting).

It has been many years since I used a mac significantly in my daily life – my family bought a Mac 512k MANY years ago (we then upgraded it to the Mac Plus – i.e. a whole 1mb of memory). However, in college my dorm computer was a NeXT Cube so in many ways OS X will be a return to my roots – it is deeply indebted to the purchase of NeXT (and since I’ll have a mighty mouse I don’t even have to suffer the woes of a single button mouse – yes, I know, the mighty mouse is technically a single button mouse tricked out to pretend to be multi button and have a scroll wheel).

My biggest migration worry will be migrating my very large archive of Outlook Mail (about 6 GB of mail in a variety of .pst files). That is my most significant collection of data and resources that I need easy access to (though I’m not giving up my ThinkPad so I will have access there – but having everything in one place would be nice).

I will also need to figure out how to keep some data files in sync between my ThinkPad and my iMac – specifically my music collection and especially the podcasts that I download – ideally I’ll find a REALLY smart set of podcatchers that could do the following:

– allow me to download on EITHER system (i.e. when I am on the road with the laptop)

– allow my portable music/podcast player to sync against EITHER system (not sure if I can do this with my current device, an iPod Shuffle, which I think is restricted to syncing with one system; the “real” iPods I think can be a bit fancier in their sync capabilities)

– note when I have downloaded the content and instead of syncing it again just copy it from one machine to the other (i.e. keep the media files in sync – or most specifically keep my laptop’s media files in sync with my “full” collection on the iMac)

– migrate any paid content I buy (say the few items I have bought from the iTunes Store) so that it is valid on either system

– syncing includes noting when my podcatcher deletes a file – i.e. currently if I rate a podcast with 1 star, my podcatcher will automagically delete it from both iTunes and my HD when it next syncs that podcast – saving me a lot of time and effort

– in the very best case my ratings and played attributes will also be synced (i.e. did I skip the file, have I played it yet etc). I typically only download to my 512mb Shuffle unplayed items – whether podcasts or music, that way I’m slowly working my way through my collection as I listen over the course of the day
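To make the wish list above a bit more concrete, here is a minimal sketch of how such a two-machine sync might plan its work. It is purely hypothetical (no podcatcher I know of works this way), and the names `plan_sync`, `laptop`, `desktop` and the record fields are my own inventions. I am also assuming that a deletion on either machine wins, that "played" on either machine counts as played, and that the higher rating is kept.

```python
def plan_sync(laptop, desktop):
    """Plan sync actions between two media libraries.

    Each library maps a file id to a record like
    {"rating": int, "played": bool, "deleted": bool}.
    Returns a list of action tuples; nothing is actually copied or deleted.
    """
    actions = []
    for file_id in sorted(set(laptop) | set(desktop)):
        a = laptop.get(file_id)
        b = desktop.get(file_id)
        present = [rec for rec in (a, b) if rec is not None]

        # Assumption: a delete on either machine propagates to both.
        if any(rec["deleted"] for rec in present):
            actions.append(("delete_everywhere", file_id))
            continue

        # Copy between machines instead of downloading a second time.
        if a is None:
            actions.append(("copy_to_laptop", file_id))
        elif b is None:
            actions.append(("copy_to_desktop", file_id))

        # Assumption: keep the higher rating; played anywhere means played.
        merged = {
            "rating": max(rec["rating"] for rec in present),
            "played": any(rec["played"] for rec in present),
        }
        actions.append(("set_metadata", file_id, merged))
    return actions


laptop = {
    "ep1": {"rating": 0, "played": True, "deleted": False},
    "ep2": {"rating": 1, "played": True, "deleted": True},
}
desktop = {
    "ep2": {"rating": 0, "played": False, "deleted": False},
    "ep3": {"rating": 4, "played": False, "deleted": False},
}
plan = plan_sync(laptop, desktop)
```

The paid content item on the list is deliberately left out; that is a licensing question, not something a sync plan like this can solve.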

Other software I will be seeking

I will definitely heed the suggestions from friends and the Internet, such as the recently posted “10 Apps to Increase Productivity on your Mac” or Google’s list of Google apps for the Mac. And being an avid user of MindJet’s MindManager for Windows, I will definitely take a look at their recent Mac version of MindManager! [full disclosure, MindJet sponsored my conference, MeshForum and gave me and many MeshForum attendees a free copy]

Much of my current online activity is browser driven, so once I port my bookmarks from Firefox to the new iMac (and likely install Firefox there as well) I will have many of the things I use every day. Plus iWork for the occasional presentation or document I have to create (and I’ll work on porting/converting my archive of Word docs and Excel spreadsheets as necessary).

iLife should be good enough for my initial photo management, podcast creation, etc. and there is of course great Mac software for all of that. My biggest concern at the moment is how (if) I can automatically keep some parts of my two systems in immediate sync – i.e. when both systems are on the same network, can I get them to auto sync as necessary? That, not coincidentally, would serve to back up my laptop and parts of my desktop, though over time my desktop will need other backup measures as well as it has close to 5x the useable hd space of my laptop.

What other applications should I be looking at – both free and paid? What games are available natively on the Mac (i.e. not via Rosetta on the Intel Mac)? Is WoW available? How does Second Life run?

I am looking forward to installing some of the many great and useful Unix applications that I have not been using – especially analytical and visualization tools for data analysis. I may also buy a Wacom tablet so I will have a tablet interface (and can use the little known Apple Inkwell capabilities of OS X)

Posted in Entrepreneurship, geeks, mac, tablet pc, working | 1 Comment »

Advice for the modern entrepreneur – passion, discretion and pricing

Posted by shannonclark on October 6, 2006

I started JigZaw Inc (site’s broken, will fix soon) in early 2000, incorporating it in May of 2000 (just in time for the crash). Since then JigZaw has gone through many changes. I’ve also started a non-profit, MeshForum, run a couple of conferences, held/helped with lots of other events, and met/worked with numerous entrepreneurs (and moved from Chicago to San Francisco). JigZaw was (is) funded out of my own and friend/family’s investments. While we considered additional angel/vc rounds we have not yet raised any money through those means, nor does it look likely in the immediate future.

This post is intended to outline what I would do/use/leverage today were I starting a new company or project. It is the first post in a series of unknown length, I welcome comments, suggestions, alternatives and counter arguments. In addition to the links and resources I directly link to, I’m indebted to the many entrepreneurs and writers I know and follow.

My assumptions

You have an idea. Perhaps some initial thoughts on how to go from idea to something. You have, at least some, of the skills that will be required and/or you have the nucleus of a small team to work with to make your idea happen.

Your idea involves technology and, most likely, the Internet (or similar networks such as mobile). If not, much though not all of my suggestions would still hold true – even a new café or retail store should both leverage useful technology and give some thought to how to use the Internet well.

My focus

I am a geek, I’ve been using computers since the 3rd grade, when among other things I would do the flowcharting homework my mom assigned to the students in the college course on computer programming she taught at a nearby junior college. I’ve been on the Internet, running my own server, since 1991. I know, at least enough to get in trouble, over 10 programming languages and I’ve administered servers (Windows, Linux, Unix, and others) for nearly two decades. My consulting, however, is generally at the intersection of technology and business – I’m hired to help run projects, to analyze existing technology, to brainstorm new strategies and approaches, to identify vendors, to analyze investments (or possible investments), to offer new and alternative approaches to business – many of which involve the appropriate use of technology.

While for much of the past decade+ of my professional career I have used Unix and Open Source, I have also worked with many Windows and other proprietary projects and tools, my approach is to focus first on the business needs, next on the resources, skills and constraints and only then to match up the appropriate technologies – whether internal or external, open source or proprietary.

Getting started – from idea to the next phases

When you first have an idea your thinking goes something like this: “This will be great, we do a few simple things, launch, everyone loves us, we start buying our own private islands.” Okay, there are many variations on those thoughts and not everyone focuses on a big eventual payday – for some simply “everyone loves us” is reward enough (or “we change the world” which should, after all, be a pretty good reward – assuming you like the changes).

But all too often entrepreneurs focus on the what – on the technology – and ignore the underlying business. This is not always a bad approach, without something it is hard to progress and especially early on you are wise to get it working before you dwell too much on the impact of GAAP accounting etc. But it is a delicate balancing act – far too many companies, including some which have raised funds and employ (or were founded by) friends of mine, have powerful, cool technology, but little if any mapping of that technology to real business opportunities. At the other extreme, there are an increasing number of “technology” (web 2.0, new media, insert your own catch phrase here) companies that are not, actually, tech firms – but are rather new(ish) versions of media/entertainment properties. Certainly you can make a lot of money entertaining people – but the blurring of the industries does give rise to confusion.

So what should an entrepreneur do?

Start by trying to answer this question:

“What do you care passionately about? Which people? Problems? Industries? Are you going to be as passionate about this for the next decade? Longer?”

Then – with that passion – what needs to be there that you can address? Both in the fantasy future (“search the world’s information” – Google’s mission) and in the immediate term (i.e. real users, preferably real income generated, in the next 3 months or sooner)

For most people, to achieve these passionate goals will take a team – what role will you play? What roles do you need to fill?

I’d encourage you to think not in terms of job titles (CFO, CIO, Director of Sales) but to think in terms that arise out of what you will be working on, who you will be working for (besides yourselves) and what has to happen to both keep the lights on and deliver. If you are naturally introverted, most businesses will need some public faces and outgoing types to interact with the rest of the world – to build the relationships that generate income. Someone has to dot all the i’s and cross all the t’s, get bills paid on time, collect money, allocate scarce resources and find creative ways to get what is needed. Ideas have to be implemented and often that takes a variety of skills – whether you are making a film or building a new piece of software/web service.

With the beginnings of your team coming together, and with much thought given to what you are passionate about – next start to think about what you can do now – and where you want to be in the future. Here is one initial major “fork” to think through.

The fork – are you discretionary or critical?

I am writing a book (well first a book proposal, then a book) on economics from a network perspective. As I analyze the economic network around an entity one question I ask is whether the entity (person, company, organization) is a discretionary path or a critical path.

You can think of a “firm” as being an entity through which value from many other entities (physical goods as well as work and intellectual efforts) is pulled together and made available to others. For the firm to succeed and prosper, its costs over time have to be less than (or at most equal to) the value generated and made available to the firm.

If you are part of a critical process then your work and the value you provide are part of how other entities, in turn, make money and succeed – i.e. if you supply parts to a car company they make money by adding value on top of your parts. Likewise, if you sell that car company software, in the long term they should be able to make more money by having and using your software than if they did not use it (or else it is unlikely that other companies will keep buying your services).

On the other hand, if you provide something that is discretionary what you offer is nice for others to have and use, but is not part of how they, in turn, generate value. Over a long term it may occasionally help them (by sparking a good idea) but the choice of whether or not to use your service is not based on, in turn, generating value, but based on something else. Often you are offering entertainment but many other types of firms have a strong discretionary component to their offerings – the extra, non-tangible elements to the good or service.

It is worth noting that a vital aspect for many businesses is learning for whom they are part of this critical path, and for whom they are discretionary; the balance between these two elements of a business can be the path to financial success. Ebay for example serves millions of people as a source for their (mostly) discretionary purchases – their collections, music, books, clothing, etc. However for the hundreds of thousands, perhaps millions, of sellers, Ebay has rapidly grown into a critical business partner, one who delivers services and an audience of buyers in exchange for a percentage of each sale – a global, always on mirror of the landlords of large shopping malls.

For the entrepreneur this is a process that will not stop, the balancing act of who pays the bills vs. who the business was built to serve. The successful entrepreneur will find creative ways to shift value from one set of parties to the firm and to a larger network around the firm.

If you are building software, and you want to sell either that software directly OR services built on top of that software, it is vital that you truly understand what value (if any) the software delivers. If the answer is “indirect” or “it is fun” then, though you have packaged it in technical terms, you are participating solidly on the discretionary side of spending. You can (and many have) still make a lot of money for yourself and the firm, but you have to bear in mind the differences as you price your offerings and as you create the business structures.

Advantages and Disadvantages of each aspect of business

If your firm delivers direct value, that is, if via the services or goods you provide others directly make money, then as they grow you can grow with them, to the limit of their capacity to, in turn, create value and sell what they do with your goods and services. For many businesses this is effectively very, very large. Nearly no matter what your product, if Target or WalMart determines that they can sell it for more than you charge them, they will buy a large quantity of the good (or service) from you and repeat until buyers change their purchase patterns.

Thus an advantage is that you can experience growth directly with your partners, and when you are small you can grow to a proportion of each partner in line with the percentage of value your offerings provide them (i.e. if your firm supplies a part for a car that is 10% of Ford’s business, and your part is 10% of the car, you could grow to be just under 1% of the size of Ford – which is indeed very large, if somewhat unrealistic for most startups).
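The arithmetic in that Ford example is just two fractions multiplied together. A quick sketch, where the function name `growth_ceiling` is mine and the 10% figures are the illustrative ones from the paragraph above, not real Ford numbers:

```python
from fractions import Fraction


def growth_ceiling(share_of_partner_business, share_of_product_value):
    """Upper bound on supplier size, as a fraction of the partner's size.

    share_of_partner_business: how much of the partner's business the
        product line represents (e.g. one car model out of all of Ford).
    share_of_product_value: how much of that product's value your part
        contributes.
    """
    return share_of_partner_business * share_of_product_value


# A car that is 10% of the partner's business, and a part that is 10% of the car.
ceiling = growth_ceiling(Fraction(10, 100), Fraction(10, 100))
ceiling_percent = float(ceiling * 100)
```

Here `ceiling` works out to 1/100, which is the "just under 1%" ceiling: a bound you approach, not a forecast.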

The disadvantage is that your pricing will be constrained not just by any individual partner’s capacity to make money off your goods or services but by the network of your partners. To an extent you will try to offer customized products and services to each partner so as to capture the best percentage of the value your goods and services offer that partner, but especially given the increased transparency of the Internet and given the flow of information, in practice you will be constrained to the LOWEST price a significant partner will pay (assuming that you turn down partners who offer only to pay low prices and only purchase at a low level; very likely the cost of managing them combined with the impact on prices to other partners makes turning down that business the better step much of the time).

In contrast, while there is indeed very intense competitive pricing for many aspects of the “discretionary” markets, there is also a much wider range of options and approaches around price. Your prices are not determined by partners’ requirements (if they are, you likely have aspects of your business that are no longer fully discretionary – DVD sales have become a critical part of the sales of many stores on and offline; while the prices certain fans may pay for certain content are very fluid, the businesses involved such as WalMart dictate that wholesale prices be structured in certain ways and retail prices in other ways).

But assuming that you have more direct contact with the purchasers of your goods or services you have a variety of approaches to how you price those services.

As a consultant, while I like to think that I offer services that are “critical path” and highly valuable for my clients, in many cases the decision to hire me is a discretionary one, an investment in having more perspectives, new approaches, deeper analysis, but except in a very few cases my services will not directly in turn generate revenues for my clients (some lawyers who have used me for expert witness services may have marked up my time as might the occasional firm for whom I sub-contract on a given project, but generally it is rare). Thus I can set my prices and in no small part my price is a signal to the buyers of my services of the value they will be getting (i.e. if I still sold my time as I did when I first entered the consulting field then a day of my time wouldn’t buy a nice dinner for two at a good restaurant – now an hour of my time should buy a very good meal for two, plus wine and tip with money left for the valet).

Your price will send a signal and reflect not just where buyers will place you, but how you as a firm view the market. If you price services around the cost of a beer (my friend Ethan Zuckerman discussed in his presentation at PopTech 2004 how successful Internet Cafes priced an hour of use at the local price of a beer – i.e. at the amount available locally for regular discretionary spending) then you view what you are offering as being an alternative to buying another beer at a bar. (photo from CoryPina)
That, in turn, probably means you are looking to sell a lot of something to a large number of people, and very probably sell to many of them on a regular basis (yes, Starbucks fits this model exactly, down to the selling of brain altering substances).

For technology comparisons here, the pricing of individual games and ringtones on cell phones is one immediate comparison. Others are many newer hosting offers (down to <$7/mth, or less than the price of some beers in big cities). The special (but common) case of services being offered online for “free” I will address in more detail both below and in future posts.
A different, but also viable approach is to price at a very high premium, targeting a smaller audience (though it can often still be quite significant – i.e. see the fashion industry) at a price point that will not be compared to a beer but more likely to fine dining or much more. This strategy, applied to technology,

Technology comparisons here would be some games and game consoles (though many of these are adopting pricing models closer to the high volume, low per unit model). Many major pieces of software are, or at least claim to be, tools used by the buyers to in turn make money (SAP, Photoshop, Visual Studio).

And then there is “Free”. Don’t get me wrong, I’m a huge fan of Free, both as a user and as a business advisor, but the key is to realize when what you are delivering is not really “free” – i.e. when value is exchanged but in ways other than directly financial. For example all the “free” users of Google, when they click through on Google Ads, are very directly generating revenues for Google. And through their use of the search engine (and when they do not click through) they are helping Google refine the algorithms.

But more on that in future posts…

Posted in Entrepreneurship, geeks, internet, venture capital, web2.0 | 1 Comment »

my next hack – hacking Netflix

Posted by shannonclark on October 2, 2006

With their permission that is

Netflix has just announced the Netflix Prize, which will award $1M for a 10% improvement on their recommendation engine, based on a dataset of over 100M ratings of movies which they are making available for research purposes to anyone who registers to participate.
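To make "a 10% improvement" concrete, here is a minimal sketch. The announcement as summarized above does not say how improvement is scored, so the sketch assumes it is measured as a reduction in root mean squared error (RMSE) between predicted and actual ratings. The ratings and the two sets of predictions are invented purely for illustration.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def relative_improvement(baseline_error, new_error):
    """Fraction by which new_error is lower than baseline_error."""
    return (baseline_error - new_error) / baseline_error

# Hypothetical ratings on a 1-5 scale (invented, not from any real dataset).
actual = [4, 3, 5, 2, 4]
baseline = [3, 3, 4, 3, 3]    # stand-in for an existing engine's predictions
candidate = [4, 3, 4, 2, 3]   # stand-in for a challenger's predictions

baseline_rmse = rmse(baseline, actual)
candidate_rmse = rmse(candidate, actual)
improvement = relative_improvement(baseline_rmse, candidate_rmse)

# Assumed target: at least a 10% lower error than the baseline.
meets_target = improvement >= 0.10
print(f"baseline {baseline_rmse:.3f}, candidate {candidate_rmse:.3f}, "
      f"improvement {improvement:.1%}, meets target: {meets_target}")
```

With only five toy ratings the improvement is of course unrealistically large; on 100M real ratings every fraction of a percent would be hard won.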

When I founded JigZaw Inc in 2000 I embarked on many years of research into various aspects of machine learning and AI. My initial focus was on automated data acquisition, techniques for automating the understanding of data structures (especially from semi-structured data such as web pages) as well as techniques for extracting that data and converting it into “real” structured data. But beyond those techniques I also started a lot of research into data clustering methodologies and approaches, with my interest focusing mostly on some fairly complex ways of automatically clustering data into data-driven categories (including the possibility of overlapping categories). I was and am less interested in the “postal code” type of clustering, where the categories are known ahead of time, are fixed and usually are unitary – i.e. a specific element can be placed into one and only one category.

What I’m more interested in is the much harder problem of automatic data driven clustering – clusters that are properties of the dataset but which arise naturally through the data analysis, not from a priori defined categories or cluster types.
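As a rough illustration of clusters that arise from the data and are allowed to overlap, here is a minimal sketch. It is a textbook stand-in, not a description of any particular research approach: items are linked when their Jaccard similarity passes a threshold, and the clusters are the maximal cliques of that graph, found with the basic Bron-Kerbosch procedure. The movie names, viewer names and the 0.3 threshold are all invented.

```python
from itertools import combinations

def jaccard(a, b):
    """Similarity of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similarity_graph(items, threshold):
    """Link every pair of items whose similarity is at least threshold."""
    adjacency = {name: set() for name in items}
    for a, b in combinations(items, 2):
        if jaccard(items[a], items[b]) >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency

def maximal_cliques(adjacency):
    """Yield each maximal group in which all pairs are linked (Bron-Kerbosch)."""
    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            yield current
            return
        for node in list(candidates):
            yield from expand(current | {node},
                              candidates & adjacency[node],
                              excluded & adjacency[node])
            candidates = candidates - {node}
            excluded = excluded | {node}
    yield from expand(set(), set(adjacency), set())

def overlapping_clusters(items, threshold):
    """Return clusters of two or more items; an item may appear in several."""
    graph = similarity_graph(items, threshold)
    return sorted(sorted(g) for g in maximal_cliques(graph) if len(g) > 1)

# Hypothetical movies, each described by the (invented) viewers who liked it.
movies = {
    "space_opera": {"ann", "bob", "cat"},
    "robot_drama": {"ann", "bob", "dan"},
    "legal_thriller": {"eve", "fay", "gus"},
    "courtroom_drama": {"eve", "fay", "hal"},
    "alien_trial": {"ann", "bob", "eve", "fay"},  # crossover title
}

clusters = overlapping_clusters(movies, threshold=0.3)
for cluster in clusters:
    print(cluster)
```

No categories are supplied up front: the two groups fall out of who liked what, and the crossover title lands in both of them, which a “postal code” scheme could not express.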

But there has always been a very real lack of serious datasets to test my theories upon so I haven’t done much with them for many years.

Netflix’s announcement changes all of this. With a single, well-thought-out action, they have made a very large (and furthermore mostly real) dataset available to nearly anyone (if you live in certain countries or Quebec you aren’t eligible to participate). I know that I plan on registering and downloading the dataset and exploring it, even if I don’t seriously enter the competition.

Though, that said, I do think I have a number of approaches and techniques that would achieve very real and valid results.

But I do have a couple of procedural questions as well as some real concerns. First and foremost, while I applaud them for the very real steps they are taking to preserve users’ privacy, by modifying the data in a variety of ways they cloud the validity of the data as well as embed into the contest certain assumptions (some of which I had planned on questioning in a few of my approaches).

This is not all of those approaches, but for example, by modifying in some cases the date when a rating was made they change in unknown ways the temporal factors implicit in those ratings. One testable assumption might be that people who typically watch movies over the weekend (and rate/return them early in the week) have very real and measurable differences from movie watchers who primarily watch movies during the week, returning them anytime. Not to mention that some possibly calculable measures, such as whether or not there is a correlation between how long someone kept a given movie and how positively/negatively they rated that movie, would be worth testing. (I know in my own experience, when my ex-girlfriend had a Netflix subscription, that certain types of movies, often ones we felt we “had” to watch but generally didn’t really love, might sit, unwatched, for weeks or in a few cases many months.) Time might also be a proxy for other more typical factors – the differences between a single mother renting for herself as well as for her young children and a single person renting mostly for weekend (or less commonly weekday) movie watching with a partner.
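The two temporal questions above can be sketched in a few lines. This is only an illustration under loud assumptions: the records are invented, and the `days_kept` field in particular is hypothetical and may not exist in the released dataset at all. "Rated on a Monday or Tuesday" is used as a crude proxy for weekend viewing.

```python
from datetime import date
from math import sqrt
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def rated_early_in_week(day):
    """True for Monday (0) or Tuesday (1), a rough proxy for weekend viewing."""
    return day.weekday() in (0, 1)

# Invented records: (date rated, hypothetical days the disc was kept, rating 1-5).
records = [
    (date(2006, 9, 25), 3, 5),    # Monday
    (date(2006, 9, 26), 4, 4),    # Tuesday
    (date(2006, 9, 27), 30, 3),   # Wednesday
    (date(2006, 9, 28), 20, 2),   # Thursday
    (date(2006, 9, 29), 45, 1),   # Friday
    (date(2006, 10, 2), 2, 4),    # Monday
]

# Question 1: do early-week ratings differ from ratings made later in the week?
early = [rating for day, _, rating in records if rated_early_in_week(day)]
later = [rating for day, _, rating in records if not rated_early_in_week(day)]
early_mean, later_mean = mean(early), mean(later)

# Question 2: is the time a movie was kept correlated with how it was rated?
days_kept = [kept for _, kept, _ in records]
ratings = [rating for _, _, rating in records]
correlation = pearson(days_kept, ratings)

print(f"early-week mean {early_mean:.2f}, later mean {later_mean:.2f}")
print(f"correlation between days kept and rating: {correlation:.2f}")
```

If the rating dates have been perturbed, the weekday split is the first thing to degrade, since shifting a date by even a day or two moves a rating from one group to the other.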

Anyway, I encourage the research inclined among you to check out Netflix’s announcement. As I announced a few weeks back on MeshForum, MeshForum is looking to work with companies on the creation and release of network datasets for general and broad research projects. Netflix’s model is a good one – though I also hope they make the full data collections available to anyone interested in research (and/or allow them to be mirrored in dataset archives such as the one MeshForum is looking to build). MeshForum’s mission is also to encourage companies to do more than a single, one-time release of data; rather, we’re looking to support and encourage companies to make large network datasets available on a regular and recurring basis (one to two quarters delayed being perhaps a good basic model to consider).

If you are interested in working with me on this project please leave a comment with your contact information or feel free to contact me directly.

Posted in geeks, Movies, working | 1 Comment »