November, 2003 – Unstructured Information Management

Posted by shannonclark on November 27, 2003

A site I need to spend a lot of time reading, and whose owners I should potentially contact to explore an idea I have for a new organization and conference: most likely a non-profit, designed to be a networking tool for people writing systems that extract data automatically and then deal with it – whether from “unstructured” data sources, the “semi-structured” sources I tend to deal with, or even from “structured” sources (emphasis on the plural, since that is the case where AI comes in handy). I would then like to expand it to include people who do search, classification and other related topics, as well as people who are looking at the technology and applications of “network theory”.

Here I mean “network” in the sense of connections, not just technical networks (which are one type of “network”) but networks of people, cells, businesses, etc. There is an emerging but not fully named or understood discipline that is studying networks – in terms of organizational structure, growth patterns, and “end” states (all connected to all, independent clusters, etc.).

Very cool stuff. In any case, the site offers a positive view of what might be a workable means of growing my idea into something that actually happens and is viable.

Posted by shannonclark on November 18, 2003


Pretty cool stuff. I need to think about it and figure out whether I can see anything to do with it, but the concept of XML-described games that can be linked together somewhat “on the fly” is pretty interesting.

I wonder if it might become something really neat if combined with a M** type code base?

Anyway, something to look at. At the moment I have to work on coming up with some products to get funding from a company that has offered it to a partner of mine, so that is my first priority.

random thoughts and a business idea

Posted by shannonclark on November 12, 2003

or something because I’ve only been posting short stuff of late

So it has been a fairly strange past month, as I have been recovering from my office move and have only just gotten up and running, mostly fully functional. I’ve been working out of the house, at cafes, many afternoons, which is not much different than working from home or from my old office, though I find myself making fewer phone calls. I need to get into a habit/routine where I make all my calls at one time and then work from cafes when what I need to do is write.

But enough on that for the moment.

More interesting (I think at least) is a confluence of events that have been happening of late, and some ideas that I am considering looking into.

First, the idea in a short summary, with more on it later I hope. (If you are interested, or have suggestions, paying investors, etc., please contact me privately. If you don’t know how, look for me at Ryze and contact me there.)

“A directory of the Internet that works by AUTOMATICALLY sorting pages into collections and letting you find pages that are related/similar very rapidly. Further, it would have some elements of ‘paid search’ to provide a revenue model and would apply some pretty cutting edge AI techniques for the classification and search technology.”

To start it off we would be using some standard crawling techniques, but would be building up our indexes in a fairly new and innovative way. Further, rather than attempting to force pages into a vast tree à la the Open Directory Project (which is nearly impossible these days) or forcing people to get very, very good at searches, I hope to have an interface and an overall data structure that would work in a more intuitive manner.

For example, you might be able to cut and paste something into a big box on my page – we would then return the pages most likely to be related to the content you posted. From each page you could then also find other, related pages and/or highlight specific blocks of text (or words) that you would like us to find more similar pages for.
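As a rough illustration of the “paste some text, get related pages” idea, here is a minimal sketch using plain TF-IDF weighting and cosine similarity. This is a standard textbook technique swapped in for illustration, not the approach I actually have in mind, and every name in it (`build_index`, `related_pages`, the sample pages) is hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build TF-IDF vectors for a {url: text} mapping (hypothetical helper)."""
    counts = {url: Counter(tokenize(text)) for url, text in pages.items()}
    doc_freq = Counter()
    for word_counts in counts.values():
        doc_freq.update(word_counts.keys())
    n = len(pages)
    # Smoothed idf, so a word that appears on every page still gets some weight.
    idf = {w: math.log((1 + n) / (1 + df)) + 1.0 for w, df in doc_freq.items()}
    vectors = {
        url: {w: tf * idf[w] for w, tf in word_counts.items()}
        for url, word_counts in counts.items()
    }
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def related_pages(pasted_text, vectors, idf, top_n=3):
    """Rank indexed pages by similarity to an arbitrarily long pasted text."""
    query_counts = Counter(tokenize(pasted_text))
    # Words the index has never seen are ignored rather than failing the search.
    query = {w: tf * idf[w] for w, tf in query_counts.items() if w in idf}
    scored = sorted(
        ((cosine(query, vec), url) for url, vec in vectors.items()),
        reverse=True,
    )
    return [(url, round(score, 3)) for score, url in scored[:top_n] if score > 0]


if __name__ == "__main__":
    # Made-up sample pages, purely for illustration.
    sample = {
        "a": "thai restaurant spicy noodles curry",
        "b": "php development tools and libraries",
        "c": "lotus notes email calendar import",
    }
    vectors, idf = build_index(sample)
    print(related_pages("looking for php libraries and development tools", vectors, idf))
```

Because the query is just another bag of words, pasting in a whole page works the same way as typing two words; extra words that the index does not know simply drop out instead of producing an empty result.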

Unlike Google, we would not limit search phrases to a small number of words, nor would putting in too many words return no results (or force you to play around with word tenses, plurals, etc.). Over time we would address this via dictionaries, thesauruses and translation abilities, eventually being able to find pages in non-English languages from an entirely English search request.

Over time, I think this would also be built into a bookmarklet and/or toolbar type of functionality, as well as a web-based page model.

It sounds pretty complicated (and it is), but I have many years of AI technology that has been sitting on my shelf that I plan on bringing to bear on this problem. Specifically, I will eventually be able to detect page changes pretty powerfully, differentiate between major and minor changes (i.e. changes to actual sections vs. changes to an ad or the date), and do some other rather cool stuff with the data that we collect.
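To make the major vs. minor distinction concrete, here is a minimal sketch that compares two snapshots of a page block by block using Python’s standard `difflib`. It assumes a page can be split into blocks on blank lines and that “major” simply means a large share of the text changed; the function names and the 20% threshold are hypothetical, and this is not the technology sitting on my shelf.

```python
import difflib


def split_blocks(page_text):
    """Split a page into non-empty blocks, using blank lines as separators."""
    return [block.strip() for block in page_text.split("\n\n") if block.strip()]


def classify_change(old_text, new_text, major_threshold=0.2):
    """Return "none", "minor" or "major" for two snapshots of the same page.

    major_threshold is an assumed cut-off: the fraction of characters that
    must sit in changed blocks before the change counts as major.
    """
    old_blocks = split_blocks(old_text)
    new_blocks = split_blocks(new_text)
    if old_blocks == new_blocks:
        return "none"

    matcher = difflib.SequenceMatcher(None, old_blocks, new_blocks, autojunk=False)
    changed_chars = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Count the text of every block that was replaced, deleted or inserted.
            changed_chars += sum(len(b) for b in old_blocks[i1:i2])
            changed_chars += sum(len(b) for b in new_blocks[j1:j2])

    total_chars = sum(len(b) for b in old_blocks) + sum(len(b) for b in new_blocks)
    fraction = changed_chars / total_chars
    return "major" if fraction >= major_threshold else "minor"


if __name__ == "__main__":
    body = "Long article body " * 20
    old = body + "\n\nUpdated: Nov 1"
    print(classify_change(old, body + "\n\nUpdated: Nov 2"))
    print(classify_change(old, "Completely different text " * 20 + "\n\nUpdated: Nov 1"))
```

Weighting by characters rather than by block count is what keeps a changed date line or a swapped ad from looking as important as a rewritten section.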

I also plan on integrating our efforts with other revenue opportunities on the web – Google AdSense,’s affiliate programs, PayPal’s programs and the like (perhaps Overture, though I think Google might make more sense for us).

As well, I will likely offer “sponsorships” of certain things, though exactly how that will work is something I have to figure out since unlike an Open Directory type project we won’t have very fixed categories, though we may have something close enough for sales purposes.

My underlying technology likely would cause some developers to have fits – I don’t plan on doing this in a very traditional manner… but heck that’s probably why it will work. 🙂

In other news, I have sparked a bit of a fit over at Chowhound with a post I made about Spoon Thai, which was followed by another discussion on the “Not about Food” Chowhound board about secret menus. Saturday I may be joining them for a tasting of Italian beef around town. I am not sure exactly what the reception will be, but personally I think the discussion, while at times getting too personal, has been interesting, useful and entertaining.

Last night I finally saw Memento on DVD, a movie I really should have seen when it was out in regular release. I had originally read the story, first published in Esquire Magazine, so I had some idea what was going on, but it was still a very enjoyable puzzle film. After watching it, like any good net person, I went looking for and read some online discussions of what “really” was going on (and tried, without success, to get the supposed easter eggs to work, such as showing the film in time order).

CodeCon 2004

Posted by shannonclark on November 10, 2003

Could be interesting. I’d like to try to attend this, mostly to see who else attends, learn more about what they are developing, possibly network for future employees, and in general keep an ear very close to the ground for some of the more cutting edge (but working) applications out there.

If I’m really up to it I might even try to present at it myself.

Shirky: The Semantic Web, Syllogism, and Worldview

Posted by shannonclark on November 7, 2003

Clay’s article makes a point that I made almost exactly a year ago: that there is a fundamental philosophical problem with the Semantic Web itself. That is, context and difference are an inherent part of the world but are ignored in the Semantic Web.

I am not sure why Anil Dash, when linking to this article, thought that Clay was “wrong” – I fundamentally think Clay is RIGHT.

Posted by shannonclark on November 3, 2003


Looks to be a really useful site that I need to spend some time looking over – since my company does almost all of our development in PHP, these may be some systems and tools we need to be using.

Lotus Notes and Outlook issues

Posted by shannonclark on November 1, 2003

Hi all,

A technical question and request (and an offer in return).

I am in the midst of moving to a new laptop and am exploring
alternatives to Outlook as I would like to avoid the cost of buying MS
Office (if I can). My laptop came with Lotus SmartSuite (which seems
more than sufficiently compatible for all of my needs for
presentations, writing, and spreadsheets) and a license of Lotus Notes.

However, I have been unable to figure out how to import my 1+ gigs of
mail, calendaring and other data from Outlook into Lotus Notes, and how
to configure Lotus Notes to work in an Internet email only manner (i.e.
no Domino server).

Is there anyone on the list that knows enough about Lotus Notes,
especially in a client only mode, to help me with this?

And/or can anyone suggest an alternative to MS Outlook?

I have extremely serious mail requirements, so something like Outlook
Express is not an option. My needs are for an integrated mail, calendar
and address book capable of handling about 1 gig of past emails
(imported from a .PST file), 1500+ contacts, 3+ years of active
calendar events. It has to have very powerful rules for mail reading to
allow for automated filtering/filing (importing MS Outlook’s rules would
be nice but not mandatory).

Further, I have to be able to easily and quickly access all of the mail
– mail split into many “archives” and/or search results that don’t show
everything are non-starters. (I recently evaluated Bloomba, a new mail
software package, which is nice, but has a 2500 message limit in any
given view – that can easily be less than a month’s email for me, so
that limitation is quite severe for me.)

If there is some way I can help you out in return, please ask.



