Projects for the New Year
Posted by shannonclark on January 3, 2005
Blogging my thoughts and ideas for new projects this year. These are rough outlines and rough ideas – they may be worth my time, they may not be; at the moment I am just trying to capture my thoughts and an outline of some ideas.
First, a series of articles/papers which I would like to write (co-authors, publishers, clients for test cases please contact me or comment here).
1. Summary of the processes I used to create Pieces – what I term the technology behind JigZaw’s automatic data extraction technology (a demonstration is available; a link will follow, but it is embedded inside a more complex calendaring application we wrote a few years ago). This combined techniques from about 100 different sources – including a few PhD and master’s theses and other published sources (and a few unpublished but available online sources). What the technology does is twofold. First, it “chunks” semi-structured sources into their logical components; then, based on a particular ontology (what we termed a defined set of definitions and rules about knowledge in a particular domain), it extracts data from the chunks that were identified and makes it available in a machine-parseable format (XML perhaps, but also other formal data structures). The chunking process could be driven in part by the particular data to be extracted (i.e. via a process that used the presence/absence of data in chunks to help determine how to select between alternative means of chunking a particular set of data). Advanced features which were planned but not yet developed would include machine learning to refine some particulars of the process, and further data source manipulation to enhance the value and detail of the data we extracted – as well as to improve the handling of some “edge” cases (specifically the first or last data-bearing “chunk” in a source that held multiple examples of data to be extracted – at times these edge cases could cause a problem for our algorithms).
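As a rough illustration of the two-step chunk-then-extract idea (this is a hypothetical sketch, not the actual Pieces code – the field names and regex rules are my stand-ins for a real domain ontology):

```python
import re

# Hypothetical ontology: a set of definitions and rules for one domain
# (here, event listings). The fields and patterns are illustrative only.
ONTOLOGY = {
    "date": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
    "time": re.compile(r"\b\d{1,2}:\d{2}\s?(?:AM|PM)\b", re.IGNORECASE),
    "price": re.compile(r"\$\d+(?:\.\d{2})?"),
}

def chunk(source):
    """Split a semi-structured source into logical components.
    Here blank lines delimit chunks; a real system would select among
    alternative chunking strategies, partly based on which one yields
    more of the data the ontology is looking for."""
    return [c.strip() for c in source.split("\n\n") if c.strip()]

def extract(chunks):
    """Apply the ontology's rules to each chunk, keeping chunks that
    contain at least one recognized field."""
    records = []
    for c in chunks:
        record = {}
        for name, rule in ONTOLOGY.items():
            match = rule.search(c)
            if match:
                record[name] = match.group(0)
        if record:
            records.append(record)
    return records

events = extract(chunk(
    "Jazz night\n1/15/2005 at 8:00 PM\nTickets $12.50\n\nNo event data here."
))
```

The output here would be a list of structured records, which could then be serialized to XML or any other formal data structure.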
2. Write up my consulting process – specifically including two different processes which I follow. [UPDATED later in the evening on 1/3/2005 – forgot to include the second process, 2b]
2a – Project scoping/speccing process. Unlike many technical consultants, JigZaw approaches consulting projects without a pre-conceived packaged solution or specific vendor we prefer to work with and sell. Rather, our goal is to understand our client’s business needs and then assist them in identifying the appropriate and deliverable solution to those needs. Toward this end we have a specific process we follow when speccing out a project.
This process is to work with the client to create a set of requirements for the project – however, we sort these requirements into three special categories. First – drop-dead requirements. These are the requirements that make or break the success of the project – completion of all of these requirements defines a deliverable project; failure to deliver them means the project is incomplete. Second – “nice to have” features/aspects of the project. These are features or aspects that, while valuable, are not required for a deliverable project – generally this is a large category where a significant percentage of the “requirements” of a given project fall. And finally, third – fantasy requirements. Frequently these are overlooked and unmentioned in formal requirements-gathering processes. These are what the client dreams the project could do – here is where, as an outside consultant, we get to glimpse long-term impacts, future goals, and the driving forces behind the project. Knowing these features more often than not allows the project to be specified and developed in a way that allows for the delivery of one or more of the “fantasy” elements with the initial delivery of the project. And in almost all cases, knowing them allows the capacity and plan to achieve the long-term “fantasy” features to be built into the project.
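The three-tier triage above can be sketched in code (a minimal, hypothetical model – the tier names mirror the post; the example requirements are made up):

```python
from dataclasses import dataclass
from enum import Enum

class Tier(Enum):
    DROP_DEAD = 1     # make-or-break: all must ship for a deliverable project
    NICE_TO_HAVE = 2  # valuable but not required; usually the largest category
    FANTASY = 3       # the client's dreams; often cheaper than they assume

@dataclass
class Requirement:
    name: str
    tier: Tier
    done: bool = False

def is_deliverable(reqs):
    """The project counts as delivered only when every drop-dead
    requirement is complete; the other tiers do not gate delivery."""
    return all(r.done for r in reqs if r.tier is Tier.DROP_DEAD)

reqs = [
    Requirement("import legacy orders", Tier.DROP_DEAD, done=True),
    Requirement("nightly reports", Tier.NICE_TO_HAVE),
    Requirement("real-time dashboard", Tier.FANTASY),
]
```

With the requirements tagged this way, a mid-project change request becomes an explicit move between tiers rather than an unbounded addition to scope.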
Very often, these fantasy features are ones the client thinks are impossible, very difficult, or very expensive to achieve. Just as often the client is wrong in more than one way about these features – not uncommonly they may actually be cheaper and relatively easy to achieve.
Having all three categories defined prior to the start of a project, JigZaw is then able to work with the client on a usually fixed-fee and fixed-time basis. That is, we agree with the client on a quote for a particular time of delivery as well as on a specific fee and payment schedule. This timeline promises the delivery of the requirements (along with many, though not all, of the “nice-to-haves” and, if possible, some or all of the “fantasy” elements). Often the schedule will include specific additional requirements of the client above and beyond payment – generally access to personnel, data, and systems, as well as periodic participation in reviews, testing, and eventually implementation. The quoted fee will usually include specific expenses as well as JigZaw’s time investment (i.e. hardware purchases, software licenses, etc.).
JigZaw usually retains ownership of the code we develop, though we provide a full and perpetual license to the client who pays for the development – alternative payments/fees can be quoted if full, exclusive ownership of the IP, both code and/or patents, is desired by the client.
This is not always – indeed not usually – a simple or very quick process. Completing it requires a deep level of understanding on the part of both the client and JigZaw; it also requires an investment of time and effort in the development of the requirements – depending on the time and scope this may be a for-fee service by JigZaw.
That said, it results in significantly better software (or other deliverables – this is not unique to software development) than most other processes. When putting together JigZaw’s quote we take into account what we expect to require in terms of staff, outside resources, and time – however, we prefer to work on a fixed-fee basis as this aligns our incentives directly with those of our client. If we complete the application early, we have time to add additional features or do further iterations of testing and enhancement. Within this framework we can then also enter into very clear discussions with the client as change occurs. Specifically, testing of deliverables as well as changing business environments often result in changing requirements in the typical project. In the JigZaw framework, this then becomes a discussion of what has to give to meet the promised deadline – i.e. are other “must haves” now “nice to haves,” or can more money (i.e. more time) be spent on the project?
We work to make sure that it is an informed discussion with our client about the tradeoffs of adding new features/requirements to the project. Not uncommonly our project plans have built into them expectations of some aspects being determined during the course of the development process – but we still work to confine these in known ways that will not result in higher costs for either JigZaw or our client.
At the end of the day, our goal is to deliver very high business value to our clients – in a way that is profitable to JigZaw and which will allow us to continue to support and serve our clients in the future.
2b [updated evening 1/3/2005] JigZaw research/analysis/troubleshooting process – aka “Smokejumping”
One of my personal favorite types of consulting engagements is to do what one client referred to as “smokejumping”. That is, jump into what is often a failed application or project, quickly research and analyze it, identify what could be the problem, and then set up a course of action to test, further analyze (if needed), and repair the problem. One project involved an application which had gone live on a Friday, failed, and needed to be working by Monday. I spent Friday evening going over the source code, met with the developers on Saturday and worked through the code with them to identify the fault, had a solution by Sunday, and the application was working on Monday.
For these engagements, whether they last a weekend or six months, I follow the same general process. First, as much research as I can (and have time for) – reading source code where available, seeing the application(s) in action, meeting with users, meeting with developers, reading over project documentation (if any). All with the goal of understanding three things: what the problem(s) are; what the application should be doing; and what was actually developed and designed, and how (which rarely matches what it should be doing).
With those three items understood, I then seek to understand what the constraints are in both the troubleshooting and resolution process: how time-critical the issue is (generally, if I have been brought in, there is a high degree of time criticality); what restrictions there are around any solution I propose – windows to make changes, usage requirements, hardware/software capacity/licensing; and what the budget is, as well as what the cost impact of the problem is (not always easy to determine, but critical in helping evaluate and recommend the best course of action beyond hiring me to investigate the problem(s)).
In troubleshooting, what I typically do is seek to understand the baseline – how the application was working/being used or, more often, what was intended. Then I look for potential points of failure – in the process or system as a whole, in the data, in the use case(s), sometimes in the programming logic itself. Often I encounter situations with insufficient information or logs – sometimes it is possible to begin to generate those logs and use them for further analysis (though at other times there are insufficient resources/time/capacity to do so). Not uncommonly a problem appears to be semi-random; however, frequently by observing the users and talking with them before, during, and after a problem occurs I am able to significantly narrow down the scope of the search for the cause – i.e. what parts of the application were being used, in what order, and what should have been happening – which can then be contrasted with the logs of what did happen, to the extent that they exist.
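The contrast between what should have happened and what the logs say did happen can be sketched as follows (a toy illustration; the step names are invented):

```python
def first_divergence(expected, logged):
    """Walk the expected sequence of steps alongside the logged events
    and return (index, expected_step, logged_step) at the first mismatch,
    or None if the log matches the expected sequence.
    This narrows the search for a cause to the step where reality and
    intent first diverge."""
    for i, step in enumerate(expected):
        actual = logged[i] if i < len(logged) else None
        if actual != step:
            return (i, step, actual)
    return None

# Hypothetical example: the discount step never ran before the charge.
expected = ["login", "load_cart", "apply_discount", "charge_card", "confirm"]
logged = ["login", "load_cart", "charge_card"]
divergence = first_divergence(expected, logged)
```

In practice the "expected" sequence comes from users describing what they were doing and the specification of what should have happened, and the "logged" sequence from whatever logs exist or can be generated.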
One of my somewhat random skills is being able to pick out the dissonant aspect within a pattern of data – a small comment from a user, an odd entry in an error log, a bit of source code that has flaws, etc. In smokejumping this skill is key – looking for potential causes of a problem is as important as anything.
3. Explore “network economics”. This is likely more accurately explored in a book or perhaps in a series of long-form studies/lengthy essays. In short, it is my theory of a fundamentally new way of thinking about economics – not in terms of assets and money but in network terms of connections and links between nodes. Importantly, this means modeling “value” in a fundamentally different way. It also requires a new way of looking at both macro- and microeconomics – whether analyzing the economics of a business or global finance. To explore this in detail likely requires me to write a series of detailed studies – arising from my fundamental observations are many further reexplorations of both classic and new economic issues. I strongly suspect that this new perspective does not render economics as a whole invalid or useless, though it likely results in some new avenues of analysis as well as new solutions to some old problems. It also very likely results in a very different set of math in the study of economics, as the study of networks is a study of chaotic systems.
My thesis is that economics is networks – that everything can be understood in terms of networks, in terms of connections between nodes – from people to other people to nodes that are corporations, organizations or governments. Money itself can be modeled as a network link – from the holder of the note to the central government.
In this set of studies I will consider networks as being composed of “nodes” and links. Links are unidirectional, from one node to another. A link can be modeled to include a measure – that is, once a unit of some form is defined, multiples can be modeled as a link of a particular “strength”. Links also have a duration in time – most being short-lived, but others perhaps being of longer duration. Importantly, given a link from A to B, node A knows of the link to B, while B may not – that is, a link can be anonymous (i.e. if the holder of a dollar bill has a link of unit 1 to the US Treasury, the US Treasury does not “know” of the holder – but the holder of that dollar does “know” of the link to the Treasury).
It then is the case that in “solving” a complex economic problem, the modeling involves networks over time, with some links existing for long durations while others exist only for a short while (generally during a transaction of some form). Using this form of modeling, there is a theoretical difference between cash and credit transactions, for example – especially if the “credit” involves third parties.
Much more to explore, and I hope to do so in 2005.
4. Applying network models to AI and other challenges that I know. This is a bit less well formed, but it appears to me intuitively that my interests in AI (specifically data extraction) and in networks (generally, as well as specifically with respect to economics) share related traits around the modeling of complex and very large data sets. I suspect the same techniques apply that are used in AI to avoid the non-full-set problem – that is, how you handle clustering results into logical groups when you do not have the full set of all known data (i.e. most real-world systems, where new data sources are being created as fast as or faster than the system can analyze them), and where you need to avoid reanalyzing all previous data in light of each new set of data that appears – otherwise the process of analyzing, say, 1000 documents and then each new document could easily become exponentially more difficult as new documents are added.
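The incremental idea – place each new document without reanalyzing all previous ones – can be sketched as a simple online clustering routine (a toy illustration; the threshold and the two-dimensional “feature vectors” are assumptions):

```python
import math

THRESHOLD = 1.0  # max distance at which a document joins an existing cluster

class IncrementalClusters:
    """Assign each new document to the nearest existing cluster (updating
    its centroid as a running mean) or start a new cluster. Adding one
    document costs O(number of clusters), not O(number of documents seen),
    so old documents never need to be revisited."""

    def __init__(self):
        self.centroids = []  # list of (centroid_vector, document_count)

    def add(self, vec):
        best, best_d = None, THRESHOLD
        for i, (c, n) in enumerate(self.centroids):
            d = math.dist(vec, c)
            if d <= best_d:
                best, best_d = i, d
        if best is None:
            self.centroids.append((list(vec), 1))
            return len(self.centroids) - 1
        c, n = self.centroids[best]
        # running-mean centroid update; previous documents are untouched
        self.centroids[best] = (
            [(ci * n + vi) / (n + 1) for ci, vi in zip(c, vec)], n + 1)
        return best

clusters = IncrementalClusters()
for doc in [(0, 0), (0.2, 0.1), (5, 5), (5.1, 4.9)]:
    clusters.add(doc)
```

The trade-off, of course, is that early assignments are never revisited – which is exactly the kind of approximation such techniques accept in exchange for not becoming exponentially harder as data grows.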
5. My critique of the Semantic Web. While I am interested in the potential of some aspects of the Semantic Web, I also find that it is flawed in a fundamental way. This flaw is also all too common within the computer science world and even inside the corporate development world. The flaw is the assumption that it is possible to know in the present what data (and specifically what metadata) will need to be known in the future. This may seem obviously impossible – but still most developers attempt it, and it is inherent in many ways to the goals of the Semantic Web. This impossibility, I argue, is philosophical in origin – the future is inherently and literally unknowable.
In response I would argue for an alternative and perhaps simpler approach to data and especially to metadata. One that results in changes to how software is specified and developed, and one that challenges many existing software development and standards initiatives. However, I also think that this simpler approach can and does result in deliverable applications, designed for growth and utility into the future, and designed in ways that build into the applications an inherent understanding of the users of the application in the present as well as into the future – users being both humans and other applications and systems.
6. Ongoing reporting on and observations on the technology industry. Where I think things are, where things are going. Here I hope to build on my now more than a decade of being an active and “power” user of the Internet (managing my own servers online since 1991). I strongly feel that lessons learned back in the “early” days of the Internet are being revisited and relearned, though on a growing scale, with each new “advancement” of the technology of the Internet. MUDs and MUCKs led to modern multiplayer games; Usenet and Gopherspace to the web; Usenet and BBS communities as well as IRC channels to many of the aspects of online communities currently on the web and in mailing lists (where they have been for many years, if not decades, now).
I am a third-generation computer/technology person. My grandfather programmed some of the earliest computers while he was at Douglas, the Rand Corporation, and the Aerospace Corporation, designing aircraft and space components. My mother has been a programmer since the late 1960s – writing early systems to run railroads and universities (including an early experiment in ecommerce in the 1970s, connecting student records to a university bookstore). I grew up reading flowcharts and have taken programming classes since the third grade. For me, as for many more people into the future, technology and computers have been a core aspect of my world.
Besides these potential topics, I want to expand my consulting to start building up case studies for these and other studies. I want to help companies (and/or organizations) who can afford to hire me (and JigZaw) to apply and use technology in a value-added manner. I strongly feel that many, perhaps most, organizations do not use technology as effectively as they could. But this is not, at its root, a case of technology not being capable – rather it is a combination of often poor analysis and understanding of the underlying business needs and environment, and poor implementation of the technology that is chosen – whether custom development, configuration/customization of packaged software, or mostly “pure” packaged applications.
Most businesses can “solve” or address their business requirements and needs via an almost countless set of potential approaches. What is chosen is frequently determined by historical circumstances, inherited capacities, and not infrequently less-than-skilled implementors and/or users.
What I can offer is an outside perspective on both business problems and the systems that have been implemented to address them (and/or, I hope, the systems that are being considered to address them). Working on a fixed-fee basis (or, I hope, on a retainer basis, which I think would work well), I can assist clients in evaluating what is currently in place as well as in addressing future requirements and needs.