Tuesday, March 30, 2010

Context-based page unit recommendation for web-based sensemaking tasks

Authors:
Wen-Huang Cheng National Taiwan University, Taipei, Taiwan, ROC
David Gotz IBM T.J. Watson Research Center, Hawthorne, NY, USA

Paper (pdf):
http://portal.acm.org/citation.cfm?id=1502650.1502668&coll=ACM&dl=ACM&type=series&idx=SERIES823&part=series&WantType=Proceedings&title=IUI&CFID=81639924&CFTOKEN=12013848

Summary:
This paper presents the authors' program InsightFinder, a web tool that aids in connection discovery during sensemaking tasks, and details the algorithm, interface, and user study behind it.

The program they developed is called the InsightFinder and is described as 'a smart web-browsing system that assists in connection discovery during sensemaking tasks by providing context-based page unit recommendations.' The sensemaking they refer to is the process a user goes through after selecting a website, such as from a search query: scanning the site for the information they were looking for and then connecting that information to relevant data from other websites. One of the examples they gave was someone relocating to a new city. Such a person might research apartment complexes within their price range, their location relative to a day care for their children, and their proximity to the person's workplace. Rather than forcing the user to search all of these out individually and accumulate a series of notes covering the different possible solutions, InsightFinder does this for them, linking the different websites together.

Below are the properties that a tool of this nature should have:
  • Site Independence: A sensemaking tool must be independent of any specific site or content provider to allow cross-site connection discovery.
  • Note-Taking Functionality: A sensemaking tool should allow for the collection of information fragments into a task-specific workspace to help users organize their findings across multiple sessions and sites.
  • Assistance in Connection Discovery: Most critically, a sensemaking tool should assist the user in performing the difficult process of uncovering connections between their notes and what is currently being explored in their browser.
The authors went into detail describing the algorithm behind the program, and I've included an excerpt that briefly describes this.
The insight loop is triggered directly through the InsightFinder interface, which provides tools for users to record or organize their notes. As the user’s notes evolve, the InsightFinder maintains a context model which represents the user’s captured data. The exploration loop occurs while users interact with the normal browser interface. As users navigate the web, the InsightFinder performs a series of steps each time a new page is visited. At the conclusion of both loops, the InsightFinder provides a ranked list of recommended web page fragments that are most relevant to the content in the user’s notes. To provide this functionality, the architecture includes modules for interface management, content extraction, context model management, page segmentation, and relevance computation.
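
To make the relevance-computation step more concrete, here is a minimal sketch of how page fragments could be ranked against a context model built from a user's notes. It assumes a simple bag-of-words model and cosine similarity; the function names and scoring are illustrative assumptions of mine, not the paper's actual algorithm.

    # Hypothetical sketch: rank page fragments against a bag-of-words context
    # model built from the user's notes (names and scoring are illustrative,
    # not InsightFinder's actual relevance computation).
    import math
    import re
    from collections import Counter

    def tokenize(text):
        return re.findall(r"[a-z0-9]+", text.lower())

    def build_context_model(notes):
        """Aggregate all note text into a single term-frequency vector."""
        model = Counter()
        for note in notes:
            model.update(tokenize(note))
        return model

    def cosine_similarity(a, b):
        shared = set(a) & set(b)
        dot = sum(a[t] * b[t] for t in shared)
        norm = (math.sqrt(sum(v * v for v in a.values())) *
                math.sqrt(sum(v * v for v in b.values())))
        return dot / norm if norm else 0.0

    def rank_fragments(fragments, notes, top_k=5):
        """Return the page fragments most relevant to the user's notes."""
        context = build_context_model(notes)
        scored = [(cosine_similarity(Counter(tokenize(f)), context), f)
                  for f in fragments]
        scored.sort(reverse=True)
        return [f for score, f in scored[:top_k] if score > 0]

In this toy version, each note fragment the user drags into a folder simply adds terms to the context model, and every fragment of a newly visited page is scored against that model.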

Below is a screenshot of InsightFinder, followed by another illustrating the ability to take notes.

Figure 2. A screenshot of the InsightFinder system.

Figure 3. Users can record notes by dragging content fragments (links, images, text, or entire pages) from the browser to folders in the InsightFinder.

The last part of the paper went into detail describing the user study they performed and its results. Their program ran as a sidebar in Mozilla Firefox, using XUL for the interface and Java/JavaScript for the computational components. Overall, the program worked as intended and did improve sensemaking tasks by reducing the time required to perform them, an average reduction of 30 seconds. As possible future work they mentioned extending the granularity of their node weighting as well as improving the note-taking capabilities.

Discussion:
The InsightFinder that the authors developed is a novel and useful program. I hadn't thought much about tools that aid in finding connections between websites based on notes and so forth. I have only ever used a search engine and trial and error to find more specifically what I was looking for. I think a program like this would be a good extension for web browsers. It may not be used on a day-to-day basis, but when doing things such as research I can see how this would come in handy, scanning an accumulation of notes to suggest or recommend websites to view.

Thursday, March 11, 2010

Is the sky pure today? AwkChecker: an assistive tool for detecting and correcting collocation errors

Authors:
Taehyun Park University of Waterloo, Waterloo, ON, Canada
Edward Lank University of Waterloo, Waterloo, ON, Canada
Pascal Poupart University of Waterloo, Waterloo, ON, Canada
Michael Terry University of Waterloo, Waterloo, ON, Canada

Paper (MOV and PDF):
http://portal.acm.org/citation.cfm?id=1449736&coll=ACM&dl=ACM&CFID=81067528&CFTOKEN=37358406&ret=1#Fulltext

Summary:
The purpose of this paper is to describe the AwkChecker program created by the authors, which detects improper word phrasings so that they can be replaced with more commonly used ones. In the paper these phrasings are described as collocation preferences, which include things such as commonly used expressions, idioms, and word pairings. The goal of the program is to aid non-native speakers, who are the most likely to have trouble with these collocation preferences. AwkChecker works as a web-based text editor that flags collocation errors and suggests replacement phrases.
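
To illustrate the basic idea of collocation checking, below is a rough, hypothetical sketch using bigram frequency counts: a word pairing that is rare in a reference corpus gets flagged, and a more common pairing with the same noun is suggested. The counts, threshold, and function names here are made up for illustration; the paper describes AwkChecker's actual algorithm in much more detail.

    # Illustrative sketch of frequency-based collocation checking, assuming a
    # precomputed bigram count table from a large corpus (not AwkChecker's
    # actual algorithm).
    from collections import Counter

    # Hypothetical corpus counts; a real tool would load web-scale data.
    BIGRAM_COUNTS = Counter({
        ("clear", "sky"): 9500,
        ("pure", "sky"): 12,
        ("strong", "coffee"): 8700,
        ("powerful", "coffee"): 40,
    })

    def flag_collocation(adjective, noun, threshold=100):
        """Flag a word pairing as awkward if it is rare in the corpus, and
        suggest the most common alternative adjective for the same noun."""
        if BIGRAM_COUNTS[(adjective, noun)] >= threshold:
            return None  # pairing is common enough
        alternatives = [(count, adj) for (adj, n), count in BIGRAM_COUNTS.items()
                        if n == noun and count >= threshold]
        if not alternatives:
            return None
        best = max(alternatives)[1]
        return f"'{adjective} {noun}' is uncommon; did you mean '{best} {noun}'?"

    print(flag_collocation("pure", "sky"))

Run on the paper's title example, this toy checker would flag "pure sky" and suggest "clear sky".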

The paper also went into detail describing the language problems that non-native speakers (NNS) encounter, as opposed to what native speakers generally encounter. One of the points they were trying to make is that the majority of English speakers, roughly 70%, are non-native speakers, and as such there is great demand for language tools that assist NNS. The authors created their program based on a guideline for NNS language-tool design from Knutsson et al. Below is the guideline as described by this paper:
  • Real-time feedback is always desirable, especially since it helps one improve one's understanding of the language as it is produced
  • Tools should not only indicate what is wrong, but also provide sufficient information (e.g., examples, grammar rules, etc.) so that users can reason about the error and its solution
  • The tool should be transparent with respect to its capabilities and limitations; users should understand what it can and cannot do
  • The tool should not be too technical with its terminology and should avoid linguistic terms
  • Users should be able to focus on producing content, not on low-level details such as spelling, grammar, etc. That is, the tool should not distract from their primary goal of communication
The paper goes on to describe L2 Error Detection Tools and L2 Tutoring Systems, citing much recent work in those fields. The last portion of the paper went into detail describing the functions and algorithms involved in the AwkChecker program.

Discussion:
The AwkChecker program seems like a great step forward in linguistic tools. As a native English speaker I generally don't encounter collocation errors, but having worked with non-native speakers I have seen the use for a program such as this. I think a system like AwkChecker would be a great tool to use in any text editor, something to go along with already existing tools like spelling and grammar checking.

An interface for targeted collection of common sense knowledge using a mixture model

Authors:
Robert Speer MIT CSAIL, Cambridge, MA, USA
Jayant Krishnamurthy MIT CSAIL, Cambridge, MA, USA
Catherine Havasi Brandeis University, Waltham, MA, USA
Dustin Smith MIT Media Lab, Cambridge, MA, USA
Henry Lieberman MIT Media Lab, Cambridge, MA, USA
Kenneth Arnold MIT Media Lab, Cambridge, MA, USA

Summary:
This paper discusses a common sense knowledge gathering system constructed by the authors, which players experience as a '20 Questions'-style game while it gathers information from them. Below is an example of their program running:

Figure 1. Open Mind learns facts about the concept “microwave oven” from a session of 20 Questions.

The reason for using a '20 Questions' game to collect this common sense knowledge is based on studies showing that users are not willing to freely contribute information unless they are enticed somehow, such as by entertainment. There have been previous common sense acquisition games, including Peekaboom and the ESP Game, both of which paired two users in an attempt to label images with the same description. The ESP Game focused primarily on generic labels for images, while Peekaboom focused on particular components of images. There have also been a couple of games that work by matching phrases and words that correspond to or describe each other, such as 'horse' and 'it has a hoof'. Two examples of these games are Verbosity and Common Consensus.

The model that the authors used to collect common sense knowledge was built on a concept/relation representation similar to ConceptNet's data model. With this model they could determine certain 'features' that simplify the algorithm in their 20 Questions game, where a feature is described as 'a pairing of a concept with a relation which forms a complete assertion when combined with a concept.' Through these features the authors were able to graphically show the AnalogySpace of these concepts and relations in a clustering model.
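
To make the 'feature' idea concrete, here is a small illustrative sketch: assertions are (concept, relation, concept) triples, a feature is what remains of an assertion when one concept slot is left open, and a yes/no question can be chosen by looking for a feature shared by about half of the remaining candidate concepts. The tiny knowledge base and the question-selection heuristic are my own simplifications, not the paper's mixture-model algorithm.

    # Minimal sketch of the concept/relation "feature" idea, assuming a tiny
    # hand-made knowledge base in the style of ConceptNet assertions; the
    # question-selection heuristic is illustrative, not the paper's method.
    from collections import defaultdict

    # Assertions are (concept, relation, concept) triples.
    ASSERTIONS = [
        ("microwave oven", "IsA", "appliance"),
        ("microwave oven", "AtLocation", "kitchen"),
        ("toaster", "IsA", "appliance"),
        ("toaster", "AtLocation", "kitchen"),
        ("horse", "HasA", "hoof"),
    ]

    def features_of(concept):
        """A feature pairs a relation with the other concept in an assertion,
        leaving a slot that the target concept would fill."""
        feats = set()
        for c1, rel, c2 in ASSERTIONS:
            if c1 == concept:
                feats.add((rel, "right", c2))  # e.g. (IsA, right, appliance)
            if c2 == concept:
                feats.add((rel, "left", c1))   # e.g. (AtLocation, left, toaster)
        return feats

    def pick_question(candidates):
        """Ask about the feature shared by about half the remaining candidates,
        which roughly maximizes the information gained from a yes/no answer."""
        counts = defaultdict(int)
        for concept in candidates:
            for feat in features_of(concept):
                counts[feat] += 1
        half = len(candidates) / 2
        return min(counts, key=lambda f: abs(counts[f] - half))

    print(pick_question(["microwave oven", "toaster", "horse"]))

In this toy version, asking whether the unknown concept "is an appliance" splits the candidate set efficiently, which is the same intuition behind a well-played game of 20 Questions.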

The authors also went into great detail in demonstrating equations and algorithms behind their common sense acquisition models. Below is another example of their game running:

Figure 3. Using the 20 Questions interface to develop a concept.

Later on in the paper they started to discuss some of the interface design objectives for their system. The primary goals are listed below:
  • Feedback: the authors want a system that shows what the computer is currently thinking, so that the user can see how their responses are directly affecting the computer's reasoning.
  • User Enjoyment: they want the interface to be as enjoyable as possible to keep users interested in playing.
  • Minimalism: the game shouldn't stand alone or stand out; it should be there when needed and run seamlessly with the website.
  • Effortless Acquisition: they don't want users to feel that they have to work at providing information; instead it should feel 'effortless'.
A user study was done in which the authors presented an online comparison between the current manual OpenMind interface and the newly designed 20 Questions interface. Users operated each system and afterward were asked a sample of questions to determine their enjoyment and the effectiveness of each system. The results showed that the 20 Questions system outperformed the current OpenMind one on fields such as “I would use this system again”, “I enjoyed this activity”, and “The system was adapting as I entered new information”. The 20 Questions system also took considerably less time to complete, as seen in Figure 8.

Figure 7. The mean and SEM of the survey responses, grouped by test condition.

Figure 8. The mean and SEM of the elapsed time to complete each task.

From the results of their study, the authors concluded that with this interface users will be more willing to contribute data, and that this will lead to better knowledge acquisition.

Discussion:
The overall point of this paper is that the authors designed a new interface for data acquisition to replace the current OpenMind one, and that their new system is based on the '20 Questions' game. Despite this relatively simple point, they somehow found a way to express it over 10 pages. I did like that they modeled their system after a game, because it is quite normal to expect people not to want to contribute unless they get something out of it, in this case some mild entertainment. Overall I did not find this paper interesting, but perhaps it has uses I can't foresee.