Tuesday, November 15, 2005

Visualisations for a Tagging-based System

Visualisation of the Search Results of a Tag-based Search


This is the visualisation of the search results generated for the tag 'Folksonomy'. Each dot represents one result, and the results are sorted by the number of tags on each result. The violet tags are those already present in the user's bookmarks list. Only the 'unique tags' (if any) of each result are shown.

Also, only the first, say, 40 results are shown, as most users would only be interested in these. The remaining results can be viewed by clicking the arrows at the end of the spiral. Further, the URL represented by a dot, along with all its tags, can be seen by rolling over that dot.

I feel the beauty of this visualisation is that it gives a systems-level view of all the search results, showing where each result stands in comparison to the others.
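The spiral layout can be approximated in a few lines. This is a minimal sketch, assuming that results arrive as hypothetical `(url, tag_count)` pairs, that the most-tagged results sit nearest the centre, and that the dots follow an Archimedean spiral (r = a + bθ). `spiral_layout` and its spacing constants are illustrative, not the original prototype's.

```python
import math


def spiral_layout(results, page_size=40, a=0.0, b=4.0, step=0.6):
    """Place the top `page_size` results along an Archimedean spiral r = a + b*theta.

    `results` is a list of (url, tag_count) pairs (hypothetical shape).
    Results with more tags are placed first, i.e. nearest the centre.
    Returns (placed, remaining); placed holds (url, tag_count, x, y) tuples,
    remaining holds the results for the next page (the arrows at the end).
    """
    ranked = sorted(results, key=lambda r: r[1], reverse=True)
    page, remaining = ranked[:page_size], ranked[page_size:]
    placed = []
    for i, (url, tag_count) in enumerate(page):
        theta = i * step          # equal angular steps along the spiral
        r = a + b * theta         # radius grows linearly with the angle
        placed.append((url, tag_count, r * math.cos(theta), r * math.sin(theta)))
    return placed, remaining
```

Equal angular steps spread the outer dots further apart than the inner ones; stepping by equal arc length instead would space them evenly.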

Visualisations for a Tag Match

During my research on tagging-based systems I came up with the idea of using tags as a profile indicator of a user. This tag matching can be used to find people with interests similar to yours; you can then subscribe to their bookmarks list (e.g. on del.icio.us) or use it just for social networking.




The idea is that you match your tags with those of every other user of the system, and the results are displayed as the percentage of tags matched with each user (here each dot represents a user). The innermost ring is for users with a >90% tag match, the next ring is for users with an 80 to 90% match, and so on. The system should also be intelligent enough to analyse the distribution of the matched tags. The angle at which a dot lies indicates whether, among the matched tags, the other user's dominant tags are also dominant for you. This means that the users lying in the first quadrant (and the innermost ring) are the most relevant ones for you, since their dominant tags are also your dominant tags; as you move clockwise (even within the same ring), the match decreases, as their dominant tags are your less dominant ones.
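A minimal sketch of this placement rule, under assumptions: ring membership follows the 10% bands described above (the exact boundary convention is a choice made here), and the angle comes from a hypothetical "alignment" score, the fraction of the other user's top-k tags that are also among your top-k tags. Full alignment is drawn at the top of the first quadrant and lower alignment moves the dot clockwise. All function names and the 270° sweep are illustrative.

```python
import math


def ring_index(match_percent):
    """Ring 0 (innermost) holds matches above 90%; ring k >= 1 covers
    (90 - 10k, 100 - 10k], so ring 1 is 80-90%, ring 2 is 70-80%, etc."""
    if match_percent > 90:
        return 0
    return int((90 - match_percent) // 10) + 1


def dominance_alignment(my_counts, their_counts, k=5):
    """Fraction of the other user's top-k tags that are also in my top-k tags.
    Both arguments map tag -> number of times that user applied it."""
    def top(counts):
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        return {tag for tag, _ in ranked[:k]}

    theirs = top(their_counts)
    if not theirs:
        return 0.0
    return len(theirs & top(my_counts)) / len(theirs)


def dot_angle(alignment, sweep=270.0):
    """Angle in degrees, counter-clockwise from the positive x-axis.
    alignment 1.0 sits at 90 degrees (top of the first quadrant); lower values
    move the dot clockwise. A sweep below 360 keeps the worst matches from
    wrapping round next to the best ones."""
    return 90.0 - sweep * (1.0 - alignment)


def dot_position(match_percent, alignment, ring_width=1.0):
    """(x, y) of a user's dot: ring from the match %, angle from alignment."""
    radius = (ring_index(match_percent) + 0.5) * ring_width
    theta = math.radians(dot_angle(alignment))
    return radius * math.cos(theta), radius * math.sin(theta)
```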

Thursday, October 13, 2005

Analysis of Yahoo! Podcasts

Technological Issues:

The biggest problem with listening to podcasts is limited bandwidth (which is the case for the majority of users). The podcast is not heard as a smooth, uninterrupted recording but in chunks of 3-4 seconds. Listening to such short bursts, often in a foreign accent, makes most podcasts utterly incomprehensible and makes for an irritating user experience.

Instead, if these podcasts were buffered on the client side and delivered in chunks of, say, 1 or 5 minutes (decided by the user), it would be a much more satisfying experience. While the user listens to the first chunk, the next chunk would load simultaneously. The user would be willing to wait briefly if he gets a seamless experience afterwards. In its present state, only the few podcasts in the popular category (which are cached on Yahoo servers) offer a certain degree of seamlessness.
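The buffering scheme amounts to a one-chunk prefetch. The sketch below assumes two hypothetical callbacks, `fetch_chunk(i)` for downloading the i-th chunk (of a user-chosen length such as 1 or 5 minutes) and `play(data)` for the player; it only shows how downloading overlaps with playback, not real audio handling.

```python
from concurrent.futures import ThreadPoolExecutor


def play_buffered(fetch_chunk, num_chunks, play):
    """Play chunk i while chunk i+1 is downloaded in the background.

    fetch_chunk(i) -> data and play(data) are hypothetical stand-ins for
    the network download and the audio player.
    """
    if num_chunks <= 0:
        return
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch_chunk, 0)  # the user waits only for chunk 0
        for i in range(num_chunks):
            data = pending.result()            # block until chunk i is buffered
            if i + 1 < num_chunks:
                pending = pool.submit(fetch_chunk, i + 1)  # prefetch the next chunk
            play(data)                         # play while the next chunk loads
```

As long as downloading one chunk takes less time than playing one, every wait after the first is hidden behind playback.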


Even if a user does not listen to the podcast directly from the website and decides to download it, he or she would like to listen to at least a few minutes of the recording to decide whether to download it or not.



Usability Issues:

Rating:

Most Popular:

This category does not make it evident to the user what the basis of this popularity is, or how the results are sorted. Is popularity based on the number of subscribers, the number of downloads, or something else?

Highly Rated:

The same problem exists with this category. Firstly, the results are not sorted by rating, and secondly, almost all the podcasts in this category have a five-star rating. So a user does not know how these podcasts are sorted. The number of users who have rated a podcast (and thus contributed to its overall average rating) would probably be a good measure of the reliability of the rating, and a good yardstick for sorting the results.
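As a sketch of that yardstick (the `Podcast` class and `sort_highly_rated` are hypothetical names), results could be ordered by average rating and, among equally rated podcasts, by how many users rated them:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Podcast:
    title: str
    ratings: List[int] = field(default_factory=list)  # individual star ratings, 1-5

    @property
    def average(self) -> float:
        return sum(self.ratings) / len(self.ratings) if self.ratings else 0.0


def sort_highly_rated(podcasts):
    """Highest average first; ties broken by number of raters (more = more reliable)."""
    return sorted(podcasts, key=lambda p: (-p.average, -len(p.ratings)))
```

With this key, a five-star podcast rated by 40 listeners ranks above one given five stars by a single listener.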

Podcasts within a particular category:

Again, within a category it is not clear how the podcasts are sorted. The order does not seem to be based on ratings, number of subscribers, or any other popularity parameter.


Rating of Series:

It is not evident how the rating system for podcasts works: whether or not the rating of a series is affected by the ratings of its individual episodes, and if so, how.


Content:

Information about the content of the podcast:

All the podcasts show a few words from the introductory line to tell the user what the podcast is about. In most cases these few words are meaningless and superficial and don't convey anything about the content. Instead, tags allocated by listeners give much more relevant information about a podcast in just as many words. So the top 5 tags for a podcast would probably be much more effective in conveying its content.

The same is the case with the podcasts in the ‘New and Noteworthy’ and ‘Staff Picks’ categories. The big promotional images for the podcasts say nothing about their content and only add to the visual clutter.

Tagging of Podcasts:

Along with grouping all the tags for a series together as one cluster, it would also help to show the tags associated with each episode alongside that episode, so that users know which specific episode has the content specified by a tag.


Thus the system should allow not only series-specific but also episode-specific tags, which would be a more efficient way of tagging as far as the searchability of content is concerned.


Accessing of specific content within the Podcast:

As the number of episodes within a series increases (a few series already have more than 100 episodes), it would be convenient to allow searching for a particular tag (from the cluster of tags assigned to the entire series) within the series itself, filtering out only those episodes that have that tag.
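Filtering a long series by tag is straightforward once tags are stored per episode. This sketch assumes a hypothetical data shape in which each episode is a dict with a `tags` list; the series-level cluster is then just the union of the episode tags, with counts.

```python
from collections import Counter


def episodes_with_tag(series, tag):
    """Episodes of a series that carry `tag` (case-insensitive).
    `series` is a list of dicts like {"title": ..., "tags": [...]} (hypothetical)."""
    wanted = tag.strip().lower()
    return [ep for ep in series if wanted in {t.lower() for t in ep["tags"]}]


def series_tag_cloud(series):
    """The series-level tag cluster: every episode tag with its episode count."""
    return Counter(t.lower() for ep in series for t in ep["tags"])
```

Clicking a tag in `series_tag_cloud(series)` would then call `episodes_with_tag(series, tag)` to list just those episodes.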


Further, many podcasts are over 1-2 hours long. How to search for specific content within an episode is an issue which needs to be addressed. Probably the episodes need to be time-tagged, i.e. tagged along the timeline, with each episode divided into smaller chunks according to its content. The first few seconds of each such chunk could then be combined to form the headlines or highlights of the podcast, just like in a TV news show. The user can listen to these headlines to get a brief summary of the podcast, and then choose to listen to or download the full podcast, or specific chunks of it.
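Time-tagging could be represented as a list of tagged segments per episode. In this sketch `Segment`, `segments_for_tag` and `headline_clips` are hypothetical names, times are in seconds, and the 10-second headline length is an arbitrary default.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    start: float                      # seconds from the start of the episode
    end: float
    tags: List[str] = field(default_factory=list)


def segments_for_tag(segments, tag):
    """Jump points inside an episode for content tagged with `tag`."""
    return [s for s in segments if tag in s.tags]


def headline_clips(segments, seconds=10.0):
    """(start, end) ranges covering the first `seconds` of each segment;
    played back to back they form the 'headlines' of the episode."""
    return [(s.start, min(s.start + seconds, s.end)) for s in segments]
```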

Scalability of the System:

As more and more users start tagging podcasts, there will be tremendous pollution in the tag library, simply because tagging in its present form is a free-text-entry system where any user can give any tag. Synonyms, acronyms, misspelt tags etc. will become major hindrances to the searchability of information and will reduce the effectiveness of the system unless accounted for. This problem will be worse for systems like Yahoo! Podcasts, which allow multi-word tags, than for systems like del.icio.us (a social bookmarking tool), which only allow single-word tags.


For example, tags like ‘Tech’ and ‘Technology’ are very similar, as ‘tech’ is an acronym of ‘technology’. But when a user searches for all podcasts about ‘technology’ (in reality, all podcasts about both ‘tech’ and ‘technology’), he only gets podcasts tagged with ‘technology’ and not those tagged with ‘tech’.


Thus, the challenge is how to make the system realise that synonyms like ‘movie’ and ‘film’, acronyms like ‘technology’ and ‘tech’, and singulars and plurals like ‘computer’ and ‘computers’ are similar, so that podcasts tagged with any of them are also displayed in the search results.
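One simple approach, assuming a hand-maintained table of equivalent tags (the `EQUIVALENT_TAGS` entries below are illustrative only), is to expand the searched tag into its whole equivalence class before looking it up in a tag-to-podcast index:

```python
# Hypothetical, hand-maintained equivalence classes (synonyms, acronyms,
# singular/plural forms). Real entries would have to be curated or mined.
EQUIVALENT_TAGS = [
    {"movie", "film"},
    {"technology", "tech"},
    {"computer", "computers"},
]


def expand_query(tag, classes=EQUIVALENT_TAGS):
    """Return every tag considered equivalent to `tag` (including itself)."""
    tag = tag.lower()
    for group in classes:
        if tag in group:
            return set(group)
    return {tag}


def search(index, tag):
    """`index` maps tag -> set of podcast ids; return podcasts tagged with
    the searched tag or any of its equivalents."""
    hits = set()
    for t in expand_query(tag):
        hits |= index.get(t, set())
    return hits
```

A table like this has to be curated or mined from the data; on its own it does not scale to every synonym pair.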


Monday, October 10, 2005

my tryst with Info Viz....



Over the past few months, I have become keenly interested in Information Visualization….had a course on it recently, and got hooked on it…


The first crash assignment was on visualizing the quarterly financial reports of IDFC…I found it fascinating….but a huge struggle at the same time….mainly because my knowledge of finance was a big ZERO….understanding the jargon of the financial world, understanding that data, sieving through it, extracting the relevant info and making it comprehensible to the average person……….all of this in half a day was a task indeed!!.....Imagine visualizing….“Paid Up Equity Share Capital”…and you’ll know what we were up against!!...


Anyway, we came up with a Charles Minard-style representation of the entire flow of capital within IDFC…..right from the Capital Employed to the Profit After Tax..



Our major assignment was sorting and categorizing 1800 examples of Info Viz from books, magazines and the web…and coming up with a classification scheme of our own…After a lot of brainstorming, we came up with a scheme for analyzing any infographic along 5 axes:



Tools used

Methods used

Purpose of Representation

Type of Content

Way of Representation

The basic idea was to come up with a Designer’s Guide to Infographics…and we tried to look at any infographic as lying in an n-dimensional space, with co-ordinates along each of these axes.


Tools used

Color coding, Proximity, Typography, Size, Shape, Form, Layering, Symmetry


Methods Used

Level of Abstraction, Layering, Scaling, Choosing an axis- Temporal or Spatial

The tools and methods are generic, and go into the making of any Infographic, independent of the Content, Purpose or Way of Representation.


Purpose of Representation

Instruction- Images whose purpose is to instruct or teach, whether about the details of an object or a process.

Comparison- Images whose purpose is comparison, whether of quantitative or qualitative data.

Revelation- Images where the data is revealed through visualization, e.g. X-ray images.

Relationship- Images which visualize a (non-hierarchical) relationship between 2 or more entities.

Type of Content

Quantitative- Content which is factual and quantitative in nature

Descriptive- Content which describes an object or phenomenon

Flow/Process- Content which has a temporal element to it, i.e. it flows along time, e.g. a story, a process etc.

Locational- Content which is locational in nature, i.e. changes with geographical location

All these assignments were done as group work.

Saturday, October 08, 2005

Folksonomies....my two pennies' worth about it....

Timing in life is important….never have I realised it as much as in the past few months. When I started my Systems Design Project on Tagging in July, no one (at least in the IT-savvy community of Gandhinagar) had even heard of it, and now, barely 3 months later, it already seems like an overused word. Tagging this and tagging that, blogging this and podcasting that. Anyway, I have realised what an enormous advantage being the first player in the field can be…

To cut a long story short, I worked on information systems for the web for my Systems Design project, specifically information systems for Web 2.0, wherein I studied tagging-based systems aka “Folksonomy”. I analysed the social bookmarking system of del.icio.us and, needless to say, since folksonomies are still in their nascent stages and have to evolve much more before they can become robust, stable systems, there were loads of issues which needed to be solved. I looked into many such issues and how they can be addressed.

Existing System:

Input

  • Free text entry for tags
  • Tags are space-delimited (therefore multiple-word tags are not allowed)
  • Every user has his own way of creating compound tags (+ or / or _)
  • No synonym/acronym control
  • No singular/plural control
  • No guard against wrong spelling
  • Different kinds of tags all entered together
  • Types of Tags:
      - Attribute of the media (Ex. article, blog, reference, tutorial, resource, tools)
      - Identity/Affinity of the item being tagged (Ex. apple, java, microsoft, ajax, mobilephone, xml)
      - Attribute of the item, based on emotional response (Ex. interesting, cool, funny, free, weird)
      - Action to be taken on the item (Ex. todo, toread, read_later)

Pollution in the Tag Library:


  • Ambiguous Tags (Ex. apple, filter)
  • Synonyms (Ex. film, movie, cinema)
  • Acronyms (Ex. mac, bday, frnds)
  • Singular/Plural (Ex. blog/blogs, flower/flowers, film/films)
  • Wrongly spelt (Ex. friend/freind, design/deign)
  • Compound Tags (Ex. sanfrancisco+museum, sanfranciscomuseum, united_kingdom)

Output

  • Browse
      - Tag clouds (all types of tags; sorted by popularity)
  • Search
      - By tag (only)
      - Search results sorted by time (only)
      - Compound tags are invisible to search
      - The narrower/more specific the tags get, the more the search results are repeated
      - Search results of synonymous tags are not given




These are a few of the major problems with folksonomies:

  • Folksonomies lack precision because of the variability of language.
  • Proposed tags have no hierarchy.
  • Folksonomies have a very low findability quotient. They are great for serendipity and browsing but not suited to a targeted approach or search.
  • There is no synonym control in the system.
  • Different word forms, plural and singular, are also often both present. Acronyms also create problems.

And here are some design solutions for them (a few are my own, and some have been proposed by others):

(I believe in Open Source........after all, I have reached this far by standing on the shoulders of giants, as Newton said........and that's what Web 2.0 is all about, anyway...)

Faceted Folksonomies:

Introducing facets to the existing tagging system:

  • By Tag
  • By url

-This will mainly be of use to the webmaster(s) administering that URL, to see the relative popularity of their pages and the cognitive model of their website in the users’ minds, by analysing the tags assigned to each page

  • By User

-By searching for the screen name of the user (if known)

-By Tag Matching

  • By Time

-All search results can be sorted by time, to see the most recently tagged items under a given tag

Countering Tag Pollution:

  • Reducing tag pollution at the input level itself, by classifying tags into ‘Subject tags’ and ‘Additional tags’, where

Subject tags = Identity/Affinity of the item

Additional tags = Attribute of the media, Attribute of the item, and Action to be taken on the item

  • Synonymous Tags

- Treat Tag A and Tag B as synonyms if there is an exact (or close to exact) correlation between the occurrence of Tag A and the occurrence of Tag B, i.e. the two tags almost always appear together [P(B|A) ≈ 1 and P(A|B) ≈ 1]

  • Singular/Plural Tags or tags with different forms of the same word

Tag stemming (removing common endings from words, leaving behind an invariant root form)

  • Ambiguous Tags

Analyzing the neighbouring tags of a tag to understand its context of use

  • Misspelt Tags

Spellcheck (during Tag input)

  • Compound Tags

Making tags comma-delimited instead of space-delimited, thus allowing multi-word tags
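Several of the fixes above can be prototyped in a few lines. The sketch below covers four of them under stated assumptions: comma-delimited parsing for compound tags, a spelling suggestion using Python's standard `difflib.get_close_matches`, a crude vocabulary-aware suffix stripper for singular/plural forms (a much simpler stand-in for a real stemmer such as Porter's), and the co-occurrence test for synonyms, read here as P(B|A) ≈ 1 and P(A|B) ≈ 1. All function names, thresholds and data shapes are hypothetical; disambiguation by neighbouring tags is left out for brevity.

```python
import difflib
from collections import defaultdict
from itertools import combinations


def parse_tags(raw):
    """Compound tags: split on commas, not spaces, so multi-word tags survive."""
    tags = []
    for part in raw.split(","):
        tag = " ".join(part.split()).lower()  # trim and collapse inner whitespace
        if tag and tag not in tags:
            tags.append(tag)
    return tags


def suggest_spelling(tag, vocabulary, cutoff=0.8):
    """Misspelt tags: if `tag` is unknown, offer up to 3 close existing tags."""
    if tag in vocabulary:
        return []
    return difflib.get_close_matches(tag, sorted(vocabulary), n=3, cutoff=cutoff)


# Singular/plural: crude suffix rules, accepted only when the result is
# already an existing (lower-case) tag, e.g. 'movies' -> 'movie'.
SUFFIX_RULES = [("ies", "y"), ("es", ""), ("s", ""), ("ing", ""), ("ed", "")]


def root_form(tag, vocabulary):
    for suffix, replacement in SUFFIX_RULES:
        if tag.endswith(suffix) and len(tag) > len(suffix) + 2:
            candidate = tag[: -len(suffix)] + replacement
            if candidate in vocabulary:
                return candidate
    return tag  # no rule produced a known tag; leave it unchanged


def candidate_synonyms(item_tags, threshold=0.9, min_support=3):
    """Synonymous tags: flag (A, B) when P(B|A) and P(A|B) both reach
    `threshold`. `item_tags` maps an item (e.g. a URL) to its set of tags."""
    items_with = defaultdict(set)
    for item, tags in item_tags.items():
        for t in tags:
            items_with[t].add(item)
    pairs = []
    for a, b in combinations(sorted(items_with), 2):
        ia, ib = items_with[a], items_with[b]
        if min(len(ia), len(ib)) < min_support:
            continue  # too few items to trust the ratio
        both = len(ia & ib)
        if both / len(ia) >= threshold and both / len(ib) >= threshold:
            pairs.append((a, b))
    return pairs
```

Tags that always co-occur are not necessarily synonyms (two closely related technologies may behave the same way), so flagged pairs are better treated as suggestions for review than as automatic merges. The suffix rules also miss forms such as 'tagging', where the consonant is doubled.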


Tag Match:

  • Using the tags of a user as his/her profile indicator
  • In order to find another user with the same “profile” as mine, the system just matches my tags with the tags of all other users of the system
  • Search results are sorted according to the percentage of match (along with a list of my most popular tags which matched with that user)
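A minimal sketch of the match, assuming "percentage of match" means the share of my distinct tags that also appear in the other user's library (other weightings are possible). `tag_match` and the sample user names used with it are hypothetical.

```python
from collections import Counter


def tag_match(my_tags, users, top_n=5):
    """my_tags: Counter of my tag -> times I used it.
    users: dict of user name -> Counter of that user's tags.
    Returns (user, percent, my_top_matched_tags), best match first."""
    results = []
    for user, their_tags in users.items():
        common = set(my_tags) & set(their_tags)
        percent = 100.0 * len(common) / len(my_tags) if my_tags else 0.0
        # my most-used tags among the matched ones, shown next to the score
        top = sorted(common, key=lambda t: (-my_tags[t], t))[:top_n]
        results.append((user, percent, top))
    results.sort(key=lambda r: (-r[1], r[0]))
    return results
```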

Supporting Targeted Search:

  • Synonymous tags are also accounted for while searching.
  • Tags in the search results of one tag can be used to further refine the search in that domain.
  • All search results are sorted by the number of people who have tagged a URL with that particular tag; a greater number of such tags indicates greater relevance/popularity of that page in the users’ minds.
  • If a user already has items tagged with the searched tag, those are displayed with a tick mark, to save him from opening the already tagged links again.
  • Users can copy an item directly into their bookmark list from the search results without opening the link, if they so desire.
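The ranking, tick-mark and refinement rules above can be sketched over a flat list of bookmarks. The `(user, url, tags)` triples and the function names are assumptions made for illustration:

```python
from collections import defaultdict


def search_by_tag(bookmarks, tag, me):
    """bookmarks: iterable of (user, url, set_of_tags) triples (hypothetical shape).
    Returns (url, number_of_people, already_mine) sorted by how many distinct
    people tagged the url with `tag`; already_mine drives the tick mark."""
    taggers = defaultdict(set)  # url -> distinct users who used `tag` on it
    mine = set()                # urls the searching user already tagged with `tag`
    for user, url, tags in bookmarks:
        if tag in tags:
            taggers[url].add(user)
            if user == me:
                mine.add(url)
    ranked = sorted(taggers.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(url, len(users), url in mine) for url, users in ranked]


def refine(bookmarks, urls, extra_tag):
    """Narrow an earlier result list to urls someone also tagged with extra_tag."""
    keep = {url for _, url, tags in bookmarks if extra_tag in tags}
    return [u for u in urls if u in keep]
```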

Tag Tree (representation of tags of any user as a tree):

  • Relative popularity of tags shown by variable font size
  • Different forms of the same tag collapsed into one tag
  • Degree of correlation between tags represented by their closeness in space and the colour of the connecting line
  • The most popular/dominant tags are the nodes around which the tree is formed
  • The clusters (if present) are shown separately, with the cluster name forming the central node
  • The tree can be formed either around one tag (entered by the user) or around all tags
  • Also, the tree can be formed using only the related ‘Subject’ tags, or ‘All’ related tags

Inverse user-tag mapping

(given a tag, which users are using it?)

Supporting joins

(showing tags/tag cloud of raina_saboo + rashmisinha)
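Both features reduce to simple operations on a table of each user's tag counts. The sketch below uses hypothetical names and placeholder users: it builds the inverse mapping and sums two users' clouds for a join.

```python
from collections import Counter, defaultdict


def users_by_tag(user_tags):
    """Inverse user-tag mapping: tag -> set of users who have used it.
    user_tags maps user -> Counter(tag -> number of uses)."""
    inverse = defaultdict(set)
    for user, tags in user_tags.items():
        for tag in tags:
            inverse[tag].add(user)
    return inverse


def joined_cloud(user_tags, *users):
    """Join: a combined tag cloud with counts summed across the given users."""
    total = Counter()
    for u in users:
        total.update(user_tags.get(u, Counter()))
    return total
```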

long overdue....

Never really wanted to start writing a blog.....definitely not one for public view....had this firm notion that regular blog writers start seeing every moment in their lives through the 'Blog Lens'...which I still believe in....their lives become an endless series of documentations...which somehow makes those moments less special.....something akin to what the digital camera has done.....photos are just not special anymore......I mean, when you know every moment can be (and frequently is) captured, what's the big deal....and in this entire process of 'photographing' a moment...the real charm of just experiencing it.....somewhere gets lost....

Then, during one of these arguments about the pros and cons of blogging with a novice blog writer, he said something which made a lot of sense....the use of blogs not for documenting your life (which no one is interested in reading about anyway, except you), but for documenting your ideas....I mean........in this world spinning at a maddeningly high speed....and being a part of the designer community (where an IDEA can really change your life...:) ).... it suddenly dawned upon me that timely documentation of ideas is crucial....

So this blog is basically a proof of my ideas and work.....an attempt to establish "been there, done that".........besides my occasional grumbling and musings about life in general....and anything arbitrary enough to hold my attention in particular....

I'm not expecting anyone to read this blog as such.....if you accidentally stumble upon it...great....if you don't, then it was not meant for that purpose anyway...