Saturday, October 08, 2005

Folksonomies... my two pennies' worth

Timing in life is important… never have I realised it so much as in the past few months. When I started my Systems Design Project on Tagging in July, no one (at least in the IT-savvy community of Gandhinagar) had even heard of it, and now, barely three months later, it already seems like an overused word. Tagging this and tagging that, blogging this and podcasting that. Anyway, I have realised what an enormous advantage being the first player in the field can be…

To cut a long story short, I worked on information systems for the web for my Systems Design project, specifically information systems for Web 2.0, where I studied tagging-based systems, aka “folksonomies”. I analysed the social bookmarking system del.icio.us. Needless to say, since folksonomies are still in their nascent stages and have to evolve much more before they become robust, stable systems, there were loads of issues that needed to be solved. I looked into many of these issues and how they can be addressed.

Existing System:

Input

  • Free text entry for tags
  • Tags are space delimited (therefore multiple-word tags are not allowed)
  • Every user has his own way of creating compound tags (+ or / or _)
  • No synonym/acronym control
  • No singular/plural control
  • No guard against wrong spelling
  • Different kinds of tags all entered together
  • Types of tags:
      • Attribute of the media (Ex. article, blog, reference, tutorial, resource, tools)
      • Identity/Affinity of the item being tagged (Ex. apple, java, microsoft, ajax, mobilephone, xml)
      • Attribute of the item, based on emotional response (Ex. interesting, cool, funny, free, weird)
      • Action to be taken on the item (Ex. todo, toread, read_later)

Pollution in the Tag Library:


  • Ambiguous Tags (Ex. apple, filter)
  • Synonyms (Ex. film, movie, cinema)
  • Acronyms (Ex. mac, bday, frnds)
  • Singular/Plural (Ex. blog/blogs, flower/flowers, film/films)
  • Wrongly spelt (Ex. friend/freind, design/deign)
  • Compound Tags (Ex. sanfrancisco+museum, sanfranciscomuseum, united_kingdom)

Output

Browse

  • Tag clouds (all types of tags; sorted by popularity)

Search

  • By tag (only)
  • Search results sorted by time (only)
  • Compound tags are invisible to search
  • The narrower or more specific the tags get, the more the search results are repeated
  • Search results for synonymous tags are not given




These are a few of the major problems with folksonomies:

  • Folksonomies lack precision because of the variability of language.
  • Proposed tags have no hierarchy.
  • Folksonomies have a very low findability quotient. They are great for serendipity and browsing but are not suited to targeted search.
  • There is no synonym control in the system.
  • Different word forms, plural and singular, are often both present. Acronyms also create problems.

And here are some of the design solutions for them (a few are my own, and some have been proposed by others):

(I believe in Open Source... after all, I have reached this far by standing on the shoulders of giants, as Newton said... and that's what Web 2.0 is all about, anyway...)

Faceted Folksonomies:

Introducing facets to existing tagging system:

  • By Tag
  • By URL

-This will mainly be of use to the webmaster(s) administering that URL, who can see the relative popularity of their pages, and the cognitive model of their website in the users’ minds, by analysing the tags assigned to each page.

  • By User

-By searching the screen name of the user (if known)

-By Tag Matching

  • By Time

-All search results can be sorted by time, to see the most recently tagged item under a given tag
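
The four facets above can be sketched as filters over a flat list of bookmarks. This is only an illustration: the `Bookmark` record, its field names, and the `facet_search` function are assumptions made for the example, not part of del.icio.us or of the project itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bookmark:
    """Hypothetical record of one user tagging one URL."""
    user: str
    url: str
    tags: frozenset
    saved_at: int  # any sortable timestamp, e.g. Unix time


def facet_search(bookmarks, tag=None, url=None, user=None):
    """Return bookmarks matching every facet that was given, newest first."""
    hits = [
        b for b in bookmarks
        if (tag is None or tag in b.tags)
        and (url is None or b.url == url)
        and (user is None or b.user == user)
    ]
    # "By Time": the most recently tagged items come first.
    return sorted(hits, key=lambda b: b.saved_at, reverse=True)
```

Calling `facet_search(bookmarks, url=...)` gives the webmaster's view of one page, while `facet_search(bookmarks, tag=..., user=...)` combines two facets in one query.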

Countering Tag Pollution:

  • Reducing tag pollution at input level itself, by classifying tags into ‘Subject tags’ and ‘Additional tags’, where

Subject tags = Identity/Affinity of the item

Additional tags = Attribute of the media, Attribute of the item, Action to be taken on the item

  • Synonymous Tags

- If there is an exact (or close to exact) correlation between the occurrence of Tag A and the occurrence of Tag B [ P(AB) ~ 1 ], the two tags can be treated as synonyms

  • Singular/Plural Tags or tags with different forms of the same word

Tag Stemming (remove common endings from words, leaving behind an invariant root form)

  • Ambiguous Tags

Analyzing the neighbouring tags of a tag to understand its context of use

  • Misspelt Tags

Spellcheck (during Tag input)

  • Compound Tags

Making tags comma delimited instead of space delimited, thus allowing multiword tags
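
Three of the ideas above (comma-delimited input, stemming, and synonym detection by co-occurrence) are small enough to sketch in code. Everything here is an assumption made for illustration: the function names are hypothetical, the stemmer only strips a trailing "s" (a real stemmer such as Porter's handles far more), and the correlation test is read as both conditional probabilities, P(B|A) and P(A|B), being close to 1.

```python
from collections import Counter
from itertools import combinations


def parse_tags(raw):
    """Split a comma-delimited tag string, so 'san francisco' stays one tag."""
    return [t.strip().lower() for t in raw.split(",") if t.strip()]


def naive_stem(tag):
    """Very rough plural stripping; it will get some words wrong (e.g. 'news')."""
    if tag.endswith("s") and not tag.endswith("ss") and len(tag) > 3:
        return tag[:-1]
    return tag


def synonym_candidates(items, threshold=0.9):
    """Pairs of tags that almost always occur on the same items.

    items: iterable of tag collections, one per bookmarked item.
    A pair qualifies when both P(B|A) and P(A|B) reach the threshold.
    """
    tag_count = Counter()
    pair_count = Counter()
    for tags in items:
        tags = set(tags)
        tag_count.update(tags)
        pair_count.update(combinations(sorted(tags), 2))
    return [
        (a, b)
        for (a, b), both in pair_count.items()
        if both / tag_count[a] >= threshold and both / tag_count[b] >= threshold
    ]
```

For example, `parse_tags("san francisco, Museum")` keeps the two-word tag intact, and `naive_stem` maps both `blog` and `blogs` to `blog`. The pairs returned by `synonym_candidates` are only candidates; someone would still need to confirm them before merging tags.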


Tag Match:

  • Using Tags of a user as his/her profile indicator
  • In order to find another user with the same “profile” as mine, the system just matches my tags with tags of all other users of the system
  • Search results are sorted by percentage of match (along with a list of my most popular tags that matched that user's)
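
A minimal sketch of Tag Match follows. The text does not say how the percentage is calculated, so this example assumes it is the share of my distinct tags that the other user also uses; the `tag_match` name and the data layout are likewise hypothetical.

```python
from collections import Counter


def tag_match(me, others, top_n=3):
    """Rank other users by how much of my tag vocabulary they share.

    me: Counter mapping my tags to how often I used them.
    others: dict mapping user name to that user's Counter of tags.
    Returns (user, percent_match, top_shared_tags) tuples, best match first.
    """
    results = []
    for user, tags in others.items():
        shared = set(me) & set(tags)
        percent = 100.0 * len(shared) / len(me) if me else 0.0
        # My most popular tags among the shared ones, ties broken alphabetically.
        top_shared = sorted(shared, key=lambda t: (-me[t], t))[:top_n]
        results.append((user, percent, top_shared))
    return sorted(results, key=lambda r: (-r[1], r[0]))
```

Other measures (for instance, weighting each shared tag by how often both users apply it) would fit the same description; plain overlap is just the simplest to read.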

Supporting Targeted Search:

  • Synonym tags are also accounted for while searching.
  • Tags in the search results of one tag can be used to further refine the search in that domain.
  • All search results are sorted by the number of people who have tagged a URL with that particular tag; the more people who have used that tag for a page, the greater its relevance/popularity in the users’ minds.
  • If a user already has items tagged with the tag being searched, those are displayed with a tick mark, so that the user does not open already tagged links again.
  • Users can copy an item into their bookmark list directly from the search results, without bothering to open the link, if they so desire.
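
The ranking and the tick mark can be sketched as below. The `(user, url, tags)` triple layout and the `search` function are assumptions made for this example; synonym expansion is left out to keep it short.

```python
from collections import defaultdict


def search(bookmarks, tag, current_user=None):
    """Rank URLs by how many distinct people tagged them with `tag`.

    bookmarks: iterable of (user, url, tags) triples.
    Returns (url, tagger_count, already_tagged_by_current_user) tuples,
    most widely tagged first.
    """
    taggers = defaultdict(set)
    for user, url, tags in bookmarks:
        if tag in tags:
            taggers[url].add(user)
    ranked = sorted(taggers.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    # The boolean is what the interface would render as a tick mark.
    return [(url, len(users), current_user in users) for url, users in ranked]
```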

Tag Tree (representation of tags of any user as a tree):

  • Relative popularity of tags shown by variable font size
  • Different forms of the same tag collapsed into one tag
  • Degree of correlation between words represented by their closeness in space and colour of the connecting line
  • The most popular/dominant tags are the nodes around which the tree is formed
  • The clusters (if present) are shown separately, with the cluster name forming the central node
  • Tree can be formed either centered around one tag (entered by the user), or all tags
  • Also, the tree can be formed using only the related ‘Subject’ tags, or ‘All’ related tags

Inverse user-tag mapping

(given a tag, which users are using it?)

Supporting joins

(showing tags/tag cloud of raina_saboo + rashmisinha)
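
Both the inverse mapping and the join reduce to a few lines once each user's tags are stored as counts. In this sketch the function names and data layout are hypothetical, and a "join" is assumed to mean adding up the tag counts of the chosen users; showing only the tags they have in common would be another reasonable reading.

```python
from collections import Counter, defaultdict


def users_by_tag(user_tags):
    """Invert user -> tag counts into tag -> set of users."""
    index = defaultdict(set)
    for user, tags in user_tags.items():
        for tag in tags:
            index[tag].add(user)
    return dict(index)


def joined_cloud(user_tags, *users):
    """Combined tag counts for the given users, most used first."""
    total = Counter()
    for user in users:
        total.update(user_tags.get(user, {}))
    return total.most_common()
```

With made-up counts such as `{"raina_saboo": Counter(tagging=3, design=1), "rashmisinha": Counter(tagging=2, ux=4)}`, `joined_cloud(user_tags, "raina_saboo", "rashmisinha")` puts `tagging` at the top of the combined cloud with a count of 5.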
