Timing in life is important... never have I realised it as much as in the past few months. When I started my Systems Design Project on Tagging in July, no one (at least in the IT-savvy community of Gandhinagar) had even heard of it, and now, barely 3 months later, it already seems like an overused word. Tagging this and tagging that, blogging this and podcasting that. Anyway, I have realised what an enormous advantage being the first player in the field can be...
To cut a long story short, I worked on information systems for the web for my Systems Design project, specifically information systems for Web 2.0, where I studied tagging-based systems, aka “Folksonomy”. I analysed the social bookmarking system of del.icio.us. Since folksonomies are still in their nascent stages and have to evolve much more before they can become robust, stable systems, there were loads of issues which needed to be solved. I looked into many of these issues and how they can be addressed.
Existing System:
Input
- Free text entry for tags
  - Tags are space-delimited (therefore multi-word tags are not allowed)
  - Every user has their own way of creating compound tags (+ or / or _)
  - No synonym/acronym control
  - No singular/plural control
  - No guard against wrong spelling
  - Different kinds of tags are all entered together
- Types of Tags:
  - Attribute of the media (Ex. article, blog, reference, tutorial, resource, tools)
  - Identity/Affinity of the item being tagged (Ex. apple, java, microsoft, ajax, mobilephone, xml)
  - Attribute of the item, based on emotional response (Ex. interesting, cool, funny, free, weird)
  - Action to be taken on the item (Ex. todo, toread, read_later)
Pollution in the Tag Library:
- Ambiguous Tags (Ex. apple, filter)
- Synonyms (Ex. film, movie, cinema)
- Acronyms (Ex. mac, bday, frnds)
- Singular/Plural (Ex. blog/blogs, flower/flowers, film/films)
- Wrongly spelt (Ex. friend/freind, design/deign)
- Compound Tags (Ex. sanfrancisco+museum, sanfranciscomuseum, united_kingdom)
Output
- Folksonomies lack precision because of the variability of language.
- Proposed tags have no hierarchy.
- Folksonomies have a very low findability quotient. They are great for serendipity and browsing but not aimed at a targeted approach or search.
- There is no synonym control in the system.
- Different word forms, plural and singular, are often both present. Acronyms also create problems.
(I believe in Open Source... after all, I have reached this far by standing on the shoulders of giants, as Newton said... and that's what Web 2.0 is all about, anyway...)
- By Tag
- By URL
  - This will mainly be of use to the webmaster(s) administering that URL, to see the relative popularity of their pages and the cognitive model of their website in the users’ minds, by analysing the tags assigned to each page.
- By User
  - By searching the screen name of the user (if known)
  - By Tag Matching
- By Time
  - All search results can be sorted by time, to see the most recently tagged item under a given tag
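The search modes above (by tag, by URL, by user, sorted by time) can be sketched as a single filter over bookmark records. This is only a minimal illustration: the `Bookmark` record, the `search` function and the sample data are hypothetical names made up for this sketch, not part of del.icio.us or of the project itself.

```python
from collections import namedtuple

# Hypothetical record: who tagged which URL, with which tags, and when.
Bookmark = namedtuple("Bookmark", ["user", "url", "tags", "time"])


def search(bookmarks, tag=None, url=None, user=None):
    """Return the bookmarks matching every given filter, most recent first."""
    hits = [
        b for b in bookmarks
        if (tag is None or tag in b.tags)
        and (url is None or b.url == url)
        and (user is None or b.user == user)
    ]
    # Sorting by time (descending) shows the most recently tagged item first.
    return sorted(hits, key=lambda b: b.time, reverse=True)


# Made-up sample data; `time` is just an increasing integer here.
bookmarks = [
    Bookmark("alice", "http://example.com/a", {"ajax", "tutorial"}, 1),
    Bookmark("bob", "http://example.com/a", {"ajax", "toread"}, 3),
    Bookmark("alice", "http://example.com/b", {"xml"}, 2),
]

recent_ajax = search(bookmarks, tag="ajax")
```

Searching by URL with the same function gives a webmaster every tag set assigned to one page, which is the raw material for the popularity analysis mentioned above.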
Countering Tag Pollution:
- Reducing tag pollution at the input level itself, by classifying tags into ‘Subject tags’ and ‘Additional tags’, where
  - Subject tags = Identity/Affinity of the item
  - Additional tags = Attribute of the media, Attribute of the item, Action to be taken on the item
- Synonymous Tags
  - If there is an exact (or close to exact) correlation between the occurrence of Tag A and the occurrence of Tag B [ P(AB) ~ 1 ], the two tags can be treated as synonyms
- Singular/Plural Tags or tags with different forms of the same word
  - Tag Stemming (remove common endings from words, leaving behind an invariant root form)
- Ambiguous Tags
  - Analysing the neighbouring tags of a tag to understand its context of use
- Misspelt Tags
  - Spellcheck (during Tag input)
- Compound Tags
  - Making tags comma-delimited instead of space-delimited, thus allowing multi-word tags
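Three of the counter-measures above can be sketched in a few lines: comma-delimited input, stemming, and spotting synonym candidates from co-occurrence. Everything here is an assumption made for illustration. The function names are hypothetical, the stemmer only strips a trailing "s" (a real system would use a proper stemming algorithm), and "correlation" is approximated as the overlap between the sets of URLs carrying each tag, which is just one possible reading of P(AB) ~ 1.

```python
from collections import defaultdict
from itertools import combinations


def parse_tags(raw):
    """Split comma-delimited input, so multi-word tags survive intact."""
    return [t.strip().lower() for t in raw.split(",") if t.strip()]


def stem(tag):
    """Very naive stemmer (illustration only): drop a trailing plural 's'."""
    if tag.endswith("s") and not tag.endswith("ss") and len(tag) > 3:
        return tag[:-1]
    return tag


def synonym_candidates(url_tags, threshold=0.9):
    """Return tag pairs that appear on (almost) exactly the same URLs.

    url_tags maps a URL to the set of tags applied to it by any user.
    The overlap measure (shared URLs / all URLs of either tag) is an assumption.
    """
    urls_by_tag = defaultdict(set)
    for url, tags in url_tags.items():
        for tag in tags:
            urls_by_tag[tag].add(url)

    pairs = []
    for a, b in combinations(sorted(urls_by_tag), 2):
        shared = urls_by_tag[a] & urls_by_tag[b]
        either = urls_by_tag[a] | urls_by_tag[b]
        if len(shared) / len(either) >= threshold:
            pairs.append((a, b))
    return pairs


# Made-up sample data.
url_tags = {
    "u1": {"film", "movie", "review"},
    "u2": {"film", "movie"},
    "u3": {"review", "book"},
}
```

With this sample data only "film" and "movie" always appear on the same URLs, so they are the only pair reported as synonym candidates.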
Tag Match:
- Using a user's tags as his/her profile indicator
- To find another user with the same “profile” as mine, the system simply matches my tags against the tags of all other users of the system
- Search results are sorted by the percentage of match (along with a list of my most popular tags which matched with that user)
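A rough sketch of the Tag Match idea follows. The post does not define how the percentage of match is computed, so this assumes it is the share of my distinct tags that the other user also uses; `tag_match` and the sample users are hypothetical.

```python
from collections import Counter


def tag_match(my_tags, other_users):
    """Rank other users by how many of my tags they share.

    my_tags: Counter mapping each of my tags to how often I used it.
    other_users: dict mapping a user name to the set of tags that user uses.
    Returns (user, percent_match, matched_tags) tuples, best match first;
    matched_tags are ordered by how popular they are in my own tagging.
    """
    results = []
    for user, tags in other_users.items():
        shared = set(my_tags) & set(tags)
        # Assumption: percentage = shared tags / my distinct tags.
        percent = 100.0 * len(shared) / len(my_tags) if my_tags else 0.0
        matched = sorted(shared, key=lambda t: (-my_tags[t], t))
        results.append((user, percent, matched))
    return sorted(results, key=lambda r: (-r[1], r[0]))


# Made-up sample data.
me = Counter({"ajax": 5, "xml": 2, "java": 2, "design": 1})
others = {
    "bob": {"ajax", "xml", "cooking"},
    "carol": {"design"},
}
matches = tag_match(me, others)
```

Here bob shares two of my four tags and carol shares one, so bob is listed first.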
Supporting Targeted Search:
- Synonym tags are also accounted for while searching.
- Tags in the search results of one tag can be used to further refine the search in that domain.
- All search results are sorted by the number of people who have tagged a URL with that particular tag; the more people who have used that keyword, the greater the relevance/popularity of that page in the users’ minds.
- If a user already has items tagged with the searched tag, those are displayed with a tick mark, to save them from opening already-tagged links again.
- Users can copy an item into their bookmark list directly from the search results, without bothering to open the link, if they so desire.
- Relative popularity of tags shown by variable font size
- Different forms of the same tag collapsed into 1 tag
- Degree of correlation between words represented by their closeness in space and colour of the connecting line
- The most popular/dominant tags are the nodes around which the tree is formed
- The clusters (if present) are shown separately, with the cluster name forming the central node
- Tree can be formed either centered around 1 tag (entered by the user), or all tags
- Also, the tree can be formed using only the related ‘Subject’ tags, or ‘All’ related tags
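Two points from the list above lend themselves to a short sketch: ranking search results by how many people used a tag on a URL, and showing relative tag popularity through font size. Both functions are hypothetical illustrations; the linear scaling and the pixel range are assumptions, not something the project specifies.

```python
def rank_by_taggers(bookmarks, tag):
    """Rank URLs by the number of distinct users who applied `tag` to them.

    bookmarks: iterable of (user, url, tags) tuples.
    """
    users_by_url = {}
    for user, url, tags in bookmarks:
        if tag in tags:
            users_by_url.setdefault(url, set()).add(user)
    ranked = [(url, len(users)) for url, users in users_by_url.items()]
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))


def font_sizes(tag_counts, min_px=10, max_px=32):
    """Scale each tag's usage count linearly onto a font size in pixels."""
    if not tag_counts:
        return {}
    lowest, highest = min(tag_counts.values()), max(tag_counts.values())
    span = (highest - lowest) or 1  # avoid dividing by zero when all counts are equal
    return {
        tag: round(min_px + (count - lowest) * (max_px - min_px) / span)
        for tag, count in tag_counts.items()
    }


# Made-up sample data.
sample = [
    ("alice", "http://example.com/a", {"ajax"}),
    ("bob", "http://example.com/a", {"ajax", "toread"}),
    ("carol", "http://example.com/b", {"ajax"}),
]
ranking = rank_by_taggers(sample, "ajax")
sizes = font_sizes({"ajax": 10, "java": 4, "xml": 1})
```

The most used tag gets the largest font and the least used the smallest, with the rest spread in between.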
Inverse user-tag mapping
(given a tag, which users are using it?)
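As a small sketch, the inverse mapping can be built by flipping a user-to-tags dictionary into a tag-to-users one. The function name and the sample users are made up for illustration.

```python
from collections import defaultdict


def users_by_tag(tags_by_user):
    """Invert a {user: set of tags} mapping into {tag: set of users}."""
    inverse = defaultdict(set)
    for user, tags in tags_by_user.items():
        for tag in tags:
            inverse[tag].add(user)
    return dict(inverse)


# Made-up sample data.
tags_by_user = {
    "alice": {"ajax", "xml"},
    "bob": {"ajax"},
}
inverse = users_by_tag(tags_by_user)
```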
Supporting joins
(showing tags/tag cloud of raina_saboo + rashmisinha)
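One possible reading of a join is to merge the tag counts of two or more users into a single combined tag cloud. That interpretation is an assumption (a join could equally mean only the tags the users have in common), and the counts below are invented sample data, not the real tags of these users.

```python
from collections import Counter


def join_tag_clouds(*clouds):
    """Add up the tag counts of several users into one combined cloud."""
    combined = Counter()
    for cloud in clouds:
        combined.update(cloud)  # Counter.update adds counts rather than replacing them
    return combined


# Invented counts, for illustration only.
raina_saboo = Counter({"design": 3, "ajax": 1})
rashmisinha = Counter({"design": 2, "ux": 4})
joined = join_tag_clouds(raina_saboo, rashmisinha)
```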