--- Log opened Sun Dec 25 00:00:19 2011
00:49 -!- blackburn1 [~blackburn@188.168.4.33] has quit [Quit: Leaving.]
11:31 -!- blackburn [~blackburn@188.168.4.177] has joined #shogun
12:51 -!- blackburn [~blackburn@188.168.4.177] has quit [Ping timeout: 240 seconds]
12:52 -!- blackburn [~blackburn@83.234.54.14] has joined #shogun
13:21 -!- blackburn [~blackburn@83.234.54.14] has quit [Ping timeout: 252 seconds]
13:22 -!- blackburn [~blackburn@188.168.5.99] has joined #shogun
14:00 -!- blackburn [~blackburn@188.168.5.99] has quit [Ping timeout: 240 seconds]
15:00 -!- blackburn [~blackburn@188.168.4.157] has joined #shogun
15:43 -!- puneetgoyal [~puneetgoy@117.203.127.5] has joined #shogun
15:50 -!- puneetgoyal [~puneetgoy@117.203.127.5] has quit [Ping timeout: 248 seconds]
16:37 -!- puneetgoyal [~puneetgoy@117.203.127.5] has joined #shogun
17:05 -!- puneetgoyal [~puneetgoy@117.203.127.5] has quit [Read error: Connection reset by peer]
18:33 -!- puneetgoyal [~puneetgoy@117.203.127.5] has joined #shogun
19:10 -!- blackburn [~blackburn@188.168.4.157] has quit [Quit: Leaving.]
19:41 -!- blackburn [~blackburn@188.168.4.204] has joined #shogun
20:33 -!- puneetgoyal [~puneetgoy@117.203.127.5] has quit [Read error: No route to host]
20:33 -!- puneetgoyal [~puneetgoy@117.203.127.5] has joined #shogun
21:09 < CIA-1> shogun: Soeren Sonnenburg master * r4688b54 / examples/undocumented/java_modular/check.sh : require 1GB java heap space - http://git.io/Xo9XAA
21:09 < blackburn> heh you online
21:13 < blackburn> sonney2k: hello!:)
21:51 < CIA-1> shogun: Soeren Sonnenburg master * rfd27f86 / examples/undocumented/python_modular/serialization_matrix_modular.py : add missing import os - http://git.io/unSFKg
22:35 < shogun-buildbot> build #256 of java_modular is complete: Success [build successful]  Build details are at http://www.shogun-toolbox.org/buildbot/builders/java_modular/builds/256
22:40 < puneetgoyal> blackburn: u there?
22:42 < blackburn> puneetgoyal: yes
22:42 < puneetgoyal> blackburn: Have you checked the results of the file I sent you ?
22:42 < blackburn> puneetgoyal: sry not very carefully
22:42 < blackburn> how to use it properly?
22:43 < blackburn> I'll check it now
22:43 < puneetgoyal> ok, I will mail you some details
22:44 < blackburn> you probably should not use absolute paths..
22:44 < blackburn> like indir='/home/puneet/shogun/test'
22:44 < blackburn> puneetgoyal: what is stopwords?
22:45 < puneetgoyal> blackburn: stopwords are those which we dont need to take in into account while categorizing them
22:45 < blackburn> hmm
22:45 < puneetgoyal> as spam or ham
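The stopword approach puneetgoyal describes can be sketched as a simple filter that drops very common words before a mail is categorized. This is a minimal illustration, not the code from the file he sent: the tiny STOPWORDS set and the tokenize/content_words helpers are hypothetical, and real stopword lists are much longer.

```python
import re

# A tiny, hypothetical stopword list; real lists contain hundreds of words.
STOPWORDS = {"a", "an", "and", "the", "is", "to", "of", "in", "for", "you"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def content_words(text, stopwords=STOPWORDS):
    """Drop stopwords so only potentially informative words remain."""
    return [w for w in tokenize(text) if w not in stopwords]


print(content_words("Win a FREE prize for the lucky winner!"))
```

A fixed list like this has to be maintained by hand, which is one reason a data-driven weighting such as TF-IDF is often preferred.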
22:45 < blackburn> you probably better to use TF-IDF for this..
22:47 < puneetgoyal> ok
22:50 < blackburn> puneetgoyal: but anyways, what exactly I'm supposed to check?
22:50 < puneetgoyal> I just need to know..if I am going on the right path....and what to do next
22:51 < blackburn> puneetgoyal: hmm okay it is useful experience anyway
22:51 < blackburn> I would suggest you to write simple TF-IDF
22:51 < blackburn> and use it as features
22:51 < blackburn> for classify
22:51 < blackburn> do you know what TF-IDF is?
22:52 < puneetgoyal> I havent read much about it
22:52 < blackburn> it is pretty easy
22:52 < blackburn> puneetgoyal: thresholding tf-idf can be used just like 'stopwords' concept
22:53 < blackburn> i.e. common words will have ~0 tf-idf
22:54 < puneetgoyal> 0 means it has no contribution in calculating the probability of the mail being a spam or a ham?
22:55 < blackburn> puneetgoyal: yes
22:56 < puneetgoyal> ok...and more the no. that word occurred...more will be its value of tf-idf
22:56 < puneetgoyal> if it is not a stop word
22:57 < blackburn> puneetgoyal: idf(term) = log of (total word counts) / (number of documents having term)
22:57 < blackburn> yes, then thresholding it say 0.1 or so
22:57 < blackburn> you will get most valuable words
22:57 < blackburn> and then you can just form feature vectors
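A minimal TF-IDF sketch in plain Python, following the steps blackburn outlines: weight each word, threshold the weights, keep the valuable words. One correction to the formula as typed above: the standard IDF puts the total number of documents (not the total word count) in the numerator, i.e. idf(t) = log(N / df(t)), where df(t) is the number of documents containing t. The sketch uses that form, length-normalized term frequency, and 0.1 as the example threshold; the function names and toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute TF-IDF weights for a list of tokenized documents.

    tf(t, d) = count of t in d / number of tokens in d
    idf(t)   = log(N / df(t)), N = number of documents,
               df(t) = number of documents containing t
    Returns one {term: weight} dict per document.
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    idf = {t: math.log(n_docs / df_t) for t, df_t in df.items()}

    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({t: (c / total) * idf[t] for t, c in counts.items()})
    return weights


def valuable_words(weights, threshold=0.1):
    """Keep terms whose TF-IDF exceeds the threshold in at least one document."""
    return sorted({t for w in weights for t, v in w.items() if v > threshold})


docs = [
    ["free", "prize", "the", "winner"],
    ["meeting", "the", "agenda", "notes"],
    ["free", "offer", "the", "prize"],
]
w = tf_idf(docs)
print(w[0]["the"])  # 0.0: "the" occurs in every document
print(valuable_words(w))
```

Because "the" appears in all three toy documents, its IDF is log(3/3) = 0, which is exactly the "common words will have ~0 tf-idf" behaviour mentioned above; the threshold then removes it the way a stopword list would.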
22:58 < puneetgoyal> I was stucked here
22:58 < blackburn> why?
22:59 < puneetgoyal> Actually what method I was using..I found the words which were valuable....but was not able to find out where should procede further
22:59 < puneetgoyal> feature vector of what I should make?
22:59 < blackburn> puneetgoyal: okay if you have computed tf-idfs
23:00 < blackburn> you will get some 'rates'
23:00 < blackburn> for words X,Y,Z,...
23:00 < puneetgoyal> yup
23:00 < blackburn> then for document 1 you will get (X rate for doc 1, Y rate for doc 1, Z rate for doc 1, ...)
23:00 < blackburn> same way for other docs
23:01 < blackburn> then you can use really any classifier
23:01 < blackburn> cause you will get euclidean representation for your texts
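The step from per-word rates to feature vectors can be sketched as follows, assuming the TF-IDF weights are already available as one {word: rate} dict per document; the rates for X, Y and Z below are made-up illustration values, and the helper names are hypothetical. Fixing the vocabulary order gives every document a vector of the same length, so Euclidean distances (and therefore most standard classifiers) apply.

```python
import math


def to_vectors(weights, vocabulary):
    """Turn per-document {term: weight} dicts into fixed-length vectors.

    Position i of every vector holds the weight of vocabulary[i];
    terms missing from a document get 0.0.
    """
    return [[w.get(term, 0.0) for term in vocabulary] for w in weights]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical TF-IDF rates for words X, Y, Z in three documents.
weights = [
    {"X": 0.3, "Y": 0.1},
    {"Y": 0.2, "Z": 0.4},
    {"X": 0.3, "Z": 0.4},
]
vocabulary = ["X", "Y", "Z"]
vectors = to_vectors(weights, vocabulary)
print(vectors)  # [[0.3, 0.1, 0.0], [0.0, 0.2, 0.4], [0.3, 0.0, 0.4]]
print(euclidean(vectors[0], vectors[1]))
```

The resulting list of vectors, paired with spam/ham labels, is the kind of dense feature matrix a classifier expects as training input.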
23:01 < puneetgoyal> hmm..ok
23:02 < blackburn> isn't it clear for you yet? we've got to make it really clear :)
23:02 < puneetgoyal> actually I am not much clear with the concept of classification...
23:03 < blackburn> puneetgoyal: hmm ok
23:03 < puneetgoyal> Even after running some examples...I got the training data...I got all the results which you were asking us to get...but didnt get to know how it was being classified
23:04 < blackburn> puneetgoyal: you could check some lectures probably..
23:04 < blackburn> what exactly don't you understand?
23:05 < puneetgoyal> hmm...I guess I will need to look for more examples..
23:06 < blackburn> puneetgoyal: we have really bad examples
23:06 < blackburn> that's the thing you can help us with
23:07 < blackburn> in fact our examples is just tests :)
23:08 < puneetgoyal> hmmm
23:10 < blackburn> puneetgoyal: okay I'll write you a snippet
23:11 < puneetgoyal> blackburn: oh...no need to do that if you re busy...I will just get back to you with some solid example..where I could tell you what my real problem is
23:11 < blackburn> puneetgoyal: not busy now :)
23:16 < puneetgoyal> blackburn: gr8!....I got an example though...
23:38 < blackburn> puneetgoyal: I sent a little example
23:39 < blackburn> puneetgoyal: there are two figures: one for train data - two gaussian blobs
23:39 < blackburn> then we add new points and predict it
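The idea of that example (train on two Gaussian blobs, then predict labels for new points) can be sketched without any library. This is not blackburn's snippet and not a Shogun API: it swaps in a simple nearest-centroid classifier, and the blob centres, spread, random seed and new points are assumptions chosen for illustration.

```python
import random


def make_blob(rng, center, n, sigma=0.5):
    """Sample n 2-D points from an isotropic Gaussian around center."""
    cx, cy = center
    return [(rng.gauss(cx, sigma), rng.gauss(cy, sigma)) for _ in range(n)]


def fit_centroids(points, labels):
    """Training step: return {label: mean point} for each class."""
    sums = {}
    for (x, y), lab in zip(points, labels):
        sx, sy, n = sums.get(lab, (0.0, 0.0, 0))
        sums[lab] = (sx + x, sy + y, n + 1)
    return {lab: (sx / n, sy / n) for lab, (sx, sy, n) in sums.items()}


def predict(centroids, point):
    """Prediction step: label of the closest centroid (squared Euclidean distance)."""
    px, py = point
    return min(
        centroids,
        key=lambda lab: (centroids[lab][0] - px) ** 2 + (centroids[lab][1] - py) ** 2,
    )


rng = random.Random(0)  # fixed seed so the toy data is reproducible
train = make_blob(rng, (-2.0, 0.0), 50) + make_blob(rng, (2.0, 0.0), 50)
labels = [-1] * 50 + [+1] * 50
centroids = fit_centroids(train, labels)

new_points = [(-1.5, 0.3), (2.2, -0.4), (1.0, 1.0)]
print([predict(centroids, p) for p in new_points])
```

Nearest centroid is about the simplest rule that separates two such blobs; with TF-IDF vectors as input the same fit/predict pattern applies, and the rule could be replaced by an SVM or any other classifier.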
23:39 < puneetgoyal> ok
23:39 < blackburn> you will see how it work a little
23:40 < blackburn> ok sleep time now :)
23:40 < blackburn> puneetgoyal: see you
23:40 < puneetgoyal> blackburn: ok..thanks a lot.good nite :)
23:41 -!- blackburn [~blackburn@188.168.4.204] has quit [Quit: Leaving.]
23:41 -!- puneetgoyal [~puneetgoy@117.203.127.5] has quit [Quit: Leaving]
--- Log closed Mon Dec 26 00:00:19 2011