--- Log opened Fri Dec 30 00:00:19 2011
-!- blackburn [~blackburn@] has quit [Quit: Leaving.]06:02
-!- in3xes [~in3xes@] has joined #shogun15:46
-!- heiko [] has joined #shogun17:21
-!- in3xes [~in3xes@] has quit [Ping timeout: 240 seconds]17:32
-!- heiko [] has quit [Ping timeout: 240 seconds]17:39
-!- heiko [] has joined #shogun18:01
-!- heiko [] has quit [Ping timeout: 276 seconds]18:06
-!- heiko [] has joined #shogun18:45
-!- puneetgoyal [~puneetgoy@] has joined #shogun19:36
-!- heiko [] has quit [Ping timeout: 240 seconds]19:36
-!- blackburn1 [~blackburn@] has joined #shogun19:56
-!- heiko [] has joined #shogun19:57
blackburn1heiko: how do you do?19:58
-!- blackburn1 [~blackburn@] has quit [Quit: Leaving.]19:59
heikohej blackburn19:59
heikoI am fine :)19:59
heikoand you?19:59
-!- Netsplit *.net <-> *.split quits: CIA-120:08
-!- Netsplit *.net <-> *.split quits: puneetgoyal20:19
-!- heiko [] has quit [Ping timeout: 240 seconds]20:20
-!- Netsplit *.net <-> *.split quits: naywhayare20:23
-!- Netsplit *.net <-> *.split quits: @sonney2k, shogun-buildbot20:23
-!- Netsplit over, joins: puneetgoyal, CIA-1, naywhayare, @sonney2k, shogun-buildbot20:24
-!- heiko [] has joined #shogun20:36
-!- blackburn [~blackburn@] has joined #shogun20:48
puneetgoyalblackburn: hi21:32
-!- puneetgoyal [~puneetgoy@] has quit [Quit: Leaving]21:43
-!- puneetgoyal [~chatzilla@] has joined #shogun22:52
blackburnpuneetgoyal: hi23:05
-!- in3xes [~in3xes@] has joined #shogun23:06
puneetgoyalblackburn: I have calculated the tf-idf values for every word in every payload of every email.....but I am confused on the representation of them23:07
blackburnpuneetgoyal: nice23:07
blackburnwell just store it as matrix23:07
blackburnchoose e.g. 10 of them23:07
blackburnwith highest tf-idf23:07
puneetgoyalso should I consider the tf-idf value of a word for different documents, or take some average?23:08
puneetgoyalI mean a lot of documents will have same word...and each of them will have a different value...23:09
blackburnhmm yes23:09
-!- heiko [] has quit [Ping timeout: 240 seconds]23:09
blackburnI don't know what is the best way23:09
puneetgoyalhow will it work if we have different values for a same word..23:10
blackburnehm? can't see any problem23:10
puneetgoyalplease elaborate23:10
blackburnpuneetgoyal: ok consider you have calculated tf-idfs23:11
blackburnthen you can choose some of them23:12
blackburne.g. 323:12
blackburnthen feature vector for document23:12
blackburnis tf-idfs of X,Y,Z respectively23:12
blackburngot it?23:12
blackburnI am not sure it is the best way23:13
blackburnbut would work23:13
blackburnthe only heuristics is how to choose words23:14
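[Editor's note] The scheme blackburn describes above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from the channel or from shogun: the function names (tfidf_matrix, select_words, feature_vectors), the whitespace tokenizer, the plain tf * log(N/df) weighting, and the sample documents are all hypothetical. The word-selection rule used here (rank words by their highest tf-idf in any document) is just one possible heuristic, since the discussion leaves that choice open. Note that the same word keeps a different value in each document; that is expected, because each document's feature vector holds that document's own tf-idf for the chosen words.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Return (vocab, rows); rows[d][j] is the tf-idf of vocab[j] in document d."""
    # Assumption: lowercase whitespace tokenization is good enough for a sketch.
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    n_docs = len(docs)
    # Document frequency: in how many documents each word appears.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        rows.append(
            [(counts[word] / total) * math.log(n_docs / df[word]) for word in vocab]
        )
    return vocab, rows


def select_words(vocab, rows, k):
    """Pick k column indices. Assumed heuristic: highest tf-idf in any document."""
    scores = [max(row[j] for row in rows) for j in range(len(vocab))]
    # Ties are broken alphabetically so the result is deterministic.
    ranked = sorted(range(len(vocab)), key=lambda j: (-scores[j], vocab[j]))
    return ranked[:k]


def feature_vectors(rows, columns):
    """Each document's feature vector is its own tf-idf for the chosen words."""
    return [[row[j] for j in columns] for row in rows]


# Hypothetical sample data standing in for email payloads.
docs = [
    "cheap pills cheap offer",
    "meeting agenda offer",
    "meeting notes",
]
vocab, rows = tfidf_matrix(docs)
columns = select_words(vocab, rows, k=3)
chosen_words = [vocab[j] for j in columns]
vectors = feature_vectors(rows, columns)
```

Here chosen_words plays the role of X, Y, Z from the discussion, and vectors has one row per document with one column per chosen word. A word that does not occur in a document simply gets 0 in that document's vector.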
-!- in3xes [~in3xes@] has quit [Quit: Leaving]23:16
-!- puneetgoyal [~chatzilla@] has quit [Remote host closed the connection]23:17
-!- blackburn [~blackburn@] has quit [Quit: Leaving.]23:21
-!- puneetgoyal [~puneetgoy@] has joined #shogun23:21
--- Log closed Sat Dec 31 00:00:19 2011