blackburn1heiko: how do you do?19:58
heikohej blackburn19:59
heikoI am fine :)19:59
heikoand you?19:59
puneetgoyalblackburn: hi21:32
blackburnpuneetgoyal: hi23:05
puneetgoyalblackburn: I have calculated the tf-idf values for every word in every payload of every email... but I am confused about how to represent them23:07
blackburnpuneetgoyal: nice23:07
blackburnwell, just store them as a matrix23:07
blackburnchoose e.g. 10 of them23:07
blackburnwith highest tf-idf23:07
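(Note: a minimal sketch of the "store it as matrix" idea, assuming one row per document and one column per word, so the same word simply gets a different value in each row. The function name `tfidf_matrix`, the whitespace tokenizer, the plain log(N/df) weighting, and the toy emails are all assumptions for illustration, not something stated in the conversation.)

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Return (vocab, matrix); matrix[i][j] is the tf-idf of vocab[j] in docs[i]."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    n_docs = len(docs)
    # Document frequency: in how many documents each word occurs.
    df = Counter(w for toks in tokenized for w in set(toks))
    matrix = []
    for toks in tokenized:
        counts = Counter(toks)
        row = []
        for w in vocab:
            tf = counts[w] / len(toks) if toks else 0.0  # relative term frequency
            idf = math.log(n_docs / df[w])               # unsmoothed idf (assumption)
            row.append(tf * idf)
        matrix.append(row)
    return vocab, matrix


# Hypothetical toy "emails".
docs = ["cheap pills cheap offer", "meeting agenda offer", "meeting notes"]
vocab, matrix = tfidf_matrix(docs)

# The same word ("offer") has a different value in each document's row.
offer = vocab.index("offer")
print([round(row[offer], 3) for row in matrix])
```

(Words absent from a document get 0.0 in that row, which is why a shared column per word is enough and no averaging is needed at this stage.)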
puneetgoyalso should I consider the tf-idf value of a word for different documents, or take some average?23:08
puneetgoyalI mean a lot of documents will have the same word... and each of them will have a different value for it...23:09
blackburnhmm yes23:09
blackburnI don't know what the best way is23:09
puneetgoyalhow will it work if we have different values for the same word..23:10
blackburnehm? can't see any problem23:10
puneetgoyalplease elaborate23:10
blackburnpuneetgoyal: ok, consider you have calculated the tf-idfs23:11
blackburnthen you can choose some of them23:12
blackburne.g. three words X, Y, Z23:12
blackburnthen the feature vector for a document23:12
blackburnis its tf-idfs of X, Y, Z respectively23:12
blackburngot it?23:12
blackburnI am not sure it is the best way23:13
blackburnbut would work23:13
blackburnthe only heuristic is how to choose the words23:14
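(Note: a rough sketch of the last step, building per-document feature vectors from a few chosen words. The conversation leaves the word-choice heuristic open; ranking words by their mean tf-idf across documents is only one assumed option here, and `select_words`, `feature_vectors`, and the small hard-coded matrix are hypothetical names and data for illustration.)

```python
def select_words(vocab, matrix, k):
    """Return column indices of the k words with the highest mean tf-idf.

    The mean-over-documents ranking is an assumed heuristic, not the only one.
    """
    n_docs = len(matrix)
    mean_score = [sum(row[j] for row in matrix) / n_docs for j in range(len(vocab))]
    ranked = sorted(range(len(vocab)), key=lambda j: mean_score[j], reverse=True)
    return sorted(ranked[:k])  # keep the original column order


def feature_vectors(matrix, columns):
    """Keep only the chosen columns; each row is one document's feature vector."""
    return [[row[j] for j in columns] for row in matrix]


# Hypothetical tf-idf values: rows are documents, columns follow vocab.
vocab = ["agenda", "cheap", "meeting", "offer"]
matrix = [
    [0.0, 0.6, 0.0, 0.1],
    [0.4, 0.0, 0.1, 0.1],
    [0.0, 0.0, 0.2, 0.0],
]

cols = select_words(vocab, matrix, k=3)
selected = [vocab[j] for j in cols]
features = feature_vectors(matrix, cols)

print(selected)
for vec in features:
    print(vec)
```

(Every document gets a vector of the same length, with each position always meaning the same word, so differing values for one word across documents are exactly what the classifier sees.)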
