--- Log opened Sun May 26 00:00:18 2013
-!- hushell [~hushell@8-92.ptpg.oregonstate.edu] has joined #shogun00:05
-!- HeikoS [~heiko@176.248.212.176] has joined #shogun00:27
-!- mode/#shogun [+o HeikoS] by ChanServ00:27
@HeikoSsonney2k: can we do multiple inheritance in shogun?00:32
@HeikoSI think I remember no, but I forgot00:32
@HeikoSwhats the best way to do a thing similar to java's interface (which is inheriting a couple of pure virtual methods)00:32
@HeikoSlisitsyn: ^00:33
@lisitsynHeikoS: no MI :)00:34
@HeikoSlisitsyn: how else?00:34
@HeikoSlisitsyn: also, I am thinking of adding a general ComputationTask class00:34
@HeikoSwhich can be registered in another class00:35
@HeikoSwhich then handles computation of all those00:35
@HeikoSand different implementations may do it differently (multicore, mpi, etc)00:35
@lisitsynHeikoS: hmm00:35
@lisitsynwhen did you come to that idea?00:35
@lisitsynit reproduces something that is in my mind too00:36
@HeikoSlisitsyn: I need this for log-det project00:36
@HeikoSbut would be better to have this in general00:37
@HeikoSevery task implementation should have code how to solve it00:37
@HeikoSand the std implementation of CComputationPool00:37
@HeikoSjust does everything sequentially00:37
@HeikoSthen we program against that interface00:37
@HeikoSand people might come up with more fancy things00:37
@HeikoSwithout changing algorithm code00:38
@lisitsynHeikoS: I see00:38
@HeikoSlisitsyn:  so lets think about this a bit00:38
@lisitsynbut why do you need interfaces there?00:38
@HeikoSlisitsyn: nevermind about this00:38
@HeikoSlets talk about the computation00:39
@HeikoS :)00:39
@lisitsynHeikoS: hah ok00:39
@lisitsynHeikoS: what is computation pool?00:39
@HeikoSok00:39
@HeikoSso CComputationPool00:40
@HeikoSis an abstract base where one can register tasks00:40
@HeikoSand one can call solve_all(), which gives a list of CComputationTaskResult instances00:40
@HeikoSregister(CComputationTask task)00:40
@lisitsynwhat is real instances of computation pool?00:40
@HeikoSone example:00:40
@HeikoSCSequentialComputationPool00:41
@HeikoSsolve_all just loops over all Tasks and solves them00:41
@HeikoSeach task knows how it gets solved00:41
@HeikoSanother example:00:41
@lisitsynokay sequential parallel what else?00:41
@HeikoSMPI00:41
@HeikoSgroup like structures00:41
@lisitsynohh hah00:41
@HeikoSproblem dependent00:41
@HeikoSthere are only few generic ones00:41
@HeikoSmost of them will be problem specific00:41
@HeikoSstill only one interface from main algorithm00:42
@HeikoSCIndependentParallelComputationPool00:42
@HeikoSI think of using external libraries for more structured stuff00:43
@HeikoSbut for now, I am just interested in the interface00:43
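The interface described so far (register tasks, then call solve_all() to get a list of results) can be sketched roughly as below. This is a minimal sketch, not Shogun code: only the names CComputationPool, CSequentialComputationPool, CComputationTask, CComputationTaskResult and solve_all come from the chat. The method name register_task (register is a C++ keyword), the result's value field, CSquareTask and demo_sequential_pool are assumptions made up for illustration.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical result type; the chat only gives the class name.
struct CComputationTaskResult
{
    double value;
};

// Each task knows how it gets solved (signature is an assumption).
class CComputationTask
{
public:
    virtual ~CComputationTask() = default;
    virtual CComputationTaskResult solve() const = 0;
};

// Abstract base where tasks are registered and later solved together.
class CComputationPool
{
public:
    virtual ~CComputationPool() = default;

    // "register" is a C++ keyword, so a different name is used here.
    void register_task(std::unique_ptr<CComputationTask> task)
    {
        m_tasks.push_back(std::move(task));
    }

    virtual std::vector<CComputationTaskResult> solve_all() = 0;

protected:
    std::vector<std::unique_ptr<CComputationTask>> m_tasks;
};

// Standard implementation: loop over all tasks and solve them in order.
class CSequentialComputationPool : public CComputationPool
{
public:
    std::vector<CComputationTaskResult> solve_all() override
    {
        std::vector<CComputationTaskResult> results;
        results.reserve(m_tasks.size());
        for (const auto& task : m_tasks)
            results.push_back(task->solve());
        return results;
    }
};

// Toy task for illustration only: squares a number.
class CSquareTask : public CComputationTask
{
public:
    explicit CSquareTask(double x) : m_x(x) {}
    CComputationTaskResult solve() const override { return {m_x * m_x}; }

private:
    double m_x;
};

// Registers three toy tasks and returns the sum of their results.
double demo_sequential_pool()
{
    CSequentialComputationPool pool;
    for (double x : {1.0, 2.0, 3.0})
        pool.register_task(std::make_unique<CSquareTask>(x));

    double sum = 0.0;
    for (const auto& result : pool.solve_all())
        sum += result.value;
    return sum;
}
```

Algorithm code would only see CComputationPool, so a multicore or MPI variant could be swapped in without touching it, which is the point made in the chat.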
@lisitsynwhat libraries?00:43
@HeikoSsomething to schedule for example00:43
@HeikoSbut doesnt matter now00:43
@HeikoSyou could imagine that one class uses graphlab for example00:43
@HeikoSif there is a lot of structure00:44
@HeikoSbut even multicore might be nice00:44
@lisitsynbut it looks like doing a task in multicore manner is a more frequent case00:44
@lisitsynthere is a strong reason to do tasks multicore00:45
@HeikoSyes00:45
@HeikoSdefinitely00:45
@HeikoSone class could for example do grid-search in a multicore way00:45
@lisitsynwhen you do totally different things your context is switching like crazy00:45
@HeikoSbut bad example since grid-search is already implemented, and not in terms of tasks that one registers00:45
@lisitsynHeikoS: I'd rather call it Queue btw00:46
@HeikoSbut new things could be written in terms of tasks that one first registers, and then solves00:46
@lisitsynPool is a different pattern00:46
@HeikoSlisitsyn: it is not a queue00:46
@HeikoSbut agreed on pool is not good00:46
@lisitsynit is not a pool too ;)00:46
@HeikoSSet? :)00:46
@lisitsynpool is a set of prepared objects00:46
@lisitsynset is so neutral that it doesn't tell anything00:47
@HeikoSOrganizer?00:47
@lisitsynengine may be00:47
@HeikoSEngine is good! :)00:47
@lisitsynwell it is engine in graphlab00:47
@lisitsyn:D00:47
@lisitsynI've seen they have some fancy algorithms00:48
@lisitsynfor philosophers thing00:48
@HeikoSindeed00:48
@HeikoSthis is not what I want to do00:48
@lisitsynHeikoS: why do you need it btw?00:49
@HeikoSlog-det estimates *have* to be parallelized00:49
@HeikoScan do up to factor few hundred00:49
@lisitsynHeikoS: did you consider opencling it too btw?00:49
@HeikoSlisitsyn: I dont want to actually do this for now, but rather prepare it00:50
@HeikoSits an experiment00:50
@HeikoSother way would be to say:00:50
@HeikoSah nevermind00:50
@HeikoSso I want to try it00:50
@HeikoSeven computing 1 estimate can be parallelized massively00:50
@lisitsynHeikoS: I like idea of formulating *all* operations as jobs/tasks00:50
@HeikoSusually one needs a few hundred of them00:51
@HeikoSlisitsyn: yes, thats the experiment, if we can make this work, things might be easier to parallelize00:51
@HeikoSwhich they should00:51
@lisitsynI mean if we call train00:51
@HeikoSso many loops of independent things in our code00:51
@lisitsynwe should just enqueue some operation00:51
@HeikoSexactly00:51
@HeikoSalso this would separate the code structure from the actual computation a bit more00:52
@lisitsynHeikoS: as for pools - I hope we will get to them too00:52
@lisitsynwould be cool to have a thing  that manages memory00:53
@HeikoSindeed00:54
@HeikoSlets experiment with those!00:54
@lisitsynHeikoS:  I personally have difficulties with experimenting in shogun00:54
@lisitsynit is big and I have superstitions :D00:55
@HeikoSlisitsyn: the best point to do this is when the framework is extended00:55
@HeikoSwhich the log-det project does00:56
@HeikoSquite a few classes are necessary for this00:56
@HeikoSI wouldnt do it for GP for example00:56
@HeikoSthere is already too much in single-thred logic00:56
@HeikoSthread00:56
@lisitsynHeikoS: I would not go for generic design of that actually00:57
@lisitsynso lets just gradually do that specifically for your task00:57
@HeikoShow do you mean that?00:57
@HeikoSyes thats my plan00:57
@lisitsynand then generalize when we see a generalization point00:57
@lisitsynHeikoS: I failed too many times with generic design :D00:58
@HeikoShaha :)00:58
@HeikoSI will send you the class diagram once lambday and I have worked this out01:00
@HeikoShe is a smart guy and probably can help a lot there...01:00
@HeikoSit makes no sense to do this stuff single-threaded btw01:00
@HeikoSlisitsyn: and we should have at least a general framework for multicore stuff with a unified interface01:02
@HeikoSsince so many tasks are like that01:03
@HeikoSI  mean independent loops01:03
@lisitsynHeikoS: yes true01:04
@lisitsynjust avoid trying to do that general right now01:04
@HeikoSwell, a little bit at least :)01:05
@HeikoSgeneral enough to have multiple forms for the log-det stuff01:05
@lisitsynHeikoS: it would be possible to design a general thing if we had experience01:05
@lisitsynotherwise we have to do that evolutionary01:06
@lisitsynHeikoS: I can design multiagent systems now - but soooo many mistakes have been fixed01:07
@HeikoSi see01:08
@lisitsynso is that thing I am sure01:09
@lisitsyn:)01:09
@HeikoSone should never start coding too early :)01:09
@HeikoSwant to spend some time planning this01:09
@lisitsynwe are just too inexperienced to foresee that01:09
@lisitsynnahh that fails too01:09
@lisitsynHeikoS: it depends on the experience again01:10
@lisitsynin this case I'd rather plan something not really detailed then code it01:10
@lisitsynthen see everything is wrong01:10
@lisitsynand refactor01:10
@lisitsynthen guess what :D01:10
@lisitsynHeikoS: should a task have a separate object to store data? how to store dependencies? what are types of dependencies?01:12
@HeikoSlisitsyn: no dependencies01:12
@HeikoSas I said, this is not my goal01:12
@HeikoSindependent loops01:12
@lisitsynHeikoS: yeah I mean there are a lot of questions01:12
@HeikoSdata is stored within task, or reference01:13
@lisitsynand not all of them are answer-able design-time01:13
@HeikoSthis depends on the implementation01:13
@lisitsynHeikoS: I see a lot of possibilities there anyway01:13
@HeikoSlisitsyn: yes01:13
@HeikoSlisitsyn: I mean, I just want to have something for the log-dets01:14
@lisitsynmost of them are usually unforeseen so be ready to refactor and refactor ;)01:14
@HeikoSI have coded up all of this in Matlab, both seq and par, so I know what happens, but maybe you are right and I should not be so general01:15
@lisitsynHeikoS: no I just warn you and lambday to not strive for generality from the very beginning01:15
@HeikoSlisitsyn: again, he is not meant to implement parallel things01:16
@HeikoSjust write the sequential one against an interface that might be able to handle this01:16
@lisitsynI see01:16
@HeikoSso and the interface I wanted to have btw is01:16
@HeikoSthat a class can inherit a set of methods01:17
@HeikoSthat are: register stuff, solve subproblem, etc01:17
@HeikoSso whats a good way to simulate interfaces?01:17
@HeikoSjava doesnt have MI, thats why they have interfaces, but how do we do this?01:17
@lisitsynHeikoS: we are tied to no MI so forget about java :)01:17
@lisitsynI don't know01:18
@HeikoSno way?01:18
@lisitsynit is problem dependent01:18
@HeikoSby hand01:18
@lisitsynMI is totally troublesome01:18
@lisitsynHeikoS: you mean they form a hierarchy of classes to share some methods01:18
@lisitsynbut all of them implement Task01:19
@lisitsyn?01:19
@HeikoSyes for example01:19
@lisitsynHeikoS: well I see no problem putting Task to the very top of that hierarchy01:20
@lisitsynso all depends..01:21
@HeikoSlisitsyn:  Ill show you the class diagram :)01:23
@HeikoSwhen its more or less done01:23
@lisitsynHeikoS: interfacing is java idiom so may be it just requires to change a point of view01:23
@lisitsynwe will see01:23
@lisitsynHeikoS: alright will try to schlafen :)01:26
@HeikoSgood night lisitsyn! :)01:26
@lisitsynHeikoS: good night01:27
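One way to read lisitsyn's suggestion of putting Task at the very top of the hierarchy: with no multiple inheritance, the "interface" becomes a base class with only pure virtual methods, and classes that want to share code sit between it and the concrete tasks. The sketch below assumes this reading. All names (CTask, CScaledTask, CSumTask, CProductTask, run, demo_hierarchy) are hypothetical and not part of Shogun.

```cpp
#include <cassert>
#include <string>

// Interface-like base: pure virtual methods only, no data members.
class CTask
{
public:
    virtual ~CTask() = default;
    virtual double solve() const = 0;
    virtual std::string name() const = 0;
};

// Intermediate class: shares a helper among related tasks.
// It is still abstract because it does not implement solve() or name().
class CScaledTask : public CTask
{
public:
    explicit CScaledTask(double factor) : m_factor(factor) {}

protected:
    double scale(double x) const { return m_factor * x; }

private:
    double m_factor;
};

class CSumTask : public CScaledTask
{
public:
    CSumTask(double factor, double a, double b)
        : CScaledTask(factor), m_a(a), m_b(b) {}
    double solve() const override { return scale(m_a + m_b); }
    std::string name() const override { return "sum"; }

private:
    double m_a, m_b;
};

class CProductTask : public CScaledTask
{
public:
    CProductTask(double factor, double a, double b)
        : CScaledTask(factor), m_a(a), m_b(b) {}
    double solve() const override { return scale(m_a * m_b); }
    std::string name() const override { return "product"; }

private:
    double m_a, m_b;
};

// Caller code is written against the CTask interface only.
double run(const CTask& task) { return task.solve(); }

double demo_hierarchy()
{
    CSumTask sum(2.0, 1.0, 3.0);         // 2 * (1 + 3)
    CProductTask product(2.0, 1.0, 3.0); // 2 * (1 * 3)
    return run(sum) + run(product);
}
```

The limitation discussed in the chat still holds: a class that already derives from some other base cannot also pick up CTask this way, so whether the pattern fits is problem dependent.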
-!- foulwall [~foulwall@2001:da8:215:6901:93a:5fb3:ab52:7a68] has joined #shogun02:03
-!- HeikoS [~heiko@176.248.212.176] has quit [Quit: Leaving.]03:06
-!- nube is now known as out04:12
-!- out is now known as nube04:12
shogun-buildbotbuild #407 of nightly_default is complete: Failure [failed test]  Build details are at http://www.shogun-toolbox.org/buildbot/builders/nightly_default/builds/40704:17
-!- gsomix [~gsomix@83.149.21.63] has joined #shogun05:02
gsomixgood morning05:02
foulwallgsomix: morning05:02
-!- gsomix [~gsomix@83.149.21.63] has quit [Ping timeout: 264 seconds]06:41
-!- nube [~rho@49.244.28.55] has quit [Ping timeout: 256 seconds]07:36
-!- foulwall [~foulwall@2001:da8:215:6901:93a:5fb3:ab52:7a68] has quit [Remote host closed the connection]07:40
-!- nube [~rho@49.244.116.16] has joined #shogun07:51
-!- flxb_ [~flxb@master.ml.tu-berlin.de] has joined #shogun07:53
-!- flxb [~flxb@master.ml.tu-berlin.de] has quit [Write error: Broken pipe]07:54
@sonney2kmorning...08:33
-!- sijin [~smuxi@144.214.222.109] has quit [Read error: Connection reset by peer]08:57
@sonney2kpickle27, any insights?08:59
-!- iglesiasg [d58f32ac@gateway/web/freenode/ip.213.143.50.172] has joined #shogun09:27
-!- mode/#shogun [+o iglesiasg] by ChanServ09:28
-!- sijin [~smuxi@144.214.222.109] has joined #shogun09:40
-!- hushell [~hushell@8-92.ptpg.oregonstate.edu] has quit [Ping timeout: 264 seconds]10:05
-!- foulwall [~foulwall@2001:da8:215:503:d9a2:88ea:88e3:5e47] has joined #shogun10:15
-!- hushell [~hushell@c-67-189-100-116.hsd1.or.comcast.net] has joined #shogun10:23
-!- votjakovr [~votjakovr@host-46-241-3-209.bbcustomer.zsttk.net] has joined #shogun10:30
-!- foulwall [~foulwall@2001:da8:215:503:d9a2:88ea:88e3:5e47] has quit [Remote host closed the connection]10:49
-!- votjakovr [~votjakovr@host-46-241-3-209.bbcustomer.zsttk.net] has quit [Quit: Leaving]11:18
-!- iglesiasg [d58f32ac@gateway/web/freenode/ip.213.143.50.172] has quit [Ping timeout: 250 seconds]12:43
-!- vgorbati [5f8777f7@gateway/web/freenode/ip.95.135.119.247] has joined #shogun13:16
-!- van51 [~van51@athedsl-320452.home.otenet.gr] has joined #shogun13:18
-!- foulwall_ [~foulwall@2001:da8:215:503:746c:70bc:a9be:cac0] has joined #shogun13:30
-!- lisitsyn [~blackburn@109-226-74-97.clients.tlt.100megabit.ru] has quit [Ping timeout: 246 seconds]13:31
-!- vgorbati_ [5f85daa8@gateway/web/freenode/ip.95.133.218.168] has joined #shogun14:01
-!- vgorbati [5f8777f7@gateway/web/freenode/ip.95.135.119.247] has quit [Ping timeout: 250 seconds]14:03
-!- vgorbati_ is now known as vgorbati14:05
-!- zxtx [~zv@ool-457e751d.dyn.optonline.net] has quit [Ping timeout: 246 seconds]14:08
-!- zxtx [~zv@ool-457e751d.dyn.optonline.net] has joined #shogun14:10
-!- vgorbati [5f85daa8@gateway/web/freenode/ip.95.133.218.168] has quit [Ping timeout: 250 seconds]14:12
-!- vgorbati [5f85daa8@gateway/web/freenode/ip.95.133.218.168] has joined #shogun14:24
-!- gsomix [~gsomix@188.168.2.227] has joined #shogun14:34
gsomixhi14:34
gsomixsonney2k, sent PR.14:35
-!- van51 [~van51@athedsl-320452.home.otenet.gr] has left #shogun ["JOIN #shogun"]14:38
gsomixsonney2k, I hope it's readable now. :)14:38
-!- van51 [~van51@athedsl-320452.home.otenet.gr] has joined #shogun14:38
-!- vgorbati [5f85daa8@gateway/web/freenode/ip.95.133.218.168] has quit [Ping timeout: 250 seconds]14:50
-!- foulwall_ [~foulwall@2001:da8:215:503:746c:70bc:a9be:cac0] has quit [Remote host closed the connection]15:27
-!- foulwall [~foulwall@2001:da8:215:c252:4b2:f64d:b662:b135] has joined #shogun16:50
gsomixcu later, guys16:52
-!- sanyam [uid10602@gateway/web/irccloud.com/x-myercfhnlmkikdyu] has quit [Ping timeout: 252 seconds]17:39
-!- foulwall [~foulwall@2001:da8:215:c252:4b2:f64d:b662:b135] has quit [Ping timeout: 240 seconds]17:45
-!- nube [~rho@49.244.116.16] has quit [Ping timeout: 264 seconds]18:23
-!- nube [~rho@49.126.16.146] has joined #shogun18:26
-!- nube [~rho@49.126.16.146] has quit [Ping timeout: 256 seconds]18:54
-!- nube [~rho@49.244.8.172] has joined #shogun19:17
-!- gsomix [~gsomix@188.168.2.227] has quit [Ping timeout: 245 seconds]19:25
-!- gsomix [~gsomix@188.168.2.227] has joined #shogun19:26
-!- van51 [~van51@athedsl-320452.home.otenet.gr] has left #shogun ["PING 1369589370"]19:29
-!- sanyam [uid10602@gateway/web/irccloud.com/x-nsunvhvtlukipqcp] has joined #shogun19:36
-!- katia_ [5f43c1c3@gateway/web/freenode/ip.95.67.193.195] has joined #shogun19:48
-!- deerishi [c649b206@gateway/web/freenode/ip.198.73.178.6] has joined #shogun19:56
-!- vgorbati [5f6ff438@gateway/web/freenode/ip.95.111.244.56] has joined #shogun20:00
pickle27sonney2k: sorry haven't had a chance to work on that yet20:10
-!- deerishi [c649b206@gateway/web/freenode/ip.198.73.178.6] has quit [Ping timeout: 250 seconds]20:18
-!- katia_ [5f43c1c3@gateway/web/freenode/ip.95.67.193.195] has quit [Ping timeout: 250 seconds]20:22
-!- katia_ [5f43c1c3@gateway/web/freenode/ip.95.67.193.195] has joined #shogun20:38
gsomixgood evening20:46
-!- vgorbati [5f6ff438@gateway/web/freenode/ip.95.111.244.56] has quit [Ping timeout: 250 seconds]21:21
-!- vgorbati [5f6ff438@gateway/web/freenode/ip.95.111.244.56] has joined #shogun21:30
pickle27sonney2k: valgrind didn't complain about qda21:32
pickle27sonney2k: paste is here http://pastebin.com/xc3SERUR21:34
-!- vgorbati [5f6ff438@gateway/web/freenode/ip.95.111.244.56] has quit [Ping timeout: 250 seconds]21:35
@sonney2kpickle27, well yeah it is no memory leak but something else21:42
@sonney2kpickle27, how about you pickle.dump all the input that the function gets when you run tester.py21:43
@sonney2kand then load that to reproduce/debug the issue21:43
* sonney2k off21:43
pickle27sonney2k: I thought valgrind might complain because I thought it might be a result that is bigger than its return allocation if that makes sense21:44
pickle27sonney2k: okay21:44
pickle27sonney2k: the function doesn't get any input from tester.py it just runs the example22:05
pickle27sonney2k: at least thats what it looks like to me22:05
gsomixnite22:07
pickle27sonney2k: if I run the modular example on my own it runs fine22:07
pickle27sonney2k: I'll try the same data in the c++ example22:07
@sonney2kpickle27, no22:15
@sonney2kpickle27, did you pickle dump?22:15
pickle27sonney2k: I was just looking through to see what tester actually did22:15
pickle27sonney2k: doesn't it just run classifier_qda_modular.py?22:16
@sonney2kpickle27, yes but did you dump the data it gets?22:16
pickle27what do you main it doesn't get data the data is loaded in the example itself22:17
pickle27*mean22:17
@sonney2kso did you dump it or not?22:18
pickle27theres no need to dump it, its the data/fm_train_real data22:18
@sonney2kok then let me do it22:18
@sonney2kpickle27, alright so the reason is that m_store_covs is True in one test22:22
@sonney2kpickle27, so just put a true as last argument and you can reproduce the crash in the example22:23
pickle27sonney2k: I thought that might be the problem but it still runs for me when I do that22:23
pickle27sonney2k: ahh got the bug now22:24
@sonney2kpickle27, parameter_list = [[traindat, testdat, label_traindat, 1e-4, True], \22:24
@sonney2kthen it will crash22:24
pickle27sonney2k: and I see sort of whats happening with the tester22:24
pickle27okay I'll work on fixing this now22:25
@sonney2kthanks22:26
pickle27sonney2k: got it now, I just switched to ozansener's covar calc instead22:39
pickle27theres a lot of room for better use of Eigen3 in QDA but it'll work for now22:40
@sonney2kpickle27, heh feel free to do it - ohh and benchmarks welcome too!22:46
pickle27sonney2k: yeah I'd like to give it a try in the next bit!22:48
-!- katia_ [5f43c1c3@gateway/web/freenode/ip.95.67.193.195] has quit [Quit: Page closed]22:49
pickle27sonney2k: the test runs now but the result is different in 2 places (unsure why, slight numerical differences?) should I make a PR with the fix now and continue investigating?22:53
@sonney2kgsomix, yes readable finally :-)22:54
-!- HeikoS [~heiko@176.248.212.176] has joined #shogun23:02
-!- mode/#shogun [+o HeikoS] by ChanServ23:02
@sonney2kHeikoS, hey there!23:03
@HeikoSsonney2k: hi!23:03
@HeikoShow is it going?23:04
@sonney2ktomorrow is the day students will be notified23:04
@HeikoSI know23:06
@HeikoSsonney2k btw, discussing something with lambday23:08
@HeikoSwhich is basically a class CIndependentComputationEngine23:08
@HeikoSwhich can take instances of a CIndependentComputationTask23:08
@HeikoSand run all of them in parallel23:09
@HeikoSor sequentially23:09
@HeikoSor whatever23:09
@HeikoSwe need this for the log-det stuff23:09
@sonney2kHeikoS, ohh I think sergey had some thoughts on that too23:09
@HeikoSand maybe it might be worth to think about generalising it for other things23:09
@HeikoSyes we already discussed23:09
@sonney2kand wiking would need this for his bagging machine and you for your xval stuff23:09
@HeikoSso algorithms just produce a set of tasks instead of doing computations23:10
@HeikoSthose are given to the computation class23:10
@HeikoSit returns results23:10
@HeikoSresults are being passed to algorithm which aggregate23:10
@HeikoSbut only for independent/trivially parallelizable stuff23:10
@HeikoSotherwise it will be too complicated23:10
@HeikoSbut this way, many things might benefit23:10
@HeikoSgrid-search for example23:11
@sonney2kHeikoS, I am not sure how exactly this would work23:11
@HeikoSwe could have one class which does things in a multicore way23:11
@sonney2kyeah multi core / multiple machines23:11
@sonney2kmachines == computers23:11
@HeikoSyes23:11
@HeikoSand future implementations might be coded against this23:11
@HeikoSsergey had some doubts however23:12
@HeikoSand he is right, its not easy to do this in general23:12
@sonney2khow would it work in case of say bagging?23:12
@sonney2khow do you tell which stuff is to be transferred to the remote machine and which not?23:13
@HeikoSso the way I would do the abstraction is this23:13
@sonney2kI currently can see this work with threads and just a couple of parameters23:13
@HeikoSone has a class for independent tasks23:14
@HeikoSwhich has abstract method solve23:14
@sonney2k(beware already here - you have to set obj->parallel->set_num_threads(1) then)23:14
@HeikoSThe task itself knows everything it needs to know23:14
@sonney2kbetter compute()23:14
@HeikoSthen one can just call compute/solve23:14
@HeikoSand the implementation of the task does everything and returns an instance of an abstract base for result23:14
@HeikoSso then your algorithm just produces a set of those tasks23:15
@HeikoSthese may share data for now (as long as its not modified)23:15
@HeikoSbut the point is that they hold a complete representation of the subproblem23:15
@HeikoSyou pass them to computation engine class23:15
@HeikoSbasic case: sequential: just a loop over all task.compute()23:16
@HeikoSreturns a set with result instances23:16
@HeikoSone passes them to the algorithm which knows how to aggregate the results if it has produced the tasks23:16
@HeikoSmulticore implementation would run things at once23:16
@HeikoSfor this, one needs to clone stuff which is modified23:17
@HeikoSread-only things can stay in shared memory23:17
@HeikoSdistributed implementation might serialize objects and send them to computer23:17
@HeikoSs23:17
@HeikoSsince we are only considering independent stuff, we dont have scheduling problems23:18
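The scheme outlined above (tasks hold a complete description of their subproblem, read-only data stays shared, the engine decides how compute() gets called) might look like the following. This is a hedged sketch under several assumptions: the class names CIndependentComputationEngine and CIndependentComputationTask and the method name compute come from the chat, while compute_all, CSequentialEngine, CMulticoreEngine, CScaledSumTask, the Data alias and the plain double result (instead of an abstract result base) are invented here. std::async stands in for whatever threading Shogun would actually use.

```cpp
#include <cassert>
#include <future>
#include <memory>
#include <numeric>
#include <utility>
#include <vector>

using Data = std::vector<double>;
class CIndependentComputationTask;
using TaskList = std::vector<std::unique_ptr<CIndependentComputationTask>>;

// A task holds everything it needs and must not depend on other tasks.
class CIndependentComputationTask
{
public:
    virtual ~CIndependentComputationTask() = default;
    virtual double compute() const = 0; // simplified result type
};

// The engine only decides how compute() is called on each task.
class CIndependentComputationEngine
{
public:
    virtual ~CIndependentComputationEngine() = default;
    virtual std::vector<double> compute_all(const TaskList& tasks) const = 0;
};

// Basic case: a loop over all task.compute() calls.
class CSequentialEngine : public CIndependentComputationEngine
{
public:
    std::vector<double> compute_all(const TaskList& tasks) const override
    {
        std::vector<double> results;
        for (const auto& task : tasks)
            results.push_back(task->compute());
        return results;
    }
};

// Multicore case: one asynchronous call per task, results kept in task order.
class CMulticoreEngine : public CIndependentComputationEngine
{
public:
    std::vector<double> compute_all(const TaskList& tasks) const override
    {
        std::vector<std::future<double>> futures;
        for (const auto& task : tasks)
        {
            const CIndependentComputationTask* raw = task.get();
            futures.push_back(std::async(std::launch::async,
                                         [raw] { return raw->compute(); }));
        }
        std::vector<double> results;
        for (auto& future : futures)
            results.push_back(future.get());
        return results;
    }
};

// Toy task: the data is shared and read-only, the scale is per-task state.
class CScaledSumTask : public CIndependentComputationTask
{
public:
    CScaledSumTask(std::shared_ptr<const Data> data, double scale)
        : m_data(std::move(data)), m_scale(scale) {}
    double compute() const override
    {
        return m_scale * std::accumulate(m_data->begin(), m_data->end(), 0.0);
    }

private:
    std::shared_ptr<const Data> m_data; // no copy of the data set per task
    double m_scale;
};

// Runs the same tasks through both engines and compares the results.
bool demo_engines_agree()
{
    auto data = std::make_shared<const Data>(Data{1.0, 2.0, 3.0});
    TaskList tasks;
    for (double scale : {1.0, 2.0, 3.0})
        tasks.push_back(std::make_unique<CScaledSumTask>(data, scale));

    const std::vector<double> seq = CSequentialEngine().compute_all(tasks);
    const std::vector<double> par = CMulticoreEngine().compute_all(tasks);
    return seq == par && seq == std::vector<double>{6.0, 12.0, 18.0};
}
```

Because compute() is const and the shared data is held as pointer-to-const, nothing is modified concurrently; a task that needs to modify data would have to own a clone, as noted above.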
@sonney2kHeikoS, so to get it right the task creates all required objects that are passed to the compute engine23:18
@HeikoSyes23:18
@HeikoScomputation engine just calls compute() method in some way23:18
@sonney2khow can one do that efficiently? I mean you don't want to create 10 copies of a data set in memory?23:18
@HeikoSsonney2k, indeed23:18
@HeikoSthe thing is: if data is modified, there is no way around that23:19
@sonney2kso you pass references only23:19
@HeikoSanyway23:19
@sonney2kand they are copied if needed23:19
@HeikoSexactly23:19
@HeikoSso multicore works on the same objects23:19
@sonney2kyes for single machines23:19
@HeikoSfor multiple machines, data needs to be transfered anyway23:19
@HeikoSno way around that23:19
@sonney2kbut for clusters we would just serialize23:20
@HeikoSyes23:20
@HeikoSto a byte stream or so23:20
@sonney2kthere is the issue with crashing parts23:20
@HeikoSwhat do you mean with that?23:20
@sonney2k(I get the picture and it should be OK)23:20
@sonney2ksay a cluster node crashes23:21
@HeikoSI see23:21
@sonney2khow do you fail over23:21
@sonney2kor a thread cannot be created etc23:21
@HeikoSwell thats all to be handled by the computation engine implementation23:21
@HeikoSso we can do this later23:21
@HeikoSno problem, just try, if it doesnt work, try another machine23:21
@sonney2kwe have to somehow be able to 'resume' or to restart failed stuff or, as $BIGCOMPANY does it, start say 30% more jobs23:22
@sonney2kto anticipate failures23:22
@HeikoSsonney2k, I wouldnt do that23:22
@HeikoSrather make tasks smaller23:22
@HeikoSsubtasks23:22
@HeikoSan algorithm can even produce a set of different tasks23:22
@HeikoSas long as it knows to aggregate the results23:22
@HeikoSresuming is very difficult23:23
@HeikoS(I think at least)23:23
@HeikoSsonney2k: so I dont want to get involved in too much technical distributed programming, but rather start thinking about a framework that could be extended to this23:23
@HeikoSfor now, just multicore23:24
@HeikoSbut formulate algorithm in terms of that task-based framework23:24
@HeikoSfor independent stuff23:24
@HeikoSso quite simple actually23:24
@sonney2kHeikoS, I see, but IIRC you have a cluster @work?23:25
@HeikoSyes, can do23:25
@sonney2kqsub based stuff?23:25
@HeikoSyes23:25
@HeikoSso I have in mind to create a computation engine that submits qsub jobs23:25
@HeikoSat some point23:25
@sonney2kso IMHO it would be worth to do that23:26
@sonney2kqsub and (just ssh based)23:26
@HeikoSyes definitely, we have many independent subproblems in shogun23:26
@sonney2kdo your nodes share a common file system?23:26
@HeikoSI am currently running a thing on 100 nodes, thats quite a big speedup factor.23:26
@HeikoSyes23:26
@HeikoSso one could indeed serialize23:27
@HeikoSto a file23:27
@sonney2kyes data to one big file and then load only the modified parameters from different files23:27
@sonney2kI recall very much the limits we hit with a shared file system23:28
@HeikoSthis can even be handled by the tasks - give a filename for the main data, and just store the parameters in local variables23:28
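For the shared-file-system case, the idea of passing only a file name for the main data plus a few per-task parameters could be sketched as a small text record that is written by the submitting side and read back on a node. Everything here is an assumption for illustration: STaskDescription, its fields, the line-based format, and the path "shared/train.dat" are hypothetical and do not reflect Shogun's own serialization framework.

```cpp
#include <cassert>
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical task description: big data by file name, parameters by value.
struct STaskDescription
{
    std::string data_file; // path on the shared file system
    double param_c;        // hypothetical per-task parameter
    int fold;              // hypothetical per-task parameter
};

// Writes one task as two text lines: file name, then the parameters.
void write_task(std::ostream& out, const STaskDescription& task)
{
    // max_digits10 lets the double survive the text round trip exactly.
    out << task.data_file << '\n'
        << std::setprecision(std::numeric_limits<double>::max_digits10)
        << task.param_c << ' ' << task.fold << '\n';
}

// Reads one task back; returns false if the stream is malformed or empty.
bool read_task(std::istream& in, STaskDescription& task)
{
    if (!std::getline(in, task.data_file))
        return false;
    if (!(in >> task.param_c >> task.fold))
        return false;
    in.ignore(); // skip the newline that ends the parameter line
    return true;
}

// Sends a task through an in-memory stream and checks it arrives unchanged.
bool demo_round_trip()
{
    const STaskDescription sent{"shared/train.dat", 0.1, 3};
    std::stringstream wire; // stands in for a file or a socket
    write_task(wire, sent);

    STaskDescription received{};
    return read_task(wire, received)
        && received.data_file == sent.data_file
        && received.param_c == sent.param_c
        && received.fold == sent.fold;
}
```

The data set itself never enters the record, so the per-task payload stays small; whether many nodes reading the same file scales is the separate concern raised in the following lines.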
@HeikoSsonney2k, well one usually doesnt get ones hands on more than a few hundred nodes23:28
@sonney2kback then I used bittorrent to cache data on all cluster nodes - all copying would otherwise have taken a week23:29
@HeikoSsonney2k: haha :) when was that?23:29
@sonney2khmmhh 2007 or 8?23:29
@HeikoSsonney2k: I mean these are all details on how the tasks are implemented, but it all works under the interface23:29
@sonney2klooong time ago23:29
@HeikoSso if one invests a lot of brainpower, one gets good results23:30
@HeikoSif one does not, it might not scale23:30
@sonney2kyes23:30
@HeikoSpoint now would be more the general structure23:30
@sonney2kthe standard map-reduce scheme would work as well with this23:30
@HeikoStrue23:31
@sonney2kissue is still no loops possible23:31
@HeikoSI would rather go for this independent task based stuff23:31
@HeikoSmore intuitive23:31
@HeikoSalso, shogun is not really parallel based23:31
@HeikoSI mean its focus is not on this23:31
@HeikoSbut these independent things are just so easy and so useful23:32
@HeikoSthat  we could focus on just them23:32
@sonney2ktrue23:32
@sonney2kshogun is meant to run on single machines23:32
@sonney2kwith lots of cores23:32
@HeikoSI would say this stuff that one would parallelize on qsub clusters would be very useful though23:33
@HeikoSparameter sweeps etc23:33
@HeikoSand exactly, first engine would be one with a shared memory model23:33
@HeikoSthen we could start modifying existing algorithms23:34
@HeikoSand once this is more or less stable23:34
@HeikoSone could try adding distributed things23:34
@HeikoSstep by step23:34
@sonney2kHeikoS, the problem really is that you hardly get speedups by just switching multi-core -> multi-machine23:35
@sonney2kthe algorithm needs to be designed for that usually23:35
@HeikoSlets see how it goes, will start with the log-det stuff, which is already a bit of a challenge under this framework. Many linear systems that share a lot of stuff23:35
@sonney2kso yes only the big independent jobs will benefit23:35
@HeikoSsonney2k: yes23:35
@sonney2kbut that is what you have in mind23:35
@HeikoSthe rest is too complicated anyway23:36
@HeikoSgrid-search is the best example23:36
@sonney2kso bagging/ms/etc23:36
@HeikoSand random forests etc23:36
@sonney2kwhen I had to parallelize I did mostly ms23:36
@HeikoSyes same, usually only independent stuff23:36
@sonney2ksometimes data was too big23:36
@sonney2kso I trained or applied on chunks23:37
@HeikoSI see23:37
@HeikoSsonney2k: gotta go now, dinner is ready :) be back later23:37
@sonney2kcu23:38
@sonney2knice talking to you as always :D23:38
-!- HeikoS [~heiko@176.248.212.176] has quit [Quit: Leaving.]23:39
--- Log closed Mon May 27 00:00:19 2013