TalkingData Global Algorithm Competition Review
Lu Yao, Data Science Department

What is the
  Power of Data Science
  Mystery of Data Scientist
  Magic of Data Scientist
  The Arena of Data Scientist

The Arena of Data Scientist

Kaggle: the leading platform for crowdsourcing data challenges
  A community to build the best solutions to problems posed by industry, government and academia.
  Over 1,200 data science challenges.
  More than 600,000 registered users across 194 countries around the world, from a wide variety of educational backgrounds and often experts in their fields.
  Platform of the KDD Cup.

Famous competition and scientist of Kaggle
  Kaggle Drug Discovery Competition, 2012
  Geoffrey Hinton

TalkingData hosts the competition on Kaggle
  To provide open data and an open platform for the smartest scientists and the smartest methodology, to solve the most challenging topics.

Acknowledge Turi
  A famous machine learning company that created GraphLab.
  Acquired by Apple Inc. in August.
  The TalkingData competition was announced at the Data Science Summit in San Francisco on July 13th.

TalkingData Mobile User Demographics
  To know your users' profile by learning from their behavior.
  Goal of the competition
    Given: application usage and trace with time stamps; mobile brand and device model.
    To optimize: the estimation of their age and gender group.
    Evaluated by: [formula not preserved in the extracted slide text]

Participation in the competition
  Largest competition hosted by a Chinese company.
  Among the highest number of Kernels of all competitions.
  A great proportion of top Kagglers participated.
  70+ countries and regions represented in the participant pool.
  1,689 teams; 1,961 players; 24,629 entries submitted; 2,729 kernels.

Magic of Data Scientist
  Interpret the raw data
  Execute feature engineering
  Fine-tune individual and ensemble models
  Avoid overfitting
  Fetch the best tools

Interpret your data

Feature engineering
  Active time slots
  Active area
  District-specific features
  Installed but not active
  Binary/weighted features
  App label patterns
  Age/gender as feature

Fine-tune individual and ensemble models
  Random Forest
  Logistic Regression
  GBDT
  Neural Network
  SVM
  AdaBoost
  Stacking

Avoid overfitting
  Train set, test set
  Public board, private board
  Always do cross-validation

Fetch the best tools
  Keras
    Highly modular neural networks library, written in Python, capable of running on top of either TensorFlow or Theano.
    Allows for easy and fast prototyping.
    Runs seamlessly on CPU and GPU.
  XGBoost
    Scalable, portable and distributed gradient boosting library.
    Runs on a single machine, Hadoop, Spark, Flink and DataFlow.
    Higher precision; winner of the Higgs Boson signal competition on Kaggle, 2014.

How are our competitors
  Open and helpful
  Smart and creative, willing to solve real-world problems
  Elegant and rational
  Hard-working

What we offered
  Industrial data and valuable business problems
  Honor to data scientists' professional skills and spirit

What we achieved from the competition
  The minds and hearts of the smartest scientists from all over the world

THANKS
Let's rock data together!
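
Supplementary note on "Always do cross-validation": the advice can be illustrated with a minimal, self-contained sketch. Everything in it is an assumption made for illustration and is not taken from the slides: the toy labels are made up, `fit_prior` is a hypothetical stand-in for a real model, and multi-class log loss is used only as an example metric because the deck's own evaluation formula did not survive extraction.

```python
import math
import random
from collections import Counter


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_prior(labels, classes, alpha=1.0):
    """Hypothetical baseline 'model': smoothed class frequencies."""
    counts = Counter(labels)
    total = len(labels) + alpha * len(classes)
    return {c: (counts[c] + alpha) / total for c in classes}


def log_loss(true_labels, predicted_probs):
    """Multi-class log loss: mean negative log-probability of the true class."""
    total = sum(math.log(p[y]) for y, p in zip(true_labels, predicted_probs))
    return -total / len(true_labels)


def cross_validate(labels, classes, k=5):
    """Return the log loss measured on each held-out fold."""
    scores = []
    for held_out in k_fold_indices(len(labels), k):
        held_out_set = set(held_out)
        # Fit only on samples outside the held-out fold.
        train = [labels[i] for i in range(len(labels)) if i not in held_out_set]
        valid = [labels[i] for i in held_out]
        model = fit_prior(train, classes)
        # The baseline predicts the same distribution for every sample.
        scores.append(log_loss(valid, [model] * len(valid)))
    return scores


# Made-up labels standing in for demographic groups.
labels = ["F-young"] * 30 + ["M-young"] * 50 + ["M-old"] * 20
classes = sorted(set(labels))
scores = cross_validate(labels, classes, k=5)
mean_score = sum(scores) / len(scores)
print([round(s, 3) for s in scores], round(mean_score, 3))
```

The point of the sketch is the workflow rather than the model: each score comes from data the model never saw during fitting, so the mean and the fold-to-fold spread give a local estimate that does not depend on repeated submissions to a public leaderboard.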