Summary

Recommender systems are a popular tool for suggesting products, services, and information to potential consumers, based on past transactions and on feedback from other users who share similar interests. With the enormous growth in users, products, and information available on the internet, and the rapid emergence of new online services, generating many recommendations per second for thousands of users has become both a necessity and a challenge. Many recommender systems suggest items to users through collaborative filtering techniques, which process the history of items the users have viewed, purchased, or rated. Two main problems faced by most collaborative filtering approaches are scalability and the sparsity of the user-profile matrix, both of which have been successfully overcome by latent factor models. The most successful designs of latent factor models are based on matrix factorization. Among matrix factorization algorithms, alternating least squares (ALS) stands out because its computation steps are easily parallelizable. In this work we propose a methodology for comparing the performance of two parallel implementations of the ALS algorithm: one executed under the MapReduce paradigm on the Apache Hadoop framework, and another executed on the Apache Spark framework, which makes careful use of blocking to reduce JVM garbage collection and to make better use of high-level linear algebra operations.
Experiments evaluate both algorithms for recommendation accuracy and training time on publicly available datasets of different sizes and from different recommendation domains. The experimental results confirm that the Spark implementation is more efficient, since processing is performed in memory rather than on disk as in Hadoop.

Keywords: scalable recommender systems, collaborative filtering, matrix factorization, alternating least squares, Apache Spark, Apache Hadoop, MLlib, Mahout.

Abstract

Recommender systems are now a popular tool for suggesting products, services, and information to potential consumers, based on their profile of past transactions and on feedback from other users who share similar interests. With the tremendous growth of users, products, and information available on the web, and the rapid introduction of new e-business services, generating many recommendations per second for millions of users has become both a necessity and a challenge. Many recommender systems suggest items to users by employing collaborative filtering techniques, which process historical records of items that the users have viewed, purchased, or rated. Two major problems that most collaborative filtering approaches have to resolve are scalability and the sparsity of the user-profile matrix, both of which have been successfully overcome by latent factor models. The most successful realizations of latent factor models are based on matrix factorization. Among the algorithms for matrix factorization, alternating least squares (ALS) stands out because its computations are easily parallelizable.
In this work we propose a methodology for comparing the performance of two parallel implementations of the ALS algorithm: one executed with MapReduce on the Apache Hadoop framework, and another executed on the Apache Spark framework, which makes careful use of blocking to reduce JVM garbage collection overhead and to exploit higher-level linear algebra operations. We perform experiments to evaluate the accuracy of the generated recommendations and the execution time of both algorithms, using publicly available datasets of different sizes and from different recommendation domains. The experimental results show that running the recommendation algorithm on Spark is in fact more efficient, since it processes data in memory, in contrast to Hadoop's two-stage, disk-based MapReduce paradigm.

Keywords: scalable recommender systems, collaborative filtering, matrix factorization, alternating least squares, Apache Spark, Apache Hadoop, MLlib, Mahout.
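The claim that the ALS computations are easily parallelizable can be illustrated with a minimal single-machine sketch. It is not the Mahout or MLlib implementation compared in this work: it assumes explicit ratings, a plain L2 penalty `lam`, and a tiny dense solver, and all function names and the toy ratings are hypothetical. With the item factors held fixed, each user's factor vector is the solution of its own small ridge-regression system, so the per-user (and then per-item) updates are independent of one another.

```python
import random


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def solve(a, b):
    """Solve the small dense system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def update_row(observed, other, k, lam):
    """Least-squares update of one factor vector with the opposite side fixed.

    observed: list of (index into `other`, rating) pairs for this user or item.
    Solves (lam * I + sum v v^T) x = sum r v over the observed entries.
    """
    a = [[lam if p == q else 0.0 for q in range(k)] for p in range(k)]
    b = [0.0] * k
    for j, r in observed:
        v = other[j]
        for p in range(k):
            b[p] += r * v[p]
            for q in range(k):
                a[p][q] += v[p] * v[q]
    return solve(a, b)


def objective(ratings, users, items, lam):
    """Squared error on observed ratings plus the L2 penalty on all factors."""
    sq = sum((r - dot(users[u], items[i])) ** 2 for u, i, r in ratings)
    reg = sum(x * x for row in users + items for x in row)
    return sq + lam * reg


def als(ratings, n_users, n_items, k=2, lam=0.1, iters=10, seed=0):
    """Alternate between user and item updates; ratings are (user, item, rating)."""
    rng = random.Random(seed)
    users = [[rng.random() for _ in range(k)] for _ in range(n_users)]
    items = [[rng.random() for _ in range(k)] for _ in range(n_items)]
    by_user = [[] for _ in range(n_users)]
    by_item = [[] for _ in range(n_items)]
    for u, i, r in ratings:
        by_user[u].append((i, r))
        by_item[i].append((u, r))
    for _ in range(iters):
        # Each call below reads only the fixed opposite factors, so the
        # iterations of each comprehension could run in parallel.
        users = [update_row(by_user[u], items, k, lam) for u in range(n_users)]
        items = [update_row(by_item[i], users, k, lam) for i in range(n_items)]
    return users, items


# Hypothetical toy data: (user, item, rating) triples.
RATINGS = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
users, items = als(RATINGS, n_users=3, n_items=3)
print(objective(RATINGS, users, items, lam=0.1))
```

Because every half-step exactly minimizes the regularized objective over one side of the factorization, the objective cannot increase from one half-step to the next. A distributed implementation would spread the per-row solves across workers and must then move the fixed factors to wherever they are needed, which is where the two frameworks differ.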
_?_?d?Zv?t @@A4A`ABBBGBmMBPB.\BxBCKC D(D-D!2DaD~D_EJE&]ErEbtEoF9FJFPFBWFG?GH0H/pHCII)I&dIUgI |I@JNJWJ2_J K KKQ^KzmK L:LP;LzBLVFLKLSL cLlL9M[MNN$N9NBNfNTmNdoN{NY5O OOWOp^ObnOP5"P/P<7PV7PBPBPJPzPQ fQvQOR.S1S:SD_SNbScS_|S& T TUTs2TE9TxTU!UA/U7U'SU8lU V)V(VTV\VgVW W@W)W=-W-GWXBXoNX;PXyX~ Y1YEDY+cYlYSmYqYZ-ZxTZ2tZ8[yi[n[.t[O?\m]\] ]%]5]}O]U]mx]y]0^^i^M^]^h^j^_+_B_F_ed_*m_ `I`J`{e`a=%ah)a-aI8aSama bJb/bw3b7b;bzy{yzBz&~zF{sk{Rq{B|6|.|I|Q|}0} D} U}X}Z~>~O~i[~l~* :ivj=@8GiOXuR "14fk%?\fkqc/, 8: K>,?Gfopu<s5)Gch?C-`x `/Tcpr{` giz?"@E-K5 V,\3<g@%8,=]_Ix&}"G*ARs\!ooY-)8x^9"BmCINOg/F*=/_7c,FGSw`uk+,EI/ZOKS2yqX2:=d c=,[^!vB$+ 38C'hUu:3YIn}*Min|68Bc{q Q$,07Zal\i.K25==sTdXu|H1 .1(CdPU.6_?X:[l?u>C5M:g}57BH OVbd2jy>&I`hNzh((/3E[bZeiB 7 gi9w0M,O@WkmsyX%,aeh7hHJK{$\f,p~1 Mj1Y%})8;Q<LOTYsF4PXcJ <A0ExHX Y_zA1J>W$.m0<E&oA9co+ ,'$&`cdk0o 6/NQ hh H! I%5;aAZaR&8N]SmADJ$Trr;jlr~ AAirD i }u?}: %&'S/!;X){{"[U%nDJVoH]{~fDEMMlht>m#'OWbm|>S2D*;X0KVVHZMJfpW7%9D<LFf@v=_uUvs'-?.DdJfke1s*c+"5XkP-??ZwXB<GmHP%nF<[$im u'-Ai!Z[,:t@Tc-ms0$,G\pe={bzU6NNq  +1&JMxU\SvE LkN P Qim *O_\#ty\55K8?@J9Kys !&-H[V]C. (] bmr%X"qrz@@@Unknown G*Ax Times New Roman5Symbol3. *Cx Arial7.@Calibri7@Cambria5..[`)Tahoma?= *Cx Courier New;WingdingsACambria Math"9g9g8'C5n0$PD!xxResumoDaniaHelio`         2     2  Oh+'0  < H T `lt|ResumoDaniaNormal_WordconvHelio2Microsoft Office Outlook@F#@ i71@a@aC՜.+,0 hp|   Resumo Title  !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPRSTUVWXZ[\]^_`cRoot Entry Fݿe1Table.~WordDocument> SummaryInformation(QDocumentSummaryInformation8YCompObjy  F'Microsoft Office Word 97-2003 Document MSWordDocWord.Document.89q  F#Documento do Microsoft Office Word MSWordDocWord.Document.89q