NOVEL CLASSIFIER CHAIN METHODS FOR MULTI-LABEL CLASSIFICATION BASED ON GENETIC ALGORITHMS

Abstract. Multi-label classification (MLC) is the task of automatically assigning an object to multiple categories based on its characteristics. MLC has many important modern applications, such as music categorization (associating songs with various music genres) and functional genomics (determining the multiple biological functions of genes and proteins). First proposed in 2009, the classifier chains model (CC) has become one of the most influential methods for MLC. It is distinguished by its simple and effective approach to exploiting label dependencies. The CC method involves training q single-label binary classifiers, where q represents the number of labels. Each one is solely responsible for classifying a specific label. These q classifiers are linked in a chain in random order, such that each binary classifier is able to consider the labels predicted by the previous ones as additional information at classification time. CC is considered one of the most effective MLC methods, in the sense that it has proved to be competitive with state-of-the-art techniques. However, the basic CC model suffers from two major drawbacks: (i) it decides the label sequence randomly, although different label sequences might have a strong effect on the predictive accuracy of the model; (ii) it forces all labels to be present in the chain, despite the fact that some of them might carry redundant and/or irrelevant information for predicting the other labels. The main contribution of this thesis is the proposal of two novel techniques that enhance the effectiveness of multi-label chain classifiers by searching for a single optimized label sequence (i.e., a label sequence that leads to an improvement in the predictive accuracy of the CC model).
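The chaining mechanism described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the class names `OneNN` and `ClassifierChain`, the 1-nearest-neighbour base learner, and the toy data are all assumptions chosen to keep the example self-contained. The sketch also assumes the usual convention that each link is trained on the true values of the earlier labels, while at classification time it receives the values predicted by the earlier links.

```python
import random


class OneNN:
    """Tiny 1-nearest-neighbour binary classifier (hypothetical stand-in base learner)."""

    def fit(self, X, y):
        self.X = [list(row) for row in X]
        self.y = list(y)
        return self

    def predict_one(self, x):
        # Squared Euclidean distance to every training row; first minimum wins.
        dists = [sum((a - b) ** 2 for a, b in zip(x, row)) for row in self.X]
        return self.y[dists.index(min(dists))]


class ClassifierChain:
    """One binary classifier per label, linked in the given label order."""

    def __init__(self, order, base_factory=OneNN):
        self.order = list(order)          # a permutation of the label indices 0..q-1
        self.base_factory = base_factory
        self.models = []

    def fit(self, X, Y):
        self.models = []
        for pos, label in enumerate(self.order):
            earlier = self.order[:pos]
            # Assumption: training augments the features with the TRUE earlier labels.
            X_aug = [list(x) + [row[j] for j in earlier] for x, row in zip(X, Y)]
            y = [row[label] for row in Y]
            self.models.append(self.base_factory().fit(X_aug, y))
        return self

    def predict_one(self, x):
        extra = []   # predictions of earlier links, in chain order
        out = {}
        for label, model in zip(self.order, self.models):
            y_hat = model.predict_one(list(x) + extra)
            out[label] = y_hat
            extra.append(y_hat)
        # Return the predictions in the original label index order.
        return [out[j] for j in sorted(out)]


if __name__ == "__main__":
    # Toy data: two features, q = 2 labels.
    X = [[0, 0], [0, 1], [1, 0], [1, 1]]
    Y = [[0, 0], [0, 1], [1, 0], [1, 1]]
    q = len(Y[0])
    order = random.Random(0).sample(range(q), q)  # basic CC: random label order
    chain = ClassifierChain(order).fit(X, Y)
    print(order, chain.predict_one([1, 0]))
```

Because the label order is a constructor argument, the same class can be refitted with any candidate ordering, which is the quantity the proposed methods search over.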
These two techniques, named GACC and GA-PartCC, are based on Genetic Algorithms (GAs), which are search and optimization methods inspired by the principle of natural selection. One of the proposed strategies (GA-PartCC) is capable of evaluating chain sequences that vary not only in ordering but also in length. The proposed GAs are evaluated, in terms of predictive performance, on diverse benchmark datasets. Overall, the results of our computational experiments show that the proposed GAs are competitive with well-known alternative multi-label classifier chain methods.

Keywords: Multi-label classification, classifier chains, genetic algorithms.

Resumo. Multi-label classification can be defined as the task of automatically associating objects with multiple categories based on their characteristics. The task has many important modern applications, such as music categorization (associating songs with various music genres) and functional genomics (determining the multiple biological functions of genes and proteins). Proposed in 2009, the model known as classifier chains (CC) has become one of the most influential methods for multi-label classification, standing out for its simple and effective approach to exploiting dependencies between labels. The basic method involves training q single-label binary classifiers, where q represents the number of labels. Each of them is solely responsible for classifying one specific class label. These q classifiers are linked in a chain structure, so that each binary classifier is able to consider the labels predicted by the preceding classifiers as additional information at classification time. The CC method is considered one of the most effective for multi-label classification, proving competitive with the state of the art in this area.
However, it has two drawbacks: (i) it determines the order of the labels in the chain at random, although different orderings may significantly influence the accuracy of the model; (ii) it forces all labels to take part in the chain, even though some of them may carry redundant and/or irrelevant information for predicting the other labels. The main objective of this work is to propose two new techniques capable of improving the effectiveness of multi-label chain classifiers by searching for an optimized chain ordering (that is, an ordering capable of increasing the accuracy of the classifier). These two techniques, named GACC and GA-PartCC, are based on Genetic Algorithms (GAs), which are search and optimization methods inspired by the principle of natural selection. One of the proposed strategies (GA-PartCC) is capable of evaluating label chains that vary not only in ordering but also in length. The proposed GAs were evaluated, in terms of predictive performance, on different datasets. The results of the computational experiments showed that, in general, the proposed GAs produce results competitive with other multi-label classifier chain methods proposed in the literature.

Palavras-chave: Multi-label classification, classifier chains, genetic algorithms.
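The idea of searching for a single optimized label sequence with a genetic algorithm can be sketched as follows. The text does not specify the genetic operators, so everything here is an assumption made for illustration: the function names, tournament selection, order crossover, swap mutation, elitism, and all parameter values. The fitness function is supplied by the caller; in a real setting it would presumably be an estimate of the predictive accuracy of a classifier chain trained with the candidate ordering, whereas `toy_fitness` below is only a placeholder. Variable-length chains, as in GA-PartCC, are not covered.

```python
import random


def order_crossover(a, b, rng):
    """Keep a random slice of parent a in place; fill the rest in parent b's order."""
    n = len(a)
    i, j = sorted(rng.sample(range(n), 2))
    middle = a[i:j + 1]
    rest = [g for g in b if g not in middle]
    return tuple(rest[:i]) + tuple(middle) + tuple(rest[i:])


def swap_mutation(order, rng, rate):
    """With probability `rate`, swap two randomly chosen positions."""
    genes = list(order)
    if rng.random() < rate:
        i, j = rng.sample(range(len(genes)), 2)
        genes[i], genes[j] = genes[j], genes[i]
    return tuple(genes)


def tournament(pop, scores, rng, k=2):
    """Return the fittest of k individuals drawn at random."""
    contenders = rng.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda idx: scores[idx])]


def evolve_order(q, fitness, pop_size=20, generations=30, mutation_rate=0.2, seed=0):
    """Search for a high-fitness permutation of the q label indices (q >= 2)."""
    rng = random.Random(seed)
    pop = [tuple(rng.sample(range(q), q)) for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        scores = [fitness(ind) for ind in pop]
        children = [best]  # elitism: the best ordering found so far always survives
        while len(children) < pop_size:
            p1 = tournament(pop, scores, rng)
            p2 = tournament(pop, scores, rng)
            child = order_crossover(p1, p2, rng)
            children.append(swap_mutation(child, rng, mutation_rate))
        pop = children
        best = max(pop, key=fitness)
    return best


def toy_fitness(order):
    """Placeholder fitness: number of labels sitting at their own index."""
    return sum(1 for pos, label in enumerate(order) if pos == label)


if __name__ == "__main__":
    best = evolve_order(q=6, fitness=toy_fitness, seed=1)
    print(best, toy_fitness(best))
```

Every individual is a permutation of the label indices, and both operators preserve that property, so any ordering returned by the search can be handed directly to a chain classifier.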
Author: Eduardo Corrêa Gonçalves