RESUMO

Cloud computing is currently one of the main options in the computing infrastructure landscape.
In addition to advantages such as the pay-per-use billing model and resource elasticity, there are technical advantages regarding heterogeneity and large-scale configuration. In the Infrastructure as a Service (IaaS) model, a range of physical and virtual resources is available for dynamic allocation according to customer demand, often appearing unlimited in terms of time or quantity. Furthermore, as an alternative to the standard pricing model for virtual machines (VMs), cloud providers offer discounted prices for renting preemptive VMs, which the provider can revoke at any time. Fault-tolerant applications can therefore benefit from this preemptive market to reduce monetary costs. Alongside the classical need for performance (e.g., time, space, and energy), there is an interest in financial cost, which may stem from budget constraints. Based on scalability considerations and the pricing model of traditional public clouds, an expected output of the optimization strategy could be the most suitable VM configuration for running a specific workload. In this work, a Spark application based on the MapReduce model, named Diff Sequences Spark, is developed; it compares biological sequences and identifies occurrences of mismatching nucleotide characters. The application is run on comparisons of sequences of the SARS-CoV-2 coronavirus, the virus responsible for COVID-19, using Amazon's AWS EC2 cloud service on both on-demand (standard) and spot (preemptive) VM instances. From the perspective of optimizing execution time and monetary cost, an adaptation of an execution cost model drawn from the literature is proposed, and its experimental evaluation yielded low error rates.
The experimental results using these optimizations outperformed scenarios in which an inexperienced cloud user would select VMs without any reasonable criterion. Finally, monetary cost reductions were achieved when using spot instances compared with their respective on-demand options, even in scenarios with several revocations of spot Workers in a Spark cluster.

Keywords: Apache Spark; MapReduce; Cloud computing; Optimization.

ABSTRACT

Cloud computing is currently one of the prime choices in the computing infrastructure landscape. In addition to advantages such as the pay-per-use billing model and resource elasticity, there are technical benefits regarding heterogeneity and large-scale configuration. In the Infrastructure as a Service (IaaS) model, a range of physical and virtual resources is available for dynamic allocation according to customer demand, often appearing limitless in terms of time or quantity. In addition, as an alternative to the standard pricing model for virtual machines (VMs), cloud providers offer discounted prices for the rental of preemptive VMs, which the provider can revoke at any time. Fault-tolerant applications may therefore benefit from this preemptive market to reduce monetary costs. Alongside the classical need for performance (e.g., time, space, and energy), there is an interest in financial cost, which may stem from budget constraints. Based on scalability considerations and the pricing model of traditional public clouds, an expected output of the optimization strategy could be the most suitable configuration of VMs to run a specific workload. In this work, a Spark application based on the MapReduce model, named Diff Sequences Spark, is developed; it compares biological sequences and identifies occurrences of mismatching nucleotide characters.
The application is run on comparisons of sequences of SARS-CoV-2, the coronavirus responsible for COVID-19, using Amazon's AWS EC2 cloud service on both on-demand (standard) and spot (preemptive) VM instances. From the perspective of optimizing execution time and monetary cost, an adaptation of an execution cost model drawn from the literature is provided, and its experimental evaluation yielded low error rates. Experimental results using these optimizations outperformed scenarios in which an inexperienced cloud user would select VMs without any reasonable criteria. Finally, reduced monetary costs were achieved when using spot instances compared with their respective on-demand options, even in scenarios with multiple revocations of spot Workers in a Spark cluster.

Keywords: Apache Spark; MapReduce; Cloud computing; Optimization.
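
To make the idea of comparing sequences and reporting mismatching nucleotide characters concrete, the following is a minimal sketch in plain Python rather than Spark. The abstract does not describe how Diff Sequences Spark is implemented, so everything here is an assumption made for illustration: the function names (`diff_pair`, `merge`, `diff_sequences`) are hypothetical, the sequences are assumed to be already aligned and of equal length, and the toy sequences in the usage note are invented. The map step compares one pair of sequences, and the reduce step collects the per-pair results.

```python
from functools import reduce
from itertools import combinations


def diff_pair(pair):
    """Map step (hypothetical): compare two named sequences position by position.

    Assumes the sequences are aligned; zip() stops at the shorter one,
    so trailing characters of a longer sequence are ignored.
    """
    (name_a, seq_a), (name_b, seq_b) = pair
    mismatches = [
        (i, a, b)
        for i, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b
    ]
    return (name_a, name_b), mismatches


def merge(acc, item):
    """Reduce step (hypothetical): gather the per-pair mismatch lists in one dict."""
    key, mismatches = item
    acc[key] = mismatches
    return acc


def diff_sequences(sequences):
    """Compare every pair of sequences in a {name: sequence} mapping."""
    pairs = combinations(sorted(sequences.items()), 2)
    return reduce(merge, map(diff_pair, pairs), {})
```

For example, calling `diff_sequences({"s1": "ACGT", "s2": "ACCT"})` yields one entry keyed by `("s1", "s2")` listing the zero-based position and the two differing characters. In a Spark setting the same map and reduce roles would presumably be distributed across Workers, but that mapping is not specified in the abstract.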