Time series clustering and data augmentation techniques to improve the forecast of Dengue cases in Paraguay with deep learning

Juan_Bogado-Tesis.pdf (3.209Mb)
URI
http://hdl.handle.net/20.500.14066/3971
Metadata
Author(s)
Bogado Machuca, Juan Vicente
Adviser
Schaerer Serra, Christian Emilio
Date of publishing
2020
Type of publication
Master's thesis
Subject(s)
DENGUE
MACHINE LEARNING AND DEEP LEARNING (DL)
STATISTICAL PROJECTIONS
 
Abstract
Dengue fever is a public health problem, and accurate forecasts can help governments take the best preventive actions. As the volume of available data continuously increases, machine learning and deep learning (DL) models have become an attractive approach. However, it is difficult to make accurate predictions in areas with few cases. In this work, traditional approaches such as LARS LASSO Regression (LR), Random Forest (RF), and Support Vector Regression (SVR) are compared against DL models based on Long Short-Term Memory (LSTM), using weekly Dengue incidence and climate data for 217 cities in Paraguay. Because several cities may present heterogeneous behaviors and poor accuracy, two approaches are proposed to mitigate this problem: clustering and data augmentation. First, a clustering analysis of the time series was performed, using silhouette scores to measure how well observations are clustered. Results indicate that hierarchical clustering combined with correlation is the most appropriate approach. Several LSTM models are then compared on subgroups of similar time series. Second, several data augmentation techniques were applied and the resulting synthetic time series were used as input to train models; the results indicate that the synthetic series obtained with the Bayesian estimation technique are the ones that improved model performance. The Root Mean Square Error (RMSE) confirms that the clustered LSTM models improve accuracy by 19.48 ± 18.80% and that LSTM with Bayesian-based data augmentation improves it by 16.86 ± 16.57%. The main contributions of this work are two techniques that can improve the performance of time series models by combining information from similar time series and weather data.
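The abstract reports that hierarchical clustering with a correlation-based distance, judged by silhouette scores, was the most appropriate way to group the cities' time series. The following is a minimal, standard-library-only sketch of that idea, not the thesis code. It assumes average linkage, the distance d = 1 - Pearson r, and a search over k = 2..max_k that keeps the partition with the highest mean silhouette. The function names (`correlation_distance_matrix`, `average_linkage`, `silhouette`, `best_clustering`, `toy_city`) are hypothetical, and the toy series are synthetic, not Paraguayan Dengue data.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_distance_matrix(series):
    """d(i, j) = 1 - r(i, j): 0 for perfectly correlated, 2 for anti-correlated."""
    n = len(series)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = 1.0 - pearson(series[i], series[j])
    return d


def average_linkage(d, k):
    """Agglomerative (average-linkage) clustering, merging until k clusters remain."""
    clusters = [[i] for i in range(len(d))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Average pairwise distance between the members of the two clusters.
            dist = sum(d[i][j] for i in clusters[a] for j in clusters[b])
            dist /= len(clusters[a]) * len(clusters[b])
            if best is None or dist < best[0]:
                best = (dist, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is still valid
    labels = [0] * len(d)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def silhouette(d, labels):
    """Mean silhouette coefficient computed from a precomputed distance matrix."""
    n = len(d)
    total = 0.0
    for i in range(n):
        same = [d[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not same:  # singleton cluster: silhouette is 0 by convention
            continue
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(d[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n


def best_clustering(series, max_k):
    """Try k = 2..max_k and keep the partition with the highest silhouette."""
    d = correlation_distance_matrix(series)
    candidates = {k: average_linkage(d, k) for k in range(2, max_k + 1)}
    scores = {k: silhouette(d, labels) for k, labels in candidates.items()}
    best_k = max(scores, key=scores.get)
    return best_k, candidates[best_k], scores[best_k]


def toy_city(phase, scale, wobble, weeks=52):
    """Hypothetical weekly case curve: a yearly cycle plus a small fast oscillation."""
    return [scale * (10 + 8 * math.sin(2 * math.pi * w / weeks + phase))
            + wobble * math.sin(w) for w in range(weeks)]


# Synthetic example: two groups with opposite seasonal timing and mixed case volumes.
series = [
    toy_city(0.0, 1.0, 0.5), toy_city(0.2, 2.0, 1.0), toy_city(0.1, 0.5, 0.3),
    toy_city(math.pi, 1.0, 0.5), toy_city(math.pi + 0.2, 3.0, 1.0),
    toy_city(math.pi - 0.1, 0.8, 0.4),
]
best_k, labels, score = best_clustering(series, max_k=4)
print(f"k={best_k}, labels={labels}, silhouette={score:.3f}")
```

Because the distance is based on correlation, cities with the same seasonal shape but very different case volumes (the `scale` argument) end up in the same cluster. That is what makes it possible to pool a low-incidence city with better-observed cities that behave similarly. A production version would more likely use library routines for linkage and silhouette scoring than these hand-written loops.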
Collections
  • Tesis de Maestría (Master's Theses)


Consejo Nacional de Ciencia y Tecnología (CONACYT)

Dr. Justo Prieto N 223 entre Teófilo del Puerto y Nicolás Billof, Villa Aurelia.

Telefax: +(595-21) 506 223 / 506 331 / 506 369

Código Postal 001417 - Villa Aurelia

Asunción - Paraguay