Dataset
The data used for the present analysis are the solar wind (SW) plasma parameters, the interplanetary magnetic field (IMF), and the Dst index. The entire dataset was obtained from NASA's National Space Science Data Center, namely from the OMNI database30. In particular, we used hourly averages of the three components ((B_x), (B_y), (B_z)) of the IMF in the GSM (Geocentric Solar Magnetospheric) reference frame (the origin of the GSM coordinate system is at the center of the Earth; the x-axis is defined along the line connecting the centers of the Earth and the Sun and is positive towards the Sun; the y-axis is defined as the cross product of the GSM x-axis and the magnetic dipole axis and is positive towards dusk; the z-axis is defined as the cross product of the x- and y-axes; the magnetic dipole axis lies within the xz plane), the SW plasma temperature (T), density (D), total speed (V), pressure (P), and the east-west component of the electric field ((E_y), derived from (B_z) and (V_x)).
The dataset covers the period January 1990 to November 2019, and includes half of the 22nd solar cycle, all of the 23rd, and almost all of the 24th. To produce a robust forecast of the Dst index, it is crucial to determine how the dataset is split and processed for the training and evaluation of the model. Adopting a correct methodology for treating the data is essential to avoid bias, especially when a machine learning approach is used to develop predictive models and the data are time series.
If data are periodic, it is safe to train the model on at least one complete period and test it on different periods. Since the arrow of time is fixed and the future unknown, training on points that follow the data used in the test can introduce bias. Therefore, the validation and test datasets must be constructed from points of the time series that follow those used for training. In the present case, since we have data from only two solar cycles, the best option is to use one cycle for training and the other for both validation and test. However, such a choice forces the validation set to contain data from the first half of a solar cycle, with a distribution of Dst values and storms different from the test set. Therefore, in our opinion, the most effective choice for the validation and test process is to select the points of the two datasets randomly.
Training a Deep Learning (DL) model in a supervised fashion requires both a balanced sampling of data from quiet and storm periods and a proper evaluation of the metrics used to measure performance. Otherwise, the model will learn to predict only the most frequent case represented in the training set. Moreover, the standard performance metrics, computed on the full validation and test datasets, would reward a prediction that is correct most of the time but wrong in the most relevant cases.
Taking care of these two aspects, we split the dataset using all the data before 1/1/2009 for training and the remaining part for validation and test. In this way, we have at least one solar cycle for the training and one for the evaluation of the model. As previously said, for validation and test we can choose datasets subsequent in time (i.e. ordered) or an equal number of points drawn randomly from those available after 1/1/2009. The difference between random and ordered selection is displayed in Fig. 1. In panel (a) the validation data include the points in the first half of the cycle, while the test data cover the other half. It is evident that the tails of the two distributions differ: in the validation dataset, events with very low Dst, which are particularly important because they are connected with storms, are missing. The situation changes completely when the points are picked randomly. In this case, the two distributions are quite similar, and also similar to the training dataset, which is the best starting point for the development of a data-driven predictive model. The last problem, directly connected to the data distribution, is that only a few events are associated with storms. In the framework used in this paper, where the algorithm learns by looking at the data, if the distribution is highly peaked around some value of the target variable, the algorithm will learn to predict only such values. To avoid this issue, we apply a re-weighting function to the sampling of the data that feed the training, so that every value of Dst becomes almost equally probable. The difference between the nominal distribution and the flattened (weighted) distribution is presented in Fig. 1c.
Normalized distributions of Dst in the datasets used for training, validation and test. (a) Validation is the first half of the solar-cycle period, test the second half. (b) Points for validation and test are randomly extracted. The training dataset includes all the available points before 1/1/2009. (c) Training dataset without and with re-weighting of the low-Dst events.
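As a minimal sketch of the random selection and of the inverse-histogram re-weighting (using numpy on synthetic Dst values; the split index, bin count, and storm model below are illustrative, not the values used in the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic Dst values: a quiet peak near -10 nT plus rare storm-like tails.
quiet = rng.normal(-10, 15, size=49_000)
storms = -50 - rng.exponential(60, size=1_000)
dst = np.concatenate([quiet, storms])
rng.shuffle(dst)

# Training set: everything "before" the split index (standing in for 1/1/2009);
# validation and test: equal-sized random halves of the remaining points.
split = 30_000
train = dst[:split]
rest_idx = rng.permutation(np.arange(split, dst.size))
val_idx, test_idx = np.array_split(rest_idx, 2)

# Re-weighting: sampling weights inversely proportional to the histogram
# bin occupancy of the training Dst values, so rare (storm) values are
# drawn about as often as frequent (quiet) ones.
counts, edges = np.histogram(train, bins=50)
bin_of = np.clip(np.digitize(train, edges[1:-1]), 0, counts.size - 1)
weights = 1.0 / np.maximum(counts[bin_of], 1)
weights /= weights.sum()

# A weighted resample now has a much flatter Dst distribution.
resampled = rng.choice(train, size=train.size, p=weights)
```

The random validation/test split keeps both distributions close to the training one, while the weighted resample boosts the storm tail that the nominal distribution almost hides.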
The points discussed above also limit the applicability of the standard cross-validation methods usually recommended in machine learning applications to test the robustness of models. While specific cross-validation schemes have been developed for time series (e.g., the TimeSeriesSplit function available in the scikit-learn Python library), we prefer not to adopt this type of check because this kind of split progressively increases the size of the training dataset: in the first iterations, there are far fewer storms than in the last ones. This automatically favors the last iterations of the procedure in predicting storms, introducing an indirect bias in the interpretation of the results.
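The effect we chose to avoid can be seen in a small sketch of an expanding-window split in the style of scikit-learn's TimeSeriesSplit (implemented here directly with numpy to keep the example self-contained): the training window of later folds is strictly larger, so later folds inevitably contain more storms.

```python
import numpy as np

def expanding_window_splits(n_samples, n_splits):
    """Yield (train_idx, test_idx) pairs in the style of scikit-learn's
    TimeSeriesSplit: each fold tests on the next chunk of the series and
    trains on everything before it, so the training window keeps growing."""
    fold = n_samples // (n_splits + 1)
    for k in range(1, n_splits + 1):
        train_idx = np.arange(0, k * fold)
        test_idx = np.arange(k * fold, min((k + 1) * fold, n_samples))
        yield train_idx, test_idx

# The training window grows fold after fold, which is exactly why early
# folds see fewer storms than later ones.
sizes = [len(tr) for tr, _ in expanding_window_splits(1200, 5)]
print(sizes)  # [200, 400, 600, 800, 1000]
```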
As an additional pre-processing step, all the features are scaled linearly to a compact range. The scaler is fitted on the training dataset, mapping its minimum and maximum values to 0.1 and 0.9, respectively. This choice leaves some room to accommodate values smaller or larger than those available in the training dataset that may emerge in future measurements of the variables.
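A minimal sketch of this scaling, assuming a simple per-feature min-max fit on the training data (the actual notebook may differ in implementation details):

```python
import numpy as np

class RangeScaler:
    """Linear per-feature scaler mapping the training min/max to [0.1, 0.9],
    leaving headroom for out-of-range values in future data."""
    def __init__(self, low=0.1, high=0.9):
        self.low, self.high = low, high

    def fit(self, x):                       # x: (n_samples, n_features)
        self.min_ = x.min(axis=0)
        self.max_ = x.max(axis=0)
        return self

    def transform(self, x):
        span = self.max_ - self.min_
        return self.low + (x - self.min_) / span * (self.high - self.low)

train = np.array([[0.0, -100.0], [50.0, 0.0], [100.0, 100.0]])
scaler = RangeScaler().fit(train)
scaled = scaler.transform(train)   # each column now spans exactly [0.1, 0.9]
```

A value larger than the training maximum simply lands above 0.9 instead of being clipped, which is the headroom mentioned above.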
The architecture of the Neural Network considered in this study is close to the one used in26, where a Long Short-Term Memory (LSTM) module is combined with a Fully Connected Neural Network (FCNN). LSTM is a recurrent layer composed of cells designed to process long time series. The input of the proposed network is a time series containing the variables described in Dataset for the 12 points in the time window ([t-11, t]). Each cell of the LSTM layer (Fig. 2) receives as input one element (x_{t_i}) of this time series together with the outputs of the previous cell: the hidden state, (h_{t_{i-1}}), and the memory state, (c_{t_{i-1}}). As schematically depicted in the figure, these three sources of information are processed through fully connected layers and element-wise operations, all internal to the cells. In standard applications of LSTM, the hidden state from the last cell represents the network's prediction, and the hidden states of all the other cells are discarded. In our approach, we collect and concatenate all the hidden states ([h_{t-11}, h_t]) into a multidimensional vector. This vector is then fed as input to a fully connected module. The output of this FCNN is the forecast of the Dst index for the hours ([t+1, t+12]).
Neural Network architecture used to forecast the Dst index, as described in the text. In the LSTM cell, the square blocks are fully connected layers with activation functions, while the circles are element-wise operations.
In optimizing DL networks, two types of parameters need to be fixed: the layers' weights and the hyper-parameters specifying the architecture. During training, the back-propagation procedure takes care of the former, which can number in the millions or even billions (in our case 25,244). The latter, typically limited in number (in our case 7), are usually determined manually by testing different solutions and considering only the training and validation datasets in the evaluation, to avoid bias.
We found that better predictions are obtained using the following values for the hyper-parameters:
LSTM, number of hidden layers: 2,
LSTM, size of the hidden layers: 8,
FCNN, number of layers: 4,
FCNN, number of output features for each layer: 96, 96, 48, 12.
Batch normalization is applied to the input vector of the FCNN; a ReLU activation function and a dropout layer with a drop factor of 0.2 follow every fully connected layer except the last one.
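The stated total of 25,244 trainable weights can be checked by hand from these hyper-parameters. Assuming nine input variables per time step (the eight quantities listed in Dataset plus the past Dst itself; this input size is our inference, as the text does not state it explicitly) and a PyTorch-style LSTM parameterization with separate input and recurrent bias vectors, the numbers match exactly:

```python
# Parameter count for the architecture described above.
# Assumption (not stated explicitly in the text): 9 input features per step.
n_in, hidden, steps = 9, 8, 12

def lstm_params(input_size, hidden_size):
    # 4 gates, each with input weights, recurrent weights, and two bias
    # vectors (the PyTorch convention of separate b_ih and b_hh).
    return 4 * hidden_size * (input_size + hidden_size) + 2 * 4 * hidden_size

lstm = lstm_params(n_in, hidden) + lstm_params(hidden, hidden)  # 2 stacked layers

fc_in = steps * hidden                    # 12 concatenated hidden states -> 96
batchnorm = 2 * fc_in                     # gamma and beta of the batch norm
layers = [fc_in, 96, 96, 48, 12]
fcnn = sum(a * b + b for a, b in zip(layers[:-1], layers[1:]))  # weights + biases

total = lstm + batchnorm + fcnn
print(total)  # 25244, the number quoted in the text
```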
The loss function minimized during the training of the network is the Mean Absolute Error (MAE) function
$$\begin{aligned} \text{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left| y_{pred} - y_{true}\right|_i \end{aligned}$$
(1)
We use the Adam optimizer with a learning rate of (10^{-5}). During training, back-propagation is applied after computing the loss on samples extracted from the dataset in batches. The procedure is repeated an arbitrary number of times. Statistics are collected after iterating back-propagation over as many samples as there are elements in the training dataset: this is called an epoch. The training ends once the loss function stops decreasing on the validation dataset. We used batches of size 256 and stopped training after 10,000 epochs. Examples of the loss-function behavior are presented in Fig. 3.
History of the loss function in the 10,000 epochs of the training.
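The stopping rule ("end training once the validation loss stops decreasing") can be sketched as a small monitor; the patience value below is purely illustrative, and in practice, as stated above, we ran a fixed budget of 10,000 epochs:

```python
class EarlyStopping:
    """Stop training when the validation loss has not improved for
    `patience` consecutive epochs (patience value is illustrative)."""
    def __init__(self, patience=50):
        self.patience = patience
        self.best = float("inf")
        self.stale = 0

    def step(self, val_loss):
        """Call once per epoch; returns True when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.stale = 0
        else:
            self.stale += 1
        return self.stale >= self.patience

monitor = EarlyStopping(patience=3)
losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73]   # validation loss per epoch
stopped_at = next(i for i, l in enumerate(losses) if monitor.step(l))
```

Here the monitor fires at epoch 5, three epochs after the minimum at 0.7.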
The code with the implementation of the network architecture and the procedure to generate the training, validation, and test datasets is available as a Python notebook in the public GitLab repository gitlab.fbk.eu/dsip/dsip_physics/dsip_ph_space/Dstw.
A typical baseline forecast method for time series is the persistence model. The assumption at the base of this approach is that nothing changes between the last known value and all the future points:
$$\begin{aligned} Dst(t + n) = Dst(t),\quad n\in \mathbb{N}. \end{aligned}$$
(2)
It is expected that the predictive power of this model decreases as the forecast horizon increases; on the contrary, in the short term, assuming persistence is often a good approximation of the actual trend.
Different metrics can be considered to highlight and study models' features and compare their predictive power. However, the focus of this work is the importance of how the training data are selected and used. This is appreciable even considering only the most common of these metrics, the Root Mean Squared Error (RMSE), defined as:
$$\begin{aligned} \text{RMSE}=\sqrt{\frac{\sum_{i=1}^N \left( y_{pred_i}-y_{true_i}\right)^2}{N}}. \end{aligned}$$
(3)
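Both pieces fit together in a short check on synthetic data (the autocorrelated series below is only a stand-in for real Dst measurements): the persistence forecast of Eq. (2), scored with the RMSE of Eq. (3), degrades as the horizon n grows.

```python
import numpy as np

def persistence_forecast(series, horizon):
    """Eq. (2): predict series[t + horizon] as series[t]."""
    return series[:-horizon]

def rmse(y_pred, y_true):
    """Eq. (3): root mean squared error."""
    return np.sqrt(np.mean((y_pred - y_true) ** 2))

# Synthetic autocorrelated (AR(1)) series standing in for hourly Dst values.
rng = np.random.default_rng(0)
dst = np.empty(20_000)
dst[0] = -15.0
for t in range(1, dst.size):
    dst[t] = 0.98 * dst[t - 1] + rng.normal(0, 3)

# RMSE of the persistence baseline at 1-, 6-, and 12-hour horizons.
errors = {n: rmse(persistence_forecast(dst, n), dst[n:]) for n in (1, 6, 12)}
```

As anticipated, the 1-hour persistence error is small, while the 12-hour error is several times larger, which is why persistence is a meaningful baseline only in the short term.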
Originally published as: Prominence of the training data preparation in geomagnetic storm prediction using deep neural networks, Scientific Reports (Nature.com).