
The Enlitia Ecosystem: A Comprehensive Glossary

What is an Artificial Neural Network?

An artificial neural network (ANN), often referred to simply as a neural network, is a computational model inspired by the structure and functioning of the human brain's neural networks. It is a powerful machine learning algorithm used to recognise complex patterns and relationships in data.

At its core, an artificial neural network consists of interconnected nodes, called artificial neurons or "neurons." These neurons are organised into layers, typically comprising an input layer, one or more hidden layers, and an output layer. Each neuron receives input data, performs a mathematical operation on that input, and produces an output. The output is then passed to the next layer, forming a sequential flow of information through the network.

The neurons in the network are connected by weighted connections, which determine the strength and significance of the information being passed between them. These weights are initially assigned randomly and are adjusted during the learning process to optimise the network's performance.

To make predictions or classify data, neural networks undergo a training phase. Training involves presenting labelled examples of input data to the network and adjusting the connection weights based on the discrepancy between the network's predicted output and the desired output. This process, known as backpropagation, iteratively refines the network's ability to make accurate predictions or classifications.
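The training loop described above can be sketched in miniature. The example below is a hedged illustration, not a full network: a single linear neuron whose one weight starts random and is nudged by the prediction error, which is what the backpropagation update reduces to in this degenerate case. The target function y = 2x and the learning rate are chosen for illustration only.

```python
import random

# A minimal sketch: one artificial neuron trained by gradient descent.
# It learns y = 2*x from labelled examples; for a single linear neuron,
# backpropagation reduces to this error-driven weight update.

random.seed(0)
weight = random.random()          # weight starts random
learning_rate = 0.1
examples = [(x, 2 * x) for x in [1.0, 2.0, 3.0, 4.0]]  # labelled data

for epoch in range(100):
    for x, target in examples:
        prediction = weight * x              # forward pass
        error = prediction - target         # discrepancy vs. desired output
        weight -= learning_rate * error * x  # adjust the connection weight

print(round(weight, 3))  # converges close to 2.0
```

The same idea, applied layer by layer through the chain rule, is what trains a deep network.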

Artificial neural networks can learn from large amounts of data and generalize patterns, enabling them to make predictions or classify new, unseen data based on the knowledge gained during training. They are widely used in various applications, including image and speech recognition, natural language processing, recommendation systems, autonomous vehicles, and many other areas where pattern recognition and prediction are essential.

It's important to note that neural networks are just one type of machine learning algorithm within the broader field of artificial intelligence (AI). They have contributed significantly to the advancement of AI and continue to be an active area of research, with various network architectures and learning algorithms continually being developed and refined to tackle increasingly complex tasks.

What is the difference between an algorithm and artificial intelligence?

An algorithm is a step-by-step procedure or a set of rules designed to solve a specific problem or accomplish a specific task. It is a well-defined and finite set of instructions that takes input, processes it, and produces an output. Algorithms have been used for centuries in various fields, including mathematics, computer science, and everyday problem-solving. They are deterministic and rely on predefined rules and logic to achieve their objectives.

Artificial Intelligence, on the other hand, refers to the development of computer systems or machines that can perform tasks that would typically require human intelligence. AI aims to simulate human cognitive abilities, such as learning, reasoning, problem-solving, perception, and decision-making. It encompasses a broad range of techniques, methodologies, and algorithms to enable machines to exhibit intelligent behaviour.

While algorithms are part of the tools used in AI, they are not synonymous. AI involves more than just algorithms; it encompasses the entire field of creating intelligent systems. AI systems can utilise various algorithms, including machine learning algorithms, to process data, learn from it, and make predictions or decisions.

The key differences between algorithms and artificial intelligence are as follows:

  • Scope and Complexity: Algorithms are specific procedures or rules designed to solve well-defined problems, while AI deals with developing systems that exhibit intelligent behaviours across a range of tasks. AI involves more complex methodologies, including advanced algorithms, machine learning, natural language processing, computer vision, and more.

  • Adaptability and Learning: Algorithms are generally static and don't possess the ability to adapt or learn from experience. In contrast, AI systems, particularly those based on machine learning, can learn from data, identify patterns, and improve their performance over time. AI systems can adjust their behaviour based on new information or changes in their environment.
  • Decision-Making: Algorithms follow predefined rules and logic to make decisions, while AI systems can use a variety of techniques, including algorithms, to make decisions based on patterns, probabilistic models, or learned knowledge. AI systems can handle complex decision-making scenarios that may involve uncertainty or incomplete information.
  • Intelligence Simulation: Algorithms are not inherently designed to mimic human-like intelligence. AI, however, aims to replicate or simulate human intelligence by leveraging algorithms and other techniques to enable machines to perform tasks that typically require human intelligence.

In summary, algorithms are specific problem-solving procedures, while artificial intelligence encompasses a broader field focused on creating intelligent systems that can perform tasks requiring human-like intelligence. Algorithms are tools used within AI, and AI systems utilise algorithms, among other techniques, to exhibit intelligent behaviour.

What is Data Profiling?

Data profiling is a comprehensive process that involves examining and assessing the content, structure, and quality of a dataset. It aims to gain a deeper understanding of the data to ensure its reliability, accuracy, and usability. Data profiling encompasses a range of activities, including data discovery, statistical analysis, and data quality assessment.

During data profiling, various aspects of the dataset are examined in detail. This includes analysing the data types present, such as text, numbers, or dates, and understanding the range and distribution of values within each data field. It also involves identifying missing or incomplete data, duplicated records, and any inconsistencies or outliers that might impact data quality.

For example, let's consider a dataset related to solar energy production. The dataset contains information about solar panels installed across different locations, their energy generation, and environmental factors. Data profiling of this dataset would involve examining each attribute, such as panel efficiency, location coordinates, solar irradiance levels, and timestamp of energy production.

There are various types of data profiling techniques used to gain insights into different aspects of a dataset. Here are some common types of data profiling:

  • Structure Profiling: This type of profiling focuses on understanding the structure of the dataset, such as the number of columns, their names, data types, and relationships between tables. It helps in gaining an overview of the dataset's schema and its organisation.
  • Content Profiling: Content profiling involves analysing the actual values present in the dataset. It includes examining the distribution of values within each attribute, identifying data ranges and detecting outliers or anomalies. Content profiling provides insights into the characteristics and quality of the data.
  • Completeness Profiling: Completeness profiling assesses the level of missing or incomplete data within the dataset. It involves identifying attributes or records with missing values and calculating the percentage of missing data. This type of profiling helps in understanding the data's completeness and identifying potential data gaps.
  • Uniqueness Profiling: Uniqueness profiling focuses on determining the uniqueness of values within specific attributes. It helps in identifying duplicate or redundant records and assessing the level of data duplication or inconsistency. Uniqueness profiling ensures data integrity and helps in eliminating redundant data.
  • Statistical Profiling: Statistical profiling involves applying statistical techniques to the dataset to derive insights. It includes calculating summary statistics (mean, median, standard deviation, etc.), assessing data distributions, and identifying outliers. Statistical profiling helps in understanding the statistical properties of the data and detecting any abnormal patterns.
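Three of the profiling types above can be sketched in a few lines. The snippet below is a hedged illustration over a hypothetical solar-panel dataset; the field names (`panel_id`, `efficiency`, `irradiance`) are invented for the example, not taken from a real system.

```python
import statistics

# Toy records for a hypothetical solar-panel dataset.
records = [
    {"panel_id": "A1", "efficiency": 0.21, "irradiance": 950},
    {"panel_id": "A2", "efficiency": 0.19, "irradiance": 910},
    {"panel_id": "A2", "efficiency": 0.19, "irradiance": 910},  # duplicate
    {"panel_id": "B1", "efficiency": None, "irradiance": 880},  # missing value
]

# Completeness profiling: percentage of missing 'efficiency' values.
missing = sum(1 for r in records if r["efficiency"] is None)
print(f"missing efficiency: {100 * missing / len(records):.0f}%")

# Uniqueness profiling: detect exact duplicate records.
seen = set()
duplicates = 0
for r in records:
    key = tuple(sorted(r.items()))
    duplicates += key in seen
    seen.add(key)
print(f"duplicate records: {duplicates}")

# Statistical profiling: summary statistic of the irradiance values.
irr = [r["irradiance"] for r in records]
print(f"mean irradiance: {statistics.mean(irr):.1f}")
```

Real profiling tools run many such checks per column and report them together; the logic per check is as simple as shown here.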

By employing these different types of data profiling, organisations can gain a comprehensive understanding of their data, assess its quality and integrity, and make informed decisions regarding data management, data cleansing, and data-driven initiatives.

What is Data Mining?

Data mining is a process of discovering patterns, relationships, and insights from large volumes of data. It involves using various techniques, algorithms, and statistical models to extract valuable information and knowledge from complex datasets. The goal of data mining is to uncover hidden patterns, trends, and correlations that can be used for making informed business decisions, predicting future outcomes, and gaining a deeper understanding of the data.

Data mining encompasses a range of methods, including statistical analysis, machine learning, pattern recognition, and visualisation techniques. These methods are applied to structured, semi-structured, and unstructured data from different sources such as databases, data warehouses, websites, social media, and more.

The data mining process typically involves different stages:

  • Data Collection: Gathering relevant data from various sources and consolidating it into a suitable format for analysis.
  • Data Preprocessing: Cleaning and transforming the data to remove noise, handle missing values, resolve inconsistencies, and prepare it for further analysis.
  • Exploratory Data Analysis: Conducting initial data exploration to understand the characteristics, patterns, and distributions of the data.
  • Model Building: Applying data mining algorithms and techniques to create models that can uncover patterns and relationships within the data. This may involve techniques such as clustering, classification, regression, association rule mining, and more.
  • Evaluation and Validation: Assessing the performance and validity of the data mining models using appropriate metrics and validation techniques. This step helps ensure that the models are accurate and reliable.
  • Interpretation and Deployment: Interpreting the results obtained from the data mining process, deriving actionable insights, and applying them to real-world scenarios. This could involve making predictions, identifying trends, optimising business processes, or making data-driven decisions.
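As a taste of the "Model Building" stage, the sketch below clusters one-dimensional values with a tiny k-means (k = 2). It is an illustrative toy, not production code: the data are invented, and the loop omits guards a real implementation needs (for example, handling a cluster that becomes empty).

```python
# A minimal 1-D k-means sketch for the "Model Building" stage.
def kmeans_1d(values, iters=20):
    centroids = [min(values), max(values)]   # simple initial centroids
    for _ in range(iters):
        clusters = [[], []]
        for v in values:
            # assign each value to its nearest centroid
            nearest = min((abs(v - c), i) for i, c in enumerate(centroids))[1]
            clusters[nearest].append(v)
        # recompute each centroid as its cluster mean
        # (no guard against empty clusters -- this is a sketch)
        centroids = [sum(c) / len(c) for c in clusters]
    return centroids

production = [1.0, 1.2, 0.9, 8.8, 9.1, 9.3]  # two obvious groups
centroids = kmeans_1d(production)
print(centroids)
```

The two returned centroids land near the centres of the low and high groups, which is the kind of hidden structure data mining aims to surface.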

Both the “Data Preprocessing” and “Exploratory Data Analysis” stages benefit greatly from data profiling techniques, such as content analysis (content profiling), identifying and removing outliers (statistical profiling), and identifying and removing duplicate data (uniqueness profiling). Data profiling can therefore be understood as a first stage, where the general structure of the dataset is defined and possible data issues are identified. Once this stage is complete, the dataset is “clean” and ready for the data mining stage.

Data mining has applications in various fields, including marketing, finance, healthcare, renewable energy and more. It enables organisations to extract valuable knowledge from their data, uncover hidden patterns, and gain a competitive edge by making data-driven decisions and predictions.

Difference between Machine Learning and Deep Learning

[Figure: the relations between artificial intelligence, machine learning, deep learning and data science]

What is Machine Learning?

Machine Learning is a field of study within artificial intelligence that focuses on developing algorithms and models capable of automatically learning from data and improving their performance over time. It is inspired by the idea that computers can learn from experience and make intelligent decisions without explicit programming.

At the core of machine learning is training algorithms on a dataset to learn patterns and relationships. This training process involves presenting the algorithm with a set of input data, which may or may not be accompanied by the corresponding expected outputs or labels. Through iterative optimisation, the algorithm adjusts its internal parameters to minimise errors and improve its ability to generalise and make accurate predictions or classifications on new, unseen data.

Machine Learning encompasses various types of algorithms, each with its own characteristics and applications:

  • Supervised learning algorithms learn from labelled data, where the correct output is provided, enabling them to make predictions or classifications on new instances.
  • Unsupervised learning algorithms, on the other hand, deal with unlabelled data and aim to discover underlying patterns, clusters, or structures within the data.
  • Reinforcement learning algorithms learn through interactions with an environment, receiving feedback in the form of rewards or penalties to optimise their actions and make decisions.
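The supervised case above can be illustrated with one of the simplest possible learners: a 1-nearest-neighbour classifier that predicts the label of the closest labelled example. The data and labels below are invented for the sketch.

```python
# A hedged sketch of supervised learning: a 1-nearest-neighbour
# classifier built from labelled examples (toy data, illustrative only).
training = [((1.0, 1.0), "low"), ((1.2, 0.8), "low"),
            ((8.0, 9.0), "high"), ((9.0, 8.5), "high")]

def classify(point):
    def dist(a, b):
        # squared Euclidean distance is enough for ranking neighbours
        return sum((x - y) ** 2 for x, y in zip(a, b))
    # predict the label of the closest labelled training example
    return min(training, key=lambda ex: dist(ex[0], point))[1]

print(classify((1.1, 0.9)))  # → low
print(classify((8.5, 8.8)))  # → high
```

An unsupervised algorithm would receive the same points without the "low"/"high" labels and have to discover the two groups on its own.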

One of the key advantages of machine learning is its ability to handle complex and large-scale datasets with numerous dimensions. While humans struggle to reason across many dimensions at once, ML models handle high-dimensional data with ease. By automatically extracting meaningful features and identifying patterns, machine learning algorithms can uncover insights that may not be apparent to human analysts. This can lead to improved decision-making, predictive capabilities, and the ability to automate tasks that were previously time-consuming or labour-intensive.

Machine learning finds applications in numerous domains, including image and speech recognition, natural language processing, recommendation systems, fraud detection, financial modelling, and healthcare diagnostics. It enables the development of intelligent systems that can analyse vast amounts of data, adapt to changing circumstances, and make informed decisions based on patterns and trends.

As the availability of data continues to grow and computational power advances, machine learning is expected to play an increasingly vital role in solving complex problems, driving innovation, and transforming industries across the globe.

What is Deep Learning?

Deep Learning is a subfield of machine learning that focuses on training artificial neural networks to learn and make intelligent decisions by mimicking the structure and function of the human brain. It involves the development of deep neural networks with multiple layers of interconnected nodes, just like the billions of neurons in our brain are connected and interact through synapses, allowing for the extraction of hierarchical representations from complex data.

At its core, Deep Learning leverages the power of neural networks to automatically learn and discover intricate patterns and relationships in large datasets. These deep neural networks are designed to process data through multiple layers, where each layer extracts and transforms features at increasing levels of abstraction. This hierarchical representation enables the network to understand complex patterns and make accurate predictions or classifications.

One of the defining features of Deep Learning is its ability to learn directly from raw data, removing the need for handcrafted features or extensive domain knowledge. In the case of supervised learning, a deep neural network analyses vast amounts of data during the training process and adjusts its internal parameters or weights to minimise the difference between the predicted outputs and the ground truth. This optimisation process, often performed using techniques like backpropagation, allows the network to continuously improve its performance and generalise to unseen data.

Deep Learning has demonstrated remarkable success in various domains, including computer vision, natural language processing, speech recognition, and recommendation systems. It has enabled breakthroughs in image and object recognition, allowing systems to accurately identify and classify objects in images and videos. In natural language processing, Deep Learning has led to advancements in machine translation, sentiment analysis, and text generation.

The depth and complexity of Deep Learning networks enable them to capture intricate nuances and subtle dependencies in data, leading to state-of-the-art performance in many tasks. However, this complexity also poses challenges, such as the need for significant computational resources and large labelled datasets for training.

As research in Deep Learning continues to advance, there is an ongoing exploration of novel network architectures, optimisation algorithms, and regularisation techniques. This field holds immense potential for further advancements in artificial intelligence, empowering systems to understand, interpret, and extract valuable insights from complex data in ways that were previously unattainable.

What are classification algorithms?

Classification algorithms are a category of machine learning techniques used to assign data points into predefined classes or categories based on their features. These algorithms are part of supervised learning, where the model is trained on a labelled dataset with known class labels. The primary objective of classification algorithms is to learn a mapping between input features and corresponding class labels so that it can accurately predict the class of unseen data.

In classification tasks, the output is discrete and falls into specific categories. The algorithm analyses the patterns and relationships in the training data to create a decision boundary that separates different classes. Once trained, the model can classify new instances into one of the predefined classes with a certain level of confidence. Classification models can be set up to produce a probabilistic outcome rather than a binary one. Instead of providing a direct prediction of class 1 or 0, these models can offer predictions of the probability associated with each class (by selecting the class with the highest probability, you obtain the discrete prediction). This approach allows us to gauge the level of confidence in the classification, providing insights into the degree of certainty for each class.
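The probabilistic output described above can be made concrete with a logistic model, which maps a score to a class-1 probability; picking the class with the highest probability yields the discrete prediction. The weight and bias below are illustrative, not learned.

```python
import math

# A sketch of probabilistic classification: a logistic model turns a
# score into a class-1 probability; the discrete label is the class
# with the highest probability. Weights are illustrative, not learned.
def predict_proba(x, weight=1.5, bias=-3.0):
    score = weight * x + bias
    p1 = 1.0 / (1.0 + math.exp(-score))   # probability of class 1
    return {0: 1.0 - p1, 1: p1}

probs = predict_proba(4.0)                # score = 1.5*4 - 3 = 3.0
label = max(probs, key=probs.get)         # discrete prediction
print(round(probs[1], 3), label)
```

A probability of 0.95 and one of 0.55 both yield label 1, but the first conveys far more confidence, which is exactly the extra insight the probabilistic output provides.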

Classification algorithms are widely used in various real-world applications, like in equipment/sensor fault detection, spam email detection, sentiment analysis, disease diagnosis, image recognition, fraud detection, and customer segmentation. They are fundamental in building intelligent systems that can make informed decisions and automate processes based on data-driven insights.

What are the main components of time-series analysis?

Time series analysis is a specialised field of data science and statistics that focuses on studying and interpreting data collected over successive time intervals. In this analytical approach, data points are recorded at regular time intervals, such as hourly, daily, monthly, or yearly, creating a chronological sequence of observations. The primary goal of time series analysis is to extract meaningful patterns, trends, and dependencies from the temporal data to gain insights, make predictions, and inform decision-making.

The main components of time series analysis include:

  • Stationarity: A time series is said to be stationary when its statistical characteristics, such as mean, variance, and autocorrelation, exhibit no significant changes over time. Stationarity is essential for applying many time series analysis techniques, as it ensures that relationships between data points are consistent throughout the time series.
  • Seasonality: Seasonality refers to regular and predictable fluctuations in the time series data that occur at specific time intervals, typically within a year. Detecting seasonality is essential for understanding repeating patterns in the data.
  • Noise: Noise, also known as randomness or error, represents the irregular and unpredictable fluctuations in the time series data that do not follow any discernible pattern. It is caused by various factors, such as measurement errors or external influences, and can obscure the underlying patterns in the data.
  • Autocorrelation: Autocorrelation measures the correlation between a time series data point and its lagged values. Positive autocorrelation indicates that past data points influence future values, while negative autocorrelation suggests an inverse relationship. Autocorrelation is crucial in identifying patterns and seasonality in time series data.
  • Decomposition: Decomposition involves breaking down a time series into its individual components, such as trend, seasonality, and noise. This process helps in isolating and understanding the underlying patterns and variations present in the data.
  • Forecasting: Forecasting is the process of predicting future values of a time series based on historical data and identified patterns. It helps in anticipating future trends, making informed decisions, and planning.
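The decomposition component above can be sketched with the simplest possible tool: a centred moving average estimates the trend, and subtracting it leaves the seasonal and noise components. The series below is invented for illustration.

```python
# A minimal decomposition sketch: a centred 3-point moving average
# estimates the trend; subtracting it from the series leaves
# seasonality plus noise (toy data, illustrative only).
series = [10, 12, 14, 13, 15, 17, 16, 18, 20, 19, 21, 23]
window = 3

trend = []
for i in range(1, len(series) - 1):
    trend.append(sum(series[i - 1:i + 2]) / window)   # centred mean

# residual = observation minus estimated trend at the same time step
residual = [series[i + 1] - t for i, t in enumerate(trend)]
print([round(t, 1) for t in trend[:4]])     # smooth upward trend
print([round(r, 1) for r in residual[:4]])  # what the trend leaves behind
```

Real decompositions use the same idea with seasonally-aware windows and an explicit seasonal component, but the subtraction logic is identical.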

These components collectively form the foundation of time series analysis, enabling analysts and data scientists to gain valuable insights, make accurate predictions, and optimise decision-making in various domains, including renewable energy, finance, economics, and more.

What is dimensionality reduction in machine learning?

Dimensionality reduction is a technique in machine learning and data analysis that aims to reduce the number of input variables or features in a dataset while preserving the most important information. The goal is to simplify the dataset, making it more manageable and efficient for analysis and modelling, while minimising the loss of relevant information. Dimensionality reduction can be particularly useful when dealing with high-dimensional data, where the number of features is large, as it can help improve model performance, reduce computational complexity, and prevent overfitting.
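A concrete sketch of the idea: principal component analysis (PCA) projects data onto the direction of greatest variance. The example below reduces 2-D points to 1-D, computing the top eigenvector of the 2x2 covariance matrix analytically; the data are invented, and a real pipeline would use a library routine instead.

```python
import math

# A hedged PCA sketch: project 2-D points onto the direction of
# greatest variance (the first principal component). Toy data only.
points = [(2.0, 1.9), (1.0, 1.1), (3.0, 3.2), (4.0, 3.8)]
n = len(points)
mx = sum(x for x, _ in points) / n
my = sum(y for _, y in points) / n
centred = [(x - mx, y - my) for x, y in points]

a = sum(x * x for x, _ in centred) / n        # var(x)
c = sum(y * y for _, y in centred) / n        # var(y)
b = sum(x * y for x, y in centred) / n        # cov(x, y)

# Largest eigenvalue of [[a, b], [b, c]] and its (normalised) eigenvector.
lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
vx, vy = b, lam - a
norm = math.hypot(vx, vy)
vx, vy = vx / norm, vy / norm

projected = [x * vx + y * vy for x, y in centred]  # 2-D -> 1-D
print(len(projected))  # each point is now a single number
```

The variance of the projected values equals the top eigenvalue, meaning this single dimension retains the largest share of the original spread.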

Dimensionality reduction is applied in various machine learning tasks, including classification, regression, clustering, and visualisation. It offers several advantages:

  • Improved Model Performance: By reducing dimensionality, models may become less prone to overfitting and can perform better on the test data.
  • Faster Training: With fewer features, models can be trained more quickly, saving computational resources.
  • Easier Visualisation: Lower-dimensional data is easier to visualise, making it simpler to understand and interpret the data distribution and relationships.

However, dimensionality reduction can also have some downsides:

  • Information Loss: Removing features or combining them can result in some loss of information, which can affect the model's predictive power.
  • Increased Complexity: Some dimensionality reduction techniques, especially feature extraction methods, can introduce complexity in the interpretation of the transformed features.

The choice of whether to use dimensionality reduction and which technique to employ depends on the specific problem, the nature of the data, and the trade-off between simplification and preserving important information. It's essential to carefully evaluate the impact of dimensionality reduction on the performance of your machine learning models and choose the approach that best suits your objectives.

What is Feature Selection in Machine Learning?

Feature selection in machine learning is a process of choosing a subset of relevant features or input variables from the original set of features in a dataset. The objective of feature selection is to identify and retain the most informative and discriminative features while discarding those that may be redundant, irrelevant, or noisy.  

This subset of selected features is then used for model training and analysis. Feature selection is closely related to dimensionality reduction, as both aim to reduce the number of features in a dataset, but they differ in their goals and methods:

  1. Goal of Feature Selection: The primary goal of feature selection is to improve the performance of a machine learning model by selecting a subset of features that contribute the most to the model's predictive power. By retaining only the most relevant features, feature selection simplifies the model, reduces overfitting, and often leads to faster training and more interpretable models.
  2. Methods for Feature Selection: Feature selection methods evaluate the importance of each feature based on various criteria and then decide whether to include or exclude it. Common techniques include filter methods, which rank features using statistical measures independent of any model; wrapper methods, which use a specific model's performance as the selection criterion; and embedded methods, which build feature selection directly into the model training process.
  3. Preservation of Features: In feature selection, the goal is to retain a subset of the original features, keeping their original meanings and interpretations. This means that the selected features are still part of the dataset and maintain their original values.
  4. Information Loss: While feature selection reduces the dimensionality of the dataset, it aims to minimise information loss. The selected features are expected to contain the most relevant information for the task, ensuring that the model can make accurate predictions or classifications.
  5. Use Cases: Feature selection is commonly used when there is a belief that not all features in the dataset are equally important for the task at hand. It is particularly valuable when working with high-dimensional data, where feature reduction can enhance model performance and reduce computational complexity.
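The simplest filter method can be sketched in a few lines: drop features whose variance across the dataset falls below a threshold, since a near-constant feature carries little discriminative signal. The feature names and values below are invented for the example.

```python
import statistics

# A sketch of a simple filter method for feature selection: discard
# features with near-zero variance (names and data are illustrative).
data = {
    "wind_speed":  [5.1, 7.3, 6.8, 9.0],
    "temperature": [20.0, 21.5, 19.8, 22.1],
    "sensor_flag": [1.0, 1.0, 1.0, 1.0],   # constant, carries no signal
}

threshold = 0.01
selected = [name for name, values in data.items()
            if statistics.pvariance(values) > threshold]
print(selected)  # the constant feature is discarded
```

Note how the surviving features keep their original names and values, which is the "preservation of features" property described above.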

Feature selection is a crucial step in the machine learning pipeline, as it can significantly impact the effectiveness and efficiency of models. The choice of feature selection method depends on the specific problem, the nature of the data, and the underlying assumptions. It is essential to carefully evaluate the impact of feature selection on model performance and select the most appropriate method based on the problem's objectives.

What is Transfer Learning in machine learning?

Transfer learning is the practice of training a model on one task or dataset and then applying the knowledge gained to another, often related, task. It's a paradigm shift in AI, acknowledging that models can build on their existing knowledge to learn more efficiently in new environments. This knowledge transfer can occur across a variety of domains, from computer vision to natural language processing.

In the realm of machine learning, transfer learning takes on a particular significance. It involves the use of pre-trained models as the foundation for training new models, saving both time and computational resources. Rather than starting from scratch, a pre-trained model, often trained on a massive dataset, forms the basis. By fine-tuning this model using data specific to the task at hand, it can be adapted to perform new, domain-specific tasks with remarkable accuracy.

Consider a scenario in the world of renewable energy, where a wind turbine park is diligently generating data from most of its turbines, but one remains offline. This particular turbine poses a challenge – how can we predict its performance or detect potential failures without the wealth of data that its operational counterparts produce?

This is where transfer learning in machine learning shines. By utilising a pre-trained model that has already learned the intricacies of wind turbine data and patterns from operational turbines within the same park, we can transfer this knowledge to the offline turbine. The model, already well-versed in recognising critical features and performance indicators, can be fine-tuned using whatever data is available from the offline turbine. The result? A highly accurate performance forecast and failure detection mechanism that bridges the data gap, ensuring that every turbine operates optimally, even when there is not a lot of available data.
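The turbine scenario can be sketched with deliberately tiny models. Below, a single weight learned on data-rich "turbine A" is reused as the starting point when fitting data-scarce "turbine B"; the linear models, data, and turbine names are invented to illustrate the idea of reusing pre-trained parameters, not a real forecasting setup.

```python
# A hedged transfer-learning sketch: the weight learned on a data-rich
# turbine initialises the model for a data-scarce one (toy linear
# models, illustrative data only).
def fit(samples, weight=0.0, lr=0.05, epochs=200):
    for _ in range(epochs):
        for x, y in samples:
            weight -= lr * (weight * x - y) * x   # gradient step
    return weight

turbine_a = [(x, 3.0 * x) for x in (1.0, 2.0, 3.0)]  # plenty of data
turbine_b = [(1.5, 4.8)]                             # a single sample

pretrained = fit(turbine_a)                          # learn on turbine A
adapted = fit(turbine_b, weight=pretrained, epochs=5)  # fine-tune briefly
print(round(pretrained, 2), round(adapted, 2))
```

Even with one sample and five fine-tuning epochs, the adapted weight moves from turbine A's behaviour toward turbine B's, instead of starting from nothing.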

In essence, transfer learning in machine learning is the bridge that spans the data divide, enabling us to extract valuable insights even in scenarios where data may be sparse or lacking. By transferring the collective knowledge of AI models, we can enhance predictive capabilities, reduce training overhead, and harness the full potential of machine learning in diverse real-world applications.

So, when it comes to adapting AI and machine learning to new tasks or datasets, remember that transfer learning can be the key to unlocking the full potential of these technologies. From renewable energy to healthcare, and everything in between, it's a transformative tool that empowers us to do more with less and usher in a future where AI continually learns, evolves, and innovates.

What is explainability in AI (XAI)?

In the intricate landscape of Artificial Intelligence (AI), the term "explainability" has emerged as a guiding beacon, offering a clearer view into the inner workings of AI systems. Explainability, often referred to as Explainable AI (XAI), is a pivotal concept that sheds light on the decisions and predictions made by AI models, enhancing their transparency and trustworthiness.

Explainability in AI, or XAI, is the capacity of an AI system to provide human-understandable explanations for its outputs, decisions, and actions. It's the quest to make AI systems more interpretable, allowing us to understand why a specific prediction was made, what features influenced the decision, and whether the AI model can be trusted.

Explainable AI operates on several principles, each contributing to the overarching goal of transparency. These principles include transparency, interpretability, accountability, and fairness.

  1. Transparency: XAI aims to make the decision-making process of AI models as transparent as possible, providing insight into how and why specific outcomes were reached.
  2. Interpretability: It's not just about revealing the black box; XAI ensures that the explanations are interpretable by humans, enabling users to make sense of AI-driven decisions.
  3. Accountability: XAI promotes accountability by allowing users to trace and audit the decision process, holding AI systems responsible for their actions.
  4. Fairness: Ensuring that AI models are fair and unbiased is a cornerstone of XAI. It helps identify and rectify biases in the data and model to deliver more equitable outcomes.

The advantages of XAI are profound, making it an essential tool in the AI toolkit. Here are some key benefits:

  • Increased Trust: XAI fosters trust in AI systems by demystifying their decisions. This is crucial in domains where reliability and accountability are paramount, such as renewable energy asset management.
  • Improved Decision-Making: XAI provides valuable insights into AI model predictions, empowering users to make informed decisions based on the explanations offered.
  • Diagnosing Errors: In cases of mispredictions or anomalies, XAI allows for error diagnosis, making it easier to identify and rectify issues in AI models.

Imagine a scenario in the renewable energy industry, where wind turbines and solar panels are part of a vast renewable energy hybrid farm (also known as hybridisation). The performance and maintenance of these assets are critical to ensure optimal energy production. AI models play a central role in predicting maintenance needs and optimising performance, but the outcomes are often influenced by a multitude of factors.

Explainable AI can step in to provide clarity. When an AI model predicts that a specific wind turbine requires maintenance, it can also offer a transparent explanation. For example, it might reveal that the prediction is based on a decrease in wind speed, vibrations in the turbine, and irregular energy production patterns. This explanation enables the asset manager to not only trust the AI-driven maintenance recommendation but also understand why that recommendation was made.

In asset management, where the consequences of errors or misjudgements can be substantial, XAI can serve as a guiding hand. By offering interpretable insights into AI model decisions, it enhances the efficiency and accuracy of maintenance operations, ultimately contributing to higher energy output and lower operational costs.

In conclusion, explainability in AI, or XAI, is a transformative force that enables us to demystify AI systems and trust the decisions they make. With principles rooted in transparency, interpretability, accountability, and fairness, XAI enhances trust, improves decision-making, and empowers users to make data-driven choices. As it finds applications in critical sectors like renewable energy asset management, it paves the way for a future where AI operates transparently, reliably, and ethically.

What is data normalisation in machine learning?

At its core, data normalisation is a preprocessing technique that transforms the features of a dataset to a standardised range. In the realm of machine learning, where diverse datasets with varying scales and units are the norm, normalisation becomes a crucial step. This process ensures that no single feature dominates the learning algorithm due to its scale, ultimately leading to a more balanced and effective model.

Consider a dataset encompassing various features with distinct measurement units and scales. Each feature might carry a unique range of values, making direct comparisons challenging. This heterogeneity can potentially skew the learning process, as machine learning algorithms tend to assign more significant importance to features with larger scales.

Data normalisation steps in to bridge this gap. Using statistical methods such as Min-Max scaling or Z-score normalisation, data points are transformed onto a unified scale, typically between 0 and 1, or centred around a mean of 0 with unit standard deviation. Placing every feature on an equal footing in this way prevents any one of them from dominating simply because of its units.
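The two methods just mentioned can be sketched in a few lines of standard-library Python. This is a minimal illustration, not production preprocessing code; the sample temperature values are made up.

```python
# Sketch of Min-Max scaling and Z-score normalisation using only the
# Python standard library.
from statistics import mean, stdev

def min_max_scale(values):
    """Rescale values linearly onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_score_normalise(values):
    """Centre values around mean 0 with unit (sample) standard deviation."""
    mu, sigma = mean(values), stdev(values)
    return [(v - mu) / sigma for v in values]

temps = [12.0, 15.0, 18.0, 30.0]   # illustrative temperatures in °C
print(min_max_scale(temps))        # first value -> 0.0, last -> 1.0
print(z_score_normalise(temps))    # values sum to (approximately) 0
```

Min-Max scaling preserves the shape of the original distribution inside a fixed range, while Z-score normalisation expresses each value as "how many standard deviations from the mean", which is less sensitive to the absolute range of the data.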

Why do we bother with data normalisation in the first place? The benefits extend beyond mere comparability. By bringing all features to a common scale, normalisation speeds up the convergence of many machine learning algorithms. It also prevents certain features from overshadowing others, fostering a more equitable learning process. Additionally, normalisation aids algorithms that rely on distance-based metrics, ensuring that each feature contributes proportionately to the model's understanding.

To demystify the role of data normalisation, let's consider a weather forecast model analysing two crucial features: “Cloud Coverage”, measured on a scale from 0% to 100%, and “Wind Speed”, on a scale from 0 to 30 m/s.

Without normalisation, the algorithm might struggle to discern patterns effectively. “Cloud Coverage” values could range widely, from clear skies to completely overcast, while “Wind Speed” remains constrained between 0 and 30. If the model doesn't account for these disparate scales, it might disproportionately prioritise “Cloud Coverage” in its predictions, overlooking nuanced variations in “Wind Speed”.

Enter data normalisation: by rescaling both “Cloud Coverage” and “Wind Speed” to a common scale, say between 0 and 1, we empower the algorithm to weigh each feature appropriately. Now, it can discern patterns influenced by both cloud coverage fluctuations and wind speed levels without being swayed by their disparate scales. This real-world application showcases how data normalisation plays a pivotal role in fine-tuning models, whether for accurate weather forecasts or for predicting the health of critical assets in a wind farm.
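Applied to this example, Min-Max normalisation is just a linear rescaling per feature. Here the known physical ranges (0–100% and 0–30 m/s) serve as the bounds, and the single observation is made up for illustration:

```python
# Min-max rescaling of the two weather features onto [0, 1],
# using their known physical ranges as the bounds.

def normalise(value, lo, hi):
    """Map a value from the range [lo, hi] onto [0, 1]."""
    return (value - lo) / (hi - lo)

# One illustrative observation: 85% cloud coverage, 12 m/s wind speed.
cloud = normalise(85.0, 0.0, 100.0)   # -> 0.85
wind = normalise(12.0, 0.0, 30.0)     # -> 0.4
print(cloud, wind)
```

After this step, a difference of 0.1 means "one tenth of the feature's full range" for both features, so neither one dominates distance calculations or gradient updates merely because of its units.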

In conclusion, data normalisation transforms complex, disparate datasets into a harmonised form, empowering algorithms to glean enlightening patterns and insights. As we navigate the intricate landscape of machine learning, understanding and implementing data normalisation emerge as a key to unlocking the true potential of our models.

What is an outlier in machine learning?

An outlier is essentially a data point that deviates significantly from the majority of the dataset. Picture a sea of consistent values, and an outlier is that lone surfer riding a wave of distinctiveness. This deviation can manifest in various ways, be it an unusually high or low value, or even a pattern that doesn't fit the general trend.

In other words, think of outliers as those weird data points that stand out like a sore thumb. Imagine you're looking at a list of numbers, and suddenly, there's that one number that's either way too high or way too low compared to the rest. That's an outlier! It's like having a friend who doesn't quite fit in with the group—interesting, but it can mess up group dynamics.

Now, why care about these oddballs? Well, in the world of machine learning, we're all about making predictions based on patterns. Outliers mess with these patterns. They could be mistakes in the data or, sometimes, super important events that need special attention.

Imagine we have a bunch of data about weather forecasts and how much power we expect from our wind turbines. While checking this data, we might notice something odd—a weather prediction that doesn't match the usual trends. Maybe the forecast says there's going to be a hurricane in a desert area. That's our outlier! By catching these oddities, we keep our models in check and ensure they make sensible predictions.
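One common way to catch such oddities automatically is Tukey's 1.5×IQR fence, which flags values far outside the interquartile range and is robust to the outliers themselves. The sketch below applies it to a made-up set of wind-speed forecasts (requires Python 3.8+ for `statistics.quantiles`):

```python
# IQR-based outlier detection using only the standard library;
# the 1.5*IQR fence is Tukey's classic rule of thumb.
from statistics import quantiles

def find_outliers(values, k=1.5):
    """Return values outside the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

wind_forecasts_ms = [7.2, 6.8, 7.5, 7.0, 6.9, 7.3, 48.0]  # 48 m/s is hurricane-force
print(find_outliers(wind_forecasts_ms))   # -> [48.0]
```

Note that with small samples a naive "3 standard deviations from the mean" check can miss exactly this kind of extreme value, because the outlier itself inflates the standard deviation; quartile-based fences avoid that trap.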

In a nutshell, outliers are the mavericks in our data, and understanding them is like learning the secret code to make our machine learning models much smarter.

(this glossary is being continuously updated with new concepts and information - last update: November 30, 2023)