The book Intelligent Technologies for Research and Engineering presents new research findings from academics. This edited anthology covers a range of research topics in science, engineering, and technology across forty-six chapters. Topics include artificial intelligence learning techniques, computerised medical image processing, human-computer interfaces for hand gesture recognition, community energy storage, e-learning, diabetes risk prediction, solar cells, hydrogen fuel cells for cars, and more. So many great engineering achievements have been made in the past century alone that we now generally take them for granted. For most of the globe, technology enables access to a plentiful supply of food and clean drinking water. We depend on electricity for many of our everyday tasks. We can deliver products and services to any location in the world with great ease. Advances in communications and computer technology are creating new avenues for entertainment and information discovery. Even with these remarkable technological accomplishments, many opportunities and challenges remain to be addressed. Some of them seem clear, many are only vaguely understood, and many more are undoubtedly beyond most people's comprehension.
The latest advances in solidification research and the problems the community faces in the 21st century in processing and analysis are presented. On behalf of the editors, we would like to offer our appreciation to everyone who took part in this project. First and foremost, we offer all credit and respect to our almighty Lord for his bountiful grace, which enabled us to finish this book successfully. We next acknowledge and thank the authors, whose excellent contributions are at the core of this edited book; we gratefully congratulate all those involved and wish them great success. We would also like to thank our family and friends for their support and encouragement while we worked on this book. In addition, our special thanks are extended to Bentham Science Publishers and its whole team for helping us publish this work and for providing us with the opportunity to present it to readers.
The content of this book is summarized as follows:
1. Chapter 1 concerns "lifelogging", the practice of documenting an increasing amount of one's everyday experience with the intention of using the recordings in the future as a memory aid or as the foundation for data-driven self-development. The usefulness of the generated lifelogs therefore depends on the lifeloggers' ability to sift through them efficiently. The logs are inherently multi-modal and semi-structured, combining data from a variety of sources, including cameras and other wearable physical and virtual sensors. Expressing the data in a graph structure therefore allows all of the resulting interrelations to be captured effectively. Because it is not feasible to annotate each entry, manually or automatically, with a significant amount of semantic context, alternative methods must be developed to capture the higher-level semantics. The chapter describes Improved Life Graph (ILG), the first method for building a Knowledge Graph-based lifelog representation and retrieval solution, which can capture a lifelog in a graph structure and enrich it with external data to help connect higher-level semantic information.
2. Chapter 2 observes that, as a result of increased competition and a decline in new clients, the global telecommunications sector is suffering a dramatic decline in revenue. Most operators initially spend a significant portion of their income on expansion in order to maintain competitive advantages and attract a large user base. A well-developed Customer Relationship Management (CRM) strategy greatly helps a company improve its selling, marketing, and servicing operations across all client touchpoints. Predicting customers' propensity to leave is a major challenge in CRM: the aim is to identify customers who may be at risk of leaving based on their historical data and behaviour. The study uses data mining methods for accurate churn forecasting. The dataset is first preprocessed using the normalised k-means approach. After preprocessing, attributes are chosen using the minimal Redundancy and Maximum Relevance (mRMR) method, which favours attributes that have low correlation among themselves and a strong relationship with the class (output). Churn is then classified from the selected attributes using a Support Vector Machine with Particle Swarm Optimisation (SVM with PSO), where PSO is used to optimise the SVM's hyperparameters. This helps avoid becoming trapped in a local optimum and improves classification accuracy. The experimental results show that the proposed system outperforms the existing one in terms of processing time.
3. Chapter 3 notes that traditional machine learning techniques (TMLTs), such as multilayer perceptrons (MLPs) and support vector machines (SVMs), have been used effectively for churn prediction in the past, but only after considerable time and energy were spent configuring the training parameters. Choosing appropriate training settings for supervised learning is usually an ad hoc process that relies on experimentation. For churn prediction, deep neural networks (DNNs) have been shown to be much more accurate than TMLTs. Setting the training hyperparameters for DNNs in churn modelling, however, requires more time and effort because of DNNs' more complicated design and their ability to analyse vast volumes of non-linear input data. This creates extra difficulty for novice machine learning practitioners and researchers. Few studies have been conducted to date to determine how various hyperparameters affect DNN performance in churn prediction, and few experimentally derived heuristics exist to help with hyperparameter selection. To better predict customer attrition in the banking sector, this work conducts an experimental analysis of the effect of adjusting DNN hyperparameters. The DNN models outperformed the MLP across three separate experiments, with the DNN models using a rectifier activation function in the hidden layers and the MLP using a sigmoid activation function in the output layer. RMSProp training was more accurate than Adam, AdaGrad, AdaDelta, and Adamax, and also more accurate than stochastic gradient descent (SGD). The DNN performed best when the batch size was smaller than the size of the test set. The study provides heuristic insights that may be useful to academics and practitioners who use DNNs to predict churn from tabular CRM data in the financial services industry.
4. Chapter 4 addresses the use of event logs for process mining. Event logs may include confidential data that cannot be analysed without consent, which prevents process mining. Anonymizing the event log ensures that no individual can be identified from it, and differential privacy provides a formal guarantee of this. A differentially private event log anonymization technique aims to produce a log with high utility together with a privacy guarantee. Existing methods introduce noise into traces by replicating, perturbing, or filtering them. According to research on differential privacy, subsampling before noise injection amplifies privacy and so improves the privacy-utility trade-off. Libra uses this observation to anonymize event logs: it draws several samples of traces from a log, injects noise into each sample separately, retains the statistically significant traces from every sample, and then combines the samples to create a differentially private log. The proposed method yields far better utility than the baselines for identical privacy guarantees.
5. Chapter 5 explains that banks and other financial organisations rely heavily on credit scoring (CS) as a method of risk management, since it is both effective and necessary. It reduces financial risk and gives sound guidance on loan disbursement. As a result, businesses and financial institutions are exploring innovative automated solutions to the CS problem in an effort to safeguard their own resources and those of their clients. Machine learning (ML) and data mining (DM) approaches have led to significant progress in CS prediction in recent years. The chapter develops a novel approach, the Deep Genetic Hierarchical Network of Learners (DGHNL). Support Vector Machines (SVMs), k-Nearest Neighbours (kNNs), Probabilistic Neural Networks (PNNs), and fuzzy systems are among the types of learners that may be used in the proposed method. The model was evaluated on the Statlog German credit approval dataset (1000 instances) from the UCI machine learning repository. The DGHNL model uses five learner types, two feature extraction methods, three kernel functions, and three methods for optimising model parameters. In addition to conventional cross-validation (CV) and train-test (stratified 10-fold) methods, the model employs a genetic layered training approach (selection of learners). The approach is novel in how it combines and shares information, both in the DGHNL architecture and in its optimisation. On the Statlog German credit approval data, the proposed DGHNL model with its 29-layer architecture obtains a prediction accuracy of 94.60% (54 errors per 1000 classifications).
6. Chapter 6 notes that short-term load forecasting is used primarily in control centres, where it serves to investigate shifting consumer load patterns and estimate the load at a future time. It is a crucial tool for building a smart grid. Load is influenced by factors along several dimensions. The chapter presents a hybrid Residual Neural Network (ResNet) and Long Short-Term Memory (LSTM) model for load forecasting, which can better exploit the time-series properties of load data and lead to more reliable predictions. First, the data are reconstructed using numerous feature parameters and fed into the ResNet network for feature extraction. Second, the extracted feature vector is fed into the LSTM network for short-term load forecasting. Finally, the technique is compared with other models on a real example, demonstrating that the proposed combined method has better prediction accuracy and confirming the practicality and benefit of input-parameter feature extraction. The study also examines weather prediction based on a variety of elements and characteristics.
7. Chapter 7 applies a deep learning method, the Convolutional Neural Network (CNN), to text classification, a common natural language processing problem. The rise of Big Data and the increasing complexity of tasks have made CNNs costly to train and use, in both time and money. To get around these problems, the chapter introduces a MapReduce-based CNN that restructures a CNN by breaking it down into a series of smaller networks and training them in parallel. Each of these independent networks analyses a subset of the incoming text.
8. Chapter 8 starts from the well-recognised point that data preparation is necessary to provide reliable data from which mining tools can derive useful insights. Preprocessing methods need to be adapted when dealing with multiple sources of continuous data. The study proposes using active rule-based systems, and more particularly complex event processing (CEP) systems and engines, to help domain experts define and execute preprocessing tasks for data streams. The method's key innovation is that it allows domain experts to manage temporal data easily by formulating preprocessing steps as event detection rules stated in an SQL-like language. The concept is implemented in a freely accessible software package that combines a CEP engine with libraries for web-based data mining. To test the efficacy of the method, the chapter provides three real-world applications in which CEP rules preprocess data streams by incorporating new temporal information, transforming features, and handling missing values. The experiments show that preprocessing tasks can be expressed in a flexible, high-level fashion using CEP rules without incurring excessive memory and time overheads. The resulting data streams not only improve the predictive accuracy of classification algorithms but also simplify the decision models and shorten learning time.
9. Chapter 9 offers a fresh theoretical perspective on interpretable topic modelling. Instead of using words or n-grams as the fundamental units of analysis, as in more traditional methods, this method uses whole sentences. The proposed method is based on clustering sentence embeddings and on probabilistic estimates for sentences within the text corpus. The topic model estimates both the distribution of sentences within topics and the distribution of topics across the text. Since sentences, unlike words, carry complete grammatical and semantic structures, the method allows topics to be interpreted explicitly, and a procedure for doing this automatically is also given. Sentence embeddings are obtained from contextual embeddings built on the BERT model. The method also demonstrates the feasibility of integrating both internal and external knowledge sources in the topic modelling process, allowing for big data processing. In conventional topic modelling methods, the text corpus itself is the internal knowledge source, and it is often the only source. BERT, a machine learning model pre-trained on a massive amount of textual data, serves as the external knowledge source and produces the context-dependent sentence embeddings.
10. Chapter 10 shows that by analyzing the spatial, temporal, and semantic components of geographic data, it is possible to reconstruct users' actual travel itineraries and gain insights into their preferences and behavior. To analyse tourist traffic patterns at A-level scenic spots in Jiangsu and Zhejiang across time and space, this research collects and preprocesses data from Weibo check-ins at these locations. From a temporal perspective, the author examines how the check-in data fluctuated between 2016 and 2018, and how it differed between weekends, holidays, and weekdays. A spatial kernel density analysis of the data reveals the most active regions. Lastly, holiday travel modes and characteristics are identified by examining spatial flows and their directions. The results of this study provide groundwork for the growth of smart tourism.
11. Chapter 11 starts from the observation that the amount of data available online is growing at an exponential rate. The Internet holds a wealth of information that can be mined for specific details on any niche subject, depending on the user's needs. When faced with a bewildering array of options, users often find it difficult to decide which product to purchase online. In this situation, it is crucial to analyse the available data in order to advise consumers on products and to learn from other customers. The job of a recommendation system is to narrow down a list of options based on the user's individual tastes; by efficiently filtering, prioritizing, and conveying the most important information, it mitigates the problem of information overload. The approach relies heavily on several different types of similarity measures. Collaborative filtering is generally agreed to be the most effective method for making specific recommendations to users or providers. Since the user-based collaborative filtering strategy has limitations, the item-based strategy is considered as an alternative. To improve recommendation performance, the chapter analyses the effectiveness of several similarity calculation methods, comparing correlation-based and distance-based similarity measures. The findings are used to design a better technique, which uses statistical accuracy measures to provide the best-informed suggestion possible. They also provide a benchmark for evaluating and comparing similarity measures. The work aims to help readers select suitable distance measures for their datasets and to make it easier to compare and evaluate new similarity measures against established ones.
12. Chapter 12 notes that recommender systems have quickly become an integral part of people's everyday digital lives, as they are present on virtually every online service today. Modern deep learning-based models perform at their best only when fed a massive amount of data. Several domains, including Amazon products, restaurants, and breweries, offer datasets that meet this condition. The hotel domain has seen relatively few advances and datasets, with even the largest review dataset numbering in the hundreds of thousands of reviews rather than millions. Traditional collaborative-filtering methods are also difficult to apply to the hotel domain because its data are sparser than in standard recommendation datasets. The chapter presents HotelRec, a very large-scale hotel recommendation dataset derived from TripAdvisor and comprising 50 million reviews. To the best of the authors' knowledge, HotelRec is the largest dataset in the hotel domain (50M vs 0.9M reviews) and the largest single-domain recommendation dataset with textual reviews (50M vs 22M).
13. Chapter 13 notes that in recent years the field of Recommender Systems (RS) research has expanded to include a broad range of AI methods, from simpler ones like Matrix Factorization (MF) to more advanced ones like Deep Neural Networks (DNN). Because they only consider a linear combination of user and item vectors, traditional Collaborative Filtering (CF) recommendation algorithms like MF have limited learning capacity. Neural Collaborative Filtering (NCF) combines DNNs with CF to learn non-linear relationships. CF approaches still have issues with cold start and data sparsity, though. NeuMF is a robust NCF architecture that combines Generalized Matrix Factorization (GMF) with Multilayer Perceptrons (MLP); by combining the linearity of GMF with the nonlinearity of MLP, it achieves state-of-the-art results. To improve recommendation accuracy and to address cold start and data sparsity, this research proposes a new hybrid RS, Neural Matrix Factorization++ (NeuMF++), which improves upon NeuMF by adding Stacked Denoising Autoencoders (SDAE) for a more accurate latent representation. Integrating these latent representations into the original GMF and MLP yields GMF++ and MLP++, which are combined to form NeuMF++. The latent representation obtained from the SDAEs' latent space greatly improves NeuMF++'s learning capacity and enables it to learn user and item characteristics accurately. NeuMF++'s performance may suffer, however, if GMF++ and MLP++ are forced to share feature extraction; allowing GMF++ and MLP++ to learn features independently increases their adaptability and markedly improves performance. In experiments on a real-world dataset, NeuMF++ obtained a root-mean-square error of 0.8681.
Additional data, such as text or photos, can be added to NeuMF++ in later development. NeuMF++ allows for the incorporation of several neural network building elements to create a more robust recommendation model.
14. Chapter 14 shows that finding relevant information online has become increasingly difficult as the amount of data available on the internet has grown exponentially. In data-dense, complex domains, a recommendation system can be a great aid to users in making decisions. Several approaches to recommendation have been proposed, of which collaborative filtering is among the most common. The cold-start problem is one of the remaining issues with collaborative filtering approaches. To address this issue, the chapter offers a movie recommendation system that combines social network analysis and collaborative filtering. User attributes such as age, gender, and profession are used to generate a connection matrix, and community detection based on edge betweenness centrality is then applied to that matrix to cluster users. The system then recommends movies to new members based on the preferences of the existing users in their group. The superiority of the proposed technique is demonstrated using mean absolute error (MAE).
15. Chapter 15 shows that the financial and emotional toll of cardiovascular disease is growing. In response, the authors created a model for predicting comprehensive healthcare resource use (Adherence Score for Healthcare Resource Outcome, ASHRO), which incorporates patient health behaviors, and investigated its relationship with clinical outcomes, with the goal of improving the economy as a whole and the quality of the healthcare system. The investigation used data from a large database of health insurance claims, long-term care insurance, and health checkups. Patients admitted to hospital with cardiovascular conditions (ICD-10 I00-I99) constituted the study population. The objective variable was medical and long-term care expenses, while the explanatory variable was a broadly defined composite adherence measure. Multiple regression analysis and random forest learning (AI) were used to calibrate predictive models, which were then used to generate ASHRO scores. The prediction model's discrimination and calibration were measured using the area under the curve and the Hosmer-Lemeshow test, respectively. Over a 48-month follow-up, propensity score matching was used to compare overall mortality between the two groups defined by the ASHRO 50% cut-off after adjusting for clinical risk variables. Of a total sample of 48,456 patients with cardiovascular disease, 61.9% were men, and the mean age at hospital discharge was 68.3 ± 9.9 years. Machine learning was used to combine eight factors (secondary prevention, rehabilitation intensity, guidance, proportion of days covered, overlapping outpatient visits/clinical laboratory and physiological tests, medical attendance, and generic drug rate) into a single index that served as the adherence score classification. The total coefficient of determination from the multiple regression analysis was 0.313 (p < 0.001).
The total coefficient of determination in a logistic regression analysis with 50% and 25%/75% cut-off values for medical and long-term care expenses was statistically significant (p < 0.001). There was a statistically significant association between ASHRO score and mortality at the 50% cut-off (2% vs. 7%; p < 0.001).
16. Chapter 16 examines how e-commerce and web-based businesses might benefit from the use of Big Data Analytics in managerial decision making. The data used in the analysis come from a single e-commerce website's database, which records user interactions with the website, such as page views, products added to the basket, and online purchases. Several algorithms are used to evaluate the dataset, including the Apriori association rule mining algorithm, K-Means clustering, and Pearson's correlation coefficient. The analysis provides insights into users' interactions with the website and identifies patterns that may inform future actions: for instance, which product receives the most attention and sales, how many pages are viewed before a purchase is made, and what percentage of customers buy the product again. The study also considers whether the company's current Big Data implementation can be enhanced and whether doing so would be a good use of resources.
17. Chapter 17 presents a new method that uses a deep artificial neural network (ANN; i.e. deep learning) to simulate and reconstruct annual surface mass balance (SMB) data across glaciers. The technique is included as the SMB component of an open-source regional glacier evolution model. While conventional glacier models increasingly include physical processes, the authors instead use data science to create a parameterized model. Annual glacier-wide SMBs can be modelled from topo-climatic variables using either deep learning or Lasso (least absolute shrinkage and selection operator; regularized multilinear regression), while a glacier-specific parameterization is used to update the glacier's geometry. On a dataset of 32 French Alpine glaciers, the nonlinear deep learning SMB model is evaluated and cross-validated against typical linear statistical techniques. Results show that deep learning is superior to the linear approaches, with an estimated r² of 0.77 and a root-mean-square error (RMSE) of 0.51 m w.e., thanks to increased accuracy (up to +47% in space and +58% in time) and explained variance (up to +64% in space and +108% in time). The temporal dimension accounts for around 35% of the nonlinear behavior captured by deep learning. The main uncertainties in the evolution of glacier geometry stem from the initial ice thickness data. These findings support the application of deep learning in glacier modeling as a powerful nonlinear tool for reconstructing or simulating SMB time series for individual glaciers across a region for past and future climates.
18. Chapter 18 notes that remote sensing technology is advancing rapidly, so access to remote sensing data is better than ever before. The age of big data is here, and data collected by remote sensing exhibit the hallmarks of Big Data, including hyperspectral features, high spatial resolution, and high temporal resolution. Using geographical feature and remote sensing data, this work provides a feature-supporting, scalable, and efficient data cube for time-series analysis and conducts a comparative assessment of water cover and vegetation change. The study defines a spatial-feature remote sensing data cube (SRSDC), whose purpose is to offer a fast, flexible, and scalable way to analyze massive amounts of remote sensing (RS) data using spatial features. The chapter gives an overview of the SRSDC's structure. The SRSDC provides feature translation, which transforms spatial feature information into query operations, and spatial feature repositories, which store and manage vector feature data. The chapter explains how a feature data cube and a distributed execution engine were developed for use in the SRSDC, and evaluates both using the production process and long-term remote sensing analysis as examples. Big data has become a new strategic resource in the knowledge economy, and data analysis methods, including supervised, unsupervised, and hybrid approaches, are the backbone of knowledge discovery techniques.
19. Chapter 19 provides suggestions for integrated, cutting-edge, and efficient tools, methodologies, and technologies for accessing and processing ever-growing volumes of data in diverse fields. Personalizing a patient's care is a challenging task that requires the doctor to sift through and make sense of massive volumes of data. The scientific community behind precision medicine might benefit greatly from a unified system that facilitates data discovery, integration, preprocessing, model construction, storage, analysis, and visualization. The software package described provides researchers with a simple, quick, and adaptable method for processing data, with the ultimate goal of enabling intelligent management, analysis, and visualization of massive genomic data. Researchers can use the services, data sets, and databases provided, or supply their own data for processing.
20. Chapter 20 shows that, as a result of increased competition and a decline in new clients, the global telecommunications sector is suffering a dramatic decline in revenue. Most operators initially spend a significant portion of their income on expansion in order to maintain competitive advantages and attract a large user base. A well-developed Customer Relationship Management (CRM) strategy greatly helps a company enhance its selling, marketing, and servicing operations across all client touchpoints. Predicting customers' propensity to leave is a major challenge in CRM: the aim is to identify customers who may be at risk of leaving based on their historical data and behaviour. The study uses data mining methods for accurate churn forecasting. The dataset is first preprocessed using the normalized k-means approach. After preprocessing, attributes are chosen using the minimal Redundancy and Maximum Relevance (mRMR) method, which favours attributes that have low correlation among themselves and a strong relationship with the class (output). Churn is then classified from the selected attributes using a Support Vector Machine with Particle Swarm Optimization (SVM with PSO), where PSO is used to optimise the SVM's hyperparameters. This helps avoid becoming trapped in a local optimum and improves classification accuracy. The experimental results show that the proposed system is superior to the existing one, including in processing time.
21. Chapter 21 notes that traditional machine learning techniques (TMLTs), such as multilayer perceptrons (MLPs) and support vector machines (SVMs), have been used effectively for churn prediction in the past, but only after considerable time and energy were spent configuring the training parameters. Choosing appropriate training settings for supervised learning is usually an ad hoc process that relies on experimentation. For churn prediction, deep neural networks (DNNs) have been shown to be much more accurate than TMLTs. Setting the training hyperparameters for DNNs in churn modelling, however, requires more time and effort because of DNNs' more complicated design and their ability to analyse vast volumes of non-linear input data. This creates extra difficulty for novice machine learning practitioners and researchers. Few studies have been conducted so far to determine how various hyperparameters affect DNN performance in churn prediction, and few experimentally derived heuristics exist to help with hyperparameter selection. To better predict customer attrition in the banking sector, this work conducts an experimental analysis of the effect of adjusting DNN hyperparameters. The DNN models outperformed the MLP across three separate experiments, with the DNN models using a rectifier activation function in the hidden layers and the MLP using a sigmoid in the output layer. RMSProp training was more accurate than Adam, AdaGrad, AdaDelta, and Adamax, and also more accurate than stochastic gradient descent (SGD). The DNN performed best when the batch size was smaller than the size of the test set. The study provides heuristic information that may be useful to academics and practitioners who use DNNs to predict churn from tabular CRM data in the financial services industry.
22. Chapter 22 offers a fresh theoretical perspective on interpretable topic modelling. Instead of using words or n-grams as the fundamental units of analysis, as in more traditional methods, this method employs whole sentences. The proposed method is based on clustering sentence embeddings and on probabilistic evaluation of sentences within the text corpus. The topic model estimates both the distribution of sentences within topics and the distribution of topics throughout the text. Since sentences, unlike words, are more meaningful and carry complete grammatical and semantic structures, the method allows topics to be interpreted explicitly, and a procedure for doing this automatically is also given. Sentence embeddings are obtained from contextual embeddings built on the BERT model. The method also demonstrates the feasibility of integrating both internal and external knowledge sources into the topic modelling process, enabling big data processing. In conventional topic modelling methods, the text corpus itself serves as the internal knowledge source, and it is often the only one. BERT, a machine learning model pretrained on a massive quantity of textual data, serves as the external knowledge source and produces the context-dependent sentence embeddings.
23. Chapter 23 observes that the day-to-day activities of millions of people around the globe have been significantly altered by the meteoric rise of social media and online social networks such as Facebook, Twitter, Instagram, and TikTok. The ease with which data can be gathered and analyzed, together with the high degree of social and financial interest in doing so, has attracted a broad range of research fields and increased the attention paid to research in these areas. In the approach described, each agent is given a decentralized control mechanism that enables it to communicate, draw conclusions from its discussions, and learn from other agents. Using a network topology, this method makes it easier for dynamic agent organizations to alter the geometry of agent interactions to suit the situation at hand.
24. Chapter 24 shows that the number of individuals using social networking and microblogging sites has increased dramatically in recent years, providing an interesting window into the perceptions of this era. People's opinions may be gauged in large part by looking at user reviews of products, companies, brands, individuals, forums, films, and so on. Analysts saw a need to automate the categorisation of reviews into positive and negative classes and developed algorithms to do so; this automatic categorisation process is known as sentiment analysis. The main objective of this study is to apply the Support Vector Machine (SVM) algorithm to sentiment and text classification of product reviews, with the aim of improving their categorisation, through an in-depth analysis of several datasets used for this purpose. The SVM is trained, tested, and simulated on a variety of datasets to determine the polarity of ambiguous sentiments or reviews. Among the classification algorithms considered, the SVM produces the highest accuracy (89.98%) from the outset, and including additional sentence forms would further improve this accuracy. The performance of the models generated by the SVM learning algorithm is evaluated, and the results establish that the SVM is a reliable, highly accurate, and powerful classification method.
25. Chapter 25 starts from the well-recognised point that data preparation is necessary to provide reliable data from which mining tools can derive useful insights. Preprocessing methods need to be adapted when dealing with multiple sources of continuous data. This study suggests using active rule-based systems, particularly complex event processing (CEP) systems and engines, to help domain experts define and execute preprocessing tasks for data streams. The key innovation of the method is that it allows domain experts to manage temporal data easily by formulating preprocessing steps as event detection rules written in a SQL-like language. This concept is implemented in a freely accessible software package that combines a CEP engine with libraries for online data mining. To test the efficacy of the method, three real-world applications are presented in which CEP rules preprocess data streams by incorporating new temporal information, transforming features, and handling missing values. The experiments show that preprocessing tasks can be expressed in a flexible, high-level manner using CEP rules without incurring excessive memory or time overhead. The resulting data streams not only enhance the predictive accuracy of classification algorithms but also simplify decision models and shorten learning time.
26. Chapter 26 observes that call records reveal client interest in various businesses. The dependence of multi-dimensional attributes on the day and time of communication may help targeted advertising, and frequent and extensive links between services suggest that consumers of one service may be prospects for another. Multi-granulation rough sets are used to discover prospects from the interest characteristics in call records. Conventional intra-pattern and inter-pattern mining methods involve a large amount of processing and a vast space of statistically irrelevant patterns; the proposed approach addresses these difficulties. Using one month of anonymised call data from a Thai telecom service provider, the method produces target audiences for food and restaurant businesses and confirms some interesting mathematical properties of knowledge systems.
27. Chapter 27 notes that remote sensing technology is advancing so quickly that access to remote sensing data is better than ever before. The age of big data is here. Data collected using remote sensing exhibits hallmarks of big data, including hyperspectral features, high spatial resolution, and high temporal resolution. Using geographical feature data and remote sensing data, this work offers a feature-supporting, scalable, and efficient data cube for time-series analysis applications, and conducts a comparative assessment of water cover and vegetation change. The study defines a remote sensing data cube with a focus on spatial features (SRSDC). The purpose of this data cube is to offer a fast, flexible, and scalable way to analyze massive amounts of remote sensing data using spatial features. The chapter gives a general summary of the SRSDC's structure. The SRSDC provides feature translation, which transforms spatial feature information into query operations, and spatial feature repositories, which store and manage vector feature data. The chapter explains how a feature data cube and a distributed execution engine were developed for use in the SRSDC, and evaluates both using the production process and the analysis of long-term remote sensing data as examples. As a new strategic resource for humanity, big data has risen to prominence in the knowledge economy. Supervised learning methods, unsupervised learning methods, and their combinations and variants are the backbone of knowledge discovery techniques.
28. Chapter 28 provides a new method, based on a deep artificial neural network (ANN; i.e. deep learning), to simulate and reconstruct annual surface mass balance (SMB) data across glaciers. An open-source regional glacier evolution model now includes this technique as its SMB component. While conventional glacier models increasingly include physical processes, this work instead uses data science to create a parameterized model. Deep learning or Lasso (least absolute shrinkage and selection operator; regularized multilinear regression) can be used to model annual glacier-wide SMBs from topoclimatic variables, while a glacier-specific parameterization updates the glacier's geometry. On a dataset of 32 French Alpine glaciers, the nonlinear deep learning SMB model is evaluated and cross-validated against other typical linear statistical techniques. Results show that deep learning is superior to linear approaches, with increased accuracy (up to +47% in space and +58% in time) and explained variance (up to +64% in space and +108% in time), giving an estimated r² of 0.77 and a root-mean-square error (RMSE) of 0.51 m w.e. The temporal dimension accounts for around 35% of the nonlinear behavior captured by deep learning. The main uncertainties in the evolution of glacier geometry stem from the initial ice thickness data. These findings support the application of deep learning in glacier modeling as a powerful nonlinear tool for reconstructing or simulating SMB time series for individual glaciers across a region for past and future climates.
29. Chapter 29 discusses a major problem in clinical data analysis and knowledge discovery: providing integrated, cutting-edge, and efficient tools, methodologies, and technologies for accessing and processing rapidly growing volumes of data in diverse forms. Personalizing a patient's care is a challenging task that requires the doctor to sift through and make sense of massive volumes of data. The precision medicine research community might benefit greatly from a unified system that facilitates data discovery, integration, preprocessing, model construction, storage, analysis, and visualization. The software package described provides researchers with a simple, quick, and adaptable method for processing data, with the ultimate goal of enabling intelligent management, analysis, and visualization of massive genomic data. Researchers can use the services, data sets, and databases provided, or supply their own data for processing.
30. Chapter 30 shows that the abundance of tourism destinations, and of online and social media information about them, has made choosing among them difficult. Tourists find tourism recommendation systems attractive, but designers must be able to deliver personalised services. This study proposes a personalised tourism recommendation system that extracts user preferences, for which user reviews on tourism social networks are a rich resource. To identify visitor preferences, reviews are preprocessed, semantically clustered, and analysed for sentiment. The characteristics of attractions are extracted from all user reviews. The proposed system then semantically matches user preferences with attraction attributes to suggest the most relevant attractions. It also filters out undesirable items and refines recommendations based on time, location, and weather. The recommendation algorithm is implemented in Python and tested using TripAdvisor data, and the proposed system improves the F-measure.
31. Chapter 31 notes that short-term load forecasting is used primarily in control centres, where it serves to investigate shifting consumer load patterns and estimate the value of the load at a future time. It is a crucial tool for building a smart grid. The load is influenced by factors along several dimensions. This research presents a hybrid Residual Neural Network (ResNet) and Long Short-Term Memory (LSTM) model for load forecasting, which can better exploit the time-series properties of load data and lead to more reliable predictions. First, the data are reconstructed using numerous feature parameters and fed into the ResNet network for feature extraction. Second, the extracted feature vector is fed into the LSTM network for short-term load forecasting. Finally, the technique is compared with other models on a real example, demonstrating that the proposed combined method has better prediction accuracy and confirming the practicality and superiority of feature extraction from the input parameters. Additionally, this study conducts weather prediction studies based on a variety of elements and characteristics.
32. Chapter 32 suggests that data mining has become an increasingly significant method of data analysis as a result of the rapid growth of the databases used by many contemporary businesses. The operations research community has made major contributions to this discipline, particularly by formulating and solving a large number of data mining problems as optimization problems. Conversely, data mining techniques may be used to solve a number of problems studied in operations research. The chapter offers an overview of the relationship between operations research and data mining. Its basic objectives are to highlight the spectrum of interactions between the two areas, present specific instances of significant research effort, and provide extensive references to additional significant work in the area. It examines not only the many optimization techniques that may be used for data mining, but also the process of data mining itself and the ways in which operations research techniques can be used at almost every stage of this process. The chapter also identifies many potentially fruitful avenues for further investigation. Its last part investigates applications connected to the management of electronic services, including customer relationship management and personalization.
33. Chapter 33 shows that in recent years the field of Recommender Systems (RS) research has expanded to include a broad range of AI methods, from simpler ones like Matrix Factorization (MF) to more advanced ones like Deep Neural Networks (DNN). Because they only take into account a linear combination of user and item vectors, traditional Collaborative Filtering (CF) recommendation algorithms like MF have limited learning potential. Neural collaborative filtering (NCF) uses DNNs in combination with CF to learn non-linear correlations. CF approaches still have issues with cold start and data sparsity, though. To increase recommendation accuracy, deal with cold starts, and mitigate data sparsity, this research offers a new hybrid RS called Neural Matrix Factorization++ (NeuMF++). By combining Generalized Matrix Factorization (GMF) with Multilayer Perceptrons (MLP), NeuMF provides a robust NCF architecture; by combining the linearity of GMF with the non-linearity of MLP, it is able to produce state-of-the-art results. NeuMF++ improves upon NeuMF by adding Stacked Denoising Autoencoders (SDAE) for more accurate latent representation. GMF++ and MLP++ were developed by integrating these latent representations into the original GMF and MLP, and the two can be combined to form NeuMF++. NeuMF++'s learning capacity is greatly improved by the latent representation it obtains from the SDAEs' latent space, which enables it to accurately learn user and item characteristics. However, NeuMF++'s performance may suffer if GMF++ and MLP++ are forced to share feature extractions. Consequently, enabling GMF++ and MLP++ to learn features independently increases its adaptability and dramatically boosts its performance. The experimental root-mean-square error of 0.8681 obtained by NeuMF++ on a real-world dataset shows that it performs exceptionally well. Additional data, such as text or photos, can be added to NeuMF++ in later development, and NeuMF++ allows several neural network building blocks to be incorporated to create a more robust recommendation model.
34. Chapter 34 examines how e-commerce and web-based businesses might benefit from the use of Big Data Analytics in managerial decision making. The data used in this analysis come from a single e-commerce website's database, which records user interactions with the website, such as page views, product additions, and online purchases. The Apriori association rule mining algorithm, K-Means clustering, and Pearson's correlation coefficient are only a few of the algorithms used to evaluate the dataset. The information is analyzed to provide insights into users' interactions with the website, allowing for the identification of patterns that might inform future actions: for instance, which product receives the most attention and sales, how many pages are viewed before a purchase is made, and what percentage of customers buy the product again. The study also considers whether the company's present Big Data implementation can be enhanced, and whether doing so would be a smart use of resources.
35. Chapter 35 notes that different e-learning recommendation strategies that benefit both students and teachers have been developed in recent years. In these settings, students and teachers need individualized instruction through online learning systems tailored to their specific needs. This study employs a clustering technique based on a divide-and-conquer strategy to design a smart recommender that can automatically adjust to the needs, preferences, and skill levels of each individual learner. The recommender is self-learning and automatically analyses learner preferences and features. Using the divide-and-conquer approach, several learning modalities are grouped together for analysis. The proposed cluster-based linear pattern mining approach is used to extract the learners' functional patterns, and the system then makes suggestions by considering the ratings of frequent patterns. The proposed model offered critical learning tasks to learners based on their learning style, interest categorization, and talent traits, and it was tested on a variety of learner groups and datasets. In experiments, the cluster-based recommender increased recommendation performance, with learners finishing more lessons than those in the group without a recommender. Over 65% of the students used all evaluation criteria while assessing the recommendation tool. The simulation results showed that the recommender achieved higher metric values for larger groups of learners, and there were statistically significant variations in the assessed measures when the number of students exceeded 1000. A computational framework that varied |L| (the size of the recommendation list) and the characteristics of the students was used to identify the reasons for the observed discrepancies. The students had similarly positive reactions to the recommender's precision and speed. For the sample dataset studied, the mean and standard deviation of the Recall (List, User) and Ranking Score (User) measurements differ significantly from those of other approaches. The developed strategy outperformed competing strategies on all relevant criteria, and, in contrast to many well-known approaches, the recommender achieves the lowest mean absolute error across all clusters.
36. Chapter 36 shows that by analysing the spatial, temporal, and semantic components of geographic data, it is possible to reconstruct users' real travel itineraries and gain insights into their preferences and behavior. To analyse tourist traffic patterns at A-level scenic spots in Jiangsu and Zhejiang across time and space, this research collects and preprocesses Weibo check-in data for these locations. From a temporal perspective, the author examines how check-in data fluctuated between 2016 and 2018 and how it differed between weekends, holidays, and weekdays. The acquired data were subjected to a spatial kernel density analysis, which revealed the most active regions. Lastly, holiday travel modes and characteristics were uncovered by examining spatial flows and flow directions. The results of this study provide the groundwork for the growth of smart tourism.
37. Chapter 37 shows that a user's interests and goals can be inferred with a Recommendation System (RS), a technology that employs knowledge discovery methods. Analysing user intent and triggering the corresponding suggestions become increasingly challenging in the face of exponential data growth. This work proposes a Deep Knowledge Graph (DKG) approach, which employs a Deep Convolutional Neural Network (DCNN), to perform the necessary data analysis and construct the RS. The proposed DKG explicitly models the high-order connectivities of the knowledge graph (KG) in an end-to-end fashion. It recursively propagates the embeddings from a node's neighbours, which can be users, items, or attributes, using an attention mechanism to assess the relative significance of the neighbours. Conceptually, DKG has an advantage over existing KG-based recommendation methods since it does not rely on regularization or an explicit representation of high-order relations. Empirical results on public benchmarks show that KGAT outperforms state-of-the-art methods like RippleNet and Neural FM. Further studies show the benefits of the attention mechanism for interpretability, as well as the efficacy of embedding propagation for high-order relation modeling.
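As an illustration of the mRMR attribute selection summarised for Chapter 20, the following is a minimal sketch and not the chapter's implementation. mRMR is usually defined with mutual information; here absolute Pearson correlation is swapped in as a simple stand-in for both relevance and redundancy, and the relevance-minus-redundancy scoring is one common variant. The function names and the toy churn data are hypothetical.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def mrmr_select(features, target, k):
    """Greedily pick up to k feature names by relevance minus mean redundancy.

    features: dict mapping a feature name to its list of values.
    target:   list of class labels encoded as numbers.
    Assumption: |Pearson correlation| stands in for mutual information.
    """
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}
    selected = []
    remaining = set(features)
    while remaining and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            # Redundancy: average correlation with the features already chosen.
            redundancy = mean(
                abs(pearson(features[name], features[s])) for s in selected
            )
            return relevance[name] - redundancy

        # Sorting makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy churn data: names and values are illustrative only.
toy = {
    "calls": [1, 2, 3, 4, 5, 6],
    "calls_copy": [1, 2, 3, 4, 5, 6],  # exact duplicate of "calls"
    "complaints": [0, 1, 0, 1, 0, 1],
}
churned = [0, 0, 0, 1, 1, 1]
print(mrmr_select(toy, churned, 2))
```

In this toy case the duplicated attribute is skipped in the second round even though it is highly relevant on its own, because it is fully redundant with the attribute already selected.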
S. Kannadhasan
Head of the Department
Department of Electronics and Communication Engineering
Study World College of Engineering
Coimbatore, Tamilnadu- 641105
R. Nagarajan
Department of Electrical and Electronics Engineering,
Gnanamani College of Technology, Namakkal, Tamilnadu, India
&
Kaushik Pal
Laboratório de Biopolímeros e Sensores
Instituto de Macromoléculas
Universidade Federal do Rio de Janeiro (LABIOS/IMA/UFRJ)
Rio de Janeiro, RJ- 21941-901
Brazil