2023 Special Issue 

Gergely Kovásznai and Imre Varga
Special Issue on Applied Informatics 
In today's dynamic and ever-evolving digital landscape, applied informatics plays a pivotal role in shaping the future of technology. From refining algorithms for enhanced data analysis to optimizing communication networks and advancing artificial intelligence, applied informatics continues to drive innovation and transformation across industries. This Special Issue features contributions from the 12th International Conference on Applied Informatics (ICAI 2023), which was held in Eger, Hungary, on March 2-4, 2023. These research papers explore novel insights, innovative methodologies, and practical applications within the field of computer science and informatics. Each represents a valuable contribution to applied informatics and offers insights that bridge the gap between theory and practical application. Together they are a testament to the diversity and dynamism of our field, showcasing a wide range of research topics and applications.




Balázs Szalontai, Péter Bereczky and Dániel Horpácsi
Deep Learning-Based Refactoring with Formally Verified Training Data 
Refactoring source code has always been an active area of research. Since the rise of various deep learning methods, there have been several attempts to perform source code transformation with neural networks. More specifically, Encoder-Decoder architectures have been used to transform code in a way similar to a Neural Machine Translation task. In this paper, we present a deep learning-based method to refactor source code, which we have prototyped for Erlang. Our method has two major components: a localizer and a refactoring component. That is, we first localize the snippet to be refactored using a recurrent network, then we generate an alternative with a Sequence-to-Sequence architecture. Our method could be used as an extension of existing AST-based approaches to refactoring, since it is capable of transforming syntactically incomplete code. We train our models on automatically generated data sets, based on formally verified refactoring definitions and attribute grammar-based sampling.

DOI:  10.36244/ICJ.2023.5.1


István Fazekas, László Fórián, and Attila Barta
Deep Learning from Noisy Labels with Some Adjustments of a Recent Method 
In this paper, we use JoCoR, a fairly recent method for learning with label noise, which trains two neural networks with a joint loss function that includes an additional contrastive loss to increase the agreement between them. This method can be extended to more than two networks in a straightforward way. We have carried out experiments on the CIFAR-10 and CIFAR-100 datasets (contaminated by synthetic label noise) with this kind of extension using several contrastive losses. We conclude that adding a third network yields a significant improvement, especially when Kullback-Leibler terms are used for all possible pairs of softmax outputs. Adding further networks also brings some improvement, but on the CIFAR datasets these gains were less significant, except perhaps in the cases with a lower ratio of label noise.
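
As a rough illustration of the extension described in the abstract above, the following Python sketch computes a per-sample joint loss for any number of networks: the sum of their cross-entropy losses plus Kullback-Leibler terms over all ordered pairs of softmax outputs. It is not the authors' implementation; the function names, the weight `lam`, and the way the two parts are combined are assumptions made for illustration.

```python
import numpy as np


def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def kl(p, q, eps=1e-12):
    # Per-sample KL(p || q) for batches of probability vectors.
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)


def joint_loss(logits_list, labels, lam=0.5):
    """Per-sample joint loss for any number of networks (illustrative).

    logits_list: list of (batch, classes) arrays, one per network.
    labels: (batch,) integer class labels, possibly noisy.
    lam: hypothetical weight balancing supervision and agreement.
    """
    probs = [softmax(l) for l in logits_list]
    idx = np.arange(len(labels))
    # Supervised part: sum of the cross-entropies of all networks.
    ce = sum(-np.log(p[idx, labels] + 1e-12) for p in probs)
    # Agreement part: KL terms for all ordered pairs of softmax outputs.
    agree = sum(
        kl(p, q)
        for i, p in enumerate(probs)
        for j, q in enumerate(probs)
        if i != j
    )
    return (1 - lam) * ce + lam * agree
```

With two networks the agreement part reduces to a symmetric KL divergence between the two outputs; with three networks it covers all three pairs, which corresponds to the "all possible pairs" variant mentioned above.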

DOI:  10.36244/ICJ.2023.5.2



László Kovács, Erika Baksáné Varga, and Péter Mileff
Application of Neural Network Tools in Process Mining 
Dominant current technologies in process mining use schema induction approaches based on graph and automaton methods. The paper investigates the application of neural network approaches in schema induction, focusing on three alternative architectures: MLP, CNN and LSTM networks. The proposed neural network models can be used to discover XOR, loop and parallel execution templates. In the case of loop detection, the performed test analyses show the dominance of the CNN approach, where the string is represented with a two-dimensional similarity matrix. The usability of the proposed approach is demonstrated with test examples.
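
To make the two-dimensional representation mentioned above more concrete, here is a minimal Python sketch of one possible similarity matrix for an event string. The abstract does not define the matrix, so the equality-based definition and the helper names below are assumptions. A repeated block of events shows up as runs of matches on off-diagonals, which is the kind of image-like pattern a CNN could pick up.

```python
def similarity_matrix(trace):
    """Return M with M[i][j] = 1 if events i and j are equal, else 0.

    Equality-based similarity is an assumption made for illustration.
    """
    return [[1 if a == b else 0 for b in trace] for a in trace]


def diagonal_matches(matrix):
    """Count the matches on each off-diagonal (offset k >= 1)."""
    n = len(matrix)
    return {
        k: sum(matrix[i][i + k] for i in range(n - k))
        for k in range(1, n)
    }


# Hypothetical trace in which the block "bc" is executed three times.
trace = "abcbcbcd"
matrix = similarity_matrix(trace)
matches = diagonal_matches(matrix)
```

In this example the off-diagonal at offset 2 carries the most matches, reflecting a loop body of length two.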

DOI:  10.36244/ICJ.2023.5.3



Adél Bajcsi, Anna Bajcsi, Szabolcs Pável, Ábel Portik, Csanád Sándor, Annamária Szenkovits, Orsolya Vas, Zalán Bodó, and Lehel Csató
Comparative Study of Interpretable Image Classification Models 
Explainable models in machine learning are increasingly popular due to the interpretability-favoring architectural features that help human understanding and interpretation of the decisions made by the model. Although using this type of model – similarly to “robustification” – might degrade prediction accuracy, a better understanding of decisions can greatly aid in the root cause analysis of failures of complex models, like deep neural networks. In this work, we experimentally compare three self-explainable image classification models on two datasets, MNIST and BDD100K, briefly describing their operation and highlighting their characteristics. We evaluate the backbone models to be able to observe the level of deterioration of the prediction accuracy due to the interpretable module introduced, if any. To improve one of the models studied, we propose modifications to the loss function for learning and suggest a framework for automatic assessment of interpretability by examining the linear separability of the prototypes obtained.

DOI:  10.36244/ICJ.2023.5.4



Tamas Nyiri and Attila Kiss
What Can We Learn from Small Data 
Over the past decade, deep learning has profoundly transformed the landscape of science and technology, from refining advertising algorithms to pioneering self-driving vehicles. While advancements in computational capabilities have fueled this evolution, the consistent availability of high-quality training data is less of a given. In this work, the authors aim to provide a bird’s eye view of topics pertaining to small data scenarios, that is, scenarios in which the data available for supervised learning is of less than desirable quality and quantity. We provide an overview of a set of challenges and proposed solutions, and at the end tie them together with practical guidelines on which techniques are useful in specific real-world scenarios.

DOI:  10.36244/ICJ.2023.5.5



Mohammed Nsaif, Gergely Kovásznai, Ali Malik, and Ruairí de Fréin
Survey of Routing Techniques-Based Optimization of Energy Consumption in SD-DCN 
The increasing power consumption of Data Center Networks (DCN) is becoming a major concern for network operators. The objective of this paper is to provide a survey of state-of-the-art methods for reducing energy consumption via (1) enhanced scheduling and (2) enhanced aggregation of traffic flows using Software-Defined Networks (SDN), focusing on the advantages and disadvantages of these approaches. We address a gap in the literature, namely the lack of a review of SDN-based energy saving techniques, and discuss the limitations of multi-controller solutions in terms of constraints on their performance. The main finding of this survey paper is that the two classes of SDN-based methods, scheduling and flow aggregation, significantly reduce energy consumption in DCNs. We also suggest that Machine Learning has the potential to further improve these classes of solutions and argue that hybrid ML-based solutions are the next frontier for the field. The perspective gained from this analysis is that advanced ML-based solutions and multi-controller-based solutions may address the limitations of the state-of-the-art, and should be further explored for energy optimization in DCNs.

DOI:  10.36244/ICJ.2023.5.6



Djamila Talbi and Zoltan Gal
Decomposition Based Congestion Analysis of the Communication in B5G/6G TeraHertz High-Speed Networks 
The New MAC mechanism plays a key role in meeting the requirements of the B5G/6G radio technology and helps to avoid high-speed frequency issues and limitations. With the help of the ns-3 simulator, we generated 42 different cases in order to analyze the impact of the network load on the overall effective transmission rate. Applying the data-adaptive Empirical Mode Decomposition (EMD) method to our non-stationary system helps to extract the meaningful components. However, because EMD was found to be direction dependent, Ensembled EMD (EEMD), which is direction independent, shows better performance on our data series. The trend extracted with the proposed method matches the fitting curve, and the fitting curve parameters can be grouped into two main clusters: congested and non-congested cases of the radio channel throughput signal.

DOI:  10.36244/ICJ.2023.5.7




Gereltsetseg Altangerel and Máté Tejfel
In-network DDoS detection and mitigation using INT data for IoT ecosystem  
Due to the limited capabilities and diversity of Internet of Things (IoT) devices, it is challenging to implement robust and unified security standards for these devices. Additionally, the fact that vulnerable IoT devices are beyond the network’s control makes them susceptible to being compromised and used as bots or as part of botnets, which has led to a recent surge in attacks involving these devices. To address this issue, we propose a real-time IoT anomaly detection and mitigation solution at the programmable data plane in a Software-Defined Networking (SDN) environment using In-band Network Telemetry (INT) data. As far as we know, this is the first experiment in which INT data is used to detect IoT attacks in the programmable data plane. Based on our performance evaluation, the detection delay of our proposed approach is much lower than the results of previous Distributed Denial-of-Service (DDoS) research, and the detection accuracy is similarly high.

DOI:  10.36244/ICJ.2023.5.8



Péter Marjai, Máté Nagy-Sándor, and Attila Kiss
The performance of modern centrality measures on different information models and networks 
Over the last few years, networks have become integral parts of our everyday life. They are used in communication, transportation, marketing, and the list goes on. They are also growing larger, and more complex and dynamic networks are appearing more often. In light of this, the problem of finding the most influential node in a network remains of high interest; however, it is getting increasingly difficult to find these nodes. It is also hard to pin down what being the most influential node really means. There are several approaches to defining the most vital nodes, such as having the most edges connected to them or having the most shortest paths running through them. They can also be identified by calculating the influence of their neighbors, or by evaluating how they contribute to the network as a whole. In recent years, various new centrality measures have been proposed to rank the importance of the nodes of a network. In this paper, we evaluate the performance of three modern centrality measures, namely Local Fuzzy Information Centrality (LFIC), Local Clustering H-index Centrality (LCH), and Global Structure Model (GSM), on different information models, and compare them with conventional centrality measures. In our experiments, we investigate the similarity between the top-n ranking nodes of the measures, the influential capacity of these nodes, and the frequency of nodes with the same centrality value.
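
The kind of comparison described above can be sketched in a few lines of Python. The example below does not implement LFIC, LCH or GSM. It uses degree centrality and a simple neighbor-based score as stand-ins, and the Jaccard index as the similarity between top-n sets; all three choices, the function names and the toy graph are assumptions made for illustration.

```python
from collections import Counter


def degree_centrality(graph):
    # Number of edges attached to each node.
    return {v: len(nbrs) for v, nbrs in graph.items()}


def neighbor_degree_sum(graph):
    # Stand-in for a neighbor-based measure: sum of the neighbors' degrees.
    deg = degree_centrality(graph)
    return {v: sum(deg[u] for u in nbrs) for v, nbrs in graph.items()}


def top_n(scores, n):
    # Highest scores first; ties are broken by node name for determinism.
    ranked = sorted(scores, key=lambda v: (-scores[v], v))
    return set(ranked[:n])


def jaccard(a, b):
    # Overlap of two node sets, between 0 (disjoint) and 1 (identical).
    return len(a & b) / len(a | b) if a | b else 1.0


def tie_frequency(scores):
    # Fraction of nodes that share their centrality value with another node.
    counts = Counter(scores.values())
    return sum(c for c in counts.values() if c > 1) / len(scores)


# Hypothetical undirected graph given as an adjacency mapping.
graph = {
    "a": {"b", "c", "d", "e"},
    "b": {"a"},
    "c": {"a"},
    "d": {"a"},
    "e": {"a", "f"},
    "f": {"e", "g", "h"},
    "g": {"f"},
    "h": {"f"},
}

deg = degree_centrality(graph)
nds = neighbor_degree_sum(graph)
overlap = jaccard(top_n(deg, 2), top_n(nds, 2))
ties = tie_frequency(deg)
```

A measure with a high tie frequency separates nodes poorly, which is one reason to look at it alongside the top-n overlap.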

DOI:  10.36244/ICJ.2023.5.9



Zijian Győző Yang and Noémi Ligeti-Nagy
Improve Performance of Fine-tuning Language Models with Prompting 
This paper explores the effectiveness of prompt programming in the fine-tuning process of a Hungarian language model. The study builds on the prior success of prompt engineering in natural language processing tasks and employs the prompting method to enhance the fine-tuning performance of a huBERT model on several benchmark datasets of HuLU. The experimentation involves testing 45 prompt combinations for the HuCoPA dataset and 15 prompt variations for the HuRTE and HuWNLI datasets. The findings reveal that the addition of an instructional text consistently produces the best results across all winning cases, and that the [CLS] token produces the best results in the separator token experiments. The most significant enhancement was observed on the HuWNLI dataset, with an increase in accuracy from 65% to 85%. These results demonstrate that the addition of instructional text is crucial and sufficient in enabling the language model to effectively interpret and solve the Winograd Schemata problem. They also showcase the potential of prompt programming in enhancing the performance of language models in fine-tuning tasks, and highlight the importance of incorporating task-specific instructions to improve model interpretability and accuracy.

DOI:  10.36244/ICJ.2023.5.10



Baasanjargal Erdenebat, Bayarjargal Bud, and Tamás Kozsik
Challenges in service discovery for microservices deployed in a Kubernetes cluster – a case study 
With Kubernetes emerging as one of the most popular infrastructures in the cloud-native era, the use of containerization and of tools alongside Kubernetes is steadily gaining traction. The main goal of this paper is to evaluate the service discovery mechanisms and DNS management (CoreDNS) of Kubernetes, and to present a general study of an experiment on service discovery challenges. In large-scale Kubernetes clusters, the number of running pods, services, requests, and workloads can be substantial. The increased number of HTTP requests often results in resource utilization concerns, e.g., spikes of errors [24], [25]. This paper investigates potential optimization strategies for enhancing the performance and scalability of CoreDNS in Kubernetes. We propose a solution to address the concerns related to CoreDNS and provide a detailed explanation of how our implementation enhances service discovery functionality. Experimental results in a real-world case show that our solution for CoreDNS ensures consistency of the workload. Compared with the default CoreDNS configuration, our customized approach achieves better performance in terms of the number of request errors, the average latency of DNS requests, and the resource usage rate.

DOI:  10.36244/ICJ.2023.5.11

Technical Co-Sponsors





National Cooperation Fund, Hungary