Duc Q. Nguyen, also known as Martin Nguyen, recently graduated with a Bachelor of Engineering in Computer Science. He is now working as a Research Assistant and Teaching Assistant under the supervision of Assoc. Prof. Tho Quan. His research interests include Generative Models, Graph Representation Learning, and Probabilistic Machine Learning.
His ambition is to use his talent and experience to make human life better.
Download his resumé.
BEng in Computer Science, 2022
Ho Chi Minh City University of Technology
Subgraph matching is a challenging problem with a wide range of applications in drug discovery, social network analysis, biochemistry, and cognitive science. It involves determining whether a given query graph is present within a larger target graph. Traditional graph-matching algorithms provide precise results but face challenges on large graph instances due to the NP-complete nature of the problem, limiting their practical applicability. In contrast, recent neural network-based approximations offer more scalable solutions but often lack interpretable node correspondences. To address these limitations, this article presents a multi-task learning framework called xNeuSM: Explainable Neural Subgraph Matching, which introduces Graph Learnable Multi-hop Attention Networks (GLeMA) that adaptively learn the parameters governing the attention factor decay for each node across hops rather than relying on fixed hyperparameters. Our framework jointly optimizes both subgraph matching and finding subgraph-isomorphism mappings. We provide a theoretical analysis establishing error bounds for GLeMA’s approximation of multi-hop attention as a function of the number of hops. Additionally, we prove that learning distinct attention decay factors for each node leads to a correct approximation of multi-hop attention. Empirical evaluation on real-world datasets shows that xNeuSM achieves substantial improvements in prediction F1 score of up to 34% compared to approximate baselines and, notably, at least a seven-fold faster query time than exact algorithms. With these results, xNeuSM can be applied to solve matching problems in various domains spanning from biochemistry to social science.
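The core idea of per-node learnable attention decay can be illustrated with a minimal sketch: given a one-hop attention matrix, weight each hop by a geometric factor whose decay parameter is learned separately for every node. This is only an illustrative numpy rendering of the concept (the function name and the truncation parameter `K` are assumptions, not the paper's implementation):

```python
import numpy as np

def multi_hop_attention(A, theta, K):
    """Approximate multi-hop attention by a truncated weighted sum of
    powers of a row-stochastic one-hop attention matrix A ([n, n]).

    Each node i has its own decay factor theta[i] in (0, 1); hop k is
    weighted by theta_i * (1 - theta_i)**(k - 1), a geometric decay whose
    weights sum to 1 per node as K grows, so nearby hops dominate for
    large theta_i and far hops matter more for small theta_i.
    """
    n = A.shape[0]
    A_hop = np.zeros_like(A)
    A_k = np.eye(n)
    for k in range(1, K + 1):
        A_k = A_k @ A                           # A^k: k-hop attention
        w = theta * (1.0 - theta) ** (k - 1)    # per-node hop weight, shape [n]
        A_hop += w[:, None] * A_k               # scale row i by node i's weight
    return A_hop
```

Because each row of `A` sums to 1, row i of the result sums to `1 - (1 - theta[i])**K`, which converges to 1 as the number of hops grows; the truncation error bound as a function of `K` mirrors the kind of guarantee analyzed in the paper.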
Recent advancements in large language models (LLMs) have underscored their importance in the evolution of artificial intelligence. However, despite extensive pretraining on multilingual datasets, available open-source LLMs exhibit limited effectiveness in processing Vietnamese. The challenge is exacerbated by the absence of systematic benchmark datasets and metrics tailored for Vietnamese LLM evaluation. To mitigate these issues, we have finetuned LLMs specifically for Vietnamese and developed a comprehensive evaluation framework encompassing 10 tasks and 31 metrics. We observe that finetuning can help LLMs transfer knowledge across languages, serving as an efficient way to bolster their capabilities in non-English languages. Moreover, our analysis indicates that larger models can introduce more biases and uncalibrated outputs, and that the key factor influencing LLM performance is the quality of the training or finetuning datasets. These insights underscore the significance of meticulous finetuning with high-quality datasets in enhancing LLM performance.
At the end of 2019, one of the most serious and widespread pandemics, caused by the coronavirus disease (COVID-19), emerged in Wuhan, China. As reported by the World Health Organization, there are currently more than 100 million infection cases worldwide, with an average mortality rate of about five percent. To avoid serious consequences for people’s lives and the economy, policies and actions need to be made suitably and in time. To do that, authorities need to know the future trend of the pandemic’s development. This is why forecasting models play an important role in controlling the pandemic. However, the behavior of this pandemic is extremely complicated and difficult to analyze, so an effective model should be judged not only on accurate forecasting results but also on its explainability, which allows human experts to act proactively.
With the recent advancement of Artificial Intelligence (AI) techniques, emerging Deep Learning (DL) models have proven highly effective at forecasting the pandemic’s future course from large amounts of historical data. However, the main weakness of DL models is their lack of explanation capability. To overcome this limitation, we introduce BeCaked, a novel combination of the Susceptible-Infectious-Recovered-Deceased (SIRD) compartmental model and a Variational Autoencoder (VAE) neural network. With pandemic data provided by the Johns Hopkins University Center for Systems Science and Engineering, our model achieves $0.98$ $R^2$ and $0.012$ $MAPE$ at the world level with a $31$-step forecast, and up to $0.99$ $R^2$ and $0.0026$ $MAPE$ at the country level with a $15$-step forecast when predicting daily infectious cases. Besides its high accuracy, BeCaked also offers useful justifications for its results based on the parameters of the SIRD model. Therefore, BeCaked can serve as a reference for authorities and medical experts to make timely, well-informed decisions.
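The SIRD dynamics that give BeCaked its interpretable parameters are the standard compartmental equations; the sketch below shows them as a discrete-time simulation. The parameter names `beta` (transmission rate), `gamma` (recovery rate), and `mu` (mortality rate) follow the usual textbook convention and are not taken from the paper, which learns such parameters with a VAE rather than fixing them:

```python
def sird_step(S, I, R, D, beta, gamma, mu, N, dt=1.0):
    """One forward-Euler step of the SIRD compartmental model.

    S, I, R, D: susceptible, infectious, recovered, deceased counts.
    N: total population; dt: step size in days.
    """
    new_infections = beta * S * I / N * dt   # S -> I
    new_recoveries = gamma * I * dt          # I -> R
    new_deaths = mu * I * dt                 # I -> D
    return (S - new_infections,
            I + new_infections - new_recoveries - new_deaths,
            R + new_recoveries,
            D + new_deaths)

def simulate_sird(S, I, R, D, beta, gamma, mu, N, steps):
    """Roll the SIRD step forward, returning the trajectory of I."""
    infectious = []
    for _ in range(steps):
        S, I, R, D = sird_step(S, I, R, D, beta, gamma, mu, N)
        infectious.append(I)
    return infectious
```

Because the four compartments always sum to the total population, the fitted `beta`, `gamma`, and `mu` have direct epidemiological readings, which is what makes the hybrid model's forecasts explainable.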