INTELLIGENT MACHINES LAB (iML)
We perform fundamental and applied research in machine learning, deep learning and related areas in computer vision and natural language processing.
More specifically, our research focuses on multimodal large language models, vision-language models and agentic AI, with an emphasis on interpretability, robustness, and data-efficient transfer learning. Beyond technical advances, we consider the societal impact of these systems, aiming to develop intelligent models that are transparent, trustworthy, and beneficial to society.
Research
Keep up to date with what we're working on!
Explainable AI and NLP to help better understand issues around misinformation and vaccine hesitancy in social media
Collaboration with the Department of Law and Legal Studies and the School of Journalism and Communication at Carleton University

Explainable AI and NLP for assessment of functional limitations and disability services for postsecondary education
Collaboration with the Readi initiative, the Accessibility Institute, and the Paul Menton Centre (PMC) for Students with Disabilities

Deep learning and computer vision for communicating graphical information to visually impaired or blind individuals

Explainable AI for predicting chronic homelessness
Collaboration with the City of Ottawa

Artificial Intelligence in real-time perioperative Electrocardiogram (ECG) monitoring
Collaboration with The Ottawa Hospital

Explainable AI for predictive analytics in employee benefits insurance

Explainable AI for analyzing EMR data
Collaboration with the Institute of Mental Health Research, Ottawa

Past Projects
Biometrics, spoofing attacks and countermeasures
ML for analyzing brain signals (EEG)
ML for unobtrusive monitoring of vital physiologic parameters
ML for analyzing brain MRI images in patients with ASD
Digital tools for revitalizing endangered languages (ELK-Tech)

Our team
Small team. Big hearts.
Our focus is always on finding the best people to work with. Our bar is high, but you look ready to take on the challenge.
Majid Komeili
Director
Abbas Akkasi
Postdoctoral Fellow (with Boris Vukovic and Kathleen Fraser)
Mohammad Reza Zarei
PhD (with Frank Dehne)
Adnan Khan
PhD
Hoda Vafaeesefat
MCS
Youssef Fahmani
MCS (with Adrian Chan)
Wooseok Kim
MCS (from Fall 2025)
Alireza Choubineh
RA
Past Grad Students
Alireza Choubineh, MCS, Fall 2024
Rakshil Kevadiya, MCS, Fall 2024 (with Boris Vukovic and Kathleen Fraser), moved on to Canada Revenue Agency
Mitchell Chatterjee, MCS, Summer 2024 (with Adrian Chan), moved on to JSI
Seyed Omid Davoudi, PhD, Winter 2024 (with Frank Dehne), moved on to Larus Technologies
Aatreyi Pranavbhai Mehta, MCS, Winter 2023, moved on to Razor Sharp Consulting
Galen O'Shea, MCS, Winter 2023, moved on to Mission Control
Mohammad Mahdi Heydari Dastjerdi, MCS, Summer 2022, moved on to Paphus Solutions
Mohammad Nokhbeh Zaeem, MCS, Winter 2021, moved on to SoundHound Inc
Siraj Ahmed, MCS, Fall 2020, University of Ottawa (co-supervised with Prof. J. Park), moved on to Braiyt AI Inc
Abhijeet Chauhan, MCS, 2020, moved on to IMRSV Data Labs
Past Undergrad Students
Mir Hassan, Winter 2025, Honors Project
Saurabh Gummaraj Kishore, Winter 2024, Honors Project
David Hobson, Winter 2023, Honors Thesis
Kailash Balakrishnan, Winter 2023, Honors Project
Jesse Mendoza, Honors Project
Hilaire Djani, Honors Thesis
Tim Elliott, Honors Project
Juntong He, Honors Project
Qixiang Luan, Honors Project
M. Kazman, Fall 2021, Honors Project
A. Ong, Fall 2021, Honors Project
J. Woo, Summer 2021, Honors Project
I. Nicolaev, Summer 2021, Honors Project
M. Kazman, Summer 2021, Honors Project
J. Geng, Winter 2021, Honors Project
Y. Song, Winter 2021, Honors Project
K. Zhen, Winter 2021, Honors Project
H. Le, Fall 2020, Honors Project
Y. Gao, Fall 2020, Honors Project
T. Cao, Fall 2020, Honors Project
Y. Chen, Fall 2020, Honors Project
V. Nguyen, Summer 2020, Honors Project
J. Danovitch, Winter 2020, Honors Thesis
M. Kuzmenko, Winter 2020, Honors Project
L. Wise, Winter 2020, Honors Project
L. Koftinow-Mikan, Fall 2019, Honors Project
X. Liu, Fall 2019, Honors Project
G. O'Shea, Summer 2019, Honors Project
L. Colwell, Summer 2019, DSRI internship
K. Causton, Summer 2019, Honors Project (with Oliver)
Y. Yamanaka, Winter 2019, Honors Project
S. Kudolo, Winter 2019, Honors Project
L. Gruska, Winter 2019, Honors Project
L. He, Winter 2019, Honors Project
Joining/Volunteering
APPLYING FOR MSC OR PHD:
MSc and PhD applicants who are interested in my research are encouraged to contact me via email.
Prerequisites: A good candidate should have a background in probability and linear algebra, and should have taken courses in machine learning or related areas such as computer vision and natural language processing.
Prospective MSc and PhD students who are applying to the School of Computer Science at Carleton University and are interested in my research are encouraged to indicate my name as their preferred research supervisor.
Please note that due to the volume of emails I receive, I am not able to respond to all of them.
Undergrad students at Carleton University who are interested in doing their Honours project/thesis with me are encouraged to contact me via email.
Publications
Histological analysis is a cornerstone of preclinical respiratory disease research, enabling assessment of pathology, therapeutic effects, and mechanisms. However, conventional approaches rely on manual scoring, which is subjective, time-consuming, and difficult to scale due to low throughput and inter-observer variability. Artificial intelligence (AI), particularly deep learning, offers potential to automate histology workflows, but its use and evaluation in preclinical respiratory models have not been synthesized. We conducted a scoping review following Joanna Briggs Institute guidelines, searching MEDLINE and Embase (inception–January 2025) for preclinical studies using AI to analyze histology in respiratory disease models. Screening, full-text review, and data extraction were performed in duplicate. Of 6271 studies screened, 29 met inclusion criteria. Most used murine models (76%) and investigated lung cancer (28%), pulmonary fibrosis (24%), or tuberculosis (17%). Hematoxylin and eosin was the most common stain (48%), with others targeting collagen or immune markers. AI tasks included image classification (n=20), segmentation (n=10), and object detection (n=4), predominantly using convolutional neural networks (69%). Preprocessing methods (e.g., stain normalization) were common, but annotation and training practices were inconsistently reported. Performance was generally high (accuracy ≥90%; 7 studies), though validation metrics varied, and external validation was absent. Most studies used “black box” models, with minimal application of explainability techniques. Reproducibility measures, such as sharing datasets or code, were rarely reported. AI tools are poised to transform histological analysis in preclinical respiratory research. By addressing gaps in validation, transparency, and standardization, the field can harness these technologies to deliver robust, efficient, and scalable workflows. Registration: Open Science Framework https://doi.org/10.17605/OSF.IO/NM94E
Multimodal large language models (MLLMs) have demonstrated impressive capabilities in integrating visual and textual information, with visual question answering (VQA) serving as a central task for evaluation. However, existing VQA datasets primarily target inference from static visual cues or factual content, leaving temporal and procedural reasoning underexplored. While video question answering allows for temporal understanding by providing access to full video sequences, in many real-world scenarios only a single image is available. To address this gap, we introduce TemporalCook, a new benchmark constructed from the YouCookII instructional video dataset that requires models to predict procedural and temporal outcomes based on static images in the cooking domain. To further augment temporal reasoning capabilities, we supplement the benchmark with external instructional videos. We also present two retrieval-augmented generation (RAG) baselines, leveraging either a curated video knowledge source or open-domain retrieval from online video resources. We report a benchmark scoreboard for leading commercial and open-source multimodal models on TemporalCook, and evaluate the effectiveness of the retrieval-augmented generation baselines. TemporalCook and these baselines provide a foundation for future research in temporal VQA and open new directions for developing retrieval-augmented solutions in temporally grounded multimodal tasks. The benchmark and evaluation code are publicly available at https://github.com/mrzarei5/TemporalCook.
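For readers curious about how such a retrieval-augmented baseline can be wired together, here is a minimal sketch of the retrieval step only, using the sentence-transformers library; the encoder name, the transcript snippets, the question, and the prompt format are all illustrative assumptions rather than the benchmark's actual pipeline, and the final multimodal call is left abstract.

```python
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative choice of text encoder

# Hypothetical knowledge source: snippets transcribed from instructional cooking videos
snippets = [
    "After whisking the eggs, pour them into the hot pan and let them set for a minute.",
    "Once the onions are golden, add the garlic and cook for thirty more seconds.",
]
question = "The image shows golden onions in a pan. What is the next step?"

# Retrieve the most relevant snippet by cosine similarity in the embedding space
scores = util.cos_sim(encoder.encode(question, convert_to_tensor=True),
                      encoder.encode(snippets, convert_to_tensor=True))[0]
context = snippets[int(scores.argmax())]

# The retrieved context, the question, and the image would then go to a multimodal LLM of choice
prompt = f"Context from a related video: {context}\nQuestion: {question}"
print(prompt)
```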
AI-driven histopathology analysis promises faster, more accurate diagnostics, but relies on high-quality, manually annotated data, which is labor-intensive and prone to variability. In acute respiratory distress syndrome (ARDS) research, where preclinical acute lung injury (ALI) models are vital due to the lack of curative therapies, these challenges are pronounced. We introduce LungInsightAnnotation, a tool to standardize and streamline the histopathology workflow, from data curation to AI model training. With random field selection, centralized storage, and intuitive interfaces, it enhances annotation efficiency and consistency. We demonstrate its efficacy in accurately predicting intra-alveolar neutrophils for ALI scoring in ARDS models, advancing reliable, AI-driven clinical research.
Zero-shot visual question answering (VQA) poses a formidable challenge at the intersection of computer vision and natural language processing. Traditionally, this problem has been tackled using end-to-end pre-trained vision-language models (VLMs). However, recent advancements in large language models (LLMs) demonstrate their exceptional reasoning and comprehension abilities, making them valuable assets in multi-modal tasks, including zero-shot VQA. LLMs have been previously integrated with VLMs to solve zero-shot VQA in a conversation-based approach. However, while the focus in VQA tasks is often on specific regions rather than the entire image, this aspect has been overlooked in previous approaches. Consequently, the overall performance of the framework relies on the ability of the pre-trained VLM to locate the region of interest that is relevant to the requested visual information within the entire image. To address this challenge, this paper proposes Grounded Multi-modal Conversation for Zero-shot Visual Question Answering (GMC-VQA), a region-based framework that leverages the complementary strengths of LLMs and VLMs in a conversation-based approach. We employ a grounding mechanism to refine visual focus according to the semantics of the question and foster collaborative interaction between VLM and LLM, effectively bridging the gap between visual and textual modalities and enhancing comprehension and response generation for visual queries. We evaluate GMC-VQA across three diverse VQA datasets, achieving substantial average improvements of 10.04% over end-to-end VLMs and 2.52% over the state-of-the-art VLM-LLM communication-based framework, respectively. Our code is publicly available at https://github.com/mrzarei5/GMC-VQA.
Background: Perioperative electrocardiographic monitoring can offer immediate detection of myocardial ischaemia, yet its application in perioperative and remote monitoring settings is hampered by frequent false alarms and signal contamination. We performed a scoping review of the current state of artificial intelligence (AI) in perioperative ECG interpretation. Methods: A literature search in Ovid MEDLINE, EMBASE, Compendex, and CINAHL databases was performed from inception to May 10, 2023. All original research of ECG monitoring for myocardial ischaemia, myocardial infarction, or both was included. Results: A total of 182 original research articles published between 1991 and 2023 were included. Most studies (n=132) used a pre-existing ECG database to develop AI algorithms retrospectively, and the rest did not specify their sources. Processing filters were used in 58% of the studies to remove ECG noise/artifacts before AI algorithm development. Amongst the AI technologies used, ResNet demonstrated the highest median sensitivity, precision, and specificity at 98.4%, 99.8%, and 99.1%, respectively. There are only five studies with intermittent prospective ECG collection on ST-segment elevation myocardial infarction. No studies prospectively collected continuous perioperative ECG, where frequent false alarms and signal contamination occur. Conclusions: AI technology can achieve high diagnostic accuracy for myocardial ischaemia detection in clean intermittent electrocardiograms. However, almost all these algorithms were developed from a few open-source clean ECG databases without testing on ‘noisy data’, which limited their clinical applicability in the perioperative setting where signal contamination is frequent. AI algorithms on perioperative electrocardiography, tested in a noisy perioperative and remote monitoring environment, including wearable devices, are needed.
Tactile graphics are essential for providing access to visual information for the 43 million people globally living with vision loss. Traditional methods for creating these graphics are labor-intensive and cannot meet growing demand. We introduce TactileNet, the first comprehensive dataset and AI-driven framework for generating embossing-ready 2D tactile templates using text-to-image Stable Diffusion (SD) models. By integrating Low-Rank Adaptation (LoRA) and DreamBooth, our method fine-tunes SD models to produce high-fidelity, guideline-compliant graphics while reducing computational costs. Quantitative evaluations with tactile experts show 92.86% adherence to accessibility standards. Structural fidelity analysis revealed near-human design similarity, with an SSIM of 0.538 between generated graphics and expert-designed tactile images. Notably, our method preserves object silhouettes better than human designs (SSIM = 0.259 vs. 0.215 for binary masks), addressing a key limitation of manual tactile abstraction. The framework scales to 32,000 images (7,050 high-quality) across 66 classes, with prompt editing enabling customizable outputs (e.g., adding or removing details). By automating the 2D template generation step, which is compatible with standard embossing workflows, TactileNet accelerates production while preserving design flexibility. This work demonstrates how AI can augment (not replace) human expertise to bridge the accessibility gap in education and beyond. Code, data, and models will be publicly released to foster further research.
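As a rough illustration of what inference with a LoRA-adapted Stable Diffusion model looks like in practice, the sketch below uses the diffusers library; the base model ID, the LoRA weight path, and the prompt are placeholders, not the released TactileNet assets.

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16  # illustrative base model
).to("cuda")
pipe.load_lora_weights("path/to/tactile-lora")  # hypothetical LoRA adapter fine-tuned on tactile templates

prompt = "tactile graphic of a dog, simple raised outline, high contrast, embossing-ready"
image = pipe(prompt, num_inference_steps=30).images[0]
image.save("tactile_dog.png")
```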
Tactile graphics enable individuals with visual impairment to interpret visual information through touch, supporting navigation, education, and social engagement. However, manually designing tactile graphics is costly, labor-intensive, and difficult to scale. This work introduces a text-guided image-to-image translation approach to generate tactile maps from RGB maps. By leveraging natural language prompts, the method allows control over map elements such as lakes, rivers, and cities, enabling customization based on specific needs. To train the model, we created a custom dataset consisting of 1,845 RGB maps of Canadian provinces, each paired with multiple tactile variations reflecting different levels of detail. Corresponding text prompts were designed to describe these variations, forming a dataset of 9,800 triplets (RGB map, tactile map, prompt). Human expert assessments demonstrated that the proposed method outperforms a baseline model, with 47% of the outputs requiring minimal adjustments. The results highlight a scalable and efficient solution for tactile map generation, ensuring high-quality outputs while maintaining adaptability through text-based control.
Vision–language models (VLMs) pre-trained on large-scale image–text pairs have shown impressive results in zero-shot vision tasks. Knowledge transferability of these models can be further improved with the help of a limited number of samples. Feature adapter tuning is a prominent approach employed for efficient transfer learning (ETL). However, most of the previous ETL models focus on tuning either prior-independent or prior-dependent feature adapters. We propose a novel ETL approach that leverages both adapter styles simultaneously. Additionally, most existing ETL models rely on using textual prompts constructed by completing general pre-defined templates. This approach neglects the descriptive knowledge that can assist VLM by presenting an informative prompt. Instead of pre-defined templates for prompt construction, we use a pre-trained LLM to generate attribute-specific prompts for each visual category. Furthermore, we guide the VLM with context-aware discriminative information generated by the pre-trained LLM to emphasize features that distinguish the most probable candidate classes. The proposed ETL model is evaluated on 11 datasets and sets a new state of the art. Our code and all collected prompts are publicly available at https://github.com/mrzarei5/DATViL.
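To give a concrete feel for attribute-style prompting of a VLM, here is a minimal zero-shot scoring sketch with an off-the-shelf CLIP model via the transformers library. In the paper the descriptive prompts come from a pre-trained LLM and feed adapter tuning; here the two prompts are hand-written placeholders, the image path is hypothetical, and no adapters are trained.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hand-written, attribute-style class descriptions (the paper generates these with an LLM)
prompts = {
    "sparrow": "a photo of a sparrow, a small brown bird with streaked plumage",
    "cardinal": "a photo of a cardinal, a bright red bird with a pointed crest",
}

image = Image.open("bird.jpg")  # hypothetical test image
inputs = processor(text=list(prompts.values()), images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    probs = model(**inputs).logits_per_image.softmax(dim=-1)
print(dict(zip(prompts.keys(), probs[0].tolist())))
```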
This paper addresses the challenge of automating the process of generating personalized follow-up questions (FQs) for students with disabilities based on their responses to the WHODAS 2.0 questionnaire. Given the diverse nature of FQs generated by disability service providers, our research aims to cluster these questions using advanced language models and ensemble clustering techniques. We utilized three different Sentence-Transformers embedding models (RoBERTa, MiniLM and MPNet) combined with clustering algorithms such as HDBSCAN, K-Means, BIRCH, Spectral Clustering, and Gaussian Mixture Models. Furthermore, an Adaptive Clustering Ensemble (ACE) method was employed to improve clustering performance. The results indicate that the ensemble method achieves greater stability and accuracy in clustering compared to individual models. Our findings demonstrate the potential of using AI to streamline the process of assessing and supporting students with disabilities in postsecondary education settings.
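A simplified sketch of the embed-then-cluster step is shown below, assuming sentence-transformers and scikit-learn; the example follow-up questions are invented, only one embedding model and one clustering algorithm are shown, and the Adaptive Clustering Ensemble itself is not reproduced.

```python
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

# Invented examples, not data from the study
follow_up_questions = [
    "How do your symptoms affect taking notes during lectures?",
    "Do you require extra time to complete timed exams?",
    "What supports have helped you participate in group work?",
    "Does fatigue limit the number of courses you can take per term?",
]

encoder = SentenceTransformer("all-mpnet-base-v2")      # one of several embedding models one could use
embeddings = encoder.encode(follow_up_questions)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)
for question, label in zip(follow_up_questions, labels):
    print(label, question)
```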
Few-shot learning (FSL) presents a challenging learning problem in which only a few samples are available for each class. Decision interpretation is more important in few-shot classification due to a greater chance of error compared to traditional classification. However, the majority of the previous FSL methods are black-box models. In this paper, we propose an inherently interpretable model for FSL based on human-friendly attributes. Previously, human-friendly attributes have been utilized to train models with the potential for human interaction and interpretability. However, such approaches are not directly extendible to the few-shot classification scenario. Moreover, we propose an online attribute selection mechanism to effectively filter out irrelevant attributes in each episode. The attribute selection mechanism improves accuracy and helps with interpretability by reducing the number of attributes that participate in each episode. We further propose a mechanism that automatically detects the episodes where the pool of available human-friendly attributes is insufficient, and subsequently augments it by engaging some learned unknown attributes. We demonstrate that the proposed method achieves results on par with black-box few-shot learning models on four widely used datasets. We also empirically evaluate the level of decision alignment between different models and human understanding and show that our model outperforms the comparison methods based on this criterion.
Blindness and visual impairments affect many people worldwide. For help with navigation, people with visual impairments often rely on tactile maps that utilize raised surfaces and edges to convey information through touch. Although these maps are helpful, they are often not widely available and current tools to automate their production have similar limitations including only working at certain scales, for particular world regions, or adhering to specific tactile map standards. To address these shortcomings, we train a proof-of-concept model as a first step towards applying computer vision techniques to help automate the generation of tactile maps. We create a first-of-its-kind tactile maps dataset of street-views from Google Maps spanning 6500 locations and including different tactile line- and area-like features. Generative adversarial network (GAN) models trained on a single zoom successfully identify key map elements, remove extraneous ones, and perform inpainting with median F1 and intersection-over-union (IoU) scores of better than 0.97 across all features. Models trained on two zooms experience only minor drops in performance, and generalize well both to unseen map scales and world regions. Finally, we discuss future directions towards a full implementation of a tactile map solution that builds on our results.
Tactile graphics are an essential tool for conveying visual information to visually impaired individuals. However, translating 2D plots, such as Bezier curves, polygons, and bar charts, into an effective tactile format remains a challenge. This paper presents a novel, two-stage deep learning pipeline for automating this conversion process. Our method leverages a Pix2Pix architecture, employing a U-Net++ generator network for robust image generation. To improve the perceptual quality of the tactile representations, we incorporate an adversarial perceptual loss function alongside a gradient penalty. The pipeline operates in a sequential manner: firstly, converting the source plot into a grayscale tactile representation, followed by a transformation into a channel-wise equivalent. We evaluate the performance of our model on a comprehensive synthetic dataset consisting of 20,000 source-target pairs encompassing various 2D plot types. To quantify performance, we utilize fuzzy versions of established metrics like pixel accuracy, Dice coefficient, and Jaccard index. Additionally, a human study is conducted to assess the visual quality of the generated tactile graphics. The proposed approach demonstrates promising results, significantly streamlining the conversion of 2D plots into tactile graphics. This paves the way for the development of fully automated systems, enhancing accessibility of visual information for visually impaired individuals.
The task of textual entailment holds significant importance when dealing with clinical data, as it serves as a foundational component for extracting and synthesizing medical information from vast amounts of unstructured text. To investigate the consistency with which Natural Language Inference (NLI) models capture semantic phenomena critical for intricate inference within clinical NLI contexts, SemEval-2024 has organized a shared task focused on NLI for Clinical Trials (NLI4CT). This task provides participants with a dataset annotated by humans for the purpose of model training and requires the submission of the results on test data for evaluation. We engaged in this shared task at SemEval-2024, employing a diverse set of solutions, with a particular emphasis on leveraging a Large Language Model (LLM) based zero-shot inference approach to address the challenge.
Interpretability in machine learning has become increasingly important as machine learning is being used in more and more applications, including those with high-stakes consequences such as healthcare, where interpretability has been regarded as a key to the successful adoption of machine learning models. However, using confounding/irrelevant information in making predictions by deep learning models, even the interpretable ones, poses critical challenges to their clinical acceptance. That has recently drawn researchers’ attention to issues beyond the mere interpretation of deep learning models. In this paper, we first investigate the application of an inherently interpretable prototype-based architecture, known as ProtoPNet, for breast cancer classification in digital pathology and highlight its shortcomings in this application. Then, we propose a new method that uses more medically relevant information and makes more accurate and interpretable predictions. Our method leverages the clustering concept and implicitly increases the number of classes in the training dataset. The proposed method learns more relevant prototypes without any pixel-level annotated data. To have a more holistic assessment, in addition to classification accuracy, we define a new metric for assessing the degree of interpretability based on the comments of a group of skilled pathologists. Experimental results on the BreakHis dataset show that the proposed method effectively improves the classification accuracy and interpretability by 8% and 18%, respectively. Therefore, the proposed method can be seen as a step toward implementing interpretable deep learning models for the detection of breast cancer using histopathology images.
Cardiovascular diseases are the primary cause of death globally. With the prevalence of electrocardiogram machines both within and outside the clinical environment, it is now possible to passively monitor a patient’s heartbeat for cardiovascular diseases long before they become a cause of concern. However, the most significant problem currently prohibiting the wide-scale deployment of automated electrocardiogram systems is the potential for false alarms, leading to a condition known as “alarm fatigue”. Of the major culprits causing such issues, noise in electrocardiogram data can often masquerade as instances of acute cardiovascular diseases. Moreover, incorrect labels provided by domain experts can bias models to repeat the same mistakes learned during training. Recently, as substantial amounts of unlabelled electrocardiogram data have become publicly available, self-supervision has emerged as an increasingly viable part of the pre-training process. This work begins by examining the importance of self-supervised learning for arrhythmia detection, demonstrating significant performance improvements as it reduces overfitting to class imbalance and noise. A new method for self-supervised pre-training on electrocardiogram data is proposed, obtaining SOTA results while simultaneously reducing the pre-training time by one-fifth and increasing the model’s capacity by a factor of 14, providing a new foundational model for future research. Finally, this work investigates multiple solutions for addressing the significant noise and class imbalance concerns in the electrocardiogram data and label set.
Explainable Reinforcement Learning is key in bringing the current neural network-based state-of-the-art reinforcement learning methods to real-world environments. In particular, explaining the risks of the agent’s decision-making process is critical to deploying such models in safety-critical tasks. The previous feature-based methods for characterizing risk in reinforcement learning settings were not well suited for handling situations when there are multiple sources of risk. This work attempts to address this shortcoming by providing a post-hoc method that can explain multiple sources of risk. Our experiments show that the proposed method can provide more insights into the workings of the agent while avoiding the issues faced by the previous work in multi-risk environments.
The accurate recognition of symptoms in clinical reports is significantly important in the fields of healthcare and biomedical natural language processing. These entities serve as essential building blocks for clinical information extraction, enabling retrieval of critical medical insights from vast amounts of textual data. Furthermore, the ability to identify and categorize these entities is fundamental for developing advanced clinical decision support systems, aiding healthcare professionals in diagnosis and treatment planning. In this study, we participated in SympTEMIST, a shared task on the detection of symptoms, signs and findings in Spanish medical documents. We combine a set of large language models fine-tuned with the data released by the organizers.
With the continuous advancement in unsupervised learning methodologies, text generation has become increasingly pervasive. However, the evaluation of the quality of the generated text remains challenging. Human annotations are expensive and often show high levels of disagreement, in particular for certain tasks characterized by inherent subjectivity, such as translation and summarization. Consequently, the demand for automated metrics that can reliably assess the quality of such generative systems and their outputs has grown more pronounced than ever. In 2023, Eval4NLP organized a shared task dedicated to the automatic evaluation of outputs from two specific categories of generative systems: machine translation and summarization. This evaluation was achieved through the utilization of prompts with Large Language Models. Participating in the summarization evaluation track, we propose an approach that involves prompting LLMs to evaluate six different latent dimensions of summarization quality. In contrast to many previous approaches to summarization assessments, which emphasize lexical overlap with reference text, this method surfaces the importance of correct syntax in summarization evaluation. Our method resulted in the second-highest performance in this shared task, demonstrating its effectiveness as a reference-free evaluation.
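The core prompting idea can be sketched in a few lines: ask an LLM to rate a summary along several quality dimensions and aggregate the scores. The dimension names and template below are illustrative placeholders rather than the exact six dimensions or wording used in the submission, and the LLM call is left as an abstract callable.

```python
# Illustrative quality dimensions; the actual six dimensions used in the submission may differ.
DIMENSIONS = ["coherence", "fluency", "factual consistency", "relevance", "grammar", "coverage"]

def build_prompt(source: str, summary: str, dimension: str) -> str:
    return (
        f"Rate the {dimension} of the summary below on a scale from 1 (poor) to 5 (excellent).\n"
        f"Source text:\n{source}\n\nSummary:\n{summary}\n\n"
        "Answer with a single number."
    )

def score_summary(source: str, summary: str, ask_llm) -> float:
    """`ask_llm` is any callable that sends a prompt to an LLM and returns its text reply."""
    scores = [float(ask_llm(build_prompt(source, summary, d))) for d in DIMENSIONS]
    return sum(scores) / len(scores)
```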
Part-prototype networks have recently become methods of interest as an interpretable alternative to many of the current black-box image classifiers. However, the interpretability of these methods from the perspective of human users has not been sufficiently explored. In addition, previous works have had major issues with following proper experiment design and task representation that limit their reliability and validity. In this work, we have devised a framework for evaluating the interpretability of part-prototype-based models from a human perspective that solves these issues. The proposed framework consists of three actionable metrics and experiments. The results of these experiments will reveal important and reliable interpretability related properties of such models. To demonstrate the usefulness of our framework, we performed an extensive set of experiments using Amazon Mechanical Turk. They not only show the capability of our framework in assessing the interpretability of various part-prototype-based models, but they also are, to the best of our knowledge, the most comprehensive work on evaluating such methods in a unified framework.
Gaze estimation is a valuable tool with a broad range of applications in various fields, including medicine, psychology, virtual reality, marketing, and safety. Therefore, it is essential to have gaze estimation software that is cost-efficient and high-performing. Accurately predicting gaze remains a difficult task, particularly in real-world situations where images are affected by motion blur, video compression, and noise. Super-resolution (SR) has been shown to remove these degradations and improve image quality from a visual perspective. This work examines the usefulness of super-resolution for improving appearance-based gaze estimation and demonstrates that not all SR models preserve the gaze direction. We propose a two-step framework for gaze estimation based on the SwinIR super-resolution model. The proposed method consistently outperforms the state-of-the-art, particularly in scenarios involving low-resolution or degraded images. Furthermore, we examine the use of super-resolution through the lens of self-supervised learning for gaze estimation and propose a novel architecture “SuperVision” by fusing an SR backbone network to a ResNet18. While only using 20% of the data, the proposed SuperVision architecture outperforms the state-of-the-art GazeTR method by 15.5%.
The accurate recognition of symptoms in clinical reports is significantly important in the fields of healthcare and biomedical natural language processing. These entities serve as essential building blocks for clinical information extraction, enabling retrieval of critical medical insights from vast amounts of textual data. Furthermore, the ability to identify and categorize these entities is fundamental for developing advanced clinical decision support systems, aiding healthcare professionals in diagnosis and treatment planning. In this study, we participated in SympTEMIST – a shared task on detection of symptoms, signs and findings in Spanish medical documents. We combine a set of large language models finetuned with the data released by the task's organizers.
Part-prototype networks have recently become methods of interest as an interpretable alternative to many of the current black-box image classifiers. However, the interpretability of these methods from the perspective of human users has not been sufficiently explored. In this work, we have devised a framework for evaluating the interpretability of part-prototype-based models from a human perspective. The proposed framework consists of three actionable metrics and experiments. To demonstrate the usefulness of our framework, we performed an extensive set of experiments using Amazon Mechanical Turk. They not only show the capability of our framework in assessing the interpretability of various part-prototype-based models, but they also are, to the best of our knowledge, the most comprehensive work on evaluating such methods in a unified framework.
Vaccine hesitancy continues to be a main challenge for public health officials during the COVID-19 pandemic. As this hesitancy undermines vaccine campaigns, many researchers have sought to identify its root causes, finding that the increasing volume of anti-vaccine misinformation on social media platforms is a key element of this problem. We explored Twitter as a source of misleading content with the goal of extracting overlapping cultural and political beliefs that motivate the spread of vaccine misinformation. To do this, we have collected a data set of vaccine-related Tweets and annotated them with the help of a team of annotators with a background in communications and journalism. Ultimately we hope this can lead to effective and targeted public health communication strategies for reaching individuals with anti-vaccine beliefs. Moreover, this information helps with developing Machine Learning models to automatically detect vaccine misinformation posts and combat their negative impacts. In this paper, we present Vax-Culture, a novel Twitter COVID-19 dataset consisting of 6373 vaccine-related tweets accompanied by an extensive set of human-provided annotations including vaccine-hesitancy stance, indication of any misinformation in tweets, the entities criticized and supported in each tweet and the communicated message of each tweet. Moreover, we define five baseline tasks including four classification and one sequence generation tasks, and report the results of a set of recent transformer-based models for them. The dataset and code are publicly available at https://github.com/mrzarei5/Vax-Culture.
Gaze tracking is a valuable tool with a broad range of applications in various fields, including medicine, psychology, virtual reality, marketing, and safety. Therefore, it is essential to have gaze tracking software that is cost-efficient and high-performing. Accurately predicting gaze remains a difficult task, particularly in real-world situations where images are affected by motion blur, video compression, and noise. Super-resolution has been shown to improve image quality from a visual perspective. This work examines the usefulness of super-resolution for improving appearance-based gaze tracking. We show that not all SR models preserve the gaze direction. We propose a two-step framework based on the SwinIR super-resolution model. The proposed method consistently outperforms the state-of-the-art, particularly in scenarios involving low-resolution or degraded images. Furthermore, we examine the use of super-resolution through the lens of self-supervised learning for gaze prediction. Self-supervised learning aims to learn from unlabelled data to reduce the amount of required labeled data for downstream tasks. We propose a novel architecture called “SuperVision” by fusing an SR backbone network to a ResNet18 (with some skip connections). The proposed SuperVision method uses 5x less labeled data and yet outperforms, by 15%, the state-of-the-art method of GazeTR which uses 100% of training data. We will make our code publicly available upon publication.
Few-shot learning (FSL) is a challenging learning problem in which only a few samples are available for each class. Decision interpretation is more important in few-shot classification since there is a greater chance of error than in traditional classification. However, most of the previous FSL methods are black-box models. In this paper, we propose an inherently interpretable model for FSL based on human-friendly attributes. Moreover, we propose an online attribute selection mechanism that can effectively filter out irrelevant attributes in each episode. The attribute selection mechanism improves the accuracy and helps with interpretability by reducing the number of participating attributes in each episode. We demonstrate that the proposed method achieves results on par with black-box few-shot-learning models on four widely used datasets. To further close the performance gap with the black-box models, we propose a mechanism that trades interpretability for accuracy. It automatically detects the episodes where the provided human-friendly attributes are not adequate, and compensates by engaging learned unknown attributes.
Few-shot learning aims at recognizing new instances from classes with limited samples. This challenging task is usually alleviated by performing meta-learning on similar tasks. However, the resulting models are black-boxes. There have been growing concerns about deploying black-box machine learning models, and FSL is no exception in this regard. In this paper, we propose a method for FSL based on a set of human-interpretable concepts. It constructs a set of metric spaces associated with the concepts and classifies samples of novel classes by aggregating concept-specific decisions. The proposed method does not require concept annotations for query samples. This interpretable method achieved results on a par with six previously state-of-the-art black-box FSL methods on the CUB fine-grained bird classification dataset.
Recent advances in machine learning have brought opportunities for the ever-increasing use of AI in the real world. This has created concerns about the black-box nature of many of the most recent machine learning approaches. In this work, we propose an interpretable neural network that leverages metric and prototype learning for classification tasks. It encodes its own explanations and provides improved case-based reasoning by learning prototypes in an embedding space learned by a probabilistic nearest neighbor rule. Through experiments, we demonstrate the effectiveness of the proposed method in both performance and the accuracy of the explanations provided.
The advent of recent high-throughput sequencing technologies resulted in unexplored big data of genomics and transcriptomics that might help to answer various research questions in Parkinson’s disease (PD) progression. While the literature has revealed various predictive models that use longitudinal clinical data for disease progression, there is no predictive model based on RNA-Sequence data of PD patients. This study investigates how to predict PD progression for a patient’s next medical visit by capturing longitudinal temporal patterns in the RNA-Seq data. Data provided by the Parkinson’s Progression Markers Initiative (PPMI) include 423 PD patients, without any race, sex, or age information, with a variable number of visits and 34,682 predictor variables over 4 years. We propose a predictive model based on a deep Recurrent Neural Network (RNN) with the addition of dense connections and batch normalization into the RNN layers. The results show that the proposed architecture can predict PD progression from high-dimensional RNA-seq data with a Root Mean Square Error (RMSE) of 6.0 and a rank-order correlation of r = 0.83 (p < 0.0001) between the predicted and actual disease status of PD.
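A rough PyTorch sketch of this kind of recurrent architecture (stacked recurrent layers with batch normalization and dense, concatenative connections) is given below; the use of GRU cells, the layer sizes, and the regression head are illustrative assumptions, not the exact published model.

```python
import torch
import torch.nn as nn

class DenseRecurrentRegressor(nn.Module):
    """Stacked GRU layers with batch normalization and dense (concatenative) skip connections."""
    def __init__(self, n_features, hidden=128, n_layers=3):
        super().__init__()
        self.grus, self.norms = nn.ModuleList(), nn.ModuleList()
        in_size = n_features
        for _ in range(n_layers):
            self.grus.append(nn.GRU(in_size, hidden, batch_first=True))
            self.norms.append(nn.BatchNorm1d(hidden))
            in_size += hidden                                   # next layer also sees earlier outputs
        self.head = nn.Linear(n_layers * hidden, 1)             # regress the next-visit disease score

    def forward(self, x):                                       # x: (batch, visits, n_features)
        feats, outputs = x, []
        for gru, norm in zip(self.grus, self.norms):
            h, _ = gru(feats)                                   # (batch, visits, hidden)
            h = norm(h.transpose(1, 2)).transpose(1, 2)         # BatchNorm1d over the feature dim
            outputs.append(h)
            feats = torch.cat([feats, h], dim=-1)               # dense connection
        last = torch.cat([o[:, -1] for o in outputs], dim=-1)   # hidden states at the latest visit
        return self.head(last).squeeze(-1)

model = DenseRecurrentRegressor(n_features=34682)
prediction = model(torch.randn(2, 4, 34682))                    # two patients, four visits each
```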
In many scenarios, human decisions are explained based on some high-level concepts. In this work, we take a step in the interpretability of neural networks by examining their internal representation or neuron’s activations against concepts. A concept is characterized by a set of samples that have specific features in common. We propose a framework to check the existence of a causal relationship between a concept (or its negation) and task classes. While the previous methods focus on the importance of a concept to a task class, we go further and introduce four measures to quantitatively determine the order of causality. Moreover, we propose a method for constructing a hierarchy of concepts in the form of a concept-based decision tree which can shed light on how various concepts interact inside a neural network towards predicting output classes. Through experiments, we demonstrate the effectiveness of the proposed method in explaining the causal relationship between a concept and the predictive behaviour of a neural network as well as determining the interactions between different concepts through constructing a concept hierarchy.
Growing concerns regarding the operational use of AI models in the real world have caused a surge of interest in explaining AI models’ decisions to humans. Reinforcement Learning is not an exception in this regard. In this work, we propose a method for offering local explanations on risk in reinforcement learning. Our method only requires a log of previous interactions between the agent and the environment to create a state-transition model. It is designed to work on RL environments with either continuous or discrete state and action spaces. After creating the model, actions of any agent can be explained in terms of the features most influential in increasing or decreasing risk or any other desirable objective function in the locality of the agent. Through experiments, we demonstrate the effectiveness of the proposed method in providing such explanations.
The population is aging, and becoming more tech-savvy. The United Nations predicts that by 2050, one in six people in the world will be over age 65 (up from one in 11 in 2019), and this increases to one in four in Europe and Northern America. Meanwhile, the proportion of American adults over 65 who own a smartphone has risen 24 percentage points from 2013-2017, and the majority have Internet in their homes. Smart devices and smart home technology have profound potential to transform how people age, their ability to live independently in later years, and their interactions with their circle of care. Cognitive health is a key component to independence and well-being in old age, and smart homes present many opportunities to measure cognitive status in a continuous, unobtrusive manner. In this article, we focus on speech as a measurement instrument for cognitive health. Existing methods of cognitive assessment suffer from a number of limitations that could be addressed through smart home speech sensing technologies. We begin with a brief tutorial on measuring cognitive status from speech, including some pointers to useful open-source software toolboxes for the interested reader. We then present an overview of the preliminary results from pilot studies on active and passive smart home speech sensing for the measurement of cognitive health, and conclude with some recommendations and challenge statements for the next wave of work in this area, to help overcome both technical and ethical barriers to success.
In many scenarios, human decisions are explained based on some high-level concepts. In this work, we take a step in the interpretability of neural networks by examining their internal representation or neuron’s activations against concepts. A concept is characterized by a set of samples that have specific features in common. We propose a framework to check the existence of a causal relationship between a concept (or its negation) and task classes. While the previous methods focus on the importance of a concept to a task class, we go further and introduce four measures to quantitatively determine the order of causality. Through experiments, we demonstrate the effectiveness of the proposed method in explaining the relationship between a concept and the predictive behaviour of a neural network.
We propose a differentiable loss function for learning an embedding space by minimizing an upper bound on the leave-one-out error rate of 1-nearest-neighbor classification in the latent space. To evaluate the resulting space, in addition to the classification performance, we examine the problem of finding subclasses. In many applications, it is desired to detect unknown subclasses that might exist within known classes. For example, discovering subtypes of a known disease may help develop customized treatments. Analogous to hierarchical clustering, subclasses might exist at different scales. The proposed method provides a mechanism to target subclasses at different scales.
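The general idea can be sketched as a soft (NCA-style) surrogate: each sample softly selects a neighbour according to distance in the embedding, and the loss is the expected probability of selecting a wrong-class neighbour. The code below is a simplified stand-in for the paper's specific upper bound, with the temperature as an illustrative hyperparameter.

```python
import torch

def soft_knn_loss(z, y, temperature=0.1):
    """Differentiable surrogate for the leave-one-out 1-NN error in the embedding space z.
    A sketch of the general idea, not the paper's exact bound."""
    d = torch.cdist(z, z).pow(2) / temperature
    eye = torch.eye(len(z), dtype=torch.bool, device=z.device)
    d = d.masked_fill(eye, float("inf"))                 # a point never selects itself
    p = torch.softmax(-d, dim=1)                         # soft neighbour-selection probabilities
    same = (y.unsqueeze(0) == y.unsqueeze(1)).float()
    p_correct = (p * same).sum(dim=1)                    # chance of selecting a same-class neighbour
    return (1.0 - p_correct).mean()                      # expected leave-one-out error

z = torch.randn(32, 16, requires_grad=True)              # mini-batch of embeddings
y = torch.randint(0, 4, (32,))                           # class labels
soft_knn_loss(z, y).backward()
```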
In many real-world scenarios, data from multiple modalities (sources) are collected during a development phase. Such data are referred to as multiview data. While additional information from multiple views often improves performance, collecting data from such additional views during the testing phase may not be desired due to the high costs associated with measuring such views or the unavailability of such additional views. Therefore, in many applications, despite having a multiview training data set, it is desired to do performance testing using data from only one view. In this paper, we present a multiview feature selection method that leverages the knowledge of all views and uses it to guide the feature selection process in an individual view. We realize this via a multiview feature weighting scheme such that the local margins of samples in each view are maximized and similarities of samples to some reference points in different views are preserved. Also, the proposed formulation can be used for cross-view matching when the view-specific feature weights are pre-computed on an auxiliary data set. Promising results have been achieved on nine real-world data sets as well as three biometric recognition applications. On average, the proposed feature selection method has reduced the classification error rate by 31% relative to the state-of-the-art.
Language is one of the earliest capacities affected by cognitive change. To monitor that change longitudinally, we have developed a web portal for remote linguistic data acquisition, called Talk2Me, consisting of a variety of tasks. In order to facilitate research in different aspects of language, we provide baselines including the relations between different scoring functions within and across tasks. These data can be used to augment studies that require a normative model; for example, we provide baseline classification results in identifying dementia. These data are released publicly along with a comprehensive open-source package for extracting approximately two thousand lexico-syntactic, acoustic, and semantic features. This package can be applied arbitrarily to studies that include linguistic data. To our knowledge, this is the most comprehensive publicly available software for extracting linguistic features. The software includes scoring functions for different tasks.
Fingerprints have been extensively used for biometric recognition around the world. However, fingerprints are not secrets, and an adversary can synthesize a fake finger to spoof a biometric system. Most current fingerprint spoof detection methods are essentially binary classifiers trained on real and fake samples. While they perform well on detecting fake samples created by using the same methods used for training, their performance degrades when encountering fake samples created by a novel spoofing method. In this paper, we approach the problem from a different perspective by incorporating ECG. Compared with conventional biometrics, stealing someone’s ECG is far more difficult, if not impossible. Considering that ECG is a vital signal and motivated by its inherent liveness, we propose to combine it with a fingerprint liveness detection algorithm. The combination is natural as both ECG and fingerprint can be captured from fingertips. In the proposed framework, ECG and fingerprint are combined not only for authentication purposes but also for liveness detection. We also examine automatic template updating using ECG and fingerprint. In addition, we propose a stopping criterion that reduces the average waiting time for signal acquisition. We have performed extensive experiments on the LivDet2015 database, which is presently the latest available liveness detection database, and compare the proposed method with six liveness detection methods as well as twelve participants of the LivDet2015 competition. The proposed system has achieved a liveness detection EER of 4.2% incorporating only 5 seconds of ECG. By extending the recording time to 30 seconds, the liveness detection EER reduces to 2.6%, which is about 4 times better than the best of the six comparison methods. This is also about 2 times better than the best results achieved by participants of the LivDet2015 competition.
ECG and TEOAE are among the physiological signals that have attracted significant interest in the biometric community due to their inherent robustness to replay and falsification attacks. However, they are time-dependent signals, and this makes them hard to deal with in across-session human recognition scenarios where only one session is available for enrollment. This paper presents a novel feature selection method to address this issue. It is based on an auxiliary dataset with multiple sessions, where it selects a subset of features that are more persistent across different sessions. It uses local information in terms of sample margins while enforcing an across-session measure. This makes it a perfect fit for the aforementioned biometric recognition problem. Comprehensive experiments on ECG and TEOAE variability due to time lapse and body posture are performed. The performance of the proposed method is compared against seven state-of-the-art feature selection algorithms as well as another six approaches in the area of ECG and TEOAE biometric recognition. Experimental results demonstrate that the proposed method performs noticeably better than the other algorithms.
The objective of a continuous authentication system is to continuously monitor the identity of subjects using biometric systems. In this paper, we propose a novel feature extraction method and a continuous authentication strategy. We propose One-Dimensional Multi-Resolution Local Binary Patterns (1DMRLBP), an online feature extraction method for one-dimensional signals. We also propose a continuous authentication system that uses sequential sampling and 1DMRLBP feature extraction. This system adaptively updates decision thresholds and sample size during run-time. Unlike most other local binary pattern variants, 1DMRLBP accounts for observations’ temporal changes and has a mechanism to extract one feature vector that represents multiple observations. 1DMRLBP also accounts for quantization error, tolerates noise, and extracts local and global signal morphology. This paper examined electrocardiogram signals. When 1DMRLBP was applied to the University of Toronto database (UofTDB) of 1,012 single-session subjects, an equal error rate (EER) of 7.89% was achieved, compared with 12.30% from a state-of-the-art work. An EER of 10.10% was obtained when 1DMRLBP was applied to the UofTDB multi-session database of 82 subjects. Experiments showed that using 1DMRLBP improved EER by 15% when compared with a biometric system based on raw time-samples. Finally, when 1DMRLBP was implemented with sequential sampling to achieve a continuous authentication system, a 0.39% false rejection rate and a 1.57% false acceptance rate were achieved.
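As a rough, simplified sketch of the idea behind one-dimensional local binary patterns at multiple radii, the code below builds an LBP histogram per radius and concatenates them; the published 1DMRLBP additionally handles quantization error, noise tolerance, and the aggregation of multiple observations, none of which is reproduced here, and the function names are ours.

```python
import numpy as np

def lbp_1d(signal, radius=1):
    """Histogram of basic one-dimensional local binary patterns at a single radius."""
    signal = np.asarray(signal, dtype=float)
    n_bits = 2 * radius
    codes = []
    for i in range(radius, len(signal) - radius):
        neighbours = np.concatenate([signal[i - radius:i], signal[i + 1:i + 1 + radius]])
        bits = (neighbours >= signal[i]).astype(int)       # threshold neighbours against the centre
        codes.append(int("".join(map(str, bits)), 2))      # pack the bits into an LBP code
    hist, _ = np.histogram(codes, bins=2 ** n_bits, range=(0, 2 ** n_bits))
    return hist / max(len(codes), 1)

def multi_resolution_lbp(signal, radii=(1, 2, 4)):
    """Concatenate LBP histograms computed at several radii (the multi-resolution part)."""
    return np.concatenate([lbp_1d(signal, r) for r in radii])

features = multi_resolution_lbp(np.random.randn(1000))     # e.g., a one-dimensional ECG segment
```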
Patents
M. Komeili, N. Armanfard, D. Hatzinakos, “An Expert System for Fingerprint Spoof Detection”, International application number CA2019050141, Patent Cooperation Treaty (PCT), Feb. 2019.
N. Armanfard, M. Komeili, J. P. Reilly, John F. Connolly, “Expert System for Automatic, Continuous Coma Patient Assessment and Outcome Prediction”, U.S. Provisional Patent, USPTO serial no. 62/509,986, May 2017.
Contact details
613-520-2600 ext. 6098
majidkomeili@cunet.carleton.ca
5422 Herzberg Laboratories, Carleton University, 1125 Colonel By Drive, Ottawa, Ontario, Canada, K1S 5B6
