Events

Departmental events are listed below. Please see the left column for other special event pages.

April 9th, 2025 - Gansen Deng's PhD thesis public lecture

Supervisors: Dr. Wenqing He & Dr. Dinesh Kumbhare
Time: April 9th, 2025, 9:30 AM - 10:30 AM

Location: Western Science Centre 248

Title: Statistical Learning Methods for Challenges Arising from Self-Reported Data, with Applications to Chronic Pain Studies

Abstract: 

This thesis focuses on developing advanced clustering methods and analyzing data arising from chronic pain (CP) studies, with a particular emphasis on the unique challenges posed by self-reported (SR) data. Latent class analysis (LCA) is explored in the early stages of this work to cluster patients, and the clusters are compared to find features that are significantly different among clusters. While LCA is effective for categorical variables, it fails to address the mixed data types and subjective biases inherent in SR data. To overcome these limitations, we propose a novel distance metric tailored specifically for SR questionnaire data. This distance incorporates the correlation distance with other elementary distances for clustering data of mixed type, which outperforms existing metrics in handling mixed data when SR variables are present. Additionally, interpretable clustering techniques are utilized to generate simple, actionable rules that can be applied in clinical practice.

To integrate the domain knowledge of CP experts into the clustering process, a semi-supervised clustering algorithm is introduced, allowing the distance metric to be adjusted using pairwise constraints provided by CP experts. We develop a two-step active learning query strategy to identify and query the most informative patient cases, enhancing query efficiency and minimizing the number of interactions required between experts and the algorithm.

In addition to clustering, we analyze data arising from CP studies and explore predictive modeling. Canonical correlation analysis (CCA) is applied to investigate relationships among CP measurements, revealing important connections between pain characteristics and psychological factors. Furthermore, multiple classification models are used to predict nociplastic pain, and the optimal cutoff for each predictor is investigated using the prediction model.

Overall, this thesis makes significant contributions to the field of CP studies by introducing novel methods for clustering CP patients and analyzing complex data relationships. The proposed approaches emphasize clinical applicability, interpretability, and the integration of domain knowledge, offering practical solutions for real-world challenges in CP management. These advancements provide a foundation for further exploration of personalized treatment strategies and an improved understanding of chronic pain mechanisms.

April 8th, 2025 - Dr. Patrick Brown's Talk

Time: April 8th, 2025, 2:30 PM - 3:30 PM
Location: Western Interdisciplinary Research Building 1170
Speaker: Prof. Patrick Brown - Department of Statistical Sciences, The University of Toronto

Dr. Patrick Brown's research focuses on models and inference methodologies for spatio-temporal data, motivated by problems in spatial epidemiology and the environmental sciences. Current statistical methods research involves Bayesian inference for non-Gaussian spatial data, and non-parametric methods for spatially aggregated and censored locations.

Title: Daily air pollution and mortality: putting it all together

Abstract: Quantifying the short term health effects of air pollution is a task with many steps: uncertainty and gap-filling in exposures, adjusting for temporal dependence and trends in health outcomes, creating flexible yet stable models for exposure-response functions, and combining data from multiple cities.  In collaboration with Health Canada, our group has created a comprehensive and innovative methodology for this problem, with the last pieces currently being put together.  The core of the methodology is the case-crossover model, which has a non-standard likelihood requiring specialised model-fitting tools.  The latest development is the creation of a hierarchical functional model, which improves on the current 'separate and combine' method by producing stable estimates in smaller cities.  A near-monotone smoothing model is next on the agenda and preliminary results will be shown.

April 3rd, 2025 - Xinyi Zeng's PhD thesis proposal public lecture

Supervisor: Dr. Shu Li
Time: April 3rd, 2025, 9:30 AM - 10:10 AM
Location: Western Science Centre 187

Title: The structure of the number of claims until first passage times

Abstract: This thesis focuses on the first-passage problem in insurance risk models, mainly extending traditional ruin theory and the two-sided exit problem by examining the corresponding discrete features and their applications. We study the structural characteristics of three critical discrete random variables of interest in the risk process. For instance, under the compound Poisson risk model, we prove that the distribution of the number of claims until the first up-crossing time has a compound Poisson structure, where the primary parameter and secondary distribution are identified explicitly. We illustrate the computational advantage of the identified structures using numerical examples. We subsequently conduct a parallel analysis in a discrete-time model. Our study shows how such a discrete setting offers valuable insights while preserving theoretical connections to the continuous-time framework, particularly regarding the distributional structures of the number of claims. The discrete-time model provides not only theoretical elegance but also computational advantages for practical implementation. By generalizing to a discrete-time Sparre Andersen risk model, we further enhance the applicability of our results. Based on these foundational structures, our research moves towards practical applications by introducing a novel two-sided risk measure. By expanding beyond traditional ruin-based risk measures, our methodology provides a deeper understanding of risk assessment and risk management.
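The computational advantage of such a compound Poisson structure can be illustrated with the Panjer recursion, which evaluates the distribution recursively rather than by brute-force convolution. The Python sketch below is a generic illustration with made-up parameters, not the thesis's specific model:

```python
import math

def compound_poisson_pmf(lam, severity_pmf, s_max):
    """Panjer recursion for P(S = s), where S = X_1 + ... + X_N,
    N ~ Poisson(lam) and the X_i are iid on {1, 2, ...} with pmf severity_pmf."""
    p = [math.exp(-lam)]  # P(S = 0) = P(N = 0)
    for s in range(1, s_max + 1):
        p.append((lam / s) * sum(j * severity_pmf.get(j, 0.0) * p[s - j]
                                 for j in range(1, s + 1)))
    return p

# Degenerate severity at 1 collapses S to N itself, so the recursion
# should reproduce the Poisson(2) pmf exactly -- an easy sanity check.
pmf = compound_poisson_pmf(2.0, {1: 1.0}, 10)
```

Each probability is obtained from the previously computed ones, so the whole distribution up to `s_max` costs only O(s_max²) operations.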

March 25th, 2025 - The second Graduate Colloquium

Date: March 25th, 2025
Time: 2:30-3:30 p.m. 
Location: Kresge Building 106 (K106)
Colloquium Chair: Rika Fitriani

The talks are given in the alphabetical order of the speakers' last names.

1. Jiaxuan Lu - Supervisor: Dr. Hyukjun Gweon
Title: Random k conditional nearest neighbor for high-dimensional data
Abstract: The k-nearest neighbor (kNN) algorithm is simple and effective for classification, with many variants developed based on the kNN. One of the limitations of kNN is that the method may be less effective when data contains many noisy features due to their non-informative influence in calculating distance. Additionally, information derived from nearest neighbors may be less meaningful in high-dimensional data.  To address the limitation of nearest-neighbor based approaches in high-dimensional data, we extend the k conditional nearest neighbor (kCNN) method, an effective kNN variant. The proposed approach aggregates multiple kCNN classifiers, each built from a randomly sampled feature subset, and assigns weights using a score metric based on the level of separation of the feature subsets. We evaluate its properties through simulations, and experiments on gene expression datasets demonstrate its predictive performance.
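As a rough sketch of the random-subspace idea described above (here with plain majority voting over feature subsets; the proposed method instead weights each kCNN classifier by a separation-based score, and all names and parameters below are illustrative):

```python
import math
import random
from collections import Counter

def knn_predict(train_X, train_y, x, k, features):
    """Predict the label of x with kNN restricted to a feature subset."""
    dists = sorted(
        (math.dist([p[f] for f in features], [x[f] for f in features]), y)
        for p, y in zip(train_X, train_y)
    )
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]

def random_subspace_knn(train_X, train_y, x, k=3, n_subsets=10,
                        subset_size=2, seed=0):
    """Aggregate kNN votes over randomly sampled feature subsets."""
    rng = random.Random(seed)
    n_features = len(train_X[0])
    votes = Counter()
    for _ in range(n_subsets):
        features = rng.sample(range(n_features), subset_size)
        votes[knn_predict(train_X, train_y, x, k, features)] += 1
    return votes.most_common(1)[0][0]

# Toy data: two informative features, two well-separated classes.
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
y = [0, 0, 0, 1, 1, 1]
label = random_subspace_knn(X, y, (0.05, 0.1), k=3, n_subsets=10, subset_size=1)
# label == 0: every single-feature subset is informative here
```

In high dimensions the point of sampling subsets is that many of the noisy features are excluded from each base classifier, so their non-informative influence on the distance is diluted in the aggregate.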
 
2. Nathaniel Phelps - Supervisors: Dr. Douglas Woolford and Dr. Dan Lizotte
Title: Bias in Decision Trees for Imbalanced Classification
Abstract: There is a widespread and longstanding belief that machine learning models are biased towards the majority (or negative) class when learning from imbalanced data, leading them to neglect or ignore the minority (or positive) class. However, we demonstrate that this belief is not necessarily correct for decision trees, and that their bias can actually be in the opposite direction. First, we conduct a simulation study that shows that decision trees overpredict the number of positive cases in imbalanced datasets, and that this overprediction tends to increase as the imbalance increases. We then prove that, under specific conditions related to the predictors, decision trees fit to purity and trained on a dataset with only one positive case are biased towards the minority class. Our findings have implications on the use of popular tree-based models, such as random forests.
 
3. Zixuan Yang - Supervisor: Dr. Douglas Woolford
Title: Short-term Forecasting of Fire Occurrence in Ontario, Canada
Abstract: This talk presents our project related to the short-term forecasting (1 – 4 days ahead) of human-caused wildland fire occurrence in the Province of Ontario, Canada. We first demonstrate that incorporating the forecasting errors in the Fine Fuel Moisture Code (FFMC)—which is a key predictor for human-caused fire occurrence—leads to more accurate estimators of model parameters compared to a model that simply uses the FFMC forecasts as a predictor itself. That method uses an errors-in-variables approach where the forecasting error distribution is represented by normal mixtures, and modified simulation-extrapolation (SIMEX) is used for model fitting. Through simulation, we demonstrate that ignoring the measurement errors in a predictor results in biased forecasts, while the SIMEX estimators lead to improvements. However, this approach was not spatial. We also present ongoing work extending this errors-in-variables approach to develop methodology for spatially explicit, fine-scale, short-term fire occurrence forecasting. We divide a larger study region in Ontario into a set of fine-scale spatial cells and develop spatially explicit models for short-term forecasts of daily human-caused fire occurrences. These models predict not only the expected number of fires 1 – 4 days into the future, but also quantify the uncertainty of such forecasts by producing prediction intervals, and produce colour-coded maps that highlight areas where such fires are more likely to occur.
 
4. Xiaotian Zhu - Supervisors: Dr. Lars Stentoft and Dr. Mark Reesor
Title: Solving Financial Stochastic Optimal Control Problems using Regression-based Monte Carlo Methods
Abstract: Stochastic optimal control plays an important role in financial decision-making problems, such as portfolio optimization, asset-liability management, and options pricing. However, solving such problems is challenging due to high-dimensional state spaces and complex market dynamics. This talk presents the Regression-based Monte Carlo method and its extensions as effective approaches to approximate the value function and optimal controls. Emphasis is placed on applications of dynamic portfolio optimization and the valuation of American-style options where some numerical results are provided. The talk concludes with discussions on multiple controls, convergence properties, and potential research directions in exploring financial stochastic control problems.

March 14th, 2025 - Gansen Deng's PhD thesis proposal public lecture

Supervisors: Dr. Wenqing He & Dr. Dinesh Kumbhare
Time: March 14th, 2025, 10:00 AM - 11:00 AM

Location: (Waiting room activated; no passcode. After the public lecture, all participants other than the examination committee are asked to leave the room so that the examination can begin.)

Title: Statistical Learning Methods for Challenges Arising from Self-Reported Data, with Applications to Chronic Pain Studies

Abstract: This thesis focuses on developing advanced clustering methods and analyzing data arising from chronic pain (CP) studies, with a particular emphasis on the unique challenges posed by self-reported (SR) data. Latent class analysis (LCA) is explored in the early stages of this work to cluster patients, and the clusters are compared to find features that are significantly different among clusters. While LCA is effective for categorical variables, it fails to address the mixed data types and subjective biases inherent in SR data. To overcome these limitations, we propose a novel distance metric tailored specifically for SR questionnaire data. This distance incorporates the correlation distance with other elementary distances for clustering data of mixed type, which outperforms existing metrics in handling mixed data when SR variables are present. Additionally, interpretable clustering techniques are utilized to generate simple, actionable rules that can be applied in clinical practice.

To integrate the domain knowledge of CP experts into the clustering process, a semi-supervised clustering algorithm is introduced, allowing the distance metric to be adjusted using pairwise constraints provided by CP experts. We develop a two-step active learning query strategy to identify and query the most informative patient cases, enhancing query efficiency and minimizing the number of interactions required between experts and the algorithm.

In addition to clustering, we analyze data arising from CP studies and explore predictive modeling. Canonical correlation analysis (CCA) is applied to investigate relationships among CP measurements, revealing important connections between pain characteristics and psychological factors. Furthermore, multiple classification models are used to predict nociplastic pain, and the optimal cutoff for each predictor is investigated using the prediction model.

Overall, this thesis makes significant contributions to the field of CP studies by introducing novel methods for clustering CP patients and analyzing complex data relationships. The proposed approaches emphasize clinical applicability, interpretability, and the integration of domain knowledge, offering practical solutions for real-world challenges in CP management. These advancements provide a foundation for further exploration of personalized treatment strategies and an improved understanding of chronic pain mechanisms.

March 7th, 2025 - FM Power Hour - Dr. Florian Bourgey's Talk

Time: March 7th, 2025, 11:00 AM - 12:00 PM
Location: Western Science Centre 248
Speaker: Dr. Florian Bourgey - Bloomberg

Dr. Florian Bourgey is a researcher in the Quantitative Research team in the Office of the CTO at Bloomberg in New York. His research focuses on Monte Carlo simulations, stochastic approximations, climate risk, volatility modeling, and machine learning. He holds a Ph.D. in applied mathematics from École Polytechnique, France.

Title: Smile Dynamics and Rough Volatility

Abstract: We investigate the dynamic properties of various stochastic, and notably rough, volatility models, with an emphasis on the dynamics of implied volatilities. While recent literature has extensively analyzed static properties, such as a model's calibration power or the term structure of at-the-money skews, dynamic features have received less attention. We focus on the Skew-Stickiness Ratio (SSR), an industry-standard indicator of joint spot price and implied volatility dynamics, pursuing the analysis of [Bergomi, Smile Dynamics IV, Risk, 2009] and extending it to rough volatility models. Using different numerical estimators, we compare the behavior of the SSR generated by several models (not limited to the affine framework) with the empirical market SSR estimated for the SPX Index. Interestingly, we observe that different forward variance models (two-factor Bergomi, rough Bergomi, rough Heston, Heston), calibrated as closely as possible to the same SPX smile, generate SSRs that (i) are close to one another, and (ii) display significant deviations from market data, failing to reproduce the term structure observed for the empirical SSR. These observations suggest a certain rigidity within the stochastic volatility family under consideration and indicate that rough volatility alone does not significantly alter the joint spot-implied volatility dynamics.

March 4th, 2025 - Dr. Silvana Pesenti's Talk

Time: March 4th, 2025, 1:30 PM - 2:30 PM
Location: Western Interdisciplinary Research Building 1170
Speaker: Prof. Silvana Pesenti - Department of Statistical Sciences, The University of Toronto

Dr. Silvana Pesenti is an Assistant Professor in Insurance Risk Management at the Department of Statistical Sciences at the University of Toronto. She was named the 2022 Rising Star in Quant Finance by Risk.net. She received the 2020 Peter Clark Best Paper Prize from the Institute and Faculty of Actuaries (IFoA). In 2019, she was awarded the Dorothy Shoichet Women Faculty Science Award of Excellence. She is an Associate Editor of Computational and Applied Mathematics (since 2022), an Associate Editor of Annals of Actuarial Science (since 2023), and on the Editorial Board of Applied Mathematical Finance (since 2023) and of ASTIN Bulletin (since 2023).

Title: Risk budgeting allocation for dynamic risk measures

Abstract: We define and develop an approach for risk budgeting allocation, a risk diversification portfolio strategy, where risk is measured using a dynamic time-consistent risk measure. For this, we introduce a notion of dynamic risk contributions that generalises the classical Euler contributions and allows us to obtain dynamic risk contributions in a recursive manner. We prove that, for the class of coherent dynamic distortion risk measures, the risk allocation problem may be recast as a sequence of strictly convex optimisation problems. Moreover, we show that self-financing dynamic risk budgeting strategies with initial wealth of 1 are scaled versions of the solution of the sequence of convex optimisation problems. Furthermore, we develop an actor-critic approach, leveraging the elicitability of dynamic risk measures, to solve for risk budgeting strategies using deep learning.
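In the static one-period case, the classical Euler contributions the talk generalises can be sketched as follows, with a simple fixed-point iteration toward a target risk budget under the volatility risk measure (an illustrative heuristic only, not the dynamic actor-critic method of the talk; the numbers are made up):

```python
import math

def risk_contributions(w, cov):
    """Euler contributions RC_i = w_i * (cov @ w)_i / sigma; they sum to sigma."""
    n = len(w)
    sigma = math.sqrt(sum(w[i] * cov[i][j] * w[j]
                          for i in range(n) for j in range(n)))
    return [w[i] * sum(cov[i][j] * w[j] for j in range(n)) / sigma
            for i in range(n)]

def risk_budget_weights(cov, budgets, n_iter=500):
    """Fixed-point iteration: rescale each weight toward its target budget."""
    n = len(cov)
    w = [1.0 / n] * n
    for _ in range(n_iter):
        rc = risk_contributions(w, cov)
        w = [w[i] * budgets[i] / rc[i] for i in range(n)]
        s = sum(w)
        w = [x / s for x in w]  # keep weights summing to one
    return w

# Two assets with volatilities 20% and 30%: equal risk budgets put more
# weight on the less volatile asset (here w = (0.6, 0.4)).
cov = [[0.04, 0.01], [0.01, 0.09]]
w = risk_budget_weights(cov, [0.5, 0.5])
```

For volatility, the Euler contributions sum exactly to the portfolio risk, which is what makes "risk budgets" well defined; the dynamic contributions in the talk preserve an analogous decomposition recursively in time.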

February 7th, 2025 - FM Power Hour - Dr. Junhe Chen's Talk

Time: February 7th, 2025, 11:00 AM - 12:00 PM
Location: Western Science Centre 248
Speaker: Dr. Junhe Chen, Associate Director, Risk Modelling at RBC

Dr. Junhe Chen completed his PhD in Financial Modeling at Western University in 2021. His research areas are differential games in energy finance and portfolio optimization. After graduation, he worked as a contractor at CIBC and BMO, and now works mainly on counterparty credit risk (CCR) and the fundamental review of the trading book (FRTB) default risk capital (DRC) at RBC.

Title: CCR Simulation and Exposure Generation

Abstract: Counterparty credit risk (CCR) is the risk that the counterparty to a transaction could default before the final settlement of the transaction's cash flows. This talk introduces CCR and its implementation. A Hull-White model for interest rates and a geometric Brownian motion model for FX rates are used to simulate the future IR and FX risk factors. With those simulated risk factors, the future price (or mark-to-market) of trades can be calculated using the swap/FX pricing models, and hence the potential future exposure (PFE) can be generated.
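A minimal sketch of the exposure-generation step, using only a driftless geometric Brownian motion for the FX rate and a linear forward payoff (parameters are hypothetical; a production system would also simulate Hull-White interest rates and use full swap/FX pricing models):

```python
import math
import random

def simulate_fx_paths(s0, sigma, dt, n_steps, n_paths, seed=42):
    """Simulate driftless geometric Brownian motion paths for an FX rate."""
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        s, path = s0, [s0]
        for _ in range(n_steps):
            z = rng.gauss(0.0, 1.0)
            s *= math.exp(-0.5 * sigma**2 * dt + sigma * math.sqrt(dt) * z)
            path.append(s)
        paths.append(path)
    return paths

def pfe_profile(paths, notional, strike, q=0.95):
    """q-quantile potential future exposure of a long FX forward per step."""
    profile = []
    for t in range(len(paths[0])):
        exposures = sorted(max(notional * (p[t] - strike), 0.0) for p in paths)
        profile.append(exposures[int(q * (len(exposures) - 1))])
    return profile

# Monthly steps over one year, at-the-money forward on 1M notional.
paths = simulate_fx_paths(1.0, 0.10, 1.0 / 12, 12, 2000)
pfe = pfe_profile(paths, 1_000_000.0, 1.0)
```

The exposure at each future date is the positive part of the simulated mark-to-market, and PFE is a high quantile of that exposure across paths; the profile starts at zero and widens with the horizon.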

December 20th, 2024 - Sherly Paola Alfonso Sánchez's PhD thesis proposal public lecture

Supervisors: Dr. Cristián Bravo and Dr. Kristina Sendova
Time: December 20th, 2:00 PM - 2:40 PM

Location: (Waiting room activated; no passcode. After the public lecture, all participants other than the examination committee are asked to leave the room so that the examination can begin.)

Title: Artificial Intelligence in Banking and Insurance: Optimizing Credit Limit Adjustments with Reinforcement Learning, Multi-Treatment Selection via Causal Inference, and Negotiation Pricing Analysis with a Fairness Approach

Abstract:  In this thesis, I develop a series of methodologies to improve financial management in the lending and insurance industry. I first examine innovative methodologies for addressing whether to increase or maintain a customer’s credit line and by what factor if an increase is granted. These decisions impact both customers and the company’s capital for covering expected losses. To automate an optimal policy for credit limit adjustments, I formulated this as an optimization problem that maximizes expected profit while balancing risk. Using historical data from a Latin American super-app, I trained a reinforcement learning (RL) agent through offline simulations. The results indicate that a Double Q-learning agent with optimized hyperparameters can outperform other strategies and establish a robust decision-making framework based on data-driven methods. I also explore alternative data for balance prediction, finding that such data does not always improve accuracy.

In my second work, using a more diverse dataset, I frame the credit limit increase decision as a treatment selection problem, where treatments represent the increase factor and the control maintains the current limit. I include causal effect estimation to compare potential outcomes under different treatments, addressing the inadequacy of relying solely on individual treatment effects. By incorporating uncertainty measured by conditional value-at-risk and prioritizing treatments that lead to favorable post-treatment outcomes, I propose a comprehensive methodology for multi-treatment selection. This approach ensures the overlap assumption is satisfied by training propensity score models before employing traditional causal models, significantly enhancing policy performance.

In the last part of the thesis, I aim to explore disparities in healthcare service pricing in light of recent regulations aimed at increasing transparency. The Transparency in Coverage (TiC) Rule and the Hospital Price Transparency Rule require health insurers and hospitals to disclose in-network negotiated rates and out-of-network allowed amounts, creating an opportunity to investigate whether these prices vary across demographic factors such as income, location, and education. By integrating healthcare pricing data, the General Social Survey (GSS), and social network analysis, I propose to determine whether significant differences exist in the cost of medical services across different regions and income levels.

In summary, this thesis delves into how modern AI and ML enhance financial service management, specifically by making tasks like credit limit adjustment more data-driven. It develops a multi-treatment selection methodology directly applicable to credit limit modifications. Finally, through multimodal analysis, it examines the nature of current healthcare negotiation pricing in the U.S.

December 9th, 2024 - Johanna de Haan-Ward's PhD public lecture

Supervisors: Dr. Simon Bonner and Dr. Douglas Woolford
Time: December 9th, 12:30 PM - 1:30 PM

Location: Western Science Centre 256

Title: Predicting Rare Events from Large Spatiotemporal Data: Application to Wildland Fires and Species Occupancy

Abstract: 

Subsampling of large data is commonly employed in statistical modelling to improve computational efficiency. When the event being modelled is rare, the data are imbalanced, and sampling methods therefore focus on preferentially subsampling the observations that represent those rare event occurrences. This thesis extends methodology for the subsampling of large data when modelling rare events, motivated by applications in environmetrics and ecology.

The first two projects present extensions to response-based sampling. The response-based sampling approach takes independent samples of event occurrence and non-occurrence, often sampling all occurrences and a small proportion of the non-occurrences. I propose a stratified sampling approach, which defines strata based on a key variable. Independent samples of occurrences and non-occurrences are then sampled from each stratum. The bias induced by this sampling must be accounted for in the logistic regression model. The first project employs sampling weights in the logistic regression model to account for the bias induced by this sampling design. The second project instead uses stratum-specific offsets to the same end, which allows the model to include multiple predictors. These approaches are validated using simulation, where they are compared to existing approaches for sampling imbalanced data. I apply these methods to fine-scale human-caused fire occurrence prediction in a region of Ontario, Canada, where stratifying on a measure of fire weather and sampling more extreme observations leads to more locally precise estimates of fire occurrence.
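The offset idea behind response-based sampling can be illustrated in a few lines: keeping all events but only a fraction of non-events inflates the apparent event rate, and adding the log of the sampling fraction on the logit scale undoes that bias. The snippet below is a deliberately simplified, intercept-only sketch on simulated data; the stratified, multi-predictor versions in the thesis are more involved.

```python
import math
import random

rng = random.Random(0)
p_true = 0.01  # true (rare) event rate
ys = [1 if rng.random() < p_true else 0 for _ in range(200_000)]

# Response-based sampling: keep every event, but only 5% of non-events.
keep_rate = 0.05
sub_rng = random.Random(1)
sub = [y for y in ys if y == 1 or sub_rng.random() < keep_rate]

# The naive event rate in the subsample is badly inflated ...
p_sub = sum(sub) / len(sub)

# ... but an offset of log(keep_rate) on the logit scale undoes the bias.
logit_corrected = math.log(p_sub / (1 - p_sub)) + math.log(keep_rate)
p_corrected = 1 / (1 + math.exp(-logit_corrected))
```

In a full logistic regression, the same correction enters as a known offset term in the linear predictor, so the slope coefficients are estimated from the much smaller subsample while the intercept remains interpretable on the original scale.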

The third project presents a novel method for subsampling species detection data to fit occupancy models. When a species is rarely detected, the number of detections will be far outnumbered by the non-detections. I propose a response-based sampling method for species detection data, which allows preferential sampling of the rarer detection observations. I present a method for estimating occupancy and detection probabilities of the subsampled data, as the assumptions of traditional occupancy models no longer hold. I apply this method to detection data of Canada Warbler (Cardellina canadensis) from the Breeding Bird Survey, where we can accurately estimate the occupancy and detection parameters using just 10% of the original dataset, including estimating the effects of a habitat-related covariate.

December 5th, 2024 - Sahab Zandi's PhD thesis proposal public lecture

Supervisors: Dr. Cristián Bravo and Dr. María Óskarsdóttir
Time: December 5th, 9:00 AM - 10:00 AM

Location: Western Science Centre 256

Title: Deep Learning Methodologies for Complex Network-Driven Credit Risk Assessment

Abstract: In the past few years, the field of risk management has increasingly turned its attention towards enhancing the efficacy of credit scoring models. This shift has been characterized by a growing emphasis on integrating advanced machine learning methodologies and tapping into non-traditional data sources. Building on this momentum, this study is dedicated to exploring the optimal utilization of network data to bolster the benefits of credit scoring practices. The primary aim of this study is to broaden the existing body of knowledge concerning the development of sophisticated credit scoring models. Specifically, it seeks to investigate the incorporation of network data into these models through the application of machine learning techniques and social network analysis. The first project introduces an innovative model for assessing credit risk in the context of behavioral scoring. This model combines Graph Neural Networks (GNNs) and Recurrent Neural Networks (RNNs) to account for borrower connections and their evolution over time, offering a dynamic perspective on creditworthiness. The second project presents a novel method for estimating credit risk among Small and Medium-sized Enterprises (SMEs) within the framework of application scoring. This multimodal model, leveraging both unstructured multilayer network data and traditional structured data, provides a holistic approach to understanding SME credit risk. Finally, the third project, which is ongoing, unveils a new approach to model explainability in credit risk modeling using Large Language Models (LLMs).

November 29th, 2024 - Dr. Devan Becker's Talk

Speaker: Prof. Devan Becker - Department of Mathematics, Wilfrid Laurier University
Time: November 29th, 2024, 2:30 PM - 3:30 PM

Location: Western Interdisciplinary Research Building 1170

Title: Estimating Variants of Concern Without Knowledge of Variants of Concern: Statistical Modelling and Clustering of COVID-19 Wastewater Data

Abstract: Current methods for wastewater-based epidemiology use clinical sequences to define Variants of Concern (VOCs), then these VOCs are used to detect signals in wastewater. The definitions found by clinical sequencing are inherently imperfect for wastewater, which is a problem that is compounded by the reduction in clinical sequencing. In this work, I develop Bayesian and algorithmic unsupervised clustering methods that determine definitions of VOCs according to temporally consistent mutation patterns while simultaneously estimating the prevalence of those VOCs. Additional analyses determine partial definitions of VOCs (i.e., using a small subset of mutations), which addresses the issue of VOC definitions being imperfect for wastewater detection.

November 26th, 2024 - The first Graduate Colloquium

Date: November 26th, 2024
Time: 2:30-3:30 p.m. 
Location: Kresge Building 106 (K106)
Colloquium Chair: Diba Daraei

The talks are given in the alphabetical order of the speakers' last names.

1. Rika Fitriani - Supervisor: Dr. Ricardas Zitikis
Title: Developing Inclusive Insurance in Indonesia: From Theoretical Foundations to Data Analyses
Abstract: This study explores the development of inclusive insurance in Indonesia, focusing on how economic inequality impacts insurance accessibility for low-income populations. Inclusive insurance aims to be affordable and relevant for economically vulnerable groups, covering essential risks such as health emergencies and natural disasters. The analysis highlights the significance of the Gini index and quantile-based inequality indices as tools for assessing economic disparities, which are critical for designing targeted insurance products that effectively address the needs of economically vulnerable populations. Using empirical data from Indonesia’s National Socioeconomic Survey, the study illustrates how these inequality metrics can guide inclusive insurance strategies.
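For reference, the Gini index this analysis relies on can be computed directly from its mean-absolute-difference definition (toy numbers below, not survey data):

```python
def gini(incomes):
    """Gini index: mean absolute difference scaled by twice the mean income."""
    n = len(incomes)
    mean = sum(incomes) / n
    mad = sum(abs(a - b) for a in incomes for b in incomes) / (n * n)
    return mad / (2 * mean)

# Perfect equality gives 0; one person holding everything gives (n-1)/n.
g_equal = gini([10, 10, 10, 10])   # 0.0
g_unequal = gini([0, 0, 0, 40])    # 0.75
```

Quantile-based inequality indices refine this single number by comparing income shares across specific quantiles, which is what makes them useful for targeting products at particular segments of the income distribution.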
 
2. Yiyao Jiao - Supervisor: Dr. Marcos Escobar-Anel
Title: Integrating Environmental, Social, and Governance (ESG) Preferences into Investment Decisions: A Multi-Attribute Approach
Abstract: This presentation showcases the latest research findings from two papers authored by Marcos Escobar-Anel and Yiyao Jiao. The focus of both papers is on ESG modeling, utilizing a shared analytical framework. The first paper primarily delves into multi-attribute utility and optimal portfolio allocation within the context of ESG. The second paper extends this framework by incorporating ambiguity aversion into ESG modeling and explores robust portfolio optimization techniques within this domain. These papers introduce an analytical framework designed to assist investors in integrating ESG preferences into their investment strategies. Through a combination of theoretical analyses and empirical investigations, the studies highlight the benefits and practicality of adopting such approaches.
 
3. Sanghyun Jung - Supervisor: Dr. Cristián Bravo
Title: Finding paths to financial wellness
Abstract: The decision-making process of individuals is recognized as highly complex, influenced by a myriad of factors including demographics, income, macroeconomic conditions, and previous decisions. Traditional research often models this process by focusing on one or a limited number of these factors, which falls short of capturing the complexity of reality and fails to predict counterfactual outcomes of decision-making pathways accurately. With the rapid advancement of deep learning, there is now potential to leverage extensive datasets to more accurately model complex investment decision-making processes. However, deep learning models often struggle with discerning potential outcomes along specific decision paths. In response, our research introduces a novel causal deep learning model utilizing a Causal Transformer architecture. This model is designed to estimate and interpret the causal impact of investment decisions. Utilizing data such as Know Your Client (KYC) profiles, account transactions, and macroeconomic indicators, we analyze which factors significantly influence decision-making paths. Moreover, by exploring counterfactual scenarios, we determine which decision paths could lead to desired financial wellness outcomes for individuals. This approach enhances our understanding of complex decision dynamics and aids in the practical application of achieving targeted financial objectives.
 
4. Yao Li - Supervisor: Dr. Katsu Goda
Title: Hail Hazard Modeling with Uncertainty Analysis and Roof Damage Estimation of Residential Buildings in North America
Abstract: This research presents a statistical approach to hail risk modeling that incorporates the uncertainties of hail model prediction, providing insight into assessing roof damage to residential houses in hail events. By quantifying the inherent uncertainties in evaluating hailstorm characteristics, this study extends existing hail models. The hail data are sourced from the Community Collaborative Rain, Hail and Snow Network (CoCoRaHS) in the U.S. In the modeling process, the largest hail diameter reported in the CoCoRaHS database serves as the primary input variable for estimating the number of observations of the largest hail diameter, hailstorm duration, and hit rate. The assessment of hail risk in this study focuses on the probability of hail damage and the resultant repair costs for five types of roofs in North America (an unrated roof and impact-resistant roofs with UL 2218 rating classes 1 to 4). The probability of hail damage is calculated as a failure probability by integrating all individual hailstone hits of variable diameter during a hailstorm with fragility curves, which estimate the probability that hailstones will fracture asphalt shingles (allowing water infiltration) or dislodge enough granules to cause visible damage requiring replacement for aesthetic reasons. The results reveal that impact-resistant roofs (rating classes 1 to 4) are associated with lower hail risks, with a 60% to 98% reduction on average compared to unrated roofs. This study provides a comprehensive uncertainty modeling approach for hail hazard and risk, enabling better-informed decision-making and risk management strategies.
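For intuition on the fragility-curve integration described above, the toy sketch below combines a lognormal fragility curve with simulated hailstone hits. All parameter values (median capacity `d50`, the diameter distribution, hit counts) are invented for illustration and are not the study's calibrated values.

```python
import math
import numpy as np

rng = np.random.default_rng(42)
erf_vec = np.vectorize(math.erf)

def fragility(d_mm, d50, beta=0.35):
    """Lognormal fragility curve: P(shingle damage | hailstone diameter d_mm),
    with median capacity d50 (mm) and log-standard-deviation beta."""
    return 0.5 * (1.0 + erf_vec(np.log(d_mm / d50) / (beta * math.sqrt(2.0))))

def storm_damage_prob(n_hits, d50, mean_log_d=3.0, sd_log_d=0.4, n_sims=4000):
    """Monte Carlo estimate of P(at least one damaging hit) during a hailstorm."""
    diam = rng.lognormal(mean_log_d, sd_log_d, size=(n_sims, n_hits))  # diameters in mm
    p_hit = fragility(diam, d50)
    survive = np.prod(1.0 - p_hit, axis=1)  # roof survives only if every hit is harmless
    return float(np.mean(1.0 - survive))

# a larger median capacity d50 mimics an impact-resistant (UL 2218-rated) roof
p_unrated = storm_damage_prob(n_hits=30, d50=35.0)
p_rated = storm_damage_prob(n_hits=30, d50=60.0)
print(p_unrated, p_rated)
```

As expected from the abstract's findings, the hypothetical impact-resistant roof (higher median capacity) shows a markedly lower damage probability than the unrated one.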

November 20th, 2024 - Peiheng Gao's PhD thesis proposal public lecture

Supervisor: Dr. Ricardas Zitikis
Time: November 20th, 2024, 10:00 AM - 10:45 AM

Location: Western Science Centre 248

Title: Anomaly detection and model construction with the focus on natural language processing in consumer complaint textual analysis

Abstract: Detecting anomalies, which may be good, bad, or ugly, plays an important role in business decision-making. Detection relies on the information received, which comes in numerical and textual forms. Much has been done in the quantitative analysis of claim amounts and frequencies, but claim texts and related textual (qualitative) data are also of paramount importance; sometimes they are the only available source of information. In this project we concentrate on machine learning (ML) tools to detect anomalies and utilize the acquired information for the betterment of business decision-making. We also discuss the tools necessary for our research, with references to earlier studies and uses of ML in insurance and related fields.

Based on this foundation, we provide detailed analyses of textual data using complaint narratives from the Consumer Complaint Database of the Consumer Financial Protection Bureau (CFPB). A procedure is developed for detecting systematic non-meritorious consumer complaints, simply called systematic anomalies, among complaint narratives. Based on this procedure, we convert complaint narratives into quantitative data, which are then analyzed using indices to detect systematic anomalies. The research further explores incorporating expert opinions on classification errors to improve the performance of predicting such errors. Finally, this research examines how diverse evaluation metrics could influence this predictive performance.

November 18th, 2024 - CAST Seminar: Dr. Frank Harrell

This seminar is organized by CANSSI Ontario in partnership with the Department of Statistical and Actuarial Sciences at Western.

Nathaniel Lewis Phelps, our PhD student, is the moderator. 

The event is online via Zoom. Faculty and students can watch the seminar together in Western Science Centre 248 on November 18th, 2024, 3:30-4:30 PM.


October 29th, 2024 - FM Power Hour - Dr. Ryan Ferguson's Talk

Time: October 29th, 2024, 1:30 PM - 2:30 PM
Location: Western Science Centre 248
Speaker: Dr. Ryan Ferguson

Ryan is Founder and CEO at Riskfuel, a capital markets focused AI startup. Previously, Ryan was Managing Director and Head of Securitization, Credit Derivatives and XVA at Scotiabank. Prior roles have included credit correlation trading and managing the equity derivatives trading desk. Ryan began his career with positions in risk management and financial engineering. Ryan has a PhD in Physics from Imperial College, and a BASc and MASc in Electrical Engineering from the University of Waterloo.

Title: Riskfuel: Accelerating valuation and risk sensitivity calculations in the capital markets - Replacing slow numerical solvers with fast neural network inferencing

Abstract: Riskfuel uses traditional numerical solvers to train deep neural networks via the creation of large datasets. Once trained, a network can return accurate results millions of times faster than the traditional approach. This talk will cover the motivation for fast pricing and the various approaches available to accelerate pricing. We will show how neural network inferencing leverages many of these approaches simultaneously, including function approximation, hardware specialization (GPUs), and adjoint algorithmic differentiation. Examples from exotic derivatives pricing and portfolio optimization will be used to demonstrate the technique.
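The idea can be caricatured in a few lines of Python: use a slow numerical pricer to generate training data offline, then fit a fast approximator for online inference. The sketch below stands in a CRR binomial tree for the "traditional solver" and a polynomial least-squares fit for the neural network; it is an illustrative toy with invented parameters, not Riskfuel's implementation.

```python
import math
import numpy as np

def binomial_call(S, K=100.0, r=0.03, sigma=0.2, T=1.0, steps=400):
    """'Slow' solver: Cox-Ross-Rubinstein binomial tree for a European call."""
    dt = T / steps
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    p = (math.exp(r * dt) - d) / (u - d)
    disc = math.exp(-r * dt)
    terminal = S * u ** np.arange(steps, -1, -1) * d ** np.arange(0, steps + 1)
    v = np.maximum(terminal - K, 0.0)
    for _ in range(steps):  # backward induction through the tree
        v = disc * (p * v[:-1] + (1.0 - p) * v[1:])
    return float(v[0])

# 1) generate training data with the slow solver (the expensive, offline step)
spots = np.linspace(50.0, 150.0, 101)
prices = np.array([binomial_call(s) for s in spots])

# 2) fit a fast surrogate; Polynomial.fit rescales the domain for numerical stability
surrogate = np.polynomial.Polynomial.fit(spots, prices, deg=8)

# 3) inference is now a few multiplications instead of a 400-step tree rollback
err = float(np.max(np.abs(surrogate(spots) - prices)))
print(f"max abs fit error: {err:.4f}")
```

A neural network plays the same role as the polynomial here but scales to many input dimensions (spot, vol, rates, contract terms) where polynomial bases do not.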

October 29th, 2024 - Yuhao (Jet) Zhou's PhD thesis proposal public lecture

Supervisors: Dr. Cristián Bravo Roman and Dr. Matt Davison
Time: October 29th, 2024, 1:00 PM - 1:45 PM

Location: North Campus Building 240B

Title: The financial consequences of social capital in the boardroom

Abstract: This dissertation explores the complex relationships between corporate board diversity, professional networks, and financial outcomes in North American firms, focusing on S&P 1500-listed companies. Through a comprehensive analysis spanning multiple aspects of corporate governance, the research provides a deeper understanding of how diversity and social capital influence board composition, economic performance, and corporate behavior.

The study begins by examining trends in gender and ethnic diversity on corporate boards from 2007 to 2021. Using a bootstrapping methodology, the analysis reveals significant improvements in diversity over time but also highlights a disconnect between workforce diversity and boardroom representation. This suggests that industry-specific factors significantly shape board composition, with workforce diversity not directly translating to board diversity.

Building on this, the dissertation investigates the economic implications of diverse boards, particularly their impact on bond issuance costs. The findings indicate that firms with boards reflecting the demographic diversity of their workforce tend to incur lower underwriting fees. However, increasing female or ethnic-minority representation beyond a certain threshold does not significantly affect borrowing costs, emphasizing the complexity of the relationship between board diversity and financial outcomes.

The research further explores the role of professional networks in board appointments, focusing on gender disparities. Using social network analysis combined with deep learning models, the study reveals that women face substantial challenges in achieving board positions, needing to build broader and more influential networks than their male counterparts. The findings underscore the importance of female-to-female networks in promoting gender diversity, suggesting a pathway to overcoming existing barriers.

Finally, the dissertation examines how boardroom networks influence corporate credit ratings. By integrating multi-layer board network data with traditional financial metrics, the research demonstrates that the structure and strength of director networks can significantly impact a firm’s credit rating. This highlights the critical role of social capital in financial decision-making and corporate governance.

Overall, this dissertation provides a comprehensive analysis of how diversity and networking dynamics affect corporate boards and their associated financial outcomes. It offers valuable insights for policymakers, corporate leaders, and stakeholders aiming to enhance board diversity and improve corporate governance practices.

October 17th, 2024 - Ana Carolina da Cruz's PhD thesis proposal public lecture

Supervisor: Dr. Camila de Souza
Time: October 17th, 2024, 10:00 AM - 10:45 AM

Location: Middlesex College 204

Title: Bayesian methods for clustering change-point data and functional data analysis

Abstract: Technological advancements have made high-dimensional data, such as multiple sequences of change-point data and functional data, increasingly available. However, the complexity of such data presents significant challenges for data analysis, necessitating efficient and reliable methodologies. Bayesian methods, which integrate prior knowledge and manage model complexity, are commonly used in data analysis. Variational inference methods, in particular, are becoming increasingly popular for estimating Bayesian models due to their efficiency and low computational cost. In this thesis proposal, I introduce my research contributions toward three novel Bayesian methods addressing different types of high-dimensional data. In the first project, I introduce a nonparametric Bayesian model with a Dirichlet process prior, estimated via a Gibbs sampler, to cluster sequences of observations based on their piecewise-constant change-point profiles. In the second project, I develop a variational EM algorithm for basis function selection in functional data representation, accounting for within-curve correlation. Finally, the third project focuses on variable selection in functional regression, particularly scalar-on-function regression (SoFR). I propose a variational Bayes algorithm for SoFR using Bernoulli latent variables to enable variable selection.

October 8th, 2024 - FM Power Hour - Dr. Sebastian Ferrando's Talk

Time: October 8th, 2024, 1:30 PM - 2:30 PM
Location: Western Science Centre 248
Speaker: Dr. Sebastian Ferrando - Professor, Department of Mathematics, Toronto Metropolitan University.

Title: Agent-Based Models for Two Stocks with Superhedging

Abstract: An agent-based modelling methodology for the joint price evolution of two stocks is introduced. The method models future multidimensional price trajectories reflecting how a class of agents rebalance their portfolios in an operational way by reacting to how stocks’ charts unfold. Prices are expressed in units of a third stock that acts as numeraire. The methodology is robust, in particular, it does not depend on any prior probability or analytical assumptions and it is based on constructing scenarios/trajectories. A main ingredient is a superhedging interpretation that provides relative superhedging prices between the two modelled stocks. The operational nature of the methodology gives objective conditions for the validity of the model and so implies realistic risk-rewards profiles for the agent’s operations. Superhedging computations are performed with a dynamic programming algorithm deployed on a graph data structure. The superhedging algorithm handles null sets in a rigorous and intuitive way.

September 24th, 2024 - FM Power Hour - Dr. Letitia Golubitsky's Talk

Time: September 24th, 2024, 11:30 AM - 12:30 PM
Location: Western Science Centre 248
Speaker: Dr. Letitia Golubitsky
 
Dr. Letitia Golubitsky is a mathematician by training and an experienced professional in the financial industry, holding a PhD in Mathematics from Western University, a Master of Science in Mathematics from Queen's University, and a Master of Science in Financial Mathematics from McMaster University. Letitia started her career as a quantitative developer in Model Development at TD Bank, developing in-house mathematical models for counterparty credit risk. Later, she joined CIBC Model Development as a senior quantitative developer building in-house commodity models for counterparty credit risk and market risk. For the past seven years, Letitia has worked in Model Validation at Bank of Montreal and Scotiabank as a senior specialist and senior manager responsible for vetting commodity models and pricing derivatives used by the trading desks. Dr. Golubitsky chaired the Industry track of the Canadian Artificial Intelligence 2022 conference and was elected to the Queen's University Council Executive Committee for a four-year term.

Title: Introduction to Counterparty Credit Risk Modelling

Abstract: Counterparty risk is traditionally thought of as credit risk between derivatives counterparties. Since the credit crisis of 2008 and the failure of major institutions such as Lehman Brothers, Fannie Mae, and Freddie Mac, counterparty risk has been considered by most market participants to be the key financial risk. In this talk I will introduce various counterparty exposure metrics, with an emphasis on Monte Carlo simulations of risk-factor models based on stochastic differential equations. Model calibration for most of these models relies on sophisticated optimization methods, and accurate estimation of the model parameters is key to estimating the exposure metrics with the counterparty.
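The Monte Carlo exposure simulation mentioned in the abstract can be sketched in a few lines. The toy below simulates one risk factor under geometric Brownian motion and computes an expected-exposure profile for a forward-like position; all parameter values are invented for illustration, and production engines add netting, collateral, and calibrated risk-factor dynamics.

```python
import numpy as np

rng = np.random.default_rng(7)

def expected_exposure(S0=100.0, K=100.0, mu=0.02, sigma=0.25,
                      T=1.0, n_steps=12, n_paths=20000):
    """Monte Carlo expected-exposure profile EE(t) = E[max(V_t, 0)] for a
    forward-like payoff V_t = S_t - K, with S under geometric Brownian motion."""
    dt = T / n_steps
    # simulate all paths at once; the log-Euler scheme is exact for GBM
    z = rng.standard_normal((n_paths, n_steps))
    increments = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    S = S0 * np.exp(np.cumsum(increments, axis=1))
    exposure = np.maximum(S - K, 0.0)   # positive part of the mark-to-market
    return exposure.mean(axis=0)        # EE at each monitoring date

ee = expected_exposure()
print(np.round(ee, 2))
```

The profile grows with the horizon because the risk factor's variance, and hence the chance of a large positive mark-to-market, accumulates over time.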

September 20th, 2024 - Chengqian Xian's PhD public lecture

Supervisors: Dr. Camila de Souza, Dr. Wenqing He, and Dr. Felipe Rodrigues
Time: September 20th, 2024, 1:30 PM - 2:30 PM

Location: (Waiting room activated. No passcode. After the public lecture, all participants other than the examination committee are asked to leave the room so that the examination can start.)

Title: Variational Bayesian inference for functional data clustering and survival data analysis

Abstract: Variational Bayesian inference is a method to approximate the posterior distribution under a Bayesian model analytically. As an alternative to Markov Chain Monte Carlo (MCMC) methods, variational inference (VI) produces an analytical solution to an approximation of the posterior and has a lower computational cost than MCMC methods. The main challenge of applying VI comes from deriving the equations used to update the approximated posterior parameters iteratively, especially when dealing with complex data. In this thesis, we apply VI in the context of functional data clustering and survival data analysis. The main objective is to develop novel VI algorithms and investigate their performance under these complex statistical models.

In functional data analysis, clustering aims to identify underlying groups of curves without prior group membership information. The first project in this thesis presents a novel variational Bayes (VB) algorithm for simultaneous clustering and smoothing of functional data using a B-spline regression mixture model with random intercepts. The deviance information criterion is employed to select the optimal number of clusters.

The second project shifts focus to survival data analysis, proposing a novel mean-field VB algorithm to infer parameters of the log-logistic accelerated failure time (AFT) model. To address intractable calculations, we propose and incorporate a piecewise approximation technique into the VB algorithm, achieving Bayesian conjugacy.

The third project is motivated by invasive mechanical ventilation data from intensive care units (ICUs) in Ontario, Canada, which form multiple clusters. We assume that patients within the same ICU cluster are correlated. Extending the second project's methodology, a shared frailty log-logistic AFT model is introduced to account for intra-cluster correlation through a cluster-specific random intercept. A novel and fast VB algorithm for model parameter inference is presented.

Extensive simulation studies assess the performance of the proposed VB algorithms, comparing them with other methods, including MCMC algorithms. Applications to real data, such as ICU ventilation data from Ontario, illustrate the methodologies' practical use. The proposed VB algorithms demonstrate excellent performance in clustering functional data and analyzing survival data, while significantly reducing computational cost compared to MCMC methods.
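For readers unfamiliar with variational inference, here is a minimal mean-field sketch for the textbook Normal model with unknown mean and precision, where the coordinate-ascent (CAVI) updates are available in closed form. This is a generic illustration with invented priors and synthetic data, not the VB algorithms developed in the thesis.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=1.5, size=500)   # synthetic data
N, xbar, sumsq = len(x), x.mean(), np.sum(x**2)

# weakly informative Normal-Gamma priors: mu | tau ~ N(mu0, 1/(lam0*tau)), tau ~ Gamma(a0, b0)
mu0, lam0, a0, b0 = 0.0, 1e-3, 1e-3, 1e-3

e_tau = 1.0                                    # initial guess for E[tau]
for _ in range(50):                            # coordinate-ascent (CAVI) iterations
    # q(mu) = Normal(mu_n, 1/lam_n)
    mu_n = (lam0 * mu0 + N * xbar) / (lam0 + N)
    lam_n = (lam0 + N) * e_tau
    # q(tau) = Gamma(a_n, b_n)
    a_n = a0 + (N + 1) / 2.0
    e_mu2 = mu_n**2 + 1.0 / lam_n              # E[mu^2] under q(mu)
    b_n = b0 + 0.5 * (sumsq - 2.0 * mu_n * N * xbar + N * e_mu2
                      + lam0 * ((mu_n - mu0)**2 + 1.0 / lam_n))
    e_tau = a_n / b_n

scale = (1.0 / e_tau) ** 0.5
print(mu_n, scale)                             # approximate posterior mean and scale
```

Each update has a closed form because the model is conjugate; the thesis's contribution lies in deriving analogous updates for far less tractable models (mixtures of B-spline regressions, log-logistic AFT models with frailties), where such conjugacy must be engineered via approximations.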

September 17th, 2024 - Yu Shi's PhD public lecture

Supervisor: Dr. Grace Yi
Time: September 17th, 2024, 12:30 PM - 1:30 PM

Location: (Waiting room activated. No passcode. After the public lecture, all participants other than the examination committee are asked to leave the room so that the examination can start.)

Title: Conditional Dependence Learning of Noisy Data under Graphical Models

Abstract: Graphical models are useful tools for characterizing the conditional dependence among variables with complex structures. While many methods have been developed under graphical models, their validity is vulnerable to the quality of data. A fundamental assumption associated with most available methods is that the variables need to be precisely measured. This assumption is, however, commonly violated in reality. In addition, the frequent occurrence of missingness in data exacerbates the difficulties of estimation within the context of graphical models. Ignoring either mismeasurement or missingness effects in estimation procedures can yield biased results, and it is imperative to accommodate these effects when conducting inferences under graphical models. In this thesis, we address challenges arising from noisy data with measurement error or missing observations within the framework of graphical models for conditional dependence learning.

The first project addresses mixed graphical models applied to data involving mismeasurement in discrete and continuous variables. We propose a mixed latent Gaussian copula graphical measurement error model to describe error-contaminated data with mixed continuous and discrete variables. To estimate the model parameters, we develop a simulation-based expectation-maximization method that incorporates the measurement error effects. Furthermore, we devise a computationally efficient procedure to implement the proposed method. The asymptotic properties of the proposed estimator are established, and the finite sample performance of the proposed method is evaluated by numerical studies.

In contrast to analyzing error-prone variables in the first project, we further examine variables that are susceptible not only to mismeasurement but also to missingness. In the second project, we examine noisy data that are subject to both error contamination and incompleteness, focusing on the Ising model designed for learning the conditional dependence structure among binary variables. We extend the conventional Ising model using additional layers of modeling to describe data with both misclassification and missingness. To estimate the model parameters with the misclassification and missingness effects accommodated simultaneously, we develop a new inferential procedure that combines the strengths of the insertion correction strategy and the inverse probability weighting method. To facilitate the sparsity of the graphical model, we further employ the regularization technique and accommodate a class of penalty functions, including widely used penalties such as the Lasso, SCAD, MCP, and HT penalties. To broaden the applicability scope, we investigate settings with both fixed and diverging dimensions of the variables; moreover, we rigorously establish the asymptotic properties of the proposed estimators, with the associated regularity conditions identified.

The third project builds on the second by accommodating mixed variables subject to both mismeasurement and missingness. Unlike the first two projects, which focus on a single dataset, we consider the availability of auxiliary datasets from related studies, along with the target study dataset, where data are subject to missingness. From the measurement error perspective, the target and auxiliary datasets can be regarded as accurate and error-contaminated measurements for the variables of interest, respectively. To describe the conditional dependence relationships among variables, we explore mixed graphical models characterized by the exponential family distributions. Moreover, leveraging the transfer learning strategy, we propose an inferential procedure that accommodates missingness effects to enhance the estimation of the model parameters pertinent to the target study using the information from auxiliary datasets. We rigorously establish theoretical properties for the proposed estimators and evaluate the finite sample performance of the proposed methods through numerical studies.

This thesis contributes new methodologies to address challenges arising from the presence of noisy data with mismeasurement or missing values. The proposed methods broaden the application of graphical models for learning complex conditional dependency among variables of various nature.

September 16th, 2024 - Wei Li Fan's PhD thesis proposal public lecture

Supervisor: Dr. Marcos Escobar-Anel
Time: September 16th, 2024, 12:15 PM - 1:00 PM

Location: Western Science Centre 187

Title: Enhancing portfolio investment strategies through CEV-related frameworks

Abstract: This thesis explores portfolio optimization through four related models: our newly proposed LVO-CEV and SEV-SV models, as well as the established CEV and M-CEV models. The study starts by introducing and investigating a new type of Constant Elasticity of Volatility (CEV) model, titled LVO-CEV. We prove that the stochastic differential equations of the LVO-CEV model can admit strong or weak solutions depending on the elasticity parameter. Additionally, the model offers closed-form solutions under hyperbolic absolute risk aversion (HARA) utilities, enhancing tailored investment strategies. Empirical comparisons with other models validate its efficacy in real-world scenarios. Subsequently, the thesis addresses ambiguity aversion among utility-maximizing investors, employing both the LVO-CEV and standard CEV models; this investigation is ongoing and will be completed for the final defense. Furthermore, the thesis extends its examination to ambiguity aversion within the framework of a Modified Constant-Elasticity-of-Volatility (M-CEV) model for the underlying asset. Through this exploration, we derive closed-form solutions of a non-affine nature for the optimal asset allocation and the value function, leveraging a Cauchy problem approach. This analysis represents a significant advancement, as we extend existing research to the presence of ambiguity while also accommodating HARA utility. Lastly, we introduce and analyze a very general and novel family of diffusion models for stock prices, with direct applications in portfolio optimization. This innovative model (SEV-SV) integrates stochastic elasticity of volatility (SEV) with stochastic volatility (SV). Emphasis is placed on the SEV component, driven by an Ornstein-Uhlenbeck process with two distinct functional choices, while the SV component utilizes the 4/2 model.
This endeavor yields closed-form solutions for optimal strategy, value function, and optimal wealth process, elucidating two distinct scenarios regarding prices of risk associated with the stock and highlighting the models' applicability in real-world investment contexts.

September 12th, 2024 - Pingbo Hu's PhD public lecture

Supervisor: Dr. Grace Yi
Time: September 12th, 2024, 9:00 AM - 10:00 AM

Location: Western Science Centre 248


Title: Statistical Learning of Noisy Data: Classification and Causal Inference with Measurement Error and Missingness


Abstract: Causal inference and statistical learning have made significant advancements in various fields, including healthcare, epidemiology, computer vision, information retrieval, and language processing. Despite numerous methods, research gaps remain, particularly regarding noisy data with features such as missing observations, censoring, and measurement errors. Addressing the challenges presented by noisy data is crucial to reducing bias and enhancing the statistical learning of such data. This thesis tackles several issues in causal inference and statistical learning that are related to noisy data.

The first project addresses causal inference in longitudinal studies with bivariate responses, focusing on data with missingness and censoring. We decompose the overall treatment effect into two separable effects, each mediated through different causal pathways. Furthermore, we establish identification conditions for estimating these separable treatment effects using observed data. Subsequently, we employ the likelihood method to estimate these effects and derive hypothesis testing procedures for their comparison.

In the second project, we tackle the problem of detecting cause-effect relationships between two sets of variables, formed as two vectors. Although this problem can be framed as a binary classification task, it is prone to mislabeling of causal relationships for paired vectors under the study - an inherent challenge in causation studies. We quantify the effects of mislabeled outputs on training results and introduce metrics to characterize these effects. Furthermore, we develop valid learning methods that account for mislabeling effects and provide theoretical justification for their validity. Our contributions present reliable learning methods designed to handle real-world data, which commonly involve label noise.

The third project extends the research in the second project by exploring binary classification with noisy data in the general framework. To scrutinize the impact of different types of label noise, we introduce a sensible way to categorize noisy labels into three types: instance-dependent, semi-instance-independent, and instance-independent noisy labels. We theoretically assess the impact of each noise type on learning. In particular, we quantify an upper bound of bias when ignoring the effects of instance-dependent noisy labels and identify conditions under which ignoring semi-instance-independent noisy labels is acceptable. Moreover, we propose correction methods for each type of noisy label.

Contrasting with the third project that focuses on classification with label noise, the fourth project examines binary classification with mismeasured inputs. We begin by theoretically analyzing the bias induced by ignoring measurement error effects and identify a scenario where such ignorance is acceptable. We then propose three correction methods to address the mismeasured input effects, including methods leveraging validation data and modifications to the loss function using regression calibration and conditional expectation. Finally, we establish theoretical results for each proposed method.

In summary, this thesis explores several interesting problems in causal inference and statistical learning concerning noisy data. We contribute new findings and methods to enhance our understanding of the complexities induced by noisy data and provide solutions to address them.

September 10th, 2024 - Dr. Laura Cowen's Talk

Speaker: Dr. Laura Cowen - Professor, The University of Victoria
Time: September 10th, 2024, 2:30 PM - 3:30 PM

Location: Western Science Centre 248

Title: Disease analytic models with applications to estimating undetected COVID-19 cases

Abstract: Even with daily case counts, the true scope of the COVID-19 pandemic in Canada is unknown due to undetected cases. We develop a novel multi-site disease analytics model which estimates undetected cases using discrete-valued multivariate time series in the framework of Bayesian hidden Markov modelling techniques. We apply our multi-site model to estimate the pandemic scope using publicly available disease count data including detected cases, recoveries among detected cases, and total deaths. These counts are used to estimate the case detection probability, the probability of recovery, and several important population parameters including the rate of spread, and importation of external cases. We estimate the total number of active COVID-19 cases per region of Canada for each reporting interval. We applied this multi-site model Canada-wide to all provinces and territories, providing an estimate of the total COVID-19 burden for the 90 weeks from 23 Apr 2020 to 10 Feb 2022. We also applied this model to the five Health Authority regions of British Columbia, Canada, describing the pandemic in B.C. over the 31 weeks from 2 Apr 2020 to 30 Oct 2020.

August 28th, 2024 - Duo Xu's PhD thesis proposal public lecture

Supervisor: Dr. Shu Li
Time: August 28th, 2024, 8:30 AM - 9:10 AM
Location: Western Science Centre 248

Title: Drawdown-dependent surplus analysis and applications

Abstract: This thesis mainly focuses on drawdown-dependent surplus analysis and its applications in various aspects. In Section 2, we propose a new fee structure for drawdown insurance and analyze the fair market premium under this fee structure; we further examine the optimal termination for the policyholder when a cancellation feature is allowed in the insurance contract, and compare the two premium structures. In Section 3, we extend our analysis to the joint Laplace transform of the first-passage time, occupation time, and local time of the drawdown process. Under a spectrally negative Lévy process, we provide expressions for the joint Laplace transforms of these three components in scenarios involving both two-sided and one-sided exits. Sections 4 and 5 delve into equity-linked insurance, with a focus on valuation and associated considerations. In Section 4, we present valuation results for the guaranteed minimum death benefit (GMDB) and guaranteed minimum maturity benefit (GMMB) under a state-dependent fee structure. In Section 5, we examine the surrender option available to policyholders, allowing them to opt out at any point before maturity. We address the fair valuation problem associated with equity-linked insurance in light of this surrender option, and then tackle the optimization of surrender decisions by establishing specific surrender criteria and employing Hamilton-Jacobi-Bellman (HJB) equations to derive optimal solutions.

August 26th, 2024 - Yuan Bian's PhD public lecture

Supervisors: Dr. Grace Y. Yi & Dr. Wenqing He
Time: August 26th, 2024, 9:15 AM - 10:00 AM
Location: Western Science Centre 248

Title: Statistical inference and learning with incomplete data

Abstract:

Incomplete data commonly arise in applications, and research on this topic has received extensive attention over the past few decades. Numerous inference methods have been developed to address various issues related to incomplete data, such as different types of missing observations and distinct missing data mechanisms, which are often classified as missing completely at random, missing at random, and missing not at random. However, research gaps still remain.

Assessing a plausible missing data mechanism is typically difficult due to the lack of validation data, and the presence of spurious variables in covariates further complicates the challenge. Prediction in the presence of incomplete data is another area worth exploring. By utilizing newly emerging techniques, we explore new avenues in the analysis of incomplete data. This thesis aims to contribute fresh insights into statistical inference within the context of incomplete data and provide valid methods to address a few existing research gaps.

Focusing on missingness in the response variable, the first project proposes a unified framework to address the effects of missing data. By leveraging the generalized linear model to facilitate the dependence of the response on associated covariates, we develop concurrent estimation and variable selection procedures using regularized likelihood. We rigorously establish the asymptotic properties of the resultant estimators. The proposed methods offer flexibility and generality, eliminating the need to assume a specific missing data mechanism -- a requirement in most available methods. Empirical studies demonstrate the satisfactory performance of the proposed methods in finite sample settings. Furthermore, the project outlines extensions to accommodate missingness in both the response and covariates.

The second problem of interest approaches missing data from a different perspective by placing it within the framework of statistical machine learning, with a specific emphasis on exploring boosting techniques; two projects are generated accordingly. Despite the increasing attention gained by boosting, many advancements in this area have primarily focused on numerical implementation procedures, with relatively limited theoretical work. Moreover, existing boosting approaches are predominantly designed to handle datasets with complete observations, and their validity is hampered by the presence of missing data. In this thesis, we employ semiparametric estimation approaches to develop unbiased boosting estimation methods for data with missing responses. We investigate several strategies to account for the missingness effects. The proposed methods are implemented using the functional gradient descent algorithm and are justified by the establishment of theoretical properties, including convergence and consistency of the proposed estimators. Numerical studies confirm the satisfactory performance of the proposed methods in finite sample settings.
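As a rough sketch of boosting by functional gradient descent when responses are missing, the code below runs componentwise L2 boosting with inverse-probability weights. The weighting is one elementary correction assuming known observation probabilities `pi`; the thesis's semiparametric estimators are more general, and all names here are assumptions for illustration.

```python
import numpy as np

def l2_boost_ipw(X, y, observed, pi, n_steps=500, nu=0.1):
    """Componentwise L2 boosting by functional gradient descent.
    Missing responses handled by inverse-probability weighting
    (illustrative; not the thesis's semiparametric correction)."""
    n, p = X.shape
    w = observed / pi                 # weight 0 if missing, 1/pi if observed
    y0 = np.where(observed, y, 0.0)   # placeholder for missing responses
    F = np.zeros(n)                   # current boosted fit
    coefs = np.zeros(p)
    for _ in range(n_steps):
        r = y0 - F                    # negative gradient of squared loss
        best = None
        for j in range(p):            # pick the best single-covariate learner
            xj = X[:, j]
            b = np.sum(w * xj * r) / np.sum(w * xj * xj)
            loss = np.sum(w * (r - b * xj) ** 2)
            if best is None or loss < best[2]:
                best = (j, b, loss)
        j, b, _ = best
        coefs[j] += nu * b            # shrunken update (step size nu)
        F += nu * b * X[:, j]
    return coefs
```

The shrinkage factor `nu` is the usual boosting step size; each iteration takes one small functional gradient step along the best-fitting base learner.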

The third topic further explores different boosting procedures in the context of interval censored data, where the exact observed value for the response variable is unavailable but only known to fall within an interval. Such data commonly arise in survival analysis and fields involving time-to-events, and they present a unique challenge in data analysis. In this project, we develop boosting methods for both regression and classification problems with interval censored data. We address the censoring effects by adjusting the loss functions or imputing transformed responses. The proposed methods are implemented using a functional gradient descent algorithm, and we rigorously establish their theoretical properties, including mean squared error tradeoffs and the optimality of the proposed estimators. Numerical studies are conducted to assess the performance of the proposed methods in finite sample settings.
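One simple way to adjust a boosting loss for interval censoring is to charge nothing when the fit lies inside the observed interval and the squared distance to the nearest endpoint otherwise. The sketch below implements that idea with componentwise linear base learners; it is only a hedged illustration, and the loss adjustments and imputation schemes developed in the project itself may differ.

```python
import numpy as np

def interval_loss(F, L, U):
    """Squared distance from the fit F to the interval [L, U]; zero inside."""
    d = np.maximum(L - F, 0.0) + np.maximum(F - U, 0.0)
    return np.sum(d ** 2)

def interval_boost(X, L, U, n_steps=1000, nu=0.2):
    """Functional gradient descent boosting on the interval loss above
    (an illustrative censoring adjustment, not the project's method)."""
    n, p = X.shape
    F = np.zeros(n)
    coefs = np.zeros(p)
    for _ in range(n_steps):
        # negative gradient: pull the fit toward the nearest violated endpoint
        r = np.where(F < L, L - F, np.where(F > U, U - F, 0.0))
        best = None
        for j in range(p):
            xj = X[:, j]
            b = (xj @ r) / (xj @ xj)
            cand = interval_loss(F + b * xj, L, U)
            if best is None or cand < best[2]:
                best = (j, b, cand)
        j, b, _ = best
        coefs[j] += nu * b
        F += nu * b * X[:, j]
    return coefs, F
```

Observations already inside their intervals contribute zero gradient, so the algorithm only works to move violating fits back toward their censoring intervals.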

August 26th, 2024 - Dr. John Braun's Talk

Speaker: Dr. John Braun - Professor, The University of British Columbia
Time: August 26th, 2024, 3:00 PM - 4:00 PM
Location: Western Science Centre 248

Title: Iterated Data Sharpening for Local Polynomial Regression

Abstract: Data sharpening in kernel regression has been shown to be an effective method of reducing bias while having minimal effects on variance. Earlier efforts to iterate the data sharpening procedure have been less effective, due to the employment of an inappropriate sharpening transformation. In the present talk, an iterated data sharpening algorithm is described which reduces the asymptotic bias at each iteration, while having modest effects on the variance. The efficacy of the iterative approach is demonstrated theoretically and via a simulation study. Boundary effects persist, and the affected region successively grows when the iteration is applied to local constant regression. By contrast, boundary bias successively decreases at each iteration step when applied to local linear regression. After iteration, the resulting estimates are less sensitive to bandwidth choice, and a further simulation study demonstrates that iterated data sharpening with data-driven bandwidth selection via cross-validation can lead to more accurate regression function estimation. Examples with real data are used to illustrate the scope of change made possible by using iterated data sharpening and also to identify its limitations.

(Based on joint work with Hanxiao Chen of Boston University and Xiaoping Shi of UBC)
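For intuition about sharpening, one classical device replaces the responses y with the sharpened values 2y - yhat and resmooths; for a linear smoother this coincides with Tukey's twicing and cancels the leading bias term. The sketch below applies this to a local constant (Nadaraya-Watson) smoother. The transformation in the talk's iterated algorithm is chosen more carefully, so treat this purely as an illustration of kernel-regression bias reduction.

```python
import numpy as np

def nw_smooth(x, y, h):
    """Nadaraya-Watson (local constant) smoother with a Gaussian kernel."""
    K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
    return K @ y / K.sum(axis=1)

def sharpened_smooth(x, y, h):
    """One pass of a simple sharpening transformation: resmooth the
    sharpened responses 2*y - yhat (Tukey's twicing for a linear
    smoother; not necessarily the talk's transformation)."""
    yhat = nw_smooth(x, y, h)
    return nw_smooth(x, 2.0 * y - yhat, h)
```

On a smooth noiseless target, the sharpened fit has markedly smaller interior bias than the plain smoother at the same bandwidth, which is the effect the talk's iterated algorithm amplifies across iterations.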

August 26th, 2024 - Dr. Severien Nkurunziza's Talk

Speaker: Dr. Severien Nkurunziza - Professor, The University of Windsor
Time: August 26th, 2024, 4:00 PM - 5:00 PM
Location: Western Science Centre 248

Title: On Robust Inference In Some Mean-Reverting Processes With Change-Points

Abstract: In this talk, we consider the inference problem concerning the drift parameter in generalized mean-reverting processes with unknown change-points. We also consider the scenario where the target parameter is suspected to satisfy some restrictions. We generalize some recent findings in five ways. First, the established method incorporates uncertain prior knowledge. Second, we derive the unrestricted estimator (UE) and the restricted estimator (RE), as well as their asymptotic properties. Third, we propose a test for the hypothesized restrictions and establish its asymptotic power. Fourth, we construct a class of shrinkage estimators (SEs) which includes as special cases the UE, the RE, and classical SEs. Fifth, we study the asymptotic risk performance of the proposed class of SEs and prove that James-Stein-type estimators dominate the UE. On top of these findings, an additional novelty of the derived results is that the dimensions of the proposed estimators are random. Because of this, the asymptotic power of the proposed test and the asymptotic risk analysis do not follow from classical results in the statistical literature. To overcome this problem, we establish an asymptotic result that is useful in its own right.
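The dominance phenomenon behind shrinkage estimation is easiest to see in the textbook normal-means setting: for dimension p >= 3, the (positive-part) James-Stein estimator has uniformly smaller risk than the MLE. The sketch below is that classical special case only, not the talk's mean-reverting, random-dimension setting.

```python
import numpy as np

def james_stein(x, sigma2=1.0):
    """Positive-part James-Stein estimate of a p-dimensional normal
    mean (p >= 3) from one observation x ~ N(theta, sigma2 * I).
    The textbook special case of shrinkage; the talk's SEs live in a
    far more general setting with random dimension."""
    p = x.size
    shrink = 1.0 - (p - 2) * sigma2 / np.sum(x ** 2)
    return max(shrink, 0.0) * x      # positive-part truncation
```

A short Monte Carlo comparison of total squared error against the MLE (the raw observation) exhibits the risk dominance the talk establishes for its broader class of SEs.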