Publications | Meshal Alharbi

2024

Sample Efficient Reinforcement Learning with Partial Dynamics Knowledge

Meshal Alharbi

SM Thesis, Massachusetts Institute of Technology, 2024

The problem of sample complexity of online reinforcement learning is often studied in the literature without taking into account any partial knowledge about the system dynamics that could potentially accelerate the learning process. In this thesis, we study the sample complexity of online Q-learning methods when some prior knowledge about the dynamics is available or can be learned efficiently. We focus on systems that evolve according to an additive disturbance model where the underlying dynamics are described by a deterministic function of states and actions, along with an unknown additive disturbance that is independent of states and actions. In the setting of finite Markov decision processes, we present an optimistic Q-learning algorithm that achieves Õ(√T) regret without polynomial dependency on the number of states and actions under perfect knowledge of the dynamics function. This is in contrast to the typical Õ(√(SAT)) regret for existing Q-learning methods. Further, if only a noisy estimate of the dynamics function is available, our method can learn an approximately optimal policy in a number of samples that is independent of the cardinalities of state and action spaces. The sub-optimality gap depends on the approximation error of the noisy estimate, as well as the Lipschitz constant of the corresponding optimal value function. Our approach does not require modeling of the transition probabilities and enjoys the same memory complexity as model-free methods.

Sample Efficient Reinforcement Learning with Partial Dynamics Knowledge

Meshal Alharbi, Mardavij Roozbehani, and Munther A. Dahleh

The 38th Annual AAAI Conference on Artificial Intelligence, 2024

The problem of sample complexity of online reinforcement learning is often studied in the literature without taking into account any partial knowledge about the system dynamics that could potentially accelerate the learning process. In this paper, we study the sample complexity of online Q-learning methods when some prior knowledge about the dynamics is available or can be learned efficiently. We focus on systems that evolve according to an additive disturbance model of the form $S_{h+1} = f(S_{h}, A_{h}) + W_{h}$, where $f$ represents the underlying system dynamics, and $W_{h}$ are unknown disturbances independent of states and actions. In the setting of finite episodic Markov decision processes with $S$ states, $A$ actions, and episode length $H$, we present an optimistic Q-learning algorithm that achieves $\tilde{O}(\mathrm{poly}(H) \sqrt{T})$ regret under perfect knowledge of $f$, where $T$ is the total number of interactions with the system. This is in contrast to the typical $\tilde{O}(\mathrm{poly}(H) \sqrt{SAT})$ regret for existing Q-learning methods. Further, if only a noisy estimate $\hat{f}$ of $f$ is available, our method can learn an approximately optimal policy in a number of samples that is independent of the cardinalities of state and action spaces. The sub-optimality gap depends on the approximation error $\hat{f} - f$, as well as the Lipschitz constant of the corresponding optimal value function. Our approach does not require modeling of the transition probabilities and enjoys the same memory complexity as model-free methods.
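
As a rough illustration of the additive disturbance model $S_{h+1} = f(S_{h}, A_{h}) + W_{h}$ described in this abstract, the sketch below simulates one episode on a finite state space. It is not the paper's algorithm: the function name `rollout`, the integer state encoding, the wrap-around (modulo) arithmetic, and the example dynamics and disturbance distribution are all assumptions made only for this illustration.

```python
import random


def rollout(f, policy, sample_disturbance, s0, horizon, num_states):
    """Simulate one episode of S_{h+1} = f(S_h, A_h) + W_h.

    States are assumed to be the integers 0..num_states-1, and the sum is
    wrapped with a modulo so the next state stays in range (an assumption
    of this sketch, not something stated in the abstract).
    """
    states = [s0]
    s = s0
    for h in range(horizon):
        a = policy(s, h)              # action chosen at step h
        w = sample_disturbance()      # disturbance drawn independently of (s, a)
        s = (f(s, a) + w) % num_states
        states.append(s)
    return states


# Hypothetical example: known dynamics f(s, a) = s + a, unknown zero-mean noise.
rng = random.Random(0)
trajectory = rollout(
    f=lambda s, a: s + a,
    policy=lambda s, h: 1,
    sample_disturbance=lambda: rng.choice([-1, 0, 1]),
    s0=0,
    horizon=10,
    num_states=5,
)
print(trajectory)
```

Because the disturbance does not depend on the state or action, a sample of $W_{h}$ observed at one state-action pair is informative about transitions from every other pair once $f$ is known, which gives some intuition for a regret bound without polynomial dependence on $S$ and $A$.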

2023

Identifying Symbolic Communication in Simulated Teacher-Student Environment by Bayesian Modeling

Abdulrahman Alabdulkareem, Meshal Alharbi, Noor Almazroa, Boris Katz, and Andrei Barbu

Preprint, 2023

Symbolic communication is an inherent and intuitive aspect of the human experience. In this paper, we propose, implement, and run inference on a probabilistic Bayesian model for identifying symbolic communication. We focus on a recently proposed simulated teacher-student environment where we have access to human data. We show several qualitative and quantitative results that compare our model with human judgments. These results suggest that our approach is reasonably effective at identifying symbolic communication with adequate accuracy. We utilize the Gen probabilistic programming framework for the implementation of our model.

2021

Electricity Non-Technical Loss Detection: Enhanced Cost-Driven Approach Utilizing Synthetic Control

Meshal Alharbi, Saud Alghumayjan, Mansour Alsaleh, Devavrat Shah, and Ahmad Alabdulkareem

In IEEE Power & Energy Society Innovative Smart Grid Technologies Conference (ISGT), 2021

This paper proposes a new cost-driven approach for detecting non-technical loss (NTL) of electricity in a resolution-constrained setting. NTLs are caused by fraudulent behavior by customers; they are reported to cost $96 billion annually to utility companies. With the global adoption of smart meters still in its early stage, with 14% market penetration, many utility companies must detect NTLs from low-resolution signals. Our proposed method optimizes for the expected economic return. It employs a synthetic control approach and ensemble boosting model that jointly outperform state-of-the-art support vector machine and random forest methods described in the literature. We also used a class-imbalance-agnostic precision-recall metric to validate our approach under various conditions. The whole analysis was conducted using a subset of a dataset of customer accounts from a large utility company that serves a population of over 30 million people. Our proposed method was tested by the utility company and initial results show approximately 75% precision in detecting new NTL cases.

Next-day Electricity Demand Forecast: A New Ensemble Recommendation System Using Peak and Valley

Bader Alaskar, Abdullah Alhadlaq, Meshal Alharbi, Saud Alghumayjan, Ahmad Alabdulkareem, Mansour Alsaleh, and Devavrat Shah

In IEEE Power & Energy Society Innovative Smart Grid Technologies Conference (ISGT), 2021

Electricity demand forecasting plays a major role in the planning and resource allocation phase of utility companies. In particular, predicted peak and valley (PaV) demand points are critical, as they determine the maximum required generation capacity and the baseload needed to meet the minimum underlying demand, respectively. In this paper, we propose multiple techniques to enhance day-ahead forecasting models by leveraging independent daily PaV predictors to ensemble short-term electricity demand forecasters. These ensemble techniques are then incorporated into a novel ensemble recommendation system (ERS). The ERS suggests the most appropriate ensemble technique to enhance the day-ahead predictor's performance while minimizing the computation required for testing multiple ensemble algorithms, relative to a single ensemble algorithm. This approach aims to improve the PaV forecasting and to enhance the overall accuracy of the day-ahead forecaster, and it can be used with any combination of forecasting models. We demonstrate the effectiveness of our approach through a case study using a time-series prediction database model (tspDB) and a deep neural network (DNN) model for predicting the demand of the next day. The results show an improvement of 33% and 12% in the mean absolute percentage error of the forecasted PaV points using the tspDB and DNN models, respectively, as well as an enhancement in the overall day-ahead forecast.
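
For readers unfamiliar with the metric named in this abstract, the following is a minimal sketch of the standard mean absolute percentage error (MAPE). The function name `mape` and the sample numbers are hypothetical; they are not taken from the paper and do not reproduce its results.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Assumes both sequences have the same nonzero length and that no
    actual value is zero (MAPE is undefined in that case).
    """
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Made-up daily peak demand values (arbitrary units), for illustration only.
actual_peaks = [100.0, 200.0]
forecast_peaks = [110.0, 180.0]
print(mape(actual_peaks, forecast_peaks))
```

A lower MAPE means a more accurate forecast, so an "improvement" in MAPE corresponds to a reduction in this value.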

2019

Automatic Modulation Classification: Investigation for Millimeter Wave Over Fiber Channels

M Alharbi, A Alhuseini, A Ragheb, M Altamimi, T Alshawi, and S Alshebeili

IEEE Photonics Technology Letters, 2019

With the rapid development of intelligent communication systems, classical problems, such as automatic modulation classification (AMC), have gained extensive research interest. This is due to the significant role that AMC plays in many civilian and military applications. In this letter, we consider AMC for millimeter wave-over-fiber (MMWoF) communication. This type of communication is of practical interest because it enables centralized analysis and processing, taking advantage of the low transmission loss of MMW signals over fiber optic channels. We use autoencoder neural networks for automatic feature extraction and classification, preceded by a pre-processing step applied to the samples of the input signal. The performance of the system under consideration has been thoroughly investigated by simulation and verified experimentally under different impairments, including fiber chromatic dispersion and amplified spontaneous emission noise. The results are presented in terms of the probability of correct classification for different values of optical signal-to-noise ratio and different lengths of fiber channels. The simulation results are in good agreement with those obtained experimentally.