Proprietary Data Projects

This repo lists tasks and projects that involved proprietary data (i.e., data that cannot be fully disclosed to the public) and that provided valuable insights to the respective organizations and employers.

Audit at Great Canadian Casino Resort Toronto (August 2022 - August 2024)

I used Excel and accounting software such as Ballys to verify financial entries made by gaming floor personnel. Discrepancies in reporting often arise, usually for legitimate reasons.

In this example, I show how I used Excel to streamline the verification process. In this scenario, the Tables department reports a certain amount of money for some tables, but the system reports $0 for those tables, most likely due to glitches, which are not unusual. In that case, I would download both the system report and the Tables department report as CSV files and compare them. The values and table names are made up to protect the proprietary nature of the verification process. Only a few records are shown, for demonstration purposes; they do not reflect the high volume of tables on the gaming floor.

CSV from Table Games Department

Table Game	Other Columns	Amount Reported ($)
AB501	-------------	766
AB502	-------------	201
AB503	-------------	1055
CS307	-------------	94
CS308	-------------	476
CS309	-------------	877
FX702	-------------	503
KL244	-------------	3022
KL246	-------------	711
TT904	-------------	2491

CSV from System

Table Game	Other Columns	Amount Reported ($)
AB501	-------------	766
AB502	-------------	201
AB503	-------------	0
CS307	-------------	94
CS308	-------------	476
CS309	-------------	0
FX702	-------------	0
KL244	-------------	3022
KL246	-------------	711
TT904	-------------	0

New CSV

Table Game	Amount Reported, Tables	Amount Reported, System	System - Tables
AB501	766	766	`=C2-B2`
AB502	201	201	`=C3-B3`
AB503	1055	0	`=C4-B4`
CS307	94	94	`=C5-B5`
CS308	476	476	`=C6-B6`
CS309	877	0	`=C7-B7`
FX702	503	0	`=C8-B8`
KL244	3022	3022	`=C9-B9`
KL246	711	711	`=C10-B10`
TT904	2491	0	`=C11-B11`

Rows with a non-zero value in the fourth column (System - Tables) are the tables with discrepancies, so filtering that column for non-zero values isolates them. In this example, those are AB503, CS309, FX702, and TT904.
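
The same comparison can also be sketched outside Excel. The following is a minimal Python illustration, not the process actually used at the casino: the inline CSV text, the comma delimiter, and the helper names `load_amounts` and `find_discrepancies` are all assumptions made for this example, and the data is a made-up subset of the tables above.

```python
import csv
import io

# Made-up sample data (assumption: comma-delimited exports with these headers).
TABLES_CSV = """Table Game,Amount Reported ($)
AB501,766
AB503,1055
CS309,877
KL244,3022
"""

SYSTEM_CSV = """Table Game,Amount Reported ($)
AB501,766
AB503,0
CS309,0
KL244,3022
"""


def load_amounts(text):
    """Parse CSV text into a {table name: amount} dictionary."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["Table Game"]: int(row["Amount Reported ($)"]) for row in reader}


def find_discrepancies(tables, system):
    """Return {table name: system - tables} for tables whose amounts differ."""
    return {
        name: system.get(name, 0) - amount
        for name, amount in tables.items()
        if system.get(name, 0) != amount
    }


tables = load_amounts(TABLES_CSV)
system = load_amounts(SYSTEM_CSV)
discrepancies = find_discrepancies(tables, system)

for name, difference in discrepancies.items():
    print(f"{name}: {difference}")
```

As in the spreadsheet, the difference is computed as System minus Tables, so a table the system reports as $0 shows up with a negative value.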

Back to Table of Contents

Elections Canada (August 2024 - June 2025)

Weekly Dashboard

To better present my weekly findings to senior analysts, I developed weekly dashboards that succinctly summarize key metrics related to collated content (i.e., relevant posts and comments from the Canadian digital media environment that touch on topics pertinent to the agency such as, but not limited to, democratic institutions and views on electoral systems). The following is a sample dashboard (with proprietary data redacted in red).

Filipino Language Monitoring

I also collaborated with a senior analyst to create a Tagalog-language Meltwater Boolean query to help capture posts and comments from social and traditional media platforms:


(“Halalan Canada” OR “Halalan sa Canada” OR  “Eleksyons Canada” OR “Eleksyons sa Canada” OR “Eleksyon sa Canada” OR “Komisyoner ng Canada Elections” OR “Komisyoner sa Eleksyon sa Canada” OR “Komisyoner ng Halalan sa Canada” OR “Batas sa halalan sa Canada”) 
OR 
((“Araw ng Eleksyon” OR “Pagboto sa mail” OR “Bumoto nang advance” OR “panghihimasok ng dayuhan” OR “susunod na halalan” OR “susunod na eleksyon” OR “susunod na election” OR “45th na election" OR “panghalalan” OR “45th na eleksyon" OR “45th na halalan" OR “Halalan 2025” OR “Halalan 2025”  OR “punong opisyal ng electoral” OR “punong opisyal ng Eleksyon” OR “punong opisyal ng halalan” OR “opisyal ng electoral” OR “opisyal ng Eleksyon” OR “opisyal ng halalan” OR "Araw ng Eleksyon” OR “Araw ng Halalan” OR “Petsa ng halalan” OR “Petsa ng eleksyon” OR  “Bumoto nang advance” OR “panghihimasok ng dayuhan” OR “Pangkalahatang halalan” OR “Pangkalahatang eleksyon” OR “Pederal na halalan” OR “Pederal na eleksyon” OR "Halalan" OR “Eleksyon” OR “Pederal na byelection” OR  “Pederal na by-election” OR “Paunang pagboto” OR “listahan ng mga botante” OR “lugar ng pagboto” OR “lokasyon ng Pagboto” OR “istasyon ng botohan” OR “istasyon sa pagboto” OR “Gabay sa pagboto” OR "makina sa pagboto" OR “Online na pagboto” OR "Pagboto sa mail" OR “bumoto sa pamamagitan ng koreo” OR “Ikinoreong Pagboto”  OR “balotang papel” OR “espesyal na balota”) 
AND 
("Canada" OR “Canadian” OR “Canadyano” OR “Kanadyano” OR “Winnipeg” OR “Winnipeg North” OR “Winnipeg Centre” OR “Mount Royal” OR “Scarborough Centre” OR “Eglinton—Lawrence” OR “Eglinton-Lawrence” OR “Eglinton Lawrence”  OR “York Centre” OR “Mississauga—Streetsville” OR “Mississauga-Streetsville” OR “Mississauga Streetsville” OR “Calgary Forest Lawn” OR “Edmonton West” OR “Saskatoon West” OR “New Westminster—Burnaby” OR “New Westminster-Burnaby” OR “New Westminster Burnaby” OR “New Westminster” OR “Burnaby” OR “Fleetwood—Port Kells” OR “Fleetwood-Port Kells” OR “Fleetwood Port Kells” OR “Surrey Centre” OR “Surrey” OR “Vancouver Kingsway” OR “Vancouver South”))

Note that the riding names represent the federal ridings with the highest per-capita Filipino population.
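
A query with this shape, (agency terms) OR ((topic terms) AND (location terms)), can be assembled from term lists rather than typed by hand, which also makes it easy to drop repeated terms. The sketch below is only an illustration of that idea in plain Python string handling. It does not call any Meltwater API, the helper names `or_group` and `build_query` are hypothetical, and the use of straight double quotes around each phrase is an assumption.

```python
def or_group(terms):
    """Join unique terms into a parenthesized OR group, quoting each phrase."""
    unique_terms = list(dict.fromkeys(terms))  # drop duplicates, keep order
    return "(" + " OR ".join(f'"{term}"' for term in unique_terms) + ")"


def build_query(agency_terms, topic_terms, location_terms):
    """Assemble (agency) OR ((topics) AND (locations)) as one string."""
    return (
        f"{or_group(agency_terms)} OR "
        f"({or_group(topic_terms)} AND {or_group(location_terms)})"
    )


# A few terms taken from the query above; the repeated topic term is
# included on purpose to show that duplicates are removed.
query = build_query(
    ["Halalan Canada", "Eleksyon sa Canada"],
    ["Araw ng Halalan", "Paunang pagboto", "Araw ng Halalan"],
    ["Canada", "Winnipeg North"],
)
print(query)
```

Keeping the three term lists separate makes it simpler to review each group with a native speaker before the final string is pasted into the monitoring tool.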

In a similar vein, I developed a Boolean query for the Ilocano language:

("Eleksion" OR “Eleksyon ti Canada” OR “Eleksion ti Canada” OR “Komisionado ti Eleksyon ti Canada” OR “Komisionado ti Eleksion ti Canada” OR “Linteg ti Eleksyon ti Canada” OR “Linteg ti Eleksion ti Canada” OR “Hepe ti Eleksyon” OR “Hepe ti Eleksion” OR “Opisial ti eleksyon” OR “Opisial ti eleksion”)
OR 
((“Aldaw ti *botos” OR “Aldaw ti Pagboto*” OR “Aldaw ti Eleksion” OR “Aldaw ti Eleksyon” OR “Aldaw ti Election” OR “Petsa ti *botos” OR “Petsa ti pagboto*” OR “Petsa ti Eleksion” OR “Petsa ti Eleksyon” OR “babaen ti koreo” OR “babaen ti koreo” OR “babaen ti koreo” OR “Balota babaen ti mail” OR “Maisapsapa a panagbutos” OR “Maisapsapa a panagbotos” OR “Panangsinga ti ganggannaet” OR “Sumampitaw” OR “Kinasoberano” OR “sumaruno nga eleksyon” OR “sumaruno nga eleksion” OR “Panagbobotos” OR “Panagbubotos” OR “Panangibotos” OR “Espesial nga eleksyon” OR “Espesial nga eleksion” OR “Nasapa nga eleksyon” OR “Nasapa nga eleksion” OR “Eleksion Heneral” OR “Heneral nga eleksion” OR “Heneral nga eleksyon" OR “Sapasap nga eleksyon” OR “Sapasap nga eleksion” OR “Pederal nga eleksion” OR “Pederal nga eleksyon" OR “Pederal nga byelection” OR “Pederal nga by-election” OR “listaan dagiti botante” OR “Giya iti panagbotos” OR “Giya iti panagbutos” OR “Makina ti panagbutos” OR “Makina ti panagbotos” OR “Makina ti pagbotosan” OR “Makina ti Dominion” OR “Online a panagbutos” OR “Online a panagbotos” OR “papel a balota” OR “espesial a balota” OR “Pilien” OR “Agpili” OR “Panagbotos” OR “Panagbutos” OR “Agboto” OR “Agbotos” OR “Agbuto” OR “Agbutos” OR “Iboto” OR “Ibotos” OR “Ibuto” OR “Ibutos” OR “Mangiboto” OR “Mangibotos” OR “Mangibuto” OR “Mangibutos” OR “Makapagboto” OR “Makapagbotos” OR “Makapagbuto” OR “Makapagbutos” OR “Nagbotosan” OR “Nagbutosan” OR “Botosak” OR “Butosak” OR “Ibotosak” OR “Ibutosak” OR “Botosam” OR “Butosam” OR “Ibotosam” OR “Ibutosam” OR “Botosna” OR “Butosna” OR “Ibotosna” OR “Ibutosna” OR “Botosmi” OR “Butosmi” OR “Ibotosmi” OR “Ibutosmi” OR “Botostayo” OR “Butostayo” OR “Ibotostayo” OR “Ibutostayo” OR “Botosyo” OR “Butosyo” OR “Ibotosyo” OR “Ibutosyo” OR “Botosda” OR “Butosda” OR “Ibotosda” OR “Ibutosda” OR “Botante” OR “Balota” OR “saan a umili” OR “haan a umili” OR “saan a makipagili” OR “haan a makipagili” OR “Istasion ti *botos” OR “Istasion ti Pagboto*” OR 
“Istasion a *botos” OR “Istasion a Pagboto*” OR “Presinto ti *botos” OR “Presinto ti Pagboto*” OR “Presinto a *botos” OR “Presinto a Pagboto*” OR “Lugar ti *botos” OR “Lugar ti Pagboto*” OR “Lugar a *botos” OR “Lugar a Pagboto*” OR “Lokasion ti *botos” OR “Lokasion ti Pagboto*” OR “Lokasion a *botos” OR “Lokasion a Pagboto*” OR “Kard ti botante” OR “kard ti panagbotos” OR “tarheta ti botante” OR “tarheta ti panagbotos” OR “ID ti botante” OR “ID ti panagbotos” OR “Edad ti panagbotos” OR “Edad ti panagbutos” OR “edad ti pagbotosan” OR “Ballasiw-Taaw a Botante” OR “kameng ti Parlamento” OR “Gobierno ti Canada” OR “Gobierno ti Kanada” OR “Kangrunaan a Ministro” OR “Demokrasia”) 
AND 

(“Canad?ano” OR “Kanad?ano” OR “Taga-Kanada” OR “Taga-Canada” OR “Mount Royal” OR “Saint-Laurent” OR “Saint Laurent” OR “Notre-Dame-de-Gr?ce—Westmount” OR “Notre-Dame-de-Gr?ce-Westmount” OR “Notre-Dame-de-Gr?ce” OR “Westmount” OR “York South—Weston—Etobicoke” OR “York South-Weston-Etobicoke” OR “York South” OR “Weston” OR “Etobicoke” OR “Humber River—Black Creek” OR “Humber River-Black Creek” OR “Humber River” OR “Black Creek” OR “Willowdale” OR “Davenport” OR “Eglinton—Lawrence” OR “Eglinton-Lawrence” OR “Eglinton Lawrence” OR "Scarborough Centre—Don Valley East” OR “Scarborough Centre-Don Valley East” OR “Don Valley East” OR “Scarborough Centre” OR “York Centre” OR “Winnipeg” OR “Winnipeg North” OR “Winnipeg Centre” OR “Calgary East” OR “Calgary Heritage" OR “North Vancouver—Capilano” OR “Vancouver Kingsway” OR “Vancouver South” OR “North Vancouver-Capilano” OR “North Vancouver” OR “Capilano” OR “Vancouver Fraserview—South Burnaby” OR “Vancouver Fraserview-South Burnaby” OR “Vancouver Fraserview” OR “South Burnaby” OR “Burnaby”))

Note that the riding names represent the federal ridings with the highest per-capita Ilocano population or the highest ratio of Ilocano speakers to Tagalog speakers. The Ilocano and Tagalog riding populations were calculated from data from the following two special quantitative projects that I led for the agency:

Back to Table of Contents

2025 Converge National Mental Health Datathon

Event details

I collaborated with a team of seven on a pipeline that involved uploading documentation to NextCloud, using Oracle SQL to transform proprietary mental health data from 10 different datasets (totalling 11,919,097 entries, with 1,702,900 records and 97 fields, spanning 6 years), and feeding the processed data to Tableau to build interactive dashboards that were presented to participating mental health organizations such as the Canadian Mental Health Association. I used the following SQL script to create a copy of an organization’s database, then filter out null values and unnecessary records and fields (generic names are used in place of the actual names of datasets, records, fields, variables, etc.):

--Create new table from organizational data
CREATE TABLE ORG_DATA_COPY AS SELECT * FROM ORG.INDICATOR_TABLES_JOINED;


SELECT * FROM ORG_DATA_COPY;


--Delete selected records (not needed for analysis)
DELETE FROM ORG_DATA_COPY WHERE FIELD_1 IN ('Unspecified_Record_1', 'Unspecified_Record_2', 'Unspecified_Record_3');


--Select columns that do not have null value
CREATE TABLE ORG_DATA_FILTER AS SELECT FIELD_1, UNSPECIFIED_FIELD_1, UNSPECIFIED_FIELD_2,
UNSPECIFIED_FIELD_3, UNSPECIFIED_FIELD_4, UNSPECIFIED_FIELD_5, UNSPECIFIED_FIELD_6,
UNSPECIFIED_FIELD_7, UNSPECIFIED_FIELD_8, UNSPECIFIED_FIELD_9, UNSPECIFIED_FIELD_10,
UNSPECIFIED_FIELD_11, UNSPECIFIED_FIELD_12 FROM ORG_DATA_COPY;


SELECT * FROM ORG_DATA_FILTER;


--Delete second and third columns (not quantitative)
ALTER TABLE ORG_DATA_FILTER DROP COLUMN UNSPECIFIED_FIELD_1;
ALTER TABLE ORG_DATA_FILTER DROP COLUMN UNSPECIFIED_FIELD_2;


SELECT * FROM ORG_DATA_FILTER;


---We are interested in the following variables: Unspecified Variable 1, Unspecified Variable 2, Unspecified Variable 3


CREATE TABLE ORG_DATA_3VAR AS SELECT "FIELD_1", "Unspecified_Variable_1", "Unspecified_Variable_2", "Unspecified_Variable_3" FROM ORG_DATA_FILTER;


SELECT * FROM ORG_DATA_3VAR;


---Rename columns for simplicity


ALTER TABLE ORG_DATA_3VAR
RENAME COLUMN "Unspecified_Variable_1" TO "Descriptive_Name_1";


ALTER TABLE ORG_DATA_3VAR
RENAME COLUMN "Unspecified_Variable_2" TO "Descriptive_Name_2";


ALTER TABLE ORG_DATA_3VAR
RENAME COLUMN "Unspecified_Variable_3" TO "Descriptive_Name_3";


SELECT * FROM ORG_DATA_3VAR;

The final table was exported as a CSV file, then uploaded to NextCloud to be fed to the Tableau dashboard.

To download the SQL file containing the above SQL script (maintaining the redactions): Download file
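
The copy, filter, and export steps can be tried end to end without an Oracle instance. The sketch below is a hedged stand-in that uses Python's built-in `sqlite3` module and an in-memory database instead of Oracle SQL. Every table name, column name, and value in it is hypothetical, and SQLite syntax differs from Oracle in places, so it illustrates the flow rather than reproducing the datathon script.

```python
import csv
import io
import sqlite3

# In-memory database with a tiny made-up source table (all names hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE source_data (field_1 TEXT, label TEXT, value_1 REAL, value_2 REAL)"
)
conn.executemany(
    "INSERT INTO source_data VALUES (?, ?, ?, ?)",
    [
        ("Region A", "note", 1.5, 10.0),
        ("Unspecified_Record_1", "note", 2.0, 20.0),
        ("Region B", "note", 3.5, 30.0),
    ],
)

# Step 1: work on a copy so the source table is left untouched.
conn.execute("CREATE TABLE data_copy AS SELECT * FROM source_data")

# Step 2: delete records that are not needed for the analysis.
conn.execute("DELETE FROM data_copy WHERE field_1 IN (?)", ("Unspecified_Record_1",))

# Step 3: keep only the variables of interest and give them descriptive names.
conn.execute(
    "CREATE TABLE data_final AS "
    "SELECT field_1, value_1 AS descriptive_name_1, value_2 AS descriptive_name_2 "
    "FROM data_copy"
)

# Step 4: export the final table as CSV text (written to a string buffer here).
cursor = conn.execute("SELECT * FROM data_final ORDER BY field_1")
header = [column[0] for column in cursor.description]
rows = cursor.fetchall()

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(header)
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Selecting the wanted columns with aliases in one statement replaces the separate drop-column and rename-column steps, which is a design choice of this sketch and not a claim about how the original script should be written.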

Back to Table of Contents

Object Detection Model, M2M Tech Client Company (April 2025 - July 2025)

Under M2M Tech, I collaborated within a 19-member team to support a client company in building a pipeline to deploy an object detection machine learning model intended to augment the company’s geospatial capabilities in environmental disaster mitigation, using Python, PowerShell, and Label Studio.

To install Python: Install

PowerShell for Batch Uploading Images

PowerShell script for installing Label Studio:

python --version
pip --version

pip install label-studio

PowerShell script for creating batches of downloaded images (file paths and file names redacted in compliance with NDA):

#### UNZIP AFTER DOWNLOADING IMAGES AS ZIP FILE ####

Set-Location -Path "C:\path\to\folder"

Expand-Archive -Path "C:\path\to\folder\File-Of-Images-To-Unzip.zip" -DestinationPath "target_images"

#### SORT INTO BATCHES ####

# Create 'batches' folder (absolute paths keep the script independent of the current directory)
$batchRoot = "C:\path\to\folder\batches"
New-Item -ItemType Directory -Path $batchRoot -Force | Out-Null
# Folder holding the unzipped images
$imageFolder = "C:\path\to\folder\target_images\Object"
# Initialize counter
$i = 0
# --- Loop through each image in 'Object' folder and move to batch folders ---
Get-ChildItem -Path $imageFolder -File | ForEach-Object {

    # Calculate batch number (each batch contains 100 images)
    $batchNumber = [math]::Floor($i / 100)

    # Set the folder path for the batch
    $batchFolder = Join-Path -Path $batchRoot -ChildPath "batch_$batchNumber"

    # Check if the batch folder exists; if not, create it
    if (-not (Test-Path $batchFolder)) {
        New-Item -ItemType Directory -Path $batchFolder | Out-Null
    }

    # Move the current image into the appropriate batch folder
    Move-Item -Path $_.FullName -Destination $batchFolder

    # Increment the counter for each file processed
    $i++
}
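
For comparison, the same batching logic (integer-divide a running index by the batch size to pick the destination folder) can be sketched in Python. This is an illustrative equivalent only, not part of the client pipeline: the function name `sort_into_batches` is hypothetical, and the demo runs on empty placeholder files in a temporary directory with a batch size of 2 so the result is easy to inspect.

```python
import shutil
import tempfile
from pathlib import Path


def sort_into_batches(image_dir, batch_root, batch_size=100):
    """Move files from image_dir into batch_root/batch_<n>, batch_size per folder."""
    # Collect and sort the file list first so moving files cannot affect iteration.
    files = sorted(p for p in Path(image_dir).iterdir() if p.is_file())
    for index, file_path in enumerate(files):
        batch_folder = Path(batch_root) / f"batch_{index // batch_size}"
        batch_folder.mkdir(parents=True, exist_ok=True)
        shutil.move(str(file_path), str(batch_folder / file_path.name))
    return len(files)


# Demo on placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    images = Path(tmp) / "images"
    images.mkdir()
    for n in range(5):
        (images / f"img_{n}.jpg").write_bytes(b"")

    moved = sort_into_batches(images, Path(tmp) / "batches", batch_size=2)
    layout = {
        folder.name: sorted(f.name for f in folder.iterdir())
        for folder in sorted((Path(tmp) / "batches").iterdir())
    }

print(moved, layout)
```

With five files and a batch size of 2, the last folder holds the single leftover file, which mirrors how the final batch of 100 may be only partly filled.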

Python for Uploading Sample Images

Python script to select 10 random images (file paths and file names redacted in compliance with NDA):

# Import dependencies


import zipfile
import os
import random
import shutil


# Set variables


zip_path = r'C:\path\to\folder\File-Of-Images-To-Unzip.zip'  # zip file path
all_images_root = r'C:\path\to\folder\all_images'  # all_images will be folder housing unzipped images
batch_dir = r'C:\path\to\folder\batch_folder'      # folder to store randomly selected sample images
sample_size = 10


with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    zip_ref.extractall(all_images_root)


print(f"✅ Unzipped to '{all_images_root}'")


# Create batch folder if it doesn't exist


os.makedirs(batch_dir, exist_ok=True)


# Recursively collect all image files


image_extensions = ('.jpg', '.jpeg', '.png', '.gif')
all_images = []


for root, dirs, files in os.walk(all_images_root):
    for file in files:
        if file.lower().endswith(image_extensions):
            full_path = os.path.join(root, file)
            all_images.append(full_path)


# Check if there are enough images


if len(all_images) < sample_size:
    raise ValueError(f"Not enough images to sample {sample_size}. Found only {len(all_images)}.")


# Sample and copy the images


sampled_images = random.sample(all_images, sample_size)


for img_path in sampled_images:
    dest_path = os.path.join(batch_dir, os.path.basename(img_path))
    shutil.copy(img_path, dest_path)


print(f"Copied {sample_size} random images to '{batch_dir}'.")

Jupyter Notebook Script for Unit Testing

Unit testing was executed in a trainer notebook:

with open("test_trainer.py", "w") as f:
    f.write('''
######======Original trainer.py script (certain parts redacted)======######

from typing import Dict, Any, Optional
import torch
import torch.nn as nn
#from ..utils.logging import get_logger

def get_logger(name):
    class DummyLogger:
        def info(self, msg):
            print(f"[{name}] {msg}")
    return DummyLogger()

logger = get_logger(__name__)

class ModelTrainer:
    """Handles model training and evaluation."""

    def __init__(self, model: nn.Module, config: Dict[str, Any]):
        #####---REDACTED---#####
        logger.info("Initialized ModelTrainer")

    def train(self,
             train_loader: torch.utils.data.DataLoader,
             val_loader: torch.utils.data.DataLoader,
             num_epochs: int,
             save_path: str) -> Dict[str, float]:
        #####---REDACTED---#####
        logger.info(f"Starting training for {num_epochs} epochs")
        pass

    def evaluate(self,
                val_loader: torch.utils.data.DataLoader) -> Dict[str, float]:
        #####---REDACTED---#####
        logger.info("Starting model evaluation")
        pass

######======Script for unit testing with pytest======######

import pytest
import torch
import torch.nn as nn

# The ModelTrainer class is already defined in the previous cell


# Create dummy dataset of 10 samples, each containing a row of 10 random numbers and 1 label
class DummyDataset(torch.utils.data.Dataset):
    def __len__(self):
        return 10
    def __getitem__(self, idx):
        return torch.randn(1, 10), torch.tensor(1)

# Create dummy model for testing (10 input features and 1 output)
class DummyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 1)
    def forward(self, x):
        return self.linear(x)

# Start unit testing
@pytest.fixture
def setup_trainer():
  # Give the trainer a new dummy model and an empty config
    model = DummyModel()
    config = {}
    trainer = ModelTrainer(model, config)
    return trainer, model, config

def test_init(setup_trainer):
  # Test __init__
    trainer, model, config = setup_trainer
    assert trainer.model == model
    assert trainer.config == config

def test_train_returns_none(setup_trainer):
  # Test train
    trainer, _, _ = setup_trainer
    train_loader = torch.utils.data.DataLoader(DummyDataset(), batch_size=2)
    val_loader = torch.utils.data.DataLoader(DummyDataset(), batch_size=2)
    result = trainer.train(train_loader, val_loader, num_epochs=1, save_path="model.pth")
    assert result is None

def test_evaluate_returns_none(setup_trainer):
  # Test evaluate
    trainer, _, _ = setup_trainer
    val_loader = torch.utils.data.DataLoader(DummyDataset(), batch_size=2)
    result = trainer.evaluate(val_loader)
    assert result is None
    ''')

!pytest test_trainer.py --maxfail=1 --disable-warnings -q

Back to Table of Contents

Filipino Canadian Youth Organizations (2016 - Present)

Organization 1

As part of the organization's year-end assessment of the 2022-2023 program period, I was tasked with collecting and analyzing the organization's Instagram posts from that period to deliver actionable insights for the next program period.

Data Sheet 1: Main Sheet

638 entries (58 records x 11 fields)

Data Sheet 2: Heatmap-Posting Times

Data Sheet 3: Heatmap-Engagements by Topic

Data Sheet 4: Top 5 Lists-Engagements by Topics

Organization 2

I initiated and chaired the organizing committee for this university organization's inaugural four-month, challenge-based competition.

Data Sheet 1: Participant Sign-Up Information

400 entries (25 records x 16 fields)

Data Sheet 2: Feedback Form Data, Organizing Committee Members

132 entries (12 records x 11 fields)

Google Sheet file generated from Google Form responses.

Data Sheet 3: Progression of Points (as of March)

Organization 3

Gmail's Mail Merge was used to automate mass-emailing organizations and individuals (with attached letters) while embedding each recipient's name in both the email and the attached documents to keep the correspondence genuine.

The resulting email with attachments:

Back to Table of Contents


Table of Contents

Audit at Great Canadian Casino Resort Toronto (August 2022 - August 2024)

Elections Canada (August 2024 - June 2025)

Weekly Dashboard

Filipino Language Monitoring

2025 Converge National Mental Health Datathon

Object Detection Model, M2M Tech Client Company (April 2025 - July 2025)

PowerShell for Batch Uploading Images

Python for Uploading Sample Images

Jupyter Notebook Script for Unit Testing

Filipino Canadian Youth Organizations (2016 - Present)

Organization 1

Organization 2

Organization 3
