Liam Bui

Senior Machine Learning Engineer

Vancouver, Canada

Summary

Liam is a Certified Analytics Professional (CAP) with over six years of experience in machine learning and data analytics. He has worked on data science, machine learning, deep learning, computer vision, and image processing projects at every stage, from data collection and processing to model training, evaluation, and deployment. He is passionate about building products that help organizations extract insight from data, especially in agriculture and healthcare.

Languages:

English

Favorite Python Packages:

Scikit-learn, OpenCV, Keras, TensorFlow, NLTK, Pandas, NumPy, SciPy

Experience

PROFESSIONAL EXPERIENCE

Senior Machine Learning Engineer – Terramera, Vancouver, BC Jan 2018 – Present

  • Research deep learning models for object detection and semantic segmentation (Mask R-CNN, U-Net) using Python and TensorFlow to automate pest counting and disease evaluation processes
  • Develop a multispectral imaging prototype (with Raspberry Pi and bandpass filters) and implement a regression model to estimate grape sugar level based on multispectral reflectance, using Python and Scikit-learn
  • Collaborate with software team to implement multispectral image processing pipelines for plant health evaluation using computer vision and machine learning techniques, including feature matching for image alignment & stitching, stereovision for depth estimation, color thresholding for segmentation, regression & tree-based models for plant trait estimation, in Python, OpenCV, and Scikit-learn
  • Implement machine learning models for drug dose-response modelling and for drug synergy analysis and prediction, using cheminformatics and machine learning libraries (Scikit-learn, RDKit, PubChemPy) to accelerate the drug discovery process in plant health research
  • Provide advice on experimental design and implement statistical analysis pipelines with Python, rpy2, and Statsmodels to automate various statistical analyses for plant health research

Data Scientist – PHEMI Systems, Vancouver, BC May 2017 – Dec 2017

  • Developed distributed data processing and analytics prototypes using Spark (Scala), Hive, and Zeppelin to demonstrate fast query and analytics on terabytes of clinical data
  • Implemented natural language processing pipeline with Scala and cTAKES, a library with both rule-based and machine learning techniques, to extract clinical information from unstructured medical text
  • Proposed machine learning and deep learning demos using Python, Scikit-learn, and TensorFlow to show how medical imaging data can be analyzed to support diagnosis
  • Researched time-domain and frequency-domain signal processing and machine learning algorithms for fall detection based on biomedical signal data collected from wearable sensors, using Python and Scikit-learn
  • Developed a term-partitioned index mechanism in Java to enable fast document search in Accumulo

Data Analytics Engineer – DBS Bank, Singapore Jul 2012 – Jul 2016

  • Developed SAS code to extract data from Teradata SQL databases and perform statistical analysis for Card & Unsecured Lending sales and marketing
  • Liaised with modelling team to deploy predictive models (Recommender System, Location Analytics) for targeted marketing, leading to 2x lift in customer response rate in digital campaigns
  • Proposed experimental design and hypothesis testing on different content factors to improve customer engagement in email marketing
  • Developed Java analysis reports and QlikView dashboards to provide technology and operations teams with insight on process improvement and risk control
  • Performed process mapping and simulation modelling to optimize business processes, resulting in 10% reduction in operating cost
  • Led infrastructure design and deployment for an enterprise data management system


OTHER DATA SCIENCE PROJECTS

Skin lesion classification [Github]

  • Developed image processing pipeline to extract global features (shape, texture, color descriptors) and local features (visual bag of words with keypoint descriptors) of skin lesion images
  • Trained machine learning models and implemented model stacking for skin lesion image classification
  • Implemented convolutional neural networks and transfer learning to improve the classification accuracy
  • Technologies: Python, OpenCV, Scikit-learn, Keras

Drug efficacy prediction [Github]

  • Developed data processing pipeline to extract 1D, 2D and 3D molecular descriptors of chemical molecules and implemented model stacking to predict the molecules’ efficacy against HIV
  • Explored Graph Convolutional Network (GCN) and Recurrent Neural Network (LSTM) models from research papers for molecular representation learning
  • Technologies: Python, RDKit, Scikit-learn, Keras, TensorFlow

Longitudinal study: Effect of tobacco use on mortality [Github]

  • Analyzed National Longitudinal Mortality dataset to understand the effect of tobacco use on mortality using mixed model logistic regression and survival analysis (Kaplan-Meier estimator)
  • Technologies: R, ggplot2, glm, lme4, survival

Diabetes prediction based on clinical measurements [Github]

  • Analyzed Pima Indians Diabetes dataset to understand the relationship between diabetes diagnosis and clinical measurements using generalized additive models and tree-based methods
  • Implemented an interactive ROC curve dashboard to show the optimal cut-off threshold based on a specific True Positive Rate or False Positive Rate criterion
  • Technologies: R, shiny, glmnet, gam, gbm

Topic Model and Network Analysis on research publications [Github]

  • Developed web scraping function to retrieve publications from major machine learning journals
  • Analyzed topic models (Latent Dirichlet Allocation) to explore underlying topics and performed graph analytics to visualize relationships among publication topics
  • Technologies: Scala, Spark ML, Spark GraphX, MongoDB, Gephi

Online Restaurant Recommender System [Github]

  • Implemented Collaborative Filtering Recommender System (Matrix Factorization, Item-Similarity) based on restaurant detail and user rating datasets from Yelp
  • Developed a web-based Proof of Concept to demonstrate and visualize the recommender system’s result
  • Technologies: PySpark, MLlib, Cassandra, Hadoop Cloudera Distribution, Google Maps API, D3.js

Skills

Amazon Web Services (AWS), Artificial Intelligence, Big Data, Data Science, Hadoop, Keras, Machine Learning, MongoDB, SQL, Spark, TensorFlow

Joined: November 2019