Data analyst excited to apply the Python, Pandas, and SQL skills developed in Columbia University’s data analytics boot camp. A lifelong learner, always curious to gain new skills, efficiently extract useful information from data sets, and present it with effective visualizations. Enjoys problem solving and is driven to use data to combat the ecological crisis more effectively. A master’s in environmental policy combined with technical analysis skills lays a strong foundation for supporting data-driven decision-making, particularly in environmental solutions. Works well on teams and individually, combining adaptability and big-picture thinking with detailed analysis and critical thinking.
I am an enthusiastic data analyst with a passion for condensing large data sets into clear tables and visualizations using Python, Pandas, Matplotlib, SQL, Excel, Tableau, R, and machine learning. I'm available for work!
Data Analytics Certificate
Data analysis is a high-growth career track, and Columbia University's Data Boot Camp teaches the specialized skills for analysis, from ETL to data collection via web scraping and API calls to data visualization. Through a fast-paced, immersive curriculum, I've learned in-demand skills and technologies.
M.S. Environmental Policy
Graduate degree with a concentration in energy and sustainability, including a capstone project on methane emissions in the oil and gas industry and the clean energy transition. Other notable projects include a sustainable transportation analysis in the Roaring Fork Valley of Colorado and work in environmental economics and finance, with a group project analyzing green upgrades for a development project.
B.A. Spanish
Graduated with honors and attended an intensive Spanish immersion study abroad in Valencia, Spain. Other coursework included an internship with the PEAS farm, an independent study to become a nationally certified EMT, and plant ecology study with post-graduation work alongside graduate students in the ecology department.
The purpose of this analysis is to determine how ride-sharing data varies by city type: urban, suburban, and rural. Ride sharing is more popular in urban and suburban areas, which leads to a few trends: there are far more rides and drivers in urban cities, while fares per driver are typically higher in rural areas. Investing more money and resources in urban areas makes the most sense, as ride-sharing programs there are more robust and work best in densely populated areas.
View Code
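A minimal Pandas sketch of the kind of by-city-type summary described above; the column names and sample values here are assumptions, not the project's actual data:

```python
import pandas as pd

# Hypothetical sample of ride-share data; column names are assumptions.
rides = pd.DataFrame({
    "city_type": ["Urban", "Urban", "Suburban", "Rural"],
    "fare": [12.5, 9.0, 25.0, 43.0],
    "driver_count": [40, 55, 12, 3],
})

# Average fare and total drivers by city type
summary = rides.groupby("city_type").agg(
    avg_fare=("fare", "mean"),
    total_drivers=("driver_count", "sum"),
)
print(summary)
```

Grouping on `city_type` like this makes the urban/suburban/rural comparison a one-liner for each metric of interest.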
This analysis uses data from a mock company's HR department to build a relational database that determines how many employees will retire soon, which departments they work in, and their titles. It also identifies employees who could serve as mentors in a mentorship program, so the company can soften the impact of the coming wave of retirements and address the leadership vacuum they are likely to create. The analysis uses SQL to create the relational database, which helps keep sensitive information private while tracking information about the employees. A follow-up provides two queries: the first shows unique titles among the retiring employees, and the second shows employees eligible for the mentorship program based on their date of birth.
View Code
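A minimal sketch of the mentorship-eligibility query described above, run through Python's built-in sqlite3 against an in-memory database; the table names, columns, and birth-date window here are assumptions rather than the project's actual schema:

```python
import sqlite3

# In-memory stand-in for the employee database; schema is an assumption.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employees (emp_no INTEGER, birth_date TEXT, first_name TEXT)")
cur.execute("CREATE TABLE titles (emp_no INTEGER, title TEXT)")
cur.executemany("INSERT INTO employees VALUES (?, ?, ?)", [
    (1, "1952-03-01", "Ada"),
    (2, "1961-07-15", "Ben"),
    (3, "1954-11-30", "Cy"),
])
cur.executemany("INSERT INTO titles VALUES (?, ?)",
                [(1, "Engineer"), (2, "Manager"), (3, "Engineer")])

# Mentorship eligibility: join employees to titles, filter on birth date
mentors = cur.execute("""
    SELECT e.emp_no, e.first_name, t.title
    FROM employees e JOIN titles t ON e.emp_no = t.emp_no
    WHERE e.birth_date BETWEEN '1952-01-01' AND '1955-12-31'
    ORDER BY e.emp_no
""").fetchall()
print(mentors)
```

Because the dates are stored in ISO format, `BETWEEN` on the text column sorts them correctly without any date parsing.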
The purpose of this analysis was to complete a full group project, from data and topic selection through EDA and data analysis, database creation, and connecting the database to a machine learning model. The results were displayed on a Leaflet map and a Tableau dashboard, and the project was presented as a group with Google Slides. The project analyzes crop yields from countries around the world and predicts yields from a number of features with a multivariate linear regression model.
View Code
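A minimal sketch of a multivariate linear regression like the one described above, fit with NumPy's least-squares solver on toy data; the project itself presumably used a library model, and the feature values here are invented for illustration:

```python
import numpy as np

# Toy stand-in for crop-yield features (e.g., rainfall, temperature); real data differed.
rng = np.random.default_rng(0)
X = rng.random((50, 2))            # 50 samples, two features
true_w = np.array([3.0, -1.5])
y = X @ true_w + 2.0               # exact linear relationship with intercept 2.0

# Multivariate linear regression via ordinary least squares
A = np.column_stack([np.ones(len(X)), X])    # prepend an intercept column
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, weights = coef[0], coef[1:]
predictions = A @ coef
```

Since the toy target is noiseless, the solver recovers the intercept and weights exactly; on real crop data the same call gives the best-fit coefficients in the least-squares sense.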
This project uses unsupervised machine learning to cluster data. It begins by preprocessing the data with Pandas: cleaning it and using get_dummies() to convert categorical columns to numerical values, then scaling the data with StandardScaler from the sklearn library. The Principal Component Analysis algorithm is then applied to reduce the number of features to 3, and the K-means algorithm is run, with the elbow curve used to choose the best value for K. Joining the cleaned dataset with the PCA data yields a new dataframe with information on cryptocurrencies and their clusters, or 'class'. The data is visualized with a 3D scatter plot to show the different clusters. Finally, the total coins and total mined coins columns are scaled and plotted on a scatter plot.
View Code
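The preprocessing and dimensionality-reduction steps described above can be sketched as follows. This version uses a toy stand-in dataset, standardizes by hand (the same math StandardScaler performs), and reduces to 3 components with NumPy's SVD in place of sklearn's PCA; the column names and values are assumptions:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the cryptocurrency dataset; column names are assumptions.
df = pd.DataFrame({
    "Algorithm": ["SHA-256", "Scrypt", "SHA-256", "X11"],
    "TotalCoinSupply": [21e6, 84e6, 42e6, 18e6],
    "TotalCoinsMined": [19e6, 70e6, 30e6, 17e6],
})

# One-hot encode the categorical column with get_dummies()
X = pd.get_dummies(df, columns=["Algorithm"]).astype(float)

# Standardize each column (mean 0, population std 1), as StandardScaler does
Z = (X - X.mean()) / X.std(ddof=0)

# PCA to 3 components via SVD on the centered, scaled data
U, S, Vt = np.linalg.svd(Z.to_numpy(), full_matrices=False)
pca3 = U[:, :3] * S[:3]           # projected coordinates, one row per coin
print(pca3.shape)
```

The three resulting columns are what feed the K-means step and the 3D scatter plot in the full project.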
This project uses JavaScript, Plotly, and HTML/CSS to plot data on various charts and display them on a webpage hosted with GitHub Pages. The charts analyze bacteria samples from belly buttons to match the bacteria with a meaty flavor to add to vegan burgers. The page is interactive, with a dropdown to select data from different test subjects.
View Code
This project uses JavaScript, a Mapbox API call, and the Leaflet library to create interactive maps that display USGS data on the earthquakes that occurred over the past week. The maps have different styles users can select from and also show the tectonic plates.
View Code