High school summer interns generate OpenTopography user and data analytics

Sep 29, 2014

This summer three San Diego high school students interned for OpenTopography as part of the Research Experience for High School Students (REHS) program at the San Diego Supercomputer Center (SDSC) at UCSD. The students primarily worked on analysis of anonymized user behavior data as well as data usage patterns to identify relationships and trends in OpenTopography system use. A summary of their work was presented in a scientific poster session at SDSC (included below). Some of the students were also featured in a story on the local San Diego CBS news about the internship program.

The interns collectively wrote the following blog post summarizing their REHS experience working on OpenTopography:

By: Rosalle Chen, Poway High; Lobna Allam, Westview High; Alec Wall, University City High

Our first day at SDSC (San Diego Supercomputer Center) started with an REHS orientation session where we interns were informally introduced to each other and to the organizers during registration. The administrative staff organized some activities to better acquaint us with our fellow interns and the UCSD campus. We were also introduced to the center's many supercomputers with a machine room tour and had our hands scanned (biometric security) for off-hours building access.

On the second day, we were formally introduced to our mentors, data scientists who worked in the Advanced Cyberinfrastructure Development (ACID) group at SDSC and who were going to guide us for the next two months. They briefed us on the OpenTopography and Tropical Ecological Assessment and Monitoring Network (TEAM) projects.

On the third day, we got down to business. Our first assignment was to get familiar with SQL using online tutorials from SQLZoo and W3Schools. This was followed by learning HTML & CSS from Codecademy. After two weeks of familiarizing ourselves with the languages and the PostgreSQL database that we would be using in our tasks, we were given a CSV file containing de-identified user visit and registration information. We were told to load the data into a PostgreSQL database, use our newfound knowledge of SQL to analyze the data and extract the essential relationships, and then graph the resulting patterns in Microsoft Excel. After a few days of working on this first task together (we are a team of three), each of us completed our own graphs. After we sent them to our mentors, each of us was assigned three different tasks to complete over the remaining few weeks.
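As a rough illustration of the load-and-query workflow described above, here is a minimal Python sketch. It is not the interns' actual code: it uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL so that it runs without a database server, and the table name, column names, and sample rows are all hypothetical, since the post does not describe the layout of the real CSV file.

```python
import csv
import io
import sqlite3

# Hypothetical sample data; the real CSV's columns are not described in the post.
SAMPLE_CSV = """user_id,registered_on,email_domain,country
1,2013-10-02,edu,US
2,2013-10-15,com,US
3,2013-11-03,edu,DE
4,2014-01-20,gov,US
"""


def load_registrations(conn, csv_text):
    """Create a (hypothetical) registrations table and insert every CSV row."""
    conn.execute(
        "CREATE TABLE registrations ("
        "user_id INTEGER, registered_on TEXT, email_domain TEXT, country TEXT)"
    )
    rows = csv.DictReader(io.StringIO(csv_text))
    conn.executemany(
        "INSERT INTO registrations VALUES "
        "(:user_id, :registered_on, :email_domain, :country)",
        rows,
    )


def registrations_per_month(conn):
    """Count registrations per calendar month (YYYY-MM), oldest first."""
    query = (
        "SELECT substr(registered_on, 1, 7) AS month, COUNT(*) "
        "FROM registrations GROUP BY month ORDER BY month"
    )
    return conn.execute(query).fetchall()


def registrations_per_domain(conn):
    """Count registrations per email domain type, most common first."""
    query = (
        "SELECT email_domain, COUNT(*) AS n "
        "FROM registrations GROUP BY email_domain "
        "ORDER BY n DESC, email_domain"
    )
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_registrations(conn, SAMPLE_CSV)
    print(registrations_per_month(conn))
    print(registrations_per_domain(conn))
```

The query results are small tables of counts, which is the kind of output that can be exported and charted in a spreadsheet such as Excel.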

For our respective first tasks, we analyzed user registrations and site visits, categorized them by domain and country, and observed the impact of meetings, conferences, and dataset releases on overall user growth in OpenTopography. Site traffic and user visits/registrations were positively affected by dataset releases, especially the SRTM dataset. Additionally, annual geoscience workshops boosted site traffic, especially during October and November. Most users came from private companies/personal email addresses and educational institutions.

Our second task consisted of generating heat maps (showing the hot spot regions that are selected most often) for the SRTM, B4, and Indiana lidar (raster and point cloud) datasets. Our mentors introduced us to Google Fusion Tables and walked us through the basic concepts of creating correctly formatted bounding boxes and entering them into the table, which allowed us to generate and visualize the heat maps. Analyzing user queries to generate heat maps helped determine more cost-effective storage solutions.
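The idea behind such a heat map can be sketched in a few lines of Python. This is only an illustration, not the Google Fusion Tables workflow the interns used: it assumes each user query is a bounding box given as (min_lon, min_lat, max_lon, max_lat), lays a regular grid over the map, and counts how many boxes touch each grid cell. The function name, the box format, the cell size, and the sample boxes are all hypothetical.

```python
import math
from collections import Counter


def heatmap_counts(boxes, cell_size=1.0):
    """Count how many bounding boxes touch each grid cell.

    boxes: iterable of (min_lon, min_lat, max_lon, max_lat) tuples (assumed format).
    cell_size: width and height of a grid cell, in degrees.
    Returns a Counter mapping (col, row) grid indices to hit counts.
    """
    counts = Counter()
    for min_lon, min_lat, max_lon, max_lat in boxes:
        # Range of grid columns and rows that the box overlaps.
        first_col = math.floor(min_lon / cell_size)
        last_col = math.floor(max_lon / cell_size)
        first_row = math.floor(min_lat / cell_size)
        last_row = math.floor(max_lat / cell_size)
        for col in range(first_col, last_col + 1):
            for row in range(first_row, last_row + 1):
                counts[(col, row)] += 1
    return counts


if __name__ == "__main__":
    # Two made-up query boxes; the second lies inside one cell of the first.
    sample_boxes = [
        (-117.5, 32.5, -116.5, 33.5),
        (-117.2, 32.1, -117.1, 32.2),
    ]
    counts = heatmap_counts(sample_boxes)
    hottest_cell, hits = counts.most_common(1)[0]
    print(hottest_cell, hits)
```

Cells with the highest counts are the hot spots, that is, the regions requested most often.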

Our third task consisted of creating pivot charts in Excel that analyzed the use of scientific lidar datasets over time and compared it to the use of popular content such as YouTube videos. We noticed that scientific lidar datasets received continued use over time, whereas popular (non-scientific) content such as a YouTube video receives most of its access in the first days after release and then dies off.

As of now, we are entering our last week of the summer internship at SDSC and will be presenting our completed posters to our parents and other groups on Friday. This summer truly was an eventful, fun, and stimulating learning experience at the San Diego Supercomputer Center. The internship helped increase our interest in pursuing careers in the STEM fields.
