The course includes an introduction to the methods of modern statistics such as splines, generalized additive models, principal components analysis, and classifiers. Students learn resampling methods such as the bootstrap, cross-validation, boosting, and bagging. Methods of model selection include search-and-score and regularization. Students also practice communicating technical ideas to a non-technical audience, including via data visualization.
This course provides a hands-on introduction to the manipulation and visualization of complex data sets using a programming language. Efficient techniques for importing and exporting data in various formats, data acquisition, data integrity, and good analysis practices are discussed. Several programming tools and libraries are introduced to restructure, transform, and fuse disparate data types for visualization and data summaries in table format. The basics of manipulating space-time data are also covered.
This course introduces software tools and data science techniques for analyzing big data. It covers big data principles, state-of-the-art methodologies for large data management and analysis, and their applications to real-world problems. Modern and traditional machine learning techniques and data mining methods are discussed, and the ethical implications of big data analysis are examined. May be offered in conjunction with CIS*6180.
This course emphasizes machine learning for sequential data processing. It covers common challenges and pre-processing techniques for sequential data such as text, biological sequences, and time series data. Students are exposed to machine learning techniques, including classical methods and more recent deep learning models, so that they obtain the background and skills needed to confront real-world applications of sequential data processing. May be offered in conjunction with CIS*6190.
This course introduces software tools and data science techniques for analyzing big geospatial data. It provides an overview of raster-based geographic information systems (GIS) and of the use of state-of-the-art software and programming languages to identify patterns and clusters in spatial-temporal data. Concepts such as kriging/Gaussian processes, variograms, and autoregressive correlation structures are discussed. Data summaries and visualizations specific to spatial-temporal problems are introduced.
This interdisciplinary, team-taught seminar course provides students with the opportunity to synthesize information, research methods, and present cutting-edge applications of data science. Learning outcomes include identifying reliable sources, understanding and presenting relevant contemporary data science methods, thinking critically about practical implementations of data science, and collaborating effectively with peers. Emphasis is placed on effectively communicating technical content and insights to a non-technical audience.
This is a one-semester research project course for students in the Master of Data Science program. Students plan, develop, and write a faculty- or industry-led research paper and present their work. The project should advance knowledge or practice in data science or a closely related area, and address a real-world problem faced by industry. The project should focus on data science in the spatial and temporal dimension(s), and its topic must be approved by the course instructor.