Lisa Lendway, PhD
Office Hours: See Moodle page
Email: llendway@macalester.edu
Preceptors: See Moodle page
This capstone course will expand on the R skills you learned in COMP/STAT 112: Introduction to Data Science and STAT 253: Statistical Machine Learning. The biggest part of the course is a project, which you’ll start working on early in the semester. Last spring, class met on Mondays, Wednesdays, and Fridays, so I described the course using cute names (Modeling Mondays, Wisdom Wednesdays, and Function Fridays). We’re meeting on Tuesdays and Thursdays now, so the names no longer match the days, but I decided to keep them because I think they do a good job of describing the main themes of the course. Just know that the topics will not occur only on those days. There is a calendar at the end that you can reference for more detail.
Modeling Mondays are focused on … modeling, extending the machine learning tools and skills you learned in STAT 253. This will definitely include learning new modeling methods and tools. You will also learn about topics that are important parts of the modeling process but aren’t directly related to building a model: using git and GitHub for sharing code and collaborating, using R and SQL to extract data from a database, building Shiny apps to put a model in a place where people can use it, exploring new ways to aid model interpretation, and evaluating the impact of the model. The learning material for these days can be found on the Course Materials tab of the course website. The Moodle page will specify which material to read and when. You will do the reading/watching/prep before class and come to class prepared to work on problem sets that reinforce these topics.
Wisdom Wednesdays are guest speaker days. We will have speakers from a variety of industries visiting our class. They will tell you about what they do (i.e., pass on some wisdom) and you will have an opportunity to ask them questions. There will also be time on Wednesdays for you to share wisdom with one another while working on problem sets.
Function Fridays are when all of you are the teachers! You will work in groups of 2-3 students to give a ~20 minute presentation/tutorial on an R package or useful set of R functions. In addition to the presentation, you will also write a problem or short set of problems to be added to a problem set. Some ideas for potential topics are listed below. You will meet with me roughly a week before your presentation to talk about your topic, and we will meet again a couple of days before to ensure you’re ready to present.
purrr
stringr, including its cheatsheet
reticulate
tidytext
rayshader
data.table library to manipulate data
DT, gt, kable, kableExtra, etc.
NEW! Tidy Tuesdays are days to practice your data wrangling and visualization skills! If you took Intro Data Science with me last year, you know what Tidy Tuesday is about. If not, you can read more about it on the Tidy Tuesday website. We’ll work on this in class for roughly 30-40 minutes every other week, and you will spend some more time on it outside of class. See more details below and in the separate document I’ll provide.
Additionally, Data Ethics/Justice will be a key component of the class. I am not an expert in this area so I will be doing a lot of learning along with you.
Academic Integrity: Students are expected to maintain the highest standards of honesty in their college work; violations of academic integrity are serious offenses. Students found guilty of any form of academic dishonesty – including, for instance, forgery, cheating, and plagiarism – are subject to disciplinary action. Examples of behavior that violates this policy, as well as the process and sanctions involved, can be found on the Academic Programs website.
Accessibility: I am committed to ensuring access to course content for students. Reasonable accommodations are available for students with documented disabilities. Contact the Disability Services Office at 651-696-6874 to schedule an appointment and discuss your individual circumstances. It is important to meet as early in the semester as possible; this will ensure that your accommodations can be implemented early on. The Director of Disability Services coordinates services for students seeking accommodations.
Diversity: At Macalester, we embrace diversity of age, background, beliefs, ethnicity, gender, gender identity, gender expression, national origin, religious affiliation, sexual orientation, and other visible and non-visible categories. I do not tolerate discrimination. We are all here because we deserve to be here.
Names/pronouns: You deserve to be addressed in the manner you prefer. To guarantee that I address you properly, you are welcome to tell me your pronoun(s) and/or preferred name at any time, either in person or via email.
Problem sets: You will complete ~6 problem sets, mostly in the first half of the course. These will reinforce the modeling concepts I cover and will include problems from the topics covered by students. You are encouraged to work in groups, but each person will turn in their own assignment. These will be graded by the preceptors and me.
Function Friday teaching: You will work in groups to present a topic of the group’s choosing (see my suggestions above). Presentations will occur in class and will be about 20 minutes.
Attending and participating in guest speaker sessions: You are expected to attend class when we have a speaker and participate in conversation with the speaker. This is your opportunity to learn about what people do in their work as data scientists! The majority of the speakers will be on Zoom, so I will try to give you some time to get to other places if you don’t want to be in the classroom during that time.
Tidy Tuesdays: You will do about 6 of these throughout the semester. You’ll spend 30-40 minutes in class working on them, discussing ideas with the people around you. You will post your work, including the final graph, to your website. I will provide more detail in a separate document on the Moodle page.
Project: This is a HUGE part of this course! You will start working on the project during the 3rd-4th week of the course and much of the 2nd half will be dedicated to that project. I will provide more details in a separate document.
Your grade will be determined mostly by you. I will provide you with written and oral feedback, and you will have opportunities to reflect on your learning and evaluate how you are doing. In the end, you will decide your letter grade. If I feel your choice is very different from the grade I would have assigned, I can change it, but we will have plenty of opportunities to talk about this.
Week | Start date | Topics | Notes |
---|---|---|---|
1 | 2020-08-30 | git/GitHub, creating a website | |
2 | 2020-09-06 | ML review with tidymodels | |
3 | 2020-09-13 | Model stacking | |
4 | 2020-09-20 | Boosting | |
5 | 2020-09-27 | SQL, Shiny | |
6 | 2020-10-04 | Interpretable ML | |
7 | 2020-10-11 | Interpretable ML | |
8 | 2020-10-18 | Catch-up | Fall Break |
9 | 2020-10-25 | H2O | |
10 | 2020-11-01 | Deep Learning | |
11 | 2020-11-08 | plumber & Docker | |
12 | 2020-11-15 | | |
13 | 2020-11-22 | | |
14 | 2020-11-29 | Thanksgiving Break | |
15 | 2020-12-06 | | |
16 | 2020-12-13 | Project presentations | |