This time last week, the much-anticipated day finally arrived, and Hadley Wickham came to visit my school. My sinuses also decided to do a number on me, which I didn't anticipate or appreciate 💀. That didn't keep me from going to the planned events, though 😬.
Before the talk, there was a Q&A session with the PhD students. My main takeaways from this session were that there's still hope for me (and anyone else) wanting to transition to a data science career without having to reinvent the wheel, and that it's not surprising I feel a bit inadequate despite all the schoolin' I've had. I have taken a statistics class, but that's nothing compared to what a statistics student takes. And as for math… well, the last time I took a math course was a looooong time ago, back in undergrad, and to be honest I just took two of the easiest math classes to check the boxes for my course requirements. Programming? Mostly self-taught. The only programming course I ever took was in high school, and that was C++. High school was a looooooooong time ago, though. I was seriously considering taking some classes at the university near my home after graduation, but after talking to Hadley (and also a friend who works for Facebook), I'll just take advantage of the online courses and MOOCs related to data science (Coursera, Udacity, DataCamp, etc.).
A couple of hours after the meeting, I went to his talk, which was on The Reproducible Life. He described four characteristics of this life, which are…
He also talked about some of the packages that address these points.
To be honest, I could do better at living The Reproducible Life.
Hadley's point that code should be repeatable really hit home for me. I rely on my good friends, Copy and Paste, way too much, and I seriously need to start writing functions instead. I remember from the DataCamp course on functions that if I'm copying and pasting more than three or four times, I should write a function. For me, it's an issue of doing what's convenient at the moment, since I don't have much time for the task as it is. I feel like I'll spend way too much time writing a function, and by the time I figure it out, I could have finished the task with Copy and Paste. I'm going to have to make breaking this habit a Data Science New Year's Resolution, and actually stick to it.
After that, I need to work on finding ways to make my code more transportable. Hadley mentioned the reprex package, which addresses this issue. reprex lets you create a reproducible example of problematic code so others can run it on their own machines. This is really good to know, given that I'm teaching my R Ready to Map course to the Programming for GIS class next week. Last year, a lot of students e-mailed me their code, and it just wasn't helpful at all. I'm going to add a section on reprex so students can use it to package their code when they run into an issue.
I feel like I do a good job of making code shareable. My GitHub game has been getting better. I used to just e-mail code to friends, but now I send them to my GitHub. I'm now interested in turning the course materials for my R Ready to Map course into a book using bookdown. When I first taught this course, I made a big PDF and sent it to the students. The second time, I was a bit more comfortable with GitHub and put all of my documents there. The third time might be the charm with bookdown. I also like how others can easily make changes to the book, which makes it a living reference. I'm going to take a look at this package this weekend and see how it works for me.
I comment like no one's business, and for many reasons. For one, if I don't touch my code for a year and then come back to it, my comments tell me what's going on. Hadley mentioned that you should write code that the future you can understand. I think this is a practice students should learn, too, and a good way to teach it is to give them a rubric that spells out what's expected when they turn in an assignment. When I was the TA for the Programming for GIS class, I made students include a code dictionary along with comments in their code. If they didn't include a code dictionary or comments, points were docked.
Hadley's talk also made me consider creating a package. I've always felt a bit intimidated by that, but if I have some spare time next semester, why not take a stab at it? 😅 Next year. Next year. I have a couple of ideas for what I want it to do, too. One is fun; the other is practical. I'll just add that to my Data Science New Year's Resolutions list.
All in all (minus the sinuses), it was a good day. Since I won't be able to go to rstudio::conf() next year, I'm glad I had this opportunity! It's good to know that my data science aspirations are not a %>% dream (total credit goes to my friend for this one), and that with a bit of hard work and dedication, I, too, can move into a data science career.