Data Science: Where Do I Start From?
Data Science has created a buzz, and many people want to join the train. However, as François Chollet, the creator of the Keras library for Deep Learning, rightly pointed out, you should not believe in the short-term hype but in the long-term effect of Data Science. Because of the short-term buzz, there is a plethora of resources online, and many people are not sure where to start. You should read What It is like to learn Data Science in 2019. It is a beautiful piece. I won’t attempt to give you a road map to learning Data Science in 2019 (you can check out Analytics Vidhya’s beautiful guide). I also won’t attempt to define the scope of Data Science. Rather, I will lay out a practical guide to how you might approach learning data science.
First Things First
If you have no prior experience with data analysis, you should definitely start with Microsoft Excel. The course to start with is Introduction to Data Analysis using Excel on EdX. You can then move on to Analyzing and Visualizing Data using Microsoft Excel. Both courses are offered by Microsoft and can be audited for free if you are on a tight budget. The instructors and labs are great, and both courses were a great learning experience for me. Alternatively, if you want the help of an in-person tutor and you are on a tight budget, you can Request for a Personal Tutor from SkillNG. I recommend these courses because they help you make a critical decision as you move on: am I satisfied with being a traditional Data Analyst, or do I really want to be a Data Scientist? Spoiler alert: using a search engine to look up the difference is likely to confuse you further.
Okay, I guess I will have to give some definitions. Across all the Data Science learning platforms I have come across (EdX, Coursera, DataCamp, Udemy, Udacity, and Analytics Vidhya, among others), the common denominator of the Data Science specializations and courses is that you need a bit of Mathematics (Linear Algebra and Basic Calculus), Statistics (Basic Descriptive and Inferential Statistics), Programming (Python, R, and SQL), Research, and Business/Domain Knowledge. The key point is that Data Science is more than one or more of these skills working in tandem; it blends all of them with communication and other ‘human’ skills. While these courses differ on the depth required for each skill, they mostly agree that a basic blend of these skills is needed to be a data scientist.
Early Data Science Steps
If you are sure that you want to be a data scientist (there is absolutely nothing wrong with sticking to pure data analysis or even jumping ahead to become a Machine Learning Engineer), your next step should be to spend time learning Mathematics, Statistics, and Programming. The Udacity Bertelsmann Data Science Challenge Course offered this to me: it taught the basics of Statistics, Python Programming, and SQL required for Data Science. While the challenge course ended a while ago, you can use these resources instead: Descriptive Statistics, Inferential Statistics, and Programming Using Python (note that this course is very Computer Science oriented). For Mathematics, you can brush up on Matrices, Vectors, and Differential Calculus on Udemy, YouTube, or Khan Academy. There are excellent resources on each, and any one of them will suffice. While I really enjoyed the Introduction to Machine Learning Course offered by Andrew Ng on Coursera, I wouldn’t recommend taking it, as it is very intensive and the assignments are done in MATLAB/Octave, languages that many people no longer use for Data Science. For more practice with SQL, R, or Python for Data Science, DataCamp is the place to go. If Python is your language of choice (it really should be), then UC San Diego’s Python for Data Analysis on EdX is a great resource. Finally, you should try Principles of Machine Learning using Python and Data Science Research Methods, both offered by Microsoft on EdX, to get a feel for Machine Learning and the research world. Note that these are my recommendations based on my own learning experience, and you can always check out other excellent resources. I cannot name every resource I have used or come across here; I am only sharing those that resonated strongly with me.
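To give a feel for how the Statistics and Python basics mentioned above meet in practice, here is a minimal sketch that computes a few descriptive statistics using only Python's built-in statistics module. The exam scores are made-up numbers for illustration and do not come from any of the courses listed.

```python
import statistics

# Hypothetical sample: exam scores invented purely for illustration.
exam_scores = [52, 67, 67, 71, 74, 80, 85, 91]

# Measures of central tendency.
mean_score = statistics.mean(exam_scores)
median_score = statistics.median(exam_scores)
mode_score = statistics.mode(exam_scores)

# Measure of spread: the sample standard deviation (divides by n - 1).
stdev_score = statistics.stdev(exam_scores)

print(f"mean={mean_score}, median={median_score}, mode={mode_score}")
print(f"sample standard deviation={stdev_score:.2f}")
```

Once this feels comfortable, the same summaries are usually computed with libraries such as pandas, but starting with the standard library keeps the focus on what each statistic means.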
Deep Learning Foretaste and Practice
After soaking up these learning resources, you should pivot a little into Deep Learning. While Deep Learning is not an essential part of Data Science, judging from my interactions with Data Science resources, I feel it is a valuable addition to your toolbox as a Data Scientist. Your deep learning studies should cover Convolutional Neural Networks, Recurrent Neural Networks, and Recommendation Systems. Either before or after pivoting into deep learning, you should pick up projects or real-life datasets to work on. You can also join a live competition. Kaggle is a great site to start with. However, Kaggle can easily get confusing if you don’t know exactly what you are searching for. I recommend the Zindi Africa Loan Prediction and Busara Health datasets if you want to test your data cleaning skills and get a feel for perhaps the most challenging type of structured data to work with: imbalanced classes. You can also check out Analytics Vidhya’s learning challenges. While it is important to get very good results in these challenges, projects, and competitions, remember that you want to learn as much as you can and apply everything you have learned so far as a data scientist. At this point, you should see that Stack Overflow, or Google generally, will be your best friend. I also recommend that you pick a real-life project to work on. I applied the Data Science workflow to solve an engineering problem as part of my final year project (if you don’t already know, I am an engineering student). This brought to life the entire skill set that I said a data scientist should have. It was hugely challenging but fun all the way. Lastly, open a GitHub account where you can put up the projects you have worked on, and start following some data science blogs.
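To see why imbalanced classes are so challenging, consider this minimal sketch. The labels are hypothetical (95 repaid loans and 5 defaults, not taken from the Zindi or Busara datasets), and the "model" is simply a baseline that always predicts the majority class. The helper functions accuracy and recall are written here for illustration only.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that were predicted as positive."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not preds_for_positives:
        return 0.0
    return sum(p == positive for p in preds_for_positives) / len(preds_for_positives)


# Hypothetical labels: 0 = loan repaid (95 cases), 1 = default (5 cases).
y_true = [0] * 95 + [1] * 5

# Baseline "model": always predict the most common class.
majority_class = Counter(y_true).most_common(1)[0][0]
y_pred = [majority_class] * len(y_true)

baseline_accuracy = accuracy(y_true, y_pred)
baseline_recall = recall(y_true, y_pred, positive=1)

print(f"accuracy={baseline_accuracy:.2f}, recall on defaults={baseline_recall:.2f}")
```

The baseline looks impressive on accuracy while never catching a single default, which is why metrics such as recall, precision, or F1 are usually checked alongside accuracy on this kind of data.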
I realize that this is quite a long read, but I hope I have been able to answer the ‘Where do I start from’ question. If you haven’t noticed already, I lean heavily towards audio-visual learning, but I have come across and used some excellent books too. You can check them out if you prefer learning from books (they are all written for Python users). However, whichever way you lean, I recommend you use both books and videos to aid your learning.
- Introduction to Machine Learning with Python by Andreas C. Müller and Sarah Guido
- Statistics: The Art and Science of Learning from Data by Alan Agresti and Christine Franklin
- Hands-On Machine Learning with Scikit-Learn and TensorFlow by Aurélien Géron
- Automate the Boring Stuff with Python by Al Sweigart
- Deep Learning with Python by François Chollet
- Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die by Eric Siegel
- The Data Science Handbook by Field Cady
- Python Data Science Handbook by Jake VanderPlas
- Big Data: A Revolution That Will Transform How We Live, Work, and Think by Viktor Mayer-Schönberger and Kenneth Cukier
P.S.: Try to join a Data Science community early on. Attend meet-ups and boot camps. AI Saturdays is an excellent platform if there is one near you. For people living in Nigeria, Data Science Nigeria is perhaps the best platform, where you meet people who not only encourage you as you learn but also challenge you to be better. You can also find a group of data science enthusiasts to act as a sort of support group for you. I did, and we call ourselves the Pilgrims. Lastly, no matter how dedicated you are, learning Data Science will take some time. Push yourself as much as you should, but don’t be too hard on yourself. Au revoir.