I have started working on my habit of learning something new every day, and I want to focus on building my profile as a Data Scientist in the domain of Finance. I am not limiting myself to Finance, but most of my focus will be on it. There are three main reasons for it:
- I love the field.
  - It is really fascinating to learn about financial terms, trading and trading strategies.
- I currently work in this field.
  - A major part of my day revolves around finance.
  - That is because of my office work.
- I have a curiosity to learn everything!
Now coming back to the main topic of this post. I have started coding each day, and bit by bit I want to sharpen my coding skills for AI. Hence, I have taken up the challenge of 100 Days of Code.
If you haven’t heard of this challenge, then let me brief you on it first. It is pretty simple actually.
Code for the next 100 days.
Told ya. Pretty simple!
To make sure that I am actually doing it and not procrastinating midway, I am logging my progress on my GitHub.
Here is the link to the log on my GitHub.
Part of this challenge is to grow with the community, and hence I would like to share what I have done in the past few days. This way I would like to invite you, innovators, to join me in this challenge. I will explain how to join the challenge at the end. But first, let me tell you what I have done so far.
Day 0 and Day 1: 16 and 17 August 2018.
Image Classification using OpenCV and TensorFlow
I built a pipeline to classify images using the OpenCV and TensorFlow libraries. The task was to pick images and classify them into 6 classes, and finally to save the trained model.
Once the model is saved, it can be used to make predictions on test images. This was my second task. Since this was my first attempt at a self-project using OpenCV and TensorFlow, I faced many challenges while coding. But nonetheless, I have completed this task.
Oh, what an amazing experience. I will come back and work on deploying this project, and will put up a formal post on this blog’s Data Science page.
Refer to the GitHub repo for more.
Day 2: 18 August 2018.
AI For Trading: Project 1 – Trading with Momentum
I have joined the course AI for Trading. This course is my attempt to learn more about financial data and how to work with it using AI. The first project in this course was Trading with Momentum. I have finished this one! Yay!
Refer to the GitHub repo for more.
That’s all for now, since it has been just 3 days since I took up this challenge seriously. I hope I continue working on it.
How to join this challenge?
Here are the details of this challenge >>>
I hope you will join me. I will keep updating this post as I progress. Wish me luck!!
Recursive functions are well known in the domain of algorithms. They are widely used because they keep code simple. Here we describe the topic of recursion.
Any function that calls itself is called a recursive function!
Yep! It’s just that simple. Let me explain with an example. Below is the code for the Fibonacci series. Note how fib() takes an input n and returns the n-th term of the series. The function calls itself when n > 2. This is a typical example of a recursive function.
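    def fib(n):
        # Return the n-th term of the series (1-indexed: fib(1) == 0, fib(2) == 1).
        if n <= 0:
            raise ValueError("n must be a positive integer")
        if n == 1:  # base case
            return 0
        if n == 2:  # base case
            return 1
        # Recursive case: reached only when n > 2.
        return fib(n - 1) + fib(n - 2)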
Fork the code yourself here!
Properties of recursive functions:
- Recursive functions have two cases:
- A base case: where the function terminates.
- A recursive case: where the function calls itself.
- Every recursive function must terminate at a base case. If no base case is defined (or it is never reached), the function keeps calling itself and eventually causes a stack overflow.
- Each recursive call stores its function information (arguments, local variables, return address) in stack memory.
- Every algorithm modeled using a recursive function can also be modeled iteratively, using a stack data structure where needed.
- Iterative solutions are usually more efficient than recursive ones, because they avoid the overhead of extra function calls and stack memory usage.
- Recursive functions are used because it is often easier to visualize the algorithm using a recursive approach.
- For some problems there is no obvious iterative algorithm, and hence recursive functions are preferred.
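To make the recursive versus iterative points above concrete, here is a minimal sketch of the same Fibonacci computation written three ways. The names fib_recursive, fib_with_stack and fib_loop are made up for this illustration, and the 1-indexed convention (first term 0, second term 1) is assumed from the example earlier in this post.

```python
def fib_recursive(n):
    """n-th Fibonacci term, 1-indexed: term 1 is 0, term 2 is 1."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    if n == 1:  # base case
        return 0
    if n == 2:  # base case
        return 1
    return fib_recursive(n - 1) + fib_recursive(n - 2)  # recursive case


def fib_with_stack(n):
    """Same computation, with an explicit stack standing in for the call stack."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    total = 0
    stack = [n]  # pending sub-problems
    while stack:
        k = stack.pop()
        if k == 1:
            total += 0  # base case contributes 0
        elif k == 2:
            total += 1  # base case contributes 1
        else:
            # Instead of two recursive calls, push two sub-problems.
            stack.append(k - 1)
            stack.append(k - 2)
    return total


def fib_loop(n):
    """Plain loop: no stack at all, one pass over the series."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    a, b = 0, 1  # terms 1 and 2
    for _ in range(n - 1):
        a, b = b, a + b
    return a


for n in range(1, 11):
    print(n, fib_recursive(n), fib_with_stack(n), fib_loop(n))
```

The explicit-stack version does the same amount of work as the recursive one; it only moves the bookkeeping from the call stack to a list. The loop version is the one that actually saves work, since it never recomputes a term.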
Examples of recursive algorithms:
Identifying the Problem Type:
It is important in machine learning to understand the problem type first. If the output is continuous, e.g. [1, 23, 4, 5, 6, 5.5, 6.7, ...], use Linear Regression. If the output is categorical, e.g. [0, 1, 0, 0, 1, ...] or [‘High’, ‘low’, ‘Medium’, ...], go for Logistic Regression. Since your target labels are either 0 or 1, this is a problem to be worked with Logistic Regression or other classification algorithms (SVM, Decision Tree, Random Forest).
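As a rough illustration of this rule of thumb, the sketch below guesses the problem type from the target values alone. The function guess_problem_type and its max_classes threshold are hypothetical helpers invented for this example; a real decision should come from knowing what the target means, not from a heuristic like this.

```python
def guess_problem_type(targets, max_classes=10):
    """Heuristic only: return 'classification' or 'regression'.

    max_classes is an arbitrary cut-off assumed for this sketch.
    """
    unique = set(targets)
    # String labels such as 'High' / 'low' can only be categories.
    if any(isinstance(t, str) for t in unique):
        return "classification"
    # A handful of whole-number values usually means class labels.
    all_whole = all(float(t).is_integer() for t in unique)
    if all_whole and len(unique) <= max_classes:
        return "classification"
    return "regression"


print(guess_problem_type([0, 1, 0, 0, 1]))
print(guess_problem_type(["High", "low", "Medium"]))
print(guess_problem_type([1, 23, 4, 5, 6, 5.5, 6.7]))
```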
You must convert your data to a numeric or standardized format before running a regression. See https://realpython.com/python-data-cleaning-numpy-pandas/
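A minimal sketch of both conversions, using only the Python standard library. The helper names encode_labels and standardize are made up for this example; in practice a library such as pandas or scikit-learn would usually do this for you.

```python
from statistics import mean, pstdev


def encode_labels(labels):
    """Map each distinct label to an integer code, in order of first appearance."""
    codes = {}
    for label in labels:
        codes.setdefault(label, len(codes))
    return [codes[label] for label in labels], codes


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


encoded, mapping = encode_labels(["High", "low", "Medium", "low"])
print(encoded, mapping)
print(standardize([10.0, 20.0, 30.0]))
```

Note that the integer codes are arbitrary. That is fine for target labels in classification, but if the labels are used as an input feature, these codes would suggest an ordering that may not exist.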
In case you are looking for a starter code for your problem, you can find that from Kaggle kernels. Here are a few links:
Why do machine learning models need regularization?
- Overfitting is a state where the model is trying too hard to capture the noise in your training dataset. The model fits each point and feature of the training set so closely that it fails to understand anything beyond the training set. This leads to low accuracy on the test set.
- Overfitting the training set means being specific to the training data. Hence, to have good accuracy on the test set (unseen by the model), the model must generalize.
- Overfitting is associated with high variance: the model is flexible enough to follow the noise in the training data, so its predictions change a lot from one training sample to another.
Now let us understand cross-validation and regularization:
- Cross-Validation: One way of avoiding overfitting is cross-validation, which helps in estimating the error on the test set and in deciding which parameters work best for your model. Cross-validation is done by building models on sub-samples of the training data and then evaluating them on the held-out sub-samples. Averaging over several such splits reduces the effect of the randomness in any single split. This is different from regularization, but it is important for choosing the regularization parameter, which I will explain below.
- Regularization: This is a technique in which an additional term is introduced into the loss function of the learned model to reduce overfitting.
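To show what "building models on sub-samples" means in practice, here is a minimal sketch of how k-fold cross-validation splits the sample indices. The function k_fold_indices is a made-up helper for this example; it does not shuffle the data, which a real setup usually would.

```python
def k_fold_indices(n_samples, k):
    """Return a list of (train_indices, validation_indices) pairs, one per fold."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Fold i holds every k-th index, starting at offset i.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, validation in enumerate(folds):
        # Train on every fold except the one held out for validation.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, validation))
    return splits


for train, validation in k_fold_indices(6, 3):
    print("train:", train, "validation:", validation)
```

Each sample lands in the validation set exactly once, so every data point is used both for training and for evaluation, just never in the same fold.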
Let me explain:
Consider a simple linear regression relation. Here Y represents the learned relation and β represents the coefficient estimates for the different variables or predictors (X).
Y ≈ β0 + β1*X1 + β2*X2 + …+ βp*Xp
A machine learning model tries to fit X to Y in order to estimate the β coefficients. The fitting procedure involves a loss function, known as the residual sum of squares or RSS.
This is the sum of the squared differences between the actual values (y_i) and the predicted values (y_predicted_i).
The coefficients are chosen such that they minimize this loss function.
A zero (or minimum) loss indicates a tight fit of the model to the training data. In layman's terms, the actual and predicted values on the training set are the same.
Hence this RSS function helps in finding the optimal coefficients of the equation.
(The equation below is the loss before regularization.)
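The equation itself is missing here, so the following is a reconstruction from the definitions above rather than the original figure. It assumes n training samples and writes x_ij for the value of predictor X_j in sample i; both are notation introduced for this sketch.

```latex
\mathrm{RSS} = \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \right)^2
```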
Now, this will adjust the coefficients based on your training data. If there is noise in the training data, then the estimated coefficients won’t generalize well to the future data. This is where regularization comes in and shrinks or regularizes these learned estimates towards zero. (source)
For regularization (using ridge regression), the loss function is modified as follows:
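The modified loss is also missing here, so this is the standard ridge regression form written in the same notation as the rest of this answer (n samples and x_ij for predictor X_j in sample i are assumed notation). Note that in this standard form the intercept β0 is not penalized.

```latex
\sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \right)^2 + \lambda \sum_{j=1}^{p} \beta_j^2
```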
Note that a parameter lambda (λ) multiplies the sum of the squared coefficients. λ is the tuning parameter that decides how much we want to penalize the flexibility of our model. An increase in the flexibility of a model is represented by an increase in its coefficients, and if we want to minimize the above function, these coefficients need to be small. This is how the ridge regression technique prevents coefficients from rising too high.
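The shrinking effect is easiest to see in the simplest possible case: one predictor and no intercept. Under that assumption (made only to keep this sketch short), minimizing sum((y - b*x)^2) + λ*b^2 has the closed-form solution b = sum(x*y) / (sum(x*x) + λ). The function name ridge_slope and the sample data are made up for this illustration.

```python
def ridge_slope(xs, ys, lam):
    """Slope b minimising sum((y - b*x)**2) + lam * b**2 (single predictor, no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


# Made-up data lying roughly on y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

for lam in (0.0, 1.0, 10.0, 100.0):
    print("lambda =", lam, "slope =", round(ridge_slope(xs, ys, lam), 4))
```

With λ = 0 this is ordinary least squares. As λ grows, the denominator grows and the slope is pulled towards zero.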
Selecting a good value of λ is critical, and cross-validation comes in handy for this purpose. The value of λ depends on the data, and there is no universal rule for what it should be. To find a good value, several models are trained with different candidate values of λ using cross-validation, and the value with the best average validation performance is chosen.
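Here is a minimal sketch of that selection procedure. One common approach, assumed here, is to pick the candidate λ with the lowest average validation error. To stay short it uses a single-predictor, no-intercept ridge model; the names ridge_slope, cv_error and pick_lambda, the candidate values and the data are all made up for this illustration.

```python
def ridge_slope(xs, ys, lam):
    """Slope b minimising sum((y - b*x)**2) + lam * b**2 (single predictor, no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def cv_error(xs, ys, lam, k=3):
    """Mean squared validation error of the ridge fit over k folds."""
    n = len(xs)
    squared_error = 0.0
    for fold in range(k):
        # Every k-th sample, starting at offset `fold`, is held out.
        held_out = set(range(fold, n, k))
        train_x = [xs[i] for i in range(n) if i not in held_out]
        train_y = [ys[i] for i in range(n) if i not in held_out]
        slope = ridge_slope(train_x, train_y, lam)
        squared_error += sum((ys[i] - slope * xs[i]) ** 2 for i in held_out)
    return squared_error / n


def pick_lambda(xs, ys, candidates, k=3):
    """Return the candidate lambda with the lowest cross-validation error."""
    return min(candidates, key=lambda lam: cv_error(xs, ys, lam, k))


# Made-up data lying roughly on y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.2, 3.9, 6.1, 8.3, 9.8, 12.1]
print(pick_lambda(xs, ys, [0.0, 0.1, 1.0, 10.0]))
```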
Below is an image showing sample data points and two learned functions. The green and blue functions both incur zero loss on the given data points. A learned model can be induced to prefer the green function, which may generalize better to more points drawn from the underlying unknown distribution, by adjusting λ, the weight of the regularization term.
I referred to the links below for this answer:
Qualcomm came to our campus for internship recruitment. They were looking for hardware engineers and software developers. The internship was open to both CSE and ECE graduates. The process had 3 rounds:
- Written Round
- Technical Interview
- HR interview
The written round was MCQ-based. It had 3 sections: General Aptitude, CS Technical, and Electronics Technical.
This was not too tough. It also comprised 4 programming (subjective) questions. I forget the details, but I think one was about writing recursion-based code. The CS Technical section had mostly C/C++ based questions: pointers, output-based questions, etc. The ECE Technical section had Computer Organization, Computer Networks, one flip-flop question, etc.
In the technical interview, the interviewer was kind. It lasted around 45 minutes. He asked questions on my projects and programming languages, the difference between an OS and a kernel, and how to design software for an airplane. He also asked about my mobile phone model and its processor (it has a Qualcomm Snapdragon 650), and then whether my phone supports virtual memory.
The HR interview was simply to check your communication skills and your attitude towards Qualcomm. Pretty simple.