“pip install pyspark”: Getting started with Spark in Python


I have worked with spark and spark cluster setup multiple times before. I started out with hadoop map-reduce in java and then I moved to a much efficient spark framework. Python was my default choice for coding, so pyspark is my saviour to build distributed code.

But a pain point in spark or hadoop mapreduce is setting up the pyspark environment. Having java, then installing hadoop framework and then setting up clusters …. blah blah blah…

I was looking for a simple one step setup process in which i can simply just do one click/command setup and just get started with coding in my local system. Once the code is ready I can simply run the job in a pre-setup cluster. (say over cloud)

So this article is to help you get started with pyspark in your local environment.

Assuming you have conda or python setup in local.

For the purpose of ease, we will be creating a virtual environment using conda. Simply follow the below commands in terminal:

conda create -n pyspark_local python=3.7

Click on [y] for setups.

conda activate pyspark_local

To ensure things are working fine, just check which python/pip the environment is taking.

which python
which pip
pip install pyspark

And voila! Its done!

Now that you have a pyspark setup. Let us write a basic spark code to check things.

We will we reading a file in pyspark now. So, create a sample.txt with some dummy text to check things are running fine.

Simply run the command to start spark shell: (you can do the same in python notebook as well)

pyspark

Now let us run the below code.

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .master('local') \ 
    .appName('firstapp') \
    .config('spark.come.config.option','some-value') \
    .getOrCreate()

# make sure the path of the file is correct
text_f = spark.read.text('sample.txt')
print(text_f.first())

Hope this will get you excited for running spark code in python!

For more: https://spark.apache.org/docs/latest/quick-start.html

Quiz: Do you know what is the role of SparkSession here? Comment below.

Advertisements

Topic: Matrix Rotation


In this section, I will be discussing a simple yet not so intuitive approach to solving matrix rotation problem. So the problem statement is to rotate a NxN matrix clock wise 90 degrees in an efficient way. I can think of it as Rubic’s Cube one side rotation.

Now what all changes when a matrix is rotated 90 degrees:

  • row 1 becomes last column,
  • row 2 becomes last second column,
  • row last becomes first column

Now I approach this problem in the following way…

  • Take one outer boundary at a time. Just the boundary.
  • Top right is at 0,0 index
  • Top left is at 0, len(A)-1 index
  • Bottom right is at len(A)-1, len(A)-1 index
  • Bottom left is at len(A)-1, 0 index

Now simply swap these four cells clock wise. top-left with top-right, top-right with bottom-right, bottom-right with bottom-left, and finally bottom left with top-left. Now just move one step ahead in clockwise direction.

When you are done with outer boundary movement, take a step into one layer inner circle, and perform above swaps the same way!

Illustration to show movement of swaps.

Below is the python code for same.

class Matrix_Rotation:
    # @param A : list of list of integers
    # @return the same list modified
    def rotate90Clockwise(self, A):
        for steps in range((len(A)/2)):
            for move_ahead in range(len(A)-1-steps*2):
                temp = A[len(A)-1-move_ahead-steps][steps]
                A[len(A)-1-move_ahead-steps][steps] = A[len(A)-1-steps][len(A)-1-move_ahead-steps]
                A[len(A)-1-steps][len(A)-1-move_ahead-steps] = A[steps+move_ahead][len(A)-steps-1]
                A[steps+move_ahead][len(A)-steps-1] = A[steps][steps+move_ahead]
                A[steps][steps+move_ahead] = temp
        return A
                

Can you find this code’s complexity? Comment below.

Algorithms Basics: Topic – Recursion


Recursive functions are well known in the domain on algorithms. They are widely used due to their simplicity of code. Here we describe the topic of recursion.

Recursive functions:

Any function that calls itself is called a recursive function!

Yep! It’s just that simple. Let me explain with an example. Below is the code for fibonacci series. Note how fib() takes an input of (n) and returns the n-th term of the series. This function calls itself when n>2. This is a typical example of recursive functions.

def fib(n):
	if(n>0):
		if(n==1):
			return 0
		if(n==2):
			return 1
		else:
			return fib(n-1) + fib(n-2)

Fork the code yourself here!

Properties of recursive functions:

  • Recursive functions have two cases:
    • A base case: where the function terminates.
    • A recursive case: where the function calls itself.
  • Every recursive function must terminate at base case. If not defined, the function goes into infinite loop and hence result in stack overflow.
  • Recursive functions store the function information in stack memory.
  • Every algorithm modeled using recursive function can also be modeled using iterative functions by using stack data structures.
  • Iterative solutions are more efficient than recursive functions due to the overhead of extra function calls and stack memory usage.
  • Recursive functions are used since it is easy to visualize the algorithm using recursive approach.
  • For some problems there are no obvious iterative approach algorithms, and hence recursive functions are preferred.

Examples of recursive algorithms:

  1. Fibonacci Series. Try yourself here!
  2. Factorial of a number. Try yourself here!
  3. Merge Sort.
  4. Quick Sort.

Amazing Grace


I just came across this wonderful experience of watching Rear Admiral Grace M. Hopper  on David Letterman Show! And trust me, I am just talking to my self since then , Where on earth you were Ankita till now! Why you haven’t seen this video till now! I think everyone on this planet who believes in science must try and know more about Amazing Grace! Thank you Mam (Rear Admiral Grace M. Hopper) for making the first compiler! If you wouldn’t have done that, this post or my identity couldn’t have been possible!

To all readers:

  • Please read more about her: https://en.wikipedia.org/wiki/Grace_Hopper
  • Enjoy the below video

Requirement Analysis explained in a Screen Play!!


What is better than learning from a movie!! I know theory subjects can get boring sometimes… But We are here with a video to explain the concepts of Requirement Analysis Phase of Software engineering!!  This video was created as a part of our project in college. Back then we were kids with great enthusiasm, which we have always kept alive! This is an initiative to let the ideas explore! The video also features an in-depth presentation of how a software is created from scratch! Let us know your feed back!!

************************************************************************************************************************************************************

Innovate hub is in a constant endeavour to provide its readers the best it can. Although a niche, we believe in starting from the roots itself. Put your comments, messages below or hit Like! Subscribe to our YouTube Channel for interesting videos! Don’t forget to hit the “I am Innovator” Button to follow this blog! Thanks a Ton!

Solution to Route To Mumbai Problem! #PGEE #trickyquestion #twodoorsriddle


Lets have a look at the constraints of the problem…

1) One of the two roads go to Mumbai – the traveler’s destination.

2) One of the two brothers speak truth and other is a liar.

If the traveler asks the direct question to one of the brothers- “Which road goes to Mumbai?” to any one of the brothers, there could be one of the two cases:

  • The traveler asked this question to the brother who speaks truth. And he is pointing to the right road.
  • The traveler asked this question to the brother who speaks lie. And he is pointing to the wrong road.

In any of the case, the traveler is in 50% probability of getting the right road. Since there is no way traveler can tell with this question who is the truth telling brother. Clearly direct question is no way to our solution.

But if we think little logically and ask the traveler this question to one of the brothers – “What do you think if I asked your brother(means the other brother) on which road goes to Mumbai, what will be his reply?”

Now lets look at the possible answers to this question. Any one of the two cases are possible –

  • Traveler asked this question from  the Truth telling brother. Now the truth telling brother always speaks truth. And he know that his brother is a liar. So he must reply the road his brother will point to, on asking the route to Mumbai. Since his brother is a liar, He will point to the Road that leads to Delhi. Truth speaking brother will thus point to road to Delhi as his reply to the intelligent question asked.
  • Traveler asked this question from the liar brother. Now he knows that his brother is truth speaking person. And his brother will point to Route to Mumbai on the question – Which road goes to Mumbai. But here comes the Twist! This brother is a liar. He always speak false. So he will point to route to Delhi as opposed to his brother’s reply. 

Notice the situation carefully. In both cases the traveler is getting pointed towards the Route to Delhi. Now this traveler is smart. And he had already analyzed the responses before asking his intelligent question. So he will now know which road is wrong. Clearly the other one is his Route to Mumbai!

Happy travelling Sir!


Innovate-Hub is constantly in the endeavor to provide its readers the best they need. To help us grow and analyze our readers wants, help us by answering the below survey.

Link to the Survey