Adding PySpark to Jupyter Notebook in Anaconda shell

I have downloaded Spark, Hadoop, etc., but I cannot link PySpark to Jupyter Notebook/Lab.

I have entered the following in Jupyter Notebook:

  1. pip install findspark (no error)

import findspark
findspark.init()

import pyspark
sc = pyspark.SparkContext()

f = sc.textFile('recent-grads.csv')
data = f.map(lambda line: line.split('\n'))
data.take(10)

I get an error: cannot find findspark, etc.

Please assist.


Hello.
Sounds like quite a problem.

First, let's read the findspark documentation on GitHub.
Since Jupyter's Python is IPython, what is described below needs to be executed:

Findspark can add a startup file to the current IPython profile so that the environment variables will be properly set and pyspark will be imported upon IPython startup. This file is created when edit_profile is set to true.

ipython --profile=myprofile
findspark.init('/path/to/spark_home', edit_profile=True)
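
For example, once the two lines above have been run a single time, a later IPython session started with that same profile should not need findspark at all. A rough sketch (myprofile, /path/to/spark_home, and the appName are placeholders; the small test at the end is just one way to confirm it worked):

# In a later session started with: ipython --profile=myprofile
# pyspark should already be importable without calling findspark first.
import pyspark

sc = pyspark.SparkContext(appName='findspark-test')
print(sc.version)  # quick sanity check that the SparkContext started
sc.stop()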

Findspark can also add to the .bashrc configuration file if it is present so that the environment variables will be properly set whenever a new shell is opened. This is enabled by setting the optional argument edit_rc to true.

findspark.init('/path/to/spark_home', edit_rc=True)
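
To check that the .bashrc route worked, open a new terminal (so the updated .bashrc is read) and start Python or Jupyter from it. A sketch of the kind of check you could run (assuming findspark wrote SPARK_HOME into your shell configuration):

# Run in a Python session started from a *new* shell after edit_rc=True:
import os
print(os.environ.get('SPARK_HOME'))  # should print your Spark directory
import pyspark                       # should import without calling findspark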

If changes are persisted, findspark will not need to be called again unless the spark installation is moved.
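
In other words, once the environment variables are persisted by either option above, the original notebook cell should shrink to something like this (recent-grads.csv is the file from the first post; splitting on ',' is only my assumption about the intended parsing, since splitting on '\n' leaves each line unchanged):

# With the environment persisted, findspark is no longer needed here:
import pyspark

sc = pyspark.SparkContext()
f = sc.textFile('recent-grads.csv')
data = f.map(lambda line: line.split(','))  # assumption: split each line into columns
print(data.take(10))
sc.stop()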

Sincerely yours,
ktsh.tanaka.2020

Hi

Thanks. I had IT help with the path, etc., and it is working now. I have tested it in Jupyter Notebook.

Have a nice day.

Kind Regards

Liana


from ktsh.tanaka.2020 to Isabella_Ahrens_Teix

Thank you for reading the advice.
Please have a nice day.

Regards,
ktsh.tanaka.2020