I have downloaded Spark, Hadoop, etc., but I cannot link PySpark to Jupyter Notebook/Lab.
In Jupyter Notebook I have entered:
- pip install findspark (no error)
- the following code:
import findspark
findspark.init()
import pyspark
sc = pyspark.SparkContext()
f = sc.textFile('recent-grads.csv')
data = f.map(lambda line: line.split('\n'))
data.take(10)
This fails with an error: findspark cannot be found.
Please assist.
Hello.
This sounds like a setup problem.
First, let's read the findspark documentation on GitHub.
Since Jupyter's Python is IPython, the steps written below should be executed.
Findspark can add a startup file to the current IPython profile so that the environment variables will be properly set and pyspark will be imported upon IPython startup. This file is created when edit_profile is set to true.
ipython --profile=myprofile
findspark.init('/path/to/spark_home', edit_profile=True)
Findspark can also add to the .bashrc configuration file if it is present so that the environment variables will be properly set whenever a new shell is opened. This is enabled by setting the optional argument edit_rc to true.
findspark.init('/path/to/spark_home', edit_rc=True)
If changes are persisted, findspark will not need to be called again unless the spark installation is moved.
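As a concrete follow-up, here is a minimal sketch of what the first notebook cell could look like once findspark is installed in the same environment as the Jupyter kernel. The '/path/to/spark_home' path and the 'local[*]' master URL below are assumptions to adapt to your own setup; the CSV filename comes from the original post.

import findspark

# '/path/to/spark_home' is a placeholder; point it at your Spark installation,
# or call findspark.init() with no argument if SPARK_HOME is already set.
findspark.init('/path/to/spark_home')

import pyspark

# 'local[*]' runs Spark locally on all cores; adjust the master URL as needed.
sc = pyspark.SparkContext('local[*]')

# textFile already yields one record per line, so split each line on commas
# rather than on '\n' to get the CSV fields.
f = sc.textFile('recent-grads.csv')
data = f.map(lambda line: line.split(','))
print(data.take(10))

sc.stop()  # stop the context so the cell can be re-run without a conflict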
Sincerely yours,
ktsh.tanaka.2020
Hi
Thanks. I had IT help with the path, etc., and it is working now. I have tested it in Jupyter Notebook.
Have a nice day.
Kind Regards
Liana
from ktsh.tanaka.2020 to Isabella_Ahrens_Teix
Thank you for reading the advice.
Please have a nice day.
Regards,
ktsh.tanaka.2020