I have downloaded Spark, Hadoop, etc., but I cannot link PySpark to Jupyter Notebook/Lab.
In a Jupyter Notebook I have entered:
- pip install findspark (no error)
sc = pyspark.SparkContext()
f = sc.textFile('recent-grads.csv')
data = f.map(lambda line: line.split('\n'))
This raises an error: findspark cannot be found, etc.
Sounds like a setup problem. First, from the findspark README on GitHub:
Since Jupyter's Python is IPython, what is written below must be executed.
Findspark can add a startup file to the current IPython profile so that the environment variables will be properly set and pyspark will be imported upon IPython startup. This file is created when edit_profile is set to true.
Findspark can also add to the .bashrc configuration file if it is present so that the environment variables will be properly set whenever a new shell is opened. This is enabled by setting the optional argument edit_rc to true.
If changes are persisted, findspark will not need to be called again unless the spark installation is moved.
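Putting the README's advice together, a minimal notebook cell might look like the sketch below. This assumes Spark, findspark, and pyspark are all installed and that recent-grads.csv sits in the notebook's working directory; the helper name load_grads is my own, not part of any library.

```python
# Sketch only: assumes Spark, findspark, and pyspark are installed, and that
# recent-grads.csv is in the working directory. load_grads is illustrative.
def load_grads(csv_path="recent-grads.csv"):
    import findspark
    findspark.init()  # set the Spark environment variables for this session
    import pyspark    # import pyspark only after findspark.init()
    sc = pyspark.SparkContext(appName="recent-grads")
    lines = sc.textFile(csv_path)                    # one record per file line
    rows = lines.map(lambda line: line.split(","))   # split each line into fields
    return sc, rows

# Note: textFile() already yields one line per record, so the question's
# line.split('\n') returns one-element lists; split on ',' to get the fields:
fields = "1,PETROLEUM ENGINEERING,2339".split(",")
# fields == ['1', 'PETROLEUM ENGINEERING', '2339']
```

Calling findspark.init(edit_profile=True) or findspark.init(edit_rc=True) instead would persist the setup as described above, so the init call is not needed in every session.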
Thanks. I had IT help with the path, etc., and it is working now. I have tested it in Jupyter Notebook.
Have a nice day.
from ktsh.tanaka.2020 to Isabella_Ahrens_Teix
Thank you for reading the advice.
Please have a nice day.