Apache Spark distributed application using PySpark in Google Colab

  Develop an Apache Spark application in PySpark on Google Colab, following the specifications below and using the Crunchbase Open Data Map (ODM) organizations dataset.


Use the Week 11 Class Exercise downloads as a reference:

  • Create a new notebook in Google Colab
  • Download the Crunchbase ODM Orgs CSV file and upload it to the “Files” section of your Colab notebook (the upload may take a few minutes)
  • Read the Crunchbase Orgs dataset into a Spark DataFrame
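
The setup steps above can be sketched as follows. This is a minimal sketch, not a required solution: the file path `/content/organizations.csv`, the helper name `read_orgs`, and the `CSV_OPTIONS` values are assumptions for illustration (files uploaded through the Colab “Files” panel normally land under `/content`). The `multiLine` and `escape` options are included on the assumption that some description fields contain quoted line breaks; drop them if your file parses cleanly without them.

```python
# Reader options for the uploaded CSV (assumed values; adjust to your file).
CSV_OPTIONS = {
    "header": True,       # first row holds the column names
    "inferSchema": True,  # let Spark guess column types
    "multiLine": True,    # allow quoted fields that span several lines
    "escape": '"',        # treat doubled quotes inside a field as literal quotes
}


def read_orgs(csv_path="/content/organizations.csv"):
    """Read the organizations CSV into a Spark DataFrame.

    csv_path is a hypothetical location; use the name of the file you uploaded.
    """
    # Imported inside the function so this cell can be defined before
    # PySpark is installed (in Colab: `!pip install pyspark` if it is missing).
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("CrunchbaseOrgs").getOrCreate()
    return spark.read.options(**CSV_OPTIONS).csv(csv_path)
```

In a notebook cell you would then call `df = read_orgs()` and check the result with `df.printSchema()` to confirm the column names before writing the queries.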

Implement PySpark code using DataFrames, RDDs or Spark UDFs:

  1. Find all entities whose name starts with the letter “F” (e.g. Facebook):
    • print the count and show() the resulting Spark DataFrame
  2. Find all entities located in New York City:
    • print the count and show() the resulting Spark DataFrame
  3. Add a “Blog” column to the DataFrame with the row entries set to 1 if the “domain” field contains “blogspot.com”, and 0 otherwise.
    • show() only the records with the “Blog” field marked as 1
  4. Find all entities whose names are palindromes (the name reads the same forward and backward, e.g. “madam”):
    • print the count and show() the resulting Spark DataFrame 
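
One possible way to approach the four tasks with the DataFrame API and a UDF is sketched below. Several things here are assumptions rather than part of the specification: the column names `name` and `city` (only `domain` is named in the task), the literal value `"New York"` for New York City, and the choice to compare palindromes case-insensitively after trimming whitespace. The function names `is_palindrome` and `run_tasks` are hypothetical. Check `df.printSchema()` and a few sample rows before relying on any of them.

```python
def is_palindrome(name):
    """Return True if name reads the same forward and backward.

    Assumption: the comparison ignores case and surrounding whitespace,
    and missing or empty names do not count as palindromes.
    """
    if name is None:
        return False
    text = name.strip().lower()
    return len(text) > 0 and text == text[::-1]


def run_tasks(df):
    """Run the four queries on the organizations DataFrame df."""
    # Imported here so is_palindrome can be tried out without Spark.
    from pyspark.sql import functions as F
    from pyspark.sql.types import BooleanType

    # 1. Names starting with "F" (case-sensitive match).
    f_names = df.filter(F.col("name").startswith("F"))
    print(f_names.count())
    f_names.show()

    # 2. Entities in New York City ("city" and "New York" are assumed).
    nyc = df.filter(F.col("city") == "New York")
    print(nyc.count())
    nyc.show()

    # 3. Blog = 1 when domain contains "blogspot.com", else 0.
    #    Rows with a null domain fall through to 0.
    with_blog = df.withColumn(
        "Blog",
        F.when(F.col("domain").contains("blogspot.com"), 1).otherwise(0),
    )
    with_blog.filter(F.col("Blog") == 1).show()

    # 4. Palindrome names, using the Python helper wrapped as a Spark UDF.
    palindrome_udf = F.udf(is_palindrome, BooleanType())
    palindromes = df.filter(palindrome_udf(F.col("name")))
    print(palindromes.count())
    palindromes.show()

    return f_names, nyc, with_blog, palindromes
```

Note that under this definition a single-character name counts as a palindrome; add a minimum-length check to `is_palindrome` if that is not wanted.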