Pyspark Create Dataframe From List (updated September 2023, Chivangcangda.com)
Introduction to PySpark Create DataFrame from List
Creating a DataFrame from a list is a way of turning the elements of a Python list into a PySpark DataFrame. The conversion loads the list's data into a DataFrame, which then benefits from the optimizations and operations of the PySpark data model. Iterating over and processing a large amount of data held in a list becomes much easier once it is converted, and many related data operations become available on the resulting DataFrame.
Syntax of PySpark Create DataFrame from List
Given below is the syntax:
    data1 = [["Arpit", "ENGG", "BANGALORE"],
             ["Anand", "PG", "DELHI"],
             ["Maz", "MEDICAL", "CHENNAI"]]
    columns1 = ["NAME", "PROFESSION", "LOCATION"]
    df = spark.createDataFrame(data1, columns1)
data1: The list of data from which the DataFrame is created.
columns1: The list of column names to use as the schema.
df: The DataFrame returned by spark.createDataFrame, which here takes two parameters: the data and the column schema.
Working of DataFrame from List in PySpark
Given below is how creating a DataFrame from a list works in PySpark:
A list is an ordered collection that stores data elements, with duplicate values allowed. The data is held in memory, and a user can iterate over it one element at a time or traverse it as needed for analysis. Iterating over a list, or merging it with another list, can be a costly operation, so spark.createDataFrame takes the list elements together with a schema as input and converts the list to a DataFrame, after which the user can apply all the DataFrame operations.
Once converted, the data sits in a much more optimized data model. The DataFrame can be treated like a table, so SQL operations can be run on it, and after analysis the result can be converted back to a list.
Examples of PySpark Create DataFrame from List
Given below are some examples of how creating a DataFrame from a list works in PySpark:
Example #1
Let's start by creating a simple list in PySpark.
List Creation:
Code:
    data1 = [["Arpit", "ENGG", "BANGALORE"],
             ["Anand", "PG", "DELHI"],
             ["Maz", "MEDICAL", "CHENNAI"]]
Let's define the schema that will be used to create the DataFrame.
    columns1 = ["NAME", "PROFESSION", "LOCATION"]
spark.createDataFrame takes two parameters, the data and the schema, and returns a DataFrame built from them.
    df = spark.createDataFrame(data1, columns1)
    df.printSchema()
    root
     |-- NAME: string (nullable = true)
     |-- PROFESSION: string (nullable = true)
     |-- LOCATION: string (nullable = true)
Let's check the data with .show(), which prints the converted DataFrame.
    df.show()
Example #2
Creating a DataFrame in PySpark from list elements with an explicit schema.
StructType can be used here to define the schema, which is then passed to spark.createDataFrame to create the DataFrame.
Let's import the classes to be used.
Code:
    import pyspark
    from pyspark.sql import SparkSession, Row
    from pyspark.sql.types import StructType, StructField, StringType
    # The pyspark shell already provides `spark`; in a script, create it first.
    spark = SparkSession.builder.getOrCreate()
    c1 = StructType([StructField('Name', StringType(), True),
                     StructField('Profession', StringType(), True),
                     StructField('Location', StringType(), True)])
    df = spark.createDataFrame(data1, c1)
    df.show()
Example #3
Using the Row type in the list. Insert the list elements as Row objects and pass the list to spark.createDataFrame along with the schema.
Code:
    e = [Row("Max", "Doctor", "USA"), Row("Mike", "Entrepreneur", "UX")]
    df = spark.createDataFrame(e, c1)
    df.show()
These are the methods by which a list can be converted to a DataFrame in PySpark.