flatten nested json in spark | Lec-20 | most requested video

In this video I explain how to flatten nested JSON in Spark.
Download data from here:- github.com/manisnitt/resturan..., www.kaggle.com/datasets/shrut...

Comments: 44

  • @Ajay_C_JadhavIII
    2 days ago

    Brother, how has this man not gone viral yet? He deserves it.

  • @da_nalyst
    11 months ago

    Thank you Manish Bhai, very helpful

  • @nikharjain5876
    4 months ago

    Useful content, many thanks Manish bhai :)

  • @da_nalyst
    11 months ago

    Thank you Manish Bhai for this gem

  • @nayanjyotibhagawati939
    11 months ago

    Very helpful video. I was asked an interview question: how do you validate the schema and null values? Please explain it using a real-time scenario as an example.

  • @fashionate6527
    1 month ago

    Thanks for the great quality content

  • @mantukumar-qn9pv
    11 months ago

    Thank you Guru!

  • @Ronak-Data-Engineer
    11 months ago

    Very helpful

  • @adarsharora6097
    5 months ago

    Thanks Manish!

  • @shayankabasi160
    9 months ago

    Good work. Please upload something on streaming.

  • @HanuamnthReddy
    6 months ago

    Thank you, Guruji.

  • @niladridey9666
    11 months ago

    Thanks for the quality content. Very helpful for freshers.

  • @roshan_off1955
    11 months ago

    Manish, please make a video on this as well: which projects to put on a CV, and what should be on the resume when switching into data from a different technology.

  • @nayanjyotibhagawati939
    11 months ago

    Please add a video on how to handle null values and how to validate a schema.

  • @shreyaspatil4861
    6 months ago

    Thanks very much for the tutorial :) I have a query about reading JSON files. I have an array of structs where each struct has a different structure/schema. Based on a certain property value of the struct, I apply a filter to get that nested struct. However, when I display it using printSchema, it contains fields that do not belong to that object but are somehow associated with it from the schemas of the other structs. How can I fix this?

  • @____prajwal____
    6 months ago

    Thanks. How can we extend this when we have a stringified JSON and need the JSON fields inside it?

  • @DeepakSingh-nc2wf
    11 months ago

    Bhai, there are much simpler functions, like json_tuple, to extract columns from nested JSON instead of exploding columns.

  • @manish_kumar_1
    11 months ago

    Oh, I didn't know. Let me check that.

  • @Tanveer_Shaikh_330
    11 months ago

    Do practical or theory questions on RDDs come up in interviews?

  • @RahulRathore-wj9uy
    3 months ago

    Can we define our own schema using this JSON?

  • @user-dl3ck6ym4r
    4 months ago

    Are nested data and complex data the same thing?

  • @gauravsingh-gn4zz
    8 months ago

    Hello Manish, just one doubt: what if we have 100 columns of struct type and 100 columns of array type? Should we write explode and .column 200 times, or is there another way? Please help me find out. Thanks

  • @user-sl4ry1rg1u
    4 months ago

    My data has no nesting levels, but one column holds a list inside a list. The schema shows that column as a string and shows no nesting. Any idea?

  • @poojajoshi871
    11 months ago

    We are using .select to explode; can we use withColumn as well?

  • @manish_kumar_1
    11 months ago

    Yes, it will work with withColumn too.

  • @wayzonic
    11 months ago

    How many lectures remain to complete this series? Please start an SQL playlist as well.

  • @manish_kumar_1
    11 months ago

    After 8-10 more videos, a new playlist will come.

  • @user-np1ww4ue4d
    11 months ago

    I am learning Python, but when I go to GeeksforGeeks to solve easy questions, such as the runner-up question, I am not able to solve them. Can you guide me on this?

  • @manish_kumar_1
    11 months ago

    It feels tough at the start, but slowly you will begin to recognize the patterns for solving questions.

  • @VIVEKSINGH-us6he
    11 months ago

    How do I make a generic JSON parser (flattener) function? Do you have that code, and could you please share it? Here you have hard-coded the columns; is there a generic function?

  • @manish_kumar_1
    11 months ago

    I have not written one yet. I will try to write a generic function that flattens the entire JSON into a DataFrame.

  • @ETLMasters
    11 months ago

    I think you are talking about this:

    from pyspark.sql.functions import col
    from pyspark.sql.types import ArrayType, StructType


    def flattenDataFrame(explodeDF):
        # Flatten one array or struct column per call, then recurse until none remain.
        fields = explodeDF.schema.fields
        fieldNames = explodeDF.schema.fieldNames()
        for field in fields:
            fieldName = field.name
            fieldDataType = field.dataType
            otherNames = [name for name in fieldNames if name != fieldName]
            if isinstance(fieldDataType, ArrayType):
                # Array: one row per element, plus a <name>_pos column holding its index.
                explodeExpr = "posexplode_outer({0}) as ({1}, {2})".format(
                    fieldName, fieldName + "_pos", fieldName)
                return flattenDataFrame(explodeDF.selectExpr(*otherNames, explodeExpr))
            elif isinstance(fieldDataType, StructType):
                # Struct: promote each child to a top-level column named parent_child.
                childNames = [fieldName + "." + child for child in fieldDataType.names]
                newFieldNames = otherNames + childNames
                aliased = [col(name).alias(name.replace(".", "_")) for name in newFieldNames]
                return flattenDataFrame(explodeDF.select(*aliased))
        return explodeDF
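To make the recursion in the function above easier to follow, here is a minimal plain-Python sketch of the same idea. It is not Spark code and not the commenter's implementation: dicts stand in for struct columns, lists stand in for array columns, and the name `flatten_record` and the sample record are hypothetical. It does not handle a list nested directly inside another list.

```python
from typing import Any, Dict, List


def flatten_record(record: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten one nested record into a list of flat rows.

    Dicts act like structs: children become parent_child keys.
    Lists act like arrays: each element becomes its own row, with a
    parent_pos key for its index (similar in spirit to posexplode_outer).
    """
    for key, value in record.items():
        rest = {k: v for k, v in record.items() if k != key}
        if isinstance(value, list):
            # An empty list still yields one row with None values ("outer" behaviour).
            items = list(enumerate(value)) or [(None, None)]
            rows: List[Dict[str, Any]] = []
            for pos, item in items:
                rows.extend(flatten_record({**rest, f"{key}_pos": pos, key: item}))
            return rows
        if isinstance(value, dict):
            expanded = {f"{key}_{child}": v for child, v in value.items()}
            return flatten_record({**rest, **expanded})
    # No list or dict values left: the record is already flat.
    return [record]


# Hypothetical sample record, for illustration only.
sample = {"id": 1, "restaurant": {"name": "A", "tags": ["x", "y"]}}
flat_rows = flatten_record(sample)
```

Each call handles the first nested field it finds and recurses on the result, so the loop ends only when every remaining value is a scalar.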

  • @poojajoshi871
    11 months ago

    Hi Sir, how many videos are still left to complete?

  • @manish_kumar_1
    11 months ago

    There are a lot of things to learn, but after 8-10 videos we will move on to other topics.

  • @satishlovewanshi2540
    10 months ago

    data = [("openai",'[{"name":"ram","work":"salesman"}]'),
            ("tech support",'[{"name":"lakhan","work":"service man","lname":"mishra"}]'),
            ("data operator ",'[{"name":"lakhan","work":"service man","salary":"5000","System":"del"}]')]

    Bhaiya ji, how would we flatten the data given above? This is sample data from my client project. Please help me.
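One way to think about this data: each second element is a JSON array stored as a string, so it has to be parsed before it can be flattened. The sketch below shows that logic in plain Python, not Spark. The function name `flatten_json_strings` and the `source` column name are hypothetical, and it assumes no JSON key is itself called `source`. In PySpark the analogous step would likely be parsing the string column with `from_json` and an explicit schema, then exploding the resulting array.

```python
import json
from typing import Any, Dict, List, Tuple


def flatten_json_strings(data: List[Tuple[str, str]],
                         label: str = "source") -> List[Dict[str, Any]]:
    """Parse each stringified JSON array and emit one flat row per object."""
    parsed = [(name, json.loads(payload)) for name, payload in data]

    # Collect the union of keys in first-seen order, so every row has the same columns.
    columns: List[str] = []
    for _, objects in parsed:
        for obj in objects:
            for key in obj:
                if key not in columns:
                    columns.append(key)

    rows: List[Dict[str, Any]] = []
    for name, objects in parsed:
        for obj in objects:
            row: Dict[str, Any] = {label: name}
            # Keys missing from this object are filled with None.
            row.update({column: obj.get(column) for column in columns})
            rows.append(row)
    return rows


data = [
    ("openai", '[{"name":"ram","work":"salesman"}]'),
    ("tech support", '[{"name":"lakhan","work":"service man","lname":"mishra"}]'),
    ("data operator ", '[{"name":"lakhan","work":"service man","salary":"5000","System":"del"}]'),
]
rows = flatten_json_strings(data)
```

Because the three objects carry different keys, the rows share the union of all keys and use None where a key is absent, which mirrors how a single explicit schema would produce nulls for missing fields.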

  • @DedloxGMR
    9 months ago

    Manish bhai, I have watched all the videos in this playlist but did not find the answer to my problem. I am working on a nested JSON dataset. It loads, but when I display it I get a corrupt record. When I changed its schema type, the dataset was displayed as-is, in JSON form. How do I load it? I have also tried the multiline option.

  • @manish_kumar_1
    9 months ago

    Please email me a smaller dataset in a file, or send it on LinkedIn.

  • @DedloxGMR
    9 months ago

    @manish_kumar_1 bhai, it is not working on LinkedIn. Please share your email address.

  • @dineshughade6741
    3 months ago

    Hello Manish, could you provide the JSON file that you used here?

  • @manish_kumar_1
    3 months ago

    The link to download the data is in the description.

  • @dineshughade6741
    3 months ago

    Oh great, thanks Manish

  • @saisri6404
    10 months ago

    You haven't started this video with `Possible Interview Questions` 😅

  • @manish_kumar_1
    10 months ago

    For this topic, they will just have you write code.