1. Read the JSON data (Python)

sparkDF = spark.read.json("/FileStore/shared_uploads/2020_10_01_9_json.gz")
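By default, `spark.read.json` expects line-delimited JSON (one object per line), and files ending in `.gz` are decompressed transparently. The following is a minimal plain-Python sketch of that file layout, using only the standard library. The sample records, the helper names `write_json_lines_gz` and `read_json_lines_gz`, and the temporary file path are assumptions made for illustration; they are not taken from the uploaded file.

```python
import gzip
import json
import os
import tempfile

# Hypothetical sample records. The field names mirror the ones queried in the
# later steps (type, actor, repo, created_at); the values are made up.
sample_events = [
    {
        "type": "PushEvent",
        "actor": {"login": "alice"},
        "repo": {"name": "alice/demo"},
        "created_at": "2020-10-01T09:00:00Z",
    },
    {
        "type": "WatchEvent",
        "actor": {"login": "bob"},
        "repo": {"name": "bob/tools"},
        "created_at": "2020-10-01T09:00:05Z",
    },
]


def write_json_lines_gz(path, records):
    """Write one JSON object per line into a gzip-compressed text file."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def read_json_lines_gz(path):
    """Read a gzip-compressed, line-delimited JSON file into a list of dicts."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Round-trip the sample through a temporary file (illustrative path only).
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample_json.gz")
write_json_lines_gz(sample_path, sample_events)
loaded_events = read_json_lines_gz(sample_path)

print(len(loaded_events), "records read")
```

If a file instead holds a single JSON document spread over several lines, the reader's `multiLine` option is the usual way to handle it.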

2. Print the data schema

sparkDF.printSchema()

3. Display the data

display(sparkDF)

4. Create a temporary view over the JSON data

%sql
Create temporary view json_table
using json
options (path "/FileStore/shared_uploads/2020_10_01_9_json.gz")

5. Query the data

%sql
select type as Event_Type, actor as Actor, repo as Repository, created_at as Date_Time from json_table limit 10;

6. Query specific subfields of nested JSON fields

%sql
select type as Event_Type, actor.login as Handle, repo.name as Repository, created_at as Date_Time from json_table limit 10;
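The dotted names in the query above (`actor.login`, `repo.name`) select a subfield of a nested JSON object. As a rough plain-Python sketch of what that selection does, the snippet below walks a dotted path through nested dictionaries and builds aliased rows. The helper `get_path`, the `columns` mapping, and the sample records are hypothetical and serve only as illustration; this is not how Spark evaluates the query internally.

```python
def get_path(record, dotted_path):
    """Follow a dotted path such as 'actor.login' through nested dicts.

    Returns None when any step of the path is missing, loosely analogous
    to a NULL result in SQL.
    """
    value = record
    for key in dotted_path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


# Hypothetical records shaped like the fields used in the query; values are made up.
events = [
    {
        "type": "PushEvent",
        "actor": {"id": 1, "login": "alice"},
        "repo": {"id": 10, "name": "alice/demo"},
        "created_at": "2020-10-01T09:00:00Z",
    },
    {
        "type": "WatchEvent",
        "actor": {"id": 2, "login": "bob"},
        "repo": {"id": 20, "name": "bob/tools"},
        "created_at": "2020-10-01T09:00:05Z",
    },
]

# Output alias -> source path, mirroring the select list of the query.
columns = {
    "Event_Type": "type",
    "Handle": "actor.login",
    "Repository": "repo.name",
    "Date_Time": "created_at",
}

# Take at most 10 records, like "limit 10", and project the chosen paths.
rows = [
    {alias: get_path(event, path) for alias, path in columns.items()}
    for event in events[:10]
]

for row in rows:
    print(row)
```

In the notebook itself the SQL form stays the simpler choice; the sketch only makes explicit that each dot steps one level deeper into the nested structure.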
