i trying flatten json file looks this:
{ "teams": [ { "teamname": "1", "members": [ { "firstname": "john", "lastname": "doe", "orgname": "anon", "phone": "916-555-1234", "mobile": "", "email": "john.doe@wildlife.net" }, { "firstname": "jane", "lastname": "doe", "orgname": "anon", "phone": "916-555-4321", "mobile": "916-555-7890", "email": "jane.doe@wildlife.net" } ] }, { "teamname": "2", "members": [ { "firstname": "mickey", "lastname": "moose", "orgname": "moosers", "phone": "916-555-0000", "mobile": "916-555-1111", "email": "mickey.moose@wildlife.net" }, { "firstname": "minny", "lastname": "moose", "orgname": "moosers", "phone": "916-555-2222", "mobile": "", "email": "minny.moose@wildlife.net" } ] } ]
}
i wish export excel table. current code this:
from pandas.io.json import json_normalize import json import pandas pd inputfile = 'e:\\teams.json' outputfile = 'e:\\teams.xlsx' f = open(inputfile) data = json.load(f) f.close() df = pd.dataframe(data) result1 = json_normalize(data, 'teams' ) print result1
results in output:
members teamname 0 [{u'firstname': u'john', u'phone': u'916-555-... 1 1 [{u'firstname': u'mickey', u'phone': u'916-555-... 2
there 2 members's data nested within each row. have output table displays 4 members' data plus associated teamname.
this 1 way it. should give ideas.
df = pd.concat( [ pd.concat([pd.series(m) m in t['members']], axis=1) t in data['teams'] ], keys=[t['teamname'] t in data['teams']] ) 0 1 1 email john.doe@wildlife.net jane.doe@wildlife.net firstname john jane lastname doe doe mobile 916-555-7890 orgname anon anon phone 916-555-1234 916-555-4321 2 email mickey.moose@wildlife.net minny.moose@wildlife.net firstname mickey minny lastname moose moose mobile 916-555-1111 orgname moosers moosers phone 916-555-0000 916-555-2222
to nice table team name , members rows, attributes in columns:
df.index.levels[0].name = 'teamname' df.columns.name = 'member' df.t.stack(0).swaplevel(0, 1).sort_index()
to team name , member actual columns, reset index.
df.index.levels[0].name = 'teamname' df.columns.name = 'member' df.t.stack(0).swaplevel(0, 1).sort_index().reset_index()
the whole thing
import json import pandas pd json_text = """{ "teams": [ { "teamname": "1", "members": [ { "firstname": "john", "lastname": "doe", "orgname": "anon", "phone": "916-555-1234", "mobile": "", "email": "john.doe@wildlife.net" }, { "firstname": "jane", "lastname": "doe", "orgname": "anon", "phone": "916-555-4321", "mobile": "916-555-7890", "email": "jane.doe@wildlife.net" } ] }, { "teamname": "2", "members": [ { "firstname": "mickey", "lastname": "moose", "orgname": "moosers", "phone": "916-555-0000", "mobile": "916-555-1111", "email": "mickey.moose@wildlife.net" }, { "firstname": "minny", "lastname": "moose", "orgname": "moosers", "phone": "916-555-2222", "mobile": "", "email": "minny.moose@wildlife.net" } ] } ] }""" data = json.loads(json_text) df = pd.concat( [ pd.concat([pd.series(m) m in t['members']], axis=1) t in data['teams'] ], keys=[t['teamname'] t in data['teams']] ) df.index.levels[0].name = 'teamname' df.columns.name = 'member' df.t.stack(0).swaplevel(0, 1).sort_index().reset_index()