I am trying to scrape the 9GAG comment section for sentiment analysis, to label each post as positive or negative. The ultimate goal is to train on data from thousands of posts and predict the sentiment of a post based on its comment count, post upvotes, the upvotes of the top ten comments, and the title of the post.
I scraped the hot section for titles and upvotes, but when it comes to scraping the comments, the HTML parser won't show the relevant tags. I have tried different libraries: bs4, requests, pattern, urllib/urllib2. I also tried 'html.parser' instead of lxml.
My question: is the 9GAG comment section restricted from scraping? If not, is there a reason why none of the parsers are able to get the tags?
Update #2: here is the code I used:

    import requests
    from bs4 import BeautifulSoup

    url = "http://9gag.com/gag/a1mzz1d"
    req = requests.get(url)
    soup = BeautifulSoup(req.text, "html.parser")
    soup.find_all("div", attrs={"class": "comment-embed"})

The output is an empty list: []
The data is loaded using React. With a little bit of parsing you can get the data you need in JSON format:

    import ast
    import re

    import requests
    from bs4 import BeautifulSoup
    from urlparse import urljoin  # Python 2; on Python 3: from urllib.parse import urljoin

    base = "http://9gag.com/"

    # These are the query params sent with the JSON request.
    params = {"appid": "",
              "url": "",
              "count": "10",
              "level": "2",
              "order": "score",
              "mentionmapping": "true",
              "origin": "9gag.com"}

    # Endpoint that serves the comments as JSON.
    js = "http://comment-cdn.9gag.com/v1/cacheable/comment-list.json"

    with requests.Session() as s:
        r = s.get(base)
        soup = BeautifulSoup(r.content, "lxml")
        # Links to each actual post page.
        links = [urljoin(base, a["href"]) for a in soup.select("a.badge-evt.point")]
        for link in links:
            cont = s.get(link).content
            soup = BeautifulSoup(cont, "lxml")
            # The params we need are in the script body.
            script = soup.find("script", text=re.compile("appid")).text
            # Convert to a dict so we can pull the keys we need.
            data = ast.literal_eval(script[script.find("{"):script.rfind("}") + 1])
            params["appid"] = data["appid"]
            params["url"] = data["url"]
            page_json = s.get(js, params=params).json()
            for dct in page_json["payload"]["comments"]:
                print(dct)

If we run the code using just the first URL returned, we get:

    In [28]: with requests.Session() as s:
       ....:     r = s.get(base)
       ....:     soup = BeautifulSoup(r.content, "lxml")
       ....:     links = [urljoin(base, a["href"]) for a in soup.select("a.comment.badge-evt")][:1]
       ....:     for link in links:
       ....:         cont = s.get(link).content
       ....:         soup = BeautifulSoup(cont, "lxml")
       ....:         script = soup.find("script", text=re.compile('appid')).text
       ....:         data = ast.literal_eval(script[script.find("{"):script.rfind("}") + 1])
       ....:         params["appid"] = data["appid"]
       ....:         params["url"] = data["url"]
       ....:         page_json = s.get(js, params=params).json()
       ....:         for dct in page_json["payload"]["comments"]:
       ....:             print(dct)
       ....:

{u'hasnext': true, u'dislikecount': 0, u'text': u'this awkward watch ... , funny', u'userid': u'u_13759018032623', u'likecount': 343, u'orderkey': u'score_00000000004834_14651297124662', u'children': [{u'hasnext': false, u'dislikecount': 0, u'text': u'@twistedpickle.and fake.', u'userid': u'u_145548331532421082', u'likecount': 26, u'children': [], u'iscollapsed': 0, u'mediatext': u'@twistedpickle.and fake.', u'section': u'', u'mentionmapping': {u'@twistedpickle': u'abl7q1'}, u'commentid': u'c_146513113612585611', u'type': u'text', u'status': 0, u'parent': u'c_146512971246623391', u'timestamp': 1465131136, u'embedmediameta': {u'dummy': []}, u'user': {u'displayname': u'savage_ali', u'avatarurl': u'http://accounts-cdn.9gag.com/media/avatar/34323189_100_45.jpg', u'timestamp': u'1455483315', u'userid': u'u_145548331532421082', u'hashedaccountid': u'anbn66n', u'profileurls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/savage_ali'}, u'accountid': u'34323189', u'permissions': []}, u'isurl': 0, u'islike': {u'value': 0}, u'permalink': u'http://9gag.com/gag/a4ym4n1#cs_comment_id=c_146513113612585611', u'level': 2, u'suppdata': {}, u'richtext': u'@twistedpickle.and fake.', u'childrentotal': 0, u'isanonymous': 0}], u'iscollapsed': 0, u'mediatext': u'this awkward watch ... 
, funny', u'section': u'', u'mentionmapping': {u'dummy': u''}, u'commentid': u'c_146512971246623391', u'type': u'text', u'status': 0, u'parent': u'c_146493707813378457', u'timestamp': 1465129712, u'embedmediameta': {u'dummy': []}, u'user': {u'displayname': u'twistedpickle', u'avatarurl': u'http://accounts-cdn.9gag.com/media/avatar/1870095_100_1.jpg', u'timestamp': u'1375901803', u'userid': u'u_13759018032623', u'hashedaccountid': u'abl7q1', u'profileurls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/twistedpickle'}, u'accountid': u'1870095', u'permissions': []}, u'isurl': 0, u'islike': {u'value': 0}, u'permalink': u'http://9gag.com/gag/a4ym4n1#cs_comment_id=c_146512971246623391', u'level': 1, u'suppdata': {}, u'richtext': u'this awkward watch ... , funny', u'childrentotal': 19, u'isanonymous': 0} {u'hasnext': true, u'dislikecount': 0, u'text': u'hahaha pantura', u'userid': u'u_143454521023534763', u'likecount': 231, u'orderkey': u'score_00000000004076_14649387351969', u'children': [{u'hasnext': false, u'dislikecount': 0, u'text': u'@deadfight nussittuna nukut paremmin', u'userid': u'u_141790386790069041', u'likecount': 39, u'children': [], u'iscollapsed': 0, u'mediatext': u'@deadfight nussittuna nukut paremmin', u'section': u'', u'mentionmapping': {u'@deadfight': u'aylgpy7'}, u'commentid': u'c_146513018381635287', u'type': u'text', u'status': 0, u'parent': u'c_146493873519691145', u'timestamp': 1465130183, u'embedmediameta': {u'dummy': []}, u'user': {u'displayname': u'lady_kappa', u'avatarurl': u'http://accounts-cdn.9gag.com/media/avatar/22251683_100_38.jpg', u'timestamp': u'1417903867', u'userid': u'u_141790386790069041', u'hashedaccountid': u'a5k8b5n', u'profileurls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/lady_kappa'}, u'accountid': u'22251683', u'permissions': []}, u'isurl': 0, u'islike': {u'value': 0}, u'permalink': u'http://9gag.com/gag/a4ym4n1#cs_comment_id=c_146513018381635287', u'level': 2, u'suppdata': 
{}, u'richtext': u'@deadfight nussittuna nukut paremmin', u'childrentotal': 0, u'isanonymous': 0}], u'iscollapsed': 0, u'mediatext': u'hahaha pantura', u'section': u'', u'mentionmapping': {u'dummy': u''}, u'commentid': u'c_146493873519691145', u'type': u'text', u'status': 0, u'parent': u'c_146493707813378457', u'timestamp': 1464938735, u'embedmediameta': {u'dummy': []}, u'user': {u'displayname': u'deadfight', u'avatarurl': u'http://accounts-cdn.9gag.com/media/avatar/27180133_100_2.jpg', u'timestamp': u'1434545210', u'userid': u'u_143454521023534763', u'hashedaccountid': u'aylgpy7', u'profileurls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/deadfight'}, u'accountid': u'27180133', u'permissions': []}, u'isurl': 0, u'islike': {u'value': 0}, u'permalink': u'http://9gag.com/gag/a4ym4n1#cs_comment_id=c_146493873519691145', u'level': 1, u'suppdata': {}, u'richtext': u'hahaha pantura', u'childrentotal': 16, u'isanonymous': 0} {u'hasnext': true, u'dislikecount': 0, u'text': u'http://i.memeful.com/media/post/omj28xm_700wa_0.gif', u'userid': u'u_141680114571912397', u'likecount': 225, u'orderkey': u'score_00000000003373_14649381081078', u'children': [{u'hasnext': false, u'dislikecount': 0, u'text': u'@shogun_ka_yo go', u'userid': u'u_144283683005248817', u'likecount': 2, u'children': [], u'iscollapsed': 0, u'mediatext': u'@shogun_ka_yo go', u'section': u'', u'mentionmapping': {u'@shogun_ka_yo': u'amqrlrw'}, u'commentid': u'c_146513150738658348', u'type': u'text', u'status': 0, u'parent': u'c_146493810810784782', u'timestamp': 1465131507, u'embedmediameta': {u'dummy': []}, u'user': {u'displayname': u'dergermanyball', u'avatarurl': u'http://accounts-cdn.9gag.com/media/avatar/29998985_100_29.jpg', u'timestamp': u'', u'userid': u'u_144283683005248817', u'hashedaccountid': u'a1dpxry', u'profileurls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/dergermanyball'}, u'accountid': u'29998985', u'permissions': []}, u'isurl': 0, u'islike': 
{u'value': 0}, u'permalink': u'http://9gag.com/gag/a4ym4n1#cs_comment_id=c_146513150738658348', u'level': 2, u'suppdata': {}, u'richtext': u'@shogun_ka_yo go', u'childrentotal': 0, u'isanonymous': 0}], u'iscollapsed': 0, u'mediatext': u'http://i.memeful.com/media/post/omj28xm_700wa_0.gif', u'section': u'', u'mentionmapping': {u'dummy': u''}, u'commentid': u'c_146493810810784782', u'type': u'media', u'status': 0, u'parent': u'c_146493707813378457', u'timestamp': 1464938108, u'embedmediameta': {u'embedimage': {u'type': u'animated', u'image': {u'url': u'http://img-comment-fun.9cache.com/media/287e9c03142644331422775855_700w_0.jpg', u'width': 400, u'height': 206}, u'animated': {u'url': u'http://img-comment-fun.9cache.com/media/287e9c03142644331422775855_700wa_0.gif', u'width': 400, u'height': 206}, u'video': {u'url': u'http://img-comment-fun.9cache.com/media/287e9c03142644331422775855_700wv_0.mp4', u'width': 400, u'height': 206}}}, u'user': {u'displayname': u'shogun_ka_yo', u'avatarurl': u'http://accounts-cdn.9gag.com/media/avatar/22391718_100_2.jpg', u'timestamp': u'1416801145', u'userid': u'u_141680114571912397', u'hashedaccountid': u'amqrlrw', u'profileurls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/shogun_ka_yo'}, u'accountid': u'22391718', u'permissions': []}, u'isurl': 1, u'islike': {u'value': 0}, u'permalink': u'http://9gag.com/gag/a4ym4n1#cs_comment_id=c_146493810810784782', u'level': 1, u'suppdata': {}, u'richtext': u'[url]http://i.memeful.com/media/post/omj28xm_700wa_0.gif[/url]', u'childrentotal': 4, u'isanonymous': 0} {u'hasnext': true, u'dislikecount': 0, u'text': u'now imagine if genders reversed', u'userid': u'u_143552720523387146', u'likecount': 179, u'orderkey': u'score_00000000003144_14651301155438', u'children': [{u'hasnext': false, u'dislikecount': 0, u'text': u'@rednotash hush little one. 
you're making sense now', u'userid': u'u_141363015125977644', u'likecount': 77, u'children': [], u'iscollapsed': 0, u'mediatext': u'@rednotash hush little one. you're making sense now', u'section': u'', u'mentionmapping': {u'@rednotash': u'aov8rmy'}, u'commentid': u'c_146513114535963914', u'type': u'text', u'status': 0, u'parent': u'c_146513011554386056', u'timestamp': 1465131145, u'embedmediameta': {u'dummy': []}, u'user': {u'displayname': u'srslydude', u'avatarurl': u'http://accounts-cdn.9gag.com/media/default-avatar/1_59_100_v0.jpg', u'timestamp': u'1413630151', u'userid': u'u_141363015125977644', u'hashedaccountid': u'aywvpzx', u'profileurls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/srslydude'}, u'accountid': u'21558777', u'permissions': []}, u'isurl': 0, u'islike': {u'value': 0}, u'permalink': u'http://9gag.com/gag/a4ym4n1#cs_comment_id=c_146513114535963914', u'level': 2, u'suppdata': {}, u'richtext': u'@rednotash hush little one. you're making sense now', u'childrentotal': 0, u'isanonymous': 0}], u'iscollapsed': 0, u'mediatext': u'now imagine if genders reversed', u'section': u'', u'mentionmapping': {u'dummy': u''}, u'commentid': u'c_146513011554386056', u'type': u'text', u'status': 0, u'parent': u'c_146493707813378457', u'timestamp': 1465130115, u'embedmediameta': {u'dummy': []}, u'user': {u'displayname': u'rednotash', u'avatarurl': u'http://accounts-cdn.9gag.com/media/avatar/27823975_100_5.jpg', u'timestamp': u'1435527205', u'userid': u'u_143552720523387146', u'hashedaccountid': u'aov8rmy', u'profileurls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/rednotash'}, u'accountid': u'27823975', u'permissions': []}, u'isurl': 0, u'islike': {u'value': 0}, u'permalink': u'http://9gag.com/gag/a4ym4n1#cs_comment_id=c_146513011554386056', u'level': 1, u'suppdata': {}, u'richtext': u'now imagine if genders reversed', u'childrentotal': 9, u'isanonymous': 0} {u'hasnext': true, u'dislikecount': 0, u'text': u'never let 
waif follow you? wouldnt follow if werent dickhead. women have sixth sense . know whats going on.', u'userid': u'u_145321627176216569', u'likecount': 78, u'orderkey': u'score_00000000002462_14651303108023', u'children': [{u'hasnext': false, u'dislikecount': 0, u'text': u'@marshmallowww if tell gender has nothing it? men have "sixth sense" too.', u'userid': u'u_143741207696358239', u'likecount': 56, u'children': [], u'iscollapsed': 0, u'mediatext': u'@marshmallowww if tell gender has nothing it? men have "sixth sense" too.', u'section': u'', u'mentionmapping': {u'@marshmallowww': u'ab693mb'}, u'commentid': u'c_146513102333226094', u'type': u'text', u'status': 0, u'parent': u'c_146513031080236628', u'timestamp': 1465131023, u'embedmediameta': {u'dummy': []}, u'user': {u'displayname': u'the_hidden', u'avatarurl': u'http://accounts-cdn.9gag.com/media/avatar/28267060_100_15.jpg', u'timestamp': u'1437412076', u'userid': u'u_143741207696358239', u'hashedaccountid': u'aop4wg2', u'profileurls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/the_hidden'}, u'accountid': u'28267060', u'permissions': []}, u'isurl': 0, u'islike': {u'value': 0}, u'permalink': u'http://9gag.com/gag/a4ym4n1#cs_comment_id=c_146513102333226094', u'level': 2, u'suppdata': {}, u'richtext': u'@marshmallowww if tell gender has nothing it? men have "sixth sense" too.', u'childrentotal': 0, u'isanonymous': 0}], u'iscollapsed': 0, u'mediatext': u'never let waif follow you? wouldnt follow if werent dickhead. women have sixth sense . 
know whats going on.', u'section': u'', u'mentionmapping': {u'dummy': u''}, u'commentid': u'c_146513031080236628', u'type': u'text', u'status': 0, u'parent': u'c_146493707813378457', u'timestamp': 1465130310, u'embedmediameta': {u'dummy': []}, u'user': {u'displayname': u'marshmallowww', u'avatarurl': u'http://accounts-cdn.9gag.com/media/avatar/33477821_100_134.jpg', u'timestamp': u'1453216271', u'userid': u'u_145321627176216569', u'hashedaccountid': u'ab693mb', u'profileurls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/marshmallowww'}, u'accountid': u'33477821', u'permissions': []}, u'isurl': 0, u'islike': {u'value': 0}, u'permalink': u'http://9gag.com/gag/a4ym4n1#cs_comment_id=c_146513031080236628', u'level': 1, u'suppdata': {}, u'richtext': u'never let waif follow you? wouldnt follow if werent dickhead. women have sixth sense . know whats going on.', u'childrentotal': 20, u'isanonymous': 0} {u'hasnext': true, u'dislikecount': 0, u'text': u'but correct can hit him? mean, "no violence" right? if drunk , doing stupid things, , husband go , hit her, correct too? 
because equality.', u'userid': u'u_143329792027606743', u'likecount': 54, u'orderkey': u'score_00000000001796_14651298735006', u'children': [{u'hasnext': false, u'dislikecount': 0, u'text': u'@pcmasteracer yes it's correct', u'userid': u'u_143073218849877360', u'likecount': 9, u'children': [], u'iscollapsed': 0, u'mediatext': u'@pcmasteracer yes it's correct', u'section': u'', u'mentionmapping': {u'@pcmasteracer': u'avnovdq'}, u'commentid': u'c_146513013516459530', u'type': u'text', u'status': 0, u'parent': u'c_146512987350064451', u'timestamp': 1465130135, u'embedmediameta': {u'dummy': []}, u'user': {u'displayname': u'kkakuka97', u'avatarurl': u'http://accounts-cdn.9gag.com/media/avatar/26450856_100_3.jpg', u'timestamp': u'1430732188', u'userid': u'u_143073218849877360', u'hashedaccountid': u'a4j4nwy', u'profileurls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/kkakuka97'}, u'accountid': u'26450856', u'permissions': []}, u'isurl': 0, u'islike': {u'value': 0}, u'permalink': u'http://9gag.com/gag/a4ym4n1#cs_comment_id=c_146513013516459530', u'level': 2, u'suppdata': {}, u'richtext': u'@pcmasteracer yes it's correct', u'childrentotal': 0, u'isanonymous': 0}], u'iscollapsed': 0, u'mediatext': u'but correct can hit him? mean, "no violence" right? if drunk , doing stupid things, , husband go , hit her, correct too? 
because equality.', u'section': u'', u'mentionmapping': {u'dummy': u''}, u'commentid': u'c_146512987350064451', u'type': u'text', u'status': 0, u'parent': u'c_146493707813378457', u'timestamp': 1465129873, u'embedmediameta': {u'dummy': []}, u'user': {u'displayname': u'pcmasteracer', u'avatarurl': u'http://accounts-cdn.9gag.com/media/default-avatar/1_62_100_v0.jpg', u'timestamp': u'1433297920', u'userid': u'u_143329792027606743', u'hashedaccountid': u'avnovdq', u'profileurls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/pcmasteracer'}, u'accountid': u'27225255', u'permissions': []}, u'isurl': 0, u'islike': {u'value': 0}, u'permalink': u'http://9gag.com/gag/a4ym4n1#cs_comment_id=c_146512987350064451', u'level': 1, u'suppdata': {}, u'richtext': u'but correct can hit him? mean, "no violence" right? if drunk , doing stupid things, , husband go , hit her, correct too? because equality.', u'childrentotal': 7, u'isanonymous': 0} {u'hasnext': false, u'dislikecount': 0, u'text': u'i can hear 'bong!'', u'userid': u'u_13987497367750', u'likecount': 30, u'orderkey': u'score_00000000001168_14650124142865', u'children': [{u'hasnext': false, u'dislikecount': 0, u'text': u'@yajirobe__ not boing', u'userid': u'u_13775281935884', u'likecount': 4, u'children': [], u'iscollapsed': 0, u'mediatext': u'@yajirobe__ not boing', u'section': u'', u'mentionmapping': {u'@yajirobe__': u'avge1y5'}, u'commentid': u'c_146513060674619430', u'type': u'text', u'status': 0, u'parent': u'c_146501241428653553', u'timestamp': 1465130606, u'embedmediameta': {u'dummy': []}, u'user': {u'displayname': u'siophang', u'avatarurl': u'http://accounts-cdn.9gag.com/media/avatar/11455251_100_2.jpg', u'timestamp': u'1377528193', u'userid': u'u_13775281935884', u'hashedaccountid': u'abqk6qo', u'profileurls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/siophang'}, u'accountid': u'11455251', u'permissions': []}, u'isurl': 0, u'islike': {u'value': 0}, u'permalink': 
u'http://9gag.com/gag/a4ym4n1#cs_comment_id=c_146513060674619430', u'level': 2, u'suppdata': {}, u'richtext': u'@yajirobe__ not boing', u'childrentotal': 0, u'isanonymous': 0}], u'iscollapsed': 0, u'mediatext': u'i can hear 'bong!'', u'section': u'', u'mentionmapping': {u'dummy': u''}, u'commentid': u'c_146501241428653553', u'type': u'text', u'status': 0, u'parent': u'c_146493707813378457', u'timestamp': 1465012414, u'embedmediameta': {u'dummy': []}, u'user': {u'displayname': u'yajirobe__', u'avatarurl': u'http://accounts-cdn.9gag.com/media/avatar/16992199_100_5.jpg', u'timestamp': u'1398749736', u'userid': u'u_13987497367750', u'hashedaccountid': u'avge1y5', u'profileurls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/yajirobe__'}, u'accountid': u'16992199', u'permissions': []}, u'isurl': 0, u'islike': {u'value': 0}, u'permalink': u'http://9gag.com/gag/a4ym4n1#cs_comment_id=c_146501241428653553', u'level': 1, u'suppdata': {}, u'richtext': u'i can hear 'bong!'', u'childrentotal': 1, u'isanonymous': 0} {u'hasnext': false, u'dislikecount': 0, u'text': u'http://i.memeful.com/media/post/propbdo_700wa_0.gif', u'userid': u'u_13907047642371', u'likecount': 21, u'orderkey': u'score_00000000000967_14649476233018', u'children': [{u'hasnext': false, u'dislikecount': 0, u'text': u'@kaylaruffalo mfw', u'userid': u'u_13907047642371', u'likecount': 0, u'children': [], u'iscollapsed': 0, u'mediatext': u'@kaylaruffalo mfw', u'section': u'', u'mentionmapping': {u'@kaylaruffalo': u'adykgqj'}, u'commentid': u'c_146494763324897147', u'type': u'text', u'status': 0, u'parent': u'c_146494762330186947', u'timestamp': 1464947633, u'embedmediameta': {u'dummy': []}, u'user': {u'displayname': u'kaylaruffalo', u'avatarurl': u'http://accounts-cdn.9gag.com/media/avatar/16005886_100_9.jpg', u'timestamp': u'1390704764', u'userid': u'u_13907047642371', u'hashedaccountid': u'adykgqj', u'profileurls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': 
u'http://9gag.com/u/kaylaruffalo'}, u'accountid': u'16005886', u'permissions': []}, u'isurl': 0, u'islike': {u'value': 0}, u'permalink': u'http://9gag.com/gag/a4ym4n1#cs_comment_id=c_146494763324897147', u'level': 2, u'suppdata': {}, u'richtext': u'@kaylaruffalo mfw', u'childrentotal': 0, u'isanonymous': 0}], u'iscollapsed': 0, u'mediatext': u'http://i.memeful.com/media/post/propbdo_700wa_0.gif', u'section': u'', u'mentionmapping': {u'dummy': u''}, u'commentid': u'c_146494762330186947', u'type': u'media', u'status': 0, u'parent': u'c_146493707813378457', u'timestamp': 1464947623, u'embedmediameta': {u'embedimage': {u'type': u'animated', u'image': {u'url': u'http://img-comment-fun.9cache.com/media/872be169144077120242844098_700w_0.jpg', u'width': 500, u'height': 400}, u'animated': {u'url': u'http://img-comment-fun.9cache.com/media/872be169144077120242844098_700wa_0.gif', u'width': 500, u'height': 400}, u'video': {u'url': u'http://img-comment-fun.9cache.com/media/872be169144077120242844098_700wv_0.mp4', u'width': 500, u'height': 400}}}, u'user': {u'displayname': u'kaylaruffalo', u'avatarurl': u'http://accounts-cdn.9gag.com/media/avatar/16005886_100_9.jpg', u'timestamp': u'1390704764', u'userid': u'u_13907047642371', u'hashedaccountid': u'adykgqj', u'profileurls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/kaylaruffalo'}, u'accountid': u'16005886', u'permissions': []}, u'isurl': 1, u'islike': {u'value': 0}, u'permalink': u'http://9gag.com/gag/a4ym4n1#cs_comment_id=c_146494762330186947', u'level': 1, u'suppdata': {}, u'richtext': u'[url]http://i.memeful.com/media/post/propbdo_700wa_0.gif[/url]', u'childrentotal': 1, u'isanonymous': 0} {u'hasnext': false, u'dislikecount': 0, u'text': u'look @ dude in red shirt run xd', u'userid': u'u_144176454299618603', u'likecount': 15, u'orderkey': u'score_00000000000806_14651298710300', u'children': [{u'hasnext': false, u'dislikecount': 0, u'text': u'@crazybrownguy knew next', u'userid': u'u_13976607580627', 
u'likecount': 1, u'children': [], u'iscollapsed': 0, u'mediatext': u'@crazybrownguy knew next', u'section': u'', u'mentionmapping': {u'@crazybrownguy': u'aggwl5q'}, u'commentid': u'c_146514413390208345', u'type': u'text', u'status': 0, u'parent': u'c_146512987103009031', u'timestamp': 1465144133, u'embedmediameta': {u'dummy': []}, u'user': {u'displayname': u'lightfoot2012', u'avatarurl': u'http://accounts-cdn.9gag.com/media/avatar/17248879_100_6.jpg', u'timestamp': u'1397660758', u'userid': u'u_13976607580627', u'hashedaccountid': u'axzpvbp', u'profileurls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/lightfoot2012'}, u'accountid': u'17248879', u'permissions': []}, u'isurl': 0, u'islike': {u'value': 0}, u'permalink': u'http://9gag.com/gag/a4ym4n1#cs_comment_id=c_146514413390208345', u'level': 2, u'suppdata': {}, u'richtext': u'@crazybrownguy knew next', u'childrentotal': 0, u'isanonymous': 0}], u'iscollapsed': 0, u'mediatext': u'look @ dude in red shirt run xd', u'section': u'', u'mentionmapping': {u'dummy': u''}, u'commentid': u'c_146512987103009031', u'type': u'text', u'status': 0, u'parent': u'c_146493707813378457', u'timestamp': 1465129871, u'embedmediameta': {u'dummy': []}, u'user': {u'displayname': u'crazybrownguy', u'avatarurl': u'http://accounts-cdn.9gag.com/media/avatar/29662036_100_10.jpg', u'timestamp': u'1441764542', u'userid': u'u_144176454299618603', u'hashedaccountid': u'aggwl5q', u'profileurls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/crazybrownguy'}, u'accountid': u'29662036', u'permissions': []}, u'isurl': 0, u'islike': {u'value': 0}, u'permalink': u'http://9gag.com/gag/a4ym4n1#cs_comment_id=c_146512987103009031', u'level': 1, u'suppdata': {}, u'richtext': u'look @ dude in red shirt run xd', u'childrentotal': 1, u'isanonymous': 0} {u'hasnext': true, u'dislikecount': 0, u'text': u'http://i.memeful.com/media/post/krp6z2w_700wa_0.gif', u'userid': u'u_144337172763285563', u'likecount': 5, 
u'orderkey': u'score_00000000000626_14651301539010', u'children': [{u'hasnext': false, u'dislikecount': 0, u'text': u'@wat_ya_doin agree wife', u'userid': u'u_144337172763285563', u'likecount': 3, u'children': [], u'iscollapsed': 0, u'mediatext': u'@wat_ya_doin agree wife', u'section': u'', u'mentionmapping': {u'@wat_ya_doin': u'ay8yrom'}, u'commentid': u'c_146513018506335085', u'type': u'text', u'status': 0, u'parent': u'c_146513015390105680', u'timestamp': 1465130185, u'embedmediameta': {u'dummy': []}, u'user': {u'displayname': u'wat_ya_doin', u'avatarurl': u'http://accounts-cdn.9gag.com/media/avatar/29948571_100_6.jpg', u'timestamp': u'', u'userid': u'u_144337172763285563', u'hashedaccountid': u'ay8yrom', u'profileurls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/wat_ya_doin'}, u'accountid': u'29948571', u'permissions': []}, u'isurl': 0, u'islike': {u'value': 0}, u'permalink': u'http://9gag.com/gag/a4ym4n1#cs_comment_id=c_146513018506335085', u'level': 2, u'suppdata': {}, u'richtext': u'@wat_ya_doin agree wife', u'childrentotal': 0, u'isanonymous': 0}], u'iscollapsed': 0, u'mediatext': u'http://i.memeful.com/media/post/krp6z2w_700wa_0.gif', u'section': u'', u'mentionmapping': {u'dummy': u''}, u'commentid': u'c_146513015390105680', u'type': u'media', u'status': 0, u'parent': u'c_146493707813378457', u'timestamp': 1465130153, u'embedmediameta': {u'embedimage': {u'type': u'animated', u'image': {u'url': u'http://img-comment-fun.9cache.com/media/be90178a145186181304494323_700w_0.jpg', u'width': 319, u'height': 260}, u'animated': {u'url': u'http://img-comment-fun.9cache.com/media/be90178a145186181304494323_700wa_0.gif', u'width': 319, u'height': 260}, u'video': {u'url': u'http://img-comment-fun.9cache.com/media/be90178a145186181304494323_700wv_0.mp4', u'width': 318, u'height': 260}}}, u'user': {u'displayname': u'wat_ya_doin', u'avatarurl': u'http://accounts-cdn.9gag.com/media/avatar/29948571_100_6.jpg', u'timestamp': u'', u'userid': 
u'u_144337172763285563', u'hashedaccountid': u'ay8yrom', u'profileurls': {u'a_dd8f2b7d304a10edaf6f29517ea0ca4100a43d1b': u'http://9gag.com/u/wat_ya_doin'}, u'accountid': u'29948571', u'permissions': []}, u'isurl': 1, u'islike': {u'value': 0}, u'permalink': u'http://9gag.com/gag/a4ym4n1#cs_comment_id=c_146513015390105680', u'level': 1, u'suppdata': {}, u'richtext': u'[url]http://i.memeful.com/media/post/krp6z2w_700wa_0.gif[/url]', u'childrentotal': 3, u'isanonymous': 0}
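
The step in the code above that reads the request params out of the inline script can be tried in isolation. This is only a sketch: `sample_script` is a made-up stand-in for the real script text (which is not reproduced here) and `extract_dict` is a hypothetical helper name, but the slicing and `ast.literal_eval` call are the same as in the answer's code.

```python
import ast

# Hypothetical stand-in for the inline <script> text found on a post page.
sample_script = 'var config = {"appid": "example_app", "url": "http://9gag.com/gag/example"};'


def extract_dict(script_text):
    """Evaluate the literal that spans the first '{' to the last '}'."""
    start = script_text.find("{")
    end = script_text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no dict literal found in script text")
    # literal_eval only accepts Python literals, so it never runs page code.
    return ast.literal_eval(script_text[start:end + 1])


data = extract_dict(sample_script)
print("%s %s" % (data["appid"], data["url"]))
```

Note that `ast.literal_eval` only understands Python literals, so a script body containing JavaScript `true`, `false` or `null` would raise an error; in that case `json.loads` on the same slice is the usual alternative.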
As an example, we can pull the text from each dct and iterate over dct["children"] to get more comments:

    In [30]: params = {"appid": "",
       ....:           "url": "",
       ....:           "count": "2",
       ....:           "level": "2",
       ....:           "order": "score",
       ....:           "mentionmapping": "true",
       ....:           "origin": "9gag.com"}

    In [31]: js = "http://comment-cdn.9gag.com/v1/cacheable/comment-list.json"

    In [32]: with requests.Session() as s:
       ....:     r = s.get(base)
       ....:     soup = BeautifulSoup(r.content, "lxml")
       ....:     links = [urljoin(base, a["href"]) for a in soup.select("a.badge-evt.point")][:1]
       ....:     for link in links:
       ....:         cont = s.get(link).content
       ....:         soup = BeautifulSoup(cont, "lxml")
       ....:         script = soup.find("script", text=re.compile('appid')).text
       ....:         data = ast.literal_eval(script[script.find("{"):script.rfind("}") + 1])
       ....:         params["appid"] = data["appid"]
       ....:         params["url"] = data["url"]
       ....:         page_json = s.get(js, params=params).json()
       ....:         for dct in page_json["payload"]["comments"]:
       ....:             print(dct["text"])
       ....:             for child in dct["children"]:
       ....:                 print(child["text"])
       ....:

once again post made has no idea true love is. true love jealous, painful, , difficult. it's battle be. you're either fighting better person, fighting life give other person life deserve or fighting other person. true love worth of it, beautiful, kind, gentle , warm. no relationship perfect. there not "8 ways know". 1 one put shit @ same time make want make better person. true love on nerves, piss off, hurt you, love you, hold when can't , forgive you. true love when find can stand beside through anything, never want hurt when find can trust no matter what. no 1 perfect , there more 1 person in world can fall in love with, when find person, fi @celticdraconian true comment complaining lead straight "friendzone" comment saying "friendzone" not thing.
You can see I changed the count param to 2. To get all the data, you can set it to a high number, e.g. "count": "1000".
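
Since the question asks for the upvotes of the top ten comments, here is a minimal sketch of how the parsed JSON could be reduced to that. The helper name `top_comments` and the `sample` data are illustrative assumptions; only the keys "payload", "comments", "text", "likecount" and "children" are taken from the output shown above.

```python
def top_comments(page_json, n=10):
    """Return (text, likecount, reply_texts) for the n most-liked top-level comments."""
    comments = page_json["payload"]["comments"]
    # Sort locally by likecount rather than relying on the server-side order.
    ranked = sorted(comments, key=lambda c: c.get("likecount", 0), reverse=True)
    return [
        (c["text"],
         c.get("likecount", 0),
         [child["text"] for child in c.get("children", [])])
        for c in ranked[:n]
    ]


# Hand-made sample in the same shape as the response above (values shortened).
sample = {"payload": {"comments": [
    {"text": "hahaha pantura", "likecount": 231,
     "children": [{"text": "a reply", "likecount": 39, "children": []}]},
    {"text": "first comment", "likecount": 343, "children": []},
]}}

for text, likes, replies in top_comments(sample, n=10):
    print("%d %s %r" % (likes, text, replies))
```

In the real loop you would call `top_comments(page_json)` right after the `s.get(js, params=params).json()` line and store the result alongside the post's title and upvotes.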
With a high enough count, that returns the data you would get if you kept loading more comments on the page: