I've been writing a simple method to check if a web address has subdirectories and, if it does, separate them into a list for me. The code I wrote should ignore the last subdirectory the URL has (a mistake I only realized after noticing that the loop was iterating 4 times while checking a URL with a single subdirectory).
Here is the code:
```python
import re

def check_web_address(web_address):
    # set the pattern, then check if the address matches it
    pattern = re.compile(r"[\w\-\.]*")
    pat_check = pattern.match(web_address)

    # if it does, separate the subdirs, assuming we checked for '/' earlier
    if pat_check:
        pattern_span = pat_check.span()
        web_add_no_subdir = web_address[pattern_span[0]:pattern_span[1]]
        raw_web_subs = web_address[pattern_span[1]:]
        web_subs = []

        # check if there is an additional slash, then separate
        # our subdir if the regex matches
        slash = "/"
        for slash in raw_web_subs[1:]:
            pat_span = pattern.match(raw_web_subs[1:]).span()
            real_end = pat_span[1] + 1
            web_subs.append(raw_web_subs[:real_end])
            raw_web_subs = raw_web_subs[real_end:]

        separated = [web_add_no_subdir, web_subs]
        return separated
    else:
        return None
```
This code returns the subdirectory, and unittest says it ran the test successfully:

```python
checked_add = wc.check_web_address("www.google.com/docs")
self.assertEqual(checked_add[0], 'www.google.com')
self.assertEqual(checked_add[1][0], '/docs')
```
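To see why the test still passes, here is a sketch that replays only the loop from the question on the leftover string `"/docs"` (the part after `www.google.com`). The first pass consumes the whole of `/docs`; the remaining three passes run on an empty string and append empty entries. The test only checks `checked_add[1][0]`, so those extra entries go unnoticed.

```python
import re

pattern = re.compile(r"[\w\-\.]*")
raw_web_subs = "/docs"  # what is left after 'www.google.com' has been matched
web_subs = []

# The slice "docs" is evaluated once, so this runs once per character: 4 times.
for slash in raw_web_subs[1:]:
    pat_span = pattern.match(raw_web_subs[1:]).span()
    real_end = pat_span[1] + 1
    web_subs.append(raw_web_subs[:real_end])
    raw_web_subs = raw_web_subs[real_end:]

print(web_subs)  # ['/docs', '', '', '']
```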
So, I tested the following in the Python console:

```
>>> test = "/docs"
>>> "/" in test[1:]
False
```
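The console result and the loop behaviour differ because `in` means two different things in Python. A short sketch contrasting them, reusing the names from the question (the `iterations` counter is added only for illustration): in an expression `in` is a membership test, while in a `for` header it binds the loop variable to each item, overwriting the earlier `slash = "/"`.

```python
raw_web_subs = "/docs"
slash = "/"

# Membership test: is "/" anywhere in "docs"?
print(slash in raw_web_subs[1:])  # False

# Loop header: rebinds `slash` to 'd', 'o', 'c', 's' in turn.
iterations = 0
for slash in raw_web_subs[1:]:
    iterations += 1

print(iterations, slash)  # 4 s
```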
Also, if I ask Python to print `raw_web_subs[1:]` before the loop begins, I get the string "docs", without the forward slash.
What am I missing here?
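As @TadhgMcDonald-Jensen explained, what is happening is that Python iterates over each one of the characters: `for slash in raw_web_subs[1:]` does not test for a slash, it rebinds `slash` to each character of "docs" in turn. @Evert's suggestion of using a `while` loop gives the result I was looking for.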
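One way the `while` version could look, as a minimal sketch rather than Evert's actual code: the loop condition checks that the remaining text starts with `/` instead of looking for another slash further on, so the last subdirectory is kept as well. It assumes the same regex as the question. Because `[\w\-\.]*` can match an empty string, `pattern.match()` never returns `None` here, so the `else: return None` branch is dropped.

```python
import re

# Same character class as the question: word characters, hyphens and dots.
pattern = re.compile(r"[\w\-\.]*")


def check_web_address(web_address):
    # The `*` quantifier lets this match an empty string, so match() is never None.
    host_end = pattern.match(web_address).end()
    host = web_address[:host_end]
    rest = web_address[host_end:]

    web_subs = []
    # `while` re-checks the condition on every pass; `for` would instead
    # iterate over the characters of a string that is sliced only once.
    while rest.startswith("/"):
        # Match the segment name starting just after the leading slash;
        # end() is an index into `rest`, so seg_end is always at least 1.
        seg_end = pattern.match(rest, 1).end()
        web_subs.append(rest[:seg_end])
        rest = rest[seg_end:]

    return [host, web_subs]


print(check_web_address("www.google.com/docs"))         # ['www.google.com', ['/docs']]
print(check_web_address("www.google.com/docs/api/v1"))  # ['www.google.com', ['/docs', '/api', '/v1']]
```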
I'll end up using `urllib.parse`, as @Blckknght suggested.
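A minimal sketch of the `urllib.parse` route, assuming the goal is the same `[host, subdirs]` shape as the question (the function name `split_with_urllib` is hypothetical). One caveat worth knowing: `urlsplit()` only fills in `netloc` when the address has a scheme or starts with `//`, so a bare address like `www.google.com/docs` is prefixed with `//` first.

```python
from urllib.parse import urlsplit


def split_with_urllib(web_address):
    # Without a scheme or a leading "//", urlsplit() would put the whole
    # address into .path and leave .netloc empty.
    if "://" not in web_address and not web_address.startswith("//"):
        web_address = "//" + web_address
    parts = urlsplit(web_address)
    # Drop empty pieces produced by leading, trailing, or doubled slashes.
    subdirs = ["/" + seg for seg in parts.path.split("/") if seg]
    return [parts.netloc, subdirs]


print(split_with_urllib("www.google.com/docs"))                  # ['www.google.com', ['/docs']]
print(split_with_urllib("https://www.google.com/docs/api?q=1"))  # ['www.google.com', ['/docs', '/api']]
```

A side benefit of this approach is that query strings and fragments are separated out by `urlsplit()` rather than ending up in the subdirectory list.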
@TheLazyScripter mentioned that this can be done by splitting the string with `test = some_string_url.split('/')`, a more elegant solution than the one I had in mind.
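A sketch of the `split('/')` approach wrapped to return the same `[host, subdirs]` shape as the question (the function name `split_with_str` is hypothetical). It assumes the address has no scheme: for `"https://x/y"`, `split("/")` yields `['https:', '', 'x', 'y']`, so the host would come out as `'https:'`.

```python
def split_with_str(web_address):
    # "www.google.com/docs".split("/") -> ["www.google.com", "docs"]
    host, *subdirs = web_address.split("/")
    # Re-add the leading slash to match the question's output format and
    # skip empty strings left by trailing or doubled slashes.
    return [host, ["/" + s for s in subdirs if s]]


print(split_with_str("www.google.com/docs"))  # ['www.google.com', ['/docs']]
```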
Thank you, everybody.