shanenin Posted August 26, 2005

I have been trying to parse an XML file using some string methods (Python). If I use the split or splitlines method, I get a 'u' character in the list that is created. Is this u character some sort of formatting character (like a newline)? Below is the example:

>>> tdoc.split()
[u'<?xml', u'version="1.0"', u'?>', u'<team>', u'<player', u'age="27"', u'height="1.96m"', u'name="Mick', u'Fowler">', u'<points>17.1</points>', u'<rebounds>6.4</rebounds>', u'</player>', u'<player', u'age="29"', u'height="2.04m"', u'name="Ivan', u'Ivanovic">', u'<points>15.5</points>', u'<rebounds>7.8</rebounds>', u'</player>', u'</team>']
>>> tdoc.splitlines()
[u'<?xml version="1.0" ?>', u'<team>', u' <player age="27" height="1.96m" name="Mick Fowler">', u' <points>17.1</points>', u' <rebounds>6.4</rebounds>', u' </player>', u' <player age="29" height="2.04m" name="Ivan Ivanovic">', u' <points>15.5</points>', u' <rebounds>7.8</rebounds>', u' </player>', u'</team>']

Here is the original XML file:

<team>
 <player name='Mick Fowler' age='27' height='1.96m'>
  <points>17.1</points>
  <rebounds>6.4</rebounds>
 </player>
 <player name='Ivan Ivanovic' age='29' height='2.04m'>
  <points>15.5</points>
  <rebounds>7.8</rebounds>
 </player>
</team>
jcl Posted August 26, 2005

The 'u' prefix indicates that they're Unicode strings.
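
To make the point above concrete, here is a minimal sketch showing that the u belongs to how Python displays a Unicode string and is not a character stored inside it. The variable name tag is made up for illustration; the value is one of the elements from the split() output in the first post.

```python
# The u prefix marks a Unicode string literal. It is part of the
# notation, not part of the text held in the string.
tag = u'<team>'   # hypothetical name; value taken from the split() output

print(len(tag))          # 6: only the characters between the quotes are counted
print(tag[0])            # <: the first character is '<', not 'u'
print(tag == '<team>')   # True: equal to the same text written without the prefix
```

So the list elements can be used like ordinary strings. In Python 3 every str is Unicode and the prefix no longer appears in the displayed list at all.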
shanenin Posted August 26, 2005

Thanks. Not really sure how that factors into anything (yet).
shanenin Posted August 27, 2005

I noticed I also get these special characters: \r. My book does not mention them anywhere.

[ '</description>\r', '<pubDate>Sun, 07 Aug 2005 23:59:59 -0400</pubDate>\r', '<enclosure url="http://freetalklive.com/files/FTLpromocheerleading.mp3" length="1000000" type="audio/mpeg"/>\r', '</item>\r', '', '', '</channel>\r', '</rss>']
jcl Posted August 27, 2005

'\r' is carriage return (ASCII 0x0d).
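
A trailing '\r' usually shows up when text with Windows-style line endings ('\r\n') is split on '\n' alone. The thread does not say how the list above was produced, so the sketch below is only an assumption about the cause. The sample string is made up, and it shows two ways to get rid of the carriage returns.

```python
# Hypothetical sample with Windows-style (\r\n) line endings.
sample = '</item>\r\n\r\n</channel>\r\n</rss>'

# Splitting on '\n' alone leaves the carriage return on each line.
by_newline = sample.split('\n')
print(by_newline)   # ['</item>\r', '\r', '</channel>\r', '</rss>']

# splitlines() treats '\r\n' as a single line boundary.
by_lines = sample.splitlines()
print(by_lines)     # ['</item>', '', '</channel>', '</rss>']

# Alternatively, strip a trailing '\r' from each piece after splitting.
stripped = [line.rstrip('\r') for line in by_newline]
print(stripped)     # ['</item>', '', '</channel>', '</rss>']
```

Either approach gives the same clean list here; splitlines() is the shorter of the two when the line endings are not known in advance.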