Disclaimer first: I am a Python newbie.
I am using feedparser version 6.0.8 (checked using pip freeze | grep feedparser) to parse Twitter feeds from random Nitter instances (using twiiit.com). The Python version is 3.9.2 on MX Linux. The problem is that, according to the docs (https://pythonhosted.org/feedparser/reference-bozo.html), the variable that indicates whether a parsed RSS feed is well-formed XML is of integer type, but in my case it is (almost) always a bool. What is even more troubling is that when bozo is a bool, it does not correctly indicate whether the parsed feed is well-formed XML.
At first I thought it had something to do with implicit int to bool conversion (0 -> False and 1 -> True), but it does not. Most of the time the result is a bool with the value False and the feed is valid (at least Thunderbird can parse it correctly). However, this is not a rule, because sometimes bozo is False and Thunderbird still cannot parse the feed*.
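For reference, here is a small sketch of the int/bool relationship I had in mind. It uses only built-ins and does not touch feedparser; `describe` is a hypothetical helper name of mine, and the literal values merely stand in for what d.bozo might hold. In Python, `bool` is a subclass of `int`, so `False == 0` and `isinstance(False, int)` both hold, while an exact `type(...) is int` check treats the two as different:

```python
# Built-ins only; the values below are stand-ins for d.bozo, not real parser output.
def describe(value):
    """Label a value using exact type checks, like the example in this post."""
    if type(value) is bool:
        return "bool"
    if type(value) is int:
        return "int"
    return "unknown"


for candidate in (0, 1, False, True):
    # Columns: value, exact type label, passes isinstance(..., int), truthiness
    print(repr(candidate), describe(candidate),
          isinstance(candidate, int), bool(candidate))
```

As far as I can tell, this means the type difference alone should not change the truthiness of the flag, so it does not explain why a False bozo sometimes comes with a feed that Thunderbird cannot parse.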
Minimal working example:
import requests
import feedparser

url = 'http://twiiit.com/nws/rss'
print("Requesting rss feed for:" + url)
resp = requests.get(url)
rss_content = resp.content
d = feedparser.parse(rss_content)
print('bozo value:', d.bozo)
if type(d.bozo) is int:
    print('bozo as int')
elif type(d.bozo) is bool:
    print('bozo as bool')
else:
    print('bozo is of unknown type')
What I expect is something like
bozo value: 0
bozo as int
but instead I have this:
bozo value: False
bozo as bool
*This is a simplification: for Thunderbird to update the feeds, I need to modify them so that ids from different Nitter instances are the same when they refer to the same tweet.