You can use regular XPath functions to find the comments as you suggested:
comments = doc.xpath('.//div[starts-with(@id, "comment-")]')
For more complex matching, though, you can use regular expressions: lxml's XPath implementation supports them through the EXSLT regular-expressions namespace. See "Regular expressions in XPath" in the official lxml documentation.
Here is a demo:
from lxml import etree
content = """\
<body>
<div id="comment-1">
TEXT
</div>
<div id="comment-2">
TEXT
</div>
<div id="comment-3">
TEXT
</div>
<div id="note-4">
not matched
</div>
</body>
"""
doc = etree.XML(content)
# You must give the namespace to use EXSLT RegEx
REGEX_NS = "http://exslt.org/regular-expressions"
comments = doc.xpath(r'.//div[re:test(@id, "^comment-\d+$")]',
                     namespaces={'re': REGEX_NS})
To see the result, you can "dump" the matched nodes:
for comment in comments:
    print("---")
    etree.dump(comment)
You get:
---
<div id="comment-1">
TEXT
</div>
---
<div id="comment-2">
TEXT
</div>
---
<div id="comment-3">
TEXT
</div>
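To make the difference between the two approaches concrete, here is a minimal sketch that runs both queries on the same document. The sample markup is made up for illustration; in particular the "comment-form" id is an assumption, not something from your page. It shows that starts-with() accepts any id beginning with "comment-", while the regular expression only accepts "comment-" followed by digits:

```python
from lxml import etree

# Hypothetical sample markup: the "comment-form" id is assumed for illustration.
content = """\
<body>
<div id="comment-1">TEXT</div>
<div id="comment-22">TEXT</div>
<div id="comment-form">not a comment</div>
<div id="note-4">not matched</div>
</body>
"""
doc = etree.XML(content)

REGEX_NS = "http://exslt.org/regular-expressions"

# Plain XPath 1.0 function: keeps every id that begins with "comment-".
by_prefix = doc.xpath('.//div[starts-with(@id, "comment-")]')

# EXSLT regular expression: keeps only "comment-" followed by digits.
by_regex = doc.xpath(r'.//div[re:test(@id, "^comment-\d+$")]',
                     namespaces={'re': REGEX_NS})

prefix_ids = [div.get("id") for div in by_prefix]
regex_ids = [div.get("id") for div in by_regex]
print(prefix_ids)  # ['comment-1', 'comment-22', 'comment-form']
print(regex_ids)   # ['comment-1', 'comment-22']
```

If your ids always have the form "comment-" plus a number and nothing else starts with that prefix, starts-with() is probably enough; the regular expression is worth it when the prefix alone is ambiguous.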