18

Given an AST object in clang, how can I get the code behind it? I tried editing the code in the tutorial and added:

clang::SourceLocation _b = d->getLocStart(), _e = d->getLocEnd();
const char *b = sourceManager->getCharacterData(_b),
           *e = sourceManager->getCharacterData(_e);
llvm::errs() << std::string(b, e-b) << "\n";

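But alas, it doesn't print the whole typedef declaration, only about half of it! The same thing happens when printing an Expr.

How can I print and see the whole original string that makes up the declaration?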


4 Answers

19

Use the Lexer module:

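clang::SourceManager *sm;   // must point to the SourceManager of the current compilation
clang::LangOptions lopt;

std::string decl2str(clang::Decl *d) {
    // getLocEnd() points at the START of the last token, not past its end.
    // (Newer Clang releases name these getBeginLoc()/getEndLoc().)
    clang::SourceLocation b(d->getLocStart()), _e(d->getLocEnd());
    // Move to one character past the end of the last token.
    clang::SourceLocation e(clang::Lexer::getLocForEndOfToken(_e, 0, *sm, lopt));
    return std::string(sm->getCharacterData(b),
        sm->getCharacterData(e)-sm->getCharacterData(b));
}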
answered 2012-06-22T10:06:37.517
7

The following code works for me.

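std::string decl2str(clang::Decl *d, clang::SourceManager &sm) {
    // (T, U) => "T,,"
    // First try the token range, which includes the whole last token.
    std::string text = clang::Lexer::getSourceText(
        clang::CharSourceRange::getTokenRange(d->getSourceRange()), sm, clang::LangOptions(), nullptr).str();
    // The text can be "", so check for emptiness before looking at the last character.
    // If the token range ends in a comma, fall back to the character range.
    if (!text.empty() && text.back() == ',')
        return clang::Lexer::getSourceText(
            clang::CharSourceRange::getCharRange(d->getSourceRange()), sm, clang::LangOptions(), nullptr).str();
    return text;
}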
answered 2014-04-15T11:02:04.193
2

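As the comments on the answers point out, all the other answers seem to have flaws, so I will post my own code, which seems to cover all the flaws mentioned in the comments.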

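I believe that getSourceRange() treats a statement as a sequence of tokens, not a sequence of characters. This means that if we have a clang::Stmt corresponding to FOO + BAR, then the token FOO is at character 1, the token + is at character 5, and the token BAR is at character 7. getSourceRange() therefore returns a SourceRange that essentially means "this code starts with the token at 1 and ends with the token at 7". So we must use clang::Lexer::getLocForEndOfToken(stmt.getSourceRange().getEnd()) to get the actual character position of the end of the BAR token, and pass that as the end location to clang::Lexer::getSourceText. If we don't, clang::Lexer::getSourceText returns "FOO + " rather than the "FOO + BAR" we probably want.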
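The token-versus-character distinction can be sketched without Clang at all. The following is a toy model only, not Clang's API: end_of_token is a hypothetical stand-in for clang::Lexer::getLocForEndOfToken, offsets are 0-based (so FOO, + and BAR start at 0, 4 and 6), and the "lexer" only knows identifiers and one-character tokens.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Toy helper (assumption): identifier characters are letters, digits and '_'.
static bool is_ident_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Hypothetical stand-in for clang::Lexer::getLocForEndOfToken:
// given the offset of a token's first character, return the offset
// one past the token's last character.
std::size_t end_of_token(const std::string& src, std::size_t tok_begin) {
    assert(tok_begin <= src.size());
    std::size_t i = tok_begin;
    if (i < src.size() && is_ident_char(src[i])) {
        while (i < src.size() && is_ident_char(src[i]))
            ++i;
        return i;
    }
    // Anything else is treated as a one-character token (e.g. '+').
    return i < src.size() ? i + 1 : i;
}

// Text of the half-open character range [begin, end).
std::string char_range_text(const std::string& src, std::size_t begin, std::size_t end) {
    return src.substr(begin, end - begin);
}

// Text of a token range: last_tok is where the LAST token starts,
// so the end has to be pushed past that token first.
std::string token_range_text(const std::string& src, std::size_t first_tok, std::size_t last_tok) {
    return char_range_text(src, first_tok, end_of_token(src, last_tok));
}
```

With src = "FOO + BAR", char_range_text(src, 0, 6) gives "FOO + " because it stops where BAR begins, while token_range_text(src, 0, 6) gives "FOO + BAR".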

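I don't think my implementation has the problem @Steven Lu mentioned in the comments, because this code uses the clang::Lexer::getSourceText function, which, according to Clang's source documentation, is designed specifically for getting the source text from a range.

This implementation also takes @Ramin Halavati's comment into account; I have tested it on some code, and it did return the macro-expanded string.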

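Here is my implementation:

// Forward declaration: get_source_text() calls get_source_text_raw(), which is defined further down.
std::string get_source_text_raw(clang::SourceRange range, const clang::SourceManager& sm);

/**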
 * Gets the portion of the code that corresponds to given SourceRange, including the
 * last token. Returns expanded macros.
 * 
 * @see get_source_text_raw()
 */
std::string get_source_text(clang::SourceRange range, const clang::SourceManager& sm) {
    clang::LangOptions lo;

    // NOTE: sm.getSpellingLoc() used in case the range corresponds to a macro/preprocessed source.
    auto start_loc = sm.getSpellingLoc(range.getBegin());
    auto last_token_loc = sm.getSpellingLoc(range.getEnd());
    auto end_loc = clang::Lexer::getLocForEndOfToken(last_token_loc, 0, sm, lo);
    auto printable_range = clang::SourceRange{start_loc, end_loc};
    return get_source_text_raw(printable_range, sm);
}

/**
 * Gets the portion of the code that corresponds to given SourceRange exactly as
 * the range is given.
 *
 * @warning The end location of the SourceRange returned by some Clang functions 
 * (such as clang::Expr::getSourceRange) might actually point to the first character
 * (the "location") of the last token of the expression, rather than the character
 * past-the-end of the expression like clang::Lexer::getSourceText expects.
 * get_source_text_raw() does not take this into account. Use get_source_text()
 * instead if you want to get the source text including the last token.
 *
 * @warning This function does not obtain the source of a macro/preprocessor expansion.
 * Use get_source_text() for that.
 */
std::string get_source_text_raw(clang::SourceRange range, const clang::SourceManager& sm) {
    // getSourceText returns an llvm::StringRef; .str() converts it to std::string explicitly.
    return clang::Lexer::getSourceText(clang::CharSourceRange::getCharRange(range), sm, clang::LangOptions()).str();
}
answered 2020-04-29T20:20:56.617
1

Elazar's approach works for me unless macros are involved. The following correction fixes that:

std::string decl2str(clang::Decl *d) {
    clang::SourceLocation b(d->getLocStart()), _e(d->getLocEnd());
    if (b.isMacroID())
        b = sm->getSpellingLoc(b);
    if (_e.isMacroID())            // check _e: e is not declared until the next statement
        _e = sm->getSpellingLoc(_e);
    clang::SourceLocation e(clang::Lexer::getLocForEndOfToken(_e, 0, *sm, lopt));
    return std::string(sm->getCharacterData(b),
        sm->getCharacterData(e)-sm->getCharacterData(b));
}
answered 2016-09-20T16:06:21.340