
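I'm really struggling to understand how Twitter expects users of its API to convert the plain-text tweets it sends into properly linked HTML.

Here's the deal: when you request the detailed data for a tweet, Twitter's JSON API sends back this set of information: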

{
    "created_at":"Wed Jul 18 01:03:31 +0000 2012",
    "id":225395341250412544,
    "id_str":"225395341250412544",
    "text":"This is a test tweet. #boring @nbc http://t.co/LUfDreY6 #skronk @crux http://t.co/VpuMlaDs @twitter",
    "source":"web",
    "truncated":false,
    "in_reply_to_status_id":null,
    "in_reply_to_status_id_str":null,
    "in_reply_to_user_id":null,
    "in_reply_to_user_id_str":null,
    "in_reply_to_screen_name":null,
    "user": <REDACTED>,
    "geo":null,
    "coordinates":null,
    "place":null,
    "contributors":null,
    "retweet_count":0,
    "entities":{
        "hashtags":[
            {
                "text":"boring",
                "indices":[22,29]
            },
            {
                "text":"skronk",
                "indices":[56,63]
            }
        ],
        "urls":[
            {
                "url":"http://t.co/LUfDreY6",
                "expanded_url":"http://www.twitter.com",
                "display_url":"twitter.com",
                "indices":[35,55]
            },
            {
                "url":"http://t.co/VpuMlaDs",
                "expanded_url":"http://www.example.com",
                "display_url":"example.com",
                "indices":[70,90]
            }
        ],
        "user_mentions":[
            {
                "screen_name":"nbc",
                "name":"NBC",
                "id":26585095,
                "id_str":"26585095",
                "indices":[30,34]
            },
            {
                "screen_name":"crux",
                "name":"Z. D. Smith",
                "id":407213,
                "id_str":"407213",
                "indices":[64,69]
            },
            {
                "screen_name":"twitter",
                "name":"Twitter",
                "id":783214,
                "id_str":"783214",
                "indices":[91,99]
            }
        ]
    },
    "favorited":false,
    "retweeted":false,
    "possibly_sensitive":false
}

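The interesting parts, for this question, are the text element and the entries in the hashtags, user_mentions, and urls arrays. Twitter is telling us where in the text element the hashtags, mentions, and URLs appear with the indices arrays... so here is the crux of the question:

How do you use those indices arrays?

You can't use them directly by looping over each link element with something like substr_replace, since replacing the first link element in the text will invalidate all the index values for the subsequent link elements. You also can't use substr_replace's array functionality, since it only works when you give it an array of strings for the first arg, rather than a single string (I've tested this. The results are... strange).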
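To make the invalidation concrete, here is a minimal sketch using the start of the sample tweet above. The text and indices come from the response; the link markup is made up for illustration.

```php
<?php
// Text and indices taken from the sample response; the link markup is illustrative.
$text = "This is a test tweet. #boring @nbc http://t.co/LUfDreY6";

// Replace the hashtag first (indices [22, 29]).
$hashtagHtml = "<a href='#'>#boring</a>";
$afterFirst = substr_replace($text, $hashtagHtml, 22, 29 - 22);

// In the ORIGINAL text the mention sits at [30, 34]...
$before = substr($text, 30, 34 - 30);
// ...but after the first replacement the same offsets land inside the inserted markup.
$after = substr($afterFirst, 30, 34 - 30);

echo $before, "\n"; // @nbc
echo $after, "\n";  // no longer the mention
```

Every replacement that changes the string length shifts everything to its right, so offsets for later entities stop matching.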

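Is there some function that can simultaneously replace multiple index-delimited substrings in a single string with different replacement strings?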


7 Answers


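To use the indices Twitter provides directly with a simple replace, all you have to do is collect the replacements you want to make and then sort them backwards. You can probably find a more clever way to build $entities; I wanted them to be optional anyway, so I kept it simple (KISS) as far as that went.

Either way, my point here was just to show that you don't need to explode the string, count characters, and whatnot. Regardless of how you do it, all you need to do is start at the end of the string and work back to the beginning, and the indices Twitter gives you are still valid.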

<?php 

function json_tweet_text_to_HTML($tweet, $links=true, $users=true, $hashtags=true)
{
    $return = $tweet->text;

    $entities = array();

    if($links && is_array($tweet->entities->urls))
    {
        foreach($tweet->entities->urls as $e)
        {
            $temp["start"] = $e->indices[0];
            $temp["end"] = $e->indices[1];
            $temp["replacement"] = "<a href='".$e->expanded_url."' target='_blank'>".$e->display_url."</a>";
            $entities[] = $temp;
        }
    }
    if($users && is_array($tweet->entities->user_mentions))
    {
        foreach($tweet->entities->user_mentions as $e)
        {
            $temp["start"] = $e->indices[0];
            $temp["end"] = $e->indices[1];
            $temp["replacement"] = "<a href='https://twitter.com/".$e->screen_name."' target='_blank'>@".$e->screen_name."</a>";
            $entities[] = $temp;
        }
    }
    if($hashtags && is_array($tweet->entities->hashtags))
    {
        foreach($tweet->entities->hashtags as $e)
        {
            $temp["start"] = $e->indices[0];
            $temp["end"] = $e->indices[1];
            $temp["replacement"] = "<a href='https://twitter.com/hashtag/".$e->text."?src=hash' target='_blank'>#".$e->text."</a>";
            $entities[] = $temp;
        }
    }

    usort($entities, function($a,$b){return($b["start"]-$a["start"]);});


    foreach($entities as $item)
    {
        $return = substr_replace($return, $item["replacement"], $item["start"], $item["end"] - $item["start"]);
    }

    return($return);
}


?>
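As a quick check of the back-to-front idea, the sketch below applies it to the first three entities of the sample tweet from the question. The helper apply_back_to_front() and the link markup are hypothetical and only for illustration; the text and indices come from the sample response, which is ASCII-only, so byte offsets and character offsets coincide.

```php
<?php
// Hypothetical helper (not part of the answer above): applies [start, end, html]
// triples to $text, working from the highest start offset down to the lowest.
function apply_back_to_front(string $text, array $replacements): string
{
    // Sort by start offset, descending, so the offsets of earlier spans
    // are untouched when a later span is rewritten.
    usort($replacements, function ($a, $b) {
        return $b[0] <=> $a[0];
    });
    foreach ($replacements as [$start, $end, $html]) {
        $text = substr_replace($text, $html, $start, $end - $start);
    }
    return $text;
}

// Text and indices from the sample response.
$text = "This is a test tweet. #boring @nbc http://t.co/LUfDreY6";
$linked = apply_back_to_front($text, [
    [22, 29, "<a href='https://twitter.com/hashtag/boring'>#boring</a>"],
    [30, 34, "<a href='https://twitter.com/nbc'>@nbc</a>"],
    [35, 55, "<a href='http://www.twitter.com'>twitter.com</a>"],
]);

echo $linked, "\n";
```

The entities can be passed in any order; only the sort decides the order in which they are applied.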
answered 2014-08-26T20:31:18.647

OK, so I needed to do exactly this, and I solved it. Here is the function I wrote. https://gist.github.com/3337428

function parse_message( &$tweet ) {
    if ( !empty($tweet['entities']) ) {
        $replace_index = array();
        $append = array();
        $text = $tweet['text'];
        foreach ($tweet['entities'] as $area => $items) {
            $prefix = false;
            $display = false;
            switch ( $area ) {
                case 'hashtags':
                    $find   = 'text';
                    $prefix = '#';
                    $url    = 'https://twitter.com/search/?src=hash&q=%23';
                    break;
                case 'user_mentions':
                    $find   = 'screen_name';
                    $prefix = '@';
                    $url    = 'https://twitter.com/';
                    break;
                case 'media':
                    $display = 'media_url_https';
                    $href    = 'media_url_https';
                    $size    = 'small';
                    break;
                case 'urls':
                    $find    = 'url';
                    $display = 'display_url';
                    $url     = "expanded_url";
                    break;
                default: break;
            }
            foreach ($items as $item) {
                if ( $area == 'media' ) {
                    // We can display images at the end of the tweet but sizing needs to added all the way to the top.
                    // $append[$item->$display] = "<img src=\"{$item->$href}:$size\" />";
                }else{
                    $msg     = $display ? $prefix.$item->$display : $prefix.$item->$find;
                    $replace = $prefix.$item->$find;
                    $href    = isset($item->$url) ? $item->$url : $url;
                    if (!(strpos($href, 'http') === 0)) $href = "http://".$href;
                    if ( $prefix ) $href .= $item->$find;
                    $with = "<a href=\"$href\">$msg</a>";
                    $replace_index[$replace] = $with;
                }
            }
        }
        foreach ($replace_index as $replace => $with) $tweet['text'] = str_replace($replace,$with,$tweet['text']);
        foreach ($append as $add) $tweet['text'] .= $add;
    }
}
answered 2012-08-13T06:34:57.940

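This is an edge case, but using str_replace() as in Styledev's answer could cause problems if one entity is contained within another. For example, if the shorter entity is replaced first, "I'm a genius! #me #mensa" could become "I'm a genius! #me #me nsa", with #me linked twice and "nsa" left outside the link.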
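The effect can be reproduced with a small sketch. The search-to-replacement map has the same shape as one applied with str_replace(); the link markup is made up for illustration.

```php
<?php
$text = "I'm a genius! #me #mensa";

// Search => replacement pairs, applied one after another with str_replace().
// The hrefs are illustrative only.
$replacements = [
    '#me'    => "<a href='/hashtag/me'>#me</a>",
    '#mensa' => "<a href='/hashtag/mensa'>#mensa</a>",
];

foreach ($replacements as $search => $html) {
    // str_replace() matches by content, so "#me" also matches the start of "#mensa".
    $text = str_replace($search, $html, $text);
}

echo $text, "\n";
```

After the first pass there is no intact "#mensa" left in the string, so the second replacement never fires.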

This solution avoids that problem:

<?php
/**
 * Hyperlinks hashtags, twitter names, and urls within the text of a tweet
 * 
 * @param object $apiResponseTweetObject A json_decoded() one of these: https://dev.twitter.com/docs/platform-objects/tweets
 * @return string The tweet's text with hyperlinks added
 */
function linkEntitiesWithinText($apiResponseTweetObject) {

    // Convert tweet text to array of one-character strings
    // $characters = str_split($apiResponseTweetObject->text);
    // Use -1 (no limit) rather than null, which is deprecated as a limit in newer PHP versions
    $characters = preg_split('//u', $apiResponseTweetObject->text, -1, PREG_SPLIT_NO_EMPTY);

    // Insert starting and closing link tags at indices...

    // ... for @user_mentions
    foreach ($apiResponseTweetObject->entities->user_mentions as $entity) {
        $link = "https://twitter.com/" . $entity->screen_name;          
        $characters[$entity->indices[0]] = "<a href=\"$link\">" . $characters[$entity->indices[0]];
        $characters[$entity->indices[1] - 1] .= "</a>";         
    }               

    // ... for #hashtags
    foreach ($apiResponseTweetObject->entities->hashtags as $entity) {
        $link = "https://twitter.com/search?q=%23" . $entity->text;         
        $characters[$entity->indices[0]] = "<a href=\"$link\">" . $characters[$entity->indices[0]];
        $characters[$entity->indices[1] - 1] .= "</a>";         
    }

    // ... for http://urls
    foreach ($apiResponseTweetObject->entities->urls as $entity) {
        $link = $entity->expanded_url;          
        $characters[$entity->indices[0]] = "<a href=\"$link\">" . $characters[$entity->indices[0]];
        $characters[$entity->indices[1] - 1] .= "</a>";         
    }

    // ... for media (the media array is absent when the tweet has no media, so fall back to an empty array)
    foreach ($apiResponseTweetObject->entities->media ?? [] as $entity) {
        $link = $entity->expanded_url;          
        $characters[$entity->indices[0]] = "<a href=\"$link\">" . $characters[$entity->indices[0]];
        $characters[$entity->indices[1] - 1] .= "</a>";         
    }

    // Convert array back to string
    return implode('', $characters);

}
?>  
answered 2013-03-09T03:46:02.303

Jeff's solution works well with English text, but it breaks when the tweet contains non-ASCII characters. This solution avoids that problem:

mb_internal_encoding("UTF-8");

// Return hyperlinked tweet text from json_decoded status object:
function MakeStatusLinks($status)
{
    $ChAr = array(); // One entry per UTF-8 character of the plain tweet.
    $TextLength = mb_strlen($status['text']); // Number of UTF-8 characters in plain tweet.
    for ($i = 0; $i < $TextLength; $i++) {
        $ch = mb_substr($status['text'], $i, 1);
        $ChAr[] = ($ch <> "\n") ? $ch : "\n<br/>"; // Keep new lines in HTML tweet.
    }
    if (isset($status['entities']['user_mentions'])) {
        foreach ($status['entities']['user_mentions'] as $entity) {
            // Opening tag goes before the first character, closing tag after the last one.
            $ChAr[$entity['indices'][0]] = "<a href='https://twitter.com/" . $entity['screen_name'] . "'>" . $ChAr[$entity['indices'][0]];
            $ChAr[$entity['indices'][1] - 1] .= "</a>";
        }
    }
    if (isset($status['entities']['hashtags'])) {
        foreach ($status['entities']['hashtags'] as $entity) {
            $ChAr[$entity['indices'][0]] = "<a href='https://twitter.com/search?q=%23" . $entity['text'] . "'>" . $ChAr[$entity['indices'][0]];
            $ChAr[$entity['indices'][1] - 1] .= "</a>";
        }
    }
    if (isset($status['entities']['urls'])) {
        foreach ($status['entities']['urls'] as $entity) {
            // Put the whole link on the first character and blank out the rest of the t.co URL.
            $ChAr[$entity['indices'][0]] = "<a href='" . $entity['expanded_url'] . "'>" . $entity['display_url'] . "</a>";
            for ($i = $entity['indices'][0] + 1; $i < $entity['indices'][1]; $i++) {
                $ChAr[$i] = '';
            }
        }
    }
    if (isset($status['entities']['media'])) {
        foreach ($status['entities']['media'] as $entity) {
            $ChAr[$entity['indices'][0]] = "<a href='" . $entity['expanded_url'] . "'>" . $entity['display_url'] . "</a>";
            for ($i = $entity['indices'][0] + 1; $i < $entity['indices'][1]; $i++) {
                $ChAr[$i] = '';
            }
        }
    }
    return implode('', $ChAr); // HTML tweet.
}
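A small sketch of the underlying mismatch, assuming (as this answer implies) that the indices count characters rather than bytes. The sample string and its [5, 9] hashtag indices are made up for illustration; substr() works on bytes, while mb_substr() works on characters.

```php
<?php
// "\xC3\xA9" is the UTF-8 encoding of "é": one character, two bytes.
$text = "caf\xC3\xA9 #php";

// Assumed for this sketch: indices counted in characters, so "#php" is [5, 9].
$start = 5;
$end = 9;

$byBytes = substr($text, $start, $end - $start);             // treats offsets as bytes
$byChars = mb_substr($text, $start, $end - $start, 'UTF-8'); // treats offsets as characters

echo strlen($text), " bytes, ", mb_strlen($text, 'UTF-8'), " characters\n";
echo "[", $byBytes, "]\n"; // starts one byte too early
echo "[", $byChars, "]\n"; // the hashtag
```

Each multi-byte character before an entity pushes the byte-based slice further off, which is why the tweet is split into characters first.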
answered 2013-07-14T18:20:20.543

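Here is an updated answer that works with Twitter's new Extended Mode. It combines the answer by @vita10gy with the comment by @Hugo (to make it UTF-8 compatible), plus a few minor tweaks to work with the new API values.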

function utf8_substr_replace($original, $replacement, $position, $length) {
    $startString = mb_substr($original, 0, $position, "UTF-8");
    $endString = mb_substr($original, $position + $length, mb_strlen($original), "UTF-8");
    $out = $startString . $replacement . $endString;
    return $out;
}

function json_tweet_text_to_HTML($tweet, $links=true, $users=true, $hashtags=true) {
    // Media urls can show up on the end of the full_text tweet, but twitter doesn't index that url. 
    // The display_text_range indexes show the actual tweet text length.
    // Cut the string off at the end to get rid of this unindexed url.
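    // mb_substr() takes a length, not an end index. Starting at 0 keeps the entity indices,
    // which are relative to full_text, valid.
    $return = mb_substr($tweet->full_text, 0, $tweet->display_text_range[1], "UTF-8");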
    $entities = array();

    if($links && is_array($tweet->entities->urls))
    {
        foreach($tweet->entities->urls as $e)
        {
            $temp["start"] = $e->indices[0];
            $temp["end"] = $e->indices[1];
            $temp["replacement"] = " <a href='".$e->expanded_url."' target='_blank'>".$e->display_url."</a>";
            $entities[] = $temp;
        }
    }
    if($users && is_array($tweet->entities->user_mentions))
    {
        foreach($tweet->entities->user_mentions as $e)
        {
            $temp["start"] = $e->indices[0];
            $temp["end"] = $e->indices[1];
            $temp["replacement"] = " <a href='https://twitter.com/".$e->screen_name."' target='_blank'>@".$e->screen_name."</a>";
            $entities[] = $temp;
        }
    }
    if($hashtags && is_array($tweet->entities->hashtags))
    {
        foreach($tweet->entities->hashtags as $e)
        {
            $temp["start"] = $e->indices[0];
            $temp["end"] = $e->indices[1];
            $temp["replacement"] = " <a href='https://twitter.com/hashtag/".$e->text."?src=hash' target='_blank'>#".$e->text."</a>";
            $entities[] = $temp;
        }
    }

    usort($entities, function($a,$b){return($b["start"]-$a["start"]);});


    foreach($entities as $item)
    {
        $return =  utf8_substr_replace($return, $item["replacement"], $item["start"], $item["end"] - $item["start"]);
    }

    return($return);
}
answered 2017-12-14T20:28:24.967

Here is a JavaScript version (using jQuery) of vita10gy's solution:

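// Minimal stand-in for PHP's substr_replace(), which tweetTextToHtml() below calls.
// Assumes a non-negative start and length.
function substrReplace(str, replacement, start, length) {
    return str.slice(0, start) + replacement + str.slice(start + length);
}

function tweetTextToHtml(tweet, links, users, hashtags) {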

    if (typeof(links)==='undefined') { links = true; }
    if (typeof(users)==='undefined') { users = true; }
    if (typeof(hashtags)==='undefined') { hashtags = true; }

    var returnStr = tweet.text;
    var entitiesArray = [];

    if(links && tweet.entities.urls.length > 0) {
        jQuery.each(tweet.entities.urls, function() {
            var temp1 = {};
            temp1.start = this.indices[0];
            temp1.end = this.indices[1];
            temp1.replacement = '<a href="' + this.expanded_url + '" target="_blank">' + this.display_url + '</a>';
            entitiesArray.push(temp1);
        });
    }

    if(users && tweet.entities.user_mentions.length > 0) {
        jQuery.each(tweet.entities.user_mentions, function() {
            var temp2 = {};
            temp2.start = this.indices[0];
            temp2.end = this.indices[1];
            temp2.replacement = '<a href="https://twitter.com/' + this.screen_name + '" target="_blank">@' + this.screen_name + '</a>';
            entitiesArray.push(temp2);
        });
    }

    if(hashtags && tweet.entities.hashtags.length > 0) {
        jQuery.each(tweet.entities.hashtags, function() {
            var temp3 = {};
            temp3.start = this.indices[0];
            temp3.end = this.indices[1];
            temp3.replacement = '<a href="https://twitter.com/hashtag/' + this.text + '?src=hash" target="_blank">#' + this.text + '</a>';
            entitiesArray.push(temp3);
        });
    }

    entitiesArray.sort(function(a, b) {return b.start - a.start;});

    jQuery.each(entitiesArray, function() {
        returnStr = substrReplace(returnStr, this.replacement, this.start, this.end - this.start);
    });

    return returnStr;
}

You can then use this function like so...

for(var i in tweetsJsonObj) {
    var tweet = tweetsJsonObj[i];
    var htmlTweetText = tweetTextToHtml(tweet);

    // Do something with the formatted tweet here ...
}
answered 2015-04-30T19:16:41.353

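Regarding vita10gy's helpful json_tweet_text_to_HTML(), I found a tweet that it could not format correctly: 626125868247552000.

This tweet contains a non-breaking space. My solution was to replace the first line of the function with the following: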

$return = str_replace("\xC2\xA0", ' ', $tweet->text);

Performing a str_replace() on &nbsp; is covered here.
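A sketch of why this replacement probably helps, assuming the indices count characters while substr_replace() counts bytes. The sample string is made up for illustration.

```php
<?php
// "\xC2\xA0" is the UTF-8 encoding of U+00A0 (non-breaking space): one character, two bytes.
$text = "hi\xC2\xA0#tag";

// Counted in characters, "#tag" starts at offset 3; counted in bytes, it starts at 4.
$raw = substr($text, 3, 4);

// Swapping the two-byte sequence for a one-byte space realigns byte and character offsets.
$normalized = str_replace("\xC2\xA0", ' ', $text);
$fixed = substr($normalized, 3, 4);

echo $fixed, "\n"; // #tag
```

This only covers the non-breaking space; any other multi-byte character in the tweet would still shift byte offsets.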

answered 2015-09-09T14:04:22.070