
I'm trying to solve GAE's million-user fan-out problem using JPA. If I understand correctly, I should have the following entities, using Twitter as an example:

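import com.google.appengine.api.datastore.Key;
import java.util.List;
import javax.persistence.Entity;
import javax.persistence.Id;

@Entity
public class User {
    @Id Key id;
    String name;
    String displayName;
    List<Key> subscribers;  // users
}

@Entity
public class Tweet {
    @Id Key id;
    User tweetMaker;
    String message;
}

@Entity
public class TweetIndex {
    @Id Key id;
    Key tweetMaker;        // user
    List<Key> subscribers; // users
}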

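When a tweet is posted, the Tweet object is saved. A TweetIndex is saved alongside it, with tweetMaker set to the user who posted the tweet and the subscribers copied from the User object into the TweetIndex. I would then query TweetIndex on subscribers to get the messages for a particular subscriber.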
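The write and read paths described above can be sketched in memory. This is a minimal illustration under stated assumptions. Plain Java objects stand in for datastore entities, and `String` ids replace `Key`. The `tweetId` field and the `publish`/`tweetsFor` helpers are hypothetical. The `contains` check imitates the datastore rule that an equality filter on a multi-valued property matches when any element equals the value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FanOutSketch {

    /** In-memory stand-in for TweetIndex (String ids instead of Keys). */
    static class TweetIndex {
        final String tweetId;            // hypothetical link back to the Tweet
        final String tweetMaker;         // user who posted the tweet
        final List<String> subscribers;  // copied from the author at write time

        TweetIndex(String tweetId, String tweetMaker, List<String> subscribers) {
            this.tweetId = tweetId;
            this.tweetMaker = tweetMaker;
            // Copy so later changes to the author's list do not affect this index
            this.subscribers = new ArrayList<>(subscribers);
        }
    }

    /** Simulated set of saved TweetIndex entities. */
    static final List<TweetIndex> INDEXES = new ArrayList<>();

    /** Write path: save one TweetIndex holding a copy of the author's subscribers. */
    static void publish(String tweetId, String author, List<String> authorSubscribers) {
        INDEXES.add(new TweetIndex(tweetId, author, authorSubscribers));
    }

    /** Read path: mimics "WHERE subscribers == :user" on a multi-valued property. */
    static List<String> tweetsFor(String subscriber) {
        List<String> result = new ArrayList<>();
        for (TweetIndex index : INDEXES) {
            if (index.subscribers.contains(subscriber)) {
                result.add(index.tweetId);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        publish("t1", "alice", Arrays.asList("bob", "carol"));
        publish("t2", "dave", Arrays.asList("carol"));
        System.out.println(tweetsFor("carol")); // [t1, t2]
        System.out.println(tweetsFor("bob"));   // [t1]
    }
}
```

With a real datastore, the size of each `subscribers` list is capped, which is what the questions below are about.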

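  1. So, do I have this right? Where things get fuzzy for me is that I want the subscribers stored in a multi-valued property. Since a multi-valued property can only hold 5000 entries, I figure I should repeat the TweetIndex for every 5000 subscriber IDs.
  2. What controls splitting the multi-valued property into groups of 5000? Do I need to manage the repeated saves in my own code?
  3. How would I store the original subscriber list? It seems to me that the subscriber list in the User object would be subject to the same 5000 limit.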

Thanks for any answers/insights/suggestions!


1 Answer


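1) Do you have this right? When a multi-valued property is indexed, its list size is limited to around 20K entries. That applies in your case, since you will run queries against subscriber IDs. See the question "What is the maximum size/limit of a ListProperty for a Google App Engine datastore entity?". In summary, the limits you will face in this use case are:
 - indexed multi-valued property size (about 20K entries)
 - entity size (1MB), which should be fine unless you store blobs in the entity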
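The 20K figure turns the fan-out into simple arithmetic. A tweet whose author has `n` subscribers needs ⌈n / 20000⌉ shard entities. The sketch below is a small check of that ceiling division. The `MAX_INDEXED_VALUES` constant is taken from the approximate limit quoted above, not from current App Engine documentation. The 5000 case reproduces the figure from the question.

```java
public class ShardMath {

    /** Approximate per-property limit on indexed values, as quoted above. */
    static final int MAX_INDEXED_VALUES = 20_000;

    /** Number of shard entities needed for subscriberCount keys (ceiling division). */
    static int shardsNeeded(int subscriberCount, int perShard) {
        if (perShard <= 0) {
            throw new IllegalArgumentException("perShard must be positive");
        }
        if (subscriberCount <= 0) {
            return 0;
        }
        return (subscriberCount + perShard - 1) / perShard;
    }

    public static void main(String[] args) {
        System.out.println(shardsNeeded(1_000_000, MAX_INDEXED_VALUES)); // 50
        System.out.println(shardsNeeded(1_000_000, 5_000));              // 200
        System.out.println(shardsNeeded(20_001, MAX_INDEXED_VALUES));    // 2
    }
}
```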

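2) The splitting has to be handled manually, since I don't know of any persistence framework that does it. Objectify is the only persistence framework specialized enough for the GAE datastore that it might offer such a feature, but I don't know for sure; I haven't used it.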
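Because the splitting is manual, it comes down to cutting the subscriber list into fixed-size chunks and saving one index or shard entity per chunk. Below is a minimal, framework-independent sketch of the chunking step. `partition` is a hypothetical helper; Guava's `Lists.partition` does the same job if you already depend on Guava. The actual persistence call is left out.

```java
import java.util.ArrayList;
import java.util.List;

public class Chunking {

    /**
     * Splits items into consecutive chunks of at most chunkSize elements.
     * Each chunk would become the subscriber list of one shard entity.
     */
    static <T> List<List<T>> partition(List<T> items, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, items.size());
            // Copy the subList view so each chunk is independent of the source list
            chunks.add(new ArrayList<>(items.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> subscribers = new ArrayList<>();
        for (int i = 1; i <= 12; i++) {
            subscribers.add(i);
        }
        System.out.println(partition(subscribers, 5));
        // [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12]]
    }
}
```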

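3) You need a clear understanding of the constraints that drive how you model your use case on the GAE datastore. It seems to me that you are still heavily influenced by relational database modeling: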

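Since you're planning for millions of users, you're building your app for scale and performance. Those "joins" are exactly what you must avoid, which is why you aren't using an RDBMS in the first place. The key is to duplicate: denormalize so that your data matches your use cases.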
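Here is a minimal in-memory sketch of that trade-off. It uses simplified, hypothetical `User` and `Tweet` classes rather than the entities below. The author's display name is copied into the tweet at write time, so rendering a tweet reads one object instead of joining against the user. The cost is that a later rename does not reach tweets already written unless you also update those copies.

```java
public class DenormalizeSketch {

    static class User {
        final String id;
        String displayName;

        User(String id, String displayName) {
            this.id = id;
            this.displayName = displayName;
        }
    }

    /** The tweet carries its own copy of the author's display name. */
    static class Tweet {
        final String authorId;
        final String authorDisplayName; // duplicated at write time
        final String message;

        Tweet(User author, String message) {
            this.authorId = author.id;
            this.authorDisplayName = author.displayName;
            this.message = message;
        }

        /** Rendering needs no lookup of the User, so there is no "join". */
        String render() {
            return authorDisplayName + ": " + message;
        }
    }

    public static void main(String[] args) {
        User alice = new User("u1", "Alice");
        Tweet tweet = new Tweet(alice, "hello");
        System.out.println(tweet.render()); // Alice: hello

        // The duplicated copy does not follow a later rename
        alice.displayName = "Alice B.";
        System.out.println(tweet.render()); // Alice: hello
    }
}
```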

public class UserEntity {

    @Id Key id;
    String name;

    /** INDEXED : to retrieve a user by display name */
    String displayName;

    /** For the sake of the example below */
    int tweetCount;

    /** 
     * USE CASE : See a user's followers from his "profile" page.
     *
     * Easily get subscriber data from your user entity:
     * duplicate each subscriber's UserEntity data in a UserSubscriberChildEntity.
     * You just need to run an ancestor query on UserSubscriberChildEntity using the user id.
     */
    List<UserSubscriberChildEntity> subscribers;

}

/** Duplicate user data in this entity, retrieved easily with an ancestor query */
public class UserSubscriberChildEntity {
    /** The id of this entity */
    @Id Key subscriberId;
    /** Duplicate your User Entity data */
    String name;
    String displayName;
    /** The id from the UserEntity referenced */
    String userId;
}

public class TweetEntity {
    @Id Key id;

    /**
     * The actual text message
     */
    String tweetContent;

    /**
     * USE CASE : display the tweet maker name alongside the tweet content.
     *
     * Duplicate user data to prevent an expensive join when not needed.
     * You will always need to display this along with the tweet content!
     * Model your entities based on what you want to see when you display them.
     */
    String tweetMakerName;
    String tweetMakerDisplayName;
    /**
     * USE CASE 
     * 1) to retrieve tweets MADE by a given user
     * 2) In case you actually need to access the User entity
     *    (for example, if you remove this tweet and want to decrease the user tweet counter)
     *
     * INDEXED 
     */
    Key tweetMakerId;

    /**
     * USE CASE : display tweet subscribers from the "tweet page"
     * 
     * TweetSubscriberChildEntity mirrors "UserSubscriberChildEntity": retrieve data fast by duplicating it.
     */
    List<TweetSubscriberChildEntity> subscribers;
}

Now the core question: how do you retrieve "all the tweets one user has subscribed to"?

Shard your subscriptions across entities:

/**
 * USE CASE : Retrieve tweets one user subscribed to
 *
 * The same approach applies to user subscriptions.
 */
public class TweetSubscriberShardedEntity {
    /** Key of this shard; not used in queries */
    @Id Key shardKey;
    /** INDEXED : Tweet reference */
    Key tweetId;
    /** INDEXED : Users reference */
    List<Key> userKeys = new ArrayList<Key>();
    /** INDEXED : subscriber count, to find shards that are still under the 20K limit */
    int subscribersCount = 0;

    /**
     * Add a subscriber and increment subscribersCount
     */ 
    public void addSubscriber(Key userId) {
        userKeys.add(userId);
        subscribersCount++;
    }
}
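The intended behavior of the sharded subscription entity can be checked without the datastore. Below is an in-memory sketch that uses a small `capacity` in place of 20,000 so the effect is visible. `Shard`, `subscribe` and `tweetsSubscribed` are hypothetical names that mirror `addSubscriber`, `subscribeToTweet` and `getTweetsSubscribed`. A new subscriber goes into the first shard of that tweet that still has room, and a new shard is created only when every existing one is full.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ShardSketch {

    static class Shard {
        final String tweetId;
        final List<String> userKeys = new ArrayList<>();
        int subscribersCount = 0;

        Shard(String tweetId) {
            this.tweetId = tweetId;
        }

        void addSubscriber(String userId) {
            userKeys.add(userId);
            subscribersCount++;
        }
    }

    final int capacity;
    final List<Shard> shards = new ArrayList<>();

    ShardSketch(int capacity) {
        this.capacity = capacity;
    }

    /** Adds the user to the first non-full shard of this tweet, creating one if needed. */
    void subscribe(String userId, String tweetId) {
        for (Shard shard : shards) {
            if (shard.tweetId.equals(tweetId) && shard.subscribersCount < capacity) {
                shard.addSubscriber(userId);
                return;
            }
        }
        Shard fresh = new Shard(tweetId);
        fresh.addSubscriber(userId);
        shards.add(fresh);
    }

    /** Tweets whose shards list this user (mimics an equality filter on userKeys). */
    Set<String> tweetsSubscribed(String userId) {
        Set<String> tweetIds = new LinkedHashSet<>();
        for (Shard shard : shards) {
            if (shard.userKeys.contains(userId)) {
                tweetIds.add(shard.tweetId);
            }
        }
        return tweetIds;
    }

    public static void main(String[] args) {
        ShardSketch store = new ShardSketch(2);
        for (String user : new String[] {"u1", "u2", "u3", "u4", "u5"}) {
            store.subscribe(user, "t1");
        }
        store.subscribe("u3", "t2");
        System.out.println(store.shards.size());          // 4 (3 for t1, 1 for t2)
        System.out.println(store.tweetsSubscribed("u3")); // [t1, t2]
    }
}
```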

An example tweet service that ties everything together:

/**
 * Pseudo code
 */
public class TweetService {

    public List<TweetEntity> getTweetsSubscribed(Key userId) {
        List<TweetEntity> tweetsFollowed = new ArrayList<TweetEntity>();
        // Get all the subscription shards that list this user.
        // An equality filter on a multi-valued property matches if ANY element equals the value.
        List<TweetSubscriberShardedEntity> shards = datastoreService.find(
                "FROM TweetSubscriberShardedEntity WHERE userKeys contains (userId)");
        // Iterate over each subscription to retrieve the complete Tweet
        for (TweetSubscriberShardedEntity shard : shards) {
            TweetEntity tweet = datastoreService.get(TweetEntity.class, shard.getTweetId());
            tweetsFollowed.add(tweet);
        }
        return tweetsFollowed;
    }

    public void subscribeToTweet(Key subscriberId, Key tweetId) {
        TweetSubscriberShardedEntity shardToUse = null;
        // Only get the first shard OF THIS TWEET that is still under 20000 subscribers
        // (:tweetId is bound to the tweetId argument).
        // Not transactional: concurrent calls could overfill a shard.
        TweetSubscriberShardedEntity shardNotFull = datastoreService.find(
                "FROM TweetSubscriberShardedEntity"
                + " WHERE tweetId == :tweetId"
                + " AND subscribersCount < 20000"
                + " LIMIT 1");
        if (shardNotFull == null) {
            // If no shard with room exists, create one for this tweet
            shardToUse = new TweetSubscriberShardedEntity();
            shardToUse.setTweetId(tweetId);
        }
        else {
            shardToUse = shardNotFull;
        }
        // Link user and tweet; addSubscriber also increments subscribersCount
        shardToUse.addSubscriber(subscriberId);
        // Save shard
        datastoreService.put(shardToUse);
    }

    /**
     * Hard to put in a transaction with so many entities updated!
     * See the cross-group (XG) transaction docs for more info.
     */
    public void createTweet(UserEntity creator, TweetEntity newTweet) {

        creator.tweetCount++;
        newTweet.tweetMakerName = creator.name;
        newTweet.tweetMakerDisplayName = creator.displayName;
        newTweet.tweetMakerId = creator.id;

        // Duplicate the user's subscribers into the tweet
        for (UserSubscriberChildEntity userSubscriber : creator.subscribers) {
            // Create a Tweet child entity
            TweetSubscriberChildEntity tweetSubscriber = new TweetSubscriberChildEntity();
            tweetSubscriber.name = userSubscriber.name;
            // ... (duplicate all data)
            newTweet.subscribers.add(tweetSubscriber);
        }
        // Update the user (tweet count)
        datastoreService.put(creator);
        // Create the new tweet and child entities (duplicated subscribers data).
        // Saving it first ensures newTweet.id is assigned before the shards reference it.
        datastoreService.put(newTweet);

        // Create the subscription shards with the previous method
        for (UserSubscriberChildEntity userSubscriber : creator.subscribers) {
            // Assumes userId was produced by KeyFactory.keyToString(...)
            subscribeToTweet(KeyFactory.stringToKey(userSubscriber.userId), newTweet.id);
        }
    }

}
answered 2014-03-11T18:16:02.973