In the course of the last weeks, Android's NSD implementation is making me go mad:
From a user's point of view, the following problems arise:
Devices discover each other in a completely non-deterministic fashion. If I launch my
NsdManager
-based app, it works more or less if only two devices are involved. If a third device joins, it will seldomly discover the first two, and the first two will not see the third. If I exit the apps (they gracefully unregister the NSD listeners) and restart them in a different order, the discovery pattern is not exactly the same, but similar.In my home network, the IP resolution of the discovered devices basically works as expected. At work, sometimes even when only using two devices (A and B), device A will resolve device B's service with A's IP address and B's port and vice-versa. So somehow the IP address and service name seem to get mixed up at a lower level (probably the NsdManager).
I've now filed a bug report on Google code for this (https://code.google.com/p/android/issues/detail?id=201314&thanks=201314&ts=1455814995). I'm posting this here too in hope to get more feedback; maybe I got something in my Nsd helper class wrong.
First of all, after endless debugging, I have now found hints in the logcat that Android's underlying NsdService
itself might be malfunctioning, while MDnsDS seems to work correctly. But I'm unsure...
Here's some log output that illustrates the problem (some messages filtered for readability):
02-18 16:57:02.327: D/NsdService(628): startMDnsDaemon
02-18 16:57:02.327: D/MDnsDS(187): Starting MDNSD
02-18 16:57:02.529: D/NsdService(628): New client listening to asynchronous messages
02-18 16:57:02.529: D/NsdService(628): New client, channel: com.android.internal.util.AsyncChannel@1fa188ce messenger: android.os.Messenger@cca33ef
02-18 16:57:02.532: D/NsdService(628): Register service
02-18 16:57:02.532: D/NsdService(628): registerService: 106 name: TuSync-0.57392, type: _tusync._tcp., host: /::, port: 57392
02-18 16:57:02.533: D/MDnsDS(187): serviceRegister(106, (null), TuSync-0.57392, _tusync._tcp., (null), (null), 57392, 0, <binary>)
02-18 16:57:02.533: D/MDnsDS(187): serviceRegister successful
02-18 16:57:02.534: D/NsdService(628): Register 1 106
02-18 16:57:04.083: D/MDnsDS(187): register succeeded for 106 as TuSync-0.57392
02-18 16:57:04.087: D/NsdService(628): SERVICE_REGISTERED Raw: 606 106 "TuSync-0.57392"
02-18 16:57:04.109: D/NsdService(628): Discover services
02-18 16:57:04.109: D/NsdService(628): discoverServices: 107 _tusync._tcp.
02-18 16:57:04.110: D/MDnsDS(187): discover((null), _tusync._tcp., (null), 107, 0)
02-18 16:57:04.110: D/MDnsDS(187): discover successful
02-18 16:57:04.110: D/NsdService(628): Discover 2 107_tusync._tcp.
02-18 16:57:04.333: D/MDnsDS(187): Discover found new serviceName TuSync-0.57392, regType _tusync._tcp. and domain local. for 107
02-18 16:57:04.334: D/NsdService(628): SERVICE_FOUND Raw: 603 107 "TuSync-0.57392" _tusync._tcp. local.
02-18 16:57:04.338: D/NsdService(628): Resolve service
02-18 16:57:04.338: D/NsdService(628): resolveService: 108 name: TuSync-0.57392, type: _tusync._tcp., host: null, port: 0
02-18 16:57:04.339: D/MDnsDS(187): resolveService(108, (null), TuSync-0.57392, _tusync._tcp., local.)
02-18 16:57:04.345: D/MDnsDS(187): startMonitoring 108
02-18 16:57:04.345: D/MDnsDS(187): resolveService successful
02-18 16:57:04.346: D/MDnsDS(187): resolve succeeded for 108 finding TuSync-0\.57392._tusync._tcp.local. at Android-3.local.:57392 with txtLen 1
02-18 16:57:04.347: D/NsdService(628): SERVICE_RESOLVED Raw: 608 108 "TuSync-0\\.57392._tusync._tcp.local." "Android-3.local." 57392 1
02-18 16:57:04.347: D/NsdService(628): stopResolveService: 108
02-18 16:57:04.347: D/MDnsDS(187): Stopping resolve with ref 0xb5c4734c
02-18 16:57:04.349: D/NsdService(628): getAdddrInfo: 109
02-18 16:57:04.349: D/MDnsDS(187): getAddrInfo(109, (null) 0, Android-3.local.)
02-18 16:57:04.350: D/MDnsDS(187): getAddrInfo successful
02-18 16:57:04.352: D/MDnsDS(187): getAddrInfo succeeded for 109: 109 "Android-3.local." 120 10.0.0.4
02-18 16:57:04.352: D/MDnsDS(187): getAddrInfo succeeded for 109: 109 "Android-3.local." 120 fe80::204:4bff:fe2c:6c87
02-18 16:57:04.354: D/NsdService(628): SERVICE_GET_ADDR_SUCCESS Raw: 612 109 "Android-3.local." 120 10.0.0.4
02-18 16:57:04.354: D/NsdService(628): stopGetAdddrInfo: 109
02-18 16:57:04.355: D/MDnsDS(187): Stopping getaddrinfo with ref 0xb5c472d4
02-18 16:57:04.364: E/NsdService(628): Unique id with no client mapping: 109
02-18 16:57:04.364: E/NsdService(628): Unhandled { when=-10ms what=393242 obj=com.android.server.NsdService$NativeEvent@86af300 target=com.android.internal.util.StateMachine$SmHandler }
02-18 16:57:04.627: D/MDnsDS(187): Discover found new serviceName TuSync-0.36230, regType _tusync._tcp. and domain local. for 107
02-18 16:57:04.632: D/MDnsDS(187): Discover found new serviceName TuSync-0.60493, regType _tusync._tcp. and domain local. for 107
02-18 16:57:04.633: D/NsdService(628): SERVICE_FOUND Raw: 603 107 "TuSync-0.36230" _tusync._tcp. local.
02-18 16:57:04.634: D/NsdService(628): SERVICE_FOUND Raw: 603 107 "TuSync-0.60493" _tusync._tcp. local.
02-18 16:57:04.635: D/NsdService(628): Resolve service
02-18 16:57:04.635: D/NsdService(628): resolveService: 110 name: TuSync-0.36230, type: _tusync._tcp., host: null, port: 0
02-18 16:57:04.636: D/MDnsDS(187): resolveService(110, (null), TuSync-0.36230, _tusync._tcp., local.)
02-18 16:57:04.637: D/MDnsDS(187): resolve succeeded for 110 finding TuSync-0\.36230._tusync._tcp.local. at Android.local.:36230 with txtLen 1
02-18 16:57:04.638: D/NsdService(628): Resolve service
02-18 16:57:04.638: D/NsdService(628): SERVICE_RESOLVED Raw: 608 110 "TuSync-0\\.36230._tusync._tcp.local." "Android.local." 36230 1
02-18 16:57:04.639: D/NsdService(628): stopResolveService: 110
02-18 16:57:04.639: D/MDnsDS(187): Stopping resolve with ref 0xb5c473c4
02-18 16:57:04.643: D/MDnsDS(187): getAddrInfo succeeded for 111: 111 "Android.local." 120 10.0.0.5
02-18 16:57:04.643: D/MDnsDS(187): getAddrInfo succeeded for 111: 111 "Android.local." 120 fe80::204:4bff:fe26:8483
02-18 16:57:04.644: D/NsdService(628): SERVICE_GET_ADDR_SUCCESS Raw: 612 111 "Android.local." 120 10.0.0.5
02-18 16:57:04.644: D/NsdService(628): stopGetAdddrInfo: 111
02-18 16:57:04.645: D/MDnsDS(187): Stopping getaddrinfo with ref 0xb5c47364
02-18 16:57:04.645: D/MDnsDS(187): Going to poll with pollCount 3
02-18 16:57:04.658: E/NsdService(628): Unique id with no client mapping: 111
02-18 16:57:04.658: E/NsdService(628): Unhandled { when=-14ms what=393242 obj=com.android.server.NsdService$NativeEvent@1d93a739 target=com.android.internal.util.StateMachine$SmHandler }
Some notes on the context:
- My NSD service type is _tusync._tcp.
- I create unique service names for all nodes in the format TuSync-0.[local port number] to prevent naming conflicts and ease debugging.
- In this test scenario, there are three devices. The logging device's IP is 10.0.0.4, port 57392.
The log shows, that the underlying MDnsDS
daemon correctly discovers and resolves all nodes. However, the NsdService
above does not propagate the resolution for all of them. There seems to be an ID conflict at 16:57:04.627, where both of the device's peers (TuSync-0.36230 and TuSync-0.60493) get assigned an internal ID of 107 (if I interpret the mechanisms correctly just by looking at the logs). The discoveryListener
I registered with the NsdManager
is notified upon the discovery of both nodes, however, resolving only works for one of them, the other triggers an error:
02-18 16:57:04.638: E/NsdHelper(6370): Resolve failed with error code:
3. Service: name: TuSync-0.60493, type: _tusync._tcp., host: null, port: 0
I have also experienced additional cases where, after the NsdService
emits a "SERVICE_FOUND Raw" message in the logs, my discovery listener is not notified. An exemplary log (heavily filtered; same test setup as above):
02-18 17:54:06.692: D/MDnsDS(187): Starting MDNSD
02-18 17:54:06.896: D/NsdService(628): registerService: 112 name: TuSync-0.57392, type: _tusync._tcp., host: /::, port: 57392
02-18 17:54:06.896: D/MDnsDS(187): serviceRegister(112, (null), TuSync-0.57392, _tusync._tcp., (null), (null), 57392, 0, <binary>)
02-18 17:54:06.896: D/MDnsDS(187): serviceRegister successful
02-18 17:54:08.802: D/NsdService(628): SERVICE_REGISTERED Raw: 606 112 "TuSync-0.57392"
02-18 17:54:08.820: D/NsdService(628): Discover services
02-18 17:54:09.050: D/MDnsDS(187): Discover found new serviceName TuSync-0.57392, regType _tusync._tcp. and domain local. for 113
02-18 17:54:09.050: D/NsdService(628): SERVICE_FOUND Raw: 603 113 "TuSync-0.57392" _tusync._tcp. local.
02-18 17:54:09.211: D/MDnsDS(187): Discover found new serviceName TuSync-0.60493, regType _tusync._tcp. and domain local. for 113
02-18 17:54:09.212: D/NsdService(628): SERVICE_FOUND Raw: 603 113 "TuSync-0.60493" _tusync._tcp. local.
02-18 17:54:09.215: D/NsdService(628): resolveService: 116 name: TuSync-0.60493, type: _tusync._tcp., host: null, port: 0
02-18 17:54:09.216: D/MDnsDS(187): resolveService(116, (null), TuSync-0.60493, _tusync._tcp., local.)
02-18 17:54:09.217: D/MDnsDS(187): resolve succeeded for 116 finding TuSync-0\.60493._tusync._tcp.local. at Android-2.local.:60493 with txtLen 1
02-18 17:54:09.219: D/NsdService(628): SERVICE_RESOLVED Raw: 608 116 "TuSync-0\\.60493._tusync._tcp.local." "Android-2.local." 60493 1
02-18 17:54:09.228: D/MDnsDS(187): getAddrInfo succeeded for 117: 117 "Android-2.local." 120 10.0.0.6
02-18 17:54:09.228: D/MDnsDS(187): getAddrInfo succeeded for 117: 117 "Android-2.local." 120 fe80::c643:8fff:fec5:5648
02-18 17:54:09.229: D/NsdService(628): SERVICE_GET_ADDR_SUCCESS Raw: 612 117 "Android-2.local." 120 10.0.0.6
02-18 17:54:09.244: D/MDnsDS(187): Discover found new serviceName TuSync-0.36230, regType _tusync._tcp. and domain local. for 113
02-18 17:54:09.251: E/NsdService(628): Unique id with no client mapping: 117
02-18 17:54:09.251: E/NsdService(628): Unhandled { when=-22ms what=393242 obj=com.android.server.NsdService$NativeEvent@1e992653 target=com.android.internal.util.StateMachine$SmHandler }
02-18 17:54:09.255: D/NsdService(628): SERVICE_FOUND Raw: 603 113 "TuSync-0.36230" _tusync._tcp. local.
In this case, the discovered peer 10.0.0.5 (port 36230) triggers no discoveryListener notification. After the last log message, nothing happens. So my logging node 10.0.0.4 only discovered one other peer, 10.0.0.6:60493.
The low amount of similar bug reports makes me wonder if I'm the only one with these problems or if the NsdManager is completely unstable and no-one uses it anyway?
For reference, here is the code of my helper class - it is similar to the Android NSD chat tutorial, but I tried to improve it because of some other bugs the tutorial seems to provoke.
public final class NsdHelper {
public static final String TAG = "NsdHelper";
private final Context mContext;
private final NsdManager mNsdManager;
private final String mBaseServiceName; // Base component of the service name, e.g. "service_xy"
private String mServiceName; // Service name of the local node, may be updated upon peer detection with service name conflicts, e.g. to "service_xy (2)"
private final String mServiceType;
private final NsdHandler mNsdHandler;
private MyRegistrationListener mRegistrationListener;
private final Object mRegistrationLock = new Object();
private MyDiscoveryListener mDiscoveryListener;
private final Object mDiscoveryLock = new Object();
private final Object mResolveLock = new Object();
private final Semaphore mResolveSemaphore;
public NsdHelper(Context context, String baseServiceName, String serviceName, String serviceType, NsdHandler nsdHandler) {
mContext = context;
mNsdManager = (NsdManager) context.getSystemService(Context.NSD_SERVICE);
mNsdHandler = nsdHandler;
mBaseServiceName = baseServiceName;
mServiceName = serviceName;
mServiceType = serviceType;
mResolveSemaphore = new Semaphore(10, true);
}
/*********************
* Lifecycle methods *
*********************/
public void registerLocalService(final int port) {
NsdServiceInfo localServiceInfo = new NsdServiceInfo();
localServiceInfo.setServiceName(mServiceName);
localServiceInfo.setServiceType(mServiceType);
localServiceInfo.setPort(port);
synchronized (mRegistrationLock) {
if (mRegistrationListener == null) {
mRegistrationListener = new MyRegistrationListener();
// try {
mNsdManager.registerService(
localServiceInfo, NsdManager.PROTOCOL_DNS_SD, mRegistrationListener);
/*} catch (Exception e) {
MLog.e(TAG, "Exception registering service; trying to unregister.", e);
unregisterLocalService();
mNsdHandler.onRegistrationFailed(localServiceInfo, 0);
}*/
} else {
MLog.w(TAG, "registerLocalService called while service registration already in progress or service already registered.");
}
}
}
public void unregisterLocalService() {
synchronized (mRegistrationLock) {
if (mRegistrationListener != null) {
// try {
mNsdManager.unregisterService(mRegistrationListener);
/*} catch (IllegalArgumentException e) {
MLog.w(TAG, "Exception trying to unregister registrationListener.");
}*/
mRegistrationListener = null;
} else {
MLog.w(TAG, "unregisterLocalService called while service not yet registered or already unregistered.");
}
}
}
public void startDiscovery() {
synchronized(mDiscoveryLock) {
if(mDiscoveryListener == null) {
mDiscoveryListener = new MyDiscoveryListener();
mNsdManager.discoverServices(
mServiceType, NsdManager.PROTOCOL_DNS_SD, mDiscoveryListener);
} else {
MLog.w(TAG, "StartDiscovery called while discovery is already in progress.");
}
}
}
public void stopDiscovery() {
synchronized (mDiscoveryLock) {
if (mDiscoveryListener != null) {
mNsdManager.stopServiceDiscovery(mDiscoveryListener);
mDiscoveryListener = null;
} else {
MLog.w(TAG, "StopDiscovery called while no discovery is in progress.");
}
}
}
public void tearDown() {
MLog.v(TAG, "NsdHelper: tearDown()");
stopDiscovery();
unregisterLocalService(); // TODO this causes an exception, when the listener is already unregistered
}
/**
* Returns the current service name of the service.
* @return
*/
public String getServiceName() {
return mServiceName;
}
/**
* Convenience method to initiate service resolution
* @param serviceInfo NsdServiceInfo object for the service to be resolved
*/
private void resolveService(NsdServiceInfo serviceInfo) {
try {
MLog.vv(TAG, "Resolving service: acquiring semaphore.");
mResolveSemaphore.acquire();
MLog.vv(TAG, "Resolving service: semaphore acquired.");
} catch (InterruptedException e) {
MLog.w(TAG, "resolveService: Waiting for acquisition of semaphore interrupted.");
}
mNsdManager.resolveService(serviceInfo, new MyResolveListener(serviceInfo.getServiceName()));
}
/*************
* Listeners *
*************/
private class MyDiscoveryListener implements NsdManager.DiscoveryListener {
@Override
public void onDiscoveryStarted(String regType) {
MLog.d(TAG, "Service discovery started");
mNsdHandler.onDiscoveryStarted();
}
@Override
public void onServiceFound(NsdServiceInfo serviceInfo) {
MLog.d(TAG, "Discovered service: " + serviceInfo);
// Protocol matches?
if (!serviceInfo.getServiceType().equals(mServiceType)) {
MLog.v(TAG, "Discovered: other serviceType: " + serviceInfo.getServiceType());
}
// Make sure, that service name matches, and just resolve remote host
else if (serviceInfo.getServiceName().contains(mBaseServiceName)){
MLog.d(TAG, "Discovered: correct serviceType: " + mBaseServiceName);
resolveService(serviceInfo);
}
else {
// Other service name, log anyway
MLog.d(TAG, "Discovered: service with different serviceName: " + serviceInfo.getServiceName() + ". Ignoring.");
}
}
@Override
public void onServiceLost(NsdServiceInfo service) {
MLog.e(TAG, "Service lost: " + service);
mNsdHandler.onRemotePeerLost(service);
}
@Override
public void onDiscoveryStopped(String serviceType) {
MLog.v(TAG, "Discovery stopped: " + serviceType);
mNsdHandler.onDiscoveryStopped();
}
@Override
public void onStartDiscoveryFailed(String serviceType, int errorCode) {
MLog.e(TAG, "Discovery starting failed. Error code: " + errorCode);
synchronized (mDiscoveryLock) {
mDiscoveryListener = null; // just throw away the discovery listener, explicit stopping of the discovery should not be needed according to
// https://code.google.com/p/android/issues/detail?id=99510&q=nsd&colspec=ID%20Type%20Status%20Owner%20Summary%20Stars
}
}
@Override
public void onStopDiscoveryFailed(String serviceType, int errorCode) {
MLog.e(TAG, "Discovery stopping failed. Error code: " + errorCode);
// try again
// mNsdManager.stopServiceDiscovery(this); // This should not be needed according to https://code.google.com/p/android/issues/detail?id=99510&q=nsd&colspec=ID%20Type%20Status%20Owner%20Summary%20Stars
}
};
private class MyRegistrationListener implements NsdManager.RegistrationListener {
@Override
public void onServiceRegistered(NsdServiceInfo nsdServiceInfo) {
MLog.d(TAG, "Service registered. NsdServiceInfo: " + nsdServiceInfo);
boolean nameChanged = false;
// Update service name of this node (might change due to automatic conflict resolution!)
if(!mServiceName.equals(nsdServiceInfo.getServiceName())){
mServiceName = nsdServiceInfo.getServiceName();
nameChanged = true;
MLog.d(TAG, "Local service name updated to: " + mServiceName);
}
// Notify
if (mNsdHandler != null) {
mNsdHandler.onRegistrationSuccess(nsdServiceInfo);
if (nameChanged) {
mNsdHandler.onLocalServiceNameChanged(mServiceName);
}
} else {
MLog.w(TAG, "onServiceRegistered: NsdHandler is null.");
}
}
@Override
public void onRegistrationFailed(NsdServiceInfo arg0, int arg1) {
MLog.w(TAG, "Service registration failed with error code " + arg1 + ".");
if (mNsdHandler == null) {
MLog.w(TAG, "onRegistrationFailed: NsdHandler is null.");
return;
}
mNsdHandler.onRegistrationFailed(arg0, arg1);
}
@Override
public void onServiceUnregistered(NsdServiceInfo arg0) {
MLog.d(TAG, "Service unregistered.");
if (mNsdHandler == null) {
MLog.w(TAG, "onServiceUnRegistered: NsdHandler is null.");
return;
}
}
@Override
public void onUnregistrationFailed(NsdServiceInfo serviceInfo, int errorCode) {
MLog.w(TAG, "Service unregistering failed.");
if (mNsdHandler == null) {
MLog.w(TAG, "onUnRegistrationFailed: NsdHandler is null.");
return;
}
}
};
private class MyResolveListener implements NsdManager.ResolveListener {
private final String mServiceName;
public MyResolveListener(String serviceName) {
mServiceName = serviceName;
}
@Override
public void onResolveFailed(final NsdServiceInfo serviceInfo, int errorCode) {
// Release resource
mResolveSemaphore.release();
MLog.e(TAG, "Resolve failed with error code: " + errorCode + ". Service: " + serviceInfo);
if((serviceInfo.getServiceName() != null) && (!serviceInfo.getServiceName().equals(mServiceName))) {
MLog.e(TAG, "Service name changed: " + mServiceName + " => " + serviceInfo.getServiceName());
}
}
@Override
public void onServiceResolved(final NsdServiceInfo serviceInfo) {
// Release resource
mResolveSemaphore.release();
MLog.v(TAG, "Resolve succeeded. Service: " + serviceInfo + ", Address: " + serviceInfo.getHost().getHostAddress() + ":" + serviceInfo.getPort());
if((serviceInfo.getServiceName() != null) && (!serviceInfo.getServiceName().equals(mServiceName))) {
MLog.w(TAG, "Service name changed: " + mServiceName + " => " + serviceInfo.getServiceName());
}
mNsdHandler.onNewRemotePeerResolved(serviceInfo);
}
};
/**
* Interface for handlers that deal just with essential NSD events.
* @author Alexander Fischl (alexander.fischl@semeion.net)
*/
public interface NsdHandler {
/**
* Called, when the NSD manager registered the service successfully.
* @param nsdServiceInfo
*/
public void onRegistrationSuccess(final NsdServiceInfo nsdServiceInfo);
/**
* Called, when the NSD registration was unsuccessful.
*/
public void onRegistrationFailed(final NsdServiceInfo nsdServiceInfo, final int errorCode);
/**
* Called, when the NSD manager discovers a new peer. Services registered on the
* local machine DO NOT trigger this call!
* @param nsdServiceInfo
*/
public void onNewRemotePeerDiscovered(final NsdServiceInfo nsdServiceInfo);
/**
* Called, when the NSD manager resolves a new peer, yielding the connection data.
* Services registered on the local machine DO NOT trigger this call!
* @param nsdServiceInfo
*/
public void onNewRemotePeerResolved(final NsdServiceInfo nsdServiceInfo);
/**
* Called, when the NSD manager loses an already discovered peer.
* @param nsdServiceInfo
*/
public void onRemotePeerLost(final NsdServiceInfo nsdServiceInfo);
/**
* Called, when the local service name needs to be updated (e.g. due to
* conflict resolution when the local service is registered, and the chosen service
* name is already taken by another node in the network.)
* @param newLocalServiceName
*/
public void onLocalServiceNameChanged(String newLocalServiceName);
/**
* Called, when the service discovery has successfully started.
*/
public void onDiscoveryStarted();
/**
* Called, when the service discovery was halted.
*/
public void onDiscoveryStopped();
}
}
Note, that I even implemented a semaphore that can be set to 1 to prevent resolving multiple services in parallel, as somebody else reported problems with parallel resolution. However, setting it to 1 does not work, as sometimes the ongoing resolution does neither succeed nor fail; which causes the semaphore not to be released and the NsdManager thread to get permanently stuck at the next resolve request.
Is anyone else experiencing such problems? I'd be glad if also people who are successfully employing the NsdManager would comment - that would at least mean I'm facing a problem I can fix :)
I already considered giving up on NSD and implementing my own broadcast/multicast discovery mechanism. This could theoretically be a breeze, but I've read that multicast on Android is a PITA too, because of some devices preventing it...