When my organization went through the Office 365 Wave 15 upgrade process, we worked closely with our Cloud Vantage Service Delivery Manager. We met on a regular basis, and our SDM fed us a constant stream of updates regarding the pending upgrade. These updates included information Microsoft had learned from previous tenant upgrades, as well as known issues we needed to prepare for.
One of the big things that we discussed at great length was the need to have “autodiscover” DNS records in place. This was very important because mailboxes would be moved to different mail servers in the Office 365 cloud as they migrated to Wave 15. If the “autodiscover” records weren’t in place, the Outlook clients wouldn’t be able to find the new mailbox servers for our users.
We made sure our “autodiscover” records were in place.
During and after the migration, we started having connectivity issues. For some users, Outlook would launch and then show “Connecting…” in the status bar. For some, waiting a few minutes seemed to allow Outlook to get its wits about it and start working. For others, we had to do a mailbox repair. Some even had to blow away their Outlook profile and create it fresh.
We opened a SevA support ticket with Microsoft through our Cloud Vantage contract and started working with an engineer to determine what might be going on. The engineer was very responsive and worked with us to gather additional information. At first we were told that there must be a problem with one of the cloud client access servers and that a priority ticket had been opened with the operations group at Microsoft. After a day or two of waiting, we heard that the client access servers were not the problem after all and that the support group needed additional logging information. The engineer gave me a document on how to enable Outlook logging and how to enable tracing at the OS level. Unfortunately, the OS-level tracing would only work on Windows 7 or later, and we are still mostly a Windows XP shop. The day we received this information was also the day our upgrade to Wave 15 completed. It happened to be a Friday, and by the end of the day, reports of connectivity issues had dropped to almost zero.
We kept the support ticket open with Microsoft over the following weekend, assuming that we would have additional connectivity calls and that we could probably gather additional information for Microsoft. We did lower the ticket to a SevB. No problems were reported over the weekend.
During the weekend, I did some searching on my own. Over on the Office 365 community site, I found a post that talked about the need to change some DNS records to resolve connectivity issues that the poster’s company was experiencing. Curious, I decided to go take a look at our Office 365 management portal to compare the DNS settings listed there to what I had in production. I found something very interesting. The MX record was different, and a new CNAME was listed. Examples are below.
Old MX: domain-com.mail.eo.outlook.com
New MX: domain-com.mail.protection.outlook.com
New CNAME: msoid.domain.com -> clientconfig.microsoftonline-p.net
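The comparison I did by eye could also be scripted. What follows is only a minimal sketch in Python, assuming the portal values and the production values have already been copied into two dictionaries by hand. The function names `normalize` and `diff_dns_records` are hypothetical, and `domain.com` is the same placeholder domain used in the examples. Nothing here queries DNS or the Office 365 portal.

```python
def normalize(name: str) -> str:
    """Lowercase a DNS name and drop any trailing dot so equal names compare equal."""
    return name.strip().lower().rstrip(".")


def diff_dns_records(expected, actual):
    """Return human-readable discrepancies between two record sets.

    Both arguments map (record_type, name) -> target, for example
    ("CNAME", "msoid.domain.com") -> "clientconfig.microsoftonline-p.net".
    Records present only in `actual` are ignored on purpose.
    """
    exp = {(t.upper(), normalize(n)): normalize(v) for (t, n), v in expected.items()}
    act = {(t.upper(), normalize(n)): normalize(v) for (t, n), v in actual.items()}
    problems = []
    for key in sorted(exp):
        rtype, name = key
        if key not in act:
            problems.append(f"missing {rtype} {name} -> {exp[key]}")
        elif act[key] != exp[key]:
            problems.append(
                f"mismatch {rtype} {name}: have {act[key]}, portal lists {exp[key]}"
            )
    return problems


# Hand-copied values; "domain.com" is a placeholder, not a real tenant.
portal = {
    ("MX", "domain.com"): "domain-com.mail.protection.outlook.com",
    ("CNAME", "msoid.domain.com"): "clientconfig.microsoftonline-p.net",
}
production = {
    ("MX", "domain.com"): "domain-com.mail.eo.outlook.com.",
}

for line in diff_dns_records(portal, production):
    print(line)
```

With these sample values, the sketch reports the missing CNAME and the MX mismatch. Names are normalized first because DNS names are often written both with and without a trailing dot.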
So, on Sunday evening, I composed a quick email with this information and sent it out to our Service Delivery Manager and the support engineer. I wanted to see if we had actually missed something during our preparation for Wave 15.
Monday morning, I was told that these new DNS records didn’t really matter and wouldn’t have caused the issues we were seeing during the migration. It was recommended that I go ahead and put these records in place, just to be sure we matched what the Office 365 management portal was showing. I am preparing the necessary change control documentation on our end to make these changes. Incidentally, we haven’t had any real connectivity issues since the upgrade ended.
This, however, is not the end of the story. Today, when I logged into the Office 365 management portal, I noticed that the Service Health Dashboard was showing an issue with Exchange Online. I hadn’t had any reports of problems from our user community, so I clicked into the details to find out what might be going on. I found that incident number EX3571 was in progress. Below is the status update I found most interesting. It is dated September 30, 2013 2:32 PM.
Microsoft has received reports of an issue in which some customers served from the Americas are receiving Non-Delivery Reports when sending email. Investigation has determined that a configuration issue with the mail exchanger record (MX record) is causing NDRs. Engineers are continuing to roll back to the last known healthy state to mitigate impact. The known solution at this time is to update the MX record in the Office 365 user interface to: [domain name].mail.protection.outlook.com You can verify your exact MX record via the Office 365 portal. The next update will be provided by September 30th, 2013 at 9:30 PM UTC, or upon service restoration.
Notice the MX record that they recommend switching to. Apparently this change might have been a little more important than was originally conveyed. My organization did not experience any delivery issues today, but this does give me a bit more incentive to get our records up to date.
Going forward, I am going to review the DNS records in the Office 365 management portal on a fairly regular basis. Who knows when something will need to be changed or when new records might be required?
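For that kind of recurring review, a small check could flag an MX target that is still on the old host name pattern. This is a rough sketch in Python, not anything Microsoft provides. The two suffixes come from the MX records shown earlier in this post. The dot-to-hyphen rule in `expected_mx_host` is my own inference from the `domain-com` example, so the portal remains the authority on the exact value. Both function names are hypothetical.

```python
# Suffixes taken from the old and new MX examples in this post.
LEGACY_SUFFIX = ".mail.eo.outlook.com"
CURRENT_SUFFIX = ".mail.protection.outlook.com"


def expected_mx_host(domain: str) -> str:
    """Guess the new-style MX target for a domain.

    Assumption: dots in the domain become hyphens, as in the
    domain.com -> domain-com example. Verify against the portal.
    """
    label = domain.strip().lower().rstrip(".").replace(".", "-")
    return label + CURRENT_SUFFIX


def classify_mx(target: str) -> str:
    """Label an MX target as 'legacy', 'current', or 'unknown' by its suffix."""
    host = target.strip().lower().rstrip(".")
    if host.endswith(LEGACY_SUFFIX):
        return "legacy"
    if host.endswith(CURRENT_SUFFIX):
        return "current"
    return "unknown"


# Placeholder values for illustration only.
print(classify_mx("domain-com.mail.eo.outlook.com."))
print(expected_mx_host("domain.com"))
```

The MX value in production still has to be looked up separately, for example with whatever DNS tool is already in use, and then passed to `classify_mx`.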
I believe my next article will be about the upgrade of Wave 14 SharePoint sites to Wave 15. I ran the upgrade on my own “MySite” this evening, and the changes are drastic. I didn’t lose any data, but it took a while to find my old “Personal Documents” document library.