[m365] 2024-04-22 ~8 min read

What a Tenant Migration Actually Looks Like

Nobody posts about the migrations that go smoothly. So here's one that didn't.

I was consolidating two Microsoft 365 tenants after a merger. A few hundred users on each side. The goal: get everyone into a single tenant, preserve their data, and don't lose the CFO's email rules. The user data had been staged over several weeks — mailboxes, SharePoint, OneDrive migrated in batches. The Friday night cutover was the final piece: Active Directory and Azure AD. Move the identity backbone, redirect authentication, validate that everyone can sign in on Monday morning. The migration platform was supposed to handle the orchestration. I'd run test migrations. Everything checked out.

The cutover started at 8pm on a Friday. By 11pm, the platform was dead.

What "dead" actually means

Not "slow." Not "degraded." The migration console simply stopped accepting commands. The vendor confirmed an outage on their side — no ETA. A support ticket was filed somewhere. And then it was just me, my terminal, and a decision: wait for the vendor to come back online, or run the manual contingency I'd built the week before.

I ran the manual plan.

There was one other person on the call with me: the IT lead from the company we'd merged with. We'd built a solid working relationship over the course of the integration, and he'd volunteered to ride out the cutover night with me even though his part was technically done. He'd built that company's entire infrastructure from the ground up, which made him invaluable when things got weird. He didn't know my scripts inside and out (I'd designed the migration tooling), but he knew his environment cold.

The AD cutover is tedious: sync users, verify attributes, test sign-in, fix mismatches, repeat. By 1am my brain had stopped working, and I hit a wall on a frustrating blocker. Old disabled user accounts from years ago, still sitting in the directory with deprecated attributes that nobody had cleaned up, were silently blocking a batch of user cutovers. I was going in circles trying to figure out what was tripping my scripts. He couldn't tell me what was causing the error, but when I described the accounts involved, he told me what they were: leftovers from before they'd standardised their attribute schema.

The fix wasn't to delete them — you couldn't. Trying to remove them just produced meaningless errors from Intune, the kind of opaque failure Microsoft platforms specialise in. The fix was to temporarily assign them licenses, which forced M365 to reprocess and reset their account state, clearing the deprecated attributes. Microsoft has a way of accumulating legacy quirks across different systems over the years, and sometimes the only way to unstick them is to make the platform re-evaluate an object by giving it something it wasn't expecting. We licensed the accounts, waited ten minutes for the state reset to propagate, and the cutover batch went through clean.
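
In hindsight, accounts like these can be flagged before cutover night by scanning a directory export for disabled users that still carry attributes from outside the current schema. What follows is a rough sketch of that idea in Python, not the PowerShell from that night. The export shape (`upn`, `account_enabled`, `attributes`) and the deprecated attribute names are made up for illustration.

```python
# Hypothetical names of attributes that predate the standardised schema.
DEPRECATED_ATTRIBUTES = {"legacyDept", "oldCostCentre"}


def find_stale_blockers(users, deprecated=DEPRECATED_ATTRIBUTES):
    """Return (upn, leftover_attributes) pairs for disabled accounts
    that still carry deprecated attributes.

    `users` is a list of dicts from a directory export; the keys used here
    (`upn`, `account_enabled`, `attributes`) are assumed for illustration.
    """
    blockers = []
    for user in users:
        if user["account_enabled"]:
            continue  # only disabled leftovers are of interest here
        leftover = deprecated & set(user.get("attributes", {}))
        if leftover:
            blockers.append((user["upn"], sorted(leftover)))
    return blockers


# Tiny invented export: one active user, two disabled ones.
export = [
    {"upn": "active@example.com", "account_enabled": True,
     "attributes": {"legacyDept": "ops"}},
    {"upn": "old1@example.com", "account_enabled": False,
     "attributes": {"legacyDept": "ops"}},
    {"upn": "old2@example.com", "account_enabled": False,
     "attributes": {"department": "ops"}},
]
print(find_stale_blockers(export))
```

This only finds candidates. Whether a flagged account is safe to touch is still a question for whoever knows the environment.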

Why the AD cutover is the part that matters

The user data migration had been going fine for weeks. Staged batches. Mailboxes moved. SharePoint sites transferred. OneDrive data synced. All the stuff that shows up on project plans as "Phase 1 through Phase 4 — Complete." The users noticed nothing because the data was already there when they looked.

But none of it matters if they can't sign in. AD/AAD is the keystone: get identity wrong, and every migrated mailbox, every transferred SharePoint document, every Teams channel becomes inaccessible. The data is there. The person just can't reach it.

This is why the platform failure on cutover night was so dangerous. The staged data migration was complete. The only remaining task was the identity cutover — and it had to happen that night, because Monday morning was the hard deadline. If users couldn't authenticate to the new tenant on Monday, every department's workflow would stop.

The anatomy of the manual AD cutover

Here's what I had to do, step by step, with the platform unavailable.

Attribute mapping. Every user object from the source tenant needs its attributes mapped correctly to the target. UPN, primary SMTP address, proxy addresses, group memberships, manager relationships, custom attributes. I had pre-built a CSV mapping — every source attribute on one side, every target attribute on the other. Validated it against both tenants earlier in the week. This single CSV was the most important file of the entire night.
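
Validating a mapping file like this mostly comes down to a few mechanical checks: the expected columns exist, no required value is blank, and no two source users land on the same target UPN. Here is a minimal sketch in Python (the real tooling was PowerShell). The column names `source_upn`, `target_upn`, and `primary_smtp` are assumptions, not the actual headers of my CSV.

```python
import csv
import io

# Assumed column names; a real mapping would carry many more attributes.
REQUIRED_COLUMNS = ["source_upn", "target_upn", "primary_smtp"]


def validate_mapping(csv_text):
    """Return a list of human-readable problems found in the mapping CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]

    problems = []
    seen_targets = {}  # lower-cased target UPN -> line where it first appeared
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for column in REQUIRED_COLUMNS:
            if not (row[column] or "").strip():
                problems.append(f"line {line_no}: empty {column}")
        target = (row["target_upn"] or "").strip().lower()
        if target:
            if target in seen_targets:
                problems.append(
                    f"line {line_no}: target_upn {target} "
                    f"already used on line {seen_targets[target]}"
                )
            else:
                seen_targets[target] = line_no
    return problems


sample = """source_upn,target_upn,primary_smtp
alice@old.example,alice@new.example,alice@new.example
bob@old.example,alice@new.example,bob@new.example
carol@old.example,,carol@new.example
"""
for problem in validate_mapping(sample):
    print(problem)
```

An empty result only means the file is internally consistent. It still has to be checked against both tenants.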

User sync. For each batch of users, I ran a PowerShell pipeline: read the mapping CSV, build the target user object with correct attributes, set the immutable ID for cross-tenant matching, verify the user appears in the target directory, test sign-in. Each batch took about 20 minutes. Between batches, I validated the previous batch: did the users appear? Were their attributes correct? Could they authenticate? If anything was off, I fixed it before starting the next batch.
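
The important property of that loop is the gate: no batch starts until the previous one has validated. Below is a sketch of the control flow in Python, as a stand-in for the PowerShell pipeline. `sync_user` and `validate_user` are placeholders passed in by the caller; in a real run they would wrap whatever directory calls you use.

```python
def run_batches(users, batch_size, sync_user, validate_user):
    """Sync users in batches and stop at the first batch that fails validation.

    Returns (completed_users, failed_users). `failed_users` is empty when
    every batch validated.
    """
    completed = []
    for start in range(0, len(users), batch_size):
        batch = users[start:start + batch_size]
        for user in batch:
            sync_user(user)
        failed = [user for user in batch if not validate_user(user)]
        if failed:
            # Fix these before starting the next batch.
            return completed, failed
        completed.extend(batch)
    return completed, []


# Demo with stand-in callables: "u4" is made to fail validation.
synced = []
done, broken = run_batches(
    ["u1", "u2", "u3", "u4", "u5"],
    batch_size=2,
    sync_user=synced.append,
    validate_user=lambda user: user != "u4",
)
print(done, broken)
```

Stopping on the first bad batch keeps the blast radius small: at most one batch is ever in an unknown state.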

Service accounts and shared resources. Users are the visible part. Service accounts, shared mailboxes, room mailboxes, and application identities are the invisible part. Miss one and a line-of-business application breaks. Miss a shared mailbox and a customer service team loses their queue. I had a separate list of non-user identities that needed migration, with their own mapping and validation procedures. These took almost as long as the user migration because each one had unique dependencies and access patterns.
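
A cheap guard against missing one is to diff the inventory of non-user identities against the mapping before the night starts. The sketch below is Python, not the original scripts, and it assumes the inventory is a list of records with a `name` and a `kind` field. Both field names and all the sample values are invented.

```python
def unmapped_identities(inventory, mapping):
    """Return inventory entries that have no target mapping, grouped by kind.

    `inventory` is a list of dicts with assumed keys `name` and `kind`;
    `mapping` is a dict of source name -> target name.
    """
    gaps = {}
    for identity in inventory:
        if identity["name"] not in mapping:
            gaps.setdefault(identity["kind"], []).append(identity["name"])
    return gaps


inventory = [
    {"name": "svc-backup", "kind": "service_account"},
    {"name": "support-queue", "kind": "shared_mailbox"},
    {"name": "boardroom-1", "kind": "room_mailbox"},
]
mapping = {"svc-backup": "svc-backup-new"}
print(unmapped_identities(inventory, mapping))
```

An empty dict means every listed identity has a target. It says nothing about identities that never made it onto the list.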

DNS and authentication cutover. The final step: update DNS records so authentication requests flow to the new tenant. This is the point of no return. Do it too early and users who haven't been migrated yet lose access. Do it too late and migrated users can't authenticate reliably. I waited until every user batch was validated before touching DNS. The TTL on the old records meant propagation took about an hour. During that hour, I watched the sign-in logs like a hawk, looking for any authentication failures that might indicate a missed user or bad mapping.
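
Two pieces of that step reduce to simple logic: don't touch DNS unless every batch has validated, and expect resolvers that cached the old record to keep serving it for up to the old TTL after the change. This Python sketch covers both. The function names are mine, and the timestamp in the example is arbitrary.

```python
from datetime import datetime, timedelta


def dns_cutover_ready(batch_results):
    """True only when there is at least one batch and every batch validated.

    `batch_results` maps a batch name to True (validated) or False.
    """
    return bool(batch_results) and all(batch_results.values())


def old_records_expired_by(changed_at, old_ttl_seconds):
    """Latest moment a TTL-honouring resolver can still serve the old record.

    A resolver that cached the old answer just before the change keeps it
    for one full TTL, so the worst case is change time plus the old TTL.
    """
    return changed_at + timedelta(seconds=old_ttl_seconds)


results = {"batch-1": True, "batch-2": True, "batch-3": False}
print(dns_cutover_ready(results))

# Arbitrary example: change made at 03:00 with a 3600-second TTL.
changed = datetime(2024, 1, 1, 3, 0)
print(old_records_expired_by(changed, 3600))
```

The second function gives an upper bound for well-behaved resolvers only, which is why the sign-in logs still need watching during the window.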

Post-cutover validation. After DNS propagated, I ran the full validation suite: can every user sign in? Are their mailboxes accessible? Are their rules intact? OneDrive? Teams? Each batch took 45 minutes to validate completely. Found three issues: a shared mailbox that silently failed to migrate, a handful of users with incorrect UPN mappings, a Teams channel that appeared migrated but had empty file storage. All fixed before 8am.
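
Structurally, a validation sweep like this is a table of named checks run against every user, reporting only what failed. Here is a minimal Python sketch of that shape. The check names mirror the questions above, but the check functions are stand-ins that read invented sample data, not real calls to the tenant.

```python
def run_validation(users, checks):
    """Run every named check against every user and return failures only.

    `checks` maps a check name to a callable taking a user and returning
    True on success. The result maps user -> list of failed check names.
    """
    failures = {}
    for user in users:
        failed = [name for name, check in checks.items() if not check(user)]
        if failed:
            failures[user] = failed
    return failures


# Stand-in state; a real sweep would query the target tenant instead.
state = {
    "alice": {"sign_in": True, "mailbox": True, "rules": True},
    "bob": {"sign_in": True, "mailbox": False, "rules": False},
}
checks = {
    "sign_in": lambda user: state[user]["sign_in"],
    "mailbox": lambda user: state[user]["mailbox"],
    "rules": lambda user: state[user]["rules"],
}
print(run_validation(["alice", "bob"], checks))
```

Reporting failures by user makes the fix list obvious, and an empty result is the "all clear".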

The real timeline

  • Weeks 1-3: Staged user data migration — mailboxes, SharePoint, OneDrive — in batches. Users continued working normally.
  • Tuesday: Finalized identity mapping CSV. Ran full validation against both tenants.
  • Wednesday: Built and tested the manual AD cutover PowerShell suite on a pilot group.
  • Thursday: Dry run of the contingency plan. Found and fixed seven edge cases in the attribute mapping.
  • Friday 8pm-11pm: AD/AAD cutover started on the migration platform. Identity sync in progress.
  • Friday 11pm: Platform outage. Switched entirely to manual PowerShell.
  • Saturday 12am-2am: User identity sync, batch by batch. Brain stopped working around 1am. Recovered.
  • Saturday 2am-3am: Service account and shared resource migration.
  • Saturday 3am-4am: DNS cutover. Watched authentication logs during propagation.
  • Saturday 4am-5am: Full validation sweep. Three issues found and fixed.
  • Saturday 5am-6am: Final sign-in tests for all users. All clear.
  • Saturday 6am: Done.

I made eggs at 4am while waiting for DNS propagation. They were fine.

What the post-mortem taught me

The staged data migration lulled everyone into thinking the hard part was over. It wasn't. The staged batches had gone smoothly enough that there was an assumption the cutover night would be routine — a few hours of platform-orchestrated tasks, then done. Nobody expected the platform to fail. I had the contingency only because I'd learned, from previous projects, that something always does.

First: the AD cutover is the actual cutover. Everything else is preparation. You can stage data for weeks. You can test and validate and rehearse. But the moment you redirect authentication, every single user depends on your identity mapping being correct. There's no partial success with AD. Either everyone can sign in, or the migration is a failure.

Second: having someone who knows the legacy environment is irreplaceable. I'd designed the migration. I knew my scripts cold. But I didn't build that company's infrastructure in 2016. He did. When those stale user accounts with deprecated attributes surfaced at 1am, I would have spent hours trying to figure it out from logs and documentation that probably didn't exist. He knew the history, and he had the answer in seconds because he'd been there when it was built. He knew which accounts were safe to touch and which ones weren't. And he knew the weird M365 trick of using a license assignment to force a state reset, the kind of thing you only learn by living with a platform for years. No amount of migration planning replaces institutional knowledge. Find the person who built it. Keep them close during cutover.

Third: the platform handled 80% of the preparatory work well. The remaining 20% — attribute mapping edge cases, service account dependencies, DNS timing, manual validation — those required human judgment. The platform migrates data. You migrate identity. One of those is harder than the other.

The one thing I'd tell anyone doing this

Build the manual AD cutover plan first. Use the platform to accelerate, not to replace. If you can't articulate every step of the identity migration in PowerShell — every attribute, every mapping, every validation — you don't understand it well enough to trust a tool to do it for you. The platform is an accelerator. You are the contingency.

Also: find the person who built the legacy environment and keep them in the room during cutover. You designed the migration. They designed the thing you're migrating. Both perspectives matter at 1am.

[m365][migration][identity]