Author Topic: Help! Need regex expression to clean my XMI file of nonprinting characters  (Read 106 times)

Richard Freggi

  • EA User
  • **
  • Posts: 37
  • Karma: +1/-1
I am working on a large project, and I need to copy/paste text from several sources into element notes and tagged values.  Many sources are MS Word and PowerPoint documents.  This text contains a lot of formatting characters.  I tried to clean them out by pasting into MS Notepad before pasting into EA, by using the Excel CLEAN function, and by using Notepad++ to find and replace any weird characters.  Still, a lot of junk got through.  The Sparx spell checker ignores these characters.  The main issue for now is that the CSV export displays incorrectly when imported into MS Excel: notes and tagged value fields end up in the wrong cells and throw off everything else.  I need a top-quality CSV export because I use it to make tables/slides for presentations to management; also, this is a large, long-term project.
(I export from EA as tab delimited as that is a character that just should not appear in any of the text.)

My preferred cleanup strategy would be to:
1. Export the whole model to XMI
2. Open the XMI with Notepad++
3. Use the 'search and replace' with regex to replace all this junk with empty strings. 
4. Save the cleaned XMI and import into EA in a new project and delete the old project
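
Steps 2 and 3 above could also be scripted instead of done by hand in Notepad++. The following is only a minimal sketch, assuming the junk consists of control characters and that the XMI is UTF-8 (check the encoding in the file's XML declaration). The names `strip_control_chars` and `clean_file` are made up for illustration. It removes control characters but keeps tab, line feed, and carriage return; visible leftovers such as stray accented letters are not touched.

```python
import re

# Control characters to drop: the C0 range except tab (\x09), LF (\x0A)
# and CR (\x0D), plus DEL and the C1 range (\x7F-\x9F).
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0B\x0C\x0E-\x1F\x7F-\x9F]")


def strip_control_chars(text: str) -> str:
    """Return text with nonprinting control characters removed."""
    return CONTROL_CHARS.sub("", text)


def clean_file(src: str, dst: str, encoding: str = "utf-8") -> None:
    """Copy src to dst, dropping control characters on the way.

    The default encoding is an assumption; use whatever the XMI declares.
    newline="" keeps the original line endings exactly as they are.
    """
    with open(src, encoding=encoding, newline="") as f:
        text = f.read()
    with open(dst, "w", encoding=encoding, newline="") as f:
        f.write(strip_control_chars(text))
```

A call such as `clean_file("model.xml", "model_clean.xml")` (hypothetical file names) writes a cleaned copy and leaves the original export untouched, so the result can be compared before importing it into EA.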

I googled around but cannot find a regex that will leave everything EA needs in the XMI untouched while removing all the junk characters from notes and tagged values.  I tried several expressions, but they either leave some junk characters in or destroy some of the legitimate tabs and line feeds.  Example (hope this displays correctly):
Quote
  –  â
and many more

If anyone can provide one, I would be very grateful!  Thanks!!

p.s. Google also brought up an approach using the Unix tr command: tr -cd '\11\12\15\40-\176' < file-with-binary-chars > clean-file.  Does anyone know whether this would be a better strategy?
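
For reference, the octal codes in that tr command are tab (11), line feed (12), carriage return (15), and the printable ASCII range from space (40) to tilde (176); `-cd` deletes every byte that is not in that set. A rough Python equivalent, as a sketch only (the function name `ascii_only` is invented here), shows the consequence: every non-ASCII byte goes, so legitimate accented letters and typographic dashes are removed along with the junk.

```python
# Bytes the tr whitelist keeps: tab, LF, CR, and space through tilde.
KEEP = bytes([0o11, 0o12, 0o15]) + bytes(range(0o40, 0o177))

# Everything else is deleted, which is what tr -cd does with the complement.
DELETE = bytes(b for b in range(256) if b not in KEEP)


def ascii_only(data: bytes) -> bytes:
    """Drop every byte outside the whitelist."""
    return data.translate(None, DELETE)


# The accented letter is encoded as two non-ASCII bytes, so both vanish.
print(ascii_only("caf\u00e9\tok\r\n".encode("utf-8")))
```

Because it works on raw bytes, this is a blunt whitelist: it is safe for the ASCII markup itself, but it cannot tell a wanted non-ASCII character from an unwanted one.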
« Last Edit: November 10, 2017, 03:10:28 am by Richard Freggi »

Geert Bellekens

  • EA Guru
  • *****
  • Posts: 7671
  • Karma: +156/-21
  • Make EA work for YOU!
    • Enterprise Architect Consultant and Value Added Reseller
Re: Help! Need regex expression to clean my XMI file of nonprinting characters
« Reply #1 on: November 10, 2017, 04:22:09 am »
I don't think this is the best place to search for regex help.
Have you tried StackOverflow?

Geert

Geert Bellekens

Re: Help! Need regex expression to clean my XMI file of nonprinting characters
« Reply #2 on: November 10, 2017, 03:14:30 pm »
There is nothing you can do about the CSV export part. The implementation in EA is flawed and doesn't deal well with newlines.
What I did in similar circumstances was write a script that exported content right into Excel.
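
The script itself is not shown in this thread. As a loose illustration of why newlines matter, here is a minimal Python sketch (not the script mentioned above, and with invented sample rows) that writes delimited text with every field quoted. A quoted field may contain line breaks without spilling into other cells when the file is read back by a CSV-aware reader.

```python
import csv
import io

# Hypothetical rows standing in for element name, notes, and a tagged value.
rows = [
    ("Name", "Notes", "Owner"),
    ("Order", "First line\nSecond line", "Sales"),
]


def to_csv(rows) -> str:
    """Serialise rows with every field quoted so embedded newlines survive."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL)
    writer.writerows(rows)
    return buf.getvalue()


text = to_csv(rows)

# Reading the text back gives the same cells, multi-line note included.
parsed = list(csv.reader(io.StringIO(text, newline="")))
print(parsed)
```

When writing to a real file, opening it with `newline=""` is what the `csv` module documentation recommends; whether a given Excel version then shows the multi-line cell as intended is something to verify on a sample.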

Geert

Richard Freggi

Re: Help! Need regex expression to clean my XMI file of nonprinting characters
« Reply #3 on: November 11, 2017, 02:47:03 am »
Geert - thanks.  I spent another few hours today on the cleanup and am making progress with the regex to get rid of the junk.  I'll post when I have something that works reliably.
I still have problems with the CSV export's handling of new lines in 'Memo' fields; I guess it's an intrinsic limitation of the tool, as you say.  Thanks for pointing it out.

I'll study up on SQL queries when I have time - should be better than the CSV export utility.
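
A rough idea of what such a query could look like, as a sketch only: it runs against an in-memory SQLite stand-in, not a real EA repository, and the table and column names (`t_object` with `Name` and `Note`, `t_objectproperties` with `Property` and `Value`) are assumptions about the EA schema that should be checked against the actual model database.

```python
import sqlite3

# In-memory stand-in for the repository; the schema below is an assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_object (Object_ID INTEGER PRIMARY KEY, Name TEXT, Note TEXT);
    CREATE TABLE t_objectproperties (Object_ID INTEGER, Property TEXT, Value TEXT);
    INSERT INTO t_object VALUES (1, 'Order', 'First line' || char(10) || 'Second line');
    INSERT INTO t_objectproperties VALUES (1, 'Owner', 'Sales');
""")

# One row per element and tagged value; elements without tags still appear.
QUERY = """
    SELECT o.Name, o.Note, p.Property, p.Value
    FROM t_object AS o
    LEFT JOIN t_objectproperties AS p ON p.Object_ID = o.Object_ID
    ORDER BY o.Name, p.Property
"""
rows = conn.execute(QUERY).fetchall()
print(rows)
```

The query returns notes as whole values, line breaks included, so the result can be passed to whatever writes the spreadsheet without going through the CSV export.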