I tried to run the Execute Python operator to read from my example set before passing the input into the Keras model,
but it fails with:
Exception: java.lang.IllegalArgumentException
Message: Object type not supported
Can you please help me?
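In the Python Scripting extension, this error typically appears when `rm_main` returns something other than a pandas DataFrame. A minimal sketch of the script shape RapidMiner expects (the column names here are made up for illustration):

```python
import pandas as pd

def rm_main(data):
    # 'data' arrives as a pandas DataFrame from the example set port.
    # Whatever you return must also be a DataFrame (or a tuple of them);
    # returning e.g. a bare NumPy array or a list is an unsupported
    # object type on the output port.
    result = data.copy()
    return result

# Stand-alone check of the same shape, outside RapidMiner:
df = pd.DataFrame({"x": [1.0, 2.0], "y": [3.0, 4.0]})
out = rm_main(df)
print(type(out).__name__)  # DataFrame
```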
Dear community,
This is the second time I have been really disappointed with an operator. See here: I split my concatenated data, and it should separate into more than three columns, but it only separates into three, which leaves the rest of the data out. I don't know how to overcome this problem, please help me.
Thank you, https://drive.google.com/open?id=1XvGZKqBWT1LDV0kFWDOKecGUbvxDpcWK
<?xml version="1.0" encoding="UTF-8"?><process version="8.0.001"><context><input/><output/><macros/></context><operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process"><process expanded="true"><operator activated="true" class="read_excel" compatibility="8.0.001" expanded="true" height="68" name="Read Excel" width="90" x="112" y="187"><parameter key="excel_file" value="C:\Users\shahida1\Desktop\language.xlsx"/><parameter key="imported_cell_range" value="A1:B14065"/><parameter key="first_row_as_names" value="false"/><list key="annotations"><parameter key="0" value="Name"/></list><list key="data_set_meta_data_information"><parameter key="0" value="User Sys ID.true.integer.id"/><parameter key="1" value="Language.true.polynominal.attribute"/></list></operator><operator activated="true" class="split" compatibility="8.0.001" expanded="true" height="82" name="Split" width="90" x="246" y="289"><parameter key="attribute_filter_type" value="single"/><parameter key="attribute" value="Language"/><parameter key="attributes" value="Language"/><parameter key="split_pattern" value="[-!&quot;#$%&amp;'()*+,./:;&lt;=>?@\[\\\]_`{|}~]"/></operator><operator activated="true" class="multiply" compatibility="8.0.001" expanded="true" height="103" name="Multiply" width="90" x="380" y="289"/><operator activated="true" class="select_attributes" compatibility="8.0.001" expanded="true" height="82" name="Select Attributes" width="90" x="447" y="136"><parameter key="attribute_filter_type" value="subset"/><parameter key="attributes" value="Language_1|Language_2|Language_3"/></operator><connect from_op="Read Excel" from_port="output" to_op="Split" to_port="example set input"/><connect from_op="Split" from_port="example set output" to_op="Multiply" to_port="input"/><connect from_op="Multiply" from_port="output 1" to_op="Select Attributes" to_port="example set input"/><connect from_op="Multiply" from_port="output 2" to_port="result 2"/><connect from_op="Select Attributes" 
from_port="example set output" to_port="result 1"/><portSpacing port="source_input 1" spacing="0"/><portSpacing port="sink_result 1" spacing="0"/><portSpacing port="sink_result 2" spacing="0"/><portSpacing port="sink_result 3" spacing="0"/></process></operator></process>
Any solutions, please?
I have written these lines of code:

f = open("1.ref", "r")
alice = f.read()
tt = nltk.tokenize.TextTilingTokenizer()
tiles = tt.tokenize(alice[0:2000])
print(tiles)  # total text in a single-valued list

I need to consider the full text. If I omit [0:2000], I get an error:
TypeError: slice indices must be integers or None or have an index method
While printing tiles, I am getting the full text. I need to show the segmented text.
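Note that `tokenize` returns a plain Python list of tile strings, so printing the whole list at once looks like one block of text; iterating over it and printing each element separately shows the segments. A sketch of that loop (with stand-in data here, since the tiles are just list elements):

```python
# Stand-in for the result of tt.tokenize(alice); TextTiling returns a
# list of strings, one string per topical segment.
tiles = [
    "\n\nAlice was beginning to get very tired of sitting by her sister...",
    "\n\nDown, down, down. Would the fall never come to an end?...",
]

# Print each segment on its own, with a visible separator.
for i, tile in enumerate(tiles, start=1):
    print(f"--- Segment {i} ---")
    print(tile.strip())
```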
Hi RapidMiners,
as part of an internship I am doing I need to search for trending topics and news that are geographically limited. I've already had some success with the Twitter operators, although I get a lot of repeated terms (in my case I was searching in Dortmund, and most tweets were related to Borussia Dortmund; I am not a football fan :@).
I would appreciate any leads that you can give me for Facebook, Google Trends or another source. I've experimented a bit with the Facebook extension, but after some time it stopped delivering results (empty example set). I don't know if it is a bug or not, I can still get results using the Graph Explorer.
I also looked at Google Trends, but it is not as fine-grained as I was hoping: when limiting the search to regions, sometimes the number of results is zero or close to zero.
As I said, I would appreciate your feedback!
Kind regards,
Sebastian
I am trying to use an .rmp file in Java to get classification results in my application. The problem is that my Java program is not giving the same result as the RapidMiner GUI; it only returns the attribute names from the result. The following is my process:
<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001"><context><input/><output/><macros/></context><operator activated="true" class="process" compatibility="8.1.001" expanded="true" name="Process"><parameter key="logverbosity" value="init"/><parameter key="random_seed" value="2001"/><parameter key="send_mail" value="never"/><parameter key="notification_email" value=""/><parameter key="process_duration_for_mail" value="30"/><parameter key="encoding" value="SYSTEM"/><process expanded="true"><operator activated="true" class="retrieve" compatibility="8.1.001" expanded="true" height="68" name="Retrieve livestock data(animal att)" width="90" x="45" y="85"><parameter key="repository_entry" value="//Local Repository/livestock data(animal att)"/></operator><operator activated="true" class="concurrency:cross_validation" compatibility="8.1.001" expanded="true" height="145" name="Cross Validation" width="90" x="380" y="187"><parameter key="split_on_batch_attribute" value="false"/><parameter key="leave_one_out" value="true"/><parameter key="number_of_folds" value="2"/><parameter key="sampling_type" value="shuffled sampling"/><parameter key="use_local_random_seed" value="false"/><parameter key="local_random_seed" value="1992"/><parameter key="enable_parallel_execution" value="true"/><process expanded="true"><operator activated="true" class="naive_bayes" compatibility="8.1.001" expanded="true" height="82" name="Naive Bayes" width="90" x="45" y="34"><parameter key="laplace_correction" value="true"/></operator><connect from_port="training set" to_op="Naive Bayes" to_port="training set"/><connect from_op="Naive Bayes" from_port="model" to_port="model"/><portSpacing port="source_training set" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/><portSpacing port="sink_through 1" spacing="0"/></process><process expanded="true"><operator activated="true" class="apply_model" compatibility="8.1.001" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34"><list key="application_parameters"/><parameter key="create_view" value="false"/></operator><operator activated="true" class="performance_classification" compatibility="8.1.001" expanded="true" height="82" name="Performance" width="90" x="179" y="34"><parameter key="main_criterion" value="first"/><parameter key="accuracy" value="true"/><parameter key="classification_error" value="false"/><parameter key="kappa" value="false"/><parameter key="weighted_mean_recall" value="false"/><parameter key="weighted_mean_precision" value="false"/><parameter key="spearman_rho" value="false"/><parameter key="kendall_tau" value="false"/><parameter key="absolute_error" value="false"/><parameter key="relative_error" value="false"/><parameter key="relative_error_lenient" value="false"/><parameter key="relative_error_strict" value="false"/><parameter key="normalized_absolute_error" value="false"/><parameter key="root_mean_squared_error" value="false"/><parameter key="root_relative_squared_error" value="false"/><parameter key="squared_error" value="false"/><parameter key="correlation" value="false"/><parameter key="squared_correlation" value="false"/><parameter key="cross-entropy" value="false"/><parameter key="margin" value="false"/><parameter key="soft_margin_loss" value="false"/><parameter key="logistic_loss" value="false"/><parameter key="skip_undefined_labels" value="true"/><parameter key="use_example_weights" value="true"/><list key="class_weights"/></operator><connect from_port="model" to_op="Apply Model" to_port="model"/><connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/><connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/><connect 
from_op="Performance" from_port="performance" to_port="performance 1"/><portSpacing port="source_model" spacing="0"/><portSpacing port="source_test set" spacing="0"/><portSpacing port="source_through 1" spacing="0"/><portSpacing port="sink_test set results" spacing="0"/><portSpacing port="sink_performance 1" spacing="0"/><portSpacing port="sink_performance 2" spacing="0"/></process></operator><connect from_op="Retrieve livestock data(animal att)" from_port="output" to_op="Cross Validation" to_port="example set"/><connect from_op="Cross Validation" from_port="performance 1" to_port="result 1"/><portSpacing port="source_input 1" spacing="0"/><portSpacing port="sink_result 1" spacing="0"/><portSpacing port="sink_result 2" spacing="0"/><description align="center" color="yellow" colored="false" height="105" resized="false" width="180" x="247" y="334">Type your comment</description></process></operator></process>
And this is my java code:
RapidMiner.setExecutionMode(RapidMiner.ExecutionMode.COMMAND_LINE);
RapidMiner.init();
Process pr = new Process(new File("C:\\Users\\Ameer Abdullah\\.RapidMiner\\repositories\\Local Repository\\naivebayesmodel.rmp"));
Operator op = pr.getOperator("Read CSV");
IOContainer container = pr.run();
for (int i = 0; i < container.size(); i++) {
    IOObject ioObject = container.getElementAt(i);
    System.out.println(i + ioObject.toString());
}
How can I get the confidence values for each possible label in Java?
We are analyzing customer responses for a consumer product. The data has been pulled from web-based responses, all in German, and placed in an Excel file. We then extract only the text review column from the Excel file and feed it into Aylien's sentiment analysis process. We have 1015 records, not all that much to read, but the system simply goes into a spin cycle and never concludes. We have waited over 40 minutes, thinking that perhaps the German text was causing problems, but this is too long, and we have tested the setup with a single record of German text and it appears to work fine. We are obviously missing something. Can someone help?
Dear community,
I am trying to extract a URL from a text. Not only do I want to parse Twitter posts for mentioned URLs, but also other news content.
I then want to feed the Get Page operator with the URLs. I am fine with that part, but I have not managed to extract the URLs so far. I already tried Extract Information...
Help is much appreciated!
Thanks,
Julian
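For reference, extracting URLs from free text usually comes down to a regular expression, which is also what a regex-based extraction operator would use. A minimal Python sketch (the pattern is a simplified assumption and will miss some edge cases):

```python
import re

# Simplified URL pattern: scheme plus a run of non-whitespace,
# non-quote, non-angle-bracket characters. This is a deliberate
# simplification; real-world URL extraction has more edge cases.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")

def extract_urls(text):
    """Return URL-like substrings, with trailing sentence punctuation stripped."""
    return [u.rstrip(".,;:!?") for u in URL_RE.findall(text)]

sample = "Read more at https://www.spiegel.de/politik and http://example.com/a?b=1."
print(extract_urls(sample))
# ['https://www.spiegel.de/politik', 'http://example.com/a?b=1']
```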
Hello,
I am working for a Real Estate company and studying Architecture, therefore I know nothing about programming. I am trying to do web queries with lots of data, and Excel is just not good enough. My question may be simple, but I can't solve it.
Here we go.
I have all my URLs written in an Excel sheet. I added the .xlsx to the board (checked). Then I use "Get Pages". After I run the process, the web pages appear with the 200 code (checked); everything seems to be OK, but there is no content length.
I'd love to know how to solve this.
Thanks
Hi, I am trying to figure out a way to group together similar search terms that are misspelled.
Ex.
Search Term      Value
Adidas             500
Addidas             70
Adidass             50
Addidas NMD         25
Nike               500
nikke               30
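One common approach is fuzzy string matching: compare each term to a set of canonical terms and group by the best similarity score. A small Python sketch using the standard library's difflib (the canonical list and the 0.8 cutoff are assumptions to tune):

```python
from difflib import get_close_matches

canonical = ["adidas", "nike"]  # assumed reference spellings

def group_term(term, cutoff=0.8):
    """Map a possibly misspelled term to its closest canonical term,
    or leave it unchanged if nothing is similar enough."""
    matches = get_close_matches(term.lower(), canonical, n=1, cutoff=cutoff)
    return matches[0] if matches else term

# Sum the values per grouped term.
terms = {"Adidas": 500, "Addidas": 70, "Adidass": 50,
         "Addidas NMD": 25, "Nike": 500, "nikke": 30}
totals = {}
for term, value in terms.items():
    key = group_term(term)
    totals[key] = totals.get(key, 0) + value
print(totals)
# {'adidas': 620, 'Addidas NMD': 25, 'nike': 530}
```

Note that "Addidas NMD" falls below the cutoff and stays separate, which may or may not be what you want; lowering the cutoff or stripping suffixes first would merge it too.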
Hello
I want to cluster texts based on meaning and content and then assess the clusters of these concepts (LSA, LDA, mapping the clusters onto a label), but I do not know how to use the operators.
Does anyone have an example?
Thanks
Good Morning,
I am trying to connect to a DB2 database via ODBC, but I keep getting a "No suitable driver found" error. Can anyone explain what configuration is needed to make it work?
Thanks
Hi
I am wanting to read Facebook Open Graph information into a client's DB. I have formatted the input into Facebook and I am processing the recursive requests. The problem I am having is that I am just not able to get the JSON into a format that I can read into the DB. Multiple splits and transposes later and I am still nowhere. Is there an easier way to read in this format:
{
  "posts": {
    "data": [
      {
        "comments": {
          "data": [
            {
              "created_time": "2018-04-19T06:09:10+0000",
              "from": { "name": "User 1", "id": "12345" },
              "message": "Thank you",
              "id": "12345_12345"
            },
            {
              "created_time": "2018-04-19T06:11:50+0000",
              "from": { "name": "User 2", "id": "567890" },
              "message": "for your help",
              "id": "12345_567890"
            }
          ]
        }
      }
    ]
  }
}
The format it comes in from the Open Graph means that there can be sub-comments of comments, which is where I am fouling up, as I end up with a matrix instead of a data table.
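Outside RapidMiner, one way to tame arbitrarily nested comments is a recursive flattening pass that emits one row per comment. A Python sketch, under the assumption that sub-comments live in a nested "comments" -> "data" list as in the Graph API response above:

```python
import json

def flatten_comments(comment, parent_id=None, rows=None):
    """Recursively emit one flat row per comment (and sub-comment)."""
    if rows is None:
        rows = []
    rows.append({
        "id": comment["id"],
        "from": comment["from"]["name"],
        "message": comment["message"],
        "parent_id": parent_id,  # None for top-level comments
    })
    # Sub-comments, if any, sit under a nested "comments" -> "data" list.
    for child in comment.get("comments", {}).get("data", []):
        flatten_comments(child, parent_id=comment["id"], rows=rows)
    return rows

doc = json.loads('''{"posts": {"data": [{"comments": {"data": [
  {"created_time": "2018-04-19T06:09:10+0000",
   "from": {"name": "User 1", "id": "12345"},
   "message": "Thank you", "id": "12345_12345"}]}}]}}''')

rows = []
for post in doc["posts"]["data"]:
    for comment in post["comments"]["data"]:
        rows.extend(flatten_comments(comment))
print(rows)
# [{'id': '12345_12345', 'from': 'User 1', 'message': 'Thank you', 'parent_id': None}]
```

The resulting flat rows (with a parent_id column keeping the nesting information) load into a database table directly, instead of a matrix.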
Here we go... Let's see what happens.
Here's the scoop: I created a process to group and store to a word vector. I also had someone create my second process, which uploads that word vector, runs it through X-Means, and then converts it to an Excel file. A few things: when I drag my word vector data and drop it into Retrieve Word Vector in this second process, I get a problem. Can someone see what is going on, and also tell me how I would create an Excel file for this data from the process in place? Really appreciated! I will follow this up with the word vector.
I have an ID number associated with each row of text that is being processed. The ID number is in a separate column from the body of text. I need to reference this ID in another program through word search, so I don't want it to change. Every time I run my process through Write Excel, I lose my unique ID number; it is changed to the row number it is in. Any help would be greatly appreciated!
Hi experts,
I have X web pages and each web page has an ID. Now I'd like to compute association rules with my sub-operator "Word Association" for each single web page, so that I can get an association rule graph for each page.
At the moment I only compute association rules over all X web pages.
I've tried to loop my sub-operator with Loop Collection, Loop Clusters (ID), or a normal Loop with a macro (ID). Does anyone have a hint for me?
<?xml version="1.0" encoding="UTF-8"?><process version="8.1.003">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.1.003" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="subprocess" compatibility="8.1.003" expanded="true" height="82" name="Crawler Spon 10 pages" width="90" x="45" y="544">
<process expanded="true">
<operator activated="true" class="web:crawl_web_modern" compatibility="7.3.000" expanded="true" height="68" name="Crawl Web (2)" width="90" x="112" y="34">
<parameter key="url" value="http://www.spiegel.de"/>
<list key="crawling_rules">
<parameter key="store_with_matching_url" value=".+www.spiegel.+"/>
<parameter key="follow_link_with_matching_url" value=".+spiegel.+|.+de.+"/>
</list>
<parameter key="max_crawl_depth" value="10"/>
<parameter key="retrieve_as_html" value="true"/>
<parameter key="add_content_as_attribute" value="true"/>
<parameter key="max_pages" value="10"/>
<parameter key="delay" value="100"/>
<parameter key="max_concurrent_connections" value="200"/>
<parameter key="max_connections_per_host" value="100"/>
<parameter key="user_agent" value="Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:57.0) Gecko/20100101 Firefox/57.0"/>
</operator>
<operator activated="true" class="web:retrieve_webpages" compatibility="7.3.000" expanded="true" height="68" name="Get Pages (2)" width="90" x="246" y="34">
<parameter key="link_attribute" value="Link"/>
<parameter key="page_attribute" value="link"/>
<parameter key="random_user_agent" value="true"/>
</operator>
<connect from_op="Crawl Web (2)" from_port="example set" to_op="Get Pages (2)" to_port="Example Set"/>
<connect from_op="Get Pages (2)" from_port="Example Set" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="text:process_document_from_data" compatibility="8.1.000" expanded="true" height="82" name="Process Documents from Data (2)" width="90" x="246" y="544">
<parameter key="create_word_vector" value="false"/>
<parameter key="keep_text" value="true"/>
<list key="specify_weights"/>
<process expanded="true">
<operator activated="true" class="web:extract_html_text_content" compatibility="7.3.000" expanded="true" height="68" name="Extract Content" width="90" x="179" y="34">
<parameter key="ignore_non_html_tags" value="false"/>
</operator>
<connect from_port="document" to_op="Extract Content" to_port="document"/>
<connect from_op="Extract Content" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="false" class="r_scripting:execute_r" compatibility="8.1.000" expanded="true" height="68" name="R-Script-Pairwise-Count" width="90" x="514" y="646">
<parameter key="script" value="library(dplyr) library(tidytext) library(widyr) rm_main = function(data) { korpus &lt;- data_frame(id =data$id, text = data$text) print(korpus) woerter &lt;- korpus %>% unnest_tokens(word, text)%>% group_by(id)%>% count(word, sort =TRUE)%>% filter(n>=10) print(woerter) woerter &lt;- as.data.table(woerter) cooccurre &lt;- korpus %>% unnest_tokens(word, text)%>% pairwise_count(word, id, sort = TRUE)%>% # filter(n>=10) print(cooccurre) cooccurre &lt;- as.data.frame(cooccurre) return(list(woerter, cooccurre)) } "/>
</operator>
<operator activated="false" class="r_scripting:execute_r" compatibility="8.1.000" expanded="true" height="68" name="R-Script-Bigram" width="90" x="514" y="544">
<parameter key="script" value="library(dplyr) library(tidytext) library(widyr) rm_main = function(data) { korpus &lt;- data_frame(id =data$id, text = data$text) print(korpus) woerter &lt;- korpus %>% unnest_tokens(word, text)%>% group_by(id)%>% count(word, sort =TRUE)%>% filter(n>=10) print(woerter) woerter &lt;- as.data.table(woerter) cooccurre &lt;- korpus %>% unnest_tokens(bigram, text, token= &quot;ngrams&quot;, n= 2)%>% count(bigram, sort = TRUE) #pairwise_count(word, id, sort = TRUE)%>% # filter(n>=10) print(cooccurre) cooccurre &lt;- as.data.frame(cooccurre) return(list(woerter, cooccurre)) } "/>
</operator>
<operator activated="false" class="retrieve" compatibility="8.1.003" expanded="true" height="68" name="Retrieve 10-Rohseiten-Spiegel" width="90" x="45" y="34">
<parameter key="repository_entry" value="../data/10-Rohseiten-Spiegel"/>
</operator>
<operator activated="true" class="subprocess" compatibility="8.1.003" expanded="true" height="124" name="Prepare Data" width="90" x="246" y="34">
<process expanded="true">
<operator activated="true" class="set_role" compatibility="8.1.003" expanded="true" height="82" name="Set Role (2)" width="90" x="45" y="34">
<parameter key="attribute_name" value="text"/>
<list key="set_additional_roles">
<parameter key="Title" value="regular"/>
</list>
</operator>
<operator activated="true" class="generate_id" compatibility="8.1.003" expanded="true" height="82" name="Generate ID" width="90" x="45" y="187"/>
<operator activated="true" class="order_attributes" compatibility="8.1.003" expanded="true" height="82" name="Reorder Attributes" width="90" x="45" y="340">
<parameter key="attribute_ordering" value="Title|text"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="8.1.003" expanded="true" height="82" name="Select Attributes" width="90" x="45" y="493">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="Title|text"/>
</operator>
<operator activated="true" class="filter_examples" compatibility="8.1.003" expanded="true" height="103" name="Filter Examples" width="90" x="246" y="34">
<list key="filters_list">
<parameter key="filters_entry_key" value="Title.is_not_missing."/>
</list>
<parameter key="filters_logic_and" value="false"/>
<parameter key="filters_check_metadata" value="false"/>
</operator>
<operator activated="true" class="set_macros" compatibility="8.1.003" expanded="true" height="82" name="Set Macros" width="90" x="246" y="187">
<list key="macros">
<parameter key="attribute_id" value="id"/>
</list>
</operator>
<operator activated="true" class="multiply" compatibility="8.1.003" expanded="true" height="103" name="Multiply uncut" width="90" x="380" y="187"/>
<operator activated="true" class="text:process_document_from_data" compatibility="8.1.000" expanded="true" height="82" name="cut in sentences" width="90" x="581" y="34">
<parameter key="create_word_vector" value="false"/>
<parameter key="keep_text" value="true"/>
<list key="specify_weights"/>
<process expanded="true">
<operator activated="true" class="text:cut_document" compatibility="8.1.000" expanded="true" height="68" name="Cut Document" width="90" x="112" y="34">
<parameter key="query_type" value="Regular Region"/>
<list key="string_machting_queries"/>
<list key="regular_expression_queries"/>
<list key="regular_region_queries">
<parameter key="sentences" value="\\\.\\s[A-Z]| \\!\\s[A-Z]|\\?\\s[A-Z].\\\.|\\!|\\?"/>
</list>
<list key="xpath_queries"/>
<list key="namespaces"/>
<list key="index_queries"/>
<list key="jsonpath_queries"/>
<process expanded="true">
<connect from_port="segment" to_port="document 1"/>
<portSpacing port="source_segment" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<connect from_port="document" to_op="Cut Document" to_port="document"/>
<connect from_op="Cut Document" from_port="documents" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
<description align="center" color="transparent" colored="false" width="126">for r-scripts<br/>tidy text<br/>bigram<br/>pairwise count</description>
</operator>
<operator activated="true" class="multiply" compatibility="8.1.003" expanded="true" height="103" name="Multiply" width="90" x="782" y="34"/>
<connect from_port="in 1" to_op="Set Role (2)" to_port="example set input"/>
<connect from_op="Set Role (2)" from_port="example set output" to_op="Generate ID" to_port="example set input"/>
<connect from_op="Generate ID" from_port="example set output" to_op="Reorder Attributes" to_port="example set input"/>
<connect from_op="Reorder Attributes" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
<connect from_op="Filter Examples" from_port="example set output" to_op="Set Macros" to_port="through 1"/>
<connect from_op="Set Macros" from_port="through 1" to_op="Multiply uncut" to_port="input"/>
<connect from_op="Multiply uncut" from_port="output 1" to_op="cut in sentences" to_port="example set"/>
<connect from_op="Multiply uncut" from_port="output 2" to_port="out 2"/>
<connect from_op="cut in sentences" from_port="example set" to_op="Multiply" to_port="input"/>
<connect from_op="Multiply" from_port="output 1" to_port="out 1"/>
<connect from_op="Multiply" from_port="output 2" to_port="out 3"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
<portSpacing port="sink_out 3" spacing="0"/>
<portSpacing port="sink_out 4" spacing="0"/>
</process>
</operator>
<operator activated="true" class="subprocess" compatibility="8.1.003" expanded="true" height="124" name="RM Co-occurrence (3)" width="90" x="715" y="85">
<process expanded="true">
<operator activated="true" class="text:process_document_from_data" compatibility="8.1.000" expanded="true" height="82" name="Process Documents from Data (4)" width="90" x="112" y="136">
<parameter key="prune_method" value="percentual"/>
<parameter key="prune_below_percent" value="0.01"/>
<parameter key="prune_above_percent" value="100.0"/>
<list key="specify_weights"/>
<process expanded="true">
<operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" height="68" name="Tokenize Non-letters (3)" width="90" x="112" y="34"/>
<operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" height="68" name="Tokenize Linguistic (3)" width="90" x="246" y="34">
<parameter key="mode" value="linguistic sentences"/>
<parameter key="language" value="German"/>
</operator>
<operator activated="true" class="text:filter_by_length" compatibility="8.1.000" expanded="true" height="68" name="Filter Tokens (3)" width="90" x="514" y="34">
<parameter key="min_chars" value="2"/>
</operator>
<operator activated="false" class="text:filter_stopwords_german" compatibility="8.1.000" expanded="true" height="68" name="Filter Stopwords (3)" width="90" x="380" y="34"/>
<operator activated="false" class="text:stem_porter" compatibility="8.1.000" expanded="true" height="68" name="Stem (3)" width="90" x="648" y="34"/>
<operator activated="true" class="text:transform_cases" compatibility="8.1.000" expanded="true" height="68" name="Transform Cases (3)" width="90" x="782" y="34"/>
<connect from_port="document" to_op="Tokenize Non-letters (3)" to_port="document"/>
<connect from_op="Tokenize Non-letters (3)" from_port="document" to_op="Tokenize Linguistic (3)" to_port="document"/>
<connect from_op="Tokenize Linguistic (3)" from_port="document" to_op="Filter Tokens (3)" to_port="document"/>
<connect from_op="Filter Tokens (3)" from_port="document" to_op="Transform Cases (3)" to_port="document"/>
<connect from_op="Transform Cases (3)" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="text_to_nominal" compatibility="8.1.003" expanded="true" height="82" name="Text to Nominal (3)" width="90" x="246" y="34"/>
<operator activated="true" class="numerical_to_binominal" compatibility="8.1.003" expanded="true" height="82" name="Numerical to Binominal (3)" width="90" x="380" y="34"/>
<operator activated="true" class="fp_growth" compatibility="8.1.003" expanded="true" height="82" name="FP-Growth (3)" width="90" x="514" y="34">
<parameter key="find_min_number_of_itemsets" value="false"/>
<parameter key="min_support" value="0.2"/>
<parameter key="max_items" value="2"/>
</operator>
<operator activated="true" class="create_association_rules" compatibility="8.1.003" expanded="true" height="82" name="Create Association Rules (3)" width="90" x="715" y="136">
<parameter key="min_confidence" value="0.01"/>
<parameter key="gain_theta" value="1.0"/>
</operator>
<connect from_port="in 1" to_op="Process Documents from Data (4)" to_port="example set"/>
<connect from_op="Process Documents from Data (4)" from_port="example set" to_op="Text to Nominal (3)" to_port="example set input"/>
<connect from_op="Process Documents from Data (4)" from_port="word list" to_port="out 3"/>
<connect from_op="Text to Nominal (3)" from_port="example set output" to_op="Numerical to Binominal (3)" to_port="example set input"/>
<connect from_op="Numerical to Binominal (3)" from_port="example set output" to_op="FP-Growth (3)" to_port="example set"/>
<connect from_op="FP-Growth (3)" from_port="example set" to_port="out 1"/>
<connect from_op="FP-Growth (3)" from_port="frequent sets" to_op="Create Association Rules (3)" to_port="item sets"/>
<connect from_op="Create Association Rules (3)" from_port="rules" to_port="out 2"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
<portSpacing port="sink_out 3" spacing="0"/>
<portSpacing port="sink_out 4" spacing="0"/>
</process>
</operator>
<operator activated="false" class="concurrency:loop" compatibility="8.1.003" expanded="true" height="124" name="Loop" width="90" x="715" y="391">
<parameter key="number_of_iterations" value="1"/>
<parameter key="iteration_macro" value="%{attribute_id}"/>
<parameter key="enable_parallel_execution" value="false"/>
<process expanded="true">
<operator activated="true" class="subprocess" compatibility="8.1.003" expanded="true" height="124" name="RM Co-occurrence (2)" width="90" x="179" y="34">
<process expanded="true">
<operator activated="true" class="text:process_document_from_data" compatibility="8.1.000" expanded="true" height="82" name="Process Documents from Data (3)" width="90" x="112" y="136">
<parameter key="prune_method" value="percentual"/>
<parameter key="prune_below_percent" value="0.01"/>
<parameter key="prune_above_percent" value="100.0"/>
<list key="specify_weights"/>
<process expanded="true">
<operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" height="68" name="Tokenize Non-letters (2)" width="90" x="112" y="34"/>
<operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" height="68" name="Tokenize Linguistic (2)" width="90" x="246" y="34">
<parameter key="mode" value="linguistic sentences"/>
<parameter key="language" value="German"/>
</operator>
<operator activated="true" class="text:filter_by_length" compatibility="8.1.000" expanded="true" height="68" name="Filter Tokens (2)" width="90" x="514" y="34">
<parameter key="min_chars" value="2"/>
</operator>
<operator activated="false" class="text:filter_stopwords_german" compatibility="8.1.000" expanded="true" height="68" name="Filter Stopwords (2)" width="90" x="380" y="34"/>
<operator activated="false" class="text:stem_porter" compatibility="8.1.000" expanded="true" height="68" name="Stem (2)" width="90" x="648" y="34"/>
<operator activated="true" class="text:transform_cases" compatibility="8.1.000" expanded="true" height="68" name="Transform Cases (2)" width="90" x="782" y="34"/>
<connect from_port="document" to_op="Tokenize Non-letters (2)" to_port="document"/>
<connect from_op="Tokenize Non-letters (2)" from_port="document" to_op="Tokenize Linguistic (2)" to_port="document"/>
<connect from_op="Tokenize Linguistic (2)" from_port="document" to_op="Filter Tokens (2)" to_port="document"/>
<connect from_op="Filter Tokens (2)" from_port="document" to_op="Transform Cases (2)" to_port="document"/>
<connect from_op="Transform Cases (2)" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="text_to_nominal" compatibility="8.1.003" expanded="true" height="82" name="Text to Nominal (2)" width="90" x="246" y="34"/>
<operator activated="true" class="numerical_to_binominal" compatibility="8.1.003" expanded="true" height="82" name="Numerical to Binominal (2)" width="90" x="380" y="34"/>
<operator activated="true" class="fp_growth" compatibility="8.1.003" expanded="true" height="82" name="FP-Growth (2)" width="90" x="514" y="34">
<parameter key="find_min_number_of_itemsets" value="false"/>
<parameter key="min_support" value="0.2"/>
<parameter key="max_items" value="2"/>
</operator>
<operator activated="true" class="create_association_rules" compatibility="8.1.003" expanded="true" height="82" name="Create Association Rules (2)" width="90" x="715" y="85">
<parameter key="min_confidence" value="0.01"/>
<parameter key="gain_theta" value="1.0"/>
</operator>
<connect from_port="in 1" to_op="Process Documents from Data (3)" to_port="example set"/>
<connect from_op="Process Documents from Data (3)" from_port="example set" to_op="Text to Nominal (2)" to_port="example set input"/>
<connect from_op="Process Documents from Data (3)" from_port="word list" to_port="out 3"/>
<connect from_op="Text to Nominal (2)" from_port="example set output" to_op="Numerical to Binominal (2)" to_port="example set input"/>
<connect from_op="Numerical to Binominal (2)" from_port="example set output" to_op="FP-Growth (2)" to_port="example set"/>
<connect from_op="FP-Growth (2)" from_port="example set" to_port="out 1"/>
<connect from_op="FP-Growth (2)" from_port="frequent sets" to_op="Create Association Rules (2)" to_port="item sets"/>
<connect from_op="Create Association Rules (2)" from_port="rules" to_port="out 2"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
<portSpacing port="sink_out 3" spacing="0"/>
<portSpacing port="sink_out 4" spacing="0"/>
</process>
</operator>
<connect from_port="input 1" to_op="RM Co-occurrence (2)" to_port="in 1"/>
<connect from_op="RM Co-occurrence (2)" from_port="out 1" to_port="output 1"/>
<connect from_op="RM Co-occurrence (2)" from_port="out 2" to_port="output 2"/>
<connect from_op="RM Co-occurrence (2)" from_port="out 3" to_port="output 3"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
<portSpacing port="sink_output 3" spacing="0"/>
<portSpacing port="sink_output 4" spacing="0"/>
</process>
</operator>
<operator activated="false" class="collect" compatibility="8.1.003" expanded="true" height="68" name="Collect" width="90" x="514" y="238"/>
<operator activated="false" class="loop_collection" compatibility="8.1.003" expanded="true" height="124" name="Loop Collection" width="90" x="715" y="238">
<process expanded="true">
<operator activated="true" class="subprocess" compatibility="8.1.003" expanded="true" height="124" name="RM Co-occurrence (4)" width="90" x="179" y="34">
<process expanded="true">
<operator activated="true" class="text:process_document_from_data" compatibility="8.1.000" expanded="true" height="82" name="Process Documents from Data (6)" width="90" x="112" y="136">
<parameter key="prune_method" value="percentual"/>
<parameter key="prune_below_percent" value="0.01"/>
<parameter key="prune_above_percent" value="100.0"/>
<list key="specify_weights"/>
<process expanded="true">
<operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" height="68" name="Tokenize Non-letters (4)" width="90" x="112" y="34"/>
<operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" height="68" name="Tokenize Linguistic (4)" width="90" x="246" y="34">
<parameter key="mode" value="linguistic sentences"/>
<parameter key="language" value="German"/>
</operator>
<operator activated="true" class="text:filter_by_length" compatibility="8.1.000" expanded="true" height="68" name="Filter Tokens (4)" width="90" x="514" y="34">
<parameter key="min_chars" value="2"/>
</operator>
<operator activated="false" class="text:filter_stopwords_german" compatibility="8.1.000" expanded="true" height="68" name="Filter Stopwords (4)" width="90" x="380" y="34"/>
<operator activated="false" class="text:stem_porter" compatibility="8.1.000" expanded="true" height="68" name="Stem (4)" width="90" x="648" y="34"/>
<operator activated="true" class="text:transform_cases" compatibility="8.1.000" expanded="true" height="68" name="Transform Cases (4)" width="90" x="782" y="34"/>
<connect from_port="document" to_op="Tokenize Non-letters (4)" to_port="document"/>
<connect from_op="Tokenize Non-letters (4)" from_port="document" to_op="Tokenize Linguistic (4)" to_port="document"/>
<connect from_op="Tokenize Linguistic (4)" from_port="document" to_op="Filter Tokens (4)" to_port="document"/>
<connect from_op="Filter Tokens (4)" from_port="document" to_op="Transform Cases (4)" to_port="document"/>
<connect from_op="Transform Cases (4)" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="text_to_nominal" compatibility="8.1.003" expanded="true" height="82" name="Text to Nominal (5)" width="90" x="246" y="34"/>
<operator activated="true" class="numerical_to_binominal" compatibility="8.1.003" expanded="true" height="82" name="Numerical to Binominal (5)" width="90" x="380" y="34"/>
<operator activated="true" class="fp_growth" compatibility="8.1.003" expanded="true" height="82" name="FP-Growth (5)" width="90" x="514" y="34">
<parameter key="find_min_number_of_itemsets" value="false"/>
<parameter key="min_support" value="0.2"/>
<parameter key="max_items" value="2"/>
</operator>
<operator activated="true" class="create_association_rules" compatibility="8.1.003" expanded="true" height="82" name="Create Association Rules (5)" width="90" x="715" y="136">
<parameter key="min_confidence" value="0.01"/>
<parameter key="gain_theta" value="1.0"/>
</operator>
<connect from_port="in 1" to_op="Process Documents from Data (6)" to_port="example set"/>
<connect from_op="Process Documents from Data (6)" from_port="example set" to_op="Text to Nominal (5)" to_port="example set input"/>
<connect from_op="Process Documents from Data (6)" from_port="word list" to_port="out 3"/>
<connect from_op="Text to Nominal (5)" from_port="example set output" to_op="Numerical to Binominal (5)" to_port="example set input"/>
<connect from_op="Numerical to Binominal (5)" from_port="example set output" to_op="FP-Growth (5)" to_port="example set"/>
<connect from_op="FP-Growth (5)" from_port="example set" to_port="out 1"/>
<connect from_op="FP-Growth (5)" from_port="frequent sets" to_op="Create Association Rules (5)" to_port="item sets"/>
<connect from_op="Create Association Rules (5)" from_port="rules" to_port="out 2"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
<portSpacing port="sink_out 3" spacing="0"/>
<portSpacing port="sink_out 4" spacing="0"/>
</process>
</operator>
<connect from_port="single" to_op="RM Co-occurrence (4)" to_port="in 1"/>
<connect from_op="RM Co-occurrence (4)" from_port="out 1" to_port="output 1"/>
<connect from_op="RM Co-occurrence (4)" from_port="out 2" to_port="output 2"/>
<connect from_op="RM Co-occurrence (4)" from_port="out 3" to_port="output 3"/>
<portSpacing port="source_single" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
<portSpacing port="sink_output 3" spacing="0"/>
<portSpacing port="sink_output 4" spacing="0"/>
</process>
</operator>
<connect from_op="Crawler Spon 10 pages" from_port="out 1" to_op="Process Documents from Data (2)" to_port="example set"/>
<connect from_op="Process Documents from Data (2)" from_port="example set" to_op="Prepare Data" to_port="in 1"/>
<connect from_op="Prepare Data" from_port="out 1" to_port="result 1"/>
<connect from_op="Prepare Data" from_port="out 2" to_op="RM Co-occurrence (3)" to_port="in 1"/>
<connect from_op="RM Co-occurrence (3)" from_port="out 1" to_port="result 2"/>
<connect from_op="RM Co-occurrence (3)" from_port="out 2" to_port="result 3"/>
<connect from_op="RM Co-occurrence (3)" from_port="out 3" to_port="result 4"/>
<connect from_op="Collect" from_port="collection" to_op="Loop Collection" to_port="collection"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
<portSpacing port="sink_result 4" spacing="0"/>
<portSpacing port="sink_result 5" spacing="0"/>
<description align="center" color="yellow" colored="false" height="286" resized="true" width="434" x="10" y="480">Crawler <br/></description>
<description align="center" color="yellow" colored="false" height="278" resized="true" width="173" x="477" y="488">R-Scripts<br/></description>
</process>
</operator>
</process>
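As a sanity check on the last two operators (FP-Growth with min support 0.2 and max items 2, feeding Create Association Rules with min confidence 0.01), the same frequent-itemset and rule computation can be sketched in plain Python by brute force. The token sets below are purely hypothetical toy data standing in for the binominal term matrix the process produces; this is not the RapidMiner implementation, just an illustration of the same support/confidence logic.

```python
from itertools import combinations

# Toy term-occurrence data: each row is the set of tokens in one document,
# standing in for the binominal example set that feeds FP-Growth.
docs = [
    {"berlin", "politik"},
    {"berlin", "wirtschaft"},
    {"berlin", "politik"},
    {"sport"},
    {"politik", "wirtschaft"},
]

MIN_SUPPORT = 0.2      # matches the FP-Growth operator's min_support
MIN_CONFIDENCE = 0.01  # matches Create Association Rules' min_confidence

def support(itemset):
    """Fraction of documents containing every item in the set."""
    return sum(itemset <= d for d in docs) / len(docs)

# Frequent itemsets of size 1 and 2 (max_items = 2 in the process).
items = sorted({t for d in docs for t in d})
frequent = [frozenset(c)
            for k in (1, 2)
            for c in combinations(items, k)
            if support(frozenset(c)) >= MIN_SUPPORT]

# Association rules a -> b from the frequent pairs,
# with confidence = support(a, b) / support(a).
rules = []
for s in frequent:
    if len(s) == 2:
        a, b = sorted(s)
        conf = support(s) / support(frozenset({a}))
        if conf >= MIN_CONFIDENCE:
            rules.append((a, b, conf))
```

With these toy documents, "berlin" and "politik" co-occur in 2 of 5 documents (support 0.4), so the rule berlin → politik comes out with confidence 0.4 / 0.6 ≈ 0.67, which is the same arithmetic the operator chain performs on the real word vectors.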
Kind regards,
Tobias